EURASIP Journal on Applied Signal Processing

Model-Based Sound Synthesis

Guest Editors: Vesa Välimäki, Augusto Sarti, Matti Karjalainen, Rudolf Rabenstein, and Lauri Savioja


Copyright © 2004 Hindawi Publishing Corporation. All rights reserved.

This is a special issue published in volume 2004 of “EURASIP Journal on Applied Signal Processing.” All articles are open access articles distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Editor-in-Chief: Marc Moonen, Belgium

Senior Advisory Editor: K. J. Ray Liu, College Park, USA

Associate Editors: Kiyoharu Aizawa, Japan; Gonzalo Arce, USA; Jaakko Astola, Finland; Kenneth Barner, USA; Mauro Barni, Italy; Sankar Basu, USA; Jacob Benesty, Canada; Helmut Bölcskei, Switzerland; Chong-Yung Chi, Taiwan; M. Reha Civanlar, Turkey; Tony Constantinides, UK; Luciano Costa, Brazil; Satya Dharanipragada, USA; Petar M. Djurić, USA; Jean-Luc Dugelay, France; Touradj Ebrahimi, Switzerland; Sadaoki Furui, Japan; Moncef Gabbouj, Finland; Sharon Gannot, Israel; Fulvio Gini, Italy; A. Gorokhov, The Netherlands; Peter Handel, Sweden; Ulrich Heute, Germany; John Homer, Australia; Jiri Jan, Czech Republic; Søren Holdt Jensen, Denmark; Mark Kahrs, USA; Thomas Kaiser, Germany; Moon Gi Kang, Korea; Aggelos Katsaggelos, USA; Mos Kaveh, USA; C.-C. Jay Kuo, USA; Chin-Hui Lee, USA; Kyoung Mu Lee, Korea; Sang Uk Lee, Korea; Y. Geoffrey Li, USA; Mark Liao, Taiwan; Bernie Mulgrew, UK; King N. Ngan, Hong Kong; Douglas O’Shaughnessy, Canada; Antonio Ortega, USA; Montse Pardas, Spain; Ioannis Pitas, Greece; Phillip Regalia, France; Markus Rupp, Austria; Hideaki Sakai, Japan; Bill Sandham, UK; Wan-Chi Siu, Hong Kong; Dirk Slock, France; Piet Sommen, The Netherlands; John Sorensen, Denmark; Michael G. Strintzis, Greece; Sergios Theodoridis, Greece; Jacques Verly, Belgium; Xiaodong Wang, USA; Douglas Williams, USA; An-Yen (Andy) Wu, Taiwan; Xiang-Gen Xia, USA


Contents

Editorial, Vesa Välimäki, Augusto Sarti, Matti Karjalainen, Rudolf Rabenstein, and Lauri Savioja, Volume 2004 (2004), Issue 7, Pages 923–925

Physical Modeling of the Piano, N. Giordano and M. Jiang, Volume 2004 (2004), Issue 7, Pages 926–933

Sound Synthesis of the Harpsichord Using a Computationally Efficient Physical Model, Vesa Välimäki, Henri Penttinen, Jonte Knif, Mikael Laurson, and Cumhur Erkut, Volume 2004 (2004), Issue 7, Pages 934–948

Multirate Simulations of String Vibrations Including Nonlinear Fret-String Interactions Using the Functional Transformation Method, L. Trautmann and R. Rabenstein, Volume 2004 (2004), Issue 7, Pages 949–963

Physically Inspired Models for the Synthesis of Stiff Strings with Dispersive Waveguides, I. Testa, G. Evangelista, and S. Cavaliere, Volume 2004 (2004), Issue 7, Pages 964–977

Digital Waveguides versus Finite Difference Structures: Equivalence and Mixed Modeling, Matti Karjalainen and Cumhur Erkut, Volume 2004 (2004), Issue 7, Pages 978–989

A Digital Synthesis Model of Double-Reed Wind Instruments, Ph. Guillemain, Volume 2004 (2004), Issue 7, Pages 990–1000

Real-Time Gesture-Controlled Physical Modelling Music Synthesis with Tactile Feedback, David M. Howard and Stuart Rimell, Volume 2004 (2004), Issue 7, Pages 1001–1006

Vibrato in Singing Voice: The Link between Source-Filter and Sinusoidal Models, Ixone Arroabarren and Alfonso Carlosena, Volume 2004 (2004), Issue 7, Pages 1007–1020

A Hybrid Resynthesis Model for Hammer-String Interaction of Piano Tones, Julien Bensa, Kristoffer Jensen, and Richard Kronland-Martinet, Volume 2004 (2004), Issue 7, Pages 1021–1035

Warped Linear Prediction of Physical Model Excitations with Applications in Audio Compression and Instrument Synthesis, Alexis Glass and Kimitoshi Fukudome, Volume 2004 (2004), Issue 7, Pages 1036–1044


EURASIP Journal on Applied Signal Processing 2004:7, 923–925
© 2004 Hindawi Publishing Corporation

Editorial

Vesa Välimäki
Laboratory of Acoustics and Audio Signal Processing, Helsinki University of Technology, P.O. Box 3000, 02015 Espoo, Finland
Email: [email protected]

Augusto Sarti
Dipartimento di Elettronica e Informazione, Politecnico di Milano, piazza Leonardo da Vinci 32, 20133 Milan, Italy
Email: [email protected]

Matti Karjalainen
Laboratory of Acoustics and Audio Signal Processing, Helsinki University of Technology, P.O. Box 3000, 02015 Espoo, Finland
Email: [email protected]

Rudolf Rabenstein
Multimedia Communications and Signal Processing, University Erlangen-Nuremberg, 91058 Erlangen, Germany
Email: [email protected]

Lauri Savioja
Laboratory of Telecommunications Software and Multimedia, Helsinki University of Technology, P.O. Box 5400, 02015 Espoo, Finland
Email: [email protected]

Model-based sound synthesis has become one of the most active research topics in musical signal processing and in musical acoustics. The earliest attempts at generating musical sound with a physical model were made over three decades ago. The first commercial products were seen only some twenty years later. Recently, many refinements to previous signal processing algorithms and several new ones have been introduced. We have learned that new signal processing methods can still be devised, or old ones modified, to advance the field.

Today there exist efficient model-based synthesis algorithms for many sound sources, while there are still some for which we do not have a good model. Certain issues, such as parameter estimation and real-time control, require further work for many model-based approaches. Finally, the capabilities of human listeners to perceive details in synthetic sound should be accounted for, in a way similar to that used in perceptual audio coding, in order to optimize the algorithms. The success and future of the model-based approach depends on researchers and the results of their work.

The roots of this special issue are in a European project called ALMA (Algorithms for the Modelling of Acoustic Interactions, IST-2001-33059, see http://www-dsp.elet.polimi.it/alma/), in which the guest editors and their research teams collaborated from 2001 to 2004. The goal of the ALMA project was to develop an elegant, general, and unifying strategy for a blockwise design of physical models for sound synthesis. A “divide-and-conquer” approach was taken, in which the elements of the structure are individually modeled and discretized, while their interaction topology is separately designed and implemented in a dynamical and physically sound fashion. As a result, several high-quality demonstrations of virtual musical instruments played in a virtual environment were developed. During the ALMA project, the guest editors realized that this special issue could be created, since the field was very active but there had not been a special issue devoted to it for a long time.

This EURASIP JASP special issue presents ten examples of recent research in model-based sound synthesis. The first two papers are related to keyboard instruments. First, Giordano and Jiang discuss physical modeling synthesis of the piano using the finite-difference approach.


Then Välimäki et al. show how to synthesize the sound of the harpsichord based on measurements of a real instrument. An efficient implementation using a visual software synthesis package is given for real-time synthesis.

In the third paper, Trautmann and Rabenstein present a multirate implementation of a vibrating string model that is based on the functional transformation method. In the next paper, Testa et al. investigate the modeling of stiff string behavior. The dispersive wave phenomenon, perceivable as inharmonicity in many string instrument sounds, is studied by deriving different physically inspired models.

In the fifth paper, Karjalainen and Erkut propose a very interesting and general solution to the problem of how to build composite models from digital waveguides and finite-difference time-domain blocks. The next contribution is from Guillemain, who proposes a real-time synthesis model of double-reed wind instruments based on a nonlinear physical model.

The paper by Howard and Rimell provides a viewpoint quite different from the others in this special issue. It deals with the design and implementation of user interfaces for model-based synthesis. An important aspect is the incorporation of tactile feedback into the interface.

Arroabarren and Carlosena have studied the modeling and analysis of human voice production, particularly the vibrato used in the singing voice. Source-filter modeling and sinusoidal modeling are compared to gain a deeper insight into these phenomena. Bensa et al. bring the discussion back to the physical modeling of musical instruments, with particular reference to the piano. They propose a source/resonator model of hammer-string interaction aimed at a realistic production of piano sound. Finally, Glass and Fukudome incorporate a plucked-string model into an audio coder for audio compression and instrument synthesis.

The guest editors would like to thank all the authors for their contributions. We would also like to express our deep gratitude to the reviewers for their diligent efforts in evaluating all submitted manuscripts. We hope that this special issue will stimulate further research work on model-based sound synthesis.

Vesa Välimäki
Augusto Sarti
Matti Karjalainen
Rudolf Rabenstein
Lauri Savioja

Vesa Välimäki was born in Kuorevesi, Finland, in 1968. He received the M.S. degree, the Licentiate of Science degree, and the Doctor of Science degree, all in electrical engineering, from Helsinki University of Technology (HUT), Espoo, Finland, in 1992, 1994, and 1995, respectively. He was with the HUT Laboratory of Acoustics and Audio Signal Processing from 1990 to 2001. In 1996, he was a Postdoctoral Research Fellow with the University of Westminster, London, UK. During the academic year 2001–2002, he was Professor of signal processing at the Pori School of Technology and Economics, Tampere University of Technology (TUT), Pori, Finland. In August 2002 he returned to HUT, where he is currently Professor of audio signal processing. He was appointed Docent in signal processing at the Pori School of Technology and Economics, TUT, in 2003. His research interests are in the application of digital signal processing to audio and music. Dr. Välimäki is a Senior Member of the IEEE Signal Processing Society and a Member of the Audio Engineering Society, the Acoustical Society of Finland, and the Finnish Musicological Society.

Augusto Sarti, born in 1963, received the “Laurea” degree (1988, cum laude) and the Ph.D. (1993) in electrical engineering from the University of Padua, Italy, with research on nonlinear communication systems. He completed his graduate studies at the University of California at Berkeley, where he spent two years doing research on nonlinear system control and on motion planning of nonholonomic systems. In 1993 he joined the Dipartimento di Elettronica e Informazione of the Politecnico di Milano, where he is now an Associate Professor. His current research interests are in the area of digital signal processing, with particular focus on sound analysis, processing, and synthesis; image processing; video coding; and computer vision. Augusto Sarti has authored over 100 scientific publications. He is leading the Image and Sound Processing Group (ISPG) at the Dipartimento di Elettronica e Informazione of the Politecnico di Milano, which has contributed to numerous national projects and eight European research projects. He is currently coordinating the IST-2001-33059 European Project “ALMA: Algorithms for the Modelling of Acoustic Interactions,” and is co-coordinating the IST-2000-28436 European Project “ORIGAMI: A new paradigm for high-quality mixing of real and virtual.”

Matti Karjalainen was born in Hankasalmi, Finland, in 1946. He received the M.S. and the Dr.Tech. degrees in electrical engineering from the Tampere University of Technology in 1970 and 1978, respectively. Since 1980 he has been a Professor of acoustics and audio signal processing at the Helsinki University of Technology in the Faculty of Electrical Engineering. In audio technology, his interest is in audio signal processing, such as DSP for sound reproduction, perceptually based signal processing, as well as music DSP and sound synthesis. In addition to audio DSP, his research activities cover speech synthesis, analysis, and recognition; perceptual auditory modeling and spatial hearing; DSP hardware, software, and programming environments; as well as various branches of acoustics, including musical acoustics and modeling of musical instruments. He has written more than 300 scientific or engineering articles and contributed to organizing several conferences and workshops. Professor Karjalainen is an AES Fellow and a Member of the IEEE (Institute of Electrical and Electronics Engineers), ASA (Acoustical Society of America), EAA (European Acoustics Association), ICMA (International Computer Music Association), ESCA (European Speech Communication Association), and several Finnish scientific and engineering societies.


Rudolf Rabenstein received the “Diplom-Ingenieur” and “Doktor-Ingenieur” degrees in electrical engineering and the “Habilitation” degree in signal processing, all from the University of Erlangen-Nuremberg, Germany, in 1981, 1991, and 1996, respectively. He worked with the Telecommunications Laboratory, University of Erlangen-Nuremberg, from 1981 to 1987. From 1988 to 1991, he was with the Physics Department of the University of Siegen, Germany. In 1991, he returned to the Telecommunications Laboratory of the University of Erlangen-Nuremberg. His research interests are in the fields of multidimensional systems theory, multimedia signal processing, and computer music. Rudolf Rabenstein is the author and coauthor of more than 100 scientific publications, has contributed to various books and book chapters, and holds several patents in audio engineering. He is a Board Member of the School of Engineering of the Virtual University of Bavaria, Germany, and a member of several engineering societies.

Lauri Savioja works as a Professor in the Laboratory of Telecommunications Software and Multimedia at the Helsinki University of Technology (HUT), Finland. He received the Doctor of Science in Technology degree in 1999 from the Department of Computer Science, HUT. His research interests include virtual reality, room acoustics, and human-computer interaction.


EURASIP Journal on Applied Signal Processing 2004:7, 926–933
© 2004 Hindawi Publishing Corporation

Physical Modeling of the Piano

N. Giordano
Department of Physics, Purdue University, 525 Northwestern Avenue, West Lafayette, IN 47907-2036, USA
Email: [email protected]

M. Jiang
Department of Physics, Purdue University, 525 Northwestern Avenue, West Lafayette, IN 47907-2036, USA
Department of Computer Science, Montana State University, Bozeman, MT 59715, USA
Email: [email protected]

Received 21 June 2003; Revised 27 October 2003

A project aimed at constructing a physical model of the piano is described. Our goal is to calculate the sound produced by the instrument entirely from Newton’s laws. The structure of the model is described along with experiments that augment and test the model calculations. The state of the model and what can be learned from it are discussed.

Keywords and phrases: physical modeling, piano.

1. INTRODUCTION

This paper describes a long-term project by our group aimed at physical modeling of the piano. The theme of this volume, model-based sound synthesis of musical instruments, is quite broad, so it is useful to begin by discussing precisely what we mean by the term “physical modeling.” The goal of our project is to use Newton’s laws to describe all aspects of the piano. We aim to use F = ma to calculate the motion of the hammers, strings, and soundboard, and ultimately the sound that reaches the listener.

Of course, we are not the first group to take such a Newton’s law approach to the modeling of a musical instrument. For the piano, there have been such modeling studies of the hammer-string interaction [1, 2, 3, 4, 5, 6, 7, 8, 9], string vibrations [8, 9, 10], and soundboard motion [11]. (Nice reviews of the physics of the piano are given in [12, 13, 14, 15].) There has been similar modeling of portions of other instruments (such as the guitar [16]), and of several other complete instruments, including the xylophone and the timpani [17, 18, 19]. Our work is inspired by and builds on this previous work.

At this point, we should also mention how our work relates to other modeling work, such as the digital waveguide approach, which was recently reviewed in [20]. The digital waveguide method makes extensive use of physics in choosing the structure of the algorithm; that is, in choosing the proper filter(s) and delay lines, connectivity, and so forth, to properly match and mimic the Newton’s law equations of motion of the strings, soundboard, and other components of the instrument. However, as far as we can tell, certain features of the model, such as hammer-string impulse functions and the transfer function that ultimately relates the sound pressure to the soundboard motion (and other similar transfer functions), are taken from experiments on real instruments. This approach is a powerful way to produce realistic musical tones efficiently, in real time and in a manner that can be played by a human performer. However, this approach cannot address certain questions. For example, it would not be able to predict the sound that would be produced if a radically new type of soundboard were employed, or if the hammers were covered with a completely different material than the conventional felt. The physical modeling method that we describe in this paper can address such questions. Hence, we view the ideas and methods embodied in the work of Bank and coworkers [20] (and the references therein) as complementary to the physical modeling approach that is the focus of our work.

In this paper, we describe the route that we have taken to assembling a complete physical model of the piano. This complete model is really composed of interacting submodels which deal with (1) the motions of the hammers and strings and their interaction, (2) soundboard vibrations, and (3) sound generation by the vibrating soundboard. For each of these submodels we must consider several issues, including selection and implementation of the computational algorithm, determination of the values of the many parameters that are involved, and testing the submodel. After considering each of the submodels, we then describe how they are combined to produce a complete computational piano.


The quality of the calculated tones is discussed, along with the lessons we have learned from this work. A preliminary and abbreviated report on this project was given in [21].

2. OVERALL STRATEGY AND GOALS

One of the first modeling decisions that arises is the question of whether to work in the frequency domain or the time domain. In many situations, it is simplest and most instructive to work in the frequency domain. For example, an understanding of the distribution of normal mode frequencies, and the nature of the associated eigenvectors for the body vibrations of a violin or a piano soundboard, is very instructive. However, we have chosen to base our modeling in the time domain. We believe that this choice has several advantages. First, the initial excitation—in our case this is the motion of a piano hammer just prior to striking a string—is described most conveniently in the time domain. Second, the interaction between various components of the instrument, such as the strings and soundboard, is somewhat simpler when viewed in the time domain, especially when one considers the early “attack” portion of a tone. Third, our ultimate goal is to calculate the room pressure as a function of time, so it is appealing to start in the time domain with the hammer motion and stay in the time domain throughout the calculation, ending with the pressure as would be received by a listener. Our time domain modeling is based on finite difference calculations [10] that describe all aspects of the instrument.

A second element of strategy involves the determination of the many parameters that are required for describing the piano. Ideally, one would like to determine all of these parameters independently, rather than use them as fitting parameters when comparing the modeling results to real (measured) tones. This is indeed possible for all of the parameters. For example, dimensional parameters such as the string diameters and lengths, soundboard dimensions, and bridge positions can all be measured from a real piano. Likewise, various material properties, such as the string stiffness, the elastic moduli of the soundboard, and the acoustical properties of the room in which the numerical piano is located, are well known from very straightforward measurements. For a few quantities, most notably the force-compression characteristics of the piano hammers, it is necessary to use separate (and independent) experiments.

This brings us to a third element of our modeling strategy—the problem of how to test the calculations. The final output is the sound at the listener, so one could “test” the model by simply evaluating the sounds via listening tests. However, it is very useful to separately test the submodels. For example, the portion of the model that deals with soundboard vibrations can be tested by comparing its predictions for the acoustic impedance with direct measurements [11, 22, 23, 24]. Likewise, the room-soundboard computation can be compared with studies of sound production by a harmonically driven soundboard [25]. This approach, involving tests against specially designed experiments, has proven to be extremely valuable.

The issue of listening tests brings us to the question of goals; that is, what do we hope to accomplish with such a modeling project? At one level, we would hope that the calculated piano tones are realistic and convincing. The model could then be used to explore what various hypothetical pianos would sound like. For example, one could imagine constructing a piano with a carbon fiber soundboard, and it would be very useful to be able to predict its sound ahead of time, or to use the model in the design of the new soundboard. On a different and more philosophical level, one might want to ask questions such as “what are the most important elements involved in making a piano sound like a piano?” We emphasize that it is not our goal to make a real-time model, nor do we wish to compete with the tones produced by other modeling methods, such as sampling synthesis and digital waveguide modeling [20].

3. STRINGS AND HAMMERS

Our model begins with a piano hammer moving freely with a speed v_h just prior to making contact with a string (or strings, since most notes involve more than one string). Hence, we ignore the mechanics of the action. This mechanics is, of course, quite important from a player’s perspective, since it determines the touch and feel of the instrument [26]. Nevertheless, we will ignore these issues, since (at least to a first approximation) they are not directly relevant to the composition of a piano tone, and we simply take v_h as an input parameter. Typical values are in the range 1–4 m/s [9].

When a hammer strikes a string, there is an interaction force that is a function of the compression of the hammer felt, y_f. This force determines the initial excitation and is thus a crucial factor in the composition of the resulting tone. Considerable effort has been devoted to understanding the hammer-string force [1, 2, 3, 4, 5, 6, 7, 27, 28, 29, 30, 31, 32, 33]. Hammer felt is a very complicated material [34], and there is no “first principles” expression for the hammer-string force relation F_h(y_f). Much work has assumed a simple power law function

F_h(y_f) = F_0 y_f^p,   (1)

where the exponent p is typically in the range 2.5–4 and F_0 is an overall amplitude. This power law form seems to be at least qualitatively consistent with many experiments, and we therefore used (1) in our initial modeling calculations.

While (1) has been widely used to analyze and interpret experiments, and also in previous modeling work, it has been known for some time that the force-compression characteristic of most real piano hammers is not a simple reversible function [7, 27, 28, 29, 30]. Ignoring the hysteresis has seemed reasonable, since the magnitude of the irreversibility is often found to be small. Figure 1 shows the force-compression characteristic for a particular hammer (a Steinway hammer from the note middle C) measured in two different ways. In the type I measurement, the hammer struck a stationary force sensor and the resulting force and felt compression were measured as described in [31].


(Figure 1 plots the hammer force F_h (N) against the felt compression y_f (mm) for hammer C4, with type I and type II curves.)

Figure 1: Force-compression characteristics for a particular piano hammer, measured in two different ways. In the type I experiment (dotted curve), the hammer struck a stationary force sensor and the resulting force, F_h, and felt compression, y_f, were measured. The initial hammer velocity was approximately 1 m/s. The solid curve is the force-compression relation obtained in a type II measurement, in which the same hammer impacted a piano string. This behavior is described qualitatively by (2), with parameters p = 3.5, F_0 = 1.0 × 10¹³ N, ε_0 = 0.90, and τ_0 = 1.0 × 10⁻⁵ s. The dashed arrows indicate compression/decompression branches.

We see that for a particular value of the felt compression, y_f, the force is larger during the compression phase of the hammer-string collision than during decompression. However, this difference is relatively small, generally no more than 10% of the total force. Provided that this hysteresis is ignored, the type I result is described reasonably well by the power law function (1) with p ≈ 3. However, we will see below that (1) is not adequate for our modeling work, and this has led us to consider other forms for F_h.

In order to shed more light on the hammer-string force, we developed a new experimental approach, which we refer to as a type II experiment, in which the force and felt compression are measured as the hammer impacts a string [32, 35]. Since the string rebounds in response to the hammer, the hammer-string contact time in this case is considerably longer (by a factor of approximately 3) than in the type I measurement. The force-compression relation found in this type II measurement is also shown in Figure 1. In contrast to the type I measurements, the type II results for F_h(y) do not consist of two simple branches (one for compression and another for decompression). Instead, the type II result exhibits “loops,” which arise for the following reason. When the hammer first contacts the string, it excites pulses that travel to the ends of the string, are reflected at the ends, and then return. These pulses return while the hammer is still in contact with the string, and since they are inverted by the reflection, they cause an extra series of compression/decompression cycles for the felt. There is considerable hysteresis during these cycles, much more than might have been expected from the type I result. The overall magnitude of the type II force is also somewhat smaller; the hammer is effectively “softer” under the type II conditions. Since the type II arrangement is the one found in a real piano, it is important to use this hammer-force characteristic in modeling.

We have chosen to model our hysteretic type II hammer measurements following the proposal of Stulov [30, 33]. He has suggested the form

F_h(y_f(t)) = F_0 [ g(y_f(t)) − (ε_0/τ_0) ∫_{−∞}^{t} g(y_f(t′)) exp(−(t − t′)/τ_0) dt′ ].   (2)

Here, τ_0 is a characteristic (memory) time scale associated with the felt, ε_0 is a measure of the magnitude of the hysteresis, and y_f(t) is the variation of the compression with time. In other words, (2) says that the felt “remembers” its previous compression history over a time of order τ_0, and that the force is reduced according to how much the felt has been compressed during that period. The inherent nonlinearity of the hammer is specified by the function g(z); Stulov took this to be a power law

g(z) = z^p.   (3)

Stulov has compared (2) to measurements with real hammers and reported very good agreement using τ_0, ε_0, p, and F_0 as fitting parameters. Our own tests of (2) have not shown such good agreement; we have found that it provides only a qualitative (and in some cases semiquantitative) description of the hysteresis shown in Figure 1 [35]. Nevertheless, it is currently the best mathematical description available for the hysteresis, and we have employed it in our modeling calculations.
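To make the memory term in (2) concrete, the following sketch (in Python) discretizes the integral with a simple recursive update; the time step, the rectangular-rule accumulator, and all parameter values are our illustrative assumptions, not the authors' implementation.

    import numpy as np

    # Illustrative parameters only (cf. the Figure 1 caption); y_f is in meters.
    F0, p = 1.0e13, 3.5         # amplitude and power-law exponent
    eps0, tau0 = 0.90, 1.0e-5   # hysteresis magnitude and felt memory time (s)
    dt = 1.0e-6                 # sampling step, chosen well below tau0

    def g(z):
        return z ** p           # Stulov nonlinearity, eq. (3)

    def hammer_force(y_f):
        """Return F_h for a sampled compression history y_f (1D array).
        Multiplying the running sum by exp(-dt/tau0) at each step
        implements the decaying exponential kernel of eq. (2)."""
        decay = np.exp(-dt / tau0)
        history = 0.0
        F = np.empty_like(y_f)
        for n, y in enumerate(y_f):
            history = decay * history + g(y) * dt   # ~ the integral in (2)
            F[n] = F0 * (g(y) - (eps0 / tau0) * history)
        return F

Because the kernel is a pure exponential, the full compression history never needs to be stored; this is what makes (2) inexpensive to evaluate inside a time-domain loop.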

Our string calculations are based on the equation of motion [8, 10, 36]

∂²y/∂t² = c_s² [ ∂²y/∂x² − ε ∂⁴y/∂x⁴ ] − α_1 ∂y/∂t + α_2 ∂³y/∂t³,   (4)

where y(x, t) is the transverse string displacement at time t and position x along the string, and c_s ≡ √(T/µ) is the wave speed for an ideal string (with stiffness and damping ignored), with T the tension and µ the mass per unit length of the string. When the parameters ε, α_1, and α_2 are zero, this is just the simple wave equation. Equation (4) describes only the polarization mode for which the string displacement is parallel to the initial velocity of the hammer. The other transverse mode and also the longitudinal mode are both ignored; experiments have shown that both of these modes are excited in real piano strings [37, 38, 39], but we will leave them for future modeling work. The term in (4) that is proportional to ε arises from the stiffness of the string. It turns out that c_s ε = r_s² √(E_s/ρ_s), where r_s, E_s, and ρ_s are the radius, Young's


modulus, and density of the string, respectively [9, 36]. For typical piano strings, ε is of order 10⁻⁴, so the stiffness term in (4) is small, but it cannot be neglected, as it produces the well-known effect of stretched octaves [36]. Damping is accounted for with the terms involving α_1 and α_2; one of these terms is proportional to the string velocity, while the other is proportional to ∂³y/∂t³. This combination makes the damping dependent on frequency in a manner close to that observed experimentally [8, 10].

Our numerical treatment of the string motion employs a finite difference formulation in which both time t and position x are discretized in units ∆t_s and ∆x_s [8, 9, 10, 40]. The string displacement is then y(x, t) ≡ y(i∆x_s, n∆t_s) ≡ y(i, n). If the derivatives in (4) are written in finite difference form, this equation can be rearranged to express the string displacement at each spatial location i at time step n + 1 in terms of the displacement at previous time steps, as described by Chaigne and Askenfelt [8, 10]. The equation of motion (4) does not contain the hammer force. This is included by the addition of a term on the right-hand side proportional to F_h, which acts at the hammer strike point. Since the hammer has a finite width, it is customary to spread this force over a small length of the string [8]. So far as we know, the details of how this force is distributed have never been measured; fortunately, our modeling results are not very sensitive to this factor (so long as the effective hammer width is qualitatively reasonable). With this approach to the string calculation, the need for numerical stability together with the desired frequency range requires that each string be treated as 50–100 vibrating numerical elements [8, 10].
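As an illustration of this discretization, here is a minimal explicit update for (4); for brevity it keeps the α_1 damping term but drops the α_2 term, applies the hammer force at a single grid point rather than spreading it, and uses stand-in parameter values rather than ones taken from the paper.

    import numpy as np

    # Stand-in parameters for a middle-C-like string (not from the paper).
    N = 60                     # number of string elements (paper: 50-100)
    L = 0.62                   # speaking length (m)
    dx = L / N                 # spatial step
    dt = 1.0 / (4 * 22050)     # string time step, cf. eq. (11)
    c = 325.0                  # ideal wave speed c_s = sqrt(T/mu) (m/s)
    eps = 1.0e-4               # stiffness parameter in eq. (4)
    a1 = 0.5                   # damping coefficient alpha_1 (1/s)
    mu = 6.0e-3                # mass per unit length (kg/m)

    def string_step(y_now, y_prev, F_h=0.0, i_h=12):
        """One step of eq. (4): return y(i, n+1) and y(i, n).
        Two points at each end are held fixed; the hammer force
        enters at grid point i_h (the discrete delta carries a 1/dx)."""
        r = (c * dt / dx) ** 2
        y_next = np.zeros_like(y_now)
        d2 = y_now[3:-1] - 2 * y_now[2:-2] + y_now[1:-3]     # dx^2 * y_xx
        d4 = (y_now[4:] - 4 * y_now[3:-1] + 6 * y_now[2:-2]
              - 4 * y_now[1:-3] + y_now[:-4])                # dx^4 * y_xxxx
        y_next[2:-2] = (2 * y_now[2:-2] - y_prev[2:-2]
                        + r * (d2 - (eps / dx**2) * d4)
                        - a1 * dt * (y_now[2:-2] - y_prev[2:-2]))
        y_next[i_h] += dt**2 * F_h / (mu * dx)   # point hammer force
        return y_next, y_now

Starting from y_now and y_prev initialized to zeros of length N + 1, repeated calls advance the string, and feeding in F_h from the hammer model couples the two submodels step by step.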

4. THE SOUNDBOARD

Wood is a complicated material [41]. Soundboards are assembled from wood that is “quarter sawn,” which means that two of the principal axes of the elastic constant tensor lie in the plane of the board.

The equation of motion for such a thin orthotropic plate is [11, 22, 23, 42]

ρ_b h_b ∂²z/∂t² = −D_x ∂⁴z/∂x⁴ − (D_x ν_y + D_y ν_x + 4D_xy) ∂⁴z/(∂x² ∂y²) − D_y ∂⁴z/∂y⁴ + F_s(x, y) − β ∂z/∂t,   (5)

where the rigidity factors are

D_x = h_b³ E_x / [12(1 − ν_x ν_y)],
D_y = h_b³ E_y / [12(1 − ν_x ν_y)],
D_xy = h_b³ G_xy / 12.   (6)

Here, our board lies in the x–y plane and z is its displacement. (These x and y directions are, of course, not the same as the x and y coordinates used in describing the string motion.) The soundboard coordinates x and y run perpendicular and parallel to the grain of the board. E_x and ν_x are Young's modulus and Poisson's ratio for the x direction, and so forth for y; G_xy is the shear modulus, h_b is the board thickness, and ρ_b is its density. The values of all elastic constants were taken from [41]. In order to model the ribs and bridges, the thickness and rigidity factors are position dependent (since these factors are different at the ribs and bridges than on the “bare” board), as described in [11]. There are also some additional terms that enter the equation of motion (5) at the ends of bridges [11, 17, 18, 43]. F_s(x, y) is the force from the strings on the bridge. This force acts at the appropriate bridge location; it is proportional to the component of the string tension perpendicular to the plane of the board, and is calculated from the string portion of the model. Finally, we include a loss term proportional to the parameter β [11]. The physical origin of this term involves elastic losses within the board. We have not attempted to model this physics according to Newton's laws, but have simply chosen a value of β which yields a quality factor for the soundboard modes similar to that observed experimentally [11, 24].¹ Finally, we note that the soundboard “acts back” on the strings, since the bridge moves and the strings are attached to the bridge. Hence, the interaction of strings in a unison group, and also sympathetic string vibrations (with the dampers disengaged from the strings), are included in the model.
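The rigidity factors (6) are straightforward to evaluate; the helper below does so, with spruce-like placeholder values of our own rather than the handbook data of [41] (recall that the paper's x axis runs across the grain).

    def rigidity_factors(hb, Ex, Ey, Gxy, nu_x, nu_y):
        """Rigidity factors of eq. (6) for a thin orthotropic plate."""
        denom = 12.0 * (1.0 - nu_x * nu_y)
        return (hb**3 * Ex / denom,     # D_x
                hb**3 * Ey / denom,     # D_y
                hb**3 * Gxy / 12.0)     # D_xy

    # Placeholder values: a 9 mm board, much stiffer along the grain (y).
    Dx, Dy, Dxy = rigidity_factors(hb=0.009, Ex=0.9e9, Ey=12.0e9,
                                   Gxy=0.7e9, nu_x=0.03, nu_y=0.37)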

For the solution of (5), we again employed a finite difference algorithm. The space dimensions x and y were discretized, both in steps of size ∆x_b; this spatial step need not be related to the step size for the string, ∆x_s. As in our previous work on soundboard modeling [11], we chose ∆x_b = 2 cm, since this is just small enough to capture the structure of the board, including the widths of the ribs and bridges. Hence, the board was modeled as ∼ 100 × 100 vibrating elements.

The behavior of our numerical soundboard can be judged by calculations of the mechanical impedance, Z, as defined by

Z = F / v_b,   (7)

where F is an applied force and v_b is the resulting soundboard velocity. Here, we assume that F is a harmonic (single frequency) force applied at a point on the bridge, and v_b is measured at the same point. Figure 2 shows results calculated from our model [11] for the soundboard from an upright piano. Also shown are measurements for a real upright soundboard (with the same dimensions and bridge positions, etc., as in the model). The agreement is quite acceptable, especially considering that parameters such as the dimensions of the soundboard, the position and thickness of the ribs and bridges, and the elastic constants of the board were taken from either direct measurements or handbook values (e.g., Young's modulus).

¹ In principle, one might expect the soundboard losses to be frequency dependent, as found for the string. At present there is no good experimental data on this question, so we have chosen the simplest possible model, with just a single loss term in (5).


(Figure 2 plots the soundboard impedance Z (kg/s) against frequency f (Hz) for model and experiment.)

Figure 2: Calculated (solid curve) and measured (dotted curve) mechanical impedance for an upright piano soundboard. Here, the force was applied, and the board velocity measured, at the point where the string for middle C crosses the bridge. Results from [11, 24].


5. THE ROOM

Our time domain room modeling follows the work of Botteldooren [44, 45]. We begin with the usual coupled equations for the velocity and pressure in the room:

ρ_a ∂v_x/∂t = −∂p/∂x,
ρ_a ∂v_y/∂t = −∂p/∂y,
ρ_a ∂v_z/∂t = −∂p/∂z,
∂p/∂t = ρ_a c_a² [ −∂v_x/∂x − ∂v_y/∂y − ∂v_z/∂z ],   (8)

where p is the pressure, the velocity components are v_x, v_y, and v_z, ρ_a is the density, and c_a is the speed of sound in air. This family of equations is similar in form to an electromagnetic problem, and much is known about how to deal with it numerically. We employ a finite difference approach in which staggered grids in both space and time are used for the pressure and velocity. Given a time step ∆t_r, the pressure is computed at times n∆t_r, while the velocity is computed at times (n + 1/2)∆t_r. A similar staggered grid is used for the space coordinates, with the pressure calculated on the grid i∆x_r, j∆x_r, k∆x_r, while v_x is calculated on the staggered grid (i + 1/2)∆x_r, j∆x_r, k∆x_r. The grids for v_y and v_z are arranged in a similar manner, as explained in [44, 45].

Sound is generated in this numerical room by the vibration of the soundboard. We situate the soundboard from the previous section on a plane perpendicular to the z direction in the room, approximately 1 m from the nearest parallel wall (i.e., the floor). At each time step, the velocity v_z of the room air at the surface of the soundboard is set to the calculated soundboard velocity at that instant, as obtained from the soundboard calculation.

The room is taken to be a rectangular box with the same acoustical properties for all six walls. The walls of the room are modeled in terms of their acoustic impedance, Z, with

p = Z v_n,   (9)

where v_n is the component of the (air) velocity normal to the wall [46]. Measurements of Z for a number of materials [47] have found that it is typically frequency dependent, with the form

Z(ω) ≈ Z_0 − i Z′/ω,   (10)

where ω is the angular frequency. Incorporating this frequency domain expression for the acoustic impedance into our time domain treatment was done in the manner described in [45].
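Since the −iZ′/ω term in (10) corresponds to time integration, the wall condition can be applied sample by sample by accumulating the normal velocity. The class below is a minimal sketch of that idea, using a plain rectangular-rule integrator of our own choosing rather than the actual scheme of [45].

    class WallImpedance:
        """Time-domain form of eq. (10): p = Z0*v_n + Z' * integral(v_n dt)."""

        def __init__(self, Z0, Zprime, dt):
            self.Z0, self.Zprime, self.dt = Z0, Zprime, dt
            self.v_integral = 0.0            # running integral of v_n

        def pressure(self, v_n):
            self.v_integral += v_n * self.dt   # realizes the -i Z'/omega term
            return self.Z0 * v_n + self.Zprime * self.v_integral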

The time step for the room calculation was ∆t_r = 1/22050 ≈ 4.5 × 10⁻⁵ s, as explained in the next section. The choice of spatial step size ∆x_r was then influenced by two considerations. First, in order for the finite difference algorithm to be numerically stable in three dimensions, one must have ∆x_r/(√3 ∆t_r) > c_a. Second, it is convenient for the spatial steps of the soundboard and room to be commensurate. In the calculations described below, the room step size was ∆x_r = 4 cm, that is, twice the soundboard step size. When using the calculated soundboard velocity to obtain the room velocity at the soundboard surface, we averaged over 4 soundboard grid points for each room grid point. Typical numerical rooms were 3 × 4 × 4 m³ and thus contained ∼ 10⁶ finite difference elements.

Figure 3 shows results for the sound generation by an upright soundboard. Here, the soundboard was driven harmonically at the point where the string for middle C contacts the bridge, and we plot the sound pressure normalized by the board velocity at the driving point [25]. It is seen that the model results compare well with the experiments. This provides a check on both the soundboard and the room models.

(Figure 3 plots p/v_b (arbitrary units) against frequency (Hz) for model and experiment.)

Figure 3: Results for the sound pressure normalized by the soundboard velocity for an upright piano soundboard: calculated (solid curve) and measured (dotted curve). The board was driven at the point where the string for middle C crosses the bridge. Results from [25].
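Collecting the ingredients of this section, the sketch below advances (8) by one time step on staggered grids, using the ∆x_r and ∆t_r quoted above; the boundary cells are simply left untouched here, where a full implementation would apply the wall condition (9).

    import numpy as np

    dx_r = 0.04                 # room spatial step (m), as in the text
    dt_r = 1.0 / 22050          # room time step (s), cf. eq. (11)
    rho_a, c_a = 1.2, 343.0     # density of air and speed of sound
    assert dx_r / (np.sqrt(3.0) * dt_r) > c_a   # 3D stability condition

    Nx, Ny, Nz = 75, 100, 100   # a 3 x 4 x 4 m room: ~10^6 cells
    p  = np.zeros((Nx, Ny, Nz))        # pressure at integer time steps
    vx = np.zeros((Nx - 1, Ny, Nz))    # velocities live on half-step grids
    vy = np.zeros((Nx, Ny - 1, Nz))
    vz = np.zeros((Nx, Ny, Nz - 1))

    def room_step():
        """Advance eq. (8) by one dt_r on the staggered grids."""
        cv = dt_r / (rho_a * dx_r)
        vx[:] -= cv * np.diff(p, axis=0)   # rho_a dv/dt = -grad p
        vy[:] -= cv * np.diff(p, axis=1)
        vz[:] -= cv * np.diff(p, axis=2)
        cp = rho_a * c_a**2 * dt_r / dx_r
        p[1:-1, :, :] -= cp * np.diff(vx, axis=0)   # dp/dt = -rho_a c_a^2 div v
        p[:, 1:-1, :] -= cp * np.diff(vy, axis=1)
        p[:, :, 1:-1] -= cp * np.diff(vz, axis=2)

The soundboard source described above would enter by overwriting v_z on the cells adjacent to the board at each step.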

6. PUTTING IT ALL TOGETHER

Our model involves several distinct but coupled subsystems—the hammers/strings, the soundboard, and the room—and it is useful to review how they fit together computationally. The calculation begins by giving some initial velocity to a particular hammer. This hammer then strikes a string (or strings), and they interact through either (1) or (2). This sets the string(s) for that note into motion, and these in turn act on the bridge and soundboard.



As we have already mentioned, the vibrations of each component of our model are calculated with a finite difference algorithm, each with an associated time step. Since the systems are coupled—that is, the strings drive the soundboard, the soundboard acts back on the strings, and the soundboard drives the room—it would be computationally simpler to use the same value of the time step for all three subsystems. However, the equation of motion for the soundboard is highly dispersive, and stability requirements demand a much smaller time step for the soundboard than is needed for the string and room simulations. Given the large number of room elements, this would greatly (and unnecessarily) slow down the calculation. We have therefore chosen instead to make the various time steps commensurate, with

∆t_r = (1/22050) s,
∆t_s = ∆t_r/4,
∆t_b = ∆t_s/6,   (11)

where the subscripts correspond to the room (r), string (s), and soundboard (b). To explain this hierarchy, we first note that the room time step is chosen to be compatible with common audio hardware and software; 1/∆t_r is commensurate with the data rates commonly used in CD sound formats. We then see that each room time step contains 4 string time steps; that is, the string algorithm makes 4 iterations for each iteration of the room model. Likewise, each string time step contains 6 soundboard steps.
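In code, this hierarchy is simply a set of nested loops; the three update functions below are hypothetical placeholders for the finite-difference steps of Sections 3, 4, and 5.

    def simulate(n_room_steps, update_board, update_strings, update_room):
        """Run the commensurate time-step hierarchy of eq. (11): each
        room step (1/22050 s) contains 4 string steps, and each string
        step contains 6 soundboard steps."""
        for n in range(n_room_steps):
            for s in range(4):
                for b in range(6):
                    update_board()     # dt_b = dt_s / 6
                update_strings()       # dt_s = dt_r / 4
            update_room()              # dt_r = 1/22050 s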

The overall computational speed is currently somewhat less than “real time.” With a typical personal computer (clock speed 1 GHz), a 1-minute simulation requires approximately 30 minutes of computer time. Of course, this gap will narrow in the future, in accord with Moore's law. In addition, the model should transfer easily to a cluster (i.e., multi-CPU) machine. We have also explored an alternative approach to the room modeling based on ray tracing [48]. Ray tracing allows one to express the relationship between soundboard velocity and sound pressure as a multiparameter map, involving approximately 10⁴ parameters. The values of these parameters can be precalculated and stored, resulting in about an order of magnitude speed-up in the calculation as compared to the room algorithm described above.

7. ANALYSIS OF THE RESULTS: WHAT HAVE WE LEARNED AND WHERE DO WE GO NEXT?

In the previous section, we saw that a real-time Newton's law simulation of the piano is well within reach. While such a simulation would certainly be interesting, it is not a primary goal of our work. We instead wish to use the modeling to learn about the instrument. With that in mind, we now consider the quality of the tones calculated with the current version of the model.

In our initial modeling, we employed power law hammers described by (1), with parameters based on type I hammer experiments by our group [31]. The results were disappointing—it is hard to describe the tones accurately in words, but they sounded distinctly plucked and somewhat metallic. While we cannot include our calculated sounds as part of this paper, they are available on our website http://www.physics.purdue.edu/piano. After many modeling calculations, we came to the conclusion that the hammer model—for example, the power law description (1)—was the problem. Note that we do not claim that power law hammers must always give unsatisfactory results. Our point is that when the power law parameters are chosen to fit the type I behavior of real hammers, the calculated tones are poor. It is certainly possible (and indeed likely) that power law parameters that will yield good piano tones can be found. However, based on our experience, it seems that these parameters should be viewed as fitting parameters, as they may not accurately describe any real hammers.

This led us to the type II hammer experiments described above, and to a description of the hammer-string force in terms of the Stulov function (2), with parameters (τ_0, ε_0, etc.) taken from these type II experiments [35]. The results were much improved. While they are not yet “Steinway quality,” it is our opinion that the calculated tones could be mistaken for a real piano. In that sense, they pass a sort of acoustical Turing test. Our conclusion is that the hammers are an essential part of the instrument. This is hardly a revolutionary result. However, based on our modeling, we can also make a somewhat stronger statement: in order to obtain a realistic piano tone, the modeling should be based on hammer parameters observed in type II measurements, with the hysteresis included in the model.

There are a number of issues that we plan to address in the future.


(1) The hammer portion of the model still needs attention. Our experiments [35] indicate that while the Stulov function does provide a qualitative description of the hammer force hysteresis, there are significant quantitative differences. It may be necessary to develop a better functional description to replace the Stulov form. (2) As it currently stands, our string model includes only one polarization mode, corresponding to vibrations parallel to the initial hammer velocity. It is well known that the other transverse polarization mode can be important [37]. This can readily be included, but will require a more general soundboard model, since the two transverse modes couple through the motion of the bridge. (3) The soundboard of a real piano is supported by a case. Measurements in our laboratory indicate that the case acceleration can be as large as 5% or so of the soundboard acceleration, so the sound emitted by the case is considerable. (4) We plan to refine the room model. Our current room model is certainly a very crude approximation to a realistic room. Real rooms have wall coverings of various types (with differing values of the acoustic impedance), and contain chairs and other objects. At our current level of sophistication, it appears that the hammers are more of a limitation than the room model, but this may well change as the hammer modeling is improved.

In conclusion, we have made good progress in developing a physical model of the piano. It is now possible to produce realistic tones using Newton's laws with realistic and independently determined instrument parameters. Further improvements of the model seem quite feasible. We believe that physical modeling can provide new insights into the piano, and that similar approaches can be applied to other instruments.

ACKNOWLEDGMENTS

We thank P. Muzikar, T. Rossing, A. Tubis, and G. Weinreich for many helpful and critical discussions. We also are indebted to A. Korty, J. Winans II, J. Millis, S. Dietz, J. Jourdan, J. Roberts, and L. Reuff for their contributions to our piano studies. This work was supported by the National Science Foundation (NSF) through Grant PHY-9988562.

REFERENCES

[1] D. E. Hall, “Piano string excitation in the case of small hammer mass,” Journal of the Acoustical Society of America, vol. 79, no. 1, pp. 141–147, 1986.

[2] D. E. Hall, “Piano string excitation II: General solution for a hard narrow hammer,” Journal of the Acoustical Society of America, vol. 81, no. 2, pp. 535–546, 1987.

[3] D. E. Hall, “Piano string excitation III: General solution for a soft narrow hammer,” Journal of the Acoustical Society of America, vol. 81, no. 2, pp. 547–555, 1987.

[4] D. E. Hall and A. Askenfelt, “Piano string excitation V: Spectra for real hammers and strings,” Journal of the Acoustical Society of America, vol. 83, no. 4, pp. 1627–1638, 1988.

[5] D. E. Hall, “Piano string excitation. VI: Nonlinear modeling,” Journal of the Acoustical Society of America, vol. 92, no. 1, pp. 95–105, 1992.

[6] H. Suzuki, “Model analysis of a hammer-string interaction,” Journal of the Acoustical Society of America, vol. 82, no. 4, pp. 1145–1151, 1987.

[7] X. Boutillon, “Model for piano hammers: Experimental determination and digital simulation,” Journal of the Acoustical Society of America, vol. 83, no. 2, pp. 746–754, 1988.

[8] A. Chaigne and A. Askenfelt, “Numerical simulations of piano strings. I. A physical model for a struck string using finite difference method,” Journal of the Acoustical Society of America, vol. 95, no. 2, pp. 1112–1118, 1994.

[9] A. Chaigne and A. Askenfelt, “Numerical simulations of piano strings. II. Comparisons with measurements and systematic exploration of some hammer-string parameters,” Journal of the Acoustical Society of America, vol. 95, no. 3, pp. 1631–1640, 1994.

[10] A. Chaigne, “On the use of finite differences for musical synthesis. Application to plucked stringed instruments,” Journal d’Acoustique, vol. 5, no. 2, pp. 181–211, 1992.

[11] N. Giordano, “Simple model of a piano soundboard,” Journal of the Acoustical Society of America, vol. 102, no. 2, pp. 1159–1168, 1997.

[12] H. A. Conklin Jr., “Design and tone in the mechanoacoustic piano. Part I. Piano hammers and tonal effects,” Journal of the Acoustical Society of America, vol. 99, no. 6, pp. 3286–3296, 1996.

[13] H. Suzuki and I. Nakamura, “Acoustics of pianos,” Applied Acoustics, vol. 30, pp. 147–205, 1990.

[14] H. A. Conklin Jr., “Design and tone in the mechanoacoustic piano. Part II. Piano structure,” Journal of the Acoustical Society of America, vol. 100, no. 2, pp. 695–708, 1996.

[15] H. A. Conklin Jr., “Design and tone in the mechanoacoustic piano. Part III. Piano strings and scale design,” Journal of the Acoustical Society of America, vol. 100, no. 3, pp. 1286–1298, 1996.

[16] B. E. Richardson, G. P. Walker, and M. Brooke, “Synthesis of guitar tones from fundamental parameters relating to construction,” Proceedings of the Institute of Acoustics, vol. 12, no. 1, pp. 757–764, 1990.

[17] A. Chaigne and V. Doutaut, “Numerical simulations of xylophones. I. Time-domain modeling of the vibrating bars,” Journal of the Acoustical Society of America, vol. 101, no. 1, pp. 539–557, 1997.

[18] V. Doutaut, D. Matignon, and A. Chaigne, “Numerical simulations of xylophones. II. Time-domain modeling of the resonator and of the radiated sound pressure,” Journal of the Acoustical Society of America, vol. 104, no. 3, pp. 1633–1647, 1998.

[19] L. Rhaouti, A. Chaigne, and P. Joly, “Time-domain modeling and numerical simulation of a kettledrum,” Journal of the Acoustical Society of America, vol. 105, no. 6, pp. 3545–3562, 1999.

[20] B. Bank, F. Avanzini, G. Borin, G. De Poli, F. Fontana, and D. Rocchesso, “Physically informed signal processing methods for piano sound synthesis: a research overview,” EURASIP Journal on Applied Signal Processing, vol. 2003, no. 10, pp. 941–952, 2003.

[21] N. Giordano, M. Jiang, and S. Dietz, “Experimental and computational studies of the piano,” in Proc. 17th International Congress on Acoustics, vol. 4, Rome, Italy, September 2001.

[22] J. Kindel and I.-C. Wang, “Modal analysis and finite element analysis of a piano soundboard,” in Proc. 5th International Modal Analysis Conference, pp. 1545–1549, Union College, Schenectady, NY, USA, 1987.

[23] J. Kindel, “Modal analysis and finite element analysis of a piano soundboard,” M.S. thesis, University of Cincinnati, Cincinnati, Ohio, USA, 1989.


[24] N. Giordano, “Mechanical impedance of a piano soundboard,” Journal of the Acoustical Society of America, vol. 103, no. 4, pp. 2128–2133, 1998.

[25] N. Giordano, “Sound production by a vibrating piano soundboard: Experiment,” Journal of the Acoustical Society of America, vol. 104, no. 3, pp. 1648–1653, 1998.

[26] A. Askenfelt and E. V. Jansson, “From touch to string vibrations. II. The motion of the key and hammer,” Journal of the Acoustical Society of America, vol. 90, no. 5, pp. 2383–2393, 1991.

[27] T. Yanagisawa, K. Nakamura, and H. Aiko, “Experimental study on force-time curve during the contact between hammer and piano string,” Journal of the Acoustical Society of Japan, vol. 37, pp. 627–633, 1981.

[28] T. Yanagisawa and K. Nakamura, “Dynamic compression characteristics of piano hammer,” Transactions of Musical Acoustics Technical Group Meeting of the Acoustic Society of Japan, vol. 1, pp. 14–17, 1982.

[29] T. Yanagisawa and K. Nakamura, “Dynamic compression characteristics of piano hammer felt,” Journal of the Acoustical Society of Japan, vol. 40, pp. 725–729, 1984.

[30] A. Stulov, “Hysteretic model of the grand piano hammer felt,” Journal of the Acoustical Society of America, vol. 97, no. 4, pp. 2577–2585, 1995.

[31] N. Giordano and J. P. Winans II, “Piano hammers and their force compression characteristics: does a power law make sense?,” Journal of the Acoustical Society of America, vol. 107, no. 4, pp. 2248–2255, 2000.

[32] N. Giordano and J. P. Millis, “Hysteretic behavior of piano hammers,” in Proc. International Symposium on Musical Acoustics, D. Bonsi, D. Gonzalez, and D. Stanzial, Eds., pp. 237–240, Perugia, Umbria, Italy, September 2001.

[33] A. Stulov and A. Magi, “Piano hammer: Theory and experiment,” in Proc. International Symposium on Musical Acoustics, D. Bonsi, D. Gonzalez, and D. Stanzial, Eds., pp. 215–220, Perugia, Umbria, Italy, September 2001.

[34] J. I. Dunlop, “Nonlinear vibration properties of felt pads,” Journal of the Acoustical Society of America, vol. 88, no. 2, pp. 911–917, 1990.

[35] N. Giordano and J. P. Millis, “Using physical modeling to learn about the piano: New insights into the hammer-string force,” in Proc. International Congress on Acoustics, S. Furui, H. Kanai, and Y. Iwaya, Eds., pp. III–2113, Kyoto, Japan, April 2004.

[36] N. H. Fletcher and T. D. Rossing, The Physics of Musical Instruments, Springer-Verlag, New York, NY, USA, 1991.

[37] G. Weinreich, “Coupled piano strings,” Journal of the Acoustical Society of America, vol. 62, no. 6, pp. 1474–1484, 1977.

[38] M. Podlesak and A. R. Lee, “Dispersion of waves in piano strings,” Journal of the Acoustical Society of America, vol. 83, no. 1, pp. 305–317, 1988.

[39] N. Giordano and A. J. Korty, “Motion of a piano string: longitudinal vibrations and the role of the bridge,” Journal of the Acoustical Society of America, vol. 100, no. 6, pp. 3899–3908, 1996.

[40] N. Giordano, Computational Physics, Prentice-Hall, Upper Saddle River, NJ, USA, 1997.

[41] V. Bucur, Acoustics of Wood, CRC Press, Boca Raton, Fla, USA, 1995.

[42] S. G. Lekhnitskii, Anisotropic Plates, Gordon and Breach Science Publishers, New York, NY, USA, 1968.

[43] J. W. S. Rayleigh, Theory of Sound, Dover, New York, NY, USA, 1945.

[44] D. Botteldooren, “Acoustical finite-difference time-domain simulation in a quasi-Cartesian grid,” Journal of the Acoustical Society of America, vol. 95, no. 5, pp. 2313–2319, 1994.

[45] D. Botteldooren, “Finite-difference time-domain simulation of low-frequency room acoustic problems,” Journal of the Acoustical Society of America, vol. 98, no. 6, pp. 3302–3308, 1995.

[46] P. M. Morse and K. U. Ingard, Theoretical Acoustics, Princeton University Press, Princeton, NJ, USA, 1986.

[47] L. L. Beranek, “Acoustic impedance of commercial materials and the performance of rectangular rooms with one treated surface,” Journal of the Acoustical Society of America, vol. 12, pp. 14–23, 1940.

[48] M. Jiang, “Room acoustics and physical modeling of the piano,” M.S. thesis, Purdue University, West Lafayette, Ind, USA, 1999.

N. Giordano obtained his Ph.D. from Yale University in 1977, and has been at the Department of Physics at Purdue University since 1979. His research interests include mesoscopic and nanoscale physics, computational physics, and musical acoustics. He is the author of the textbook Computational Physics (Prentice-Hall, 1997). He also collects and restores antique pianos.

M. Jiang has a B.S. degree in physics (1997) from Peking University, China, and M.S. degrees in both physics and computer science (1999) from Purdue University. Some of the work described in this paper was part of his physics M.S. thesis. After graduation, he worked as a software engineer for two years, developing Unix kernel software and device drivers. In 2002, he moved to Bozeman, Montana, where he is now pursuing a Ph.D. in computer science at Montana State University. Minghui's current research interests include the design of algorithms, computational geometry, and biological modeling and bioinformatics.


EURASIP Journal on Applied Signal Processing 2004:7, 934–948
© 2004 Hindawi Publishing Corporation

Sound Synthesis of the Harpsichord Using a Computationally Efficient Physical Model

Vesa Välimäki
Laboratory of Acoustics and Audio Signal Processing, Helsinki University of Technology, P.O. Box 3000, 02015 Espoo, Finland
Email: [email protected]

Henri Penttinen
Laboratory of Acoustics and Audio Signal Processing, Helsinki University of Technology, P.O. Box 3000, 02015 Espoo, Finland
Email: [email protected]

Jonte Knif
Sibelius Academy, Centre for Music and Technology, P.O. Box 86, 00251 Helsinki, Finland
Email: [email protected]

Mikael Laurson
Sibelius Academy, Centre for Music and Technology, P.O. Box 86, 00251 Helsinki, Finland
Email: [email protected]

Cumhur Erkut
Laboratory of Acoustics and Audio Signal Processing, Helsinki University of Technology, P.O. Box 3000, 02015 Espoo, Finland
Email: [email protected]

Received 24 June 2003; Revised 28 November 2003

A sound synthesis algorithm for the harpsichord has been developed by applying the principles of digital waveguide modeling. A modification to the loss filter of the string model is introduced that allows more flexible control of decay rates of partials than is possible with a one-pole digital filter, which is a usual choice for the loss filter. A version of the commuted waveguide synthesis approach is used, where each tone is generated with a parallel combination of the string model and a second-order resonator that are excited with a common excitation signal. The second-order resonator, previously proposed for this purpose, approximately simulates the beating effect appearing in many harpsichord tones. The characteristic key-release thump terminating harpsichord tones is reproduced by triggering a sample that has been extracted from a recording. A digital filter model for the soundboard has been designed based on recorded bridge impulse responses of the harpsichord. The output of the string models is injected into the soundboard filter that imitates the reverberant nature of the soundbox and, particularly, the ringing of the short parts of the strings behind the bridge.

Keywords and phrases: acoustic signal processing, digital filter design, electronic music, musical acoustics.

1. INTRODUCTION

Sound synthesis is particularly interesting for acoustic keyboard instruments, since they are usually expensive and large and may require amplification during performances. Electronic versions of these instruments benefit from the fact that keyboard controllers using MIDI are commonly available and fit for use. Digital pianos imitating the timbre and features of grand pianos are among the most popular electronic instruments. Our current work focuses on the imitation of the harpsichord, which is expensive, relatively rare, but is still commonly used in music from the Renaissance and the baroque era. Figure 1 shows the instrument used in this study. It is a two-manual harpsichord that contains three individual sets of strings, two bridges, and has a large soundboard.


Figure 1: The harpsichord used in the measurements has two manuals, three string sets, and two bridges. The picture was taken during the tuning of the instrument in the anechoic chamber.

Instead of wavetable and sampling techniques that are popular in digital instruments, we apply modeling techniques to design an electronic instrument that sounds nearly identical to its acoustic counterpart and faithfully responds to the player's actions, just as an acoustic instrument. We use the modeling principle called commuted waveguide synthesis [1, 2, 3], but have modified it, because we use a digital filter to model the soundboard response. Commuted synthesis uses the basic property of linear systems that, in a cascade of transfer functions, their ordering can be changed without affecting the overall transfer function. This way, the complications in the modeling of the soundboard resonances extracted from a recorded tone can be hidden in the input sequence. In the original form of commuted synthesis, the input signal contains the contribution of the excitation mechanism (the quill plucking the string) and that of the soundboard with all its vibrating modes [4]. In the current implementation, the input samples of the string models are short (less than half a second) and contain only the initial part of the soundboard response; the tail of the soundboard response is reproduced with a reverberation algorithm.

Digital waveguide modeling [5] appears to be an excellent tool for the synthesis of harpsichord tones. A strong argument supporting this view is that tones generated using the basic Karplus-Strong algorithm [6] are reminiscent of the harpsichord for many listeners.1 This synthesis technique has been shown to be a simplified version of a waveguide string model [5, 7]. However, this does not imply that realistic harpsichord synthesis is easy. A detailed imitation of the properties of a fine instrument is challenging, even though the starting point is very promising. Careful modifications to the algorithm and proper signal analysis and calibration routines are needed for a natural-sounding synthesis.

The new contributions to stringed-instrument models include a sparse high-order loop filter and a soundboard model that consists of the cascade of a shaping filter and a common reverb algorithm. The sparse loop filter consists of a conventional one-pole filter and a feedforward comb filter inserted in the feedback loop of a basic string model. Methods to calibrate these parts of the synthesis algorithm are proposed.

1 The Karplus-Strong algorithm manages to sound something like the harpsichord in some registers only when a high sampling rate is used, such as 44.1 kHz or 22.05 kHz. At low sample rates, it sounds somewhat similar to violin pizzicato tones.

This paper is organized as follows. Section 2 gives a short overview on the construction and acoustics of the harpsichord. In Section 3, signal-processing techniques for synthesizing harpsichord tones are suggested. In particular, the new loop filter is introduced and analyzed. Section 4 concentrates on calibration methods to adjust the parameters according to recordings. The implementation of the synthesizer using a block-based graphical programming language is described in Section 5, where we also discuss the computational complexity and potential applications of the implemented system. Section 6 contains conclusions and suggests ideas for further research.

2. HARPSICHORD ACOUSTICS

The harpsichord is a stringed keyboard instrument with a long history dating back to at least the year 1440 [8]. It is the predecessor of the pianoforte and the modern piano. It belongs to the group of plucked string instruments due to its excitation mechanism. In this section, we describe briefly the construction and the operating principles of the harpsichord and give details of the instrument used in this study. For a more in-depth discussion and description of the harpsichord, see, for example, [9, 10, 11, 12], and for a description of different types of harpsichord, the reader is referred to [10].

2.1. Construction of the instrument

The form of the instrument can be roughly described as triangular, and the oblique side is typically curved. A harpsichord has one or two manuals that control two to four sets of strings, also called registers or string choirs. Two of the string choirs are typically tuned in unison. These are called the 8′ (8 foot) registers. Often the third string choir is tuned an octave higher, and it is called the 4′ register. The manuals can be set to control different registers, usually with a limited number of combinations. This permits the player to use different registers with left- and right-hand manuals, and therefore vary the timbre and loudness of the instrument. The 8′ registers differ from each other in the plucking point of the strings. Hence, the 8′ registers are called 8′ back and front registers, where "back" refers to the plucking point away from the nut (and the player).

The keyboard of the harpsichord typically spans four or five octaves, which became a common standard in the early 18th century. One end of the strings is attached to the nut and the other to a long, curved bridge. The portion of the string behind the bridge is attached to a hitch pin, which is on top of the soundboard. This portion of the string also tends to vibrate for a long while after a key press, and it gives the instrument a reverberant feel. The nut is set on a very rigid wrest plank. The bridge is attached to the soundboard.


Figure 2: Overall structure of the harpsichord model for a single string. The model structure is identical for all strings in the three sets, but the parameter values and sample data are different.

Therefore, the bridge is mainly responsible for transmitting string vibrations to the soundboard. The soundboard is very thin, about 2 to 4 mm, and it is supported by several ribs installed in patterns that leave trapezoidal areas of the soundboard vibrating freely. The main function of the soundboard is to amplify the weak sound of the vibrating strings, but it also filters the sound. The soundboard forms the top of a closed box, which typically has a rose opening. It causes a Helmholtz resonance, the frequency of which is usually below 100 Hz [12]. In many harpsichords, the soundbox also opens to the manual compartment.

2.2. Operating principle

The strings are plucked by a plectrum (also called a quill) that is anchored onto a jack. The jack rests on a string, but there is a small piece of felt (called the damper) between them. One end of the wooden keyboard lever is located a small distance below the jack. As the player pushes down a key on the keyboard, the lever moves up. This action lifts the jack up and causes the quill to pluck the string. When the key is released, the jack falls back and the damper comes in contact with the string to damp its vibrations. A spring mechanism in the jack guides the plectrum so that the string is not replucked when the key is released.

2.3. The harpsichord used in this study

The harpsichord used in this study (see Figure 1) was built in 2000 by Jonte Knif (one of the authors of this paper) and Arno Pelto. It has the characteristics of harpsichords built in Italy and Southern Germany. This harpsichord has two manuals and three sets of string choirs, namely an 8′ back, an 8′ front, and a 4′ register. The instrument was tuned to the Vallotti tuning [13] with the fundamental frequency of A4 of 415 Hz.2 There are 56 keys from G1 to D6, which correspond to fundamental frequencies of 46 Hz and 1100 Hz, respectively, in the 8′ register; the 4′ register is an octave higher, so the corresponding lowest and highest fundamental frequencies are about 93 Hz and 2200 Hz. The instrument is 240 cm long and 85 cm wide, and its strings are all made of brass. The plucking point changes from 12% to about 50% of the string length in the bass and in the treble range, respectively. This produces a round timbre (i.e., weak even harmonics) in the treble range. In addition, the dampers have been left out in the last octave of the 4′ register to increase the reverberant feel during playing. The wood material used in the instrument has been heat treated to artificially accelerate the aging process of the wood.

2 The tuning is considerably lower than the current standard (440 Hz or higher). This is typical of old musical instruments.

3. SYNTHESIS ALGORITHM

This section discusses the signal processing methods used in the synthesis algorithm. The structure of the algorithm is illustrated in Figure 2. It consists of five digital filters, two sample databases, and their interconnections. The physical model of a vibrating string is contained in block S(z). Its input is retrieved from the excitation signal database, and it can be modified during run-time with a timbre-control filter, which is a one-pole filter. In parallel with the string, a second-order resonator R(z) is tuned to reproduce the beating of one of the partials, as proposed earlier by Bank et al. [14, 15]. While we could use more resonators, we have decided to target a maximally reduced implementation to minimize the computational cost and the number of parameters. The sum of the string model and resonator output signals is fed through a soundboard filter, which is common for all strings. The tone corrector is an equalizer that shapes the spectrum of the soundboard filter output. By varying the coefficients g_release and g_sb, it is possible to adjust the relative levels of the string sound, the soundboard response, and the release sound.

In the following, we describe the string model, the sample databases, and the soundboard model in detail, and discuss the need for modeling the dispersion of harpsichord strings.

3.1. Basic string model revisited

We use a version of the vibrating string filter model proposed by Jaffe and Smith [16]. It consists of a feedback loop, where a delay line, a fractional delay filter, a high-order allpass filter, and a loss filter are cascaded. The delay line and the fractional delay filter determine the fundamental frequency of the tone. The high-order allpass filter [16] simulates dispersion, which is a typical characteristic of vibrating strings and which introduces inharmonicity in the sound. For the fractional delay filter, we use a first-order allpass filter, as originally suggested by Smith and Jaffe [16, 17]. This choice was made because it allows a simple and sufficient approximation of delay when a high sampling rate is used.3 Furthermore, there is no need to implement fundamental frequency variations (pitch bend) in harpsichord tones. Thus, the recursive nature of the allpass fractional delay filter, which can cause transients during pitch bends, is not harmful.

3 The sampling rate used in this work is 44100 Hz.

Figure 3: Structure of the proposed string model. The feedback loop contains a one-pole filter (denominator of (1)), a feedforward comb filter called "ripple filter" (numerator of (1)), the rest of the delay line, a fractional delay filter F(z), and an allpass filter A_d(z) simulating dispersion.

The loss filter of waveguide string models is usually implemented as a one-pole filter [18], but now we use an extended version. The transfer function of the new loss filter is

H(z) = b \frac{r + z^{-R}}{1 + a z^{-1}},  (1)

where the scaling parameter b is defined as

b = g(1 + a), (2)

R is the delay line length of the ripple filter, r is the ripple depth, and a is the feedback gain. Figure 3 shows the block diagram of the string model with details of the new loss filter, which is seen to be composed of the conventional one-pole filter and a ripple filter in cascade. The total delay line length L in the feedback loop is 1 + R + L_1 plus the phase delay caused by the fractional delay filter F(z) and the allpass filter A_d(z).

The overall loop gain is determined by parameter g, which is usually selected to be slightly smaller than 1 to ensure stability of the feedback loop. The feedback gain parameter a defines the overall lowpass character of the filter: a value slightly smaller than 0 (e.g., a = −0.01) yields a mild lowpass filter, which causes high-frequency partials to decay faster than the low-frequency ones, which is natural.

The ripple depth parameter r is used to control the deviation of the loss filter gain from that of the one-pole filter.


The delay line length R is determined as

R = \operatorname{round}(r_{\mathrm{rate}} L),  (3)

where r_rate is the ripple rate parameter that adjusts the ripple density in the frequency domain and L is the total delay length in the loop (in samples, or sampling intervals).

The ripple filter was developed because it was found that the magnitude response of the one-pole filter alone is overly smooth when compared to the required loop gain behavior for harpsichord sounds. Note that the ripple factor r in (1) increases the loop gain, but it is not accounted for in the scaling factor in (2). This is purposeful, because we find it useful that the loop gain oscillates symmetrically around the magnitude response of the conventional one-pole filter (obtained from (1) by setting r = 0). Nevertheless, it must be ensured somehow that the overall loop gain does not exceed unity at any of the harmonic frequencies; otherwise the system becomes unstable. It is sufficient to require that the sum g + |r| remains below one, or |r| < 1 − g. In practice, a slightly larger magnitude of r still results in a stable system when r < 0, because this choice decreases the loop gain at 0 Hz, and since the conventional loop filter is a lowpass filter, its gain at the harmonic frequencies is smaller than g.
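To make the loop structure concrete, the following minimal time-domain sketch runs the Figure 3 loop in Python. It is our own illustration under stated assumptions, not the authors' implementation: it uses integer delays only and omits F(z) and A_d(z), so the pitch is only approximately fs/(L1 + R).

```python
import numpy as np

def pluck(excitation, L1, R, g, a, r, n_out):
    """Integer-delay sketch of the Fig. 3 loop: main delay line z^-L1,
    ripple comb b*(r + z^-R), and one-pole feedback 1/(1 + a z^-1)."""
    b = g * (1 + a)                      # scaling parameter, Eq. (2)
    dl = np.zeros(L1)                    # main delay line
    rip = np.zeros(R)                    # ripple-filter delay line
    y1 = 0.0                             # one-pole filter state
    out = np.zeros(n_out)
    for n in range(n_out):
        x = excitation[n] if n < len(excitation) else 0.0
        u = dl[-1]                       # oldest sample in the main delay line
        v = b * (r * u + rip[-1])        # feedforward comb, numerator of (1)
        y = v - a * y1                   # one-pole feedback, denominator of (1)
        y1 = y
        rip[1:] = rip[:-1]; rip[0] = u   # advance ripple delay line
        s = x + y                        # inject excitation into the loop
        dl[1:] = dl[:-1]; dl[0] = s      # advance main delay line
        out[n] = s
    return out

# Example: a 200-sample noise burst through a loop tuned near 220 Hz.
fs = 44100
tone = pluck(np.random.randn(200), L1=180, R=20,
             g=0.995, a=-0.05, r=0.002, n_out=2 * fs)
```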

With small positive or negative values of r, it is possible to obtain wavy loop gain characteristics, where two neighboring partials have considerably different loop gains and thus decay rates. The frequency of the ripple is controlled by parameter r_rate so that a value close to one results in a very slow wave, while a value close to 0.5 results in a fast variation where the loop gain for neighboring even and odd partials differs by about 2r (depending on the value of a). An example is shown in Figure 4, where the properties of a conventional one-pole loss filter are compared against the proposed ripply loss filter. Figure 4a shows that by adding a feedforward path with a small gain factor r = 0.002, the loop gain characteristics can be made less regular.

Figure 4b shows the corresponding reverberation time (T60) curve, which indicates how long it takes for each partial to decay by 60 dB. The T60 values are obtained by multiplying the time-constant values τ by −60/[20 log(1/e)], or 6.9078.


Figure 4: The frequency-dependent (a) loop gain (magnitude response) and (b) reverberation time T60 determined by the loss filter. The dashed lines show the smooth characteristics of a conventional one-pole loss filter (g = 0.995, a = −0.05). The solid lines show the characteristics obtained with the ripply loss filter (g = 0.995, a = −0.05, r = 0.0020, r_rate = 0.5). The bold dots indicate the actual properties experienced by the partials of the synthetic tone (L = 200 samples, f0 = 220.5 Hz).

The time constants τ(k) for partial indices k = 1, 2, 3, . . . , on the other hand, are obtained from the loop gain data G(k) as

\tau(k) = \frac{-1}{f_0 \ln[G(k)]}.  (4)

The loop gain sequence G(k) is extracted directly from the magnitude response of the loop filter at the fundamental frequency (k = 1) and at the other partial frequencies (k = 2, 3, 4, . . .).

Figure 4b demonstrates the power of the ripply loss filter: the second partial can be rendered to decay much slower than the first and the third partials. This is also perceived in the synthetic tone: soon after the attack, the second partial stands out as the loudest and the longest-ringing partial. Formerly, this kind of flexibility has been obtained only with high-order loss filters [17, 19]. Still, the new filter has only two parameters more than the one-pole filter, and its computational complexity is comparable to that of a first-order pole-zero filter.
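As a numerical check of (1)-(4), the short sketch below (ours; the function name and defaults are assumptions, with parameter values taken from the Figure 4 example) evaluates the loop gain at the partial frequencies and converts it to T60:

```python
import numpy as np

def ripply_T60(g=0.995, a=-0.05, r=0.002, rrate=0.5,
               L=200, f0=220.5, fs=44100, partials=13):
    """Loop gain of the ripply loss filter, Eq. (1), at the partial
    frequencies, converted to T60 via Eqs. (2)-(4)."""
    b = g * (1 + a)                          # Eq. (2)
    R = int(round(rrate * L))                # Eq. (3)
    k = np.arange(1, partials + 1)           # partial indices
    w = 2 * np.pi * k * f0 / fs              # frequencies in rad/sample
    G = np.abs(b * (r + np.exp(-1j * w * R))
               / (1 + a * np.exp(-1j * w)))  # loop gain G(k)
    tau = -1.0 / (f0 * np.log(G))            # time constants, Eq. (4)
    return 6.9078 * tau                      # T60 = 6.9078 * tau

print(np.round(ripply_T60(), 2))  # even partials ring longer when rrate = 0.5
```

With these values, the even partials reach a T60 of about 10 seconds while the odd ones stay near 4 to 5 seconds, matching the wavy curve described for Figure 4b.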

3.2. Inharmonicity

Dispersion is always present in real strings. It is caused by the stiffness of the string material. This property of strings gives rise to inharmonicity in the sound. An offspring of the harpsichord, the piano, is famous for its strongly inharmonic tones, especially in the bass range [9, 20]. This is due to the large elastic modulus and the large diameter of high-strength steel strings in the piano [9]. In waveguide models, inharmonicity is modeled with allpass filters [16, 21, 22, 23]. Naturally, it would be cost-efficient not to implement the inharmonicity, because then the allpass filter A_d(z) would not be needed at all.

The inharmonicity of the recorded harpsichord tones was investigated in order to find out whether it is relevant to model this property. The partials of recorded harpsichord tones were picked semiautomatically from the magnitude spectrum, and with a least-squares fit we estimated the inharmonicity coefficient B [20] for each recorded tone. The measured B values are displayed in Figure 5 together with the threshold of audibility and its 90% confidence intervals taken from listening test results [24]. It is seen that the B coefficient is above the mean threshold of audibility in all cases, but above the frequency 140 Hz, the measured values are within the confidence interval. Thus, it is not guaranteed that these cases actually correspond to audible inharmonicity. At low frequencies, in the case of the 19 lowest keys of the harpsichord, where the inharmonicity coefficients are about 10^−5, the inharmonicity is audible according to this comparison. It is thus important to implement the inharmonicity for the lowest two octaves or so, but it may also be necessary to implement the inharmonicity for the rest of the notes.

Figure 5: Estimates of the inharmonicity coefficient B for all 56 keys of the harpsichord (circles connected with a thick line). Also shown are the threshold of audibility for the B coefficient (solid line) and its 90% confidence intervals (dashed lines) taken from [24].

This conclusion is in accordance with [10], where inharmonicity is stated to be part of the tonal quality of the harpsichord, and also with [12], where it is mentioned that the inharmonicity is less pronounced than in the piano.
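The paper does not spell out the fitting details; a minimal least-squares sketch, assuming the standard stiff-string relation f_k = k f0 sqrt(1 + B k^2) that underlies the B coefficient of [20]:

```python
import numpy as np

def estimate_B(partial_freqs, f0):
    """Least-squares fit of the inharmonicity coefficient B from measured
    partial frequencies, assuming f_k = k * f0 * sqrt(1 + B * k**2)."""
    f = np.asarray(partial_freqs, dtype=float)
    k = np.arange(1, len(f) + 1)
    y = (f / (k * f0)) ** 2 - 1.0            # equals B * k**2 under the model
    return float(k**2 @ y / (k**2 @ k**2))   # LS solution of y = B * k**2
```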

3.3. Sample databases

The excitation signals of the string models are stored in a database from where they can be retrieved at the onset time. The excitation sequences contain 20,000 samples (0.45 s), and they have been extracted from recorded tones by canceling the partials. The analysis and calibration procedure is discussed further in Section 4 of this paper. The idea is to include in these samples the sound of the quill scraping the string plus the beginning of the attack of the sound so that a natural attack is obtained during synthesis, and the initial levels of partials are set properly. Note that this approach is slightly different from the standard commuted synthesis technique, where the full inverse-filtered recorded signal is used to excite the string model [18, 25]. In the latter case, all modes of the soundboard (or soundbox) are contained within the input sequence, and virtually perfect resynthesis is accomplished if the same parameters are used for inverse filtering and synthesis. In the current model, however, we have truncated the excitation signals by windowing them with the right half of a Hanning window. The soundboard response is much longer than that (several seconds), but imitating its ringing tail is taken care of by the soundboard filter (see the next subsection).
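A minimal sketch of this truncation step (the 20,000-sample length and the half-Hanning fade come from the text; the function name is ours):

```python
import numpy as np

def truncate_excitation(residual, n=20000):
    """Keep the first n samples (about 0.45 s at 44.1 kHz) and fade them
    out with the right half of a Hanning window."""
    fade = np.hanning(2 * n)[n:]     # descending half of the window
    return residual[:n] * fade
```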

In addition to the excitation samples, we have extracted short release sounds from recorded tones. One of these is retrieved and played each time a note-off command occurs. Extracting these samples is easy: once a note is played, the player can wait until the string sound has completely decayed, and then release the key. This way a clean recording of noises related to the release event is obtained, and any extra processing is unnecessary. An alternative way would be to synthesize these knocking sounds using modal synthesis, as suggested in [26].

3.4. Modeling the reverberant soundboard and undamped strings

When a note is plucked on the harpsichord, the string vibrations excite the bridge and, consequently, the soundboard.

Figure 6: Time-frequency plot of the harpsichord air radiation when the 8′ bridge is excited. To exemplify the fast decay of the low-frequency modes, only the first 2 seconds and frequencies up to 4000 Hz are displayed.

The soundboard has its own modes depending on the size and the materials used. The radiated acoustic response of the harpsichord is reasonably flat over a frequency range from 50 to 2000 Hz [11]. In addition to exciting the air and structural modes of the instrument body, the pluck excites the part of the string that lies behind the bridge, the high modes of the low strings that the dampers cannot perfectly attenuate, and the highest octave of the 4′ register strings.4 The resonance strings behind the bridge are about 6 to 20 cm long and have a very inharmonic spectral structure. The soundboard filter used in our harpsichord synthesizer (see Figure 2) is responsible for imitating all these features. However, as will be discussed further in Section 4.5, the lowest body modes can be ignored, since they decay fast and are present in the excitation samples. In other words, the modeling is divided into two parts so that the soundboard filter models the reverberant tail while the attack part is included in the excitation signal, which is fed to the string model. Reference [11] discusses the resonance modes of the harpsichord soundboard in detail.

The radiated acoustic response of the harpsichord was recorded in an anechoic chamber by exciting the bridges (8′ and 4′) with an impulse hammer at multiple positions. Figure 6 displays a time-frequency response of the 8′ bridge when excited between the C3 strings, that is, approximately at the middle point of the bridge. The decay times at frequencies below 350 Hz are considerably shorter than in the frequency range from 350 to 1000 Hz. The T60 values at the respective bands are about 0.5 seconds and 4.5 seconds. This can be explained by the fact that the short string portions behind the bridge and the undamped strings resonate and decay slowly.

4 The instrument used in this study does not have dampers in the last octave of the 4′ register.

As suggested by several authors, see, for example, [14, 27, 28], the impulse response of a musical instrument body can be modeled with a reverberation algorithm. Such algorithms were originally devised for imitating the impulse response of concert halls. In a previous work, we triggered a static sample of the body response with every note [29]. In contrast to the sample-based solution, which produces the same response every time, the reverberation algorithm produces additional variation in the sound: as the input signal of the reverberation algorithm is changed, or in this case as the key or register is changed, the temporal and frequency content of the output changes accordingly.

The soundboard response of the harpsichord in this work is modeled with an algorithm presented in [30]. It is a modification of the feedback delay network [31], where the feedback matrix is replaced with a single coefficient, and comb allpass filters have been inserted in the delay line loops. A schematic view of the reverberation algorithm is shown in Figure 7. This structure is used because of its computational efficiency. The H_k(z) blocks represent the loss filters, the A_k(z) blocks are the comb allpass filters, and the delay lines are of length P_k. In this work, eight (N = 8) delay lines are implemented.

One-pole lowpass filters are used as loss filters, which implement the frequency-dependent decay. The comb allpass filters increase the diffusion effect, and they all have the transfer function

A_k(z) = \frac{a_{\mathrm{ap},k} + z^{-M_k}}{1 + a_{\mathrm{ap},k} z^{-M_k}},  (5)

where M_k are the delay-line lengths and a_ap,k are the allpass filter coefficients. To ensure stability, it is required that a_ap,k ∈ [−1, 1]. In addition to the reverberation algorithm, a tone-corrector filter, as shown in Figure 2, is used to match the spectral envelope of the target response, that is, to suppress the low frequencies below 350 Hz and give some additional lowpass characteristics at high frequencies. The choice of the parameters is discussed in Section 4.5.
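For illustration, (5) maps directly to a standard difference-equation filter; a minimal sketch using scipy (ours, not the authors' implementation):

```python
import numpy as np
from scipy.signal import lfilter

def comb_allpass(x, a_ap, M):
    """Comb allpass A_k(z) = (a_ap + z^-M) / (1 + a_ap z^-M), Eq. (5)."""
    b = np.zeros(M + 1); b[0], b[-1] = a_ap, 1.0   # numerator coefficients
    a = np.zeros(M + 1); a[0], a[-1] = 1.0, a_ap   # denominator coefficients
    return lfilter(b, a, x)
```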

4. CALIBRATION OF THE SYNTHESIS ALGORITHM

The harpsichord was brought into an anechoic chamber, where the recordings and the acoustic measurements were conducted. The registered signals enable the automatic calibration of the harpsichord synthesizer. This section describes the recordings, the signal analysis, and the calibration techniques for the string and the soundboard models.

4.1. Recordings

Harpsichord tones were recorded in the large anechoic chamber of Helsinki University of Technology. Recordings were made with multiple microphones installed at a distance of about 1 m above the soundboard. The signals were recorded digitally (44.1 kHz, 16 bits) directly onto the hard disk, and to remove disturbances in the infrasonic range, they were highpass filtered. The highpass filter is a fourth-order Butterworth highpass filter with a cutoff frequency of 52 Hz or 32 Hz (for the lowest tones). The filter was applied to the signal in both directions to obtain zero-phase filtering. The recordings were compared in an informal listening test among the authors, and the signals obtained with a high-quality studio microphone by Schoeps were selected for further analysis.
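This preprocessing corresponds to standard forward-backward filtering; a sketch under the stated specifications (function and variable names are ours):

```python
from scipy.signal import butter, filtfilt

def remove_infrasonic(recording, fc=52.0, fs=44100):
    """Zero-phase fourth-order Butterworth highpass, as described above;
    fc = 32 Hz was used for the lowest tones."""
    b, a = butter(4, fc / (fs / 2), btype='highpass')
    return filtfilt(b, a, recording)   # forward-backward pass -> zero phase
```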

All 56 keys of the instrument were played separately with six different combinations of the registers that are commonly used. This resulted in 56 × 6 = 336 recordings. The tones were allowed to decay into silence, and the key release was included. The length of the single tones varied between 10 and 25 seconds, because the bass tones of the harpsichord tend to ring much longer than the treble tones. For completeness, we recorded examples of different dynamic levels of different keys, although it is known that the harpsichord has a limited dynamic range due to its excitation mechanism. Short staccato tones, slow key pressings, and fast repetitions of single keys were also registered. Chords were recorded to measure the variations of attack times between simultaneously played keys. Additionally, scales and excerpts of musical pieces were played and recorded.

Both bridges of the instrument were excited at several points (four and six points for the 4′ and the 8′ bridge, respectively) with an impulse hammer to obtain reliable acoustic soundboard responses. The force signal of the hammer and the acceleration signal obtained from an accelerometer attached to the bridge were recorded for the 8′ bridge at three locations. The acoustic response was recorded in synchrony.

4.2. Analysis of recorded tones and extraction of excitation signals

Initial estimates of the synthesizer parameters can be obtained from analysis of recorded tones. For the basic calibration of the synthesizer, the recordings were selected where each register is played alone. We use a method based on the short-time Fourier transform and sinusoidal modeling, as previously discussed in [18, 32]. The inharmonicity of harpsichord tones is accounted for in the spectral peak-picking algorithm with the help of the estimated B coefficient values. After extracting the fundamental frequency, the analysis system essentially decomposes the analyzed tone into its deterministic and stochastic parts, as in the spectral modeling synthesis method [33]. However, in our system the decay times of the partials are extracted, and the loop filter design is based on the loop gain data calculated from the decay times. The envelopes of partials in the harpsichord tones exhibit beating and two-stage decay, as is usual for string instruments [34]. The residual is further processed, that is, the soundboard contribution is mostly removed (by windowing the residual signal in the time domain) and the initial level of each partial is adjusted by adding a correction obtained through sinusoidal modeling and inverse filtering [35, 36]. The resulting processed residual is used as an excitation signal to the model.
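One step of this pipeline, converting a partial's measured decay into the loop gain G(k) used in the filter design, can be sketched as follows (our own minimal version; the actual analysis system is more elaborate):

```python
import numpy as np

def loop_gain_from_envelope(env_db, frame_rate, f0):
    """Fit a line to a partial's envelope in dB and convert the decay
    slope to the loop gain G(k) by inverting Eq. (4)."""
    t = np.arange(len(env_db)) / frame_rate
    slope = np.polyfit(t, env_db, 1)[0]   # dB per second, negative for decay
    tau = -8.6859 / slope                 # 20*log10(exp(-t/tau)) = -8.6859*t/tau
    return np.exp(-1.0 / (f0 * tau))      # Eq. (4) solved for G(k)
```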


Figure 7: A schematic view of the reverberation algorithm used for soundboard modeling.

4.3. Loss filter design

Since the ripply loop filter is an extension of the one-pole filter that allows improved matching of the decay rate of one partial and simply introduces variations to the others, it is reasonable to design it after the one-pole filter. This kind of approach is known to be suboptimal in filter design, but the highest possible accuracy is not the main goal of this work. Rather, we aim at a simple and reliable routine that can automatically process a large amount of measurement data, thus leaving a minimum amount of erroneous results to be fixed manually.

Figure 8 shows the loop gain and T60 data for an example case. It is seen that the target data (bold dots in Figure 8) contain a fair amount of variation from one partial to the next, although the overall trend is downward as a function of frequency. Partials with indices 10, 11, 16, and 18 are excluded (set to zero), because their decay times were found to be unreliable (i.e., loop gain larger than unity). The one-pole filter response fitted using a weighted least squares technique [18] (dashed lines in Figure 8) can follow the overall trend, but it evens up the differences between neighboring partials.

The ripply loss filter can be designed using the following heuristic rules.

(1) Select the partial with the largest loop gain, starting from the second partial5 (the sixth partial in this case, see Figure 8), whose index is denoted by kmax. Usually one of the lowest partials will be picked once the outliers have been discarded.

(2) Set the absolute value of r so that, together with the one-pole filter, the magnitude response will match the target loop gain of the partial with index kmax, that is, |r| = G(kmax) − |H(kmax f0)|, where the second term is the loop gain due to the one-pole filter at that frequency (in this case r = 0.0015).

(3) If the target loop gain of the first partial is larger than the magnitude response of the one-pole filter alone at that frequency, set the sign of r to positive, and otherwise to negative, so that the decay of the first partial is made fast (in the example case in Figure 8, the minus sign is chosen, that is, r = −0.0015).

(4) If a positive r has been chosen, conduct a stability check at the zero frequency. If it fails (i.e., g + r ≥ 1), the value of r must be made negative by changing its sign.

(5) Set the ripple rate parameter r_rate so that the longest-ringing partial will occur at the maximum nearest to 0 Hz. This means that the parameter must be chosen according to the following rule:

r_{\mathrm{rate}} = \begin{cases} \dfrac{1}{k_{\max}} & \text{when } r \geq 0, \\ \dfrac{1}{2 k_{\max}} & \text{when } r < 0. \end{cases}  (6)

In the example case, as the ripple pattern is a negative cosine wave (in the frequency domain) and the peak should hit the 6th partial, we set the r_rate parameter equal to 1/12 = 0.0833. This implies that the minimum will occur at every 12th partial and the first maximum will occur at the 6th partial. The result of this design procedure is shown in Figure 8 with the solid line. Note that the peak is actually between the 5th and the 6th partial, because fractional delay techniques are not used in this part of the system and the delay-line length R is thus an integer, as defined in (3). It is obvious that this design method is limited in its ability to follow arbitrary target data. However, as we now know that the resolution of human hearing is also very limited in evaluating differences in decay rates [37], we find the match in most cases to be sufficiently good.

5 In practice, the first partial may have the largest loop gain. However, if we tried to match it using the ripply loss filter, the r_rate parameter would go to 1, as can be seen from (6), and the delay-line length R would become equal to L rounded to an integer, as can be seen from (3). This practically means that the ripple filter would be reduced to a correction of the loop gain by r, which can also be done by simply replacing the loop gain parameter g by g + r. For this reason, it is sensible to match the loop gain of a partial other than the first one.

Figure 8: (a) The target loop gain for a harpsichord tone (f0 = 197 Hz) (bold dots), the magnitude response of the conventional one-pole filter with g = 0.9960 and a = −0.0296 (dashed line), and the magnitude response of the ripply loss filter with r = −0.0015 and r_rate = 0.0833 (solid line). (b) The corresponding T60 data. The total delay-line length is 223.9 samples, and the delay-line length R of the ripple filter is 19 samples.

4.4. Beating filter design

The beating filter, a second-order resonator R(z) coupled in parallel with the string model (see Figure 2), is used for reproducing the beating in harpsichord synthesis. In practice, we decided to choose the center frequency of the resonator so that it brings about the beating effect in one of the low-index partials that has a prominent level and a large beat amplitude. These criteria make sure that the single resonator will produce an audible effect during synthesis.

In this implementation, we probed the deviation of the actual decay characteristics of the partials from the ideal exponential decay. This procedure is illustrated in Figure 9. In Figure 9a, the mean-squared error (MSE) of the deviation is shown. The lowest partial that exhibits a high deviation (the 10th partial in this example) is selected as a candidate for the most prominent beating partial. Its magnitude envelope is presented in Figure 9b by a solid curve. It exhibits a slow beating pattern with a period of about 1.5 seconds. The second-order resonator that simulates beating, in turn, can be tuned to result in a beating pattern with this same rate. For comparison, the magnitude envelopes of the 9th and 11th partials are also shown by dashed and dash-dotted curves, respectively.

The center frequency of the resonator is measured from the envelope of the partial. In practice, the offset ranges from practically 0 Hz to a few Hertz. The gain of the resonator, that is, the amplitude of the beating partial, is set to be the same as that of the partial it beats against. This simple choice is backed by the recent result of Järveläinen and Karjalainen [38] that the beating in string instrument tones is essentially perceived as an on/off process: if the beating amplitude is above the threshold of audibility, it is noticed, while if it is below it, it becomes inaudible. Furthermore, changes in the beating amplitude appear to be inaccurately perceived. Before knowing these results, in a former version of the synthesizer, we had already decided to use the same amplitude for the two components that produce the beating, because the mixing parameter that adjusts the beating amplitude was not giving a useful audible variation [39]. Thus, we are now convinced that it is unnecessary to add another parameter to all string models by allowing changes in the amplitude of the beating partial.

Figure 9: (a) The mean squared error of exponential curve fitting to the decay of partials (f0 = 197 Hz), where the lowest large deviation has been circled (10th partial), and the acceptance threshold is presented with a dash-dotted line. (b) The corresponding temporal envelopes of the 9th, 10th, and 11th partials, where the slow beating of the 10th partial and deviations in decay rates are visible.
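A sketch of such a parallel resonator (the pole radius and the exact parametrization are our assumptions; the text specifies only the center-frequency offset and the gain matching):

```python
import numpy as np
from scipy.signal import lfilter

def beating_resonator(excitation, f_partial, beat_hz, gain,
                      fs=44100, radius=0.9999):
    """Second-order resonator R(z) detuned from a chosen partial by
    beat_hz; run in parallel with the string model to create beating."""
    w = 2 * np.pi * (f_partial + beat_hz) / fs
    a = [1.0, -2 * radius * np.cos(w), radius ** 2]  # poles at radius*e^{+-jw}
    return gain * lfilter([1.0], a, excitation)
```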

4.5. Design of soundboard filter

The reverberation algorithm and the tone correction unit are set in cascade, and together they form the soundboard model, as shown in Figure 2. For determining the soundboard filter, the parameters of the reverberation algorithm and its tone corrector have to be set. The parameters for the reverberation algorithm were chosen as proposed in [31]. To match the frequency-dependent decay, the ratio between the decay times at 0 Hz and at fs/2 was set to 0.13, so that T60 at 0 Hz became 6.0 seconds. The lengths of the eight delay lines varied from 1009 to 1999 samples. To avoid superimposing the responses, the lengths were incommensurate numbers [40]. The lengths M_k of the delay lines in the comb allpass structures were set to 8% of the total length of each delay line path P_k, the filter coefficients a_ap,k were all set to 0.5, and the feedback coefficient g_fb was set to −0.25.
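The text fixes the delay-line range, the 8% rule, and the coefficient values, but not the individual lengths; the sketch below fills that gap with hypothetical mutually prime values in the stated range:

```python
from math import gcd

P = [1009, 1123, 1249, 1373, 1499, 1621, 1777, 1999]  # assumed prime lengths
M = [round(0.08 * p) for p in P]                      # comb allpass delays (8%)
a_ap = [0.5] * 8                                      # allpass coefficients
g_fb = -0.25                                          # feedback coefficient
# Distinct primes are pairwise incommensurate, avoiding superimposed responses.
assert all(gcd(p, q) == 1 for i, p in enumerate(P) for q in P[i + 1:])
```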


The excitation signals for the harpsichord synthesizer are 0.45 second long, and hence contain the necessary fast-decaying modes for frequencies below 350 Hz (see Figure 6). Therefore, the tone correction section is divided into two parts: a highpass filter that suppresses frequencies below 350 Hz and another filter that imitates the spectral envelope at the middle and high frequencies. The highpass filter is a 5th-order Chebyshev type I design with a 5 dB passband ripple, the 6 dB point at 350 Hz, and a roll-off rate of about 50 dB per octave below the cutoff frequency. The spectral envelope filter for the soundboard model is a 10th-order IIR filter designed using linear prediction [41] from a 0.2-second-long windowed segment of the measured target response (see Figure 6 from 0.3 second to 0.5 second). Figure 10 shows the time-frequency plot of the target response and the soundboard filter for the first 1.5 seconds up to 10 kHz. The target response has a prominent lowpass characteristic, which is due to the properties of the impulse hammer. While the response should really be inverse filtered by the hammer force signal, in practice we can approximately compensate for this effect with a differentiator whose transfer function is Hdiff(z) = 0.5 − 0.5z^−1. This is done before the design of the tone corrector, so the compensation filter is not included in the synthesizer implementation.
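Under the stated specifications, the tone-corrector parts and the hammer compensation can be sketched with scipy as follows. The authors' actual design tools are not stated, and cheby1 takes the passband edge rather than the 6 dB point, so this is only one plausible reading:

```python
import numpy as np
from scipy.signal import cheby1, lfilter
from scipy.linalg import solve_toeplitz

fs = 44100
# Part 1: 5th-order Chebyshev type I highpass, 5 dB ripple, edge near 350 Hz.
b_hp, a_hp = cheby1(5, 5, 350 / (fs / 2), btype='highpass')

def lpc_envelope(segment, order=10):
    """All-pole spectral-envelope model via linear prediction
    (autocorrelation method); the envelope filter is 1/A(z)."""
    n = len(segment)
    rxx = np.correlate(segment, segment, 'full')[n - 1:n + order]
    a = solve_toeplitz(rxx[:-1], -rxx[1:])   # normal equations R a = -r
    return np.concatenate(([1.0], a))        # coefficients of A(z)

def compensate_hammer(response):
    """Differentiator H_diff(z) = 0.5 - 0.5 z^-1, applied before the design
    to approximate inverse filtering by the hammer force signal."""
    return lfilter([0.5, -0.5], [1.0], response)
```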

5. IMPLEMENTATION AND APPLICATIONS

This section deals with computational efficiency, implementation issues, and musical applications of the harpsichord synthesizer.

5.1. Computational complexity

The computational cost of implementing the harpsichord synthesizer and running it at an audio sample rate, such as 44100 Hz, is relatively small. Table 1 summarizes the number of multiplications and additions needed per sample for various parts of the system. In this cost analysis, it is assumed that the dispersion is simulated using a first-order allpass filter. In practice, the lowest tones require a higher-order allpass filter, but some of the highest tones may not have the allpass filter at all, so the first-order filter represents an average cost per string model. Note that the total cost per string is smaller than that of an FIR filter of order 12 (i.e., 13 multiplications and 12 additions). In practice, one voice in harpsichord synthesis is allocated one to three string models, which simulate the different registers. The soundboard model is considerably more costly than a string model: the number of multiplications is more than fourfold, and the number of additions is almost seven times larger. The complexity analysis of the comb allpass filters in the soundboard model is based on the direct form II implementation (i.e., one delay line, two multiplications, and two additions per comb allpass filter section).

The implementation of the synthesizer, which is discussed in detail in the next section, is based on high-level programming and control. Thus, it is not optimized for the fastest possible real-time operation. The current implementation of the synthesizer runs on a Macintosh G4 (800 MHz) computer, and it can simultaneously run 15 string models in real time without the soundboard model. With the soundboard model, it is possible to run about 10 strings. A new, faster computer and optimization of the code can increase these numbers. With optimized code and fast hardware, it may be possible to run the harpsichord synthesizer with full polyphony (i.e., 56 voices) and the soundboard in real time using current technology.

Figure 10: The time-frequency representation of (a) the recorded soundboard response and (b) the synthetic response obtained as the impulse response of a modified feedback delay network.

5.2. Synthesizer implementation

The signal-processing part of the harpsichord synthesizer is realized using a visual software synthesis package called PWSynth [42]. PWSynth, in turn, is part of a larger visual programming environment called PWGL [43]. Finally, the control information is generated using our music notation package ENP (expressive notation package) [44]. In this section, the focus is on design issues that we have encountered when implementing the synthesizer. We also give ideas on how the model is parameterized so that it can be controlled from the music notation software.


Table 1: The number of multiplications and additions in different parts of the synthesizer.

Part of synthesis algorithm                    Multiplications   Additions
String model
  Fractional delay allpass filter F(z)                2              2
  Inharmonizing allpass filter A_d(z)                 2              2
  One-pole filter                                     2              1
  Ripple filter                                       1              1
  Resonator R(z)                                      3              2
  Timbre control                                      2              1
  Mixing with release sample                          1              1
Soundboard model
  Modified FDN reverberator                          33             47
  IIR tone corrector                                 11             10
  Highpass filter                                    12              9
  Mixing                                              1              1
Total
  Per string (without soundboard model)              13             10
  Soundboard model                                   57             67
  All (one string and soundboard model)              70             77
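The per-sample counts of Table 1 translate directly into an operations-per-second budget; a small sketch (ours):

```python
def ops_per_second(n_strings, soundboard=True, fs=44100):
    """Multiply-add budget from Table 1: 13/10 per string,
    57/67 for the soundboard model."""
    mults = 13 * n_strings + (57 if soundboard else 0)
    adds = 10 * n_strings + (67 if soundboard else 0)
    return (mults + adds) * fs

# Ten strings plus the soundboard: about 15.6 million operations per second.
print(ops_per_second(10) / 1e6)
```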


Our previous work in designing computer simulations of musical instruments has resulted in several applications, such as the classical guitar [39], the Renaissance lute, the Turkish ud [45], and the clavichord [29]. The two-manual harpsichord tackled in the current study is the most challenging and complex instrument that we have yet investigated. As this kind of work is experimental, and the synthesis model must be refined by interactive listening, a system is needed that is capable of making fast and efficient prototypes of the basic components of the system. Another nontrivial problem is the parameterization of the harpsichord synthesizer. In a typical case, one basic component, such as the vibrating string model, requires over 10 parameters so that it can be used in a convincing simulation. Thus, since the full harpsichord synthesizer implementation has three string sets, each having 56 strings, we need at least 1680 (= 10 × 3 × 56) parameters in order to control all individual strings separately.

Figure 11 shows a prototype of a harpsichord synthesizer. It contains three main parts. First, the top-most box (called "num-box" with the label "number-of-strings") gives the number of strings within each string set used by the synthesizer. This number can vary from 1 (useful for preliminary tests) to 56 (the full instrument). In a typical real-time situation, this number can vary, depending on the polyphony of the musical score to be realized, between 4 and 10. The next box of interest is called "string model." It is a special abstraction box that contains a subwindow. The contents of this window are displayed in Figure 12. This abstraction box defines a single string model. Next, Figure 11 shows three "copy-synth-patch" boxes that determine the individual string sets used by the instrument. These sets are labeled as follows: "harpsy1/8-fb/," "harpsy1/8-ff," and "harpsy1/4-ff/." Each string set copies the string model patch count times, where count is equal to the current number of strings (given by the upper number-of-strings box). The rest of the boxes in the patch are used to mix the outputs of the string sets.

Figure 12 gives the definition of a single string model. The patch consists of two types of boxes. First, the boxes with the name "pwsynth-plug" (the boxes with the darkest outlines in grey-scale) define the parametric entry points that are used by our control system. Second, the other boxes are low-level DSP modules, realized in C++, that perform the actual sample calculation, and boxes which are used to initialize the DSP modules. The "pwsynth-plug" boxes point to memory addresses that are continuously updated while the synthesizer is running. Each "pwsynth-plug" box has a label that is used to build symbolic parameter pathnames. While the "copy-synth-patch" boxes (see the main patch of Figure 11) copy the string model in a loop, the system automatically generates new unique pathnames by merging the label from the current "copy-synth-patch" box, the current loop index, and the label found in the "pwsynth-plug" boxes. Thus, pathnames like "harpsy1/8-fb/1/lfgain" are obtained, which refers to the lfgain (loss filter gain) of the first string of the 8′ back string set of a harpsichord model called "harpsy1."
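The pathname scheme is easy to mirror in ordinary code; a hypothetical sketch (the helper function and any parameter labels beyond lfgain are ours):

```python
def parameter_paths(instrument, register_label, n_strings, param_labels):
    """Build pathnames in the 'harpsy1/8-fb/1/lfgain' style described above."""
    return [f"{instrument}/{register_label}/{i}/{p}"
            for i in range(1, n_strings + 1) for p in param_labels]

paths = parameter_paths("harpsy1", "8-fb", 56, ["lfgain"])
print(paths[0])   # -> harpsy1/8-fb/1/lfgain
```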

5.3. Musical applications

The harpsichord synthesizer can be used as an electronic musical instrument controlled either from a MIDI keyboard or from sequencer software. Recently, some composers have been interested in using a formerly developed model-based guitar synthesizer for compositions, which are either experimental in nature or extremely challenging for human players.


Figure 11: The top-level prototype of the harpsichord synthesizer in PWSynth. The patch defines one string model and the three string sets used by the instrument.

Another fascinating idea is to extend the range and timbre of the instrument. A version of the guitar synthesizer, which we call the super guitar, has an extended range and a large number of strings [46]. We plan to develop a similar extension of the harpsichord synthesizer.

In the current version of the synthesizer, the parameters have been calibrated based on recordings. One obvious application for a parametric synthesizer is to modify the timbre by deviating from the calibrated parameter values. This can lead to extended timbres that belong to the same instrument family as the original instrument or, in extreme cases, to a novel virtual instrument that cannot be recognized by listeners. One of the most obvious subjects for modification is the decay rate, which is controlled with the coefficients of the loop filter.

A well-known limitation of the harpsichord is its restricted dynamic range. In fact, it is a controversial issue whether the key velocity has any audible effect on the sound of the harpsichord. The synthesizer easily allows the implementation of an exaggerated dynamic control, where the key velocity has a dramatic effect on both the amplitude and the timbre, if desired, such as in the piano or in the acoustic guitar. As the key velocity information is readily available, it can be used to control the gain and the properties of a timbre control filter (see Figure 2).

Luthiers who make musical instruments are interested in modern technology and want to try physics-based synthesis to learn about the instrument. A synthesizer allows varying certain parameters in the instrument design that are difficult or impossible to adjust in the real instrument. For example, the point where the quill plucks the string is structurally fixed in the harpsichord, but as it has a clear effect on the timbre, varying it is of interest. In the current harpsichord synthesizer, this would require knowledge of the plucking point and then inverse filtering its contribution from the excitation signal. The plucking point contribution can then be implemented in the string model by inserting another feedforward comb filter, as discussed previously in several works [7, 16, 17, 18]. Another prospect is to vary the location of the damper. Currently, we do not have an exact model for the damper, and neither is its location a parameter. Testing this is still possible, because it is known that the nonideal functioning of the damper is related to the nodal points of the strings, which coincide with the locations of the damper. The ripply loss filter allows the imitation of this effect.
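The plucking-point effect has a standard realization in waveguide synthesis [16, 17, 18]: plucking at relative position p on a string whose loop delay is N samples corresponds to the feedforward comb filter H(z) = 1 − z^{−round(pN)}. A minimal sketch, with hypothetical class and parameter names:

    #include <algorithm>
    #include <cmath>
    #include <vector>

    // Feedforward comb filter H(z) = 1 - z^{-round(p*N)} modeling a pluck
    // at relative position p on a string with waveguide loop delay N.
    class PluckingPointComb {
    public:
        PluckingPointComb(double p, int loopDelayN)
            : buf_(std::max<long>(1, std::lround(p * loopDelayN)), 0.0), idx_(0) {}
        double process(double x) {
            const double delayed = buf_[idx_];  // x[n - round(p*N)]
            buf_[idx_] = x;
            idx_ = (idx_ + 1) % buf_.size();
            return x - delayed;                 // notches at suppressed harmonics
        }
    private:
        std::vector<double> buf_;
        std::size_t idx_;
    };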

Luthiers are interested in the possibility of virtual prototyping without the need for actually building many versions of an instrument out of wood. The current synthesis model may not be sufficiently detailed for this purpose. A real-time or near-real-time implementation of a physical model, where several parameters can be adjusted, would be an ideal tool for testing prototypes.

Figure 12: The string model patch. The patch contains the low-level DSP modules and parameter entry points used by the harpsichord synthesizer.

6. CONCLUSIONS

This paper proposes signal-processing techniques for synthesizing harpsichord tones. A new extension to the loss filter of the waveguide synthesizer has been developed which allows variations in the decay times of neighboring partials. This filter will also be useful for the waveguide synthesis of other stringed instruments. The fast-decaying modes of the soundboard are incorporated in the excitation samples of the synthesizer, while the long-ringing modes at the middle and high frequencies are imitated using a reverberation algorithm. The calibration of the synthesis model is made almost automatic. The parameterization and use of simple filters also allow manual adjustment of the timbre. A physics-based synthesizer, such as the one described here, has several musical applications, the most obvious one being its use as a computer-controlled musical instrument.

Examples of single tones and musical pieces synthesized with the synthesizer are available at http://www.acoustics.hut.fi/publications/papers/jasp-harpsy/.

ACKNOWLEDGMENTS

The work of Henri Penttinen has been supported by the Pythagoras Graduate School of Sound and Music Research. The work of Cumhur Erkut is part of the EU project ALMA (IST-2001-33059). The authors are grateful to B. Bank, P. A. A. Esquef, and J. O. Smith for their helpful comments. Special thanks go to H. Jarvelainen for her help in preparing Figure 5.

REFERENCES

[1] J. O. Smith, “Efficient synthesis of stringed musical instruments,” in Proc. International Computer Music Conference, pp. 64–71, Tokyo, Japan, September 1993.

[2] M. Karjalainen and V. Valimaki, “Model-based analysis/synthesis of the acoustic guitar,” in Proc. Stockholm Music Acoustics Conference, pp. 443–447, Stockholm, Sweden, July–August 1993.

[3] M. Karjalainen, V. Valimaki, and Z. Janosy, “Towards high-quality sound synthesis of the guitar and string instruments,” in Proc. International Computer Music Conference, pp. 56–63, Tokyo, Japan, September 1993.

[4] J. O. Smith and S. A. Van Duyne, “Commuted piano synthesis,” in Proc. International Computer Music Conference, pp. 319–326, Banff, Alberta, Canada, September 1995.

[5] J. O. Smith, “Physical modeling using digital waveguides,” Computer Music Journal, vol. 16, no. 4, pp. 74–91, 1992.

[6] K. Karplus and A. Strong, “Digital synthesis of plucked string and drum timbres,” Computer Music Journal, vol. 7, no. 2, pp. 43–55, 1983.

[7] M. Karjalainen, V. Valimaki, and T. Tolonen, “Plucked-string models, from the Karplus-Strong algorithm to digital waveguides and beyond,” Computer Music Journal, vol. 22, no. 3, pp. 17–32, 1998.

[8] F. Hubbard, Three Centuries of Harpsichord Making, Harvard University Press, Cambridge, Mass, USA, 1965.

[9] N. H. Fletcher and T. D. Rossing, The Physics of Musical Instruments, Springer-Verlag, New York, NY, USA, 1991.

[10] E. L. Kottick, K. D. Marshall, and T. J. Hendrickson, “The acoustics of the harpsichord,” Scientific American, vol. 264, no. 2, pp. 94–99, 1991.

[11] W. R. Savage, E. L. Kottick, T. J. Hendrickson, and K. D. Marshall, “Air and structural modes of a harpsichord,” Journal of the Acoustical Society of America, vol. 91, no. 4, pp. 2180–2189, 1992.

[12] N. H. Fletcher, “Analysis of the design and performance of harpsichords,” Acustica, vol. 37, pp. 139–147, 1977.

[13] J. Sankey and W. A. Sethares, “A consonance-based approach to the harpsichord tuning of Domenico Scarlatti,” Journal of the Acoustical Society of America, vol. 101, no. 4, pp. 2332–2337, 1997.

[14] B. Bank, “Physics-based sound synthesis of the piano,” M.S. thesis, Department of Measurement and Information Systems, Budapest University of Technology and Economics, Budapest, Hungary, 2000, published as Tech. Rep. 54, Laboratory of Acoustics and Audio Signal Processing, Helsinki University of Technology, Espoo, Finland, 2000.

[15] B. Bank, V. Valimaki, L. Sujbert, and M. Karjalainen, “Efficient physics based sound synthesis of the piano using DSP methods,” in Proc. European Signal Processing Conference, vol. 4, pp. 2225–2228, Tampere, Finland, September 2000.

[16] D. A. Jaffe and J. O. Smith, “Extensions of the Karplus-Strong plucked-string algorithm,” Computer Music Journal, vol. 7, no. 2, pp. 56–69, 1983.

[17] J. O. Smith, Techniques for digital filter design and system identification with application to the violin, Ph.D. thesis, Stanford University, Stanford, Calif, USA, 1983.

[18] V. Valimaki, J. Huopaniemi, M. Karjalainen, and Z. Janosy, “Physical modeling of plucked string instruments with application to real-time sound synthesis,” Journal of the Audio Engineering Society, vol. 44, no. 5, pp. 331–353, 1996.

[19] B. Bank and V. Valimaki, “Robust loss filter design for digital waveguide synthesis of string tones,” IEEE Signal Processing Letters, vol. 10, no. 1, pp. 18–20, 2003.

[20] H. Fletcher, E. D. Blackham, and R. S. Stratton, “Quality of piano tones,” Journal of the Acoustical Society of America, vol. 34, no. 6, pp. 749–761, 1962.

[21] S. A. Van Duyne and J. O. Smith, “A simplified approach to modeling dispersion caused by stiffness in strings and plates,” in Proc. International Computer Music Conference, pp. 407–410, Arhus, Denmark, September 1994.

[22] D. Rocchesso and F. Scalcon, “Accurate dispersion simulation for piano strings,” in Proc. Nordic Acoustical Meeting, pp. 407–414, Helsinki, Finland, June 1996.

[23] B. Bank, F. Avanzini, G. Borin, G. De Poli, F. Fontana, and D. Rocchesso, “Physically informed signal processing methods for piano sound synthesis: a research overview,” EURASIP Journal on Applied Signal Processing, vol. 2003, no. 10, pp. 941–952, 2003.

[24] H. Jarvelainen, V. Valimaki, and M. Karjalainen, “Audibility of the timbral effects of inharmonicity in stringed instrument tones,” Acoustics Research Letters Online, vol. 2, no. 3, pp. 79–84, 2001.

[25] M. Karjalainen and J. O. Smith, “Body modeling techniques for string instrument synthesis,” in Proc. International Computer Music Conference, pp. 232–239, Hong Kong, China, August 1996.

[26] P. R. Cook, “Physically informed sonic modeling (PhISM): synthesis of percussive sounds,” Computer Music Journal, vol. 21, no. 3, pp. 38–49, 1997.

[27] D. Rocchesso, “Multiple feedback delay networks for sound processing,” in Proc. X Colloquio di Informatica Musicale, pp. 202–209, Milan, Italy, December 1993.

[28] H. Penttinen, M. Karjalainen, T. Paatero, and H. Jarvelainen, “New techniques to model reverberant instrument body responses,” in Proc. International Computer Music Conference, pp. 182–185, Havana, Cuba, September 2001.

[29] V. Valimaki, M. Laurson, and C. Erkut, “Commuted waveguide synthesis of the clavichord,” Computer Music Journal, vol. 27, no. 1, pp. 71–82, 2003.

[30] R. Vaananen, V. Valimaki, J. Huopaniemi, and M. Karjalainen, “Efficient and parametric reverberator for room acoustics modeling,” in Proc. International Computer Music Conference, pp. 200–203, Thessaloniki, Greece, September 1997.

[31] J. M. Jot and A. Chaigne, “Digital delay networks for designing artificial reverberators,” in Proc. 90th Convention Audio Engineering Society, Paris, France, February 1991.

[32] C. Erkut, V. Valimaki, M. Karjalainen, and M. Laurson, “Extraction of physical and expressive parameters for model-based sound synthesis of the classical guitar,” in Proc. 108th Convention Audio Engineering Society, p. 17, Paris, France, February 2000.

[33] X. Serra and J. O. Smith, “Spectral modeling synthesis: a sound analysis/synthesis system based on a deterministic plus stochastic decomposition,” Computer Music Journal, vol. 14, no. 4, pp. 12–24, 1990.

[34] G. Weinreich, “Coupled piano strings,” Journal of the Acoustical Society of America, vol. 62, no. 6, pp. 1474–1484, 1977.

[35] V. Valimaki and T. Tolonen, “Development and calibration of a guitar synthesizer,” Journal of the Audio Engineering Society, vol. 46, no. 9, pp. 766–778, 1998.

[36] T. Tolonen, “Model-based analysis and resynthesis of acoustic guitar tones,” M.S. thesis, Laboratory of Acoustics and Audio Signal Processing, Department of Electrical and Communications Engineering, Helsinki University of Technology, Espoo, Finland, 1998, Tech. Rep. 46.

[37] H. Jarvelainen and T. Tolonen, “Perceptual tolerances for decay parameters in plucked string synthesis,” Journal of the Audio Engineering Society, vol. 49, no. 11, pp. 1049–1059, 2001.

[38] H. Jarvelainen and M. Karjalainen, “Perception of beating and two-stage decay in dual-polarization string models,” in Proc. International Symposium on Musical Acoustics, Mexico City, Mexico, December 2002.

[39] M. Laurson, C. Erkut, V. Valimaki, and M. Kuuskankare, “Methods for modeling realistic playing in acoustic guitar synthesis,” Computer Music Journal, vol. 25, no. 3, pp. 38–49, 2001.

[40] W. G. Gardner, “Reverberation algorithms,” in Applications of Digital Signal Processing to Audio and Acoustics, M. Kahrs and K. Brandenburg, Eds., pp. 85–131, Kluwer Academic, Boston, Mass, USA, 1998.

[41] J. D. Markel and A. H. Gray Jr., Linear Prediction of Speech, Springer-Verlag, Berlin, Germany, 1976.

[42] M. Laurson and M. Kuuskankare, “PWSynth: a Lisp-based bridge between computer assisted composition and sound synthesis,” in Proc. International Computer Music Conference, pp. 127–130, Havana, Cuba, September 2001.

[43] M. Laurson and M. Kuuskankare, “PWGL: a novel visual language based on Common Lisp, CLOS and OpenGL,” in Proc. International Computer Music Conference, pp. 142–145, Gothenburg, Sweden, September 2002.


[44] M. Kuuskankare and M. Laurson, “ENP2.0: a music notation program implemented in Common Lisp and OpenGL,” in Proc. International Computer Music Conference, pp. 463–466, Gothenburg, Sweden, September 2002.

[45] C. Erkut, M. Laurson, M. Kuuskankare, and V. Valimaki, “Model-based synthesis of the ud and the Renaissance lute,” in Proc. International Computer Music Conference, pp. 119–122, Havana, Cuba, September 2001.

[46] M. Laurson, V. Valimaki, and C. Erkut, “Production of virtual acoustic guitar music,” in Proc. Audio Engineering Society 22nd International Conference on Virtual, Synthetic and Entertainment Audio, pp. 249–255, Espoo, Finland, June 2002.

Vesa Valimaki was born in Kuorevesi, Finland, in 1968. He received the M.S. degree, the Licentiate of Science degree, and the Doctor of Science degree, all in electrical engineering, from Helsinki University of Technology (HUT), Espoo, Finland, in 1992, 1994, and 1995, respectively. He was with the HUT Laboratory of Acoustics and Audio Signal Processing from 1990 to 2001. In 1996, he was a Postdoctoral Research Fellow with the University of Westminster, London, UK. During the academic year 2001-2002 he was Professor of signal processing at the Pori School of Technology and Economics, Tampere University of Technology (TUT), Pori, Finland. He is currently Professor of audio signal processing at HUT. He was appointed Docent in signal processing at the Pori School of Technology and Economics, TUT, in 2003. His research interests are in the application of digital signal processing to music and audio. Dr. Valimaki is a Senior Member of the IEEE Signal Processing Society and is a Member of the Audio Engineering Society, the Acoustical Society of Finland, and the Finnish Musicological Society.

Henri Penttinen was born in Espoo, Finland, in 1975. He received the M.S. degree in electrical engineering from Helsinki University of Technology (HUT), Espoo, Finland, in 2003. He has worked at the HUT Laboratory of Acoustics and Signal Processing since 1999 and is currently a Ph.D. student there. His main research interests are signal processing algorithms, real-time audio applications, and musical acoustics. Mr. Penttinen is also active in music through playing, composing, and performing.

Jonte Knif was born in Vaasa, Finland, in 1975. He is currently studying music technology at the Sibelius Academy, Helsinki, Finland. Prior to this he studied the harpsichord at the Sibelius Academy for five years. He has built and designed many historical keyboard instruments and adaptations such as an electric clavichord. His present interests also include loudspeaker and studio electronics design.

Mikael Laurson was born in Helsinki, Finland, in 1951. His formal training at the Sibelius Academy consists of a guitar diploma (1979) and a doctoral dissertation (1996). In 2002, he was appointed Docent in music technology at Helsinki University of Technology, Espoo, Finland. Between the years 1979 and 1985 he was active as a guitarist. Since 1989 he has been working at the Sibelius Academy as a Researcher and Teacher of computer-aided composition. After conceiving the PatchWork (PW) programming language (1986), he started a close collaboration with IRCAM resulting in the first PW release in 1993. After 1993 he has been active as a developer of various PW user libraries. Since the year 1999, Dr. Laurson has worked in a project dealing with physical modeling and sound synthesis control funded by the Academy of Finland and the Sibelius Academy Innovation Centre.

Cumhur Erkut was born in Istanbul, Turkey, in 1969. He received the B.S. and the M.S. degrees in electronics and communication engineering from the Yildiz Technical University, Istanbul, Turkey, in 1994 and 1997, respectively, and the Doctor of Science degree in electrical engineering from Helsinki University of Technology (HUT), Espoo, Finland, in 2002. Between 1998 and 2002, he worked as a Researcher at the HUT Laboratory of Acoustics and Audio Signal Processing. He is currently a Postdoctoral Researcher in the same institution, where he contributes to the EU-funded research project “Algorithms for the Modelling of Acoustic Interactions” (ALMA, European project IST-2001-33059). His primary research interests are model-based sound synthesis and musical acoustics.


EURASIP Journal on Applied Signal Processing 2004:7, 949–963
© 2004 Hindawi Publishing Corporation

Multirate Simulations of String Vibrations Including Nonlinear Fret-String Interactions Using the Functional Transformation Method

L. Trautmann
Multimedia Communications and Signal Processing, University of Erlangen-Nuremberg, Cauerstrasse 7, 91058 Erlangen, Germany
Email: [email protected]
Laboratory of Acoustics and Audio Signal Processing, Helsinki University of Technology, P.O. Box 3000, 02015 Espoo, Finland
Email: [email protected]

R. Rabenstein
Multimedia Communications and Signal Processing, University of Erlangen-Nuremberg, Cauerstrasse 7, 91058 Erlangen, Germany
Email: [email protected]

Received 30 June 2003; Revised 14 November 2003

The functional transformation method (FTM) is a well-established mathematical method for accurate simulations of multidimensional physical systems from various fields of science, including optics, heat and mass transfer, electrical engineering, and acoustics. This paper applies the FTM to real-time simulations of transversal vibrating strings. First, a physical model of a transversal vibrating, lossy and dispersive string is derived. Afterwards, this model is solved with the FTM for two cases: the ideally linearly vibrating string and the string interacting nonlinearly with the frets. It is shown that accurate and stable simulations can be achieved with the discretization of the continuous solution at audio rate. Both simulations can also be performed with a multirate approach with only minor degradations of the simulation accuracy but with preservation of stability. This saves almost 80% of the computational cost for the simulation of a six-string guitar, and the cost is therefore in the range of that of digital waveguide simulations.

Keywords and phrases: multidimensional system, vibrating string, partial differential equation, functional transformation, nonlinear, multirate approach.

1. INTRODUCTION

Digital sound synthesis methods can mainly be categorized into classical direct synthesis methods and physics-based methods [1]. The first category includes all kinds of sound processing algorithms like wavetable, granular, and subtractive synthesis, as well as abstract mathematical models, like additive or frequency modulation synthesis. What is common to all these methods is that they are based on the sound to be (re)produced.

The physics-based methods, also called physical modeling methods, start at the physics of the sound production mechanism rather than at the resulting sound. This approach has several advantages over the sound-based methods.

(i) The resulting sound, and especially transitions between successive notes, always sounds acoustically realistic as far as the underlying model is sufficiently accurate.

(ii) Sound variations of acoustical instruments due to different playing techniques or different instruments within one instrument family are described in the physics-based methods with only a few parameters. These parameters can be adjusted in advance to simulate a distinct acoustical instrument, or they can be controlled by the musician to morph between real-world instruments to obtain more degrees of freedom in expressiveness and variability.

The second item makes physical modeling methods quite useful for multimedia applications where only a very limited bandwidth is available for the transmission of music as, for example, in mobile phones. In these applications, the physical model has to be transferred only once, and afterwards it is sufficient to transfer only the musical score while keeping the variability of the resulting sound.

The starting points for the various existing physical modeling methods are always physical models varying for a certain vibrating object only in the model accuracies. The application of the basic laws of physics to an existing or imaginary vibrating object results in continuous-time, continuous-space models. These models are called initial-boundary-value problems and they contain a partial differential equation (PDE) and some initial and boundary conditions. The discretization approaches to the continuous models and the digital realizations are different for the single physical modeling methods.

One of the first physical modeling algorithms for the simulation of musical instruments was made by Hiller and Ruiz in 1971 [2] with the finite difference method. It directly discretizes the temporal and spatial differential operators of the PDE to finite difference terms. On the one hand, this approach is computationally very demanding, since temporal and spatial sampling intervals have to be chosen small for accurate simulations. Furthermore, stability problems occur, especially in dispersive vibrational objects, if the relationship between temporal and spatial sampling intervals is not chosen properly [3]. On the other hand, the finite difference method is quite suitable for studies in which the vibration has to be evaluated on a dense spatial grid. Therefore, the finite difference method has mainly been used for academic studies rather than for real-time applications (see, e.g., [4, 5]). However, the finite difference method has recently become more popular also for real-time applications in conjunction with other physical modeling methods [6, 7].
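As a concrete illustration of such a direct discretization, the sketch below implements the classical leapfrog update for the ideal lossless wave equation ÿ = c²y'' (a simplification of the full string model derived in Section 2). Stability requires the Courant condition λ = cT/X ≤ 1, where T and X are the temporal and spatial sampling intervals:

    // Leapfrog finite-difference update for the ideal wave equation
    // y_tt = c^2 y_xx (illustration only; the full PDE (4) adds
    // stiffness and damping terms). lambda = c*T/X must satisfy
    // the Courant condition lambda <= 1 for stability.
    #include <vector>

    void fdStep(const std::vector<double>& yPrev,  // y at time step k-1
                const std::vector<double>& yCur,   // y at time step k
                std::vector<double>& yNext,        // y at time step k+1 (output)
                double lambda)                     // Courant number c*T/X
    {
        const double l2 = lambda * lambda;
        const std::size_t M = yCur.size();
        yNext[0] = yNext[M - 1] = 0.0;             // fixed string ends
        for (std::size_t m = 1; m + 1 < M; ++m)
            yNext[m] = 2.0 * (1.0 - l2) * yCur[m]
                     + l2 * (yCur[m + 1] + yCur[m - 1])
                     - yPrev[m];
    }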

A mathematically similar discretization approach is used in mass-spring models that are closely related to the finite element method. In this approach, the vibrating structure is reduced to a finite number of mass points that are interconnected by springs and dampers. One of the first systems for the simulation of musical instruments was the CORDIS system, which could be realized in real time on a specialized processor [8]. The finite difference method, as well as the mass-spring models, can be viewed as direct discretization approaches of the initial-boundary-value problems. Despite the stability problems, they are very easy to set up, but they are computationally demanding.

In modal synthesis, first introduced in [9], the PDE is spatially discretized at not necessarily equidistant spatial points, similar to the mass-spring models. The interconnections between these discretized spatial points reflect the physical behavior of the structure. This discretization reduces the degrees of freedom for the vibration to the number of spatial points, which is directly transferred to the same number of temporal modes the structure can vibrate in. The reduction not only allows the calculation of the modes of simple structures, but it can also handle vibrational measurements of more complicated structures at a finite number of spatial points [10]. A commercial product of modal synthesis, Modalys, is described, for example, in [11]. For a review of modal synthesis and a comparison to the functional transformation method (FTM), see also [12].

The commercially and academically most popular physical modeling method of the last two decades has been the digital waveguide method (DWG) because of its computational efficiency. It was first introduced in [13] as a physically interpreted extension of the Karplus-Strong algorithm [14]. Extensions of the DWG are described, for example, in [15, 16, 17, 18]. The DWG first simplifies the PDE to the wave equation, which has an analytical solution in the form of a forward and a backward traveling wave, called the d'Alembert solution. It can be realized computationally very efficiently with delay lines. Sound effects like damping or dispersion occurring in the vibrating structure are included in the DWG by low-order digital filters concentrated in one point of the delay line. This procedure ensures the computational efficiency, but the implementation loses the direct connection to the physical parameters of the vibrating structure.
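A minimal waveguide loop in the spirit of [13, 14] is sketched below: a delay line of roughly fs/f0 samples closed over a loss filter. The two-point average and the loop gain g are illustrative choices, not a complete DWG implementation:

    #include <vector>

    // Single delay line of N samples closed by the classical two-point
    // average of the Karplus-Strong algorithm and an overall loop gain g.
    class WaveguideString {
    public:
        WaveguideString(int N, double g) : delay_(N, 0.0), idx_(0), g_(g), prev_(0.0) {}
        // Load an initial excitation into the delay line ("pluck").
        void pluck(const std::vector<double>& excitation) {
            for (std::size_t i = 0; i < delay_.size() && i < excitation.size(); ++i)
                delay_[i] = excitation[i];
        }
        // One output sample of the string loop.
        double tick() {
            const double out = delay_[idx_];
            const double avg = 0.5 * (out + prev_);  // frequency-dependent loss
            prev_ = out;
            delay_[idx_] = g_ * avg;                 // feed back with loop gain g < 1
            idx_ = (idx_ + 1) % delay_.size();
            return out;
        }
    private:
        std::vector<double> delay_;
        std::size_t idx_;
        double g_, prev_;
    };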

The focus of this article is the FTM. It was first introduced in [19] for the heat-flow equation and first used for digital sound synthesis in [20]. Extensions to the basic model of a vibrating string and comparisons between the FTM and the above-mentioned physical modeling methods are given, for example, in [12]. In the FTM, the initial-boundary-value problem is first solved analytically by appropriate functional transformations before it is discretized for computer simulations. This ensures a high simulation accuracy as well as an inherent stability. One of the drawbacks of the FTM is so far its computational load, which is about five times higher than the load of the DWG [21].

This article extends the FTM by applying a multirate approach to the discrete realization of the FTM, such that the computational complexity is significantly reduced. The extension is shown for the linearly vibrating string as well as for the nonlinear limitation of the string vibration by a fret-string interaction occurring in slap bass synthesis.

The article is organized as follows. Section 2 derives the physical model of a transversal vibrating, dispersive, and lossy string in terms of a scalar PDE and initial and boundary conditions. Furthermore, a model for a nonlinear fret-string interaction is given. These models are solved in Section 3 with the FTM in continuous time and continuous space. Section 4 discretizes these solutions at audio rate and derives an algorithm to guarantee stability even for the nonlinear discrete system. A multirate approach is used in Section 5 for the simulation of the continuous solution to save computational cost. It is shown that this multirate approach also works for nonlinear systems. Section 6 compares the audio rate and the multirate solutions with respect to the simulation accuracy and the computational complexity.

2. PHYSICAL MODELS

In this section, a transversal vibrating, dispersive, and lossy string is analyzed using the basic laws of physics. From this analysis, a scalar PDE is derived in Section 2.1. Section 2.2 defines the initial states of the vibration, as well as the fixings of the string at the nut and the bridge end, in terms of initial and boundary conditions, respectively. In Section 2.3, the linear model is extended with a deflection-dependent force simulating the nonlinear interaction between the string and the frets, well known as slap synthesis [22].

In all these models, the strings are assumed to be homogeneous and isotropic. Furthermore, their surfaces are assumed to be smooth enough not to permit stress concentrations. The deflections of the strings are assumed to be small enough to change neither the cross-section area nor the tension on the string, so that the string itself behaves linearly.

2.1. Linear partial differential equation derived by basic laws of physics

The string under examination is characterized by its material and geometrical parameters. The material parameters are given by the mass density ρ, the Young's modulus E, the laminar air flow damping coefficient d1, and the viscoelastic damping coefficient d3. The geometrical parameters consist of the length l, the cross-section area A, and the moment of inertia I. Furthermore, a tension Ts is applied to the string in the axial direction. Considering only a string segment between the spatial positions xs and xs + ∆x, the forces on this string segment can be analyzed in detail. They consist of the restoring force fT caused by the tension Ts, the bending force fB caused by the stiffness of the string, the laminar air flow force fd1, the viscoelastic damping force fd3 (modeled here without memory), and the external excitation force fe. They result at xs in

f_T(x_s, t) = T_s \sin(φ(x_s, t)) ≈ T_s φ(x_s, t),   (1a)

f_B(x_s, t) = −EI b'(x_s, t),   (1b)

f_{d1}(x_s, t) = d_1 Δx v(x_s, t),   (1c)

f_{d3}(x_s, t) = d_3 \sin(\dot{φ}(x_s, t)) ≈ d_3 \dot{φ}(x_s, t),   (1d)

where φ(x_s, t) is the slope angle of the string, b(x_s, t) is the curvature of the string, v(x_s, t) is the velocity, and prime denotes the spatial derivative and dot the temporal derivative. Note that in (1a) and in (1d) it is assumed that the amplitude of the string vibration is small, so that the sine function can be approximated by its argument. Similar equations can be found for the forces at the other end of the string segment at x_s + Δx.

All these forces are combined by the equation of motion to

ρA Δx \dot{v}(x_s, t) = f_y(x_s, t) + f_{d3}(x_s, t) − f_y(x_s + Δx, t) − f_{d3}(x_s + Δx, t) − f_{d1}(x_s, t) + f_e(x_s, t),   (2)

where f_y = f_T + f_B. Setting Δx → 0 and solving (2) for the excitation force density f_{e1}(x_s, t) = f_e(x_s, t) δ(x − x_s), four coupled equations are obtained that are valid not only on the string segment x_s ≤ x ≤ x_s + Δx but on the whole string 0 ≤ x ≤ l. δ(x) denotes the impulse function.

f_{e1}(x, t) = ρA \dot{v}(x, t) + d_1 v(x, t) − f_y'(x, t) − d_3 \dot{b}(x, t),   (3a)

f_y(x, t) = T_s φ(x, t) − EI b'(x, t),   (3b)

b(x, t) = φ'(x, t),   (3c)

v'(x, t) = \dot{φ}(x, t).   (3d)

An extended version of the derivation of (3) can be found in [12]. The four coupled equations (3) can be simplified to one scalar PDE with only one output variable. All the dependent variables in (3a) can be written in terms of the string deflection y(x, t) by replacing v(x, t) with \dot{y}(x, t) and φ(x, t) = y'(x, t) from (3d), and with (3b) and (3c). Then (3) can be written in a general notation of scalar PDEs

D\{y(x, t)\} + L\{y(x, t)\} + W\{y(x, t)\} = f_{e1}(x, t),   x ∈ [0, l], t ∈ [0, ∞),   (4a)

with

D\{y(x, t)\} = ρA \ddot{y}(x, t) + d_1 \dot{y}(x, t),
L\{y(x, t)\} = −T_s y''(x, t) + EI y''''(x, t),
W\{y(x, t)\} = W_D\{W_L\{y(x, t)\}\} = −d_3 \dot{y}''(x, t).   (4b)

As can be seen in (4), the operator D contains only temporal derivatives, the operator L has only spatial derivatives, and the operator W consists of mixed temporal and spatial derivatives. The PDE is valid only on the string between x = 0 and x = l and for all positive times. Equation (4) forms a continuous-time, continuous-space PDE. For a unique solution, initial and boundary conditions must be given as specified in the next section.

2.2. Initial and boundary conditions

Initial conditions define the initial state of the string at time t = 0. This definition is written in the general operator notation with

f_i\{y(x, t)\} = [y(x, 0)  \dot{y}(x, 0)]^T = 0,   x ∈ [0, l], t = 0.   (5)

Since the scalar PDE (4) is of second order with respect to time, only two initial conditions are needed. They are chosen arbitrarily as the initial deflection and the initial velocity of the string, as seen in (5). For musical applications, it is a reasonable assumption that the initial states of the strings vanish at time t = 0, as given in (5). Note that this does not prevent the interaction between successively played notes, since the time is not set to zero for each note. Thus, this kind of initial condition is only used, for example, at the beginning of a piece of music.

In addition to the initial conditions, also the fixings of the string at both ends must be defined in terms of boundary conditions. In most stringed instruments, the strings are nearly fixed at the nut end (x = x_0 = 0) and transfer energy at the other end (x = x_1 = l) via the bridge to the resonant body [2]. For some instruments (e.g., the piano) it is also a justified assumption that the bridge fixing can be modeled to be ideally rigid [23]. Then the boundary conditions are given by

f_{b,i}\{y(x, t)\} = [y(x_i, t)  y''(x_i, t)]^T = 0,   i ∈ {0, 1}, t ∈ [0, ∞).   (6)

It can be seen from (6) that the string is assumed to be fixed but allowed to pivot at both ends, such that the deflection y and the curvature b = y'' must vanish. These are boundary conditions of the first kind. For simplicity, there is no energy fed into the system via the boundary, resulting in homogeneous boundary conditions.

Figure 1: Procedure of the FTM for solving initial-boundary-value problems defined in the form of PDEs, ICs, and BCs.

The PDE (4), in conjunction with the initial conditions (5) and boundary conditions (6), forms the linear continuous-time, continuous-space initial-boundary-value problem to be solved and simulated.

2.3. Nonlinear extension to the linear model for slap synthesis

Nonlinearities are an important part of the sound production mechanisms of musical instruments [23]. One example is the nonlinear interaction of the string with the frets, well known as slap synthesis. This effect was first modeled for the DWG in [22] as a nonlinear amplitude limitation. For the FTM, the effect was already applied to vibrating strings in [24].

A simplified model for this interaction interprets the fret as a spring with a high stiffness coefficient S_fret, acting at one position x_f as a force f_f on the string at time instants when the string is in contact with the fret. Since this force depends on the string deflection, it is nonlinear, defined with

f_f(x_f, t, y, y_f) =
\begin{cases}
S_{fret} (y(x_f, t) − y_f(x_f, t)), & \text{for } y(x_f, t) − y_f(x_f, t) > 0, \\
0, & \text{for } y(x_f, t) − y_f(x_f, t) ≤ 0.
\end{cases}   (7)

The deflection of the fret from the string rest position is denoted with y_f. The PDE (4) becomes nonlinear by adding the slap force f_f to the excitation function f_{e1}(x, t). Thus, a linear and a nonlinear system for the simulation of the vibrating string are derived. Both systems are solved in the next sections with the FTM.
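In code, the one-sided spring of (7) is a two-line function (a direct transcription; the function name is hypothetical):

    // One-sided stiff spring of (7): the fret pushes back only while the
    // string deflection y exceeds the fret deflection yf.
    double slapForce(double y, double yf, double Sfret) {
        const double d = y - yf;
        return (d > 0.0) ? Sfret * d : 0.0;
    }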

3. CONTINUOUS SOLUTIONS USING THE FTM

To obtain a model that can be implemented in the computer, the continuous initial-boundary-value problem has to be discretized. Instead of using a direct discretization approach as described in Section 1, the continuous analytical solution is derived first and discretized subsequently. This procedure is well known from the simulation of one-dimensional systems like electrical networks. It has several advantages, including simulation accuracy and guaranteed stability.

The outline of the FTM is given in Figure 1. First, the PDE with initial conditions (IC) and boundary conditions (BC) is Laplace transformed (L{·}) with respect to time to derive a boundary-value problem (ODE, BC). Then a so-called Sturm-Liouville transformation (T{·}) is used for the spatial variable to obtain an algebraic equation. Solving for the output variable results in a multidimensional (MD) transfer function model (TFM). It is discretized, and by applying the inverse Sturm-Liouville transformation T^{-1}{·} and the inverse z-transformation z^{-1}{·}, it results in the discretized solution in the time and space domain.

The impulse-invariant transformation is used for the discretization shown in Figure 1. It is equivalent to the calculation of the continuous solution by inverse transformation into the continuous time and space domain with subsequent sampling. The calculation of the continuous solution is presented in Sections 3.1 to 3.5; the discretization is shown in Sections 4 and 5.

For the nonlinear system, the transformations obviously cannot result in a TFM. Therefore, the procedure has to be modified slightly, resulting in an MD implicit equation, described in Section 3.6.

3.1. Laplace transformation

As known from linear electrical network theory, the Laplace transformation removes the temporal derivatives in linear and time-invariant (LTI) systems and includes, due to the differentiation theorem, the initial conditions as additive terms (see, e.g., [25]). Since first- and second-order time derivatives occur in (4) and the initial conditions (5) are homogeneous, the application of the Laplace transformation to the initial-boundary-value problem derived in Section 2 results in

d_D(s) Y(x, s) + L\{Y(x, s)\} + w_D(s) W_L\{Y(x, s)\} = F_{e1}(x, s),   x ∈ [0, l],   (8a)

f_{b,i}\{Y(x, s)\} = 0,   i ∈ {0, 1}.   (8b)

The Laplace-transformed functions are written with capital letters, and the complex temporal frequency variable is denoted by s = σ + jω. It can be seen in (8a) that the temporal derivatives of (4a) are replaced by multiplications with the scalar functions

d_D(s) = ρA s^2 + d_1 s,   w_D(s) = −d_3 s.   (8c)

Thus, the initial-boundary-value problem (4), (5), and (6) is replaced with the boundary-value problem (8) after the Laplace transformation.


3.2. Sturm-Liouville transformation

The transformation of the spatial variable should have the same properties as the Laplace transformation has for the time variable. It should remove the spatial derivatives, and it should include the boundary conditions as additive terms. Unfortunately, there is no unique transformation available for this task, due to the finite spatial definition range in contrast to the infinite time axis. That calls for a determination of the spatial transformation at hand, depending on the spatial differential operator and the boundary conditions. Since it leads to an eigenvalue problem first solved for simplified problems by Sturm and Liouville between 1836 and 1838, this transformation is called a Sturm-Liouville transformation (SLT) [26]. Mathematical details of the SLT applied to scalar PDEs can be found in [12].

The SLT is defined by

T\{Y(x, s)\} = Y(µ, s) = \int_0^l K(µ, x) Y(x, s) dx.   (9)

Note that there is a finite integration range in (9), in contrast to the Laplace transformation. The transformation kernels K(µ, x) of the SLT are obtained as the set of eigenfunctions of the spatial operator L_W = L + W_L with respect to the boundary conditions (8b). The corresponding eigenvalues are denoted by β_µ^4(s), where β_µ(s) is the discrete spatial frequency variable (see, e.g., [12] for details).

For the boundary-value problem defined in (8) with the operators given in (4b), the transformation kernels and the discrete spatial frequency variables result in

K(µ, x) = \sin\left(\frac{µπ}{l} x\right),   µ ∈ ℕ,   (10a)

β_µ^4(s) = EI \left(\frac{µπ}{l}\right)^4 − (T_s + d_3 s) \left(\frac{µπ}{l}\right)^2.   (10b)

Thus, the SLT can be interpreted as an extended Fourier series decomposition.

3.3. Multidimensional transfer function model

Applying the SLT (9) to the boundary-value problem (8) and solving for the transformed output variable Y(µ, s) results in the MD TFM

Y(µ, s) = \frac{1}{d_D(s) + β_µ^4(s)} F_e(µ, s).   (11)

Hence, the transformed input forces F_e(µ, s) are related via the MD transfer function given in (11) to the transformed output variable Y(µ, s). The denominator of the MD TFM depends quadratically on the temporal frequency variable s and to the power of four on the spatial frequency variable β_µ. This is based on the second-order temporal and fourth-order spatial derivatives occurring in the scalar PDE (4). Thus, the transfer function is a two-pole system with respect to time for each discrete spatial eigenvalue β_µ.

3.4. Inverse transformations

As explained at the beginning of Section 3, the continuous solution in the time and space domain is now calculated by using inverse transformations.

Inverse SLT

The inverse SLT is defined by an infinite sum over all discrete eigenvalues β_µ with

Y(x, s) = T^{-1}\{Y(µ, s)\} = \sum_µ \frac{1}{N_µ} Y(µ, s) K(µ, x).   (12)

The inverse transformation kernel K(µ, x) and the inverse spatial frequency variable β_µ are the same eigenfunctions and eigenvalues as for the forward transformation, due to the self-adjointness of the spatial operators L and W_L (see [12] for details). Thus, the inverse SLT can be evaluated at each spatial position by evaluating the infinite sum. Since only quadratic terms of µ occur in the denominator, it is sufficient to sum over positive values of µ and double the result to account for the negative values. In that case, the norm factor results in N_µ = l/4.

Inverse Laplace transformation

It can be seen from (11), (8c), and (10b) that the transfer functions consist of two-pole systems with complex-conjugate pole pairs for each discrete spatial eigenvalue β_µ. Therefore, the inverse Laplace transformation results, for each spatial frequency variable, in a damped sinusoidal term, called a mode.

3.5. Continuous solution

After applying the inverse transformations to the MD TFM, the continuous solution results in

y(x, t) = \frac{4}{ρAl} \sum_{µ=1}^{∞} \left( \frac{1}{ω_µ} e^{σ_µ t} \sin(ω_µ t) * f_e(x, t) \right) K(µ, x) δ_{-1}(t).   (13)

The step function, denoted by δ_{-1}(t), is used since the solution is only valid for positive time instants; ∗ denotes temporal convolution. f_e(x, t) is the spatially transformed excitation force, derived by inserting f_{e1} into (9). The angular frequencies ω_µ, as well as their corresponding damping coefficients σ_µ, can be calculated from the poles of the transfer function model (11). They directly depend on the physical parameters of the string and can be expressed by

ω_µ = \sqrt{ \left( \frac{EI}{ρA} − \left(\frac{d_3}{2ρA}\right)^2 \right) \left(\frac{µπ}{l}\right)^4 + \left( \frac{T_s}{ρA} − \frac{d_1 d_3}{2(ρA)^2} \right) \left(\frac{µπ}{l}\right)^2 − \left(\frac{d_1}{2ρA}\right)^2 },

σ_µ = −\frac{d_1}{2ρA} − \frac{d_3}{2ρA} \left(\frac{µπ}{l}\right)^2.   (14)

Thus, an analytical continuous solution (13), (14) of the initial-boundary-value problem (4), (5), (6) is derived without temporal or spatial derivatives.
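Equation (14) can be evaluated directly from the physical parameters. The sketch below prints the first few modes for an illustrative nylon-string parameter set; all numerical values are assumptions chosen to give a fundamental near 250 Hz, not data from this paper:

    #include <cmath>
    #include <cstdio>

    int main() {
        const double PI  = 3.14159265358979;
        const double rho = 1140.0;    // mass density (kg/m^3)       -- assumed
        const double A   = 0.5e-6;    // cross-section area (m^2)    -- assumed
        const double E   = 5.4e9;     // Young's modulus (Pa)        -- assumed
        const double I   = 0.02e-12;  // moment of inertia (m^4)     -- assumed
        const double d1  = 3.0e-3;    // air-flow damping coeff.     -- assumed
        const double d3  = 1.4e-5;    // viscoelastic damping coeff. -- assumed
        const double l   = 0.65;      // string length (m)           -- assumed
        const double Ts  = 60.0;      // tension (N)                 -- assumed

        const double rA = rho * A;
        for (int mu = 1; mu <= 5; ++mu) {
            const double k2 = (mu * PI / l) * (mu * PI / l);  // (mu*pi/l)^2
            // Radicand of (14); inharmonicity comes from the k2*k2 term.
            const double w2 = (E * I / rA - (d3 / (2 * rA)) * (d3 / (2 * rA))) * k2 * k2
                            + (Ts / rA - d1 * d3 / (2 * rA * rA)) * k2
                            - (d1 / (2 * rA)) * (d1 / (2 * rA));
            const double omega = std::sqrt(w2);                        // rad/s
            const double sigma = -d1 / (2 * rA) - d3 / (2 * rA) * k2;  // 1/s
            std::printf("mu=%d  f=%7.1f Hz  sigma=%7.3f 1/s\n",
                        mu, omega / (2 * PI), sigma);
        }
        return 0;
    }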

Page 38: Model-Based Sound Synthesis - downloads.hindawi.comdownloads.hindawi.com/journals/specialissues/614764.pdf · Model-based sound synthesis has become one of the most active research

954 EURASIP Journal on Applied Signal Processing

3.6. Implicit equation for slap synthesis

The PDE (4) becomes nonlinear by adding the solution-dependent slap force f_f(x_f, t, y, y_f) of (7) to the right-hand side of the linear PDE. Obviously, the application of the Laplace transformation and the SLT to the nonlinear initial-boundary-value problem cannot lead to an MD TFM, since a TFM always requires linearity. However, assuming that the nonlinearity can be represented as a finite power series and that the nonlinearity does not contain spatial derivatives, both transformations can be applied to the system [12]. With (7), both premises are met, such that the slap force can also be transformed into the frequency domains. The Y(x, s)-dependency of F_f can be expressed with (12) in terms of Y(ν, s) to be consistently in the spatial frequency domain. Then an MD implicit equation is derived in the temporal and spatial frequency domain

Y(µ, s) = \frac{1}{d_D(s) + β_µ^4(s)} \left( F_e(µ, s) + F_f(µ, s, Y(ν, s)) \right).   (15)

Note that the different argument ν in the output dependence of F_f(µ, s, Y(ν, s)) denotes an interaction between all modes caused by the nonlinear slap force. Details can be found in [12].

Since the transfer functions in (11) and (15) are the same, also the spatial transformation kernels and frequency variables stay the same as in the linear case. Thus, also the temporal poles of (15) are the same as in the MD TFM (11), and the continuous solution results in the implicit equation

y(x, t) = \frac{4}{ρAl} \sum_{µ=1}^{∞} \left( \frac{1}{ω_µ} e^{σ_µ t} \sin(ω_µ t) * \left( f_e(x, t) + f_f(µ, t, y(ν, t)) \right) \right) K(µ, x) δ_{-1}(t),   (16)

with ω_µ and σ_µ given in (14). It is shown in the next sections that this implicit equation is turned into explicit ones by applying different discretization schemes.

4. DISCRETIZATION AT AUDIO RATE

This section describes the discretization of the continuous solutions for the linear and the nonlinear cases. It is performed at audio rate, for example with sampling frequency fs = 1/T = 44.1 kHz, where T denotes the sampling interval. The discrete realization is shown as it can be implemented in the computer. For the nonlinear slap synthesis, some extensions of the discrete realization are required and, furthermore, the stability of the entire system must be controlled.

4.1. Discretization of the linear MD model

The discrete realization of the MD TFM (11) consists of a three-step procedure performed below:

(1) discretization with respect to time,
(2) discretization with respect to space,
(3) inverse transformations.

Discretization with respect to time

Discretizing the time variable with t = kT, k ∈ ℕ, and assuming an impulse-invariant system, an s-to-z mapping is applied to the MD TFM (11) with z^{-1} = e^{-sT}. This procedure directly leads to an MD TFM with the discrete-time frequency variable z:

Y^d(µ, z) = T \frac{ (1/(ρAω_µ)) z e^{σ_µ T} \sin(ω_µ T) }{ z^2 − 2z e^{σ_µ T} \cos(ω_µ T) + e^{2σ_µ T} } F_e^d(µ, z).   (17)

Superscript d denotes discretized variables. The angular frequency variables and the damping coefficients are given in (14). Pole-zero diagrams of the continuous and the discrete system are shown in [27].

Discretization with respect to space

For the spatial frequency domain, there is no need for discretization, since the spatial frequency variable is already discrete. However, a discretization has to be applied to the spatial variable x. This spatial discretization consists of simply evaluating the analytical solution (13) at a limited number of arbitrary spatial positions x_a on the string. They can be chosen to be, for example, the pickup positions or the fret positions.

Inverse transformations

The inverse SLT cannot be performed any longer for an infinite number of µ due to the temporal discretization. To avoid temporal aliasing, the number must be limited to µ_T such that |ω_{µ_T} T| ≤ π, which also ensures realizable computer implementations. Effects of this truncation are described in [12]. The most important conclusion is that the sound quality is not affected, since only modes beyond the audible range are neglected.

By applying the shifting theorem, the inverse z-transformation results in µ_T second-order recursive systems in parallel, each one realizing one vibrational mode of the string. The structure is shown with solid lines in Figure 2.

This linear structure can be implemented directly in the computer since it only includes delay elements z^{-1}, adders, and multipliers. Due to (14), the coefficients of the second-order recursive systems in Figure 2 only depend on the physical parameters of the vibrating string.
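Reading the difference equation off (17) gives y[k] = 2e^{σT}cos(ωT) y[k−1] − e^{2σT} y[k−2] + b1 f[k−1], with b1 = T e^{σT} sin(ωT)/(ρAω). A minimal sketch of one such recursive system (class and member names are illustrative):

    #include <cmath>

    // One vibrational mode as the second-order recursive system of (17).
    class ModeResonator {
    public:
        ModeResonator(double omega, double sigma, double rhoA, double T)
            : y1_(0.0), y2_(0.0), fPrev_(0.0) {
            const double e = std::exp(sigma * T);
            a1_ = 2.0 * e * std::cos(omega * T);   // feedback coefficients of (17)
            a2_ = -e * e;
            b1_ = T * e * std::sin(omega * T) / (rhoA * omega);
        }
        // One output sample; f is the transformed excitation force sample.
        double tick(double f) {
            const double y = a1_ * y1_ + a2_ * y2_ + b1_ * fPrev_;
            fPrev_ = f;   // the factor z in the numerator delays the input by one sample
            y2_ = y1_;
            y1_ = y;
            return y;
        }
    private:
        double a1_, a2_, b1_, y1_, y2_, fPrev_;
    };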

4.2. Extensions for slap synthesis

The discretization procedure for the nonlinear slap synthesis can be performed with the same three steps described in Section 4.1. Here, the discretized MD TFM is extended with the output-dependent slap force F_f^d(µ, z, Y^d(ν, z)) and thus stays implicit. However, after discretization with respect to space as described above, and inverse z-transformation with application of the shifting theorem, the resulting recursive systems are explicit. This is caused by the time shift of the excitation function due to the multiplication with z in the numerator of (17). Therefore, the linear system given with solid lines in Figure 2 is extended with feedback paths, denoted by dashed lines, from the output to additional inputs between the unit delays of all recursive systems. The feedback paths are weighted with the nonlinear (NL) function (7).

Figure 2: Basic structure of the FTM simulations derived from the linear initial-boundary-value problem (4), (5), and (6), with several second-order resonators in parallel. Solid lines represent the basic linear system, while dashed lines represent extensions for the nonlinear slap force.

Figure 3: Recursive system realization of one mode of the transversal vibrating string.

4.3. Guaranteeing stability

The discretized LTI systems derived in Section 4.1 are inherently stable as long as the underlying continuous physical model is stable, due to the use of the impulse-invariant transformation [25]. However, for the nonlinear system derived in Section 4.2, this stability consideration is no longer valid. It might happen that the passive slap force of the continuous system becomes active with the direct discretization approach [24]. To preserve the passivity of the system, and thus the inherent stability, the slap force must be limited such that the discrete impulses correspond to their continuous counterparts.

The instantaneous energy of the string vibration can be calculated by monitoring the internal states of the modal deflections [12]. The slap force limitation can then be obtained directly from the available internal states. For an illustration of these internal states, the recursive system of one mode µ_T is given in Figure 3.

The variables c_{1,e}(µ_T) and c_{1,s}(µ_T), denoting the weightings of the linear excitation force f_e^d(k) at x_e and of the slap force f_f^d(k) at x_f, respectively, result with (9), (10a), and (17) in

c_{1,(e,s)}(µ_T) = \frac{2T}{ρA ω_{µ_T}} \sin(ω_{µ_T} T) \sin\left( \frac{µ_T π}{l} x_{(e,s)} \right).   (18)

The total instantaneous energy of the string vibration without slap force density can be calculated with [12, 28] (time step k and mode number µ_T dependencies are omitted for concise notation)

E_{vibr}(k) = \frac{4}{ρAl} \sum_{µ_T} \left( σ_{µ_T}^2 + ω_{µ_T}^2 \right) \frac{ y_1^{d2} − 2 y_1^d y_2^d e^{σ_{µ_T} T} \cos(ω_{µ_T} T) + y_2^{d2} e^{2σ_{µ_T} T} }{ e^{2σ_{µ_T} T} \sin^2(ω_{µ_T} T) }.   (19)

In (19), the instantaneous energy is calculated without application of the slap force, since the internal states y_1^d(µ_T, k) are used (see Figure 3). For calculating the instantaneous energy E_s(k) after applying the slap force, y_1^d(µ_T, k) must be replaced with y_{1,s}^d(µ_T, k) in (19). To meet the condition of passivity of the elastic slap collision, both energies must be related by E_{vibr}(k) ≥ E_s(k). Here, only the worst-case scenario with regard to the instability problem is discussed, where both energies are the same. Inserting the corresponding expressions of (19) into this energy equality and solving for the slap force f_f^d(k) results in

f_f^d(k) = \sum_{µ_T} c_5(µ_T) \left( 2 e^{σ_{µ_T} T} \cos(ω_{µ_T} T) y_2^d(µ_T, k) − 2 y_1^d(µ_T, k) \right),   (20a)

with

c_5(µ_T) = \frac{ c_{1,s}(µ_T) \left( σ_{µ_T}^2 + ω_{µ_T}^2 \right) \prod_{ν_T ≠ µ_T} e^{2σ_{ν_T} T} \sin^2(ω_{ν_T} T) }{ \sum_{κ_T} \left( c_{1,s}^2(κ_T) \left( σ_{κ_T}^2 + ω_{κ_T}^2 \right) \prod_{ν_T ≠ κ_T} e^{2σ_{ν_T} T} \sin^2(ω_{ν_T} T) \right) }.   (20b)

The force limitation discussed here can be implemented very efficiently. Only one additional multiplication, one summation, and one binary shift are needed for each vibrational mode (see (20a)), since the more complicated constants c_5(µ_T) have to be calculated only once and the weighting of y_2^d(µ_T, k) has to be performed within the recursive system anyway (compare Figure 3).
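Given precomputed constants c_5(µ_T), the worst-case limit of (20a) is a single pass over the mode states. A minimal sketch, with an assumed data layout:

    #include <vector>

    // Per-mode states and constants; c5 follows (20b) and a1 is the
    // feedback coefficient 2*e^{sigma*T}*cos(omega*T) already used in (17).
    struct ModeData {
        double y1, y2;  // internal states y1^d, y2^d of one mode
        double c5;      // constant c5(mu_T), precomputed once
        double a1;      // 2*e^{sigma*T}*cos(omega*T)
    };

    // Worst-case slap-force limit of (20a): one multiply-add per mode.
    double limitedSlapForce(const std::vector<ModeData>& modes) {
        double f = 0.0;
        for (const ModeData& m : modes)
            f += m.c5 * (m.a1 * m.y2 - 2.0 * m.y1);
        return f;
    }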

Discrete realizations of the analytical solutions of the MD initial-boundary-value problems have been derived in this section. For the linear and nonlinear systems, they resulted in stable and accurate simulations of the transversal vibrating string. The drawback of these straightforward discretization approaches of the MD systems in the frequency domains is the high computational complexity of the resulting realizations. Assuming a typical nylon guitar string with 247 Hz pitch frequency, 59 eigenmodes have to be calculated up to the Nyquist frequency at 22.050 kHz. With an average of 3.1 and 4.2 multiplications per output sample (MPOS) per recursive system for the linear and the nonlinear systems, respectively, the total computational cost for the whole string results in 183 MPOS and 248 MPOS. Note that the fractional parts of the average MPOS result from the assumption that there are only a few time instants where an excitation force acts on the string, such that the input weightings of the recursive systems do not have to be calculated at each sample step. Since this is also assumed for the nonlinear slap force, the fractional part in the nonlinear system is higher than in the linear system.

These computational costs are approximately five times higher than those of the most efficient physical modeling method, the DWG [21]. The next section shows that this disadvantage of the FTM can be fixed by using a multirate approach for the simulation of the recursive systems.

5. DISCRETIZATION WITH A MULTIRATE APPROACH

The basic idea of using a multirate approach for the FTM realization is that the single modes have a very limited bandwidth as long as the damping coefficients σ_µ are small. By subdividing the temporal spectrum into different bands that are processed independently of each other, the modes within these bands can be calculated with a sampling rate that is a fraction of the audio rate. Thus, the computational complexity can be reduced with this method. The sidebands generated by this procedure at audio rate are suppressed with a synthesis filter bank when all bands are added up to the output signal. The input signals of the subsampled modes also have to be subsampled. To avoid aliasing, the respective input signals for the modes are obtained by processing the excitation signal f_e^d(k) through an analysis filter bank. This general procedure is shown with solid lines in Figure 4. It shows several modes (RS # i), each one running at its respective downsampled rate.

This filter bank approach is discussed in detail in the next two sections for the linear as well as for the nonlinear model of the FTM.

5.1. Discretization of the linear MD model

For the realization of the structure shown in Figure 4, two major tasks have to be fulfilled [29]:

(1) designing an analysis and a synthesis filter bank that can be realized efficiently,
(2) developing an algorithm that can simulate band changes of single sinusoids to keep the flexibility of the FTM.

Filter bank design

There are numerous design procedures for filter banks that are mainly specialized to perfect or nearly perfect reconstruction requirements [30]. In the structure shown in Figure 4, there is no need for a perfect reconstruction as in sound-processing applications, since the sound production mechanism is performed within the single downsampled frequency bands. Therefore, inaccuracies of the interpolation filters can be corrected by additional weightings of the subsampled recursive systems. Linear-phase filters with finite impulse responses (FIR) are used for the filter bank due to the variability of the single sinusoids over time. Furthermore, a real-valued generation of the sinusoids in the form of second-order recursive systems, as shown in Figure 2, is preferred to complex-valued first-order recursive systems. On the one hand, this approach avoids additional real-valued multiplications of complex numbers. On the other hand, the nonlinear slap implementation can be performed in a similar way for the multirate approach, as explained for the audio-rate realization in Section 4.2. A multirate realization of the FTM with complex-valued first-order systems is described in [31].

To fulfill these prerequisites and the requirement of low-order filters for computational efficiency with necessarily flatfilter edges, a filter bank with different downsampling factorsfor different bands has to be designed. A first step is to de-sign a basic filter bank with PED equidistant filters, all usingthe same downsampling factor rED = PED. Due to the flat fil-ter edges, there will be PED − 1 frequency gaps between thesingle filters that have neither a sufficient passband amplifi-cation nor a sufficient stopband attenuation. These gaps are


Figure 4: Structure of the multirate FTM. Solid lines represent the basic linear system, while dashed and dotted lines represent the extensions for the nonlinear slap force. RS means recursive system. The arrow between RS # 3 and RS # 4 indicates a band change.

These gaps are filled with low-order FIR filters that realize the interpolation with downsampling factors different from r_ED. The combination of all filters forms the filter bank. It is used for the analysis and the synthesis filter bank as shown in Figure 4.

An example of this procedure is shown in Figure 5 with P_ED = 4. The total number of bands is P = 7. The frequency regions where the single filters are used as passbands in the filter bank are separated by vertical dashed lines. The filters are designed by a weighted least-squares method such that they meet the desired passband bandwidths and stopband attenuations. Note that there are several frequency regions for each filter where the frequency response is not specified explicitly. These so-called “don't care bands” occur since only a part of the Nyquist bandwidth in the downsampled domain is used for the simulation of the modes. Thus, there can only be images of these sinusoids in the upsampled version in distinct regions. All other parts of the spectrum are “don't care bands”; for the lowpass filter they are shown as gray areas in Figure 5. Magnitude ripples of ±3 dB are allowed in the passband, which can be compensated by a correction of the weighting factors of the single sinusoids. The stopbands are attenuated by at least −60 dB, which is sufficient for most listening conditions. Only in studio-like listening conditions must larger stopband attenuations be used, such that artifacts produced by using the filter bank cannot be heard.
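The “don't care bands” can be expressed directly in a weighted least-squares FIR design by giving them (near-)zero weight. A minimal sketch of this idea using SciPy's firls follows; the filter length, band edges, and weights below are illustrative assumptions, not the paper's actual specifications:

```python
import numpy as np
from scipy import signal

# Weighted least-squares design of one interpolation lowpass with
# "don't care" regions (weight ~ 0). Frequencies are normalized so
# that 1.0 is the Nyquist frequency; all values here are illustrative.
numtaps = 71                       # odd length -> linear-phase type I FIR
bands   = [0.0, 0.10,              # passband (one downsampled image region)
           0.15, 0.35,             # stopband around the first image
           0.40, 0.60,             # "don't care" band
           0.65, 1.0]              # further stopband
desired = [1, 1, 0, 0, 0, 0, 0, 0]
weight  = [1.0, 10.0, 1e-6, 10.0]  # one weight per band; ~0 = don't care
h = signal.firls(numtaps, bands, desired, weight=weight)
```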

Due to the different specifications of the filters concerning bandwidths and edge steepnesses, they have different orders and thus different group delays. To compensate for the different group delays, delay lines of length (M_max − M_p)/2 are used in conjunction with the filters.


Figure 5: Top: frequency responses of the equidistant filters (with downsampling factor four in this example). Center: frequency responses of the filters with other downsampling factors. Bottom: frequency response of the filter bank. The downsampling factors r are given within the corresponding passbands. The FIR filter orders are between M_min = 34 and M_max = 72 in this example. They realize a stopband attenuation of at least −60 dB and allow passband ripples of ±3 dB.

The number of coefficients of the interpolation filters is denoted by M_p, where M_max is the maximum order of all filters. The delay lines consume some memory space but no additional computational cost [32]. Realizing the filter bank in a polyphase structure, each filter bank results in a computational cost of

C_{\text{filterbank}} = \sum_{p=1}^{P} \frac{M_p}{r_p}\,\text{MPOS}, \qquad (21)

with the downsampling factors r_p of each band. For the example given above, each filter bank needs 73 MPOS. In (21) it is assumed that each band contains at least one mode to be reproduced, so that it is a worst-case scenario. As long as the excitation signal is known in advance, the excitations for each band can be precalculated, such that only the synthesis filter bank must be implemented in real time. The case that the excitation signals are known and stored as wavetables in advance is quite frequent in physical modeling algorithms, although the pure physicality of the model is lost by this approach. For example, for string simulations, typical plucking or striking situations can be described by appropriate excitation signals which are determined in advance.
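Equation (21) is straightforward to evaluate once the filter orders and downsampling factors are fixed. The values below are hypothetical stand-ins chosen only to show the bookkeeping; the paper states the example's total (73 MPOS) but not the individual orders:

```python
def filterbank_cost(orders, factors):
    """Computational cost of one polyphase filter bank per (21), in MPOS."""
    return sum(m / r for m, r in zip(orders, factors))

# Hypothetical per-band filter orders M_p and downsampling factors r_p
# for a P = 7 bank like the one in Figure 4 (factors 4 and 6 appear there).
orders  = [40, 34, 72, 34, 72, 34, 40]
factors = [4, 4, 6, 4, 6, 4, 4]
print(filterbank_cost(orders, factors))  # -> total cost in MPOS
```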

The practical realization of the multirate approach starts with the calculation of the modal frequencies ω_µT and their corresponding damping coefficients σ_µT. The frequency determines in which band the mode is synthesized. The coefficients of the recursive systems, as shown in Figure 2 for the audio-rate realization, have to be modified in the downsampled domain, since the sampling interval T is replaced by

T^{(r)} = r\,T^{(1)} = rT. \qquad (22)

Superscript (r) denotes the downsampled simulation with factor r. The downsampling factors r_p of the different bands are given in the top and center plots of Figure 5. No further adjustments have to be performed for the coefficients of the recursive systems in the multirate approach, since modes can be realized in the downsampled baseband or in each of the corresponding images.
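In the resonator sketch introduced above, this amounts to nothing more than substituting the enlarged sampling interval (22) into the coefficient computation (again an assumption about the exact implementation):

```python
import numpy as np

def mode_coefficients_downsampled(sigma, omega, r, T):
    """Resonator coefficients in a band with downsampling factor r:
    per (22), the audio-rate interval T is simply replaced by r*T."""
    a1 = 2.0 * np.exp(sigma * r * T) * np.cos(omega * r * T)
    a2 = -np.exp(2.0 * sigma * r * T)
    return a1, a2
```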

Band changes of single modes

One advantage of the FTM is that the physical parameters of a vibrating object can be varied while playing. This is valid not only for successively played notes but also within one note, as occurs, for example, in vibrato playing. If one or several modes are at the edges of the filter bank bands, these variations can cause the modes to change bands while they are active. This is shown with an arrow in Figure 4. In such a case, the reproduction cannot be performed by just adjusting the coefficients of the recursive systems with (22) to the new downsampling rate and using the other interpolation filter. This procedure would result in strong transients and in a modification of the modal amplitudes and phases. Therefore, a three-step procedure has to be applied to the band-changing modes:

(1) adjusting the internal states of the recursive systems such that no phase shift and no amplitude difference occur in the upsampled output signal from this mode,

(2) canceling the filter output of the band-changing mode,

(3) training the new interpolation filter to avoid transient behavior.

Similar to the calculation of the instantaneous energy for slap synthesis, the instantaneous amplitude and phase can also be calculated from the internal states y_1 and y_2 of a second-order recursive system. They can be calculated for the old band with downsampling factor r_1, as well as for the new band with factor r_2. Demanding the equality of both amplitudes and phases, the internal states of the new band are calculated from the internal states of the old band as

y_1^{(r_2)} = y_1^{(r_1)}\,\frac{\sin(\omega_\mu r_2 T)}{\sin(\omega_\mu r_1 T)} + y_2^{(r_1)}\,e^{\sigma_\mu r_1 T}\left(\cos(\omega_\mu r_2 T) - \frac{\sin(\omega_\mu r_2 T)}{\tan(\omega_\mu r_1 T)}\right),

y_2^{(r_2)} = y_2^{(r_1)}\,e^{\sigma_\mu (r_1 - r_2) T}. \qquad (23)
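A direct transcription of (23) reads as follows (a sketch; the state convention y_1, y_2 follows Figure 2, and the exponent signs follow the formula as printed):

```python
import numpy as np

def band_change_states(y1_r1, y2_r1, sigma, omega, r1, r2, T):
    """Map the internal states of one mode from a band with downsampling
    factor r1 to a band with factor r2 according to (23), so that the
    upsampled output continues without amplitude or phase jumps."""
    s1, s2 = np.sin(omega * r1 * T), np.sin(omega * r2 * T)
    y1_r2 = (y1_r1 * s2 / s1
             + y2_r1 * np.exp(sigma * r1 * T)
               * (np.cos(omega * r2 * T) - s2 / np.tan(omega * r1 * T)))
    y2_r2 = y2_r1 * np.exp(sigma * (r1 - r2) * T)
    return y1_r2, y2_r2
```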

The second item of the three-step procedure means that the output of the synthesis interpolation filter must not contain those modes that are leaving that band at time instance k_ch T for time steps kT ≥ k_ch T. Since the filter bank is a causal system of length M_p T, the information of the band change must either be given in advance at (k_ch − M_p)T, or a turbo filtering procedure has to be applied. In turbo filtering, the calculations of several sample steps are performed within one sampling interval, at the cost of a higher peak computational complexity. In this case, the turbo filtering must calculate the previous outputs of the modes leaving the band and subtract their contribution from the interpolated output for time instances kT ≥ k_ch T. Due to the higher peak computational complexity of the turbo filtering and the low orders of the interpolation filters, the additional delay of M_p T is preferred here.

In the same way as the band-changing mode must not have an effect on the band it is leaving from k_ch T on, it must also be included in the interpolation filter of the new band from this time instance on. In other words, the new interpolation filter must be trained to correctly produce the desired mode without transients, as addressed in the third item of the three-step procedure above. This can also be performed with the turbo processing procedure at a higher computational cost, or with the delay of M_p T between the information of the band change and its effect in the output signal.

Now, the linear solution (13) of the transversally vibrating string derived with the FTM is also realized with a multirate approach. Since the single modes are produced at a lower rate than the audio rate, this procedure saves computational cost in comparison to the direct discretization procedure derived in Section 4.1. The amount of computational savings with this procedure is discussed in more detail in Section 6.

5.2. Extensions for slap synthesis

In the discretization approach described in Section 4.2, the output y^d(x_a, k) is fed back to the recursive systems via the path of the external force f_e^d(k) (compare Figure 2). Using the same path in the multirate system shown in Figure 4 would result in a long delay within the feedback path due to the delays in the interpolation filters of the analysis and the synthesis filter bank. Furthermore, the analysis filter bank should not be realized in real time as long as the excitation signal is known in advance.

Fortunately, the recursive systems directly calculate the instantaneous deflection of the single modes, but in the downsampled domain. Considering a system where modes are only simulated in the baseband, the signal can be fed back between the down- and upsampling boxes in Figure 4, and thus directly in the downsampled domain. In comparison to the full-rate system, the observation of the penetration of the string into the fret might be delayed by up to (r_p − 1)T seconds. This delay results in a different slap force, but by applying the stabilization procedure described in Section 4.3, stability is guaranteed.

However, in realistic simulations there are also modes in higher frequency bands than just the baseband. This modifies the simulations described above in two ways:

(i) the deflection of the string, and thus the penetration into the fret, depends on the modes of all bands,

(ii) there is an interaction due to the nonlinear slap force between all modes in all bands.

The calculation of the instantaneous string deflection at the downsampled rates is rarely possible, since there are various downsampling rates, as shown in Figure 4. Thus, there are only a few time instances k_all T where the modal deflections are updated in all bands at the same time. Since in almost all bands one sample value of the recursive systems represents more than half the period of the mode, it is not reasonable to use the previously calculated sample value for the calculation of the deflection at time instances kT = k_all T. However, all the equidistant bands of the filter bank, as shown on top of Figure 5, have the same downsampling factor and can thus represent the same time instances for the calculation of the deflection. Furthermore, most of the energy of guitar string vibrations is in the lower modes [28], such that the deflection is mostly defined by the modes simulated in the lowest bands. Therefore, the string deflection is determined here at each r_1th audio sample from all equidistant bands, and at each ((k mod r_1 = 0) ∧ (k mod r_2 = 0))th audio sample from all equidistant bands and the bands with the downsampling rate of the lowest band-pass. This is shown in the right dashed and dotted paths in Figure 4. In the example of Figure 5, at each fourth audio sample the deflection is calculated from the four equidistant bands, and at each twelfth audio sample it is also calculated from the second and sixth bands.
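The update schedule can be expressed as a simple modulo test per audio index k (a sketch; r1 = 4 and r2 = 6 are the downsampling factors of the Figure 5 example):

```python
r1, r2 = 4, 6  # factors of the equidistant bands and of the extra bands

def participating_bands(k):
    """Which bands take part in the deflection update at audio index k."""
    bands = []
    if k % r1 == 0:
        bands.append("equidistant bands (r = 4)")
    if k % r1 == 0 and k % r2 == 0:   # every lcm(4, 6) = 12th sample
        bands.append("bands with r = 6")
    return bands

# k = 0, 4, 8, ... -> equidistant bands; k = 0, 12, 24, ... -> all bands
```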

In the same way as the string deflection is calculated with varying participation of the different bands, the slap force is also only applied to the modes in these bands, as shown in the left dashed and dotted paths in Figure 4. This procedure has two effects: firstly, there is no interaction between all modes at all (downsampled) time instances from the slap force. Secondly, the slap force itself, being an impulse-like signal with a bright spectrum, is filtered by the filter bank. The first effect is not that important, since the procedure still ensures interactions between most modes; it only restricts them to a few time instances, in the example above every fourth or twelfth audio sample. These low delays of the interaction are not noticeable. The second effect can be handled by adding impulses directly to the interpolation filters of the synthesis filter bank. The weights of the impulses in each band are determined by the difference between the sum of all slap force impulses in all bands and the applied slap force impulses in that band. In that way, a slap force applied only to baseband modes produces a nearly white slap signal at audio rate.

The stabilization procedure described in Section 4.3 can also be applied to the multirate realization of the nonlinear slap force. The only differences to the audio-rate simulations are that T is replaced by r_p T as given in (22), and that the summation for the calculation of the stable slap force f_f^d(k) as given in (20a) is only performed over the modes realized in the participating bands. Thus, there are time instances where the slap force is only applied to the modes in the equidistant bands, and time instances where it is also applied to bands with another downsampling factor. This is shown with the dotted lines in Figure 4. Due to the different cases of participating bands, two versions of the constants c_5(µT) also have to be calculated, since the products and sums in (20b) depend only on the participating modes.

Now, a stable and realistic simulation of the nonlinear slap force is also obtained in the multirate realization. In the nonlinear case, the simulation accuracy obviously decreases with higher downsampling factors and thus with an increasing number of bands. This effect is discussed in more detail in the next section.

6. SIMULATION ACCURACY AND COMPUTATIONAL COMPLEXITY

In the previous sections, stable linear and nonlinear discrete FTM models have been derived. In the next sections, the simulation accuracies of these models and their corresponding computational complexities are discussed.

6.1. Simulation accuracies

For the linearly vibrating string, the discrete realization of the single modes at full rate is an exactly sampled version of the continuous modes. This is true as long as the input force can be modeled with discrete impulses, since the impulse-invariant transformation is used, as explained in Section 4.1. However, the exactness of the complete system is lost with the truncation of the summation of partials in (12) to avoid aliasing effects. Therefore, the results are only accurate as long as the excitation signal has only low energy in the truncated high-frequency range. This is true for the guitar and most other musical instruments [28]; furthermore, the neglected higher partials cannot be perceived by the human auditory system as long as the sampling interval T is chosen small enough. Since the audible modes are simulated exactly and the simulation error is out of the audible range, the FTM is used here as an optimized discretization approach for sound synthesis applications.


In multirate simulations of linear systems as described in Section 5.1, the single modes are produced exactly within the downsampled domain. But due to the imperfection of the analysis filter bank, modes are not only excited by the correct frequency components of the excitation force, but also by aliasing terms that occur with downsampling. In the same way, the images produced by upsampling the outputs of the recursive systems are not suppressed perfectly by the synthesis filter bank. However, the filter banks have been designed such that the stopband suppressions are at least −60 dB. This is sufficient for most listening conditions, as defined in Section 5.1. Furthermore, the filters are designed in a least-mean-squares sense such that the energy of the side lobes in the stopbands is minimized. Further filter bank optimizations with respect to the human auditory system are difficult, since the filter banks are designed only once for all kinds of mode configurations concerning their positions and amplitude relations in the simulated spectrum.

In the audio-rate string model excited nonlinearly with the slap force as described in Section 4.2, the truncation of the infinite sum in (16) also affects the accuracy of the lower modes through the nonlinearity. The simulations are accurate only as long as the external excitation and the nonlinearity have low contributions to the higher modes. Although the external excitation rarely contributes to the higher modes, there is an interaction between all modes due to the slap force. This interaction grows with the modal frequencies. It can be seen directly in the coefficients c_5(µT) in (20b), since they have larger absolute values for higher frequencies. However, the force contributions of the omitted modes are distributed to the simulated modes, since the denominator of (20b) decreases for fewer simulated partials. Furthermore, the sign of c_5(µT) changes with µT due to (18), as does the expression in parentheses of (20a) with time. Thus, there is a bidirectional interaction between low and high modes and not only an energy shift from low to high frequencies. Neglecting modes out of the audible range results in fewer energy fluctuations of the audible modes. But since the neglected energy fluctuations have high frequencies, they are also out of the audible range.

In the multirate implementation of the nonlinear model as described in Section 5.2, the interactions between almost all modes are retained. It is more critical here that the observation of the fret-string penetration might be delayed by several audio samples. This not only circumvents the strict limitation of the string deflection by the fret, but it also changes the modal interactions, because the nonlinear system is not time-invariant. However, the audible slap effect stays similar to the full-rate simulations and sounds realistic. Audio examples can be found at http://www.LNT.de/~traut/JASP04/sounds.html.

It has been shown that the FTM realizes the continuous solutions of the physical models of the vibrating string accurately. With the multirate approach, the FTM loses the exactness of the linear audio-rate model, but the inaccuracies cannot be heard. For the nonlinear model, the multirate approach leads to audible differences compared to the audio-rate simulations, but the characteristics of the slap sounds are preserved. Thus, simplifications and computational savings due to the filter bank approach are performed here with respect to the human auditory system.

6.2. Computational complexities

The computational complexities of the FTM are explained with two typical examples: a single bass guitar string simulated in different qualities, and a six-string acoustic guitar. The first example simulates the vibration of one bass guitar string with a fundamental frequency of 41 Hz. The corresponding physical parameters can be found, for example, in [12]. This string is simulated in different sound qualities by varying the number of simulated modes from 1 to 117, which corresponds to the simulation of all modes up to the Nyquist frequency with a sampling frequency of f_s = 44.1 kHz.

Figure 6 shows the dependency of the computational complexities on the number of simulated modes and thus on the simulation accuracy or sound quality. The procedure used here to enhance the sound quality consists of simulating more and more modes in consecutive order from the lowest mode on. Thus, the enhancement of the sound quality sounds like opening the lowpass in subtractive synthesis. The upper plot shows the computational complexities for the linear system, simulated at audio rate and with the multirate approach using filter banks with P = 7 and P = 15. The bottom plot shows the corresponding graphs for the nonlinear systems. It is assumed that the external forces act on the string at only one tenth of the output samples, such that the weighting of the inputs does not have to be performed at each time instance. Thus, each linear recursive system needs 3.1 MPOS for the calculation of one output sample, whereas the nonlinear system needs 4.2 MPOS.

It can be seen that the multirate implementations are much more efficient than the audio-rate simulations, except for simulations with very few modes. With all 117 simulated modes, the relation between audio-rate and multirate simulations (P = 7) is 363 MPOS to 157 MPOS for the linear system and 492 MPOS to 187 MPOS for the nonlinear system. This is a reduction of the computational complexity of more than 60%.

The steps in the multirate graphs reflect the offset of the filter bank realization and the fact that the interpolations of the filter bank bands are only calculated as long as at least one mode is simulated in those bands. On the one hand, the regions between the steps are steeper for the filter bank with P = 7 than for that with P = 15, due to the higher downsampling factors in filter banks with more bands. On the other hand, the steps are higher for filter banks with more bands, due to the higher interpolation filter orders. In this example, the multirate approach with P = 7 is superior to the filter bank with P = 15 for high qualities, since only a few modes are simulated in the higher bands of P = 15, but the filter bank offset is higher. For other configurations with a higher number of simulated modes, the situation is different, as shown in the next example.


Figure 6: Computational complexities of the FTM simulations dependent on the number of simulated modes at audio rate (dotted line) and with multirate approaches with P = 7 (dashed line) and P = 15 (solid line). (a): Linearly vibrating string; (b): vibrating string with nonlinear slap forces.

The second example shows the computational complexities of the simultaneous simulation of six independent strings as they occur in an acoustic guitar. Obviously, only one interpolation filter bank is needed for all strings. The average number of simulated modes for each guitar string is assumed to be 60. In contrast to the first example, it is assumed that the modes are equally distributed in the frequency domain, such that at least one mode is simulated in each band.

Figure 7 shows that the computational complexities depend on the choice of the used filter bank. On the one hand, each filter bank needs a fixed amount of computational cost, which grows with the number of used bands. On the other hand, filter banks with more bands provide higher downsampling factors for the production of the sinusoids, which saves computational cost. Thus, the choice of the optimal filter bank depends on the number of simultaneously simulated modes. For practical implementations, this has to be estimated in advance.

It can be seen that for the linear case (solid line) the minimum computational cost is 272 MPOS, using the filter bank with P = 11. In the nonlinear case, the filter bank with P = 15 has the minimum computational cost, with 319 MPOS for the simulation of all six strings. Compared to the audio-rate simulations with 1116 MPOS and 1512 MPOS for the linear and nonlinear case, respectively, the multirate simulations allow computational savings of up to 79%. Thus, the multirate simulations have a computational complexity of approximately 45 MPOS (53 MPOS) for each linearly (nonlinearly) simulated string.
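The trade-off just described can be automated by evaluating the total cost for each candidate bank. A sketch of such a selection follows; the cost model combines (21) with a per-mode resonator term, and all concrete numbers would be design-specific:

```python
def total_cost(bank_orders, bank_factors, modes_per_band, mpos_per_mode=3.1):
    """Total cost: fixed filter bank cost per (21) plus the per-mode
    resonator cost, reduced by each band's downsampling factor."""
    fixed = sum(m / r for m, r in zip(bank_orders, bank_factors))
    modes = sum(n * mpos_per_mode / r
                for n, r in zip(modes_per_band, bank_factors))
    return fixed + modes

# Evaluate total_cost(...) for each candidate bank (P = 7, 11, 15, 19)
# and pick the P with the smallest result, as in Figure 7.
```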


Figure 7: Computational complexities of the FTM simulations of a six-string guitar dependent on the number of bands for the multirate approach. Solid line: linearly vibrating string. Dashed line: vibrating string with nonlinear slap forces.

Compared to high-quality DWG simulations, the computational complexities of the multirate FTM approach are nearly the same. Linear DWG simulations need up to 40 MPOS for the realization of the reflection filters [21], and the nonlinear limitation of the string by the fret additionally needs 3 MPOS per fret position [22].


7. CONCLUSIONS

The complete procedure of the FTM has been described, from the basic physical analysis of a vibrating structure resulting in an initial-boundary value problem, via its analytical solution, to efficient digital multirate implementations. The transversally vibrating dispersive and lossy string with a nonlinear slap force served as an example. The novel contribution is a thorough investigation of the implementation and the properties of a multirate realization.

It has been shown that the differences between audio-rate and multirate simulations for linearly vibrating string simulations are not audible. The differences of the nonlinear simulations were audible, but the multirate approach preserves the sound characteristics of the slap sound. The application of the multirate approach saves almost 80% of the computational cost at audio rate. Thus, it is nearly as efficient as the most popular physical modeling method, the DWG.

The multirate FTM is by no means limited to the example of vibrating strings. It can be applied in a similar way to spatially multidimensional systems, like membranes or plates, or even to other physical problems like heat flow or diffusion.

ACKNOWLEDGMENTS

The authors would like to thank Vesa Valimaki for numerous discussions and his help in the filter bank design for the multirate FTM. Furthermore, the financial support of the Deutsche Forschungsgemeinschaft (DFG) for this research is gratefully acknowledged.

REFERENCES

[1] C. Roads, S. Pope, A. Piccialli, and G. De Poli, Eds., Musical Signal Processing, Swets & Zeitlinger, Lisse, The Netherlands, 1997.

[2] L. Hiller and P. Ruiz, “Synthesizing musical sounds by solving the wave equation for vibrating objects: Part I,” Journal of the Audio Engineering Society, vol. 19, no. 6, pp. 462–470, 1971.

[3] A. Chaigne and V. Doutaut, “Numerical simulations of xylophones. I. Time-domain modeling of the vibrating bars,” Journal of the Acoustical Society of America, vol. 101, no. 1, pp. 539–557, 1997.

[4] A. Chaigne, “On the use of finite differences for musical synthesis. Application to plucked stringed instruments,” Journal d'Acoustique, vol. 5, no. 2, pp. 181–211, 1992.

[5] A. Chaigne and A. Askenfelt, “Numerical simulations of piano strings. I. A physical model for a struck string using finite difference methods,” Journal of the Acoustical Society of America, vol. 95, no. 2, pp. 1112–1118, 1994.

[6] M. Karjalainen, “1-D digital waveguide modeling for improved sound synthesis,” in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, vol. 2, pp. 1869–1872, IEEE Signal Processing Society, Orlando, Fla, USA, May 2002.

[7] C. Erkut and M. Karjalainen, “Finite difference method vs. digital waveguide method in string instrument modeling and synthesis,” in Proc. International Symposium on Musical Acoustics, Mexico City, Mexico, December 2002.

[8] C. Cadoz, A. Luciani, and J. Florens, “Responsive input devices and sound synthesis by simulation of instrumental mechanisms: the CORDIS system,” Computer Music Journal, vol. 8, no. 3, pp. 60–73, 1984.

[9] J. M. Adrien, “Dynamic modeling of vibrating structures for sound synthesis, modal synthesis,” in Proc. AES 7th International Conference, pp. 291–299, Audio Engineering Society, Toronto, Canada, May 1989.

[10] G. De Poli, A. Piccialli, and C. Roads, Eds., Representations of Musical Signals, MIT Press, Cambridge, Mass, USA, 1991.

[11] G. Eckel, F. Iovino, and R. Causse, “Sound synthesis by physical modelling with Modalys,” in Proc. International Symposium on Musical Acoustics, pp. 479–482, Le Normant, Dourdan, France, July 1995.

[12] L. Trautmann and R. Rabenstein, Digital Sound Synthesis by Physical Modeling Using the Functional Transformation Method, Kluwer Academic Publishers, New York, NY, USA, 2003.

[13] D. A. Jaffe and J. O. Smith, “Extensions of the Karplus-Strong plucked-string algorithm,” Computer Music Journal, vol. 7, no. 2, pp. 56–69, 1983.

[14] K. Karplus and A. Strong, “Digital synthesis of plucked-string and drum timbres,” Computer Music Journal, vol. 7, no. 2, pp. 43–55, 1983.

[15] J. O. Smith, “Physical modeling using digital waveguides,” Computer Music Journal, vol. 16, no. 4, pp. 74–91, 1992.

[16] J. O. Smith, “Efficient synthesis of stringed musical instruments,” in Proc. International Computer Music Conference, pp. 64–71, Tokyo, Japan, September 1993.

[17] M. Karjalainen, V. Valimaki, and Z. Janosy, “Towards high-quality sound synthesis of the guitar and string instruments,” in Proc. International Computer Music Conference, pp. 56–63, Tokyo, Japan, September 1993.

[18] M. Karjalainen, V. Valimaki, and T. Tolonen, “Plucked-string models, from the Karplus-Strong algorithm to digital waveguides and beyond,” Computer Music Journal, vol. 22, no. 3, pp. 17–32, 1998.

[19] R. Rabenstein, “Discrete simulation of dynamical boundary value problems,” in Proc. EUROSIM Simulation Congress, pp. 177–182, Vienna, Austria, September 1995.

[20] L. Trautmann and R. Rabenstein, “Digital sound synthesis based on transfer function models,” in Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pp. 83–86, IEEE Signal Processing Society, New Paltz, NY, USA, October 1999.

[21] L. Trautmann, B. Bank, V. Valimaki, and R. Rabenstein, “Combining digital waveguide and functional transformation methods for physical modeling of musical instruments,” in Proc. Audio Engineering Society 22nd International Conference on Virtual, Synthetic and Entertainment Audio, pp. 307–316, Espoo, Finland, June 2002.

[22] E. Rank and G. Kubin, “A waveguide model for slap bass synthesis,” in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, pp. 443–446, IEEE Signal Processing Society, Munich, Germany, April 1997.

[23] M. Kahrs and K. Brandenburg, Eds., Applications of Digital Signal Processing to Audio and Acoustics, Kluwer Academic Publishers, Boston, Mass, USA, 1998.

[24] L. Trautmann and R. Rabenstein, “Stable systems for nonlinear discrete sound synthesis with the functional transformation method,” in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing, vol. 2, pp. 1861–1864, IEEE Signal Processing Society, Orlando, Fla, USA, May 2002.

[25] B. Girod, R. Rabenstein, and A. Stenger, Signals and Systems, John Wiley & Sons, Chichester, West Sussex, UK, 2001.

[26] R. V. Churchill, Operational Mathematics, McGraw-Hill, New York, NY, USA, 3rd edition, 1972.

[27] R. Rabenstein and L. Trautmann, “Digital sound synthesis of string instruments with the functional transformation method,” Signal Processing, vol. 83, no. 8, pp. 1673–1688, 2003.

[28] N. H. Fletcher and T. D. Rossing, The Physics of Musical Instruments, Springer-Verlag, New York, NY, USA, 1998.

[29] L. Trautmann and V. Valimaki, “A multirate approach to physical modeling synthesis using the functional transformation method,” in Proc. IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, pp. 221–224, IEEE Signal Processing Society, New Paltz, NY, USA, October 2003.

[30] P. P. Vaidyanathan, Multirate Systems and Filter Banks, Prentice Hall, Englewood Cliffs, NJ, USA, 1993.

[31] S. Petrausch and R. Rabenstein, “Sound synthesis by physical modeling using the functional transformation method: Efficient implementation with polyphase filterbanks,” in Proc. International Conference on Digital Audio Effects, London, UK, September 2003.

[32] B. Bank, “Accurate and efficient method for modeling beating and two-stage decay in string instrument synthesis,” in Proc. MOSART Workshop on Current Research Directions in Computer Music, pp. 134–137, Barcelona, Spain, November 2001.

L. Trautmann received his “Diplom-Ingenieur” and “Doktor-Ingenieur” degrees in electrical engineering from the University of Erlangen-Nuremberg, in 1998 and 2002, respectively. In 2003 he was working as a Postdoc in the Laboratory of Acoustics and Audio Signal Processing at the Helsinki University of Technology, Finland. His research interests are in the simulation of multidimensional systems with a focus on digital sound synthesis using physical models. Since 1999, he has published more than 25 scientific papers, book chapters, and books. He is a holder of several patents on digital sound synthesis.

R. Rabenstein received his “Diplom-Ingenieur” and “Doktor-Ingenieur” degrees in electrical engineering from the University of Erlangen-Nuremberg, in 1981 and 1991, respectively, as well as the “Habilitation” in signal processing in 1996. He worked with the Telecommunications Laboratory of this university from 1981 to 1987 and since 1991. From 1988 to 1991, he was with the Physics Department of the University of Siegen, Germany. His research interests are in the fields of multidimensional systems theory and simulation, multimedia signal processing, and computer music. He serves in the IEEE TC on Signal Processing Education. He is a Board Member of the School of Engineering of the Virtual University of Bavaria and has participated in several national and international research cooperations.

EURASIP Journal on Applied Signal Processing 2004:7, 964–977
© 2004 Hindawi Publishing Corporation

Physically Inspired Models for the Synthesis of Stiff Strings with Dispersive Waveguides

I. Testa
Dipartimento di Scienze Fisiche, Università di Napoli “Federico II,” Complesso Universitario di Monte S. Angelo, 80126 Napoli, Italy
Email: [email protected]

G. Evangelista
Dipartimento di Scienze Fisiche, Università di Napoli “Federico II,” Complesso Universitario di Monte S. Angelo, 80126 Napoli, Italy
Email: [email protected]

S. Cavaliere
Dipartimento di Scienze Fisiche, Università di Napoli “Federico II,” Complesso Universitario di Monte S. Angelo, 80126 Napoli, Italy
Email: [email protected]

Received 30 June 2003; Revised 17 November 2003

We review the derivation and design of digital waveguides from physical models of stiff systems, useful for the synthesis of sounds from strings, rods, and similar objects. A transform method approach is proposed to solve the classic fourth-order equations of stiff systems in order to reduce them to two second-order equations. By introducing scattering boundary matrices, the eigenfrequencies are determined and their n² dependency is discussed for the clamped, hinged, and intermediate cases. On the basis of the frequency-domain physical model, the numerical discretization is carried out, showing how the insertion of an all-pass delay line generalizes the Karplus-Strong algorithm for the synthesis of ideally flexible vibrating strings. Knowing the physical parameters, the synthesis can proceed using the generalized structure. Another point of view is offered by Laguerre expansions and frequency warping, which are introduced in order to show that a stiff system can be treated as a nonstiff one, provided that the solutions are warped. A method to compute the all-pass chain coefficients and the optimum warping curves from sound samples is discussed. Once the optimum warping characteristic is found, the length of the dispersive delay line to be employed in the simulation is simply determined from the requirement of matching the desired fundamental frequency. The regularization of the dispersion curves by means of optimum unwarping is experimentally evaluated.

Keywords and phrases: physical models, dispersive waveguides, frequency warping.

1. INTRODUCTION

Interest in digital audio synthesis techniques has been reinforced by the possibility of transmitting signals to a wider audience within the structured audio paradigm, in which algorithms and restricted sets of data are exchanged [1]. Among these techniques, the physically inspired models play a privileged role, since the data are directly related to physical quantities and can be easily and intuitively manipulated in order to obtain realistic sounds in a flexible framework. Applications are, amongst others, the simulation of a “physical situation” producing a class of sounds as, for example, a closing door, a car crash, the hiss made by a crawling creature, human-computer interaction and, of course, the simulation of musical instruments.

In the general physical models technique, continuous-time solutions of the equations describing the physical system are sought. However, due to the complexity of the real physical systems (from the classic design of musical instruments to the molecular structure of extended objects), solutions of these equations cannot generally be found in an analytic way, and one should resort to numerical methods or approximations. In many cases, the resulting approximation scheme only closely resembles the exact model. For this reason, one could better define these methods as physically inspired models, as first proposed in [2], where the mathematical equations or solutions of the physical problem serve as a solid base to inspire the actual synthesis scheme. One of the advantages of using physically inspired models for sound synthesis is that they allow us to perform a “selection” of the physical parameters actually influencing the sound, so that a trade-off between completeness and particular goals can be achieved.


In the following, we will focus on stiff vibrating systems, including rods and stiff strings as encountered in pianos. However, extensions to two- or three-dimensional systems can be carried out with little effort.

Vibrating physical systems have been extensively studied over the last thirty years for their key role in many musical instruments. The wave equation can be directly approximated by means of finite difference equations [3, 4, 5, 6, 7], or by discretization of the wave functions as proposed by Jaffe and Smith [8, 9], who reinterpreted and generalized the Karplus-Strong algorithm [10] in a wave propagation setting. The outcome of the approximation of the time-domain solution of the wave equation is the design of a digital waveguide simulating the string itself: the sound signal simulation is achieved by means of an appropriate excitation signal, such as white noise. However, in order to achieve a more realistic and flexible synthesis, the interaction of the excitation system with the vibrating element is, in turn, physically modeled. Digital waveguide methods for the simulation of physical models have been widely used [11, 12, 13, 14, 15, 16]. One of the reasons for their success is that they are appropriate for real-time synthesis [17, 18, 19, 20]. This result allowed us to change our approach to modeling musical instruments based on vibrating strings: waveguides can be designed for modeling the “core” of the instruments, that is, the vibrating string, but they are also suitable for the integration of interaction models, for example, for the excitation due to a hammer [21] or to a bow [9], the radiation of sound due to the body of the instrument [22, 23, 24, 25], and of different side-effects in plucked strings [26]. It must be pointed out that, the interactions being highly nonlinear, their modeling and the determination of the range of stability is not an easy task.

In this paper, we will review the design of a digital waveguide simulating a vibrating stiff system, focusing on stiff strings and treating bars as a limit case where the tension is negligible. The purpose is to derive a general framework inspiring the determination of a discrete numerical model. A frequency-domain approach has been privileged, which allows us to separate the fourth-order differential equation of stiff systems into two second-order equations, as shown in Section 2. This approach is also useful for the simulation of two-dimensional (2D) systems such as thin plates. By enforcing proper boundary conditions, we obtain the eigenfrequencies and the eigenfunctions of the system as found, for the case of strings, in the classic works by Fletcher [27, 28]. Once the exact solutions are completely characterized, their numerical approximation is discussed [29, 30] together with its justification based on physical reasoning. The discretization of the continuous-time domain solutions is carried out in Section 3, which naturally leads to dispersive waveguides based on a long chain of all-pass filters. From a different point of view, the derived structure can be described in terms of Laguerre expansions and frequency warping [31]. In this framework, a stiff system can be shown to be equivalent to a nonstiff (Karplus-Strong like) system, whose solutions are frequency warped, provided that the initial and the possibly moving boundary conditions are properly unwarped [32, 33]. As a side effect, this property can be exploited in order to perform an analysis of piano sounds by means of pitch-synchronous frequency warped wavelets in which the excitation can be separated from the resonant sound components [34].

The models presented in this paper provide at least two entry points for the synthesis. If the physical parameters and boundary conditions are completely known, or if it is desired to specify them to model arbitrary strings or rods, then the eigenfunctions, hence the dispersion curve, can be determined. The problem is then reconducted to that of finding the best approximation of the continuous-time dispersion curve with the phase response of a suitable all-pass chain, using the methods illustrated in Section 3. Another entry point is offered if sound samples of an instrument are available. In this case, the parameters of the synthesis model can be determined by finding the warping curve that best fits the data given by the frequencies of the partials, together with the length of the dispersive delay line. This is achieved by means of a regularization method of the experimental dispersion data, as reported in Section 4.

The physical entry point is to be preferred in those situations where sound samples are not available, for example, when we are modeling a nonexisting instrument by extension of the physical model, such as a piano with an unusual speaking length. The other entry level is best for approximating real instrument sounds. However, in this case, the synthesis is limited to existing sources, although some variations can be obtained in terms of the warping parameters, which are related to, but do not directly represent, physical factors.

2. PHYSICAL STIFF SYSTEMS

In this section, we present a brief overview of the stiff string and rod equations of motion and of their solution. The purpose is twofold. On the one hand, these equations give the necessary background to the physical modeling of stiff strings. On the other hand, we show that their frequency-domain solution ultimately provides the link between continuous-time and discrete-time models, useful for the derivation of the digital waveguide and suitable for their simulation. This link naturally leads to Laguerre expansions for the solution and to frequency warping equivalences. Furthermore, enforcing proper boundary conditions determines the eigenfrequencies and eigenfunctions of the system, useful for fitting experimentally measured resonant modes to the ones obtained by simulation. This fit allows us to determine the parameters of the waveguide through optimization.

2.1. Stiff string and bar equation

The equation of motion for the stiff string can be determined by studying the equilibrium of a thin plate [35, 36]. One obtains the following 4th-order differential equation for the deformation of the string y(x, t):

-\varepsilon \frac{\partial^4 y(x,t)}{\partial x^4} + \frac{\partial^2 y(x,t)}{\partial x^2} = \frac{1}{c^2}\,\frac{\partial^2 y(x,t)}{\partial t^2}, \qquad \varepsilon = \frac{EI}{T}, \quad c = \sqrt{\frac{T}{\rho S}}, \qquad (1)


featuring the Young modulus E of the material, the inertia moment I with respect to the transversal axis of the cross-section of the string (for a circular section of radius r, I = πr⁴/4, as in [36]), the tension T of the string, and the mass per unit length ρS. Note that for ε → 0, (1) becomes the well-known equation of the vibrating string [35]. Otherwise, if the applied tension T is negligible, we obtain

-\varepsilon' \frac{\partial^4 y(x,t)}{\partial x^4} = \frac{\partial^2 y(x,t)}{\partial t^2}, \qquad \varepsilon' = \frac{EI}{\rho S}, \qquad (2)

which is the equation for the transversal vibrations of rods. Solutions of (1) and (2) are best found in terms of the Fourier transform of y(x, t) with respect to time:

Y(x,\omega) = \int_{-\infty}^{+\infty} y(x,t)\,\exp(-i\omega t)\,dt, \qquad (3)

where ω is the angular frequency, related to the frequency f by the relationship ω = 2πf.

By taking the Fourier transform of both members of (1) and (2), we obtain

\varepsilon \frac{\partial^4 Y(x,\omega)}{\partial x^4} - \frac{\partial^2 Y(x,\omega)}{\partial x^2} - \frac{\omega^2}{c^2}\,Y(x,\omega) = 0 \qquad (4)

for the stiff string and

\varepsilon' \frac{\partial^4 Y(x,\omega)}{\partial x^4} - \omega^2\,Y(x,\omega) = 0 \qquad (5)

for the rod.

The second-order −∂²/∂x² spatial differential operator is defined as a repeated application of the L² space extension of the −i(∂/∂x) operator [37]. To this purpose, we seek solutions whose spatial and frequency dependency can be factored, according to the separation of variables method, as follows:

Y(x,\omega) = W(\omega)\,X(x). \qquad (6)

Substituting (6) in (4) and (5) results in the elimination of the W(ω) term, obtaining ordinary differential equations whose characteristic equations, respectively, are

\varepsilon \lambda^4 - \lambda^2 - \frac{\omega^2}{c^2} = 0 \quad \text{(stiff string)}, \qquad \varepsilon' \lambda^4 - \omega^2 = 0 \quad \text{(rod)}. \qquad (7)

The elementary solutions for the spatial part X(x) have the form X(x) = C exp(λx). It is important to note that in both cases the characteristic equations have the following form:

(\lambda^2 - \xi_1^2)(\lambda^2 - \xi_2^2) = 0, \qquad (8)

where ξ_1 and ξ_2 are, in general, complex numbers that depend on ω. Equation (8) allows us to factor both equations in (4) and (5) as follows:

\left[\frac{\partial^2}{\partial x^2} - \xi_1^2\right]\cdot\left[\frac{\partial^2}{\partial x^2} - \xi_2^2\right] Y(x,\omega) = 0. \qquad (9)

The operator −∂²/∂x² is selfadjoint with respect to the L² scalar product [37]. Therefore, (9) can be separated into the following two independent equations:

\left[\frac{\partial^2}{\partial x^2} - \xi_1^2\right] Y_1(x,\omega) = 0, \qquad \left[\frac{\partial^2}{\partial x^2} - \xi_2^2\right] Y_2(x,\omega) = 0, \qquad (10)

where

Y(x,\omega) = Y_1(x,\omega) + Y_2(x,\omega). \qquad (11)

As we will see, (10) justifies the use, with proper modifications, of a second-order generalized waveguide based on progressive and regressive waves for the numerical simulation of stiff systems.

2.2. General solution of the stiff string and bar equations

In this section, we will provide the general solution of (8). The particular eigenfunctions and eigenfrequencies of rods and stiff strings are determined by proper boundary conditions and are treated in Section 2.3. From (7), it can be shown that

\xi_1^\pm = \pm\sqrt{-\,\frac{\sqrt{1 + 4\omega^2\varepsilon/c^2} - 1}{2\varepsilon}}, \qquad \xi_2^\pm = \pm\sqrt{\frac{\sqrt{1 + 4\omega^2\varepsilon/c^2} + 1}{2\varepsilon}} \quad \text{(stiff string)},

\xi_1^\pm = \pm\sqrt{-\,\frac{\omega}{\sqrt{\varepsilon'}}}, \qquad \xi_2^\pm = \pm\sqrt{\frac{\omega}{\sqrt{\varepsilon'}}} \quad \text{(rod)}. \qquad (12)

Note that in both cases the eigenvalues ξ_1^± are complex numbers, while ξ_2^± are real numbers. It is also worth noting that

\xi_1^2 + \xi_2^2 = \frac{1}{\varepsilon} \quad \text{(stiff string)}, \qquad \xi_1^2 + \xi_2^2 = 0 \quad \text{(rod)}, \qquad (13)

where ξ_1 corresponds to the positive choice of the sign in front of the square root in (12) and ξ_2 = |ξ_2^±|. As expected, if we let T → 0, then both sets of eigenvalues of the stiff string tend to those found for the rod. Using the equations in (12), we then have for both strings and rods

Y_1(x,\omega) = c_1^+ \exp(\xi_1 x) + c_1^- \exp(-\xi_1 x), \qquad Y_2(x,\omega) = c_2^+ \exp(\xi_2 x) + c_2^- \exp(-\xi_2 x), \qquad (14)

where c_1^±, c_2^± are, in general, functions of ω. Note that Y_1(x,ω) is an oscillating term, while, since ξ_2 is real, Y_2(x,ω) is nonoscillating. For finite-length strings, both positive and negative real exponentials are to be retained.
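For concreteness, (12) and the check (13) can be evaluated numerically. A sketch follows (complex arithmetic via numpy; the parameter values and names are our own):

```python
import numpy as np

def stiff_string_eigenvalues(omega, eps, c):
    """Roots xi_1, xi_2 of the characteristic equation (7), per (12),
    for the stiff string (positive-sign branches)."""
    d = np.sqrt(1.0 + 4.0 * omega**2 * eps / c**2)
    xi1 = np.sqrt(complex(-(d - 1.0) / (2.0 * eps)))  # imaginary: oscillating
    xi2 = np.sqrt((d + 1.0) / (2.0 * eps))            # real: nonoscillating
    return xi1, xi2

omega, eps, c = 2 * np.pi * 440.0, 1e-4, 2e4
xi1, xi2 = stiff_string_eigenvalues(omega, eps, c)
print(np.isclose(xi1**2 + xi2**2, 1.0 / eps))  # property (13): True
```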


From (12), we see that the primary effect of stiffness is the dependency on frequency of the argument (from now on, phase) of the solutions of (4) and (5). Therefore, the propagation of the wave from one section of the string located at x to the adjacent section located at x + ∆x is obtained by multiplication by a frequency-dependent factor exp(ξ_1 ∆x). Consequently, the group velocity u, defined as u ≡ (dξ_1/dω)^{-1}, also depends on frequency. This results in a dispersion of the wave packet, characterized by the function ξ_1(ω), whose modulus is plotted in Figure 1 for the case of a brass string using the following values of the physical parameters r, T, ρ, and E:

r = 1\ \text{mm}, \quad T = 9\cdot 10^7\ \text{dyne}, \quad \rho = 8.44\ \text{g cm}^{-3}, \quad E = 9\cdot 10^{11}\ \text{dyne cm}^{-2}. \qquad (15)

Clearly, the previous example is a very crude approximation of a physical piano string (e.g., real-life piano strings in the low register are built out of more than one material, and a copper or brass wire is wrapped around a steel core). For the sake of completeness, we give the explicit expression of |u| in both the cases we are studying. We have

|u| = \frac{2c\sqrt{c^2 + 4\omega^2\varepsilon}}{\sqrt{2c^2 \pm 2c\sqrt{c^2 + 4\omega^2\varepsilon}}} \quad \text{(stiff string)}, \qquad |u| = 2\sqrt{\omega\sqrt{\varepsilon'}} \quad \text{(rod)}. \qquad (16)

If T → 0, the two group velocities are equal. Moreover, if in the first line in (16) we let ε → 0, then u → c, which is the limit case of the ideally flexible vibrating string. These facts further justify the use of a dispersive waveguide in the numerical simulation. With respect to this point, a remark is in order: the dispersion introduced by stiffness can be treated as a limiting “nonphysical” consequence of the Euler-Bernoulli beam equation:

\frac{d^2}{dx^2}\left[ EI \frac{d^2 y}{dx^2} \right] = p, \qquad (17)

where p is the distributed load acting on the beam. It is “nonphysical” in the sense that u → ∞ as √ω. However, in the discrete-time domain, this “nonphysical” situation is avoided if we suppose all the signals to be bandlimited.
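Both limits discussed above are easy to check numerically. A sketch (positive-sign branch of (16); the parameter values are arbitrary):

```python
import numpy as np

def group_velocity_stiff(omega, eps, c):
    """|u| for the stiff string per (16), positive-sign branch."""
    s = np.sqrt(c**2 + 4.0 * omega**2 * eps)
    return 2.0 * c * s / np.sqrt(2.0 * c**2 + 2.0 * c * s)

omega, c = 2 * np.pi * 1000.0, 2e4
print(group_velocity_stiff(omega, 1e-12, c))  # eps -> 0: approaches c
print(group_velocity_stiff(omega, 1e-3, c))   # stiff: exceeds c at high omega
```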

2.3. Complete characterization of stiff string and rod solution

Boundary conditions for real piano strings lie in between the conditions of clamped extrema:

Y\!\left(-\frac{L}{2},\omega\right) = Y\!\left(\frac{L}{2},\omega\right) = 0, \qquad \left.\frac{\partial Y(x,\omega)}{\partial x}\right|_{-L/2} = \left.\frac{\partial Y(x,\omega)}{\partial x}\right|_{L/2} = 0, \qquad (18)


Figure 1: Plot of the phase modulus of the stiff model equation solution for ε = π/4 cm² and c ≈ 2·10⁴ cm s⁻¹.

and of hinged extrema [5, 16, 31, 35, 36]:

Y\!\left(-\frac{L}{2},\omega\right) = Y\!\left(\frac{L}{2},\omega\right) = 0, \qquad \left.\frac{\partial^2 Y(x,\omega)}{\partial x^2}\right|_{-L/2} = \left.\frac{\partial^2 Y(x,\omega)}{\partial x^2}\right|_{L/2} = 0. \qquad (19)

Before determining the conditions for the eigenfrequencies of the considered stiff systems, we find a more compact way of writing (18) and (19). Starting from the factorized form of the stiff systems equation (see (10)), and using the symbols introduced in Section 2.2, we have

Y_1(x,\omega) = \psi_1^+(x,\omega) + \psi_1^-(x,\omega), \qquad Y_2(x,\omega) = \psi_2^+(x,\omega) + \psi_2^-(x,\omega), \qquad (20)

where we let

\psi_1^\pm(x,\omega) = c_1^\pm \exp(\xi_1^\pm x), \qquad \psi_2^\pm(x,\omega) = c_2^\pm \exp(\xi_2^\pm x). \qquad (21)

Conditions (18) can then be rewritten as follows:

Y_1\!\left(-\frac{L}{2},\omega\right) = -Y_2\!\left(-\frac{L}{2},\omega\right), \qquad Y_1\!\left(\frac{L}{2},\omega\right) = -Y_2\!\left(\frac{L}{2},\omega\right),

\left.\frac{\partial Y_1(x,\omega)}{\partial x}\right|_{-L/2} = -\left.\frac{\partial Y_2(x,\omega)}{\partial x}\right|_{-L/2}, \qquad \left.\frac{\partial Y_1(x,\omega)}{\partial x}\right|_{L/2} = -\left.\frac{\partial Y_2(x,\omega)}{\partial x}\right|_{L/2}. \qquad (22)

At the terminations of the string or of the rod, we have

\psi_1^+ + \psi_1^- = -\left(\psi_2^+ + \psi_2^-\right), \qquad \xi_1^+\psi_1^+ + \xi_1^-\psi_1^- = -\left(\xi_2^+\psi_2^+ + \xi_2^-\psi_2^-\right), \qquad (23)


which can be rewritten in matrix form:

\begin{bmatrix} 1 & 1 \\ \xi_1^+ & \xi_2^+ \end{bmatrix} \begin{bmatrix} \psi_1^+ \\ \psi_2^+ \end{bmatrix} = -\begin{bmatrix} 1 & 1 \\ \xi_1^- & \xi_2^- \end{bmatrix} \begin{bmatrix} \psi_1^- \\ \psi_2^- \end{bmatrix}. \qquad (24)

By left-multiplying both members of (24) by the inverse of the matrix \begin{bmatrix} 1 & 1 \\ \xi_1^+ & \xi_2^+ \end{bmatrix}, we have

\begin{bmatrix} \psi_1^+ \\ \psi_2^+ \end{bmatrix} = S_c \begin{bmatrix} \psi_1^- \\ \psi_2^- \end{bmatrix}, \qquad (25)

where we let

S_c \equiv \begin{bmatrix} \dfrac{-\left(\xi_2^+ + \xi_1^+\right)}{\xi_2^+ - \xi_1^+} & \dfrac{-2\xi_2^+}{\xi_2^+ - \xi_1^+} \\[2mm] \dfrac{2\xi_1^+}{\xi_2^+ - \xi_1^+} & \dfrac{\xi_2^+ + \xi_1^+}{\xi_2^+ - \xi_1^+} \end{bmatrix}. \qquad (26)

The matrix S_c relates the incident wave with the reflected wave at the boundaries. Independently of the roots ξ_i, it has the following properties:

\left|S_c\right| = -1, \qquad S_c^2 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}. \qquad (27)
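The two properties in (27) are easy to verify numerically for arbitrary roots; a sketch:

```python
import numpy as np

def scattering_matrix_clamped(xi1, xi2):
    """Boundary scattering matrix S_c of (26) for the clamped case."""
    d = xi2 - xi1
    return np.array([[-(xi2 + xi1) / d, -2.0 * xi2 / d],
                     [ 2.0 * xi1 / d,   (xi2 + xi1) / d]])

Sc = scattering_matrix_clamped(1.5j, 2.0)     # any xi1 != xi2 will do
print(np.isclose(np.linalg.det(Sc), -1.0))    # |S_c| = -1
print(np.allclose(Sc @ Sc, np.eye(2)))        # S_c^2 = identity
```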

In the case of a stiff system hinged (see (19)) at both ends, we have

\psi_1^+ + \psi_1^- = -\left(\psi_2^+ + \psi_2^-\right), \qquad \left(\xi_1^+\right)^2\psi_1^+ + \left(\xi_1^-\right)^2\psi_1^- = -\left(\left(\xi_2^+\right)^2\psi_2^+ + \left(\xi_2^-\right)^2\psi_2^-\right), \qquad (28)

which, in matrix form, becomes

\begin{bmatrix} 1 & 1 \\ (\xi_1^+)^2 & (\xi_2^+)^2 \end{bmatrix} \begin{bmatrix} \psi_1^+ \\ \psi_2^+ \end{bmatrix} = -\begin{bmatrix} 1 & 1 \\ (\xi_1^-)^2 & (\xi_2^-)^2 \end{bmatrix} \begin{bmatrix} \psi_1^- \\ \psi_2^- \end{bmatrix}. \qquad (29)

By taking the inverse of the matrix \begin{bmatrix} 1 & 1 \\ (\xi_1^+)^2 & (\xi_2^+)^2 \end{bmatrix}, we obtain

\begin{bmatrix} \psi_1^+ \\ \psi_2^+ \end{bmatrix} = S_h \begin{bmatrix} \psi_1^- \\ \psi_2^- \end{bmatrix}, \qquad (30)

where

S_h = -\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}. \qquad (31)

The S_h matrix for the hinged stiff system is independent of the roots ξ_i. The matrices S_h and S_c are related in the following way:

\left|S_h\right| = -\left|S_c\right|, \qquad S_h^2 = S_c^2. \qquad (32)

In conclusion, the boundary conditions for stiff systems can be expressed in terms of matrices that can be used in the numerical simulation of stiff systems. Moreover, since the real-life boundary conditions for stiff strings in pianos lie in between the conditions given in (18) and (19), we can combine the two matrices S_c and S_h in order to enforce more general conditions, as illustrated in Section 3. In the following, we will solve (4) and (5) applying separately these sets of boundary conditions.

2.3.1. The clamped stiff string and rod

In order to characterize the eigenfunctions in the case of conditions (18), in (12) we let

\xi_1 = i\xi_1' \qquad (33)

for both the stiff string and the rod solution. By definition, ξ_1' is a real number. Moreover, for the rod, we have ξ_1' = ξ_2. With this substitution, it can be shown that conditions (18) for the stiff string lead to the equations [35, 38]

\begin{bmatrix} \tan\!\left(\dfrac{\xi_1' L}{2}\right) & \tanh\!\left(\dfrac{\xi_2 L}{2}\right) \\[2mm] \tanh\!\left(\dfrac{\xi_2 L}{2}\right) & -\tan\!\left(\dfrac{\xi_1' L}{2}\right) \end{bmatrix} \begin{bmatrix} \xi_1' \\ \xi_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \qquad (34)

while, for the rod, we have

\cos(\xi_1' L)\,\cosh(\xi_2 L) = 1. \qquad (35)

Equations (34) and (35) can be solved numerically. In particular, taking into account the second line in (12), the solutions of (35) are [35]

\omega_n = \frac{\pi^2}{4}\left(3.011^2, 5^2, 7^2, \ldots, (2n+1)^2\right)\alpha'^2, \qquad \alpha' = \frac{\sqrt[4]{\varepsilon'}}{L}. \qquad (36)
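A quick numerical cross-check of (35) and (36) is straightforward (a sketch; the brentq brackets are chosen around the asymptotic guesses near odd multiples of π/2):

```python
import numpy as np
from scipy.optimize import brentq

f = lambda x: np.cos(x) * np.cosh(x) - 1.0   # roots x_n = xi_1' * L of (35)

for guess in (4.730, 7.853, 10.996):         # near (2n + 3) * pi / 2
    x = brentq(f, guess - 0.5, guess + 0.5)
    print(x, 2 * x / np.pi)  # second value -> 3.011, 5, 7, ... as in (36)
```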

A similar trend can be obtained for the stiff string. In view of their historical and practical relevance, we report here the numerical approximation for the allowed eigenfrequencies of the stiff string given by Fletcher [27]:

\omega_n \simeq \left(\frac{n\pi c}{L}\right)\sqrt{1 + n^2\pi^2\alpha^2}\,\left(1 + 2\alpha + 4\alpha^2\right), \qquad \alpha = \frac{\sqrt{\varepsilon}}{L}. \qquad (37)

If we expand the above expression in a series of powers of α truncated to second order, we have the following approximate formula, valid for small values of stiffness:

\omega_n \simeq \left(\frac{n\pi c}{L}\right)\left[1 + 2\alpha + \left(1 + \frac{1}{8}n^2\pi^2\right)(2\alpha)^2\right]. \qquad (38)
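The quality of the truncation can be checked directly (a sketch; the string parameters below are placeholders, not measured values):

```python
import numpy as np

def omega_fletcher(n, c, L, alpha):
    """Fletcher's eigenfrequency approximation (37)."""
    return (n * np.pi * c / L) * np.sqrt(1 + n**2 * np.pi**2 * alpha**2) \
           * (1 + 2 * alpha + 4 * alpha**2)

def omega_small_alpha(n, c, L, alpha):
    """Second-order expansion (38), valid for small stiffness."""
    return (n * np.pi * c / L) * (1 + 2 * alpha
                                  + (1 + n**2 * np.pi**2 / 8) * (2 * alpha)**2)

n = np.arange(1, 20)
print(np.max(np.abs(omega_fletcher(n, 2e4, 100.0, 1e-3)
                    - omega_small_alpha(n, 2e4, 100.0, 1e-3))))  # small
```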

The last approximation does not apply to bars. For ε = 0, we have α = 0, and the eigenfrequencies tend to the well-known formula for the vibrating string [35]:

ωn = nω1. (39)

Typical curves of the relative spacing χ_n ≡ ∆ω_n/ω_1, where ∆ω_n ≡ ω_{n+1} − ω_n, of the eigenfrequencies for the stiff string are shown in Figure 2 with variable r, where the values of the other physical parameters are the same as in (15).


Figure 2: Typical eigenfrequencies relative spacing curves of theclamped stiff string for different values of the radius r of the sec-tion S.

shown in Figure 2 with variable r, where values of the otherphysical parameters are the same as in (15).

Due to the frequency dependence of the phase of the solution, the eigenfrequencies of the stiff string are not equally spaced. For a small radius r, and hence a low degree of stiffness of the string (see (1)), the relative spacing is almost constant for all the considered orders of eigenfrequencies. However, for higher stiffness, the spacing of the eigenfrequencies increases, to a first approximation, as a linear function of the order of the eigenfrequency. The above results are summarized by the typical “warping curves” of the system, shown in Figure 3, in which the quantity ωₙ − nω₁, which represents the deviation from linearity, is plotted in terms of the spacing Δωₙ between consecutive eigenfrequencies.

In the stiff string case, we have two sets of eigenfunctions, one having even parity and the other having odd parity, whose analytical expressions are respectively given by [38]

\[
\begin{aligned}
Y(x,\omega) &= C(\omega)\cos\left(\frac{\xi_1' L}{2}\right)\left[\frac{\cos\left(\xi_1' x\right)}{\cos\left(\xi_1' (L/2)\right)} - \frac{\cosh\left(\xi_2 x\right)}{\cosh\left(\xi_2 (L/2)\right)}\right],\\
Y(x,\omega) &= C(\omega)\sin\left(\frac{\xi_1' L}{2}\right)\left[\frac{\sin\left(\xi_1' x\right)}{\sin\left(\xi_1' (L/2)\right)} - \frac{\sinh\left(\xi_2 x\right)}{\sinh\left(\xi_2 (L/2)\right)}\right],
\end{aligned} \tag{40}
\]

where C(ω) is a constant that can be calculated by imposing the initial conditions.

2.3.2. Hinged stiff string and rod

Conditions (19) lead to the following set of equations for the stiff string:

\[
\sin\left(\xi_1' L\right)\sinh\left(\xi_2 L\right) = 0, \qquad \xi_1'^2 + \xi_2^2 = 0, \tag{41}
\]

[Figure 3: Typical warping curves of the clamped stiff string for different values of the radius r of the section S (r = 1 mm and r = 3 mm); deviation from linearity (Hz) versus frequency (Hz).]

and for the rod:

\[
\sin\left(\xi_1' L\right)\sinh\left(\xi_2 L\right) = 0. \tag{42}
\]

The second line in (41) has no solutions, since both ξ′₁² and ξ₂² are real functions. It follows that hinged stiff systems are described by (42) only. In this equation, sinh(ξ₂L) = 0 has no solution, hence the eigenfrequencies are determined by the condition

\[
\xi_1' = \frac{n\pi}{L}. \tag{43}
\]

Using the parameters α′ and α defined in (36) and (37), respectively, the eigenfrequencies for the hinged stiff string are exactly expressed as follows:

\[
\omega_n = \left(\frac{n\pi c}{L}\right)\sqrt{n^2\pi^2\alpha^2 + 1}, \tag{44}
\]

while for the rod, we have

\[
\omega_n = n^2\pi^2\alpha'^2. \tag{45}
\]

As the tension T → 0, (44) tends to (45). Figure 4 shows the relative spacing of the eigenfrequencies in the case of the hinged stiff string.

The relative eigenfrequency spacing curves are very similar to those of the clamped string, and so are the “warping curves” of the system, as shown in Figure 5.

Using (45), we can give an analytic expression for the relative spacing of the eigenfrequencies of the hinged rod. We have

\[
\Delta\omega_n = \pi^2\alpha'^2(2n + 1). \tag{46}
\]

[Figure 4: Typical relative-spacing curves of the eigenfrequencies of the hinged stiff string for different values of the radius r of the section S (r = 1 mm and r = 3 mm); relative spacing versus partial number.]

[Figure 5: Typical warping curves of the hinged stiff string for different values of the radius r of the section S (r = 1 mm and r = 3 mm); deviation from linearity (Hz) versus frequency (Hz).]

Equation (43) leads to the following set of odd and even eigenfunctions for the stiff string [38]:

\[
\begin{aligned}
Y_n(x,\omega) &= 2D(\omega)\sin\left(\frac{2n\pi}{L}x\right),\\
Y_n(x,\omega) &= 2D(\omega)\cos\left(\frac{(2n+1)\pi}{L}x\right),
\end{aligned} \tag{47}
\]

where D(ω) must be determined by enforcing the initial conditions. It is worth noting that both functions in (47) are independent of the stiffness parameter ε. In Section 3, we will use the obtained results in order to implement the dispersive waveguides digitally simulating the solutions of (4) and (5).

Finally, we need to stress the fact that the eigenfrequencies of the hinged stiff string are similar to the ones for the clamped case except for the factor (1 + 2α + 4α²). Therefore, for small values of stiffness, they do not differ too much. This can also be seen from the similarity of the warping curves obtained with the two types of boundary conditions.

[Figure 6: Basic Karplus-Strong delay cascade: the input X(z) feeds a loop consisting of a 2P-sample delay line z⁻²ᴾ and a low-pass loop filter G(z); the output is Y(z).]

Taking into account the fact that the boundary conditions of real piano strings lie in between these two cases, we can conclude that the eigenfrequencies of real piano strings can be calculated by means of the approximate formula [27, 28]

\[
\omega_n \simeq A n\sqrt{Bn^2 + 1}, \tag{48}
\]

where A and B can be obtained from measurements. Approximation (48) is useful in order to match measured vibrating modes against the model eigenfrequencies.

3. NUMERICAL APPROXIMATIONS OF STIFF SYSTEMS

Most of the problems encountered when dealing with the continuous-time equation of the stiff string consist in determining the general solution and in relating the initial and boundary conditions to the integration constants of the equation. In this section, we will show that a similar technique can also be used in discrete time, which yields a numerical transform method for the computation of the solution.

In Section 2, we noted that (1) becomes the equation of the vibrating string in the case of a negligible stiffness coefficient ε. It is well known that the technique known as the Karplus-Strong algorithm implements the discrete-time domain solution of the vibrating string equation [8], allowing us to reach good-quality acoustic results. The block diagram of the adopted loop circuit is shown in Figure 6.

The transfer function of the digital loop chain can be written as follows:

\[
H(z) = \frac{1}{1 - z^{-2P}G(z)}, \tag{49}
\]

where the loop filter G(z) takes into account losses due to nonrigid terminations and to internal friction, and P is the number of sections in which the string is subdivided, as obtained from time and space sampling. Loop filter design can be based on measured partial amplitude and frequency trajectories [18], or on linear predictive coding (LPC)-type methods [9]. The filter G(z) can be modelled as IIR or FIR, and it must be estimated from samples of the sound or from a model of the string losses, where, for stability, we need |G(e^{jω})| < 1. Clearly, in the IIR case or in the nonlinear-phase FIR case, the phase response of the loop filter introduces a limited amount of dispersion. Additional phase terms in the form of all-pass filters can be added in order to tune the string model to the required pitch [13] and contribute to further dispersion.

[Figure 7: Phase of the first-order all-pass filter plotted for various values of u (curves for u = 0.9 and u = −0.9 shown); phase versus discrete frequency.]
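For concreteness, a minimal sketch of the loop of Figure 6 and (49). The two-point average G(z) = (1 + z⁻¹)/2 used here is only the simplest admissible low-pass loop filter, assumed for illustration, not the one prescribed above:

```python
import numpy as np

def karplus_strong(n_samples, two_p=200, seed=0):
    """Output of H(z) = 1/(1 - z^(-2P) G(z)) of (49), with the loop driven
    by a noise burst loaded into the delay line; pitch is roughly fs/(2P)."""
    rng = np.random.default_rng(seed)
    delay = rng.uniform(-1.0, 1.0, two_p)   # initial string state (excitation)
    out = np.empty(n_samples)
    prev = 0.0                              # previous delay-line tap, for G(z)
    for n in range(n_samples):
        x = delay[n % two_p]                # tap of the 2P-sample delay line
        y = 0.5 * (x + prev)                # low-pass loop filter G(z)
        prev = x
        delay[n % two_p] = y                # feed back into the loop
        out[n] = y
    return out

signal = karplus_strong(44100)              # ~1 s at 44.1 kHz, pitch ~ 220 Hz
```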

Since the group velocity of a traveling wave in a stiff system depends on frequency (see (16)), it is natural to substitute, in discrete time, the cascade of unit delays with a chain of circuit elements whose phase responses do depend on frequency. One can show that the only choice that leads to rational transfer functions is given by a chain of first-order all-pass filters [39, 40]. More complex physical systems, for example, as in the simulation of a monaural room, call for substituting the delay chain with a more general filter, as illustrated in [41]:

\[
A(z, u) = \frac{z^{-1} - u}{1 - uz^{-1}}, \tag{50}
\]

whose phase characteristic is

\[
\theta(\Omega) = \Omega + 2\arctan\frac{u\sin(\Omega)}{1 - u\cos(\Omega)}. \tag{51}
\]

The phase characteristics in (51) are plotted in Figure 7 for various values of u.

A comparison between the curve in Figure 1 and the ones in Figure 7 lends further plausibility to the approximation of the solution phase of the stiff model equations, given in (12), by the all-pass filter phase (51). Adopting a circuit scheme similar to the Karplus-Strong algorithm [10], in which the unit delays are replaced by first-order all-pass filters, the approximation is given by

\[
\xi_1'\left(\Omega f_s\right) \simeq \frac{P}{L}\theta(\Omega), \tag{52}
\]

[Figure 8: Dispersive waveguide used to simulate dispersive systems: the 2P-sample delay line of Figure 6 is replaced by a cascade of 2P first-order all-pass filters A(z), with the low-pass loop filter G(z) in the feedback path.]

where fₛ is the sampling frequency. Note that, by definition, both members of (52) are real numbers. Therefore, in the z-domain, a nonstiff system can be mapped into a stiff system by means of the frequency warping map

\[
z^{-1} \longrightarrow A(z). \tag{53}
\]

The resulting circuit is shown in Figure 8. Note that the feedback all-pass chain results in delay-free loops. Computationally, these loops can be resolved by the methods illustrated in [34, 42, 43]. Moreover, the phase response of the loop filter G(z) contributes to the dispersion, and it must be taken into account in the global model.
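The effect of the map (53) on the loop resonances can be previewed without resolving the delay-free loops in the time domain: the kth resonance of the circuit in Figure 8 lies where the phase accumulated over the 2P all-pass sections equals 2πk. A sketch under the simplifying assumption of a zero-phase loop filter, with placeholder values for P and u:

```python
import numpy as np
from scipy.optimize import brentq

P, u, fs = 100, 0.1, 44100.0

# Phase of the first-order all-pass, (51).
theta = lambda w: w + 2*np.arctan(u*np.sin(w) / (1 - u*np.cos(w)))

# kth resonance: 2P * theta(w) = 2*pi*k (zero-phase loop filter assumed).
partials = []
for k in range(1, 30):
    f = lambda w: 2*P*theta(w) - 2*np.pi*k
    partials.append(brentq(f, 1e-9, np.pi - 1e-9) * fs / (2*np.pi))

ratios = np.array(partials) / partials[0]
print(ratios[:8])   # stretched (inharmonic) partial ratios for u > 0
```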

The circuit in Figure 8 can be optimized in order to take into account the losses and the coupling amongst strings (e.g., as in the piano). In the framework of this paper, we confined our interest to the design of the stiff system filter. For a review of the design of lossy filters and coupling models, see [17].

3.1. Stiff system filter parameters determination

Within the framework of the approximation (52), in the case of the dispersive waveguide, the integer parameter P can be obtained by constraining the two functions to attain the same values at the extrema of the bandwidth. Since θ(π) = π, we have

\[
P = \frac{\xi_1\left(\pi f_s\right)L}{\pi}. \tag{54}
\]

As we will see, condition (54) is not the only one that can be obtained for the parameter P. The deviation from linearity introduced by the warping θ(Ω) can be written as follows:

\[
\Delta(\Omega) \equiv \theta(\Omega) - \Omega = 2\arctan\frac{u\sin(\Omega)}{1 - u\cos(\Omega)}. \tag{55}
\]

The function Δ(Ω) is plotted, for different values of u, in Figure 9.

One can see that the absolute value of Δ(Ω) has a maximum, which corresponds to the maximum deviation from linearity of θ(Ω). It can be shown that this maximum occurs for

\[
\Omega = \Omega_M = \arccos(u), \tag{56}
\]

[Figure 9: Deviation from linearity of the all-pass filter phase for different values of the parameter u (curves for u = 0.9 and u = −0.9 shown); deviation from linearity versus discrete frequency.]

for which the maximum deviation is

\[
\Delta\left(\Omega_M, u\right) = 2\arcsin(u). \tag{57}
\]

Substituting (56) into (51), we have

\[
\theta\left(\Omega_M\right) = \frac{\pi}{2} + \arcsin(u). \tag{58}
\]
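Relations (56), (57), and (58) can be checked numerically; a minimal sketch for one placeholder value of u:

```python
import numpy as np
from scipy.optimize import minimize_scalar

u = 0.5  # placeholder all-pass parameter, |u| < 1

# Deviation from linearity (55).
delta = lambda w: 2*np.arctan(u*np.sin(w) / (1 - u*np.cos(w)))

# Numerically locate the maximum of delta on (0, pi).
res = minimize_scalar(lambda w: -delta(w), bounds=(0, np.pi), method="bounded")

assert np.isclose(res.x, np.arccos(u), atol=1e-4)                    # (56)
assert np.isclose(delta(res.x), 2*np.arcsin(u), atol=1e-8)           # (57)
assert np.isclose(res.x + delta(res.x), np.pi/2 + np.arcsin(u),
                  atol=1e-4)                                          # (58)
```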

Since the solution phase ξ₁ is approximated by θ(Ω), it has to satisfy the condition

\[
\xi_1\left(\frac{\Omega_M}{T}\right)\frac{L}{P} \simeq \frac{\pi}{2} + \arcsin(u), \tag{59}
\]

and therefore we have the following bound on P:

\[
P \simeq \frac{L\,\xi_1\left(f_s\arccos(u)\right)}{\pi/2 + \arcsin(u)}. \tag{60}
\]

For higher-order Q all-pass filters, (60) can be written as follows:

\[
P \simeq \frac{1}{Q}\sum_{i=1}^{Q}\frac{\xi_1\left(f_s\arccos\left(u_i\right)\right)L}{\pi/2 + \arcsin\left(u_i\right)}. \tag{61}
\]

An optimization algorithm can be used to obtain the vector parameter u. Based on our experiments, we estimated that an optimal order Q is 4 for the piano string. Therefore, using the values in (15) for the 58 Hz tone of an L = 200 cm brass string, we obtain P = 209. Although this is not a model for a real-life wound inhomogeneous piano string, this example gives a rough idea of the typical number of required all-pass sections. The computation of this long all-pass chain can be too heavy for real-time applications. Therefore, an approximation of the chain by means of a cascade of an all-pass of order much smaller than 2P with unit delays is usually sought [13, 29, 30]. A simple and accurate approach is to model the all-pass as a cascade of first-order sections with variable real parameter u [38]. However, a more general approach calls for including in the design second-order all-pass sections, equivalent to a pair of complex conjugated first-order sections [29]. In Section 4, we will bypass this estimation procedure based on the theoretical eigenfunctions of the string and estimate the all-pass parameters and the number of sections from samples of the piano.
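As an illustration of how (60) can be evaluated in practice, the following sketch inverts a stiff-string dispersion relation numerically to obtain ξ₁ at the frequency fₛ·arccos(u). The relation ω = cξ√(1 + εξ²) is the one implied by the hinged-string eigenfrequencies (44); it, and all parameter values below, are placeholder assumptions rather than the inhomogeneous-string data used in the text:

```python
import numpy as np
from scipy.optimize import brentq

fs, L, c, eps, u = 44100.0, 2.0, 232.0, 1e-4, 0.3   # placeholder values

# Dispersion relation implied by (44): omega(xi) = c*xi*sqrt(1 + eps*xi^2);
# invert it numerically to evaluate xi_1 at a given radian frequency.
def xi1(omega):
    return brentq(lambda x: c*x*np.sqrt(1 + eps*x*x) - omega, 0.0, 1e6)

# Bound (60) on the number of all-pass sections.
omega_M = fs * np.arccos(u)
P = L * xi1(omega_M) / (np.pi/2 + np.arcsin(u))
print(round(P))
```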

3.2. Laguerre sequences

An invertible and orthogonal transform, which is related to the all-pass chain included in the stiff string model, is given by the Laguerre transform [44, 45]. The Laguerre sequences lᵢ[m,u] are best defined in the z-domain as follows:

\[
L_i(z, u) = \frac{\sqrt{1 - u^2}}{1 - uz^{-1}}\left[\frac{z^{-1} - u}{1 - uz^{-1}}\right]^i. \tag{62}
\]

Thus, the Laguerre sequences can be obtained from the z-domain recurrence

\[
L_0(z, u) = \frac{\sqrt{1 - u^2}}{1 - uz^{-1}}, \qquad L_{i+1}(z, u) = A(z)\,L_i(z, u), \tag{63}
\]

where A(z) is defined as in (50). Comparison of (62) with (50) shows that the phase of the z-transform of the Laguerre sequences is suitable for approximating the phase of the solution of the stiff model equation. A biorthogonal generalization of the Laguerre sequences, calling for a variable u from section to section, is illustrated in [46]. This is linked to the refined approximation of the solution previously shown.
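The recurrence (63) translates directly into a filtering procedure: l₀[n,u] is the impulse response of the low-pass kernel, and each subsequent sequence is obtained by one more pass through the all-pass A(z). A sketch using scipy (the truncation length is arbitrary), which also verifies the orthonormality of the sequences [44]:

```python
import numpy as np
from scipy.signal import lfilter

def laguerre(n_seq, n_len, u):
    """First n_seq Laguerre sequences l_i[n, u], truncated to n_len samples,
    generated with the z-domain recurrence (63)."""
    imp = np.zeros(n_len)
    imp[0] = 1.0
    seqs = np.empty((n_seq, n_len))
    # L_0(z, u) = sqrt(1 - u^2) / (1 - u z^-1)
    seqs[0] = lfilter([np.sqrt(1 - u**2)], [1.0, -u], imp)
    # L_{i+1}(z, u) = A(z) L_i(z, u), with A(z) = (z^-1 - u)/(1 - u z^-1)
    for i in range(1, n_seq):
        seqs[i] = lfilter([-u, 1.0], [1.0, -u], seqs[i - 1])
    return seqs

l = laguerre(8, 4096, u=0.4)
print(np.round(l @ l.T, 3))   # ~ identity: the sequences are orthonormal
```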

3.3. Initial conditions

Putting together the results obtained in Section 1, we can write the solution of the stiff model Y(Ω, x) as follows (see (11) and (14)):

\[
Y(\omega, x) = c_1^+(\omega)\exp\left(i\xi_1' x\right) + c_1^-(\omega)\exp\left(-i\xi_1' x\right). \tag{64}
\]

We now disregard the transient term due to ξ₂ since it does not influence the acoustic frequencies of the system. In discrete time and space, we let x = m(L/P) as in [10]. With the approximation (52), (64) becomes

\[
Y(m,\Omega) \simeq c_1^+(\Omega)\exp\left(im\theta(\Omega)\right) + c_1^-(\Omega)\exp\left(-im\theta(\Omega)\right). \tag{65}
\]

Substituting (63) into (65), we have

\[
Y(\Omega, m) \simeq c_1^+(\Omega)\frac{L_m(\Omega, u)}{L_0(\Omega, u)} + c_1^-(\Omega)\frac{L_{-m}(\Omega, u)}{L_0(\Omega, u)}, \tag{66}
\]

where we have used the fact that

\[
A\left(e^{i\Omega}, u\right) = \frac{e^{-i\Omega} - u}{1 - ue^{-i\Omega}} = \exp\left(i\theta(\Omega)\right). \tag{67}
\]


By defining

\[
V^+(\Omega) \equiv \frac{c_1^+(\Omega)}{L_0(\Omega, u)}, \qquad V^-(\Omega) \equiv \frac{c_1^-(\Omega)}{L_0(\Omega, u)}, \tag{68}
\]

(66) can be written as follows:

\[
Y(m,\Omega) \simeq V^+(\Omega)L_m(\Omega, u) + V^-(\Omega)L_{-m}(\Omega, u). \tag{69}
\]

Taking the inverse discrete-time Fourier transform (IDTFT) of both sides of (69), we obtain

\[
y[m,n] \simeq y^+[m,n] + y^-[m,n], \tag{70}
\]

where

\[
y^+[m,n] = \sum_{k=-\infty}^{\infty} v^+[n-k]\, l_m[k,u], \qquad
y^-[m,n] = \sum_{k=-\infty}^{\infty} v^-[n-k]\, l_{-m}[k,u], \tag{71}
\]

and the sequences v±[n] are the IDTFTs of V±(Ω). For the sake of conciseness, we do not report here the expression of v±[n] in terms of the constants c₁±; for further details, see [31, 38]. The expression of the numerical solution y[m,n] can be written in terms of a generic initial condition

\[
y[m, 0] = y^+[m, 0] + y^-[m, 0]. \tag{72}
\]

In order to do this, we resort to the extension of the Laguerre sequences to negative arguments,

\[
l_m[n,u] =
\begin{cases}
l_m[n,u], & n \ge 0,\\
l_m[-n,u], & n < 0,
\end{cases} \tag{73}
\]

and to the property

\[
l_m[n,u] = l_n[m,-u]. \tag{74}
\]

If we introduce the quantities

\[
y_k^{\pm}[u] = \sum_{m=0}^{\infty} y^{\pm}[m,0]\, l_k[\pm m, u], \qquad l_k[\pm m, u] = l_{\pm m}[k, u], \tag{75}
\]

then, with a simple mathematical manipulation, (71) can be written as follows:

\[
y^+[m,n] = \sum_{k=-\infty}^{\infty} y_k^+[u]\, l_m[k+n, u], \qquad
y^-[m,n] = \sum_{k=-\infty}^{\infty} y_k^-[u]\, l_m[k+n, u]. \tag{76}
\]

Therefore, the numerical solution becomes

\[
y[m,n] = \sum_{k=-\infty}^{\infty} y_k^+\, l_m[k+n,u] + \sum_{k=-\infty}^{\infty} y_k^-\, l_m[k+n,u]. \tag{77}
\]

We have just shown that the solution of the discrete-time stiff model equation can be written as a Laguerre expansion of the initial condition. At the same time, this shows that the stiff string model is equivalent to a nonstiff string model cascaded with frequency warping obtained by Laguerre expansion.
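This equivalence can be exercised directly: warping a sampled sequence amounts to the expansion ỹ[k] = Σₙ y[n] lₙ[k,u], and, by the property (74) together with orthonormality, expanding with −u inverts the map. A sketch reusing the laguerre() helper from the previous listing (the input signal is an arbitrary placeholder):

```python
import numpy as np

# Frequency warping as a Laguerre expansion: y_w[k] = sum_n y[n] l_n[k, u].
# Assumes the laguerre() helper defined in the previous sketch is in scope.
u, n_len = 0.3, 2048
y = np.sin(2*np.pi*0.05*np.arange(256))      # arbitrary placeholder signal
l = laguerre(len(y), n_len, u)               # rows: l_n[k, u]
y_warped = y @ l                             # warped sequence of length n_len

# Expanding with -u acts as the inverse map (cf. property (74)).
y_back = y_warped @ laguerre(n_len, len(y), -u)
print(np.allclose(y_back, y, atol=1e-5))     # round trip recovers the signal
```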

3.4. Boundary conditions

In Section 1, we discussed the boundary conditions of the stiff model equation in continuous time (see (18) and (19)). In this section, we will discuss the homogeneous boundary conditions (i.e., the first line in both (18) and (19)) in the discrete-time domain. Using approximation (52) and letting the number of sections of the stiff system P be an even integer, we can write the homogeneous conditions as follows (see also (69)):

\[
\begin{aligned}
Y\left(-\frac{P}{2},\Omega\right) = 0 &\Longrightarrow V^+(\Omega)L_{-P/2}(\Omega,u) + V^-(\Omega)L_{P/2}(\Omega,u) = 0,\\
Y\left(+\frac{P}{2},\Omega\right) = 0 &\Longrightarrow V^+(\Omega)L_{P/2}(\Omega,u) + V^-(\Omega)L_{-P/2}(\Omega,u) = 0.
\end{aligned} \tag{78}
\]

Like (34), (78) can be expressed in matrix form:

\[
\begin{bmatrix}
L_{P/2}(\Omega,u) & L_{-P/2}(\Omega,u)\\
L_{-P/2}(\Omega,u) & L_{P/2}(\Omega,u)
\end{bmatrix}
\begin{bmatrix} V^+(\Omega) \\ V^-(\Omega) \end{bmatrix}
= \begin{bmatrix} 0 \\ 0 \end{bmatrix}. \tag{79}
\]

As shown in Section 3.3, the functions V±(Ω) are determined by means of the Laguerre expansion of the initial condition sequences through (71) and (76). For (79) to admit a nontrivial solution for any choice of these initial conditions, the determinant of the coefficient matrix must vanish, which yields the following condition:

\[
\left[L_{P/2}(\Omega,u)\right]^2 - \left[L_{-P/2}(\Omega,u)\right]^2 = 0. \tag{80}
\]

Recalling the z-transform expression of the Laguerre sequences, we have

\[
\sin\left[\theta(\Omega)P\right] = 0, \qquad \theta(\Omega) = \frac{k\pi}{P}, \quad k = 1, 2, 3, \ldots. \tag{81}
\]

In the stiff string case, the eigenfrequencies of the system are not harmonically related. In our approximation of the phase of the solution with the digital all-pass phase, harmonicity is reobtained at a different level: the displacement of the all-pass phase values is harmonic according to the law written in (81). The distance between two consecutive values of this phase is π/P. Due to the nonrigid terminations, the real-life boundary conditions can be given in terms of frequency-dependent functions, which are included in the loop filter. In mapping the stiff structure to a nonstiff one, care must be taken in unwarping the loop filter as well.
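Since θ(Ω) is monotone for |u| < 1, condition (81) yields the model eigenfrequencies by simple inversion; a minimal sketch with placeholder values of P, u, and fₛ:

```python
import numpy as np
from scipy.optimize import brentq

P, u, fs = 64, 0.2, 44100.0
theta = lambda w: w + 2*np.arctan(u*np.sin(w) / (1 - u*np.cos(w)))

# Eigenfrequencies from (81): theta(Omega_k) = k*pi/P, k = 1, 2, ... (k < P).
omega = np.array([brentq(lambda w: theta(w) - k*np.pi/P, 0.0, np.pi)
                  for k in range(1, 33)])
freqs_hz = omega * fs / (2*np.pi)
print(freqs_hz[:5])
```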

4. SYNTHESIS OF SOUND

[Figure 10: Computed optimized all-pass parameters u versus partial number.]

In order to implement a piano simulation via the physical model, we need to determine the design parameters of the dispersive waveguide, that is, the number of all-pass sections and the coefficients uᵢ of the all-pass filters. This task could be performed by means of lengthy measurements or estimation of the physical variables, such as tension, Young's modulus, density, and so forth. However, as we already remarked, due to the constitutive complexity of real-life piano strings and terminations, this task seems to be quite difficult and to lead to inaccurate results. In fact, the given physical model only approximately matches the real situation. Indeed, in order to model and justify the measured eigenfrequencies, we resorted to Fletcher's experimental model described by (48). However, in that case, we ignore the exact form of the eigenfunctions, which is required in order to determine the number of sections of the waveguide and the other parameters. A more pragmatic and effective approach is to estimate the waveguide parameters directly from the measured eigenfrequencies ωₙ. These can be extracted, for example, from recorded samples of notes played by the piano under exam. Fletcher's parameters A and B can be calculated as follows:

\[
A = \frac{1}{2n}\sqrt{\frac{16\omega_n^2 - \omega_{2n}^2}{3}}, \qquad
B = \frac{1}{n^2}\,\frac{4\gamma^2 - 1}{1 - 16\gamma^2}, \qquad \gamma = \frac{\omega_n}{\omega_{2n}}. \tag{82}
\]
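A round-trip check of (82): partials ωₙ and ω₂ₙ are synthesized from known A and B via (48), and the parameters are then recovered (all values are placeholders):

```python
import numpy as np

A_true, B_true, n = 2*np.pi*58.0, 3e-4, 4     # placeholder values

# "Measured" partials generated with (48): omega_n = A n sqrt(B n^2 + 1).
w_n  = A_true * n   * np.sqrt(B_true * n**2     + 1)
w_2n = A_true * 2*n * np.sqrt(B_true * (2*n)**2 + 1)

# Fletcher parameters from (82).
gamma = w_n / w_2n
A = np.sqrt((16*w_n**2 - w_2n**2) / 3) / (2*n)
B = (4*gamma**2 - 1) / (1 - 16*gamma**2) / n**2

print(np.isclose(A, A_true), np.isclose(B, B_true))   # True True
```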

In practice, in the model where the all-pass parameters uᵢ are equal throughout the delay line, one does not even need to estimate Fletcher's parameters. In fact, in view of the equivalence of the stiff string model with the warped nonstiff model, one can directly determine, through optimization, the parameter u that makes the dispersion curve of the eigenfrequencies the closest to a straight line, using a suitable distance. A result of this optimization is shown in Figure 10. It must be pointed out that our point of view differs from the one proposed in [29, 30], where the objective is the minimization of the number of nontrivial all-pass sections in the cascade.

[Figure 11: Warped deviation from linearity; spacing of the partials (Hz) versus partial number.]

[Figure 12: Optimized all-pass parameters u for the A#3 tone; normalized frequency versus partial number.]
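A sketch of this optimization: synthetic "measured" partials are generated with (48), mapped through the all-pass phase θ(Ω,u), and u is chosen so that the mapped partials are closest, in the least-squares sense, to a straight line through the origin. All numerical values are placeholder assumptions:

```python
import numpy as np
from scipy.optimize import minimize_scalar

fs = 44100.0
n = np.arange(1, 41)
f_meas = 58.0 * n * np.sqrt(3e-4 * n**2 + 1)     # synthetic partials via (48)
W = 2*np.pi * f_meas / fs                        # discrete frequencies (rad)

def theta(w, u):                                 # all-pass phase (51)
    return w + 2*np.arctan(u*np.sin(w) / (1 - u*np.cos(w)))

def straightness(u):
    nu = theta(W, u)                             # unwarped partial frequencies
    a = (nu @ n) / (n @ n)                       # best line through the origin
    return np.sum((nu - a*n)**2)                 # residual to minimize

u_opt = minimize_scalar(straightness, bounds=(-0.95, 0.95),
                        method="bounded").x
print(u_opt)
```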

Given the optimum warping curve, the number of sections is then determined by forcing the pitch of the cascade of the nonstiff model (Karplus-Strong like) with warping to match the required fundamental frequency of the recorded tone. An example of this method is shown in Figure 11, where the measured warping curves pertaining to several piano keys in the low register, as estimated from the resonant eigenfrequencies, are shown. In Figure 12, the optimum sequence of all-pass parameters u for the examined tones is shown. Finally, in Figure 13, the plot of the dispersion curves regularized by means of optimum unwarping is shown. For further details about this method, see [47, 48, 49]. Frequency warping has also been employed in conjunction with 2D waveguide meshes in an effort to reduce the artificial dispersion introduced by the nonisotropic spatial sampling [50].

[Figure 13: Optimum unwarped regularized dispersion curves; frequency (Hz) versus partial number.]

Since the required warping curves do not match the first-order all-pass phase characteristic, a technique including resampling operators has been used in [50, 51] to overcome this difficulty, according to a scheme first introduced in [33] and further developed in [52] for the wavelet transforms. However, the downsampling operators inevitably introduce aliasing. While in the context of wavelet transforms this problem is tackled with multichannel filter banks, this is not the case for 2D waveguide meshes.

5. CONCLUSIONS

In order to support the design and use of digital dispersive waveguides, we reviewed the physical model of stiff systems, using a frequency-domain approach in both continuous and discrete time. We showed that, for dispersive propagation in discrete time, the Laguerre transform allows us to write the solution of the stiff model equation in terms of an orthogonal expansion of the initial conditions and to reobtain harmonicity at the level of the displacement of the all-pass phase values. Consequently, we showed that the stiff string model is equivalent to a nonstiff string model cascaded with frequency warping, in turn obtained by Laguerre expansion. Finally, we showed that, due to this equivalence, the all-pass coefficients can be computed by means of optimization algorithms matching the stiff model with a warped nonstiff one.

The exploration of physical models of musical instruments requires mathematical or physical approximations in order to make the problem treatable. When available, the solutions will only partially reflect the ensemble of mechanical and acoustic phenomena involved. However, the physical models serve as a solid background for the construction of physically inspired models, which are flexible numerical approximations of the solutions. Per se, these approximations are interesting for the synthesis of virtual instruments. However, in order to fine-tune the physically inspired models to real instruments, one needs methods for the estimation of the parameters from samples of the instrument. In this paper, we showed that dispersion from stiffness is a simple case in which the solution of the raw physical model suggests a discrete-time model, which is flexible enough to be used in synthesis and which provides realistic results when the characteristics are estimated from the samples.

REFERENCES

[1] B. L. Vercoe, W. G. Gardner, and E. D. Scheirer, “Structured audio: creation, transmission, and rendering of parametric sound representations,” Proceedings of the IEEE, vol. 86, no. 5, pp. 922–940, 1998.
[2] P. Cook, “Physically informed sonic modeling (PhISM): synthesis of percussive sounds,” Computer Music Journal, vol. 21, no. 3, pp. 38–49, 1997.
[3] L. Hiller and P. Ruiz, “Synthesizing musical sounds by solving the wave equation for vibrating objects: Part I,” Journal of the Audio Engineering Society, vol. 19, no. 6, pp. 462–470, 1971.
[4] L. Hiller and P. Ruiz, “Synthesizing musical sounds by solving the wave equation for vibrating objects: Part II,” Journal of the Audio Engineering Society, vol. 19, no. 7, pp. 542–551, 1971.
[5] A. Chaigne and A. Askenfelt, “Numerical simulations of piano strings. I. A physical model for a struck string using finite difference methods,” Journal of the Acoustical Society of America, vol. 95, no. 2, pp. 1112–1118, 1994.
[6] A. Chaigne and A. Askenfelt, “Numerical simulations of piano strings. II. Comparisons with measurements and systematic exploration of some hammer-string parameters,” Journal of the Acoustical Society of America, vol. 95, no. 3, pp. 1631–1640, 1994.
[7] A. Chaigne, “On the use of finite differences for musical synthesis. Application to plucked stringed instruments,” Journal d'Acoustique, vol. 5, no. 2, pp. 181–211, 1992.
[8] D. A. Jaffe and J. O. Smith III, “Extensions of the Karplus-Strong plucked-string algorithm,” in The Music Machine, C. Roads, Ed., pp. 481–494, MIT Press, Cambridge, Mass, USA, 1989.
[9] J. O. Smith III, Techniques for digital filter design and system identification with application to the violin, Ph.D. thesis, Electrical Engineering Department, Stanford University (CCRMA), Stanford, Calif, USA, June 1983.
[10] K. Karplus and A. Strong, “Digital synthesis of plucked-string and drum timbres,” in The Music Machine, C. Roads, Ed., pp. 467–479, MIT Press, Cambridge, Mass, USA, 1989.
[11] J. O. Smith III, “Physical modeling using digital waveguides,” Computer Music Journal, vol. 16, no. 4, pp. 74–91, 1992.
[12] J. O. Smith III, “Physical modeling synthesis update,” Computer Music Journal, vol. 20, no. 2, pp. 44–56, 1996.
[13] S. A. Van Duyne and J. O. Smith III, “A simplified approach to modeling dispersion caused by stiffness in strings and plates,” in Proc. 1994 International Computer Music Conference, pp. 407–410, Aarhus, Denmark, September 1994.
[14] J. O. Smith III, “Principles of digital waveguide models of musical instruments,” in Applications of Digital Signal Processing to Audio and Acoustics, M. Kahrs and K. Brandenburg, Eds., pp. 417–466, Kluwer Academic Publishers, Boston, Mass, USA, 1998.
[15] M. Karjalainen, T. Tolonen, V. Valimaki, C. Erkut, M. Laurson, and J. Hiipakka, “An overview of new techniques and effects in model-based sound synthesis,” Journal of New Music Research, vol. 30, no. 3, pp. 203–212, 2001.
[16] J. Bensa, S. Bilbao, R. Kronland-Martinet, and J. O. Smith III, “The simulation of piano string vibration: from physical models to finite difference schemes and digital waveguides,” Journal of the Acoustical Society of America, vol. 114, no. 2, pp. 1095–1107, 2003.
[17] B. Bank, F. Avanzini, G. Borin, G. De Poli, F. Fontana, and D. Rocchesso, “Physically informed signal processing methods for piano sound synthesis: a research overview,” EURASIP Journal on Applied Signal Processing, vol. 2003, no. 10, pp. 941–952, 2003.
[18] V. Valimaki, J. Huopaniemi, M. Karjalainen, and Z. Janosy, “Physical modeling of plucked string instruments with application to real-time sound synthesis,” Journal of the Audio Engineering Society, vol. 44, no. 5, pp. 331–353, 1996.
[19] J. O. Smith III, “Efficient synthesis of stringed musical instruments,” in Proc. 1993 International Computer Music Conference, pp. 64–71, Tokyo, Japan, September 1993.
[20] M. Karjalainen, V. Valimaki, and Z. Janosy, “Towards high-quality sound synthesis of the guitar and string instruments,” in Proc. 1993 International Computer Music Conference, pp. 56–63, Tokyo, Japan, September 1993.
[21] G. Borin and G. De Poli, “A hysteretic hammer-string interaction model for physical model synthesis,” in Proc. Nordic Acoustical Meeting, pp. 399–406, Helsinki, Finland, June 1996.
[22] G. E. Garnett, “Modeling piano sound using digital waveguide filtering techniques,” in Proc. 1987 International Computer Music Conference, pp. 89–95, Urbana, Ill, USA, August 1987.
[23] J. O. Smith III and S. A. Van Duyne, “Commuted piano synthesis,” in Proc. 1995 International Computer Music Conference, pp. 319–326, Banff, Canada, September 1995.
[24] S. A. Van Duyne and J. O. Smith III, “Developments for the commuted piano,” in Proc. 1995 International Computer Music Conference, pp. 335–343, Banff, Canada, September 1995.
[25] M. Karjalainen and J. O. Smith III, “Body modeling techniques for string instrument synthesis,” in Proc. 1996 International Computer Music Conference, pp. 232–239, Hong Kong, August 1996.
[26] M. Karjalainen, V. Valimaki, and T. Tolonen, “Plucked-string models, from the Karplus-Strong algorithm to digital waveguides and beyond,” Computer Music Journal, vol. 22, no. 3, pp. 17–32, 1998.
[27] H. Fletcher, “Normal vibration frequencies of a stiff piano string,” Journal of the Acoustical Society of America, vol. 36, no. 1, pp. 203–209, 1964.
[28] H. Fletcher, E. D. Blackham, and R. Stratton, “Quality of piano tones,” Journal of the Acoustical Society of America, vol. 34, no. 6, pp. 749–761, 1962.
[29] D. Rocchesso and F. Scalcon, “Accurate dispersion simulation for piano strings,” in Proc. Nordic Acoustical Meeting, pp. 407–414, Helsinki, Finland, June 1996.
[30] D. Rocchesso and F. Scalcon, “Bandwidth of perceived inharmonicity for physical modeling of dispersive strings,” IEEE Trans. Speech and Audio Processing, vol. 7, no. 5, pp. 597–601, 1999.
[31] I. Testa, G. Evangelista, and S. Cavaliere, “A physical model of stiff strings,” in Proc. Institute of Acoustics (Internat. Symp. on Music and Acoustics), vol. 19, pp. 219–224, Edinburgh, UK, August 1997.
[32] S. Cavaliere and G. Evangelista, “Deterministic least squares estimation of the Karplus-Strong synthesis parameter,” in Proc. International Workshop on Physical Model Synthesis, pp. 15–19, Firenze, Italy, June 1996.
[33] G. Evangelista and S. Cavaliere, “Discrete frequency warped wavelets: theory and applications,” IEEE Trans. Signal Processing, vol. 46, no. 4, pp. 874–885, 1998.
[34] A. Harma, M. Karjalainen, L. Savioja, V. Valimaki, U. K. Laine, and J. Huopaniemi, “Frequency-warped signal processing for audio applications,” Journal of the Audio Engineering Society, vol. 48, no. 11, pp. 1011–1031, 2000.
[35] N. H. Fletcher and T. D. Rossing, Principles of Vibration and Sound, Springer-Verlag, New York, NY, USA, 1995.
[36] L. D. Landau and E. M. Lifsits, Theory of Elasticity, Editions Mir, Moscow, Russia, 1967.
[37] N. Dunford and J. T. Schwartz, Linear Operators. Part 2: Spectral Theory, Self Adjoint Operators in Hilbert Space, John Wiley & Sons, New York, NY, USA, 1st edition, 1963.
[38] I. Testa, Sintesi del suono generato dalle corde vibranti: un algoritmo basato su un modello dispersivo, Physics degree thesis, Universita Federico II di Napoli, Napoli, Italy, 1997.
[39] H. W. Strube, “Linear prediction on a warped frequency scale,” Journal of the Acoustical Society of America, vol. 68, no. 4, pp. 1071–1076, 1980.
[40] J. A. Moorer, “The manifold joys of conformal mapping: applications to digital filtering in the studio,” Journal of the Audio Engineering Society, vol. 31, no. 11, pp. 826–841, 1983.
[41] J.-M. Jot and A. Chaigne, “Digital delay networks for designing artificial reverberators,” in Proc. 90th Convention Audio Engineering Society, Paris, France, preprint no. 3030, February 1991.
[42] M. Karjalainen, A. Harma, and U. K. Laine, “Realizable warped IIR filters and their properties,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 3, pp. 2205–2208, Munich, Germany, April 1997.
[43] A. Harma, “Implementation of recursive filters having delay free loops,” in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 3, pp. 1261–1264, Seattle, Wash, USA, May 1998.
[44] P. W. Broome, “Discrete orthonormal sequences,” Journal of the ACM, vol. 12, no. 2, pp. 151–168, 1965.
[45] A. V. Oppenheim, D. H. Johnson, and K. Steiglitz, “Computation of spectra with unequal resolution using the fast Fourier transform,” Proceedings of the IEEE, vol. 59, pp. 299–301, 1971.
[46] G. Evangelista and S. Cavaliere, “Audio effects based on biorthogonal time-varying frequency warping,” EURASIP Journal on Applied Signal Processing, vol. 2001, no. 1, pp. 27–35, 2001.
[47] G. Evangelista and S. Cavaliere, “Auditory modeling via frequency warped wavelet transform,” in Proc. European Signal Processing Conference, vol. I, pp. 117–120, Rhodes, Greece, September 1998.
[48] G. Evangelista and S. Cavaliere, “Dispersive and pitch-synchronous processing of sounds,” in Proc. Digital Audio Effects Workshop, pp. 232–236, Barcelona, Spain, November 1998.
[49] G. Evangelista and S. Cavaliere, “Analysis and regularization of inharmonic sounds via pitch-synchronous frequency warped wavelets,” in Proc. 1997 International Computer Music Conference, pp. 51–54, Thessaloniki, Greece, September 1997.
[50] L. Savioja and V. Valimaki, “Reducing the dispersion error in the digital waveguide mesh using interpolation and frequency-warping techniques,” IEEE Trans. Speech and Audio Processing, vol. 8, no. 2, pp. 184–194, 2000.
[51] L. Savioja and V. Valimaki, “Multiwarping for enhancing the frequency accuracy of digital waveguide mesh simulations,” IEEE Signal Processing Letters, vol. 8, no. 5, pp. 134–136, 2001.
[52] G. Evangelista, Dyadic Warped Wavelets, vol. 117 of Advances in Imaging and Electron Physics, Academic Press, New York, NY, USA, 2001.


I. Testa was born in Napoli, Italy, on September 21, 1973. He received the Laurea in Physics from the University of Napoli “Federico II” in 1997 with a dissertation on physical modeling of vibrating strings. In the following years, he has been engaged in didactics-of-physics research, in the field of secondary school teacher training on the use of computer-based activities and in teaching computer architecture for the information sciences course. He is currently teaching “electronics and telecommunications” at the Vocational School Galileo Ferraris, Napoli.

G. Evangelista received the Laurea in physics (with the highest honors) from the University of Napoli, Napoli, Italy, in 1984, and the M.S. and Ph.D. degrees in electrical engineering from the University of California, Irvine, in 1987 and 1990, respectively. Since 1995, he has been an Assistant Professor with the Department of Physical Sciences, University of Napoli “Federico II”. From 1998 to 2002, he was a Scientific Adjunct with the Laboratory for Audiovisual Communications, Swiss Federal Institute of Technology, Lausanne, Switzerland. From 1985 to 1986, he worked at the Centre d'Etudes de Mathematique et Acoustique Musicale (CEMAMu/CNET), Paris, France, where he contributed to the development of a DSP-based sound synthesis system, and from 1991 to 1994, he was a Research Engineer at the Microgravity Advanced Research and Support Center, Napoli, where he was engaged in research in image processing applied to fluid motion analysis and material science. His interests include digital audio, speech, music, and image processing; coding; wavelets and multirate signal processing. Dr. Evangelista was a recipient of the Fulbright Fellowship.

S. Cavaliere received the Laurea in electronic engineering (with the highest honors) from the University of Napoli “Federico II”, Napoli, Italy, in 1971. Since 1974, he has been with the Department of Physical Sciences, University of Napoli, first as a Research Associate and then as an Associate Professor. From 1972 to 1973, he was with CNR at the University of Siena. In 1986, he spent an academic year at the Media Laboratory, Massachusetts Institute of Technology, Cambridge. From 1987 to 1991, he received a research grant for a project devoted to the design of VLSI chips for real-time sound processing and for the realization of the Musical Audio Research Station, a workstation for sound manipulation, IRIS, Rome, Italy. He has also been a Research Associate with INFN for the realization of very large systems for data acquisition from nuclear physics experiments (KLOE in Frascati and ARGO in Tibet) and for the development of techniques for the detection of signals in high-level noise in the Virgo experiment. His interests include sound and music signal processing, in particular for the Web, signal transforms and representations, VLSI, and specialized computers for sound manipulation.


EURASIP Journal on Applied Signal Processing 2004:7, 978–989
© 2004 Hindawi Publishing Corporation

Digital Waveguides versus Finite Difference Structures: Equivalence and Mixed Modeling

Matti Karjalainen
Laboratory of Acoustics and Audio Signal Processing, Helsinki University of Technology, 02150 Espoo, Finland
Email: [email protected]

Cumhur Erkut
Laboratory of Acoustics and Audio Signal Processing, Helsinki University of Technology, 02150 Espoo, Finland
Email: [email protected]

Received 30 June 2003; Revised 4 December 2003

Digital waveguides and finite difference time domain schemes have been used in physical modeling of spatially distributed systems. Both of them are known to provide exact modeling of ideal one-dimensional (1D) band-limited wave propagation, and both of them can be composed to approximate two-dimensional (2D) and three-dimensional (3D) mesh structures. Their equal capabilities in physical modeling have been shown for special cases and have been assumed to cover generalized cases as well. The ability to form mixed models by joining substructures of both classes through converter elements has been proposed recently. In this paper, we formulate a general digital signal processing (DSP)-oriented framework where the functional equivalence of these two approaches is systematically elaborated and the conditions of building mixed models are studied. An example of mixed modeling of a 2D waveguide is presented.

Keywords and phrases: acoustic signal processing, hybrid models, digital waveguides, scattering, FDTD model structures.

1. INTRODUCTION

Discrete-time simulation of spatially distributed acoustic systems for sound and voice synthesis finds its roots both in modeling of speech production and musical instruments. The Kelly-Lochbaum vocal tract model [1] introduced a one-dimensional transmission line simulation of speech production with two-directional delay lines and scattering junctions for nonhomogeneous vocal tract profiles. Delay sections discretize the d'Alembert solution of the wave equation [2] and the scattering junctions implement the acoustic continuity laws of pressure and volume velocity in a tube of varying diameter. Further simplification led to the synthesis models used as the basis for linear prediction of speech [3].

A similar modeling approach to musical instruments, such as string and wind instruments, was formulated later and named the technique of digital waveguides (DWGs) [4, 5]. For computational efficiency reasons, in DWGs two-directional delay lines are often reduced to single delay loops [6]. DWGs have been further discussed in two-dimensional (2D) and three-dimensional (3D) modeling [5, 7, 8, 9, 10], sometimes combined with a finite difference approach into DWG meshes.

Finite difference schemes [11] were introduced to the simulation of the vibrating string as a numerical integration solution of the wave equation [12, 13], and the approach has been developed further, for example, in [14] as a finite difference time domain (FDTD) simulation. The second-order finite difference scheme including propagation losses was formulated as a digital filter structure in [15], and its stability issues were discussed in [16]. This particular structure is the main focus of the finite difference discussions in the rest of this paper, and we will refer to it as the FDTD model structure.

DWG and FDTD approaches to discrete-time simulation of spatially distributed systems show a high degree of functional equivalence. As discussed in [5], in the one-dimensional band-limited case, the ideal wave propagation can be exactly modeled by both methods. The basic difference is that the FDTD model structures process the signals as they are, whereas DWGs process their wave decomposition. There are other known differences between DWGs and FDTD model structures. One of them is the instabilities (“spurious” responses) found in FDTD model structures, but not in DWGs, in response to specific excitations. Another difference is the numeric behavior in finite precision computation.


Comparison of these two different paradigms has been developed further in [10, 17, 18]. In [17], the interesting and important possibility of building mixed models with submodels of DWG and FDTD types was introduced, and it was generalized to elements with arbitrary wave impedances in [18]. The problem of functional comparison and compatibility analysis has remained, however, and is the topic of this paper.

The rest of the paper is organized as follows. Section 2 provides the background information and notation that will be used in the following sections. A summary of wave-based modeling and finite difference modeling is also included in this section. Section 3 provides the derivation of the FDTD model structures, including the source terms, scattering, and the continuity laws. Based on the wave equation in the acoustical domain, this section highlights the functional equivalence of DWGs and FDTD model structures. It also presents a way of building mixed models. The formal proofs of equivalence are provided in the appendix. Section 4 is devoted to real-time implementation of mixed models. Finally, Section 5 draws conclusions and indicates future directions.

2. BACKGROUND

Sound synthesis algorithms that simulate spatially distributed acoustic systems usually provide discrete-time solutions to a hyperbolic partial differential equation, that is, the wave equation. According to the domain of simulation, the variables correspond to different physical quantities. The physical variables may further be characterized by their mathematical nature. An across variable is defined here to describe a difference between two values of an irrotational potential function (a function that integrates or sums up to zero over closed trajectories), whereas a through variable is defined here to describe a solenoidal function (a quantity that integrates or sums up to zero over closed surfaces). For example, in the acoustical domain, the deviation from the steady-state pressure p(x, t) is an across variable and the volume velocity u(x, t) is a through variable, where x is the spatial vector variable and t is the temporal scalar variable. Similarly, in the mechanical domain, the across variable is the force and the through variable is the velocity. The ratio of the across and through variables yields the impedance Z. The admittance is the inverse of Z, that is, Y = 1/Z.

In a one-dimensional (1D) medium, the spatial vector variable reduces to a scalar variable x, so that in a homogeneous, lossless, unbounded, and source-free medium the wave equation is written

\[
y_{tt} = c^2 y_{xx}, \tag{1}
\]

where y is a physical variable, the subscript tt refers to the second partial derivative in time t, xx to the second partial derivative in the spatial variable x, and c is the speed of the wavefront in the medium of interest. For example, in the mechanical domain (e.g., a vibrating string), we are primarily interested in transversal wave motion, for which $c = \sqrt{T/\mu}$, where T is the tension force and µ is the mass per unit length of the string [2]. The impedance is closely related to the tension T, the mass density µ, and the propagation speed c, and is given by $Z = \sqrt{T\mu} = T/c$.

In the acoustical domain, the admittance is also related to the acoustical propagation speed c. For instance, the admittance of a tube with a constant cross-section area A is given by

\[
Y = \frac{A}{\rho c}, \tag{2}
\]

where ρ is the gas density in the tube. The two common forms of discretizing the wave equation for numerical simulation are through the traveling wave solution and by finite difference formulation.

2.1. Wave-based modeling

The traveling wave formulation is based on the d'Alembert solution as the propagation of two opposite-direction waves, that is,

\[
y(x, t) = \overrightarrow{y}(x - ct) + \overleftarrow{y}(x + ct). \tag{3}
\]

Here, the arrows denote the right-going and left-going components of the total waveform. Assuming that the signals are bandlimited to half of the sampling rate, we may sample the traveling waves without losing any information by selecting T as the sample interval and X as the position interval between samples, so that T = X/c. Sampling is applied on a discrete time-space grid in which n and k are related to time and position, respectively. The discretized version of (3) becomes [5]

\[
y(k, n) = \overrightarrow{y}(k - n) + \overleftarrow{y}(k + n). \tag{4}
\]

It follows that the wave propagation can be computed by updating the state variables in two delay lines by

\[
\overrightarrow{y}_{k,n+1} = \overrightarrow{y}_{k-1,n}, \qquad \overleftarrow{y}_{k,n+1} = \overleftarrow{y}_{k+1,n}, \tag{5}
\]

that is, by simply shifting the samples to the right and to the left, respectively. The shift is implemented with a pair of delay lines, and this kind of discrete-time modeling is called DWG modeling [5]. Since the physical variables are split into directional wave components, we will refer to such models as W-models. According to (3) or (4), a single physical variable (either through or across) is computed by summing the traveling waves, whereas the other one may be computed implicitly via the impedance.
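A minimal sketch of the W-model update (5): two delay lines shifted in opposite directions, with the physical variable read out as their sum per (4). The ideal lossless case with inverting (fixed-end) terminations is assumed:

```python
import numpy as np

K = 50                               # number of spatial samples
y_r = np.zeros(K)                    # right-going wave
y_l = np.zeros(K)                    # left-going wave
y_r[K // 3] = y_l[K // 3] = 0.5      # initial displacement split into both waves

for n in range(200):
    # One time step of (5): shift the delay lines in opposite directions;
    # inverting terminations (fixed ends) assumed at k = 0 and k = K-1.
    end_r, end_l = y_r[-1], y_l[0]
    y_r[1:] = y_r[:-1]
    y_l[:-1] = y_l[1:]
    y_r[0] = -end_l                  # reflection at the left end
    y_l[-1] = -end_r                 # reflection at the right end

y = y_r + y_l                        # physical variable, cf. (4)
```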

If the medium is nonhomogeneous, the admittance varies as a function of the spatial variable. In this case, the energy transfer between the wave components should be computed according to Kirchhoff-type continuity laws, ensuring that the total energy is preserved. These laws may be derived utilizing the irrotational and solenoidal nature of across and through variables, respectively. In the DWG equivalent, a change in Y across a junction of the waveguide sections causes scattering, and the scattering junctions of interconnected ports, with given admittances and wave variables, have to be formulated [5]. For instance, in a parallel junction of N waveguides in the acoustical domain, the Kirchhoff constraints are

\[
P_1 = P_2 = \cdots = P_N = P_J, \qquad U_1 + U_2 + \cdots + U_N + U_{\text{ext}} = 0, \tag{6}
\]

where Pᵢ and Uᵢ are the total pressure and volume velocity of the ith branch,¹ respectively, P_J is the common pressure of the coupled branches, and U_ext is an external volume velocity input to the junction. Such a junction is illustrated in Figure 1. When the port pressures are represented by incoming wave components Pᵢ⁺, the outgoing wave components by Pᵢ⁻, the admittances attached to each port by Yᵢ, and

\[
P_i = P_i^+ + P_i^-, \qquad U_i^+ = Y_i P_i^+, \tag{7}
\]

the junction pressure P_J can be obtained as

\[
P_J = \frac{1}{Y_{\text{tot}}}\left(U_{\text{ext}} + 2\sum_{i=1}^{N} Y_i P_i^+\right), \tag{8}
\]

where $Y_{\text{tot}} = \sum_{i=1}^{N} Y_i$ is the sum of all admittances connected to the junction. The outgoing pressure waves are obtained from (7) to yield $P_i^- = P_J - P_i^+$. The resulting junction, a W-node, is depicted in Figure 2. Delay lines or termination admittances (see the appendix) are connected to the W-ports of a W-node.
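One update of the parallel junction per (7) and (8); the port values below are arbitrary placeholders, and the final assertion checks the Kirchhoff constraint (6) under the usual convention Uᵢ = Yᵢ(Pᵢ⁺ − Pᵢ⁻):

```python
import numpy as np

Y = np.array([1.0, 0.5, 2.0])        # port admittances Y_i (placeholders)
p_in = np.array([0.3, -0.1, 0.2])    # incoming pressure waves P_i^+
u_ext = 0.05                         # external volume velocity U_ext

# Junction pressure (8): P_J = (U_ext + 2 sum_i Y_i P_i^+) / Y_tot.
p_j = (u_ext + 2.0 * np.dot(Y, p_in)) / np.sum(Y)

# Outgoing waves, from (7): P_i^- = P_J - P_i^+.
p_out = p_j - p_in

# Volume velocity continuity (6), with U_i = Y_i (P_i^+ - P_i^-).
assert np.isclose(np.sum(Y * (p_in - p_out)) + u_ext, 0.0)
```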

A useful addition to DWG theory is to adopt wave digital filters (WDFs) [10, 19] as discrete-time simulators of lumped parameter elements. Being based on W-modeling, they are computationally compatible with the W-type DWGs [10, 18, 20].

2.2. Finite difference modeling

In the most commonly used way to discretize the wave equation by finite differences, the partial derivatives in (1) are approximated by centered differences. The centered difference approximation to the spatial partial derivative y_x is given by [11]

\[
y_x \approx \frac{y(x + \Delta x/2,\, t) - y(x - \Delta x/2,\, t)}{\Delta x}, \tag{9}
\]

where Δx is the spatial sampling interval. A similar expression is obtained for the temporal partial derivative if x is kept constant and t is replaced by t ± Δt, where Δt is the discrete-time sampling interval. Iterating the difference approximations, the second-order partial derivatives in (1) are approximated by

\[
y_{xx} \approx \frac{y_{x+\Delta x,t} - 2y_{x,t} + y_{x-\Delta x,t}}{\Delta x^2}, \qquad
y_{tt} \approx \frac{y_{x,t+\Delta t} - 2y_{x,t} + y_{x,t-\Delta t}}{\Delta t^2}, \tag{10}
\]

¹ Note that capital letters denote a transform variable. For instance, Pᵢ is the z-transform of the signal pᵢ(n).

[Figure 1: Parallel junction of admittances Yᵢ with the associated pressure waves P⁺ᵢ and P⁻ᵢ indicated; a volume velocity input U_ext is also attached to the junction, where the common pressure is P_J.]

where the short-hand notation y_{x,t} is used instead of y(x, t). By selecting Δt = Δx/c and using the index notation k = x/Δx and n = t/Δt, (10) results in

\[
y_{k,n+1} = y_{k-1,n} + y_{k+1,n} - y_{k,n-1}. \tag{11}
\]

From (11) we can see that a new sample y_{k,n+1} at position k and time index n + 1 is computed as the sum of its neighboring position values minus the value at the position itself one sample period earlier. Since y_{k,n+1} is a physical variable, we will refer to models based on finite differences as K-models, with a reference to the Kirchhoff type of physical variables.
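The corresponding K-model update (11) operates on the physical variable directly, the same propagation as the W-model sketch above; fixed (y = 0) ends are assumed:

```python
import numpy as np

K = 50
y_prev = np.zeros(K)                 # y_{k, n-1}
y_curr = np.zeros(K)                 # y_{k, n}
y_curr[K // 3] = 1.0                 # initial displacement

for n in range(200):
    y_next = np.zeros(K)
    # Interior update (11): y_{k,n+1} = y_{k-1,n} + y_{k+1,n} - y_{k,n-1};
    # fixed (y = 0) boundaries assumed at k = 0 and k = K-1.
    y_next[1:-1] = y_curr[:-2] + y_curr[2:] - y_prev[1:-1]
    y_prev, y_curr = y_curr, y_next
```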

3. FORMULATION OF THE FDTD MODEL STRUCTURE

The equivalence of the traveling wave and finite difference solutions of the ideal wave equation (given in (5) and (11), respectively) has been shown, for instance, in [5]. Based on this functional equivalence, (11) has previously been expanded, without a formal derivation, to a scattering junction with arbitrary port impedances, where (8) is used as a template for the expansion [18]. The resulting FDTD model structure is illustrated in Figure 3 for a three-port junction. A comparison of the FDTD model structure in Figure 3 and the DWG scattering junction in Figure 2 reveals the functional similarities of the two methods. However, a formal, generalized, and unified derivation of the FDTD model structure without an explicit reference to the DWG method remains to be presented. This section presents such a derivation based on the equations of motion of the gas in a tube. Note that, because of the analogy between different physical domains, once the formulation is derived, it can be used in different domains as well. Therefore, the derivation below is not limited to the acoustical domain and the resulting structure can also be used in other domains.

3.1. Source terms

[Figure 2: (a) N-port scattering junction (three ports shown) of ports with admittances Yᵢ; incoming and outgoing pressure waves are P⁺ᵢ and P⁻ᵢ, respectively, and W-port 1 is terminated by the admittance Y₁. (b) Abstract representation of the W-node in (a).]

[Figure 3: (a) Digital filter structure for the finite difference approximation of a three-port scattering node with port admittances Yᵢ; only the total junction pressure P_J (K-variable) is explicitly available. (b) Abstract representation of the K-node in (a).]

In order to explain the excitation U_ext and the associated filter H(z) = 1 − z⁻² in Figure 3, we consider a piece of tube of constant cross-sectional area A that includes an ideal volume velocity source s(t). The pressure p and the volume velocity u (the variables in the acoustical domain, as explained in the previous section) satisfy the following PDE set:

\[
\rho\frac{\partial u}{\partial t} + A\frac{\partial p}{\partial x} = 0, \qquad
\frac{A}{\rho c^2}\frac{\partial p}{\partial t} + \frac{\partial u}{\partial x} = s, \tag{12}
\]

where ρ is the gas density and c is the propagation speed. This set may be combined to yield a single PDE in p and the source term:

\[
\frac{\partial^2 p}{\partial t^2} - \frac{\rho c^2}{A}\frac{\partial s}{\partial t} = c^2\frac{\partial^2 p}{\partial x^2}. \tag{13}
\]

Defining

\[
s(t) = \frac{1}{2}\left(s\left(t - \frac{\Delta t}{2}\right) + s\left(t + \frac{\Delta t}{2}\right)\right) + O\left(\Delta t^2\right), \tag{14}
\]

using the index notation k = x/Δx and n = t/Δt, and applying centered differences (see Section 2.2) to (13) with Δx/Δt = c, yields the following difference equation:

\[
\begin{aligned}
p_k(n+1) = {}& p_{k+1}(n) + p_{k-1}(n) - p_k(n-1)\\
&+ \frac{\rho c\,\Delta x}{2A}\left(s_k(n+1) - s_k(n-1)\right).
\end{aligned} \tag{15}
\]

Note that ρc/A is the acoustic impedance that converts the volume velocity source s(t) to the pressure. Since the model output is the pressure at time step n + 1, it follows that the source is delayed by two samples, subtracted from its current value, and scaled, corresponding to the filter 1 − z⁻² for U_ext in Figure 3.

3.2. Admittance discontinuity and scattering

Now consider an unbounded, source-free tube with a cross section A(x) that is a smooth real function of the spatial variable x. In this case, the governing PDEs can be combined into a single PDE in the pressure alone [10],

\[
\frac{\partial^2 p}{\partial t^2} = \frac{c^2}{A(x)}\frac{\partial}{\partial x}\left(A(x)\frac{\partial p}{\partial x}\right), \tag{16}
\]

which is the Webster horn equation. Discretizing this equation by centered differences yields the following difference equation:

\[
\frac{p_k(n+1) - 2p_k(n) + p_k(n-1)}{\Delta t^2}
= \frac{c^2}{A_k}\,\frac{A_{k+1/2}\left(p_{k+1}(n) - p_k(n)\right) - A_{k-1/2}\left(p_k(n) - p_{k-1}(n)\right)}{\Delta x^2}, \tag{17}
\]

where A_k = A(kΔx). By selecting Δx = cΔt and using the approximation

\[
A_k = \frac{1}{2}\left(A_{k-1/2} + A_{k+1/2}\right) + O\left(\Delta x^2\right) \tag{18}
\]

twice, (17) becomes

\[
p_k(n+1) + p_k(n-1)
= \frac{2}{A_{k-1/2} + A_{k+1/2}}\left(A_{k-1/2}\,p_{k-1}(n) + A_{k+1/2}\,p_{k+1}(n)\right). \tag{19}
\]

Finally, by defining $Y_{k\mp1} = A_{k\mp1/2}/\rho c$, we obtain

\[
p_k(n+1) + p_k(n-1) = \frac{2}{Y_{\text{tot}}}\left(Y_{k-1}\,p_{k-1}(n) + Y_{k+1}\,p_{k+1}(n)\right), \tag{20}
\]

where the term Y_tot = Y_{k−1} + Y_{k+1} may be interpreted as the sum of all admittances connected to the kth cell. This recursion is implemented with the filter structure illustrated in Figure 4. The output of the structure is the junction pressure p_{J,k}(n). It is worth noting that (20) is functionally the same as the DWG scattering representation given in (8), if the admittances are real. A more general case of complex admittances has been considered in the appendix. Whereas the DWG formulation can easily be extended to N-port junctions, this extension is not necessarily possible for a K-model, where the continuity laws are generally not satisfied. In the next subsection, we investigate the continuity laws within the FDTD model structure.

3.3. Continuity laws

We denote the pressure across the impedance $1/\sum Y_i$ by p_a(n), and the volume velocity through the same impedance by u_t(n), with reference to Figure 4. According to these notations, Ohm's law in the acoustical domain yields

\[
p_a(n) = \frac{u_t(n)}{Y_{\text{tot}}}, \tag{21}
\]

whereas the Kirchhoff continuity laws can be written as

\[
p_a(n) = p_k(n+1) + p_k(n-1), \tag{22}
\]
\[
u_t(n) = 2Y_{k-1}\,p_{k-1}(n) + 2Y_{k+1}\,p_{k+1}(n). \tag{23}
\]

Inserting (21) into (23) eliminates u_t(n), and the result may be combined with (22) to give the following equation for the combined continuity laws:

\[
p_k(n+1) + p_k(n-1) = \frac{2}{Y_{\text{tot}}}\left(Y_{k-1}\,p_{k-1}(n) + Y_{k+1}\,p_{k+1}(n)\right). \tag{24}
\]

This relation is exactly the recursion of the FDTD model structure given in (20), but it is obtained here solely from the continuity laws. We thus conclude that the continuity laws are automatically satisfied by the FDTD model structure of Figure 4.

It is worth noting that more ports may be added to the structure without violating the continuity laws for any number of linear, time-invariant (LTI) admittances, as long as $Y_{\text{tot}} = \sum Y_i$. For N ports connected to the ith cell, (23) becomes

\[
U_t = 2\sum_{i=1}^{N} z^{-1} Y_i P_{J,i}, \tag{25}
\]

[Figure 4: Digital filter structure for the finite difference approximation of an unbounded, source-free tube with a spatially varying cross section; the junction pressure p_{J,k} = p_k(n + 1) is formed from the neighboring pressures p_{k−1}(n) and p_{k+1}(n) through the admittances Y_{k−1} and Y_{k+1} and the impedance 1/∑Yᵢ.]

Figure 5: FDTD node (left) and a DWG node (right) forming a part of a hybrid waveguide. There is a KW-converter between the K- and W-models. Y_i are the wave admittances of the W-lines, K-pipes, and the KW-converter between the junction nodes. P_1 and P_2 are the junction pressures of the K-node and W-node, respectively.

and the recursion in (24) can be expressed in the z-domain as

P_{J,k} + z^{-2}P_{J,k} = \frac{2}{\sum Y_i}\sum_{i=1}^{N} z^{-1} Y_i P_{J,i}.   (26)

The superposition of the excitation block in (14) and the N-port formulation above completes the formulation of the FDTD model structure. In particular, by setting N = 3, the digital filter structure in Figure 3 is obtained.

3.4. Construction of mixed models

An essential difference between the DWGs of Figure 2 and the FDTD model structures of Figure 3 is that while DWG junctions are connected through two-directional delay lines (W-lines), FDTD nodes have two unit delays of internal memory and delay-free K-pipes connecting ports between nodes. These junction nodes and ports are thus not directly compatible. The next question is the possibility of interfacing these submodels. The interconnection of a lossy FDTD model structure and a similar DWG has been tackled in [17]. A proper interconnection element (converter) has been proposed for the resulting hybrid model in this special case. A generalization has been proposed in [18], which allows any hybrid model of K-elements (FDTD) and W-elements having arbitrary wave admittances/impedances at their ports (see also [21]).

Here, we derive how a hybrid model (shown in Figure 5) can be constructed in a 1D waveguide between a K-node N_1 (left) and a W-node N_2 (right), aligned with the spatial grids k = 1 and 2, respectively. The derivation is based on the fact that the junction pressures are available in both types of nodes, but in the DWG case not at the W-ports.

If N_1 and N_2 were both W-nodes (see Figure 8 in the appendix), the traveling wave entering node N_2 could be calculated as

P_2^+ = z^{-1}P_1^- = z^{-1}\bigl(P_1 - z^{-1}P_2^-\bigr) = z^{-1}P_1 - z^{-2}P_2^-.   (27)

Note that P_1 is available in the K-node N_1 in Figure 5. Conversely, if N_1 and N_2 were both K-nodes, the junction pressure z^{-1}P_2 would be needed for the calculation of P_1 (see Figure 10 in the appendix). Although P_2 is implicitly available in N_2, it can also be obtained by summing up the wave


Figure 6: Part of a 2D waveguide mesh composed of (a) K-type FDTD elements (left bottom): K-pipes (kp) and K-nodes (k); (b) W-type DWG elements (top and right): delay-controllable W-lines (wl), W-nodes (w), and terminating admittances (yt); and (c) converter elements (kw) to connect K- and W-type elements into a mixed model.

components within the converter:

z^{-1}P_2 = z^{-1}\bigl(P_2^+ + P_2^-\bigr).   (28)

Equation (27) may be inserted in (28) to yield the following transfer matrix of the 2-port KW-converter element:

\begin{bmatrix} P_2^+ \\ z^{-1}P_2 \end{bmatrix} = \begin{bmatrix} 1 & -z^{-2} \\ 1 & (1 - z^{-2}) \end{bmatrix} \begin{bmatrix} z^{-1}P_1 \\ P_2^- \end{bmatrix}.   (29)

The KW-converter in Figure 5 essentially performs the calculations given in (29) and interconnects the K-type port of an FDTD node and the W-type port of a DWG node. The signal behavior in a mixed modeling structure is further investigated in the appendix.
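To make the converter operation concrete, the following Python sketch (illustrative names only; this is not the BlockCompiler interface) evaluates (29) one sample at a time. The K-side supplies its junction pressure P_1, the W-side its incoming wave P_2^-, and the converter returns P_2^+ together with the delayed junction pressure z^{-1}P_2 needed by the K-side.

class KWConverter:
    def __init__(self):
        self.p1_d1 = 0.0        # z^{-1} P1
        self.p2m_d1 = 0.0       # z^{-1} P2^-
        self.p2m_d2 = 0.0       # z^{-2} P2^-

    def step(self, p1, p2_minus):
        # Rows of (29): P2^+ = z^{-1}P1 - z^{-2}P2^-,
        #               z^{-1}P2 = z^{-1}P1 + (1 - z^{-2})P2^-.
        p2_plus = self.p1_d1 - self.p2m_d2
        p2_delayed = self.p1_d1 + p2_minus - self.p2m_d2
        self.p2m_d2 = self.p2m_d1   # shift the delay elements
        self.p2m_d1 = p2_minus
        self.p1_d1 = p1
        return p2_plus, p2_delayed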

4. IMPLEMENTATION OF MIXED MODELS

The functional equivalence and mixed modeling paradigm of DWGs and FDTDs presented above allow for flexible building of physical models from K- and W-type substructures. In this way, it is possible to exploit the advantages of each type. In this section, we explore a simple example of a digital waveguide model that shows how mixed models can be built. Before that, a short discussion of the pros and cons of the different paradigms in practical realizations is presented.

4.1. K-modeling versus W-modeling, pros and cons

An advantage of W-modeling is its numerical robustness. With proper formulation, stability is guaranteed even with fixed-point arithmetic [5, 19]. Another useful property is the relatively straightforward use of fractional delays [22] when building digital waveguides, which makes, for example, tuning and run-time variation of musical instrument models convenient. In general, W-modeling seems to be the right choice in most 1D cases.

The advantages of K-modeling by FDTD waveguides are found when realizing mesh-like structures, such as 2D and 3D meshes [7, 8]. In such cases, the number of unit delays (memory positions) per node is two for any dimensionality, while for a DWG mesh it is two times the dimensionality of the mesh. A disadvantage of FDTDs is their inherent lack of numerical robustness and their tendency toward instability for signal frequencies near DC and the Nyquist frequency. Furthermore, FDTD junction nodes cannot be made memoryless, which may be a limitation in nonlinear and parametrically varying models.

4.2. 2D waveguide mesh case

Figure 6 illustrates a part of a 2D mixed model structure that is based on a rectangular FDTD waveguide mesh for efficient and memory-saving computation, with DWG elements at the boundaries. Such a model could represent, for example, the membrane of a drum or, in a 3D case, a room enclosed by walls. When there is a need to attach W-type termination admittances to the model or to vary the propagation delays within the system, a change from K-elements to W-elements through converters is a useful property. Furthermore, variable-length delays can be used, for example, for passive nonlinearities at the terminations to simulate gongs and other instruments where nonlinear mode coupling takes place [23]. The same principle can be used to simulate shock waves in brass instrument bores [24]. In such cases, the delay lengths are made dependent on the signal value passing through the delay elements.

In Figure 6, the elements denoted by kp are K-type pipes between K-type nodes. Elements kw are K-to-W converters, and elements wl are W-lines, where the arrows indicate that they are controllable fractional delays. Elements yt are terminating admittances. In the general case, scattering can be controlled by varying the admittances, although the computational efficiency is improved if the admittances are made equal. On a modern PC, a 2D mesh of a few hundred elements can run in real time at full audio rate. With decimated computation, bigger models can be computed if a lower cutoff frequency is permitted, allowing large physical dimensions of the mesh.
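For the homogeneous interior of such a mesh (equal admittances at every node), the recursion (20) reduces to the familiar rectangular mesh update, in which each node needs only two time planes of memory. The Python sketch below is a minimal illustration under the assumption of fixed, zeroed boundary cells; the W-type boundary elements of Figure 6 are omitted.

import numpy as np

def mesh_step(p_now, p_prev):
    # p(n+1) = (1/2) * (sum of the four neighbors at time n) - p(n-1)
    p_next = np.zeros_like(p_now)
    p_next[1:-1, 1:-1] = (0.5 * (p_now[2:, 1:-1] + p_now[:-2, 1:-1]
                                 + p_now[1:-1, 2:] + p_now[1:-1, :-2])
                          - p_prev[1:-1, 1:-1])
    return p_next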

4.3. Mixed modeling in BlockCompiler

The development of the K- and W-models above has led to a systematic formulation of computational elements for both paradigms and for mixed modeling. The W-lines and K-pipes as well as the related junction nodes are useful abstractions for a formal specification of model implementation. We have developed a software tool for physical modeling called the BlockCompiler [20], which is designed in particular for flexible modeling and efficient real-time computation of the models.

The BlockCompiler contains two levels: (a) model creation and (b) model implementation. The model creation level is written in the Common Lisp programming language for maximal flexibility in symbolic object-based manipulation of model structures. A set of DSP-oriented and physics-oriented computational blocks is available. New block classes can be created either as macro classes composed of predefined elementary blocks or by writing new elementary blocks. The blocks are connected through ports: inputs and outputs for DSP blocks and K- or W-type ports for physical blocks. A fully interconnected model is called a patch.

The model implementation level is a code generator that does the scheduling of the blocks, writes C source code into a file, compiles it on the fly, and allows for streaming sound in real time or computation by stepping in a sample-by-sample mode. The C code can also be exported to other platforms, such as the Mustajuuri audio platform [25] and pd [26]. Sound examples of mixed models can be found at http://www.acoustics.hut.fi/demos/waveguide-modeling/.

5. SUMMARY AND CONCLUSIONS

This paper has presented a formulation of a specific FDTD model structure and shown its functional equivalence to DWGs. Furthermore, an example of mixed models consisting of FDTD and DWG blocks and converter elements has been reported. The formulation allows for high flexibility in building 1D or higher-dimensional physical models from interconnected blocks.

The DWG method is used as the primary example of wave-based methods in this paper. Naturally, the KW-converter formulation is applicable to any W-method, such as wave digital filters (WDFs) [19]. In the future, we plan to extend our examples to include WDF excitation blocks. Other important future directions are the analysis of the dynamic behavior of parametrically varying hybrid models, as well as benchmark tests for the computational costs of the proposed structures.

Matlab scripts and demos related to DWGs and FDTDs can be found at http://www.acoustics.hut.fi/demos/waveguide-modeling/.

APPENDIX

A. PROOFS OF EQUIVALENCE

The proofs of functional equivalence between the DWG and FDTD formulations used in this article are given below. A useful approach is based on the Thevenin and Norton theorems [27].

A.1. Termination in a DWG network

Passive termination of a DWG junction port by a given admittance Y is equivalent to attaching a delay line of infinite length and wave admittance Y. In the DWG case, this means an infinitely long sequence of admittance-matched unit delay lines. Since there is no back-scattering in finite time, we can use the left-side port termination of Figure 2, with zero volume velocity at the input terminal. Thus, the admittance filter Y_1 is not needed in the computation; it only has to be included when forming the filter 1/\sum Y_i.

A.2. Termination in an FDTD network

Deriving the passive port termination for an FDTD junction is not as obvious as for a DWG junction. We can again apply an infinitely long sequence of admittance-matched FDTD sections, as depicted in Figure 7 on the left-hand side. With the notations given and z-transforms of the variables and admittances, we can write

P_0 = \frac{2Y_1}{\sum Y_i} P_{-1} z^{-1} + \frac{2}{\sum Y_i}\sum_{i=2}^{M} Y_i P_i z^{-1} - P_0 z^{-2},   (A.1a)
P_{-1} = P_0 z^{-1} + P_{-2} z^{-1} - P_{-1} z^{-2},   (A.1b)
P_k = P_{k+1} z^{-1} + P_{k-1} z^{-1} - P_k z^{-2},  for k < -1,   (A.1c)

where P_i, i = 1, ..., M, are the pressures of all M neighboring junctions linked through admittances Y_i to junction i = 0, and P_k, k = 0, -1, -2, ..., are the pressures in the junctions between the admittance-matched elements chained as the termination of junction 0. By applying (A.1c) to (A.1b) iteratively for


Figure 7: FDTD structure terminated by an admittance-matched chain of FDTD elements on the left-hand side.

Figure 8: Structure for derivation of signal behavior in a DWG network.

k = 2, ..., N, we get

P_{-1} = P_0 z^{-1} + P_{-N-1} z^{-N} - P_{-N} z^{-N-1}.   (A.2)

When N → ∞, the last two terms cease to have an effect on P_{-1} in any finite time span, and they can thus be discarded. When the result P_{-1} = P_0 z^{-1} is used in (A.1a), we get

P_0 = \frac{2Y_1}{\sum Y_i} P_0 z^{-2} + \frac{2}{\sum Y_i}\sum_{i=2}^{M} Y_i P_i z^{-1} - P_0 z^{-2},   (A.3)

where the first term on the right-hand side can be interpreted as a way to implement the termination as a feedback through a unit delay, as illustrated in Figure 3 for the left-hand port of the FDTD junction.

A.3. Signal behavior in a DWG network

Figure 8 illustrates a case where an arbitrarily large interconnected DWG network is reduced so that only two scattering junctions, connected through a unit delay line of wave admittance Y_2, are shown explicitly. The Norton equivalent source U_ext feeds junction node 1, and the equivalent termination admittance is Y_1. Junction node 2 is terminated by a Norton equivalent admittance Y_3. Now, we derive the signal propagation from U_ext to junction pressure P_1 and the transmission ratio between pressures P_2 and P_1. If these "transfer functions" are equal for the DWG, the FDTD, and the mixed case with the KW-converter, the models are functionally equivalent

Figure 9: FDTD structure for derivation of the volume velocity source (U_ext) to junction pressure (P_J) transfer function.

for any topologies and parametric values equivalent between these cases. This is due to the superposition principle and the Norton theorem.


Figure 10: FDTD structure for derivation of the signal relation between two junction pressures.

Figure 11: Mixed modeling structure for derivation of the DWG to FDTD pressure relation.

From Figure 8, we can write directly for the propagation of the equivalent source U_ext to junction pressure P_1:

P_1 = \frac{U_{ext}}{Y_1 + Y_2}.   (A.4)

The signal transmission ratio between P_2 and P_1 can be derived from the following set of equations:

P_2 = \frac{2Y_2}{Y_2 + Y_3} P_1^- z^{-1},   (A.5a)
P_1^- = P_1 - P_2^- z^{-1},   (A.5b)
P_2^- = P_2 - P_1^- z^{-1}.   (A.5c)

By eliminating the wave variables P_1^- and P_2^-,

P_1^- = \frac{P_1 - P_2 z^{-1}}{1 - z^{-2}}, \qquad P_2^- = \frac{P_2 - P_1 z^{-1}}{1 - z^{-2}}, \qquad P_2 = \frac{2Y_2}{Y_2 + Y_3}\,\frac{\bigl(P_1 - P_2 z^{-1}\bigr) z^{-1}}{1 - z^{-2}},   (A.6)

and by solving for P_2/P_1, we get

\frac{P_2}{P_1} = \frac{2Y_2 z^{-1}}{Y_2 + Y_3 + \bigl(Y_2 - Y_3\bigr) z^{-2}}.   (A.7)

In the special case of an admittance match, Y_2 = Y_3, we get P_2/P_1 = z^{-1}. Forms (A.4) and (A.7) are now the reference for proving equivalence with the FDTD and mixed modeling cases.
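As a quick numerical cross-check (ours, not part of the original derivation), (A.7) can be evaluated on the unit circle; with matched admittances Y_2 = Y_3 it collapses to a pure unit delay, as stated above.

import numpy as np

def p2_over_p1(w, Y2, Y3):
    # Transfer ratio (A.7) evaluated at z = exp(jw).
    z = np.exp(1j * w)
    return 2.0 * Y2 / z / (Y2 + Y3 + (Y2 - Y3) / z**2)

w = np.linspace(0.01, np.pi - 0.01, 8)
print(np.allclose(p2_over_p1(w, 1.0, 1.0), np.exp(-1j * w)))  # prints True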

A.4. Signal behavior in an FDTD network

Using the notations in Figure 9, which shows a Norton equivalent for an FDTD network, we can write

P_J = \frac{U_{ext}}{Y_1 + Y_2}\bigl(1 - z^{-2}\bigr) - P_J z^{-2} + \frac{2Y_1}{Y_1 + Y_2} P_J z^{-2} + \frac{2Y_2}{Y_1 + Y_2} P_J z^{-2},   (A.8)

which after simplification yields

P_J = \frac{U_{ext}}{Y_1 + Y_2},   (A.9)

which is equivalent to the DWG form (A.4). Notice that the term (1 - z^{-2}) feeding U_ext into the node has zeros on the unit circle at angles nπ (n an integer), compensating the poles inherent in the FDTD backbone structure. This degrades the numerical robustness of the structure around these frequencies.

For the structure of two FDTD nodes in Figure 10, we can write the equation

P_2 = -P_2 z^{-2} + \frac{2Y_3}{Y_2 + Y_3} P_2 z^{-2} + \frac{2Y_2}{Y_2 + Y_3} P_1 z^{-1},   (A.10)


which simplifies to

\frac{P_2}{P_1} = \frac{2Y_2 z^{-1}}{Y_2 + Y_3 + \bigl(Y_2 - Y_3\bigr) z^{-2}},   (A.11)

which is equivalent to the DWG form (A.7). This completes the proof of the equivalence of the DWG and FDTD structures.

A.5. Signal behavior in a mixed modeling structure

To prove the equivalence of signal behavior also in the mixed modeling structure of Figure 5 with a KW-converter, we have to analyze the junction signal relations in both directions. We first prove the equivalence in the FDTD to DWG direction. According to Figure 5, we can write

P_2 = \frac{2Y_2}{Y_2 + Y_3} P_1 z^{-1} - \frac{2Y_2}{Y_2 + Y_3} P_2^- z^{-2}, \qquad P_2^- = P_2 - \bigl(P_1 z^{-1} - P_2^- z^{-2}\bigr).   (A.12)

Eliminating P_2^- and solving for P_2/P_1 again yields the form (A.7), proving the equivalence.

According to Figure 11, we can analyze the signal relationship in the DWG to FDTD direction by writing

P_2 = \frac{2Y_3}{Y_2 + Y_3} P_2 z^{-2} - P_2 z^{-2} + \frac{2Y_2}{Y_2 + Y_3}\bigl(P_1^- - P_1^- z^{-2} + P_2 z^{-1}\bigr) z^{-1}, \qquad P_1^- = P_1 - \bigl(P_2 z^{-1} - P_1^- z^{-2}\bigr).   (A.13)

By eliminating P_1^- and solving for P_2/P_1, we again get the form (A.7). This concludes the proof of the equivalence of the mixed modeling case to the corresponding DWG and thus also to the FDTD structures.

ACKNOWLEDGMENTS

This work is part of the Algorithms for the Modelling of Acoustic Interactions (ALMA) project (IST-2001-33059) and has been supported by the Academy of Finland as a part of the project "Technology for Audio and Speech Processing" (SA 53537).

REFERENCES

[1] J. L. Kelly and C. C. Lochbaum, "Speech synthesis," in Proc. 4th International Congress on Acoustics, pp. 1–4, Copenhagen, Denmark, September 1962.
[2] N. H. Fletcher and T. D. Rossing, The Physics of Musical Instruments, Springer-Verlag, New York, NY, USA, 2nd edition, 1998.
[3] J. D. Markel and A. H. Gray, Linear Prediction of Speech, Springer-Verlag, New York, NY, USA, 1976.
[4] J. O. Smith, "Physical modeling using digital waveguides," Computer Music Journal, vol. 16, no. 4, pp. 74–91, 1992.
[5] J. O. Smith, "Principles of digital waveguide models of musical instruments," in Applications of Digital Signal Processing to Audio and Acoustics, M. Kahrs and K. Brandenburg, Eds., pp. 417–466, Kluwer Academic Publishers, Boston, Mass, USA, 1998.
[6] M. Karjalainen, V. Valimaki, and T. Tolonen, "Plucked-string models: From the Karplus-Strong algorithm to digital waveguides and beyond," Computer Music Journal, vol. 22, no. 3, pp. 17–32, 1998.
[7] S. A. Van Duyne and J. O. Smith, "Physical modeling with the 2-D digital waveguide mesh," in Proc. International Computer Music Conference, pp. 40–47, Tokyo, Japan, September 1993.
[8] L. Savioja, T. J. Rinne, and T. Takala, "Simulation of room acoustics with a 3-D finite difference mesh," in Proc. International Computer Music Conference, pp. 463–466, Aarhus, Denmark, September 1994.
[9] L. Savioja, Modeling techniques for virtual acoustics, Ph.D. thesis, Helsinki University of Technology, Espoo, Finland, 1999.
[10] S. D. Bilbao, Wave and scattering methods for the numerical integration of partial differential equations, Ph.D. thesis, Stanford University, Stanford, Calif, USA, May 2001.
[11] J. C. Strikwerda, Finite Difference Schemes and Partial Differential Equations, Wadsworth and Brooks/Cole, Pacific Grove, Calif, USA, 1989.
[12] L. Hiller and P. Ruiz, "Synthesizing musical sounds by solving the wave equation for vibrating objects: Part 1," Journal of the Audio Engineering Society, vol. 19, no. 6, pp. 462–470, 1971.
[13] L. Hiller and P. Ruiz, "Synthesizing musical sounds by solving the wave equation for vibrating objects: Part 2," Journal of the Audio Engineering Society, vol. 19, no. 7, pp. 542–551, 1971.
[14] A. Chaigne, "On the use of finite differences for musical synthesis. Application to plucked stringed instruments," Journal d'Acoustique, vol. 5, no. 2, pp. 181–211, 1992.
[15] M. Karjalainen, "1-D digital waveguide modeling for improved sound synthesis," in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing, vol. 2, pp. 1869–1872, Orlando, Fla, USA, May 2002.
[16] C. Erkut and M. Karjalainen, "Virtual strings based on a 1-D FDTD waveguide model: Stability, losses, and traveling waves," in Proc. Audio Engineering Society 22nd International Conference on Virtual, Synthetic and Entertainment Audio, pp. 317–323, Espoo, Finland, June 2002.
[17] C. Erkut and M. Karjalainen, "Finite difference method vs. digital waveguide method in string instrument modeling and synthesis," in Proc. International Symposium on Musical Acoustics, Mexico City, Mexico, December 2002.
[18] M. Karjalainen, C. Erkut, and L. Savioja, "Compilation of unified physical models for efficient sound synthesis," in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing, vol. 5, pp. 433–436, Hong Kong, China, April 2003.
[19] A. Fettweis, "Wave digital filters: Theory and practice," Proc. IEEE, vol. 74, no. 2, pp. 270–327, 1986.
[20] M. Karjalainen, "BlockCompiler: Efficient simulation of acoustic and audio systems," in Proc. 114th Audio Engineering Society Convention, Amsterdam, Netherlands, March 2003, preprint 5756.
[21] M. Karjalainen, "Time-domain physical modeling and real-time synthesis using mixed modeling paradigms," in Proc. Stockholm Music Acoustics Conference, vol. 1, pp. 393–396, Stockholm, Sweden, August 2003.
[22] T. I. Laakso, V. Valimaki, M. Karjalainen, and U. K. Laine, "Splitting the unit delay - tools for fractional delay filter design," IEEE Signal Processing Magazine, vol. 13, no. 1, pp. 30–60, 1996.
[23] J. R. Pierce and S. A. Van Duyne, "A passive nonlinear digital filter design which facilitates physics-based sound synthesis of highly nonlinear musical instruments," Journal of the Acoustical Society of America, vol. 101, no. 2, pp. 1120–1126, 1997.
[24] R. Msallam, S. Dequidt, S. Tassart, and R. Causse, "Physical model of the trombone including nonlinear propagation effects," in Proc. International Symposium on Musical Acoustics, vol. 2, pp. 419–424, Edinburgh, Scotland, UK, August 1997.


[25] T. Ilmonen, "Mustajuuri - an application and toolkit for interactive audio processing," in Proc. International Conference on Auditory Display, pp. 284–285, Espoo, Finland, July 2001.
[26] M. Puckette, "Pure data," in Proc. International Computer Music Conference, pp. 224–227, Thessaloniki, Greece, September 1997.
[27] J. E. Brittain, "Thevenin's theorem," IEEE Spectrum, vol. 27, no. 3, p. 42, 1990.

Matti Karjalainen was born in Hankasalmi, Finland, in 1946. He received the M.S. and Dr.Tech. degrees in electrical engineering from the Tampere University of Technology in 1970 and 1978, respectively. Since 1980 he has been a professor in acoustics and audio signal processing at the Helsinki University of Technology in the Faculty of Electrical Engineering. In audio technology, his interest is in audio signal processing such as digital signal processing (DSP) for sound reproduction, perceptually based signal processing, as well as music DSP and sound synthesis. In addition to audio DSP, his research activities cover speech synthesis, analysis, and recognition, perceptual auditory modeling and spatial hearing, DSP hardware, software, and programming environments, as well as various branches of acoustics, including musical acoustics and modeling of musical instruments. He has written more than 300 scientific and engineering articles and contributed to organizing several conferences and workshops. Professor Karjalainen is an Audio Engineering Society (AES) Fellow and a member of the Institute of Electrical and Electronics Engineers (IEEE), the Acoustical Society of America (ASA), the European Acoustics Association (EAA), the International Computer Music Association (ICMA), the European Speech Communication Association (ESCA), and several Finnish scientific and engineering societies.

Cumhur Erkut was born in Istanbul, Turkey, in 1969. He received his B.S. and M.S. degrees in electronics and communication engineering from the Yildiz Technical University, Istanbul, Turkey, in 1994 and 1997, respectively, and the Dr.Tech. degree in electrical engineering from the Helsinki University of Technology (HUT), Espoo, Finland, in 2002. Between 1998 and 2002, he worked as a researcher at the Laboratory of Acoustics and Audio Signal Processing of the HUT. He is currently a postdoctoral researcher in the same institution, where he contributes to the EU-funded research project "Algorithms for the Modelling of Acoustic Interactions" (ALMA, IST-2001-33059). His primary research interests are model-based sound synthesis and musical acoustics.


EURASIP Journal on Applied Signal Processing 2004:7, 990–1000
© 2004 Hindawi Publishing Corporation

A Digital Synthesis Model of Double-Reed Wind Instruments

Ph. Guillemain
Laboratoire de Mecanique et d'Acoustique, Centre National de la Recherche Scientifique, 31 chemin Joseph-Aiguier, 13402 Marseille cedex 20, France
Email: [email protected]

Received 30 June 2003; Revised 29 November 2003

We present a real-time synthesis model for double-reed wind instruments based on a nonlinear physical model. One specificity of double-reed instruments, namely, the presence of a confined air jet in the embouchure, for which a physical model has been proposed recently, is included in the synthesis model. The synthesis procedure involves the use of the physical variables via a digital scheme giving the impedance relationship between pressure and flow in the time domain. Comparisons are made between the behavior of the model with and without the confined air jet in the case of a simple cylindrical bore and that of a more realistic bore, the geometry of which is an approximation of an oboe bore.

Keywords and phrases: double-reed, synthesis, impedance.

1. INTRODUCTION

The simulation of woodwind instrument sounds has been investigated for many years, since the pioneering studies by Schumacher [1] on the clarinet, which did not focus on digital sound synthesis. Real-time-oriented techniques, such as the famous digital waveguide method (see, e.g., Smith [2] and Valimaki [3]) and wave digital models [4], have been introduced in order to obtain efficient digital descriptions of resonators in terms of incoming and outgoing waves, and have been used to simulate various wind instruments.

The resonator of a clarinet is approximately cylindrical as a first approximation, and its embouchure is large enough to be compatible with simple airflow models. In double-reed instruments, such as the oboe, the resonator is not cylindrical but conical, and the size of the air jet is comparable to that of the embouchure. In this case, the dissipation of the air jet is no longer free; the jet remains confined in the embouchure, giving rise to additional aerodynamic losses.

Here, we describe a real-time digital synthesis model for double-reed instruments based on the one hand on a recent study by Vergez et al. [5], in which the formation of the confined air jet in the embouchure is taken into account, and on the other hand on an extension of the method presented in [6] for synthesizing the clarinet. This method avoids the need for incoming and outgoing wave decompositions, since it deals only with the relationship between the impedance variables, which makes it easy to transpose the physical model to a synthesis model.

The physical model is first summarized in Section 2. In order to obtain the synthesis model, a suitable form of the flow model is then proposed, a dimensionless version is written, and the similarities with single-reed models (see, e.g., [7]) are pointed out. The resonator model is obtained by associating several elementary impedances and is described in terms of the acoustic pressure and flow.

Section 3 presents the digital synthesis model, which first requires discrete-time equivalents of the reed displacement and the impedance relations. The explicit scheme solving the nonlinear model, which is similar to that proposed in [6], is then briefly summarized.

In Section 4, the synthesis model is used to investigate the effects of the changes in the nonlinear characteristics induced by the confined air jet.

2. PHYSICAL MODEL

The main physical components of the nonlinear synthesis model are as follows.

(i) The linear oscillator modeling the first vibration mode of the reeds.

(ii) The nonlinear characteristics relating the flow to the pressure and to the reed displacement at the mouthpiece.

(iii) The impedance equation linking pressure and flow.

Figure 1 shows a highly simplified embouchure model for an oboe and the corresponding physical variables described in Sections 2.1 and 2.2.


Figure 1: Embouchure model and physical variables.

2.1. Reed model

Although this paper focuses on the simulation of double-reed instruments, oboe experiments have shown that the displacements of the two reeds are symmetrical [5, 8]. In this case, a classical single-mode model seems to suffice to describe the variations in the reed opening. The opening is based on the relative displacement y(t) of the two reeds when a difference in acoustic pressure occurs between the mouth pressure p_m and the acoustic pressure p_j(t) of the air jet formed in the reed channel. If we denote the resonance frequency, damping coefficient, and mass of the reeds by ω_r, q_r, and µ_r, respectively, the relative displacement satisfies the equation

\frac{d^2y(t)}{dt^2} + \omega_r q_r\frac{dy(t)}{dt} + \omega_r^2 y(t) = -\frac{p_m - p_j(t)}{\mu_r}.   (1)

Based on the reed displacement, the opening of the reed channel, denoted S_i(t), is expressed by

S_i(t) = \Theta\bigl(y(t) + H\bigr) \times w\bigl(y(t) + H\bigr),   (2)

where w denotes the width of the reed channel, H denotes the distance between the two reeds at rest (y(t) = 0 and p_m = 0), and Θ is the Heaviside function, the role of which is to keep the opening of the reeds positive by canceling it when y(t) + H < 0.

2.2. Nonlinear characteristics

2.2.1. Physical bases

In the case of the clarinet or saxophone, it is generally recognized that the acoustic pressure p_r(t) and volume velocity v_r(t) at the entrance of the resonator are equal to the pressure p_j(t) and volume velocity v_j(t) of the air jet in the reed channel (see, e.g., [9]). In oboe-like instruments, the smallness of the reed channel leads to the formation of a confined air jet. According to a recent hypothesis [5], p_r(t) is in this case no longer equal to p_j(t); instead, these quantities are related as follows:

p_j(t) = p_r(t) + \frac{1}{2}\rho\Psi\,\frac{q(t)^2}{S_{ra}^2},   (3)

where Ψ is taken to be a constant related to the ratio between the cross section of the jet and the cross section at the entrance of the resonator, q(t) is the volume flow, and ρ is the mean air density. In what follows, we will assume that the area S_ra, corresponding to the cross section of the reed channel at the point where the flow is spread over the whole cross section, is equal to the area S_r at the entrance of the resonator.

The relationship between the mouth pressure p_m and the pressure of the air jet p_j(t), and between the velocity of the air jet v_j(t) and the volume flow q(t), classically used when dealing with single-reed instruments, is based on the stationary Bernoulli equation rather than on the Backus model (see, e.g., [10] for justification and comparisons with measurements). This relationship, which is still valid here, is

p_m = p_j(t) + \frac{1}{2}\rho v_j(t)^2, \qquad q(t) = S_j(t)v_j(t) = \alpha S_i(t)v_j(t),   (4)

where α, which is assumed to be constant, is the ratio between the cross section of the air jet S_j(t) and the reed opening S_i(t).

It should be mentioned that the aim of this paper is to propose a digital sound synthesis model that takes the dissipation of the air jet in the reed channel into account. For a detailed physical description of this phenomenon, readers can consult [5], from which the notation used here was borrowed.

2.2.2. Flow model

In the framework of the digital synthesis model on which this paper focuses, it is necessary to express the volume flow q(t) as a function of the difference between the mouth pressure p_m and the pressure at the entrance of the resonator p_r(t).

From (4), we obtain

v_j(t)^2 = \frac{2}{\rho}\bigl(p_m - p_j(t)\bigr),   (5)
q^2(t) = \alpha^2 S_i(t)^2 v_j(t)^2.   (6)

Substituting the value of p_j(t) given by (3) into (5) gives

v_j(t)^2 = \frac{2}{\rho}\bigl(p_m - p_r(t)\bigr) - \Psi\,\frac{q(t)^2}{S_r^2}.   (7)

Using (6), this gives

q^2(t) = \alpha^2 S_i(t)^2\left(\frac{2}{\rho}\bigl(p_m - p_r(t)\bigr) - \Psi\,\frac{q(t)^2}{S_r^2}\right),   (8)

from which we obtain the expression for the volume flow, namely, the nonlinear characteristics

q(t) = \mathrm{sign}\bigl(p_m - p_r(t)\bigr) \times \frac{\alpha S_i(t)}{\sqrt{1 + \Psi\alpha^2 S_i(t)^2/S_r^2}}\,\sqrt{\frac{2}{\rho}\bigl|p_m - p_r(t)\bigr|}.   (9)

2.3. Dimensionless model

The reed displacement and the nonlinear characteristics are converted into the dimensionless equations used in the synthesis model. For this purpose, we first take the reed displacement equation and replace the air jet pressure p_j(t) by the expression involving the variables q(t) and p_r(t) (equation (3)):

\frac{d^2y(t)}{dt^2} + \omega_r q_r\frac{dy(t)}{dt} + \omega_r^2 y(t) = -\frac{p_m - p_r(t)}{\mu_r} + \rho\Psi\,\frac{q(t)^2}{2\mu_r S_r^2}.   (10)

Along similar lines to what has been done in the case of single-reed instruments [11], y(t) is normalized with respect to the static beating-reed pressure p_M defined by p_M = H\omega_r^2\mu_r. We denote by γ the ratio γ = p_m/p_M and replace y(t) by x(t), where the dimensionless reed displacement is defined by x(t) = y(t)/H + γ.

With these notations, (10) becomes

\frac{1}{\omega_r^2}\frac{d^2x(t)}{dt^2} + \frac{q_r}{\omega_r}\frac{dx(t)}{dt} + x(t) = \frac{p_r(t)}{p_M} + \frac{\rho\Psi}{2p_M}\,\frac{q(t)^2}{S_r^2},   (11)

and the reed opening is expressed by

S_i(t) = \Theta\bigl(1 - \gamma + x(t)\bigr) \times wH\bigl(1 - \gamma + x(t)\bigr).   (12)

Likewise, we use the dimensionless acoustic pressure p_e(t) and the dimensionless acoustic flow u_e(t) defined by

p_e(t) = \frac{p_r(t)}{p_M}, \qquad u_e(t) = \frac{\rho c}{S_r}\,\frac{q(t)}{p_M},   (13)

where c is the speed of sound. With these notations, the reed displacement and the nonlinear characteristics are finally rewritten as follows:

\frac{1}{\omega_r^2}\frac{d^2x(t)}{dt^2} + \frac{q_r}{\omega_r}\frac{dx(t)}{dt} + x(t) = p_e(t) + \Psi\beta_u u_e(t)^2   (14)

and, using (9) and (12),

u_e(t) = \Theta\bigl(1 - \gamma + x(t)\bigr)\,\mathrm{sign}\bigl(\gamma - p_e(t)\bigr) \times \frac{\zeta\bigl(1 - \gamma + x(t)\bigr)}{\sqrt{1 + \Psi\beta_x\bigl(1 - \gamma + x(t)\bigr)^2}}\,\sqrt{\bigl|\gamma - p_e(t)\bigr|} = F\bigl(x(t), p_e(t)\bigr),   (15)

where ζ, β_x, and β_u are defined by

\zeta = \sqrt{H}\,\sqrt{\frac{2\rho}{\mu_r}}\,\frac{c\alpha w}{S_r\omega_r}, \qquad \beta_x = \frac{H^2\alpha^2 w^2}{S_r^2}, \qquad \beta_u = \frac{H\omega_r^2\mu_r}{2\rho c^2}.   (16)

This dimensionless model is comparable to the model described, for example, in [7, 9] in the case of single-reed instruments, where the dimensionless acoustic pressure p_e(t), the dimensionless acoustic flow u_e(t), and the dimensionless reed displacement x(t) are linked by the relations

\frac{1}{\omega_r^2}\frac{d^2x(t)}{dt^2} + \frac{q_r}{\omega_r}\frac{dx(t)}{dt} + x(t) = p_e(t),
u_e(t) = \Theta\bigl(1 - \gamma + x(t)\bigr)\,\mathrm{sign}\bigl(\gamma - p_e(t)\bigr) \times \zeta\bigl(1 - \gamma + x(t)\bigr)\sqrt{\bigl|\gamma - p_e(t)\bigr|}.   (17)

In addition to the parameter ζ, the two other parameters β_x and β_u depend on the height H of the reed channel at rest. Although, for the sake of clarity in the notation, the variable t has been omitted, γ, ζ, β_x, and β_u are functions of time (but slowly varying compared to the other variables). Taking the difference between the jet pressure and the resonator pressure into account results in a flow that is no longer proportional to the reed displacement, and a reed displacement that is no longer linked to p_e(t) by an ordinary linear differential equation.
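To make the roles of the parameters in (16) concrete, the following Python sketch computes ζ, β_x, β_u, and γ from a set of physical constants. All numerical values below are illustrative assumptions (they are not measured oboe data), with µ_r understood as a mass per unit area.

import numpy as np

rho, c = 1.2, 340.0              # air density (kg/m^3), sound speed (m/s)
H, w_reed = 4e-4, 7e-3           # channel height and reed width (m), assumed
alpha = 0.6                      # jet/opening area ratio, assumed
S_r = np.pi * 0.0055**2          # bore input cross section (m^2)
omega_r = 2.0 * np.pi * 3150.0   # reed resonance (rad/s)
mu_r = 2e-2                      # reed mass per unit area (kg/m^2), assumed

p_M = H * omega_r**2 * mu_r      # static beating-reed pressure (Pa)
zeta = np.sqrt(H) * np.sqrt(2.0 * rho / mu_r) * c * alpha * w_reed / (S_r * omega_r)
beta_x = H**2 * alpha**2 * w_reed**2 / S_r**2
beta_u = H * omega_r**2 * mu_r / (2.0 * rho * c**2)
gamma = 2000.0 / p_M             # for an assumed mouth pressure of 2 kPa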

2.4. Resonator model

We now consider the simplified resonator of an oboe-like instrument. It is described as a truncated, divergent, linear conical bore connected to a mouthpiece including the backbore to which the reeds are attached, and an additional bore, the volume of which corresponds to the volume of the missing part of the cone. This model is identical to that summarized in [12].

2.4.1. Cylindrical bore

The dimensionless input impedance of a cylindrical bore is first expressed. By assuming that the radius of the bore is large in comparison with the boundary layer thicknesses, the classical Kirchhoff theory leads to the value of the complex wavenumber for a plane wave, k(\omega) = \omega/c - (i^{3/2}/2)\,\eta c\,\omega^{1/2}, where η is a constant depending on the radius R of the bore: \eta = \bigl(2/(Rc^{3/2})\bigr)\bigl(\sqrt{l_v} + (c_p/c_v - 1)\sqrt{l_t}\bigr). Typical values of the physical constants, in MKS units, are l_v = 4 × 10^{-8}, l_t = 5.6 × 10^{-8}, c_p/c_v = 1.4 (see, e.g., [13]). The transfer function of a cylindrical bore of infinite length between x = 0 and x = L, which constitutes the propagation filter associated with the Green formulation, including the propagation delay, dispersion, and dissipation, is then given by F(\omega) = \exp(-ik(\omega)L).

Assuming that the radiation losses are negligible, the dimensionless input impedance of the cylindrical bore is classically expressed by

C(\omega) = i\tan\bigl(k(\omega)L\bigr).   (18)

In this equation, C(ω) is the ratio between the Fourier transforms P_e(ω) and U_e(ω) of the dimensionless variables p_e(t) and u_e(t) defined by (13). The input admittance of the cylindrical bore is denoted by C^{-1}(ω).

A different formulation of the impedance relation of a cylindrical bore, which is compatible with a time-domain implementation and was proposed in [6], is used and extended here. It consists in rewriting (18) as

C(\omega) = \frac{1}{1 + \exp\bigl(-2ik(\omega)L\bigr)} - \frac{\exp\bigl(-2ik(\omega)L\bigr)}{1 + \exp\bigl(-2ik(\omega)L\bigr)}.   (19)

Figure 2 shows the interpretation of (19) in terms of looped propagation filters. The transfer function of this model corresponds directly to the dimensionless input impedance of a cylindrical bore. It is the sum of two parts. The upper part corresponds to the first term of (19) and the


Figure 2: Impedance model of a cylindrical bore.

Figure 3: Impedance model of a conical bore.

lower part corresponds to the second term. The filter having the transfer function -F(\omega)^2 = -\exp(-2ik(\omega)L) stands for the back and forth path of the dimensionless pressure waves, with a sign change at the open end of the bore.

Although k(ω) includes both dissipation and dispersion, the dispersion is small (e.g., in the case of a cylindrical bore with a radius of 7 mm, η = 1.34 × 10^{-5}), and the peaks of the input impedance of a cylindrical bore can be considered nearly harmonic. In particular, this intrinsic dispersion can be neglected, unlike the dispersion introduced by the geometry of the bore (e.g., the input impedance of a truncated conical bore cannot be assumed to be harmonic).

2.4.2. Conical bore

From the input impedance of the cylindrical bore, the dimensionless input impedance of the truncated, divergent, conical bore can be expressed as a parallel combination of a cylindrical bore and an "air" bore,

S_2(\omega) = \frac{1}{1/\bigl(i\omega x_e/c\bigr) + 1/C(\omega)},   (20)

where x_e is the distance between the apex and the input. It is expressed in terms of the angle θ of the cone and the input radius R as x_e = R/\sin(\theta/2).

The parameter η involved in the definition of C(ω) in (20), which depends on the radius and characterizes the losses included in k(ω), is calculated by considering the radius of the cone at (5/12)L. This value was determined empirically by comparing the impedance given by (20) with the input impedance of the same conical bore obtained with a series of elementary cylinders with different diameters (stepped cone), using transmission line theory.

Denoting by D the differentiation operator D(ω) = iω and rewriting (20) in the form S_2(\omega) = D(\omega)(x_e/c)/\bigl(1 + D(\omega)(x_e/c)C^{-1}(\omega)\bigr), we propose the equivalent scheme in Figure 3.

2.4.3. Oboe-like bore

The complete bore is a conical bore combined with a mouthpiece.

The mouthpiece consists of a combination of two bores,

(i) a short cylindrical bore with length L_1, radius R_1, surface S_1, and characteristic impedance Z_1. This is the backbore to which the reeds are attached. Its radius is small in comparison with that of the main conical bore, the characteristic impedance of which is denoted Z_2 = \rho c/S_r, and

(ii) an additional short cylindrical bore with length L_0, radius R_0, surface S_0, and characteristic impedance Z_0. Its radius is large in comparison with that of the backbore. Its role is to add a volume corresponding to the truncated part of the complete cone. This makes it possible to reduce the geometrical dispersion responsible for inharmonic impedance peaks in the backbore/conical bore combination.

The impedance C_1(ω) of the short cylindrical backbore is based on an approximation of i\tan(k_1(\omega)L_1) for small values of k_1(\omega)L_1. It takes the dissipation into account and neglects the dispersion. Assuming that the radius R_1 is large in comparison with the boundary layer thicknesses, and using (19), C_1(ω) is first approximated by

C_1(\omega) \simeq \frac{1 - \exp\bigl(-\eta_1 c\sqrt{\omega/2}\,L_1\bigr)\exp\bigl(-2i\omega L_1/c\bigr)}{1 + \exp\bigl(-\eta_1 c\sqrt{\omega/2}\,L_1\bigr)\exp\bigl(-2i\omega L_1/c\bigr)},   (21)

which, since L_1 is small, is finally simplified as

C_1(\omega) \simeq \frac{1 - \exp\bigl(-\eta_1 c\sqrt{\omega/2}\,L_1\bigr)\bigl(1 - 2i\omega L_1/c\bigr)}{1 + \exp\bigl(-\eta_1 c\sqrt{\omega/2}\,L_1\bigr)}.   (22)

By setting G(\omega) = \bigl(1 - \exp(-\eta_1 c\sqrt{\omega/2}\,L_1)\bigr)/\bigl(1 + \exp(-\eta_1 c\sqrt{\omega/2}\,L_1)\bigr) and H(\omega) = (L_1/c)\bigl(1 - G(\omega)\bigr), the expression of C_1(ω) reads

C_1(\omega) = G(\omega) + i\omega H(\omega).   (23)

This approximation avoids the need for a second delay line in the sampled formulation of the impedance.

The transmission line equations relate the acoustic pressure p_n and the flow u_n at the entrance of a cylindrical bore (with characteristic impedance Z_n, length L_n, and wavenumber k_n) to the acoustic pressure p_{n+1} and the flow u_{n+1} at its exit. With dimensioned variables, they read

p_n(\omega) = \cos\bigl(k_n(\omega)L_n\bigr)p_{n+1}(\omega) + iZ_n\sin\bigl(k_n(\omega)L_n\bigr)u_{n+1}(\omega),
u_n(\omega) = \frac{i}{Z_n}\sin\bigl(k_n(\omega)L_n\bigr)p_{n+1}(\omega) + \cos\bigl(k_n(\omega)L_n\bigr)u_{n+1}(\omega),   (24)

yielding

\frac{p_n(\omega)}{u_n(\omega)} = \frac{p_{n+1}(\omega)/u_{n+1}(\omega) + iZ_n\tan\bigl(k_n(\omega)L_n\bigr)}{1 + (i/Z_n)\tan\bigl(k_n(\omega)L_n\bigr)\bigl(p_{n+1}(\omega)/u_{n+1}(\omega)\bigr)}.   (25)
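Equation (25) is the usual impedance-chaining rule of transmission line theory. As a small illustration (assumed names; k_n may be complex so that losses are included), the input impedance seen through one cylindrical segment can be computed from the load impedance at its far end:

import numpy as np

def chain_impedance(Z_load, Zn, kn, Ln):
    # Input impedance of a segment (Zn, kn, Ln) terminated by Z_load, per (25).
    t = 1j * np.tan(kn * Ln)
    return (Z_load + Zn * t) / (1.0 + t * Z_load / Zn)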


Figure 4: Impedance model of the simplified resonator.

Using the notations introduced in (20) and (23), the input impedance of the backbore/main conical bore combination reads

\frac{p_1(\omega)}{u_1(\omega)} = \frac{Z_2S_2(\omega) + Z_1C_1(\omega)}{1 + \bigl(Z_2/Z_1\bigr)S_2(\omega)C_1(\omega)},   (26)

which is simplified as p_1(\omega)/u_1(\omega) = Z_2S_2(\omega) + Z_1C_1(\omega), since Z_1 \gg Z_2.

In the same way, the input impedance of the whole bore reads

\frac{p_0(\omega)}{u_0(\omega)} = \frac{p_1(\omega)/u_1(\omega) + iZ_0\tan\bigl(k_0(\omega)L_0\bigr)}{1 + (i/Z_0)\tan\bigl(k_0(\omega)L_0\bigr)\bigl(p_1(\omega)/u_1(\omega)\bigr)},   (27)

which, since Z_0 \ll Z_1, is simplified as

\frac{p_0(\omega)}{u_0(\omega)} = \frac{p_1(\omega)/u_1(\omega)}{1 + (i/Z_0)\tan\bigl(k_0(\omega)L_0\bigr)\bigl(p_1(\omega)/u_1(\omega)\bigr)}.   (28)

Since L_0 is small and the radius is large, the losses included in k_0(ω) can be neglected, and hence k_0(\omega) = \omega/c and \tan(k_0(\omega)L_0) = (\omega/c)L_0. Under these conditions, the input impedance of the bore is given by

\frac{p_0(\omega)}{u_0(\omega)} = \frac{1}{1/\bigl(p_1(\omega)/u_1(\omega)\bigr) + (i\omega/c)\bigl(L_0/Z_0\bigr)} = \frac{1}{1/\bigl(Z_2S_2(\omega) + Z_1C_1(\omega)\bigr) + (i\omega/c)\bigl(L_0S_0/\rho c\bigr)}.   (29)

If we take V to denote the volume of the short additional bore, V = L_0S_0, and rewrite (29) with the dimensionless variables P_e and U_e (U_e = Z_2u_0), the dimensionless input impedance of the whole resonator relating the variables P_e(ω) and U_e(ω) becomes

Z_e(\omega) = \frac{P_e(\omega)}{U_e(\omega)} = \frac{1/Z_2}{i\omega V/\bigl(\rho c^2\bigr) + 1/\bigl(Z_1C_1(\omega) + Z_2S_2(\omega)\bigr)}.   (30)

After rearranging (30), we propose the equivalent scheme in Figure 4.

It can be seen from (30) that the mouthpiece is equivalent to a Helmholtz resonator consisting of a hemispherical cavity with volume V and radius R_b such that V = (2/3)\pi R_b^3, connected to a short cylindrical bore with length L_1 and radius R_1.

Figure 5: Nonlinear synthesis model.

2.5. Summary of the physical model

The complete dimensionless physical model consists of three equations:

\frac{1}{\omega_r^2}\frac{d^2x(t)}{dt^2} + \frac{q_r}{\omega_r}\frac{dx(t)}{dt} + x(t) = p_e(t) + \Psi\beta_u u_e(t)^2,   (31)

u_e(t) = \frac{\zeta\bigl(1 - \gamma + x(t)\bigr)}{\sqrt{1 + \Psi\beta_x\bigl(1 - \gamma + x(t)\bigr)^2}} \times \Theta\bigl(1 - \gamma + x(t)\bigr)\,\mathrm{sign}\bigl(\gamma - p_e(t)\bigr) \times \sqrt{\bigl|\gamma - p_e(t)\bigr|},   (32)

P_e(\omega) = Z_e(\omega)U_e(\omega).   (33)

These equations enable us to introduce the reed and the nonlinear characteristics in the form of two nonlinear loops, as shown in Figure 5. The first loop relates the output p_e to the input u_e of the resonator, as in single-reed instrument models. The second nonlinear loop corresponds to the u_e^2-dependent changes in x. The output of the model is given by the three coupled variables p_e, u_e, and x. The control parameters of the model are the length L of the main conical bore and the parameters H(t) and p_m(t), from which ζ(t), β_x(t), β_u(t), and γ(t) are calculated.

In the context of sound synthesis, it is necessary to calculate the external pressure. Here we consider only the propagation within the main "cylindrical" part of the bore in (20). Assuming again that the radiation impedance can be neglected, the external pressure corresponds to the time derivative of the flow at the exit of the resonator, p_ext(t) = du_s(t)/dt. Using transmission line theory, one directly obtains

U_s(\omega) = \exp\bigl(-ik(\omega)L\bigr)\bigl(P_e(\omega) + U_e(\omega)\bigr).   (34)

From the perceptual point of view, the quantity \exp(-ik(\omega)L) can be left aside, since it stands for the losses corresponding to a single travel between the embouchure and the open end. This simplification leads to the following expression for the external pressure:

p_{ext}(t) = \frac{d}{dt}\bigl(p_e(t) + u_e(t)\bigr).   (35)


3. DISCRETE-TIME MODEL

In order to draw up the synthesis model, it is necessary to use a discrete formulation in the time domain for the reed displacement and the impedance models. The discretization schemes used here are similar to those described in [6] for the clarinet and summarized in [12] for brass instruments and saxophones.

3.1. Reed displacement

We take e(t) to denote the excitation of the reed, e(t) = p_e(t) + \Psi\beta_u u_e(t)^2. Using (31), the Fourier transform of the ratio X(ω)/E(ω) can readily be written as

\frac{X(\omega)}{E(\omega)} = \frac{\omega_r^2}{\omega_r^2 - \omega^2 + i\omega q_r\omega_r}.   (36)

An inverse Fourier transform provides the impulse response h(t) of the reed model:

h(t) = \frac{2\omega_r}{\sqrt{4 - q_r^2}}\,\exp\Bigl(-\frac{1}{2}\omega_r q_r t\Bigr)\sin\Bigl(\frac{1}{2}\sqrt{4 - q_r^2}\,\omega_r t\Bigr).   (37)

Equation (37) shows that h(t) satisfies h(0) = 0. This property is most important in what follows. In addition, the range of variations allowed for q_r is ]0, 2[.

The discrete-time version of the impulse response uses two centered numerical differentiation schemes, which provide unbiased estimates of the first and second derivatives when they are applied to sampled second-order polynomials:

i\omega \to \frac{f_e}{2}\bigl(z - z^{-1}\bigr), \qquad -\omega^2 \to f_e^2\bigl(z - 2 + z^{-1}\bigr),   (38)

where z = \exp(i\bar\omega), \bar\omega = \omega/f_e, and f_e is the sampling frequency.

With these approximations, the digital transfer function of the reed is given by

\frac{X(z)}{E(z)} = \frac{z^{-1}}{f_e^2/\omega_r^2 + f_e q_r/(2\omega_r) - z^{-1}\bigl(2f_e^2/\omega_r^2 - 1\bigr) - z^{-2}\bigl(f_e q_r/(2\omega_r) - f_e^2/\omega_r^2\bigr)},   (39)

yielding a difference equation of the type

x(n) = b_{1a}\,e(n-1) + a_{1a}\,x(n-1) + a_{2a}\,x(n-2).   (40)

This difference equation preserves the property h(0) = 0. Figure 6 shows the frequency response of this approximated reed model (solid line) superimposed on the exact one (dotted line).

This discrete reed model is stable under the condition \omega_r < f_e\sqrt{4 - q_r^2}. Under this condition, the modulus of the poles of the transfer function is given by \sqrt{(2f_e - \omega_r q_r)/(2f_e + \omega_r q_r)} and is always smaller than 1. This

Figure 6: Approximated (solid line) and exact (dotted line) reed frequency response with parameter values f_r = 2500 Hz, q_r = 0.2, and f_e = 44.1 kHz.

stability condition makes this discretization scheme unsuitable for use at low sampling rates, but in practice, at the CD-quality sample rate, this problem does not arise for a reed resonance frequency of up to 5 kHz with a quality factor of up to 0.5. For a more detailed discussion of discretization schemes, readers can consult, for example, [14].

The bilinear transformation does not provide a suitable discretization scheme for the reed displacement. In that case, the impulse response does not satisfy the property h(0) = 0 of the continuous model.
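For illustration, the coefficients of (39)-(40) can be computed directly from the reed parameters (a Python sketch using the parameter values of Figure 6; the names b1a, a1a, and a2a follow the paper):

import numpy as np

fe = 44100.0                         # sampling frequency (Hz)
omega_r = 2.0 * np.pi * 2500.0       # reed resonance (rad/s)
qr = 0.2

assert omega_r < fe * np.sqrt(4.0 - qr**2)   # stability condition

d0 = fe**2 / omega_r**2 + fe * qr / (2.0 * omega_r)
b1a = 1.0 / d0
a1a = (2.0 * fe**2 / omega_r**2 - 1.0) / d0
a2a = (fe * qr / (2.0 * omega_r) - fe**2 / omega_r**2) / d0

def reed_step(e_prev, x_prev, x_prev2):
    # Difference equation (40); e enters with one sample of delay,
    # which preserves the property h(0) = 0.
    return b1a * e_prev + a1a * x_prev + a2a * x_prev2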

3.2. Impedance

A time-domain equivalent of the inverse Fourier transform of the impedance Z_e(ω) given by (30) is now required. Here we express p_e(n) as a function of u_e(n).

The losses in the cylindrical bore element contributing to the impedance of the whole bore are modeled with a digital low-pass filter. This filter approximates the back and forth losses described by F(\omega)^2 = \exp(-2ik(\omega)L) and neglects the (small) dispersion. So that they can be adjusted to the geometry of the resonator, the coefficients of the filter are expressed analytically as functions of the physical parameters, rather than using numerical approximations and minimizations. For this purpose, a one-pole filter is used:

F(\bar\omega) = \frac{b_0\exp(-i\bar\omega D)}{1 - a_1\exp(-i\bar\omega)},   (41)

where \bar\omega = \omega/f_e and D = 2f_e(L/c) is the pure delay corresponding to a back and forth path of the waves.

The parameters b_0 and a_1 are calculated so that |F(\bar\omega)^2|^2 = |F(\bar\omega)|^2 for two given values of ω, and are solutions of the system

\bigl|F(\bar\omega_1)^2\bigr|^2\bigl(1 + a_1^2 - 2a_1\cos(\bar\omega_1)\bigr) = b_0^2, \qquad \bigl|F(\bar\omega_2)^2\bigr|^2\bigl(1 + a_1^2 - 2a_1\cos(\bar\omega_2)\bigr) = b_0^2,   (42)


where |F(\bar\omega_{1,2})^2|^2 = \exp\bigl(-2\eta c\sqrt{\omega_{1,2}/2}\,L\bigr). The first value ω_1 is an approximation of the frequency of the first impedance peak of the truncated conical bore, given by \omega_1 = c\bigl(12\pi L + 9\pi^2 x_e + 16L\bigr)/\bigl(4L(4L + 3\pi x_e + 4x_e)\bigr), in order to ensure a suitable height of the impedance peak at the fundamental frequency. It is important to keep this feature to obtain a realistic digital simulation of the continuous dynamical system, since the linear impedance is associated with the nonlinear characteristics. This ensures that the decay time of the fundamental frequency of the approximated impulse response of the impedance matches the exact value, which is important in the case of fast changes in γ (e.g., attack transients). The second value ω_2 corresponds to the resonance frequency of the Helmholtz resonator, \omega_2 = c\sqrt{S_1/(L_1V)}.
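System (42) has a closed-form solution: eliminating b_0 between the two equations leaves a quadratic in a_1. The following Python sketch is our spelling-out of that elimination (illustrative names; the root with |a_1| < 1 is retained):

import numpy as np

def loss_filter_coeffs(w1, w2, eta, L, c, fe):
    # Squared-modulus targets |F(w)^2|^2 = exp(-2*eta*c*sqrt(w/2)*L).
    M1 = np.exp(-2.0 * eta * c * np.sqrt(w1 / 2.0) * L)
    M2 = np.exp(-2.0 * eta * c * np.sqrt(w2 / 2.0) * L)
    c1, c2 = np.cos(w1 / fe), np.cos(w2 / fe)
    A = M1 - M2
    B = M1 * c1 - M2 * c2
    a1 = (B - np.sqrt(B * B - A * A)) / A
    b0 = np.sqrt(M1 * (1.0 + a1 * a1 - 2.0 * a1 * c1))
    return b0, a1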

The phase of F(\bar\omega) has a nonlinear part, given by -\arctan\bigl(a_1\sin(\bar\omega)/(1 - a_1\cos(\bar\omega))\bigr). This part differs from the nonlinear part of the phase of F(\omega)^2, which is given by -\eta c\sqrt{\omega/2}\,L. Although these two quantities are different, and although the phase of F(\bar\omega) is determined by the choice of a_1, which is calculated from the modulus, it is worth noting that in both cases the dispersion is always very small, has a negative value, and is monotonic up to the frequency (f_e/2\pi)\arccos(a_1). Consequently, in both cases, for a cylindrical bore and up to this frequency, the distance between successive impedance peaks decreases as their rank increases: \omega_{n+1} - \omega_n < \omega_n - \omega_{n-1}.

Using (19) and (41), the impedance of the cylindrical bore unit, C(ω), is then expressed by

C(z) = \frac{1 - a_1z^{-1} - b_0z^{-D}}{1 - a_1z^{-1} + b_0z^{-D}}.   (43)

Since L_1 is small, the frequency-dependent function G(ω) involved in the definition of the impedance of the short backbore C_1(ω) can be approximated by a constant corresponding to its value at ω_2.

The bilinear transformation is used to discretize D(\omega) = i\omega: D(z) = 2f_e\bigl((z-1)/(z+1)\bigr).

The combination of all these parts according to (30) yields the digital impedance of the whole bore in the form

Z_e(z) = \frac{\sum_{k=0}^{4} b_{c_k} z^{-k} + \sum_{k=0}^{3} b_{cD_k} z^{-D-k}}{1 - \sum_{k=1}^{4} a_{c_k} z^{-k} - \sum_{k=0}^{3} a_{cD_k} z^{-D-k}},   (44)

where the coefficients b_{c_k}, a_{c_k}, b_{cD_k}, and a_{cD_k} are expressed analytically as functions of the geometry of each part of the bore. This leads directly to the difference equation, which can conveniently be written in the form

p_e(n) = b_{c_0}u_e(n) + V,   (45)

where V includes all the terms that do not depend on the time sample n:

V = \sum_{k=1}^{4} b_{c_k} u_e(n-k) + \sum_{k=0}^{3} b_{cD_k} u_e(n-D-k) + \sum_{k=1}^{4} a_{c_k} p_e(n-k) + \sum_{k=0}^{3} a_{cD_k} p_e(n-D-k).   (46)

Figure 7: (a) Approximated (solid lines) and exact (dotted lines) input impedance; (b) approximated (solid lines) and exact (dotted lines) impulse response. Geometrical parameters: L = 0.46 m, R = 0.00216 m, θ = 2, L_1 = 0.02 m, R_1 = 0.0015 m, and R_b = 0.006 m.

Figure 7 shows an oboe-like bore input impedance, both approximated (solid line) and exact (dotted line), together with the corresponding impulse responses.

3.3. Synthesis algorithm

The sampled expressions for the impulse responses of the reed displacement and the impedance models are now used to write the sampled equivalent of the system of (31), (32), and (33):

x(n) = b_{1a}\bigl(p_e(n-1) + \Psi\beta_u u_e(n-1)^2\bigr) + a_{1a}x(n-1) + a_{2a}x(n-2),   (47)
p_e(n) = b_{c_0}u_e(n) + V,   (48)
u_e(n) = W\,\mathrm{sign}\bigl(\gamma - p_e(n)\bigr)\sqrt{\bigl|\gamma - p_e(n)\bigr|},   (49)

where W is

W = \Theta\bigl(1 - \gamma + x(n)\bigr) \times \frac{\zeta\bigl(1 - \gamma + x(n)\bigr)}{\sqrt{1 + \Psi\beta_x\bigl(1 - \gamma + x(n)\bigr)^2}}.   (50)

This system of equations is implicit, since u_e(n) has to be known in order to compute p_e(n) with the impedance equation (48). Likewise, u_e(n) is obtained from the nonlinear equation (49) and requires p_e(n) to be known.

Thanks to the specific reed discretization scheme presented in Section 3.1, calculating x(n) with (47) does not require p_e(n) and u_e(n) to be known. This makes it possible to solve this system explicitly, as shown in [6], thus doing away with the need for schemes such as the K-method [15].

Since W is always positive, considering the two cases \gamma - p_e(n) \geq 0 and \gamma - p_e(n) < 0 successively and substituting the expression for p_e(n) from (48) into (49) eventually gives

u_e(n) = \frac{1}{2}\,\mathrm{sign}(\gamma - V)\Bigl(-b_{c_0}W^2 + W\sqrt{\bigl(b_{c_0}W\bigr)^2 + 4|\gamma - V|}\Bigr).   (51)

The acoustic pressure and flow in the mouthpiece at sampling time n are then finally obtained by the sequential calculation of V with (46), x(n) with (47), W with (50), u_e(n) with (51), and p_e(n) with (48).

The external pressure p_ext(n) is calculated as the difference between the sum of the internal pressure and the flow at sampling times n and n - 1, that is, a first-order difference approximation of (35).
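One sample of this sequential calculation may be sketched in Python as follows; the computation of V from the delay lines of (46) is abstracted into the caller, and all names are illustrative:

import numpy as np

def synth_sample(V, x_n, gamma, zeta, beta_x, psi, bc0):
    opening = 1.0 - gamma + x_n
    if opening > 0.0:                # Theta(1 - gamma + x) of (50)
        W = zeta * opening / np.sqrt(1.0 + psi * beta_x * opening**2)
    else:
        W = 0.0                      # beating reed: channel closed
    # Explicit solution (51) of the coupled equations (48)-(49).
    ue = 0.5 * np.sign(gamma - V) * (
        -bc0 * W * W + W * np.sqrt((bc0 * W)**2 + 4.0 * abs(gamma - V)))
    pe = bc0 * ue + V                # impedance relation (48)
    return pe, ue

The external pressure then follows, up to a constant factor, as p_ext(n) = (p_e(n) + u_e(n)) - (p_e(n-1) + u_e(n-1)).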

4. SIMULATIONS

The effects of introducing the confined air jet into the nonlinear characteristics are now studied in the case of two different bore geometries. In particular, we consider a cylindrical resonator, the impedance peaks of which are odd harmonics, and a resonator whose impedance contains all the harmonics. We start by checking numerically the validity of the resolution scheme in the case of the cylindrical bore. (Sound examples are available at http://omicron.cnrs-mrs.fr/~guillemain/eurasip.html.)

4.1. Cylindrical resonator

We first consider a cylindrical resonator and make the parameter Ψ vary linearly from 0 to 4000 during the sound synthesis procedure (1.5 seconds). The attack transient corresponds to an abrupt increase in γ at t = 0. During the decay phase, starting at t = 1.3 seconds, γ decreases linearly towards zero. Its steady-state value is γ = 0.56. The other parameters are constant: ζ = 0.35, β_x = 7.5 × 10^{-4}, β_u = 6.1 × 10^{-3}. The reed parameters are ω_r = 2π × 3150 rad/s and q_r = 0.5. The resonator parameters are R = 0.0055 m and L = 0.46 m.

Figure 8 shows superimposed curves. In the top panel, the digital impedance of the bore is given in dotted lines, and the ratio between the Fourier transforms of the signals p_e(n) and u_e(n) in solid lines; in the bottom panel, the digital reed transfer function is given in dotted lines, and the ratio of the Fourier transforms of the signals x(n) and p_e(n) + Ψ(n)β_u u_e(n)^2 (including attack and decay transients) in solid lines.

As can be seen, the curves are perfectly superimposed. There is no need to check the nonlinear relation between u_e(n), p_e(n), and x(n), which is satisfied by construction, since u_e(n) is obtained explicitly as a function of the other variables in (51). In the case of the oboe-like bore, the results obtained using the resolution scheme are equally accurate.

Figure 8: (a) impedance (dotted line) and ratio between the spectra of pe and ue (solid line); (b) reed transfer function (dotted line) and ratio of the spectra of x and pe + Ψβu ue² (solid line). Frequency axis: 0–4000 Hz.

Figure 9: Spectrogram of the external pressure for a cylindrical bore and a beating reed where γ = 0.56. (Time axis in seconds, frequency axis in kHz.)

4.1.1. The case of the beating reed

The first example corresponds to a beating reed situation, which is simulated by choosing a steady-state value of γ greater than 0.5 (γ = 0.56).

Figure 9 shows the spectrogram (dB) of the external pressure generated by the model. The values of the spectrogram are coded with a grey-scale palette (small values are dark and high values are bright). The bright horizontal lines correspond to the harmonics of the external pressure.


Figure 10: ue(n) versus pe(n): (a) t = 0.25 second, (b) t = 0.5 second.

Figure 11: ue(n) versus pe(n): (a) t = 0.75 second, (b) t = 1 second.

Increasing the value of Ψ mainly affects the pitch and only slightly affects the amplitudes of the harmonics. In particular, at high values of Ψ, a small increase in Ψ results in a strong decrease in the pitch.

A cancellation of the self-oscillation process can be observed at around t = 1.2 seconds, due to the high value of Ψ, since it occurs before γ starts decreasing.

Odd harmonics have a much higher level than even harmonics, as occurs in the case of the clarinet. Indeed, the even harmonics originate mainly from the flow, which is taken into account in the calculation of the external pressure. However, it is worth noticing that the level of the second harmonic increases with Ψ.

Figures 10 and 11 show the flow ue(n) versus the pressure pe(n), obtained during a small number (32) of oscillation periods at around t = 0.25, t = 0.5, t = 0.75, and t = 1 second. The existence of two different paths, corresponding to the opening or closing of the reed, is due to the inertia of the reed. This phenomenon is also observed on single-reed instruments (see, e.g., [14]). A discontinuity appears in the whole path because the reed is beating. This cancels the opening (and hence the flow) while the pressure is still varying.

The shape of the curve changes with respect to Ψ. This shape is in agreement with the results presented in [5].

Figure 12: Spectrogram of the external pressure for a cylindrical bore and a nonbeating reed where γ = 0.498.

Figure 13: ue(n) versus pe(n): (a) t = 0.25 second, (b) t = 0.5 second.

4.1.2. The case of the nonbeating reed

The second example corresponds to a nonbeating reed situation, which is obtained by choosing a steady-state value of γ smaller than 0.5 (γ = 0.498).

Figure 12 shows the spectrogram of the external pressure generated by the model. Increasing the value of Ψ results in a sharp change in the level of the high harmonics at around t = 0.4 seconds, a slight change in the pitch, and a cancellation of the self-oscillation process at around t = 0.8 seconds, corresponding to a smaller value of Ψ than that observed in the case of the beating reed.

Figure 13 shows the flow ue(n) versus the pressure pe(n) at around t = 0.25 seconds and t = 0.5 seconds. Since the reed is no longer beating, the whole path remains continuous. The changes in its shape with respect to Ψ are smaller than in the case of the beating reed.

4.2. Oboe-like resonator

In order to compare the effects of the confined air jet with the geometry of the bore, we now consider an oboe-like bore, the input impedance and geometric parameters of which correspond to Figure 7.


Figure 14: (a) external acoustic pressure; (b), (c) attack and decay transients.

The other parameters have the same values as in the case of the cylindrical resonator, and the steady-state value of γ is γ = 0.4.

Figure 14 shows the pressure pext(t). Increasing the effect of the air jet confinement with Ψ, and hence the aerodynamical losses, results in a gradual decrease in the signal amplitude. The change in the shape of the waveform with respect to Ψ can be seen in the blowups corresponding to the attack and decay transients.

Figure 15 shows the spectrogram of the external pressure generated by the model.

Since the impedance includes all the harmonics (and not only the odd ones as in the case of the cylindrical bore), the output pressure also includes all the harmonics. This makes for a considerable perceptual change in the timbre in comparison with the cylindrical geometry. Since the input impedance of the bore is not perfectly harmonic, it is not possible to determine whether the "moving formants" are caused by a change in the value of Ψ or by a "phasing effect" resulting from the slightly inharmonic nature of the impedance.

Increasing the value of Ψ affects the amplitude of the harmonics and slightly changes the pitch. In addition, as in the case of the cylindrical bore with a nonbeating reed, a large value of Ψ brings the self-oscillation process to an end.

Figure 15: Spectrogram of the external pressure for an oboe-like bore where γ = 0.4.

Figure 16: ue(n) versus pe(n): (a) t = 0.25 second, (b) t = 0.5 second.

Figure 17: ue(n) versus pe(n): (a) t = 0.75 second, (b) t = 1 second.

Figures 16 and 17 show the flow ue(n) versus the pressure pe(n) at around t = 0.25, t = 0.5, t = 0.75, and t = 1 second. The shape and evolution with Ψ of the nonlinear characteristics are similar to what occurs in the case of a cylindrical bore with a beating reed.


5. CONCLUSION

The synthesis model described in this paper includes the formation of a confined air jet in the embouchure of double-reed instruments. A dimensionless physical model, the form of which is suitable for transposition to a digital synthesis model, is proposed. The resonator is modeled using a time domain equivalent of the input impedance and does not require the use of wave variables. This facilitates the modeling of the digital coupling between the bore, the reed, and the nonlinear characteristics, since all the components of the model use the same physical variables. It is thus possible to obtain an explicit resolution of the nonlinear coupled system thanks to the specific discretization scheme of the reed model. This approach is applicable to other self-oscillating wind instruments using the same flow model, but it still needs to be compared with other methods.

This synthesis model was used to study the influence of the confined jet on the sound generated, by carrying out a real-time implementation. Based on the results of informal listening tests with an oboe player, the sound and the dynamics of the transients obtained are fairly realistic. The simulations show that the shape of the resonator is the main factor determining the timbre of the instrument in steady-state parts, and that the confined jet plays a role at the control level of the model, since it increases the oscillation step and therefore plays an important role mainly in the transient parts.

ACKNOWLEDGMENTS

The author would like to thank Christophe Vergez for helpful discussions on the physical flow model, and Jessica Blanc for reading the English.

REFERENCES

[1] R. T. Schumacher, "Ab initio calculations of the oscillation of a clarinet," Acustica, vol. 48, pp. 71–85, 1981.

[2] J. O. Smith III, "Principles of digital waveguide models of musical instruments," in Applications of Digital Signal Processing to Audio and Acoustics, M. Kahrs and K. Brandenburg, Eds., pp. 417–466, Kluwer Academic Publishers, Boston, Mass, USA, 1998.

[3] V. Välimäki and M. Karjalainen, "Digital waveguide modeling of wind instrument bores constructed of truncated cones," in Proc. International Computer Music Conference, pp. 423–430, Computer Music Association, San Francisco, 1994.

[4] M. van Walstijn and M. Campbell, "Discrete-time modeling of woodwind instrument bores using wave variables," Journal of the Acoustical Society of America, vol. 113, no. 1, pp. 575–585, 2003.

[5] C. Vergez, R. Almeida, A. Causse, and X. Rodet, "Toward a simple physical model of double-reed musical instruments: influence of aero-dynamical losses in the embouchure on the coupling between the reed and the bore of the resonator," Acustica, vol. 89, pp. 964–974, 2003.

[6] Ph. Guillemain, J. Kergomard, and Th. Voinier, "Real-time synthesis of wind instruments, using nonlinear physical models," submitted to Journal of the Acoustical Society of America.

[7] J. Kergomard, "Elementary considerations on reed-instrument oscillations," in Mechanics of Musical Instruments, A. Hirschberg, J. Kergomard, and G. Weinreich, Eds., Springer-Verlag, New York, NY, USA, 1995.

[8] A. Almeida, C. Vergez, R. Causse, and X. Rodet, "Physical study of double-reed instruments for application to sound-synthesis," in Proc. International Symposium in Musical Acoustics, pp. 221–226, Mexico City, Mexico, December 2002.

[9] A. Hirschberg, "Aero-acoustics of wind instruments," in Mechanics of Musical Instruments, A. Hirschberg, J. Kergomard, and G. Weinreich, Eds., Springer-Verlag, New York, NY, USA, 1995.

[10] S. Ollivier, Contribution à l'étude des oscillations des instruments à vent à anche simple, Ph.D. thesis, Université du Maine, France, 2002.

[11] T. A. Wilson and G. S. Beavers, "Operating modes of the clarinet," Journal of the Acoustical Society of America, vol. 56, no. 2, pp. 653–658, 1974.

[12] Ph. Guillemain, J. Kergomard, and Th. Voinier, "Real-time synthesis models of wind instruments based on physical models," in Proc. Stockholm Music Acoustics Conference, Stockholm, Sweden, 2003.

[13] A. D. Pierce, Acoustics—An Introduction to Its Physical Principles and Applications, McGraw-Hill, New York, NY, USA, 1981; reprinted by Acoustical Society of America, Woodbury, NY, USA, 1989.

[14] F. Avanzini and D. Rocchesso, "Efficiency, accuracy, and stability issues in discrete time simulations of single reed instruments," Journal of the Acoustical Society of America, vol. 111, no. 5, pp. 2293–2301, 2002.

[15] G. Borin, G. De Poli, and D. Rocchesso, "Elimination of delay-free loops in discrete-time models of nonlinear acoustic systems," IEEE Trans. Speech and Audio Processing, vol. 8, no. 5, pp. 597–605, 2000.

Ph. Guillemain was born in 1967 in Paris. Since 1995, he has been working as a full-time researcher at the Centre National de la Recherche Scientifique (CNRS) in Marseille, France. He obtained his Ph.D. in 1994 on the additive synthesis modeling of natural sounds using time-frequency and wavelet representations. Since 1989, he has been working in the field of musical sound analysis, synthesis, and transformation using signal models and phenomenological models, with an emphasis on propagative models, their link with physics, and the design and control of real-time compatible synthesis algorithms.


EURASIP Journal on Applied Signal Processing 2004:7, 1001–1006
© 2004 Hindawi Publishing Corporation

Real-Time Gesture-Controlled Physical Modelling Music Synthesis with Tactile Feedback

David M. Howard
Media Engineering Research Group, Department of Electronics, University of York, Heslington, York, YO10 5DD, UK
Email: [email protected]

Stuart Rimell

Media Engineering Research Group, Department of Electronics, University of York, Heslington, York, YO10 5DD, UK

Received 30 June 2003; Revised 13 November 2003

Electronic sound synthesis continues to offer huge potential for the creation of new musical instruments. The traditional approach is, however, seriously limited in that it incorporates only auditory feedback, and it will typically make use of a sound synthesis model (e.g., additive, subtractive, wavetable, and sampling) that is inherently limited and very often nonintuitive to the musician. In a direct attempt to challenge these issues, this paper describes a system that provides tactile as well as acoustic feedback, with real-time synthesis that invokes a more intuitive response from players since it is based upon mass-spring physical modelling. Virtual instruments are set up via a graphical user interface in terms of the physical properties of basic well-understood sounding objects such as strings, membranes, and solids. These can be interconnected to form complex integrated structures. Acoustic excitation can be applied at any point mass via virtual bowing, plucking, striking, a specified waveform, or from any external sound source. Virtual microphones can be placed at any point masses to deliver the acoustic output. These aspects of the instrument are described along with the nature of the resulting acoustic output.

Keywords and phrases: physical modelling, music synthesis, haptic interface, force feedback, gestural control.

1. INTRODUCTION

Musicians are always searching for new sounds and new ways of producing sounds in their compositions and performances. The availability of modern computer systems has enabled considerable processing power to be made available on the desktop, and such machines have the capability of enabling sound synthesis techniques to be employed in real time that would have required large dedicated computer systems just a few decades ago. Despite the increased incorporation of computer technology in electronic musical instruments, the search is still on for virtual instruments that are closer, in terms of how they are played, to their physical acoustic counterparts.

The system described in this paper aims to integrate music synthesis by physical modelling with novel control interfaces for real-time use in composition and live performances. Traditionally, sound synthesis has relied on techniques involving oscillators, wavetables, filters, time envelope shapers, and digital sampling of natural sounds (e.g., [1]). More recently, physical models of musical instruments have been used to generate sounds which have more natural qualities and have control parameters which are less abstract and more closely related to musicians' experiences with acoustic instruments [2, 3, 4, 5]. Professional electroacoustic musicians require control over all aspects of the sounds with which they are working, in much the same way as a conductor is in control of the sound produced by an orchestra. Such control is not usually available from traditional synthesis techniques, since user adjustment of available synthesis parameters rarely leads to obviously predictable acoustic results. Physical modelling, on the other hand, offers the potential of more intuitive control, because the underlying technique is related directly to the physical vibrating properties of objects, such as strings and membranes, with which the user can interact through inference relating to expectation.

The acoustic output from traditional electronic musical instruments is often described as "cold" or "lifeless" by players and audience alike. Indeed, many report that such sounds become less interesting with extended exposure. The acoustic output from acoustic musical instruments, on the other hand, is often described as "warm," "intimate," or "organic." The application of physical modelling for sound synthesis produces output sounds that resemble much more closely their physical counterparts.


The success of a user interface for an electronic musical instrument might be judged on its ability to enable the user to experience the illusion of directly manipulating objects, and one approach might be the use of virtual reality interfaces. However, this is not necessarily the best way to achieve such a goal in the context of a musical instrument, since a performing musician needs to be actively in touch visually and acoustically not only with other players, but also with the audience. This is summed up by Shneiderman [6]: "virtual reality is a lively new direction for those who seek the immersion experience, where they block out the real world by having goggles on their heads." In any case, traditionally trained musicians rely less on visual feedback with their instrument and more on tactile and sonic feedback as they become increasingly accustomed to playing it. For example, Hunt and Kirk [7] note that "observation of competent pianists will quickly reveal that they do not need to look at their fingers, let alone any annotation (e.g., sticky labels with the names of the notes on) which beginners commonly use. Graphics are a useful way of presenting information (especially to beginners), but are not the primary channel which humans use when fully accustomed to a system."

There is evidence to suggest that the limited information available from the conventional screen and mouse interface is certainly limiting and potentially detrimental for creating electroacoustic music. Buxton [8] suggests that the visual senses are overstimulated, whilst the others are understimulated. In particular, he suggests that tactile input devices also provide output to enable the user to relate to the system as an object rather than an abstract system: "every haptic input device can also be considered to provide output. This would be through the tactile or kinaesthetic feedback that it provides to the user . . . . Some devices actually provide force feedback, as with some special joysticks." Fitzmaurice [9] proposes "graspable user interfaces" as real objects which can be held and manipulated, positioned, and conjoined in order to make interfaces which are more akin to the way a human interacts with the real world. It has further been noted that the haptic senses provide the second most important means (after the audio output) by which users observe and interact with the behaviour of musical instruments [10], and that complex and realistic musical expression can only result when both tactile (vibrational and textural) and proprioceptive cues are available in combination with aural feedback [11].

Considerable activity exists on capturing human gesture (see http://www.media.mit.edu/hyperins/ and http://www.megaproject.org/) [12]. Specific to the control of musical instruments is the provision of tactile feedback [13], electronic keyboards that have a feel close to a real piano [14], haptic feedback bows that simulate the feel and forces of real bows [15], and the use of finger-fitted vibrational devices in open air gestural musical instruments [16]. Such haptic control devices are generally one-off, relatively expensive, and designed to operate linked with specific computer systems, and as such, they are essentially inaccessible to the musical masses. A key feature of our instrument is its potential for wide applicability, and therefore inexpensive and widely available PC force feedback gaming devices are employed to provide its real-time gestural control and haptic feedback.

The instrument described in this paper, known as Cymatic [17], took its inspiration from the fact that traditional acoustic instruments are controlled by direct physical gesture, whilst providing both aural and tactile feedback. Cymatic has been designed to provide players with an immersive, easy to understand, as well as tactile musical experience that is more commonly associated with acoustic instruments but rarely found with computer-based instruments. The audio output from Cymatic is derived from a physical modelling synthesis engine which has its origins in TAO [3]. It shares some common approaches with other physical modelling sound synthesis environments such as Mosaic in [4] and Cordis-Anima in [5]. Cymatic makes use of the more intuitive approach to sound synthesis offered by physical modelling, to provide a building block approach to the creation of virtual instruments, based on elemental structures in one (string), two (sheet), three (block), or more dimensions that can be interconnected to form complex virtual acoustically resonant structures. Such instruments can be excited acoustically, controlled in real time via gestural devices that incorporate force feedback to provide a tactile response in addition to the acoustic output, and heard after placing one or more virtual microphones at user-specified positions within the instrument.

2. DESIGNING AND PLAYING CYMATIC INSTRUMENTS

Cymatic is a physical modelling synthesis system that makes use of a mass-spring paradigm with which it synthesises resonating structures in real time. It is implemented in C++ on a Windows-based PC, and it incorporates support for standard force feedback PC gaming controllers to provide gestural control and tactile feedback. Acoustic output is realised via a sound card that provides support for ASIO audio drivers. Operation of Cymatic is a two-stage process: (1) virtual instrument design and (2) real-time sound synthesis.

Virtual instrument design is accomplished via a graphical interface, with which individual building block resonating elements including strings, sheets, and solids can be incorporated in the instrument and interconnected on a user-specified mass-to-mass basis. The ends of strings and edges of sheets and blocks can be locked as desired. The tension and mass parameters of the masses and springs within each building block element can be user defined in value and either left fixed or placed under dynamical control using a gestural controller during synthesis. Virtual instruments can be customised in shape to enable arbitrary structures to be realised by deleting or locking any of the individual masses. Each building block resonating element will behave as a vibrating structure. The individual axial resonant frequencies will be determined by the number of masses along the given axis, the sampling rate, and the specified mass and tension values. Standard relationships hold in terms of the relative values of resonant frequency between building blocks; for example, a string twice the length of another will have a fundamental frequency that is one octave lower.


Figure 1: Example build-up of a Cymatic virtual instrument starting with a string with 45 masses (top left), then adding a sheet of 7 by 9 masses (bottom left), then a block of 4 by 4 by 3 masses (top right), and finally the completed instrument (bottom right). Mic1: audio output virtual microphone on the sheet at mass (4, 1). Random: random excitation at mass 33 of the string. Bow: bowed excitation at mass (2, 2, 2) of the block. Join (dotted line) between string mass 18 and sheet mass (1, 5). Join (dotted line) between sheet mass (6, 3) and block mass (3, 2, 1).


An excitation function, selected from the following list, can be placed on any mass within the virtual instrument: pluck, bow, random, sine wave, square wave, triangular wave, or live audio. Parameters relating to the selected excitation, including excitation force and its velocity and time of application where appropriate, can be specified by the user. Multiple excitations can be specified on the basis that each is applied to its own individual mass element. Monophonic audio output to the sound card is achieved via a virtual microphone placed on any individual mass within the instrument. Stereophonic output is available either from two individual microphones or from any number of microphones greater than two, where the output from each is panned between the left and right channels as desired. Cymatic supports whatever range of sampling rates is available on the sound card. For example, when used with an Edirol UA-5 USB audio interface, the following are available: 8 kHz, 9.6 kHz, 11.025 kHz, 12 kHz, 16 kHz, 22.05 kHz, 24 kHz, 32 kHz, 44.1 kHz, 48 kHz, 88.2 kHz, and 96 kHz.

Figure 1 illustrates the process of building up a virtual instrument. The instrument has been built up from a string of 45 masses, a sheet of 7 by 9 masses, and a block of 4 by 4 by 3 masses. There is an interconnection between the string (mass 18 from the left) and the sheet (mass 1, 5), as well as between the sheet (mass 6, 3) and the block (mass 3, 2, 1), as indicated by the dotted lines (a simple process based on clicking on the relevant masses). Two excitations have been included: a random input to the string at mass 33 and a bowed excitation to the block at mass (2, 2, 2). The basic sheet and block have been edited. Masses have been removed from both the sheet and the block, as indicated by the gaps in their structure, and the masses on the back surface of the block have all been locked. The audio output is derived from a virtual microphone placed on the sheet at mass (4, 1). These are indicated on the figure as random, bow, and mic1, respectively. Individual components, excitations, and microphones can be added, edited, or deleted as desired.
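As an illustration of this building-block description, the instrument of Figure 1 could be captured by a declarative structure along the following lines. This sketch is purely hypothetical: Cymatic's instruments are built via its graphical interface, and no file format or API is given in the paper.

    # Hypothetical, illustrative description of the Figure 1 instrument.
    instrument = {
        "elements": {
            "string1": {"type": "string", "masses": 45},
            "sheet1":  {"type": "sheet",  "masses": (7, 9)},
            "block1":  {"type": "block",  "masses": (4, 4, 3),
                        "locked": ["back surface"]},
        },
        "joins": [
            ("string1", 18, "sheet1", (1, 5)),
            ("sheet1", (6, 3), "block1", (3, 2, 1)),
        ],
        "excitations": [
            {"kind": "random", "element": "string1", "at": 33},
            {"kind": "bow",    "element": "block1",  "at": (2, 2, 2)},
        ],
        "microphones": [
            {"name": "mic1", "element": "sheet1", "at": (4, 1)},
        ],
    }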

The instrument is controlled in real time using a Microsoft SideWinder Force Feedback Pro joystick and a Logitech iFeel mouse (see http://www.immersion.com). The various gestures that can be captured by these devices can be mapped to any of the parameters that are associated with the physical modelling process on an element-by-element basis. The joystick offers four degrees of freedom (x, y, z-twist movement and a rotary "throttle" controller) and eight buttons. The mouse has two degrees of freedom (X, Y) and three buttons. Cymatic parameters that can be controlled include the mass or tension of any of the basic elements that make up the instrument and the parameters associated with the chosen excitation, such as bowing pressure, excitation force, or excitation velocity. The buttons can be configured to suppress the effect of any of the gestural movements, to enable the user to move to a new position while making no change; the change can then be made instantaneously by releasing the button. In this way, step variations can be accommodated.

The force feedback capability of the joystick allows for the provision of tactile feedback with a high degree of customisability. It receives its force instructions via MIDI through the combined MIDI/joystick port on most PC sound cards, and Cymatic outputs the appropriate MIDI messages to control its force feedback devices. The Logitech iFeel mouse is an optical mouse which implements Immersion's iFeel technology (http://www.immersion.com). It contains a vibrotactile device to produce tactile feedback over a range of frequencies and amplitudes via the "Immersion TouchSense Entertainment" software, which converts any audio signal to tactile sensations. The force feedback amplitude is controlled by the acoustic amplitude of the signal from a user-specified virtual microphone, which might be involved in the provision of the main acoustic output, or it could solely be responsible for the control of tactile feedback.

3. PHYSICAL MODELLING SYNTHESIS IN CYMATIC

Physical modelling audio synthesis in Cymatic is carried out by solving for the mechanical interaction between the masses and springs that make up the virtual instrument on a sample-by-sample basis. The central difference method of numerical integration is employed as follows:

x(t + dt) = x(t) + v\Bigl(t + \frac{dt}{2}\Bigr)\,dt,

v\Bigl(t + \frac{dt}{2}\Bigr) = v\Bigl(t - \frac{dt}{2}\Bigr) + a(t)\,dt,  (1)


where x = mass position, v = mass velocity, a = mass acceleration, t = time, and dt = sampling interval.

The mass velocity is calculated half a time step ahead of its position, which results in a more stable model than an implementation of the Euler approximation. The acceleration at time t of a cell is calculated by the classical equation

a = \frac{F}{m},  (2)

where F = the sum of all the forces on the cell and m = cell mass.

Three forces are acting on the cell:

F_{\text{total}} = F_{\text{spring}} + F_{\text{damping}} + F_{\text{external}},  (3)

where Fspring = the force on the cell from springs connected to neighbouring cells, Fdamping = the frictional damping force on the cell due to the viscosity of the medium, and Fexternal = the force on the cell from external excitations.

Fspring is calculated by summing the force on the cell from the springs connecting it to its neighbours, calculated via Hooke's law:

F_{\text{spring}} = k \sum_{n}\bigl(p_n - p_0\bigr),  (4)

where k = spring constant, pn = the position of the nth neighbour, and p0 = the position of the current cell.

Fdamping is the frictional force on the cell caused by the viscosity of the medium in which the cell is contained. It is proportional to the cell velocity, where the constant of proportionality is the damping parameter of the cell:

F_{\text{damping}} = -\rho\, v(t),  (5)

where ρ = the damping parameter of the cell and v(t) = the velocity of the cell at time t.

The acceleration of a particular cell at any instant can be established by combining these forces into (2):

a(t) = \frac{1}{m}\Bigl(k \sum_{n}\bigl(p_n - p_0\bigr) - \rho\, v(t) + F_{\text{external}}\Bigr).  (6)

The position, velocity, and acceleration are calculated once per sampling interval for each cell in the virtual instrument. Any virtual microphones in the instrument output their cell positions to provide an output audio waveform.
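A minimal sketch of this update loop in Python, for a one-dimensional string with locked ends driven by a random excitation, is given below; all parameter names and values are illustrative and are not taken from Cymatic's internals.

    import numpy as np

    def synthesise(n_samples, n_masses=45, k=2000.0, m=1e-3, rho=0.05,
                   excite_at=33, mic_at=20, fs=44100.0, seed=0):
        """Leapfrog mass-spring string per (1)-(6) (illustrative sketch)."""
        rng = np.random.default_rng(seed)
        dt = 1.0 / fs
        x = np.zeros(n_masses)   # mass positions
        v = np.zeros(n_masses)   # velocities, held half a step ahead (leapfrog)
        out = np.zeros(n_samples)
        for n in range(n_samples):
            f_ext = np.zeros(n_masses)
            f_ext[excite_at] = rng.standard_normal()       # random excitation
            # Hooke's law summed over left and right neighbours, (4).
            f_spring = np.zeros(n_masses)
            f_spring[1:-1] = k * ((x[:-2] - x[1:-1]) + (x[2:] - x[1:-1]))
            a = (f_spring - rho * v + f_ext) / m           # (2), (5), (6)
            v += a * dt                                    # v(t + dt/2), (1)
            x += v * dt                                    # x(t + dt), (1)
            x[0] = x[-1] = 0.0                             # locked end masses
            out[n] = x[mic_at]                             # virtual microphone
        return out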

4. CYMATIC OUTPUTS

Audio spectrograms provide a representation that enables the detailed nature of the acoustic output from Cymatic to be observed visually. Figure 2 shows a virtual Cymatic instrument consisting of a string and a modified sheet, which are joined together between mass 30 (from the left) on the string and mass (6, 3) on the sheet. A random excitation is applied at mass 10 of the string and a virtual microphone (mic1) is located at mass (4, 3) of the sheet. Figure 3 shows the force feedback joystick settings dialog used to control the virtual instrument; it can be seen that the component mass of the string and the component tension, damping, and mass of the sheet are controlled by the X, Y, Z, and slider (throttle) functions of the joystick.

Figure 2: Cymatic virtual instrument consisting of a string and modified sheet. They are joined together between mass 30 (from the left) on the string and mass (6, 3) on the sheet. A random excitation is applied at point 10 of the string and the virtual microphone is located at mass (6, 3) of the sheet.

Figure 3: Force feedback joystick settings dialog.

Figure 4: Spectrogram of output from the Cymatic virtual instrument, shown in Figure 2, consisting of a string and modified sheet. (Frequency axis in kHz, time axis in seconds.)

Three of the buttons have been set to suppress X, Y, and Z, a feature which enables a new setting to be jumped to as desired, for example, by pressing button 1, moving the joystick in the X axis, and then releasing button 1. Force feedback is applied based on the output amplitude level from mic1.


Figure 5: Spectrogram of a section of "the child is sleeping" by Stuart Rimell showing Cymatic alone (from the start to A), the word "hush" sung by the four-part choir (A to B), and the "st" of "still" at C.

Figure 4 shows a spectrogram of the output from mic1 of the instrument. The tonality visible (horizontal banding in the spectrogram) is entirely due to the resonant properties of the string and sheet themselves, since the input excitation is random. Variations in the tonality are rendered through gestural control of the joystick, and the step change notable just before halfway through is a result of using one of the "suppress" buttons.

Cymatic was used in a public live concert in December 2002, for which a new piece, "the child is sleeping," was specially composed by Stuart Rimell for a cappella choir and Cymatic (http://www.users.york.ac.uk/∼dmh). It was performed by the Beningbrough Singers in York, conducted by David Howard. The composer performed the Cymatic part, which made use of three cymbal-like structures controlled by the mouse and joystick. The choir provided a backing in the form of a slow-moving carol in four-part harmony, while Cymatic played an obbligato solo line. The spectrogram in Figure 5 illustrates this with a section which has Cymatic alone (up to point A), after which the choir enters singing "hush be still," with the "sh" of "hush" showing at point B and the "st" of "still" at point C. In this particular Cymatic example, the sound colours being used lie at the extremes of the vocal spectral range, but there are clearly tonal elements visible in the Cymatic output. Indeed, these were essential as a means of giving the choir their starting pitches.

5. DISCUSSION AND CONCLUSIONS

An instrument known as Cymatic has been described, which provides its players with an immersive, easy to understand, as well as tactile musical experience that is rarely found with computer-based instruments but commonly expected from acoustic musical instruments. The audio output from Cymatic is derived from a physical modelling synthesis engine, which enables virtual instruments with arbitrary shapes to be built up by interconnecting one- (string), two- (sheet), three- (block), or more dimensional basic building blocks. An acoustic excitation chosen from bowing, plucking, striking, or a waveform is applied at any mass element, and the output is derived from a virtual microphone placed at any other mass element. Cymatic is controlled via gestural controllers that incorporate force feedback to provide the player with tactile as well as acoustic feedback.

Cymatic has the potential to enable new musical instruments to be explored that can produce original and inspiring new timbral palettes, since virtual instruments that are not physically realizable can be implemented. In addition, interaction with these instruments can include aspects that cannot be used with their physical counterparts, such as deleting part of the instrument while it is sounding, or changing its physical properties in real time during performance. The design of the user interface ensures that all of these activities can be carried out in a manner that is more intuitive than with traditional electronic instruments, since it is based on the resonant properties of physical structures. A user can therefore make sense of what she or he is doing through reference to the likely behaviour of strings, sheets, and blocks. Cymatic has the further potential in the future (as processing speed increases further) to move well away from the real physical world, while maintaining the link with this intuition, since the spatial dimensionality of the virtual instruments can in principle be extended well beyond the three of the physical world.



Cymatic provides the player with an increased sense of immersion, which is particularly useful when developing performance skills, since it reinforces the visual and aural feedback cues and helps the player internalise models of the instrument's response to gesture. Tactile feedback also has the potential to prove invaluable in group performance, where traditionally computer instruments have placed an over-reliance on visual feedback, thereby detracting from the player's visual attention, which should be directed elsewhere in a group situation, for example, towards a conductor.

ACKNOWLEDGMENTS

The authors acknowledge the support of the Engineering and Physical Sciences Research Council, UK, under Grant number GR/M94137. They also thank the anonymous referees for their helpful and useful comments.

REFERENCES

[1] M. Russ, Sound Synthesis and Sampling, Focal Press, Oxford, UK, 1996.

[2] J. O. Smith III, "Physical modelling synthesis update," Computer Music Journal, vol. 20, no. 2, pp. 44–56, 1996.

[3] M. D. Pearson and D. M. Howard, "Recent developments with TAO physical modelling system," in Proc. International Computer Music Conference, pp. 97–99, Hong Kong, China, August 1996.

[4] J. D. Morrison and J. M. Adrien, "MOSAIC: A framework for modal synthesis," Computer Music Journal, vol. 17, no. 1, pp. 45–56, 1993.

[5] C. Cadoz, A. Luciani, and J. L. Florens, "CORDIS-ANIMA: A modelling system for sound and image synthesis, the general formalism," Computer Music Journal, vol. 17, no. 1, pp. 19–29, 1993.

[6] J. Preece, "Interview with Ben Shneiderman," in Human-Computer Interaction, Y. Rogers, H. Sharp, D. Benyon, S. Holland, and J. Preece, Eds., Addison Wesley, Reading, Mass, USA, 1994.

[7] A. D. Hunt and P. R. Kirk, Digital Sound Processing for Music and Multimedia, Focal Press, Oxford, UK, 1999.

[8] W. Buxton, "There is more to interaction than meets the eye: Some issues in manual input," in User Centered System Design: New Perspectives on Human-Computer Interaction, D. A. Norman and S. W. Draper, Eds., pp. 319–337, Lawrence Erlbaum Associates, Hillsdale, NJ, USA, 1986.

[9] G. W. Fitzmaurice, Graspable user interfaces, Ph.D. thesis, University of Toronto, Ontario, Canada, 1998.

[10] B. Gillespie, "Introduction haptics," in Music, Cognition, and Computerized Sound: An Introduction to Psychoacoustics, P. R. Cook, Ed., pp. 229–245, MIT Press, London, UK, 1999.

[11] D. M. Howard, S. Rimell, A. D. Hunt, P. R. Kirk, and A. M. Tyrrell, "Tactile feedback in the control of a physical modelling music synthesiser," in Proc. 7th International Conference on Music Perception and Cognition, C. Stevens, D. Burnham, G. McPherson, E. Schubert, and J. Renwick, Eds., pp. 224–227, Casual Publications, Adelaide, Australia, 2002.

[12] S. Kenji, H. Riku, and H. Shuji, "Development of an autonomous humanoid robot, iSHA, for harmonized human-machine environment," Journal of Robotics and Mechatronics, vol. 14, no. 5, pp. 324–332, 2002.

[13] C. Cadoz, A. Luciani, and J. L. Florens, "Responsive input devices and sound synthesis by simulation of instrumental mechanisms: The Cordis system," Computer Music Journal, vol. 8, no. 3, pp. 60–73, 1984.

[14] B. Gillespie, Haptic display of systems with changing kinematic constraints: The virtual piano action, Ph.D. dissertation, Stanford University, Stanford, Calif, USA, 1996.

[15] C. Nichols, "The vBow: Development of a virtual violin bow haptic human-computer interface," in Proc. New Interfaces for Musical Expression Conference, pp. 168–169, Dublin, Ireland, May 2002.

[16] J. Rovan and V. Hayward, "Typology of tactile sounds and their synthesis in gesture-driven computer music performance," in Trends in Gestural Control of Music, M. Wanderley and M. Battier, Eds., pp. 297–320, Editions IRCAM, Paris, France, 2000.

[17] D. M. Howard, S. Rimell, and A. D. Hunt, "Force feedback gesture controlled physical modelling synthesis," in Proc. Conference on New Musical Instruments for Musical Expression, pp. 95–98, Montreal, Canada, May 2003.

David M. Howard holds a first-class B.S. degree in electrical and electronic engineering from University College London (1978) and a Ph.D. in human communication from the University of London (1985). His Ph.D. topic was the development of a signal processing unit for use with a single channel cochlear implant hearing aid. He is now with the Department of Electronics at the University of York, UK, teaching and researching in music technology. His specific research areas include the analysis and synthesis of music, singing, and speech. Current activities include the application of bio-inspired techniques for music synthesis, physical modelling synthesis for music, singing, and speech, and real-time computer-based visual displays for professional voice development. David is a Chartered Engineer, a Fellow of the Institution of Electrical Engineers, and a Member of the Audio Engineering Society. Outside work, David finds time to conduct a local 12-strong choir from the tenor line and to play the pipe organ.

Stuart Rimell holds a B.S. in electronic music and psychology as well as an M.S. in digital music technology, both from the University of Keele, UK. He worked for 18 months with David Howard at the University of York on the development of the Cymatic system. There he studied electroacoustic composition for 3 years under Mike Vaughan and Rajmil Fischman. Stuart is interested in the exploration of new and fresh creative musical methods and their computer-based implementation for electronic music composition. Stuart is a guitarist and he also plays euphonium, trumpet, and piano and has been writing music for over 12 years. His compositions have been recognized internationally through prizes from the prestigious Bourges Festival of Electronic Music in 1999 and performances of his music worldwide.


EURASIP Journal on Applied Signal Processing 2004:7, 1007–1020
© 2004 Hindawi Publishing Corporation

Vibrato in Singing Voice: The Link between Source-Filter and Sinusoidal Models

Ixone Arroabarren
Departamento de Ingeniería Eléctrica y Electrónica, Universidad Pública de Navarra, Campus de Arrosadía, 31006 Pamplona, Spain
Email: [email protected]

Alfonso Carlosena
Departamento de Ingeniería Eléctrica y Electrónica, Universidad Pública de Navarra, Campus de Arrosadía, 31006 Pamplona, Spain
Email: [email protected]

Received 4 July 2003; Revised 30 October 2003

The application of inverse filtering techniques for high-quality singing voice analysis/synthesis is discussed. In the context of source-filter models, inverse filtering provides a noninvasive method to extract the voice source, and thus to study voice quality. Although this approach is widely used in speech synthesis, this is not the case in singing voice. Several studies have proved that inverse filtering techniques fail in the case of singing voice, the reasons being unclear. In order to shed light on this problem, we will consider here an additional feature of singing voice, not present in speech: the vibrato. Vibrato has traditionally been studied by sinusoidal modeling. As an alternative, we will introduce here a novel noninteractive source-filter model that incorporates the mechanisms of vibrato generation. This model will also allow the comparison of the results produced by inverse filtering techniques and by sinusoidal modeling, as they apply to singing voice and not to speech. In this way, the limitations of these conventional techniques, described in previous literature, will be explained. Both synthetic signals and singer recordings are used to validate and compare the techniques presented in the paper.

Keywords and phrases: voice quality, source-filter model, inverse filtering, singing voice, vibrato, sinusoidal model.

1. INTRODUCTION

Inverse filtering provides a noninvasive method to study voice quality. In this context, high-quality speech synthesis is developed using a source-filter model, where voice texture is controlled by glottal source characteristics. Efforts to apply this approach to singing voice have failed, the reasons being unclear: either the unsuitability of the model, or the different range of frequencies, or both, could be the cause. Lyric singers, being professionals, have an efficiency requirement, and as a result they are educated to change their formant positions, moving them towards the positions of the first harmonics, which could also be another reason for the model's failure [1].

This paper purports to shed light on this problem by comparing two salient methods for glottal source and vocal tract response (VTR) estimation with a novel frequency-domain method proposed by the authors. In this way, the inverse filtering approach will be tested in singing voice analysis. In order to have a benchmark, the source-filter model will be compared to the sinusoidal model, and this comparison will be performed thanks to a particular feature of singing voice: vibrato.

Regarding the voice production models, we can distinguish two approaches as follows.

(i) On the one hand, interactive models are closer to the physical features of the vocal system. This system is composed of two resonant cavities (subglottal and supraglottal) which are connected by a valve, the glottis, where the vocal folds are located. The movement of the vocal folds provides the harmonic nature of the air flow of voiced sounds, and also controls the coupling between the two resonant cavities, which will be different during the open and closed phases. As a result of this effect, the VTR will change during a single fundamental period and there will be a relationship between the glottal source and the VTR. This physical behavior has been modeled in several ways, by physical models [2] or aerodynamic models [3, 4]. From the signal processing point of view, in [4] the VTR variation is related to the glottal area, which controls the coupling of the cavities, and this relationship is represented by a frequency modulation of the central frequency and bandwidth of the formants. Another effect of the source-tract interaction is the increase of the skewness of the glottal source [4], which emphasizes the difference between the glottal area and the glottal source [5].


(ii) On the other hand, noninteractive models separate the glottal source and the VTR, and both are independently modeled as linear time-varying systems. This is the case of the source-filter model proposed by Fant in [6]. The VTR is modeled as an all-pole filter in the case of nonnasal sounds. For the glottal source, several waveform models have been proposed [7, 8, 9], but all of them try to include some of the features of the source-tract interaction, typically the asymmetric shape of the pulse. These models provide a high-quality synthesis framework for speech with a low computational complexity. The synthesis is preceded by an analysis stage, which is divided into two steps: an inverse filtering step, where the glottal source and the VTR are separated [9, 10, 11, 12, 13], and a parameterization step, where the most relevant parameters of both elements are obtained [14, 15, 16].

In general, inverse filtering techniques yield worse results as the fundamental frequency increases, as is the case for women and children in speech and in singing voice. In the latter case, singing voice, the number of published works is very scarce [1, 17]. In [1], the glottal source features are studied in speech and singing voice by means of acoustic and electroglottographic signals [18, 19]. From these works, it is not apparent what the main limitation of inverse filtering in singing voice is. It might be possible that the source-tract interaction is more complex than in speech, which would represent a paradox in the noninteractive assumption [20]. Another reason mentioned in [1] is that perhaps the glottal source models used in speech are not suitable for singing voice. These statements are not demonstrated, but they are interesting questions that should be answered.

On the other hand, in [17] the noninteractive source-filter model is used as a high-quality singing voice synthesis approach. The main contribution of that work is the development of an analysis procedure that estimates the parameters of the synthesis model [12, 21]. However, there is no evidence that could point to differences between speech and singing as indicated in [1].

One of the goals of the present work is to clarify whether the noninteractive models are able to model singing voice in the same way as high-quality speech, or whether, on the contrary, the source-tract interaction is different from that in speech and precludes this linear model assumption. If the noninteractive model could model singing voice, the reason for the failure of inverse filtering techniques would be just the high fundamental frequency of singing voice.

To this end, we will compare in this paper three different inverse filtering techniques, one of them novel and proposed recently by the authors, in order to obtain the source-filter decomposition. Though they work correctly for speech and low-frequency signals, we will show their limitations as the fundamental frequency increases. This is described in Section 2.

Since the fundamental frequency in singing voice is higher than in speech, it seems obvious that the above-mentioned methods fail, apparently due to the limited spectral information provided in high pitched signals. To compensate for that, we claim that the introduction of a feature such as vibrato may serve to increase the information available by virtue of the frequency-modulated nature, and therefore wider bandwidth, of vibrato [22, 23, 24]. Frequency variations are influenced by the VTR, and this effect can be used to obtain information about it.

Figure 1: Noninteractive source-filter model of the voice production system: glottal source → VTR → lip radiation diagram (1 − l·z⁻¹) → singing voice.


With this in mind, it is not surprising that vibrato has traditionally been analyzed by sinusoidal modeling [25, 26], the most important limitation being the impossibility of separating the sound generation and the VTR. In Section 3, we will take a step forward by introducing a source-filter model which accounts for the physical origin of the main features of singing voice. Making use of this model, we will also demonstrate how the simpler sinusoidal model can serve to obtain information complementary to inverse filtering, particularly in those conditions where the latter method fails.
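As a toy numerical illustration of why vibrato helps (all signals and values here are hypothetical), a frequency-modulated partial filtered by a single resonance carries the magnitude response of that resonance in its amplitude envelope:

    import numpy as np
    from scipy.signal import lfilter, hilbert

    fs = 16000
    t = np.arange(2 * fs) / fs
    f0, fv, dev = 440.0, 5.5, 20.0           # carrier, vibrato rate, depth (Hz)
    # Phase = integral of the instantaneous frequency f0 + dev*sin(2*pi*fv*t).
    phase = 2 * np.pi * f0 * t - (dev / fv) * np.cos(2 * np.pi * fv * t)
    partial = np.cos(phase)

    # A single formant-like resonance near 500 Hz (illustrative values).
    r, fc = 0.98, 500.0
    a = [1.0, -2 * r * np.cos(2 * np.pi * fc / fs), r * r]
    y = lfilter([1.0], a, partial)

    # The envelope of y varies at the vibrato rate: the partial sweeps
    # [f0 - dev, f0 + dev] and samples |H(f)| over that band, information
    # that a fixed-frequency harmonic cannot provide.
    envelope = np.abs(hilbert(y))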

2. INVERSE FILTERING

Throughout this section, the noninteractive source-filter model depicted in Figure 1 will be considered, and some of the possible estimation algorithms for it will be reviewed.

According to the block diagram in Figure 1, singing voice production can be modeled by a glottal source excitation that is linearly modified by the VTR and the lip radiation diagram. Typically, the VTR is modeled by an all-pole filter, and relying on the linearity of the model, the lip radiation system is combined with the glottal source, in such a way that the glottal source derivative (GSD) is considered as the vocal tract excitation.

In this context, during the last decades many inverse filtering algorithms to estimate the model elements have been proposed. This technique is usually accomplished in two steps. In the first one, the GSD waveform and the VTR are estimated. In the second one, these signals are parameterized in a few numerical values. This whole analysis can be practically implemented in several ways. For the sake of clarity, we can group these possibilities into two types.

(i) In the first group, the two identification steps are combined in a single algorithm, for instance in [9, 12]. There, a mathematical model for the GSD and the autoregressive (AR) model for the VTR are considered, and the authors estimate simultaneously the VTR and the GSD model parameters. In this way, the GSD model parameterizes a given phonation type. Several different algorithms follow this structure, but all of them are invariably time domain implementations that require glottal closure instant (GCI) detection [27]. Therefore, they suffer from a high computational load, which makes them very cumbersome.


Figure 2: Block diagram of the AbS inverse filtering algorithm.

(ii) The procedures in the second group split the whole process into two stages. Regarding the first step, different inverse filtering techniques have been proposed [11, 13]. These algorithms remove the GSD effect from the speech signal, and the VTR is obtained by linear prediction (LP) [28] or alternatively by discrete all-pole (DAP) modeling [29], which avoids the fundamental frequency dependence of the former.

For this comparative study, three inverse filtering approaches have been selected. The first one is the analysis-by-synthesis (AbS) procedure presented in [9]; the second one is the one proposed by the authors in [13], glottal spectrum based (GSB) inverse filtering. In this way, both groups of algorithms mentioned above are represented. In addition, the closed phase covariance (CPC) method [10] has been added to the comparison. This approach is difficult to classify because it only obtains the VTR, as is the case in the second group, but it is a time domain implementation as in the first one. The most interesting feature of this algorithm is that it is less affected by the formant ripple due to the source-tract interaction, because it only takes into account the time interval when the vocal folds are closed. In what follows, the three approaches will be briefly described, and finally compared.

2.1. Analysis by synthesis

This inverse filtering algorithm was proposed in [9]. It is based on covariance LPC [29], but the least squares error is modified in order to include the input of the system:

E = \sum_{n=0}^{N-1}\bigl(s(n) - \hat{s}(n)\bigr)^2 = \sum_{n=0}^{N-1}\Bigl(s(n) - \Bigl(\sum_{k=1}^{p} a_k\, s(n-k) + a_{p+1}\, g(n)\Bigr)\Bigr)^2,  (1)

where g(n) represents the GSD, and

H(z) = \frac{a_{p+1}}{1 - \sum_{k=1}^{p} a_k z^{-k}}  (2)

represents the VTR. Since neither VTR nor GSD parameters are known, an iterative algorithm is proposed and a simultaneous search is developed. The block diagram of the algorithm is represented in Figure 2.

As in covariance LP without a source, this approach allows shorter analysis windows. However, the stability of the system is not guaranteed, and a stabilization step must be included for this purpose. Also, since it is a time domain implementation, the voice source model must be synchronized with the speech signal, and a high sampling frequency is mandatory in order to obtain satisfactory results. As a result, the computational load is also high. Regarding the GSD parameter optimization, it is dependent on the chosen model. In the results shown in Section 2.4, the LF model is selected because it is one of the most powerful GSD models, and it allows an independent control of the three main features of the glottal source: open quotient, asymmetry coefficient, and spectral tilt. The disadvantage of this model is its computational load. For more details on the topic, readers are referred to [8].
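For fixed GSD samples g(n), the minimisation of (1) is an ordinary linear least-squares problem; the sketch below shows only this inner step, under the assumption that a GSD estimate synchronized with the speech is already available (the full AbS method of Figure 2 iterates this jointly with the LF-model parameter optimisation and a stabilization step, both omitted here).

    import numpy as np

    def fit_vtr_with_source(s, g, p):
        """Least-squares fit of (1): s(n) ~ sum_k a_k*s(n-k) + a_{p+1}*g(n)."""
        s = np.asarray(s, dtype=float)
        g = np.asarray(g, dtype=float)
        N = len(s)
        # Regression columns: s(n-1), ..., s(n-p), then the GSD g(n).
        X = np.column_stack([s[p - k:N - k] for k in range(1, p + 1)] + [g[p:N]])
        coeffs, *_ = np.linalg.lstsq(X, s[p:N], rcond=None)
        # VTR denominator coefficients a_1..a_p and gain a_{p+1}, as in (2).
        return coeffs[:p], coeffs[p]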

Regarding fundamental frequency limits, it is shown in [1] that this algorithm provides unsatisfactory results for medium- and high-pitched signals.

2.2. Glottal spectrum based inverse filtering

This technique was proposed by the authors in [13] and will be briefly described here. Unlike the technique described in the previous section, it is essentially a frequency domain implementation. In the AbS approach, the GSD effect was included in the LP error, and the AR coefficients were obtained by covariance LPC. In our case, a short term spectrum of speech is considered (3 or 4 fundamental periods), and the GSD effect is removed from the speech spectrum. Then, the AR coefficients of (2) are obtained by DAP modeling [29].

For this spectral implementation, the KLGLOTT88 model [7] has been considered. It is less powerful than the LF model, but simpler to implement.

As shown in Figure 3, there is a basic voicing waveform controlled by the open quotient (Oq) and the amplitude of voicing (AV); the spectral tilt is included by a first-order lowpass filter.



Figure 3: Block diagram of the KLGLOTT88 model. A basic voicing waveform, controlled by AV and Oq, is followed by a first-order lowpass filter 1/(1 − µz⁻¹) that introduces the spectral tilt and produces the GSD g(t).

Figure 4: Block diagram of the GSB inverse filtering algorithm. The short term spectrum of the speech is computed, spectral peaks are detected, the basic voicing spectrum is divided out, and (N + 1)th order DAP modeling yields the combined vocal tract and ST response, from which the vocal tract parameters are separated.

In our inverse filtering algorithm, once the short term spectrum is calculated, the glottal source effect is removed by spectral division, using the spectrum of the basic voicing waveform (3), which can be obtained directly as the Fourier transform of the basic voicing waveform [30]:

$$G(f) = \frac{27\,AV}{2\,O_q\,(2\pi f)^3}\left[\frac{j e^{-j2\pi f O_q T_0}}{2} + \frac{1 + 2 e^{-j2\pi f O_q T_0}}{2\pi f O_q T_0} + 3j\,\frac{1 - e^{-j2\pi f O_q T_0}}{\bigl(2\pi f O_q T_0\bigr)^2}\right]. \qquad (3)$$

The spectral tilt (ST) and the VTR are combined in an (N + 1)th order all-pole filter. The block diagram of the algorithm is shown in Figure 4.
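As an illustration of this front end, the following sketch computes a windowed short-term spectrum and performs the spectral division by G(f) of (3); the DAP fit itself is omitted, and all names and window choices are hypothetical rather than taken from [13]:

```python
import numpy as np
from scipy.signal import get_window
from scipy.fft import rfft, rfftfreq

def basic_voicing_spectrum(f, AV, Oq, T0):
    """Direct transcription of eq. (3) as reconstructed above (f > 0)."""
    w = 2 * np.pi * f
    x = w * Oq * T0
    e = np.exp(-1j * x)
    return (27 * AV / (2 * Oq * w ** 3)) * (
        1j * e / 2 + (1 + 2 * e) / x + 3j * (1 - e) / x ** 2
    )

def gsb_remove_source(s, fs, f0, AV, Oq, n_periods=4):
    """Hypothetical sketch of the GSB front end: short-term spectrum of
    3-4 fundamental periods, divided by G(f). The (N+1)th order all-pole
    fit (DAP modeling) is not reproduced here."""
    T0 = 1.0 / f0
    N = int(round(n_periods * T0 * fs))
    frame = s[:N] * get_window("blackmanharris", N)
    S = rfft(frame)
    f = rfftfreq(N, 1.0 / fs)
    G = basic_voicing_spectrum(f[1:], AV, Oq, T0)
    V = np.empty_like(S)
    V[0] = S[0]
    V[1:] = S[1:] / G      # spectral division removes the GSD effect
    return f, V
```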

Since DAP modeling is the most important part of the algorithm, we should explain its rationale. In classical autocorrelation LP [28], it is a well-known effect that, as the fundamental frequency increases, the resulting transfer function is biased by the spectral peaks of the signal. This happens because the signal is assumed to be the impulse response of the system, and this assumption is obviously not entirely correct. In order to avoid this problem, an alternative proposed in [29] is to compute the LP error from the spectral peaks instead of from the time domain samples. Unfortunately, this error calculation is based on an aliased version of the true autocorrelation of the signal, and this aliasing grows as the fundamental frequency increases, so the resulting transfer function is, again, not correct. To solve this problem, DAP modeling uses the Itakura-Saito error instead of the least squares error, and it can be shown that this error is minimized using only the spectral peak information. The details of the algorithm are explained in [29]. This technique allows higher fundamental frequencies than classical autocorrelation LP, but for proper operation it requires a sufficient number of spectral peaks in order to estimate the right transfer function.

Figure 5: Closed phase interval in voice (normalized amplitude of the voice signal and the GSD versus time, with the closed phase interval marked).

Figure 6: Closed phase covariance (CPC). The speech and EGG signals feed closed phase detection, GCI detection, and interval selection blocks, followed by covariance LPC, which yields the vocal tract parameters.

So, this inverse filtering algorithm will also have a limit in the highest achievable fundamental frequency.

2.3. Closed phase covariance

This inverse filtering technique was proposed in [31]. It is also based on covariance LP, like the AbS approach explained above. However, instead of removing the effect of the GSD from a long speech interval, classical covariance LP is applied only to the portion of a single cycle where the vocal folds are closed. In this way, in the considered time interval there is no GSD information to be removed, and the application of covariance LP leads to the right transfer function. Considering the linearity of the model shown in Figure 1, the closed phase interval is the time interval where the GSD is zero. This situation is depicted in Figure 5.

The most difficult step in this technique is detecting the closed phase in the speech signal. In [10], two-channel speech processing is proposed, making use of electroglottographic signals to detect the closed phase. Electroglottography (EGG) is a technique used to indirectly register laryngeal behavior by measuring the electrical impedance across the throat during speech. Rapid variation in the conductance is mainly caused by movement of the vocal folds: as they approximate and the physical contact between them increases, the impedance decreases, which results in a relatively higher current flow through the larynx structures. Therefore, this signal provides information about the contact surface of the vocal folds.

The complete inverse filtering algorithm is represented in Figure 6.
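A minimal sketch of the covariance LP core of CPC follows, assuming the closed-phase sample indices have already been obtained from the EGG channel (the detection itself is the hard part and is not reproduced here; names are ours):

```python
import numpy as np

def closed_phase_covariance(s, closed_idx, p):
    """Covariance LP computed only over samples inside the detected
    closed-phase interval(s), where the GSD is assumed zero, so no
    source term is needed in the regression."""
    rows, targets = [], []
    for n in closed_idx:
        if n >= p:
            rows.append(s[n - p:n][::-1])   # s(n-1), ..., s(n-p)
            targets.append(s[n])
    A = np.asarray(rows)
    b = np.asarray(targets)
    a, *_ = np.linalg.lstsq(A, b, rcond=None)  # covariance normal equations
    return a   # AR coefficients of the VTR
```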



Figure 7: Estimates obtained by GSB, CPC, and AbS, compared to the original. (a) Estimated GSD, F0 = 100 Hz, vowel “a.” (b) Estimated GSD, F0 = 300 Hz, vowel “a.” (c) Estimated VTR, F0 = 100 Hz, vowel “a.” (d) Estimated VTR, F0 = 300 Hz, vowel “a.”

In Figure 6, a GCI detection block [27] is included because, even though the acoustic and electroglottographic signals are recorded simultaneously, there is a propagation delay between the acoustic signal recorded at the microphone and the impedance variation at the neck of the singer. Thus, precise synchronization is mandatory.

Since this technique is based on covariance LP, it can work with very short window lengths. However, as the fundamental frequency increases, the time length of the closed phase gets shorter, and much less information is left for the vocal tract estimation. This fact imposes a fundamental frequency limit, even when using covariance LP.

2.4. Practical results

Once the basics of the three inverse filtering techniques have been presented, they will be compared by simulations and also by making use of natural singing voice recordings. The main goal of this analysis is to see how the three techniques compare in terms of their fundamental frequency limitations.

2.4.1. Simulation results

First, the noninteractive model for voice production shown in Figure 1 will be used in order to synthesize some artificial test signals. The lip radiation effect and the glottal source are combined in a mathematical model for the GSD, again making use of the LF model. It is well known [1, 17] that the formant position can affect inverse filtering results. In [3], it is also shown that the lower the first formant central frequency is, the higher the source-tract interaction. So, the interaction is higher in vowels where the first formant central frequency is lower. Therefore, in order to cover all possible situations, two vocal all-pole filters have been used for synthesizing the test signal: one representing the Spanish vowel “a,” and the other representing the Spanish vowel “e.” In the latter case, the first formant is located at lower frequencies.

In order to see the fundamental frequency dependence of the inverse filtering techniques, this parameter has been varied from 100 Hz to 300 Hz in 25 Hz steps. For each fundamental frequency, the three algorithms have been applied, and the GSD as well as the VTR have been estimated.



Figure 8: Fundamental frequency dependence of GSB, CPC, and AbS. (a) ErrorF1 for vowel “a.” (b) ErrorF1 for vowel “e.” (c) ErrorGSD for vowel “a.” (d) ErrorGSD for vowel “e.”

In Figures 7a to 7d, the GSD and the VTR estimated by the three approaches are shown for two different fundamental frequencies. Note that in these, and in other figures, the DC level has been arbitrarily modified to facilitate comparisons.

Comparing the results obtained by the three inverse filtering approaches, it can be seen that, as the fundamental frequency increases, the error in both the GSD and the VTR increases. Recalling the implementation of the algorithms, the CPC uses only the time interval where the GSD is zero. When the fundamental frequency is low, the result of this technique is the closest to the original. The other two techniques both show slight variations in the closed phase, because in both cases the glottal source effect is removed from the speech signal only in an approximate manner. On the other hand, when the fundamental frequency is high, the AbS approach leads, comparatively, to the best result; however, it provides neither the right GSD nor the right VTR.

In Figure 8, the relative error in the first formant central frequency and the error in the GSD are represented for the three methods, calculated according to the following expressions:

$$\mathrm{Error}_{F_1} = \frac{\bigl|\hat{F}_1 - F_1\bigr|}{F_1}, \qquad \mathrm{Error}_{\mathrm{GSD}} = \frac{1}{N}\sum_{n=0}^{N-1}\bigl|g(n) - \hat{g}(n)\bigr|^2, \qquad (4)$$

where F_1 represents the first formant central frequency, and g(n) and ĝ(n) are the original and estimated GSD waveforms, respectively.
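Both figures of merit are straightforward to compute; a direct transcription of (4) might read (assuming time-aligned, equal-length GSD waveforms):

```python
import numpy as np

def error_f1(f1_est, f1_ref):
    # relative error in the first formant central frequency, eq. (4)
    return abs(f1_est - f1_ref) / f1_ref

def error_gsd(g_est, g_ref):
    # mean squared error between estimated and original GSD, eq. (4)
    g_est = np.asarray(g_est, dtype=float)
    g_ref = np.asarray(g_ref, dtype=float)
    return np.mean(np.abs(g_est - g_ref) ** 2)
```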

Although the simulation model does not take source-tract interaction into account, Figure 8 shows that the inverse filtering results depend on the first formant position, becoming worse as it moves to lower frequencies. It is also possible to see that both errors increase as the fundamental frequency increases. Therefore, the main conclusion of this simulation-based study is that the inverse filtering results show a fundamental frequency dependence even when applied to a noninteractive source-filter model.



Figure 9: Estimates obtained by GSB, CPC, and AbS on natural singing voice. (a) Estimated GSD, F0 = 123 Hz, vowel “a.” (b) Estimated VTR, F0 = 123 Hz, vowel “a.” (c) Estimated GSD, F0 = 295 Hz, vowel “a.” (d) Estimated VTR, F0 = 295 Hz, vowel “a.”

2.4.2. Natural singing voice results

For this analysis, three male professional singers were recorded: two tenors and one baritone. They were asked to sing notes of different fundamental frequency values, in order to register samples over their whole tessitura. Besides, different vocal tract configurations were considered, and thus this exercise was repeated for the five Spanish vowels “a,” “e,” “i,” “o,” “u.” The singing material was recorded in a professional studio, in such a way that reverberation was reduced as much as possible. Acoustic and electroglottographic signals were synchronously recorded, with a bandwidth of 20 kHz, and stored in .wav format. In order to remove low frequency ambient noise, the signals were high-pass filtered by a linear phase FIR filter whose cutoff frequency was set to 75% of the fundamental frequency. This filtering was also applied to the electroglottographic signals because of the low frequency artifacts, due to larynx movements, that are typical of this kind of signal.
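Such a preprocessing high-pass can be realized, for instance, as below; the filter length is our assumption, since the text only specifies a linear phase FIR response and the 75% cutoff rule:

```python
from scipy.signal import firwin, lfilter

def remove_ambient_noise(x, fs, f0, numtaps=1025):
    """Linear-phase FIR high-pass with cutoff at 75% of the fundamental,
    as described in the text (numtaps is an assumption; it must be odd
    for a high-pass FIR design)."""
    h = firwin(numtaps, 0.75 * f0, pass_zero=False, fs=fs)
    return lfilter(h, 1.0, x)
```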

In Figures 9a to 9c, the results obtained for different fundamental frequencies and vowel “a,” for the same singer, are shown. These results are also representative of the other singers’ recordings and of the different vowels.

By comparing Figures 9a and 9c, it is possible to conclude that, in the case of a low fundamental frequency, the three algorithms provide very close results. In the case of CPC, the GSD presents less formant ripple in the closed phase interval. Regarding the VTR, the central frequencies of the formants and the frequency responses are very similar. Nevertheless, in the case of a high fundamental frequency, the GSDs resulting from the three analyses are very different from those of Figure 9a, and also from the waveform provided by the LF model. The calculated VTR is also very different for the three methods. Thus, the conclusions for natural recorded voices are similar to those obtained with synthetic signals.

3. VIBRATO IN SINGING VOICE

3.1. Definition

In Section 2, inverse filtering techniques successfully employed in speech processing have been applied to singing voice processing.



It has been shown that, as the fundamental frequency increases, they reach a limit, and thus an alternative technique should be used. As we will show in this section, the introduction of vibrato in singing voice provides more information about what may be happening.

Vibrato in singing voice can be defined as a small quasiperiodic variation of the fundamental frequency of the note. As a result of this variation, all of the harmonics of the voice also present an amplitude variation, because of the filtering effect of the VTR. Due to these nonstationary characteristics of the signal, singing voice has been modeled by the modified sinusoidal model [25, 26]:

$$s(t) = \sum_{i=0}^{N-1} a_i(t)\cos\theta_i(t) + r(t), \qquad (5)$$

where

$$\theta_i(t) = 2\pi \int_{-\infty}^{t} f_i(\tau)\, d\tau, \qquad (6)$$

and a_i(t) is the instantaneous amplitude of the ith partial, f_i(t) its instantaneous frequency, and r(t) the stochastic residual.

The acoustic signal is thus composed of a set of components (partials), whose amplitude and frequency change with time, plus a stochastic residual, which is modeled by a time-varying spectral density function. Also in [25, 26], detailed information is given on how these time-varying characteristics can be measured.
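The deterministic part of (5)-(6) translates directly into a resynthesis rule: integrate each partial's instantaneous frequency to obtain its phase, then sum the cosines. A minimal sketch follows (the residual r(t) and the parameter estimation itself are omitted; array layout is our choice):

```python
import numpy as np

def sinusoidal_resynthesis(amps, freqs, fs):
    """Deterministic part of eqs. (5)-(6): each partial is an oscillator
    whose phase is the running integral of its instantaneous frequency.
    `amps` and `freqs` are (num_partials, num_samples) arrays."""
    phases = 2 * np.pi * np.cumsum(freqs, axis=1) / fs   # theta_i(t), eq. (6)
    return np.sum(amps * np.cos(phases), axis=0)         # eq. (5) without r(t)
```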

Of the two features of a vibrato signal, frequency and amplitude variations, frequency is the most widely studied and characterized. In [32, 33], the instantaneous frequency is characterized and decomposed into three main components, which account for three musically meaningful characteristics. Namely,

$$f(t) = i(t) + e(t)\cos\varphi(t), \qquad (7)$$

where

$$\varphi(t) = 2\pi \int_{-\infty}^{t} r(\tau)\, d\tau, \qquad (8)$$

with f(t) the instantaneous frequency, i(t) the intonation of the note, which corresponds to slow variations of the pitch, e(t) the extent or amplitude of the pitch variations, and r(t) the rate or frequency of the pitch variations.

All of them are time-dependent magnitudes that rely on the musical context and on the singer’s talent and training. In the case of intonation, its value depends on the sung note, and thus on the context. But extent and rate are mostly singer-dependent features, typical values being 10% of the intonation value and 5 Hz, respectively.

Regarding the amplitude variation of the harmonics during vibrato, no well-established parameterization is accepted, and probably none exists, because this variation is different for each of the harmonics. It is therefore not strange that amplitude variation has been the topic of interest of only a few papers.

Figure 10: AM-FM representation for the first 20 harmonics. Anechoic tenor recording, F0 = 220 Hz, vowel “a.”

The first work on this topic is [34], where the perceptual relevance of the instantaneous amplitude for spectral envelope discrimination is proven. In [22], the relevance of this feature is experimentally demonstrated for the synthesis of singing voice; its physical cause is also tackled, and a representation in terms of the instantaneous amplitude versus instantaneous frequency of the harmonics is introduced for the first time. This representation is proposed as a means of obtaining local information about the VTR in limited frequency ranges. Something similar is done in [35], where the singing voice is synthesized using this local information about the VTR. We have also contributed in this direction, for instance in [23], where the instantaneous amplitude is decomposed into two parts: the first represents the sound intensity variation, and the other represents the amplitude variation determined by the local VTR, in an attempt to split the contributions of the source and the vocal tract. Moreover, in [24], different time-frequency processing tools have been used and compared in order to identify the relationship between instantaneous amplitude and instantaneous frequency.

In that work, the AM-FM representation is defined as the instantaneous amplitude versus instantaneous frequency representation, with time as an implicit parameter. This representation is compared to the magnitude response of an all-pole filter, which is typically used for VTR modeling. Two main conclusions are derived. The first is that these two representations can be compared only when anechoic recordings are considered; otherwise, the instantaneous magnitudes are affected by reverberation. The second is that, since a frequency modulated input is considered, and frequency modulation is not a linear operation, the phase of the all-pole system affects the AM-FM representation, leading to a representation different from the vocal tract magnitude response. However, the relevance of this effect depends on the formant bandwidth and on the vibrato characteristics, the vibrato rate in this case. It was also shown that in natural vibrato the phase effect of the VTR is not noticeable, because the vibrato rate is slow compared to the formant bandwidths.
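One simple way to obtain such an AM-FM curve for a single harmonic (an illustration, not necessarily the estimator used in [24]) is to band-pass the harmonic and take the instantaneous magnitudes from its analytic signal:

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def am_fm_of_harmonic(x, fs, f_center, half_bw):
    """Band-pass around one harmonic, then instantaneous amplitude and
    frequency via the Hilbert analytic signal. Plotting amplitude versus
    frequency, with time implicit, gives the AM-FM curve (cf. Figure 10)."""
    b, a = butter(4, [f_center - half_bw, f_center + half_bw],
                  btype="bandpass", fs=fs)
    z = hilbert(filtfilt(b, a, x))
    inst_amp = np.abs(z)
    inst_freq = np.diff(np.unwrap(np.angle(z))) * fs / (2 * np.pi)
    return inst_freq, inst_amp[1:]
```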

Figure 10 constitutes a good example of the kind of AM-FM representation we are talking about.



In it, each harmonic’s instantaneous amplitude is represented versus its instantaneous frequency. For this case, only two vibrato cycles, in which the vocal intensity does not change significantly, have been considered. As the harmonic number increases, the frequency range swept by each harmonic widens. Comparing Figure 10 and Figure 9b, the AM-FM representation of the former is very similar to the VTR of the latter. However, in the case of the AM-FM representation, no source-filter separation has been made, and thus both elements are merged in that representation. The results obtained by other authors [22, 35] are quite similar regarding the instantaneous amplitude versus instantaneous frequency representation; however, in those works no comment is made about the recording conditions.

3.2. Simplified noninteractive source-tract model with vibrato

The main conclusion from the results presented above is that vibrato might be used in order to extract more information about the glottal source and the VTR in singing voice. Therefore, we propose here a simplified noninteractive source-filter model with vibrato, which is a signal model of vibrato production and explains the results provided by sinusoidal modeling. We first make some basic assumptions regarding what happens to the GSD and the VTR during vibrato. These assumptions are based on perceptual aspects of vibrato and on the AM-FM representation of natural singing voice.

(1) The GSD characteristics remain constant during vibrato, and only the fundamental frequency of the voice changes. This assumption is justified by the fact that, perceptually, there is no phonation change during a single note.

(2) The intensity of the sound is constant, at least during one or two vibrato cycles.

(3) The VTR remains invariant during vibrato. This assumption relies on the fact that vocalization does not change along the note.

(4) The three vibrato characteristics remain constant. This assumption is not strictly true, but their time constants are considerably larger than the fundamental period of the signal.

Taking these four assumptions into account, the simplified noninteractive source-filter model with vibrato can be represented by the block diagram in Figure 11.

Based on this model, we will simulate the production of vibrato. The GSD characteristics are the same as in Section 2.4, and the VTR has been implemented as an all-pole filter whose frequency response represents the Spanish vowel “a.” A frequency variation typical of vibrato has been applied to the GSD, with a 120 Hz intonation, an extent of 10% of the intonation value, and a rate of 5.5 Hz. All of them are kept constant over the complete recording.
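The simulation can be sketched as follows. The impulse-train source below is a crude stand-in for the LF-model GSD, and the formant values of the all-pole VTR are assumptions for a vowel-“a”-like filter; only the vibrato parameters (120 Hz, 10%, 5.5 Hz) come from the text:

```python
import numpy as np
from scipy.signal import lfilter

fs = 44100
t = np.arange(int(2.0 * fs)) / fs

# vibrato parameters from the text
intonation, rate = 120.0, 5.5
extent = 0.10 * intonation
f0 = intonation + extent * np.cos(2 * np.pi * rate * t)   # cf. eq. (7)

# crude stand-in for the LF-model GSD: one impulse per cycle of the
# running phase of f0 (the real simulation uses the LF waveform)
phase = 2 * np.pi * np.cumsum(f0) / fs
source = np.diff(np.floor(phase / (2 * np.pi)), prepend=0.0)

# hypothetical all-pole VTR for the Spanish vowel "a" (assumed formants)
a = np.array([1.0])
for fc, bw in zip([700, 1200, 2600], [80, 90, 120]):
    r = np.exp(-np.pi * bw / fs)
    a = np.convolve(a, [1.0, -2 * r * np.cos(2 * np.pi * fc / fs), r * r])
voice = lfilter([1.0], a, source)
```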

We have applied both inverse filtering (where the presence or absence of vibrato does not influence the algorithm) and sinusoidal modeling, in which the instantaneous amplitude and instantaneous frequency of each harmonic must be measured, to the resulting signal.

Figure 11: Noninteractive source-filter model with vibrato. The LF-model GSD (parameters Oq, ft, α) is driven by F0(t) (vibrato intonation, rate, and extent) and filtered by the VTR H(z) = 1/(1 − Σ_{k=1}^{p} a_k z^{−k}) to produce the singing voice.

Results obtained for this simulation are shown in Figures 12, 13, 14, and 15. In Figure 12, inverse filtering results are shown for a short analysis window. When the fundamental frequency is low, the GSD and the VTR are well separated. In Figure 13, sinusoidal modeling results are shown: the frequency variations of the harmonics of the signal are clearly observed and, as a result, so are the amplitude variations. On the other hand, in Figure 14, the AM-FM representation of the partials is shown. Taking into account the AM-FM representation of every partial, and comparing it to the VTR shown in Figure 12, it is possible to conclude that local information about the VTR is provided by this method. However, as no source-filter decomposition has been performed, each AM-FM representation is shifted in amplitude depending on the GSD spectral features. This effect is a result of keeping the GSD parameters constant during vibrato. Comparing Figures 14 and 15, it can be noticed that, if the GSD magnitude spectrum is removed from the AM-FM representation of the harmonics, the resulting AM-FM representation provides only VTR information. The result of this operation is shown in Figure 16.

For this simplified noninteractive source-filter model with vibrato, the instantaneous parameters of sinusoidal modeling provide complementary information about both the GSD and the VTR. When inverse filtering works, the GSD effect can be removed from the AM-FM representation provided by sinusoidal modeling, and only the information about the VTR remains.
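The compensation step amounts to a subtraction in dB, with the GSD magnitude spectrum interpolated at each instantaneous frequency (a sketch; the interpolation is our choice of implementation):

```python
import numpy as np

def remove_source_from_amfm(inst_freq, inst_amp_db, gsd_freqs, gsd_mag_db):
    """Subtract the GSD magnitude spectrum (in dB, interpolated at each
    instantaneous frequency) from the AM-FM representation, leaving local
    VTR information only (cf. Figure 16)."""
    gsd_at_f = np.interp(inst_freq, gsd_freqs, gsd_mag_db)
    return inst_amp_db - gsd_at_f
```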

3.3. Natural singing voice

The relationship between these two signal models, the noninteractive source-filter model and the sinusoidal model, has been established for a synthetic signal in which vibrato was included under the four assumptions stated at the beginning of the section. Now, the question is whether this relationship also holds for natural singing voice. Therefore, both kinds of signal analysis will now be applied to natural singing voice. In order to get close to the simulation conditions, some precautions were taken in the recording process.

(1) The musical context was selected in order to control intensity variations of the sound. Singers were asked to sing a word of three notes, where the first and the last simply provide musical support and the note in between is a long sustained note. This note is two semitones higher than the two accompanying ones.



Figure 12: Inverse filtering results, GSB inverse filtering algorithm, original versus inverse filtered. (a) GSD. (b) VTR.

Figure 13: Sinusoidal modeling results. (a) Instantaneous frequency. (b) Instantaneous amplitude.

Figure 14: AM-FM representation.

(2) Recordings were made in a studio where reverberation is reduced, but not completely eliminated as in an anechoic room. In this situation, the AM-FM representation presents slight deviations from the actual VTR, but it is still possible to carry out a qualitative study.

Figure 15: GSD short term spectrum and spectral peaks. Blackman-Harris window.

In Figures 17, 18, 19, and 20, the results of these analyses are shown for a low-pitched baritone recording, F0 = 128 Hz, vowel “a.” Contrary to Figures 12, 13, 14, and 15, here there is no reference for the original GSD and VTR. Comparing Figures 12b, 13b and 17b, 18b, the instantaneous frequency variation is similar in simulation and in natural singing voice.



Figure 16: AM-FM representation without source, compared to the VTR.

However, the extent of vibrato in this baritone recording is lower than in the synthetic signal. In the case of the instantaneous amplitude, the natural singing voice results are not as regular as the synthetic ones, because of reverberation and the irregularities of the natural voice. Regarding the intensity of the sound, there are no large variations in the instantaneous amplitude, and so, over one or two vibrato cycles, it can be considered constant. In this situation, the AM-FM representation of the harmonics, shown in Figure 19, is very similar to that of the synthetic signal, though the already mentioned irregularities are present. In Figure 20, the GSD spectrum is shown for the signal of Figures 17a, 18a. It is very similar to the synthetic GSD spectrum (both are spectra of low frequency periodic signals), although it shows slight variations in its harmonic amplitudes, which will be explained later.

Now, the GSD spectrum obtained in this way will be used to extract the VTR information from the AM-FM representation. The result of this operation is shown in Figure 21.

As in the case of the synthetic signal, the compensated AM-FM representation is very close to the VTR obtained by inverse filtering. However, the matching is not as good as for the synthetic signal.

From this comparison of the two signal models, it is possible to conclude that the simplified noninteractive source-filter model with vibrato can explain, in an approximate way, what happens in singing voice when vibrato is present. It is now possible to say that the GSD and the VTR do not vary greatly during a few vibrato cycles. In this way, the instantaneous amplitude and frequency obtained by sinusoidal modeling provide more, and complementary, information about the GSD and the VTR during vibrato than the known analysis methods.

It is important to note that the AM-FM representation by itself does not provide information about the GSD and the VTR separately; rather, it represents, in the vicinity of each harmonic, a small section of the VTR. In order to know exactly what happens to the GSD and the VTR during vibrato, precautions have to be taken with the recording conditions. Even in nonoptimal conditions, the AM-FM representation of vibrato provides information complementary to that of inverse filtering methods.

4. DISCUSSION OF RESULTS AND CONCLUSIONS

In Section 2, inverse filtering techniques have been reviewed, and their dependence on the fundamental frequency has been shown. It seems obvious that, regardless of the particular technique, inverse filtering of speech fails as frequency increases. In natural singing voice, where pitch is inherently high, there are no references against which to verify whether this is the only cause of the failure. In Section 3, with the aim of answering this question, a novel noninteractive source-filter model has been introduced for singing voice modeling, including vibrato as an additional feature. It has been shown that this model can represent vibrato production in singing voice. In addition, this model has allowed a relationship to be established between sinusoidal modeling and the source-filter model, through what the authors have coined the AM-FM representation.

In this last section, the AM-FM representation is used again in singing voice analysis, in order to determine whether there are other effects in singing voice as the fundamental frequency increases. To this end, the same analysis as in Section 3 has been applied to the signal database of Section 2, corresponding to the three male singers’ recordings. On the one hand, inverse filtering is applied and the GSD and VTR are estimated. On the other hand, sinusoidal modeling is considered and the two instantaneous magnitudes (frequency and amplitude of each harmonic) are measured. Then, the AM-FM representation is obtained for each (frequency modulated) harmonic, and the GSD obtained by the inverse filtering is removed from this representation.

In Figure 22, the results obtained for several fundamental frequencies, for the baritone singer, are shown. As in Section 2, these results are representative of the other singers’ recordings and of the other vowels.

Regarding the AM-FM representation, Figure 22 shows that, as the fundamental frequency increases, the frequency range swept by each harmonic becomes wider, because of the relationship between extent and intonation. Also, as the fundamental frequency increases, the AM-FM representations of two consecutive harmonics become more separated, which is a direct consequence of their harmonic relationship. Beyond these obvious effects, there is no other evident consequence of the fundamental frequency increase in this analysis, and thus the simplified noninteractive source-filter model with vibrato can model high-pitched singing voice with vibrato, from the signal point of view.

The main limitation of the plain AM-FM representation is that no source-filter separation is possible unless it is combined with another method, and thus, on its own, nothing can be said about the exact shapes of the GSD and the VTR. However, its main advantage is that it has no fundamental frequency limit, and so it can be applied to any singing voice sample with vibrato. This conclusion brings along another piece of evidence: the noninteractive source-filter model remains valid for singing voice.

We can summarize the main contributions and conclusions of this work as follows.



Figure 17: Inverse filtering results, GSB inverse filtering algorithm. (a) GSD. (b) VTR.

Figure 18: Sinusoidal modeling results. (a) Instantaneous frequency. (b) Instantaneous amplitude.

Figure 18: Sinusoidal modeling results. (a) Instantaneous frequency. (b) Instantaneous amplitude.

Figure 19: AM-FM representation.

Figure 20: GSD short term spectrum and spectral peaks. Blackman-Harris window.

Figure 21: AM-FM representation without source, compared to the VTR of inverse filtering.

(i) Several representative inverse filtering techniques have been critically compared when applied to speech. It has been shown how all of them fail as frequency increases, as is the case in singing voice.

(ii) A novel noninteractive source-filter model has been proposed for singing voice, which includes vibrato as a possible feature.

(iii) The existence of vibrato, together with the above mentioned model, has made it possible to relate the source-filter model (i.e., inverse filtering techniques) and the simple sinusoidal model.



Figure 22: AM-FM representation after removing the source, compared to the VTR given by inverse filtering. (a) F0 = 110 Hz, vowel “a.” (b) F0 = 156 Hz, vowel “a.” (c) F0 = 227 Hz, vowel “a.”

In other words, although both are signal models for singing voice, the first is related to voice production while the second is a general signal model; thanks to vibrato, the two can be linked.

(iv) Even though sinusoidal modeling does not allow separate information about the sound source and the VTR to be obtained, the AM-FM representation gives complementary information, particularly in high frequency ranges, where inverse filtering does not work.

ACKNOWLEDGMENTS

The Gobierno de Navarra and the Universidad Pública de Navarra are gratefully acknowledged for financial support. The authors would also like to acknowledge the support of Xavier Rodet and Axel Roebel (IRCAM, Paris), the material and medical support of Ana Martínez Arellano, and the collaboration of the student Daniel Erro, who implemented some of the algorithms.

REFERENCES

[1] N. Henrich, Etude de la source glottique en voix parlée et chantée : modélisation et estimation, mesures acoustiques et électroglottographiques, perception, Ph.D. thesis, Paris 6 University, Paris, France, 2001.

[2] B. H. Story, “An overview of the physiology, physics and modeling of the sound source for vowels,” Acoustical Science and Technology, vol. 23, no. 4, pp. 195–206, 2002.

[3] B. Guerin, M. Mrayati, and R. Carre, “A voice source taking account of coupling with the supraglottal cavities,” in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing (ICASSP ’76), vol. 1, pp. 47–50, Philadelphia, Pa, USA, April 1976.

[4] T. V. Ananthapadmanabha and G. Fant, “Calculation of the true glottal flow and its components,” Speech Communication, vol. 1, no. 3-4, pp. 167–184, 1982.

[5] M. Berouti, D. G. Childers, and A. Paige, “Glottal area versus glottal volume-velocity,” in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing (ICASSP ’77), vol. 2, pp. 33–36, Cambridge, Mass, USA, May 1977.

[6] G. Fant, Acoustic Theory of Speech Production, Mouton, The Hague, The Netherlands, 1960.

[7] D. H. Klatt and L. C. Klatt, “Analysis, synthesis, and perception of voice quality variations among female and male talkers,” Journal of the Acoustical Society of America, vol. 87, no. 2, pp. 820–857, 1990.

[8] G. Fant, J. Liljencrants, and Q. Lin, “A four-parameter model of glottal flow,” Speech Transmission Laboratory Quarterly Progress and Status Report, vol. 85, no. 2, pp. 1–13, 1985.

[9] H. Fujisaki and M. Ljungqvist, “Proposal and evaluation of models for the glottal source waveform,” in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing (ICASSP ’86), vol. 11, pp. 1605–1608, Tokyo, Japan, April 1986.

[10] A. K. Krishnamurthy and D. G. Childers, “Two-channel speech analysis,” IEEE Trans. Acoustics, Speech, and Signal Processing, vol. 34, no. 4, pp. 730–743, 1986.

[11] P. Alku and E. Vilkman, “Estimation of the glottal pulseform based on discrete all-pole modeling,” in Proc. 2nd International Conf. on Spoken Language Processing (ICSLP ’94), pp. 1619–1622, Yokohama, Japan, September 1994.

[12] H.-L. Lu and J. O. Smith, “Joint estimation of vocal tract filter and glottal source waveform via convex optimization,” in IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA ’99), pp. 79–92, New Paltz, NY, USA, October 1999.

[13] I. Arroabarren and A. Carlosena, “Glottal spectrum based inverse filtering,” in Proc. 8th European Conference on Speech Communication and Technology (EUROSPEECH ’03), Geneva, Switzerland, September 2003.



[14] E. L. Riegelsberger and A. K. Krishnamurthy, “Glottal source estimation: methods of applying the LF-model to inverse filtering,” in Proc. IEEE Int. Conf. Acoustics, Speech, Signal Processing (ICASSP ’93), vol. 2, pp. 542–545, Minneapolis, Minn, USA, April 1993.

[15] B. Doval, C. d’Alessandro, and B. Diard, “Spectral methods for voice source parameters estimation,” in Proc. 5th European Conference on Speech Communication and Technology (EUROSPEECH ’97), vol. 1, pp. 533–536, Rhodes, Greece, September 1997.

[16] I. Arroabarren and A. Carlosena, “Glottal source parameterization: a comparative study,” in Proc. ISCA Tutorial and Research Workshop on Voice Quality: Functions, Analysis and Synthesis, Geneva, Switzerland, August 2003.

[17] H.-L. Lu, Toward a high-quality singing synthesizer with vocal texture control, Ph.D. thesis, Stanford University, Stanford, Calif, USA, 2002.

[18] N. Henrich, B. Doval, and C. d’Alessandro, “Glottal open quotient estimation using linear prediction,” in Proc. International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications, Firenze, Italy, September 1999.

[19] N. Henrich, B. Doval, C. d’Alessandro, and M. Castellengo, “Open quotient measurements on EGG, speech and singing signals,” in Proc. 4th International Workshop on Advances in Quantitative Laryngoscopy, Voice and Speech Research, Jena, Germany, April 2000.

[20] N. Henrich, C. d’Alessandro, and B. Doval, “Spectral correlates of voice open quotient and glottal flow asymmetry: theory, limits and experimental data,” in Proc. 7th European Conference on Speech Communication and Technology (EUROSPEECH ’01), Aalborg, Denmark, September 2001.

[21] H.-L. Lu and J. O. Smith, “Glottal source modeling for singing voice synthesis,” in Proc. International Computer Music Conference (ICMC ’00), Berlin, Germany, August 2000.

[22] R. Maher and J. Beauchamp, “An investigation of vocal vibrato for synthesis,” Applied Acoustics, vol. 30, no. 2-3, pp. 219–245, 1990.

[23] I. Arroabarren, M. Zivanovic, and A. Carlosena, “Analysis and synthesis of vibrato in lyric singers,” in Proc. 11th European Signal Processing Conference (EUSIPCO ’02), Toulouse, France, September 2002.

[24] I. Arroabarren, M. Zivanovic, X. Rodet, and A. Carlosena, “Instantaneous frequency and amplitude of vibrato in singing voice,” in Proc. IEEE 28th Int. Conf. Acoustics, Speech, Signal Processing (ICASSP ’03), Hong Kong, China, April 2003.

[25] R. J. McAulay and T. F. Quatieri, “Speech analysis/synthesis based on a sinusoidal representation,” IEEE Trans. Acoustics, Speech, and Signal Processing, vol. 34, no. 4, pp. 744–754, 1986.

[26] X. Serra, “Musical sound modeling with sinusoids plus noise,” in Musical Signal Processing, C. Roads, S. Pope, A. Picialli, and G. De Poli, Eds., Swets & Zeitlinger, Lisse, The Netherlands, May 1997.

[27] C. Ma, Y. Kamp, and L. F. Willems, “A Frobenius norm approach to glottal closure detection from the speech signal,” IEEE Trans. Speech and Audio Processing, vol. 2, no. 2, pp. 258–265, 1994.

[28] J. Makhoul, “Linear prediction: a tutorial review,” Proceedings of the IEEE, vol. 63, no. 4, pp. 561–580, 1975.

[29] A. El-Jaroudi and J. Makhoul, “Discrete all-pole modeling,” IEEE Trans. Signal Processing, vol. 39, no. 2, pp. 411–423, 1991.

[30] B. Doval and C. d’Alessandro, “Spectral correlates of glottal waveform models: an analytic study,” in Proc. IEEE 22nd Int. Conf. Acoustics, Speech, Signal Processing (ICASSP ’97), pp. 1295–1298, Munich, Germany, April 1997.

[31] D. Y. Wong, J. D. Markel, and A. H. Gray, “Least squares glottal inverse filtering from the acoustic speech waveform,” IEEE Trans. Acoustics, Speech, and Signal Processing, vol. 27, no. 4, pp. 350–355, 1979.

[32] E. Prame, “Vibrato extent and intonation in professional western lyric singing,” Journal of the Acoustical Society of America, vol. 102, no. 1, pp. 616–621, 1997.

[33] I. Arroabarren, M. Zivanovic, J. Bretos, A. Ezcurra, and A. Carlosena, “Measurement of vibrato in lyric singers,” IEEE Trans. Instrumentation and Measurement, vol. 51, no. 4, pp. 660–665, 2002.

[34] S. McAdams and X. Rodet, “The role of FM-induced AM in dynamic spectral profile analysis,” in Basic Issues in Hearing, H. Duifhuis, J. Horst, and H. Wit, Eds., pp. 359–369, Academic Press, London, UK, 1988.

[35] M. Mellody and G. H. Wakefield, “Signal analysis of the singing voice: low-order representations of singer identity,” in Proc. International Computer Music Conference (ICMC ’00), Berlin, Germany, August 2000.

Ixone Arroabarren was born in Arizkun, Navarra, Spain, on December 11, 1975. She received her Eng. degree in telecommunications in 1999 from the Public University of Navarra, Pamplona, Spain, where she is currently pursuing her Ph.D. degree in the area of signal processing techniques as they apply to musical signals. She has collaborated in industrial projects for the vending machine industry.

Alfonso Carlosena was born in Navarra, Spain, in 1962. He received his M.S. degree with honors and his Ph.D. in physics in 1985 and 1989, respectively, both from the University of Zaragoza, Spain. From 1986 to 1992, he was an Assistant Professor in the Department of Electrical Engineering and Computer Science at the University of Zaragoza, Spain. Since October 1992, he has been an Associate Professor with the Public University of Navarra, where he has also served as Head of the Technology Transfer Office. In March 2000, he was promoted to Full Professor at the same university. He has also been a Visiting Scholar at the Swiss Federal Institute of Technology, Zurich, and at New Mexico State University, Las Cruces. His current research interests are in the areas of analog circuits and signal processing, digital signal processing, and instrumentation, where he has published over sixty papers in international journals and a similar number of conference presentations. He is currently leading several industrial projects for local firms.


EURASIP Journal on Applied Signal Processing 2004:7, 1021–1035
© 2004 Hindawi Publishing Corporation

A Hybrid Resynthesis Model for Hammer-String Interaction of Piano Tones

Julien Bensa
Laboratoire de Mécanique et d’Acoustique, Centre National de la Recherche Scientifique (LMA-CNRS), 13402 Marseille Cedex 20, France
Email: [email protected]

Kristoffer Jensen
Datalogisk Institut, Københavns Universitet, Universitetsparken 1, 2100 København, Denmark
Email: [email protected]

Richard Kronland-Martinet
Laboratoire de Mécanique et d’Acoustique, Centre National de la Recherche Scientifique (LMA-CNRS), 13402 Marseille Cedex 20, France
Email: [email protected]

Received 7 July 2003; Revised 9 December 2003

This paper presents a source/resonator model of hammer-string interaction that produces realistic piano sound. The source is generated using a subtractive signal model. Digital waveguides are used to simulate the propagation of waves in the resonator. This hybrid model allows resynthesis of the vibration measured on an experimental setup. In particular, the nonlinear behavior of the hammer-string interaction is taken into account in the source model and is well reproduced. The behavior of the model parameters (the resonant part and the excitation part) is studied with respect to the velocities and the notes played. This model exhibits physically and perceptually related parameters, allowing easy control of the sound produced. This research is an essential step in the design of a complete piano model.

Keywords and phrases: piano, hammer-string interaction, source-resonator model, analysis/synthesis.

1. INTRODUCTION

This paper is a contribution to the design of a complete piano-synthesis model. (Sound examples obtained using the method described in this paper can be found at www.lma.cnrs-mrs.fr/∼kronland/JASP/sounds.html.) It is the result of several attempts [1, 2], eventually leading to a stable and robust methodology. We address here the modeling for synthesis of a key aspect of piano tones: the hammer-string interaction. This model will ultimately need to be linked to a soundboard model to accurately simulate piano sounds.

The design of a synthesis model is strongly linked to the specificity of the sounds to be produced and to the expected use of the model. This work was done in the framework of the analysis-synthesis of musical sounds; we seek both to reconstruct a given piano sound and to use the synthesis model in a musical context. The perfect reconstruction of given sounds is a strong constraint: the synthesis model must be designed so that the parameters can be extracted from the analysis of natural sounds. In addition, playing the synthesis model requires a good relationship between the physics of the instrument, the synthesis parameters, and the generated sounds. This relationship is crucial to having a good interaction between the “digital instrument” and the player, and it constitutes one of the most important aspects our piano model has to deal with.

Music based on so-called “sound objects” (like electro-acoustic music or “musique concrète”) relies on synthesis models allowing subtle and natural transformations of the sounds. The notion of a natural transformation of sounds consists here in transforming them so that they correspond to a physical modification of the instrument. As a consequence, such sound transformations call for the model to include physical descriptions of the instrument. Nevertheless, the physics of musical instruments is sometimes too complicated to be exhaustively taken into account, or not modeled well enough to lead to satisfactory sounds. This is the case for the piano, in which hundreds of mechanical components are connected [3], and for which



the hammer-string interaction still poses physical modeling problems.

To take into account the necessary simplifications made in the physical description of piano sounds, we have used hybrid models, obtained by combining physical and signal synthesis models [4, 5]. The physical model simulates the physical behavior of the instrument, whereas the signal model seeks to recreate the perceptual effect produced by the instrument. The hybrid model provides a perceptually plausible resynthesis of a sound, as well as intimate manipulations in a physically and perceptually relevant way. Here, we have used a physical model to simulate the linear string vibration, and a physically informed signal model to simulate the nonlinear interaction between the string and the hammer.

An important problem linked to hybrid models is the coupling of the physical and the signal models. To use a source-resonator model, the source and the resonator must be uncoupled. Yet this is not the case for the piano, since the hammer interacts with the strings for 2 to 5 milliseconds [6, 7], and a significant part of the piano sound characteristics is due to this interaction. Even though this observation is true from a physical point of view, this short interaction period is not in itself of great importance from a perceptual point of view. The attack consists of two parts, due to two vibrating paths [8]: one percussive, a result of the impact of the key on the frame, and another that starts when the hammer strikes the strings. Schaeffer [9] showed that cutting the first milliseconds of a piano sound (for a bass note, for which the impact of the key on the frame is less perceptible) does not alter the perception of the sound. We have informally carried out such an experiment by listening to various piano sounds cleared of their attack. We found that, from a perceptual point of view, when the noise due to the impact of the key on the frame is not too great (compared to the vibrating energy provided by the string), the hammer-string interaction is not audible in itself. Nevertheless, this interaction undoubtedly plays an important role as an initial condition for the string motion. This is a substantial point justifying the dissociation of the string model and the source model in the design of our synthesis model. Thus, the resulting model consists of what is commonly called a “source-resonator” system (as illustrated in Figure 1). Note that the model still makes sense for high-pitched notes, for which the impact noise is important: the hammer-string interaction only lasts a couple of milliseconds, while the impact sound is an additional sound, which can be simulated using predesigned samples. Since waves are still traveling in the resonator after the release of the key, repeated keystrokes are naturally taken into account by the model.

Laroche and Meillier [10] used such a source-resonator technique for the synthesis of piano sound. They showed that realistic piano tones can be produced using IIR filters to model the resonator and common excitation signals for several notes. Their simple resonator model, however, yielded excitation signals too long (from 4 to 5 seconds) to accurately reproduce the piano sound. Moreover, that model took into account neither the coupling between strings nor the dependence of the excitation on velocity and octave variations.

Figure 1: Hybrid model of piano sound synthesis. A control input drives the source (a nonlinear signal model), whose excitation output feeds the resonator (a physical model) to produce the sound.

Smith proposed efficient resonators [11] using so-called digital waveguides. This approach simulates the physics of the waves propagating in the string. Moreover, the waveguide parameters are naturally correlated with the physical parameters, making for easy control. Borin and Bank [12, 13] used this approach to design a synthesis model of piano tones based on physical considerations, by coupling digital waveguides with a “force generator” simulating the hammer impact. The commuted synthesis concept [14, 15, 16] uses the linearity of the digital waveguide to commute and combine elements. For the piano, a hybrid model was then proposed, combining a digital waveguide, a phenomenological hammer model, and a time-varying filter that simulates the soundboard behavior. Our model is an extension of these previous works, to which we added the strong constraint of resynthesis capability. Here, the resonator is modeled using a physically related model, the digital waveguide, and the source, intended to generate the initial condition for the string motion, is modeled using a signal-based nonlinear model.

The advantages of such a hybrid model are numerous:

(i) it is simple enough that the parameters can be accurately estimated from the analysis of real sounds,

(ii) it takes into account the most relevant physical characteristics of the piano strings (including the coupling between strings), and it permits the playing to be controlled (the velocity of the hammer),

(iii) it simulates the perceptual effect due to the nonlinear behavior of the hammer-string interaction, and it allows sound transformations from both physical and perceptual approaches.

Even though the model we propose is not computationally costly, we address here its design and its calibration rather than its real time implementation. Hence, the calculations and reasoning are carried out in the frequency domain. The time domain implementation will be the subject of a companion article.

2. THE RESONATOR MODEL

Several physical models of transverse wave propagation on a struck string have been published in the literature [17, 18, 19, 20]. The string is generally modeled using a one-dimensional wave equation. The specific features of the piano string that are important in wave propagation (dispersion due to the stiffness of the string and frequency-dependent losses) are further incorporated through several perturbation terms. To account for the hammer-string interaction, this equation is then coupled to a nonlinear force term, leading to a system of equations for which an analytical solution cannot be exhibited.



Since the string vibration is transmitted to the radiating soundboard only at the bridge level, it is not useful to numerically compute the entire spatial motion of the string. The digital waveguide technique [11] provides an efficient way of simulating the vibration at the bridge level of the string, when struck at a given location by the hammer. Moreover, the parameters of such a model can be estimated from the analysis of real sounds [21].

2.1. The physics of vibrating strings

We present here the main features of the physical modeling of piano strings. Consider the propagation of transverse waves in a stiff damped string, governed by the motion equation [21]

$$\frac{\partial^2 y}{\partial t^2} - c^2\,\frac{\partial^2 y}{\partial x^2} + \kappa^2\,\frac{\partial^4 y}{\partial x^4} + 2b_1\,\frac{\partial y}{\partial t} - 2b_2\,\frac{\partial^3 y}{\partial x^2\,\partial t} = P(x,t), \qquad (1)$$

where y is the transverse displacement, c the wave speed, κ the stiffness coefficient, and b1 and b2 the loss parameters. Frequency-dependent loss is introduced via the mixed time-space derivative term (see [21, 22] for more details). We apply fixed boundary conditions

$$y\big|_{x=0} = y\big|_{x=L} = \frac{\partial^2 y}{\partial x^2}\bigg|_{x=0} = \frac{\partial^2 y}{\partial x^2}\bigg|_{x=L} = 0, \qquad (2)$$

where L is the length of the string. After the hammer-string contact, the force P is equal to zero and this system can be solved. An analytical solution can be expressed as a sum of exponentially damped sinusoids:

$$y(x,t) = \sum_{n=1}^{\infty} a_n(x)\, e^{-\alpha_n t}\, e^{i\omega_n t}, \qquad (3)$$

where a_n is the amplitude, α_n the damping coefficient, and ω_n the frequency of the nth partial. Due to the stiffness, the waves are dispersive, and the partial frequencies, which are not perfectly harmonic, are given by [23]

$$\omega_n = 2\pi n\,\omega_0\sqrt{1 + Bn^2}, \qquad (4)$$

where ω0 is the fundamental radial frequency of the string without stiffness, and B is the inharmonicity coefficient [23]. The losses are frequency dependent and are expressed by [21]

$$\alpha_n = -b_1 - b_2\,\frac{\pi^2}{2BL^2}\left(-1 + \sqrt{1 + 4B\left(\frac{\omega_n}{\omega_0}\right)^2}\,\right). \qquad (5)$$
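As a concreteness check, (3), (4), and (5) can be evaluated numerically. The following sketch is illustrative only: the parameter values are placeholders rather than measured data, the factor 2π of (4) is absorbed into the radial fundamental ω0, and the damping is taken with a positive sign so that the partials decay.

```python
import numpy as np

# Illustrative parameters (placeholders, not the paper's measured values)
omega0 = 2 * np.pi * 262.0      # fundamental radial frequency (rad/s)
B = 3.5e-4                      # inharmonicity coefficient
b1, b2 = 0.5, 6.25e-9           # loss parameters
L = 0.62                        # string length (m)

n = np.arange(1, 26)                            # partial indices
omega_n = n * omega0 * np.sqrt(1 + B * n**2)    # dispersion, cf. eq. (4)

# Frequency-dependent damping, cf. eqs. (5) and (9)
xi_n = -1 + np.sqrt(1 + 4 * B * (omega_n / omega0) ** 2)
alpha_n = b1 + b2 * np.pi**2 * xi_n / (2 * B * L**2)

# Bridge signal as a sum of exponentially damped sinusoids, eq. (3)
fs, dur = 44100, 2.0
t = np.arange(int(fs * dur)) / fs
a_n = 1.0 / n                                   # illustrative amplitudes
y = (a_n[:, None] * np.exp(-alpha_n[:, None] * t)
     * np.cos(omega_n[:, None] * t)).sum(axis=0)
```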

The spectral content of the piano sound, and of most musical instruments, is modified with respect to the dynamics. For the piano, this nonlinear behavior consists of an increase of the brightness of the sound, and it is linked mainly to the hammer-string contact (the nonlinear nature of the generation of longitudinal waves also participates in the increase of brightness; we do not take this phenomenon into account since we are interested only in transverse waves). The stiffness of the hammer felt increases with the impact velocity. In the next paragraph, we show how the waveguide model parameters are related to the amplitudes, damping coefficients, and frequencies of each partial.

Figure 2: Elementary digital waveguide (named G), with input E(ω), delay D(ω), loop filter F(ω), and output S(ω).

2.2. Digital waveguide modeling

2.2.1. The single string case: elementary digital waveguide

To model wave propagation in a piano string, we use a digital waveguide model [11]. In the single string case, the elementary digital waveguide model (named G) we used consists of a single loop system (Figure 2) including

(i) a delay line (a pure delay filter named D) simulating the time the waves take to travel back and forth in the medium,

(ii) a filter (named F) taking into account the dissipation and dispersion phenomena, together with the boundary conditions. The modulus of F is then related to the damping of the partials and the phase to the inharmonicity of the string,

(iii) an input E corresponding to the frequency-dependent energy transferred to the string by the hammer,

(iv) an output S representing the vibrating signal measured at an extremity of the string (at the bridge level).

The output of the digital waveguide driven by a delta function can be expanded as a sum of exponentially damped sinusoids. The output thus coincides with the solution of the motion equation of transverse waves in a stiff damped string for a source term given by a delta function force. As shown in [21, 24], the modulus and phase of F are related to the damping and the frequencies of the partials by the expressions

$$\left|F(\omega_n)\right| = e^{\alpha_n D}, \qquad \arg\left(F(\omega_n)\right) = \omega_n D - 2n\pi, \qquad (6)$$

with ωn and αn given by (4) and (5). After some calculations (see [21]), we obtain the expressions of the modulus and the phase of the loop filter in terms of the physical parameters:

$$\left|F(\omega)\right| \simeq \exp\left(-D\left[b_1 + \frac{b_2\pi^2\xi}{2BL^2}\right]\right), \qquad (7)$$

$$\arg\left(F(\omega)\right) \simeq D\omega - D\omega_0\sqrt{\frac{\xi}{2B}}, \qquad (8)$$


with

$$\xi = -1 + \sqrt{1 + \frac{4B\omega^2}{\omega_0^2}} \qquad (9)$$

in terms of the inharmonicity coefficient B [23].
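Relations (7)-(9) translate directly into code. The following sketch (with the same illustrative placeholder parameters as above, and with the loop delay D assumed equal to the fundamental period 2π/ω0) evaluates the complex loop filter on a frequency grid.

```python
import numpy as np

def loop_filter(omega, D, omega0, B, b1, b2, L):
    """Complex loop filter F(omega) of the elementary waveguide, eqs. (7)-(9)."""
    xi = -1 + np.sqrt(1 + 4 * B * omega**2 / omega0**2)                # eq. (9)
    modulus = np.exp(-D * (b1 + b2 * np.pi**2 * xi / (2 * B * L**2)))  # eq. (7)
    phase = D * omega - D * omega0 * np.sqrt(xi / (2 * B))             # eq. (8)
    return modulus * np.exp(1j * phase)

# Example: evaluate F over the first 25 partials of an illustrative string
omega0 = 2 * np.pi * 262.0
omega = np.linspace(omega0, 25 * omega0, 1000)
F = loop_filter(omega, D=2 * np.pi / omega0, omega0=omega0,
                B=3.5e-4, b1=0.5, b2=6.25e-9, L=0.62)
```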

2.2.2. The multiple strings case: coupled digital waveguides

In the middle and the treble range of the piano, there are two or three strings for each note in order to increase the efficiency of the energy transmission towards the bridge. The vibration produced by this coupled system is not the superposition of the vibrations produced by each string. It is the result of a complex coupling between the modes of vibration of these strings [25]. This coupling leads to phenomena like beats and double decays in the amplitude of the partials, which constitute one of the most important features of the piano sound. Beats are used by professionals to precisely tune the doublets or triplets of strings. To resynthesize the vibration of several strings at the bridge level, we use coupled digital waveguides. Smith [14] proposed a coupling model with two elementary waveguides. He assumed that the two strings were coupled at the same termination and that the losses were lumped in the bridge impedance. This technique leads to a simple model necessitating only one loss filter, but the decay times and the coupling of the modes are not independent. Välimäki et al. [26] proposed another approach that couples two digital waveguides through real gain amplifiers. In that case, the coupling is the same for each partial, and the time behavior of the partials is similar. For synthesis purposes, Bank [27] showed that a perceptually plausible beating sound can be obtained by adding only a few resonators in parallel.

We have designed two models, with two and three coupled digital waveguides, which are an extension of Välimäki et al.'s approach. They consist in separating the time behavior of the components by using complex-valued and frequency-dependent linear filters to couple the waveguides. The three-coupled digital waveguide is shown in Figure 3. The two models accurately simulate the energy transfer between the strings (see Section 2.4.3). A related method [28] (with an example of piano coupling) has recently become available in the context of digital waveguide networks.

Each string is modeled using an elementary digital waveguide (named G1, G2, G3; the loop filters and delays are named F1, F2, F3 and D1, D2, D3, respectively). The coupled model is then obtained by connecting the output of each elementary waveguide to the input of the others through coupling filters. The coupling filters simulate the wave propagation along the bridge and are thus correlated to the distance between the strings. In the case of a doublet of strings, the two coupling filters (named C) are identical. In the case of a triplet of strings, the coupling filters of adjacent strings (named Ca) are equal but differ from the coupling filters of the extreme strings (named Ce). The excitation signal is assumed to be the same for each elementary waveguide since we suppose the hammer strikes the strings in a similar way.

Figure 3: The three-coupled digital waveguide (bottom) and the corresponding physical system at the bridge level (top). The input E(ω) feeds the three elementary waveguides G1(ω), G2(ω), and G3(ω), which are cross-connected through the coupling filters Ca(ω) and Ce(ω); their summed output is S(ω).

To ensure the stability of the different models, specific relations have to be respected. First, the modulus of the loop filters must be less than 1. Second, for coupled digital waveguides, the following relations must be verified:

$$|C|\,\sqrt{\left|G_1\right|\left|G_2\right|} < 1 \qquad (10)$$

in the case of two-coupled waveguides, and

$$\left|G_1G_2C_a^2 + G_1G_3C_e^2 + G_2G_3C_a^2 + 2G_1G_2G_3C_a^2C_e\right| < 1 \qquad (11)$$

in the case of three-coupled waveguides. Assuming that these relations are verified, the models are stable.
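As a quick numerical check, conditions (10) and (11) can be verified on sampled frequency responses. The helper below is a sketch under the assumption that all responses are given as complex arrays on a common frequency grid; it is not part of the authors' implementation.

```python
import numpy as np

def coupled_waveguides_stable(G1, G2, G3=None, Ca=None, Ce=None, C=None):
    """Check the stability conditions (10) and (11) on a frequency grid.

    All arguments are arrays of complex frequency responses sampled on the
    same grid (G3, Ca, Ce for the triplet case; C for the doublet case).
    """
    if G3 is None:  # two-coupled waveguides, eq. (10)
        return bool(np.all(np.abs(C) * np.sqrt(np.abs(G1) * np.abs(G2)) < 1))
    # three-coupled waveguides, eq. (11)
    expr = (G1 * G2 * Ca**2 + G1 * G3 * Ce**2
            + G2 * G3 * Ca**2 + 2 * G1 * G2 * G3 * Ca**2 * Ce)
    return bool(np.all(np.abs(expr) < 1))
```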

This work takes place in the general analysis-synthesis framework, meaning that the objective is not only to simulate sounds, but also to reconstruct a given sound. The model must therefore be calibrated carefully. The next section presents the inverse problem allowing the waveguide parameters to be calculated from experimental data. We then describe the experiment and the measurements for one, two, and three coupled strings. We then show the validity and the accuracy of the analysis-synthesis process by comparing synthetic and original signals. Finally, the behavior of the model on signals from a real piano is verified.


2.3. The inverse problem

We address here the estimation of the parameters of each elementary waveguide, as well as the coupling filters, from the analysis of a single signal (measured at the bridge level). For this, we assume that, in the case of three coupled strings, the signal is composed of a sum of three exponentially decaying sinusoids for each partial (and, respectively, one and two exponentially decaying sinusoids in the case of one and two strings). The estimation method is a generalization of the one described in [29] for one and two strings. It can be summarized as follows: start by isolating each triplet of the measured signal through bandpass filtering (a truncated Gaussian window); then use the Hilbert transform to get the corresponding analytic signal and obtain the average frequency of the component by differentiating the phase of this analytic signal; finally, extract from each triplet the three amplitudes, damping coefficients, and frequencies of each partial by a parametric method (the Steiglitz-McBride method [30]).
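The first two steps of this procedure might look as follows in code. This is a hedged sketch: the Gaussian bandpass is applied in the frequency domain and truncated at three standard deviations (both choices are ours), and the final damped-sinusoid fit is left to a Steiglitz-McBride routine as in [30].

```python
import numpy as np
from scipy.signal import hilbert

def isolate_partial(s, fs, f_center, half_width):
    """Bandpass one partial with a truncated Gaussian window in frequency."""
    S = np.fft.rfft(s)
    f = np.fft.rfftfreq(len(s), 1 / fs)
    w = np.exp(-0.5 * ((f - f_center) / half_width) ** 2)
    w[np.abs(f - f_center) > 3 * half_width] = 0.0   # truncation
    return np.fft.irfft(S * w, len(s))

def mean_frequency(x, fs):
    """Average frequency from the phase derivative of the analytic signal."""
    phase = np.unwrap(np.angle(hilbert(x)))
    return np.mean(np.diff(phase)) * fs / (2 * np.pi)
```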

The second part of the process is described in detail in the appendix. In brief, we identify the Fourier transform of the sum of the three exponentially damped sinusoids (the measured signal) with the transfer function of the digital waveguide (the model output). This identification leads to a linear system that admits an analytical solution in the case of one or two strings. In the case of three coupled strings, the solution can be found only numerically. The process gives an estimation of the modulus and of the phase of each filter near the resonance peaks as a function of the amplitudes, damping coefficients, and frequencies. Once the resonator model is known, we extract the excitation signal by a deconvolution process with respect to the waveguide transfer function. Since the transfer function has been identified near the resonance peaks, the excitation is also estimated at discrete frequency values corresponding to the partial frequencies. This excitation corresponds to the signal that has to be injected into the resonator to resynthesize the actual sound.
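Near the resonance peaks, this deconvolution reduces to a spectral division; a minimal sketch (the function name and the small regularization term are our own) is:

```python
import numpy as np

def excitation_at_partials(S_meas, T_model, eps=1e-12):
    """Estimate the excitation spectrum near the resonance peaks.

    S_meas and T_model are the measured spectrum and the waveguide transfer
    function, both evaluated at the partial frequencies only.
    """
    return S_meas / (T_model + eps)   # spectral division (deconvolution)
```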

2.4. Analysis of experimental data and validation of the resonator model

We first describe an experimental setup allowing the measurement of the vibration of one, two, or three strings struck by a hammer at different velocities. Then we show how to estimate the resonator parameters from those measurements, and finally, we compare original and synthesized signals. This experimental setup is an essential step that validates the estimation method. Actually, estimating the parameters of one-, two-, or three-coupled digital waveguides from only one signal is not a trivial process. Moreover, in a real piano, many physical phenomena are not taken into account in the model presented in the previous section. It is then necessary to verify the validity of the model on a laboratory experiment before applying the method to the piano case.

2.4.1. Experimental setup

On the top of a massive concrete support, we have attached a piece of a bridge taken from a real piano. On the other extremity of the structure, we have attached an agraffe on a hardwood support. The strings are tightened between the bridge and the agraffe and tuned manually. It is clear that the strings are not totally uncoupled from their support. Nevertheless, this experiment has been used to record signals of struck strings in order to validate the synthesis models, and it was entirely satisfactory for this purpose. One, two, or three strings are struck with a hammer linked to an electronically piloted key. By imposing different voltages on the system, one can control the hammer velocity in a reproducible way. The precise velocity is measured immediately after escapement by using an optical sensor (MTI 2000, probe module 2125H) pointing at the side of the head of the hammer. The vibration at the bridge level is measured by an accelerometer (B&K 4374). The signals are directly recorded on digital audio tape. Acceleration signals correspond to hammer velocities between 0.8 m/s and 5.7 m/s.

Figure 4: Amplitude of filter F as a function of the frequency and of hammer velocity.

2.4.2. Filter estimation

From the signals collected on the experimental setup, a set of data was extracted. For each hammer velocity, the waveguide filters and the corresponding excitation signals were estimated using the techniques described above. The filters were studied in the frequency domain; it is not the purpose of this paper to describe the time-domain method or the fitting of the transfer function using IIR or FIR filters.

Figure 4 shows the modulus of the filter response F for the first twenty-five partials in the case of tones produced by a single string. Here the hammer velocity varies from 0.7 m/s to 4 m/s. One notices that the modulus of the waveguide filter is similar for all hammer velocities. The resonator represents the strings, which do not change during the experiment. If the estimated resonator remains the same for different hammer velocities, all the nonlinear behavior due to the dynamics has been taken into account in the excitation part. The resonator and the source are well separated. This result validates our approach based on a source-resonator separation. For high-frequency partials, however, the filter modulus decreases slightly as a function of the hammer velocity. This nonlinear behavior is not directly linked to the hammer-string contact. It is mainly due to nonlinear phenomena involved in the wave propagation. At large amplitude motion, the tension modulation introduces greater internal losses (this effect is even more pronounced in plucked strings than in struck strings).

Figure 5: Amplitude of filter F2 (three-coupled waveguide model) as a function of the frequency and of hammer velocity.

The filter modulus slowly decreases (as a function of frequency) from a value close to 1. Since the higher partials are more damped than the lower ones, the amplitude of the filter decreases as the frequency increases. The value of the filter modulus (close to 1) suggests that the losses are weak. This is true for the piano string and is even more obvious on this experimental setup, since the lack of a soundboard limits the acoustic field radiation. More losses are expected in the real piano.

We now consider the multiple strings case. From a physical point of view, the behavior of the filters F1, F2, and F3 (which characterize the intrinsic losses) of the coupled digital waveguides should be similar to the behavior of the filter F for a single string, since the strings are assumed to be identical. This is verified except for high-frequency partials. This behavior is shown in Figure 5 for filter F2 of the three-coupled waveguide model. Some artifacts pollute the plot at high frequencies. The poor signal-to-noise ratio at high frequencies (above 2000 Hz) and low velocities introduces error terms in the analysis process, leading to errors in the amplitudes of the loop filters (for instance, a very small value of the modulus of one loop filter may be compensated by a value greater than one for another loop filter; the stability of the coupled waveguide is then preserved). Nevertheless, this does not alter the synthetic sound since the corresponding (high-frequency) partials are weak and of short duration.

The phase is also of great importance since it is related to the group delay of the signal and consequently directly linked to the frequencies of the partials. The phase is a nonlinear function of the frequency (see (8)). It is constant with the hammer velocity (see Figure 6) since the frequencies of the partials are always the same (linearity of the wave propagation).

Figure 6: Phase of filter F as a function of the frequency and hammer velocity.

Figure 7: Modulus of filter Ca as a function of the frequency and of hammer velocity.

The coupling filters simulate the energy transfer between the strings and are frequency dependent. Figure 7 represents one of these coupling filters for different values of the hammer velocity. The amplitude is constant with respect to the hammer velocity (up to the signal-to-noise ratio at high frequencies and low velocities), showing that the coupling is independent of the amplitude of the vibration. The coupling rises with the frequency, with maxima at about 700 Hz and 1300 Hz.

2.4.3. Accuracy of the resynthesis

At this point, one can resynthesize a given sound by using a single- or multicoupled digital waveguide and the parameters extracted from the analysis. For the synthetic sounds to be identical to the original, the filters must be described precisely. The model was implemented in the frequency domain, as described in Section 2, thus taking into account the exact amplitude and phase of the filters (for instance, for a three-coupled digital waveguide, we have to implement three delays and five complex filters, moduli, and phases). Nevertheless, for real-time synthesis purposes, the filters can be approximated by low-order IIR filters (see, e.g., [26]). This aspect will be developed in future reports. By injecting the excitation signal obtained by deconvolution into the waveguide model, the signal measured on the experimental setup is reproduced. Figures 8, 9, and 10 show the amplitude modulation laws (velocity of the bridge) of the first six partials of the original and the resynthesized sounds. The variations of the temporal envelope are generally well retained, and for the coupled systems (Figures 9 and 10), the beat phenomena are well reproduced. The slight differences, not audible, are due to fine physical phenomena (coupling between the horizontal and the vertical modes of the string) that are not taken into account in our model.

Figure 8: Amplitude modulation laws (velocity of the bridge) for the first six partials, one string, of the (a) original and (b) resynthesized sound.

Figure 9: Amplitude modulation laws (velocity of the bridge) for the first six partials, two strings, of the (a) original and (b) resynthesized sound.

Figure 10: Amplitude modulation laws (velocity of the bridge) for the first six partials, three strings, of the (a) original and (b) resynthesized sound.

In the one-string case, we now consider the second and sixth partials of the original sound in Figure 8. We can see beats (periodic amplitude modulations) that show coupling phenomena on only one string. Indeed, the horizontal and vertical modes of vibration of the string are coupled through the bridge. This coupling was not taken into account in this study since the phenomenon is of less importance than the coupling between two different strings. Nevertheless, we have shown in [29] that coupling between two modes of vibration can also be simulated using a two-coupled digital waveguide model. The accuracy of the resynthesis validates a posteriori our model and the source-resonator approach.

2.5. Behavior and control of the resonator through measurements on a real piano

To take into account the note dependence of the resonator, we made a set of measurements on a real piano, a Yamaha Disklavier C6 grand piano equipped with sensors. The vibrations of the strings were measured at the bridge by an accelerometer, and the hammer velocities were measured by a photonic sensor. Data were collected for several velocities and several notes. We used the estimation process described in Section 2.3 for the previous experimental setup and extracted for each note and each velocity the corresponding resonator and source parameters.

Figure 11: Modulus of the waveguide filters for notes A0, F1, and D3, original and modeled.

As expected, the behavior of the resonator as a function of the hammer velocity, for a given note, is similar to the one described in Section 2.4.2 for the signals measured on the experimental setup. The filters are similar with respect to the hammer velocity. Their modulus is close to one, but slightly weaker than previously, since it now takes into account the losses due to the acoustic field radiated by the soundboard. The resynthesis of the piano measurements through the resonator model and the excitation obtained by deconvolution is perceptually satisfactory since the sound is almost indistinguishable from the original one.

On the contrary, the shape of the filters is modified as a function of the note. Figure 11 shows the modulus of the waveguide filter F for several notes (in the multiple-string case, we calculated an average filter by arithmetic averaging). The modulus of the loop filter is related to the losses undergone by the wave over one period. Note that this modulus increases with the fundamental frequency, indicating decreasing loss over one period as the treble range is approached.

The relations (7) and (8), relating the physical parameters to the waveguide parameters, allow the resonator to be controlled in a physically relevant way. We can change either the length of the strings, the inharmonicity, or the losses. But to remain in accordance with the physical system, we have to take into account the interdependence of some parameters. For instance, the fundamental frequency is obviously related to the length of the string, and to the tension and the linear mass. If we modify the length of the string, we also have to modify, for instance, the fundamental frequency, considering that the tension and the linear mass are unchanged. This aspect has been taken into account in the implementation of the model.

Figure 12: Waveform of three excitation signals of the experimental setup, corresponding to three different hammer velocities.

3. THE SOURCE MODEL

In the previous section, we observed that the waveguide filters are almost invariant with respect to the velocity. In contrast, the excitation signals (obtained as explained in Section 2.3 and related to the impact of the hammer on the string) vary nonlinearly as a function of the velocity, thereby taking into account the timbre variations of the resulting piano sound. From the extracted excitation signals, we here study this behavior and design a source model using signal methods, so as to simulate it precisely. The source signal is then convolved with the resonator filter to obtain the piano bridge signal.

3.1. Nonlinear source behavior as a function of the hammer velocity

Figure 12 shows the excitation signals extracted from the measurement of the vibration of a single string struck by a hammer for three velocities corresponding to pianissimo, mezzo-forte, and fortissimo musical playing. The excitation duration is about 5 milliseconds, which is shorter than what Laroche and Meillier [10] proposed and in accordance with the duration of the hammer-string contact [6]. Since this interaction is nonlinear, the source also behaves nonlinearly. Figure 13 shows the spectra of several excitation signals obtained for a single string at different velocities regularly spaced between 0.8 and 4 m/s. The excitation corresponding to fortissimo provides more energy than the ones corresponding to mezzo-forte and pianissimo. But this increased amplitude is frequency dependent: the higher partials increase more rapidly than the lower ones with the same hammer velocity. This increase in the high partials corresponds to an increase in brightness with respect to the hammer velocity. It can be better visualized by considering the spectral centroid [31] of the excitation signals. Figure 14 shows the behavior of this perceptually relevant criterion (brightness) [32] as a function of the hammer velocity. Clearly, for one, two, or three strings, the spectral centroid increases, corresponding to an increased brightness of the sound. In addition to the change of slope, which translates into the change of brightness, Figure 13 shows several irregularities common to all velocities, among which a periodic modulation related to the location of the hammer impact on the string.

Figure 13: Amplitude of the excitation signals for one string and several velocities.

3.2. Design of a source signal model

The amplitude of the excitation increases smoothly as a function of the hammer velocity. For high-frequency components, this increase is greater than for low-frequency components, leading to a flattening of the spectrum. Nevertheless, the general shape of the spectrum stays the same. Formants do not move, and the modulation of the spectrum due to the hammer position on the string is visible at any velocity. These observations suggest that the behavior of the excitation could be well reproduced using a subtractive synthesis model.

The excitation signal is seen as an invariant spectrum shaped by a smooth frequency response filter, the characteristics of which depend on the hammer velocity. The resulting source model is shown in Figure 15. The subtractive source model consists of the static spectrum, the spectral deviation, and the gain. The static spectrum takes into account all the information that is invariant with respect to the hammer velocity. It is a function of the characteristics of the hammer and the strings. The spectral deviation and the gain both shape the spectrum as a function of the hammer velocity. The spectral deviation simulates the shifting of the energy to the high frequencies, and the gain models the global increase of amplitude. Earlier versions of this model were presented in [1, 2]. This type of model has, in addition, been shown to work well for many instruments [33].

Figure 14: The spectral centroid of the excitation signals for one (plain), two (dash-dotted), and three (dotted) strings.

Figure 15: Diagram of the subtractive source model (the hammer position and hammer velocity drive the static spectrum Es, the spectral deviation, and the gain to produce the excitation E).

In the early days of digital waveguides, Jaffe and Smith [24] modeled the velocity-dependent spectral deviation as a one-pole lowpass filter. Laurson et al. [34] proposed a second-order biquad filter to model the differences between guitar tones with different dynamics.

A similar approach was developed by Smith and Van Duyne in the time domain [15]. The hammer-string interaction force pulses were simulated using three impulses passed through three lowpass filters which depend on the hammer velocity. In our case, a more accurate method is needed to resynthesize the original excitation signal faithfully.

3.2.1. The static spectrum

Figure 16: The static spectrum Es(ω).

We defined the static spectrum as the part of the excitation that is invariant with the hammer velocity. Considering the expression of the amplitudes of the partials, an, for a hammer striking a string fixed at its extremities (see Valette and Cuesta [19]), and knowing that the spectrum of the excitation is related to the amplitudes of the partials by E = a_n D [29], the static spectrum Es can be expressed as

$$E_s(\omega_n) = \frac{4L}{T}\,\frac{\sin\left(n\pi x_0/L\right)}{n\pi\sqrt{1 + n^2 B}}, \qquad (12)$$

where T is the string tension, L its length, B the inharmonicity factor, and x0 the striking position. We can easily measure the striking position, the string length, and the inharmonicity factor on our experimental setup. On the other hand, we have only an estimate of the tension; it can be calculated from the fundamental frequency and the linear mass of the string.
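Equation (12) is straightforward to evaluate; in the sketch below, all parameter values are illustrative placeholders, not the measured values of the setup.

```python
import numpy as np

def static_spectrum(n, L, T, B, x0):
    """Static spectrum Es at the partial indices n, eq. (12)."""
    n = np.asarray(n, dtype=float)
    return (4 * L / T) * np.sin(n * np.pi * x0 / L) / (n * np.pi * np.sqrt(1 + B * n**2))

# Illustrative values (assumed, not the measured setup values)
Es = static_spectrum(np.arange(1, 51), L=0.62, T=750.0, B=3.5e-4, x0=0.62 / 8)
```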

Figure 16 shows this static spectrum for a single string. Many irregularities, however, are not taken into account, for several reasons; we will see later their importance from a perceptual point of view. Equation (12) is still used, however, when the hammer position is changed. This is useful when one plays with a different temperament because it reduces dissonance.

3.2.2. The deviation with the dynamic

The spectral deviation and the gain take into account the dependency of the excitation signal on velocity. They are estimated by dividing the spectrum of the excitation signal by the static spectrum for all velocities:

$$d(\omega) = \frac{E(\omega)}{E_s(\omega)}, \qquad (13)$$

where E is the original excitation signal. Figure 17 shows this deviation for three hammer velocities. It effectively strengthens the fortissimo, in particular for the medium and high partials. Its evolution with the frequency is regular and can successfully be fitted with a first-order exponential polynomial (as shown in Figure 17)

d = exp(a f + g), (14)

Figure 17: Dynamic deviation of three excitation signals of the experimental setup (0.8, 2.0, and 3.8 m/s), original and modeled spectral tilt.

Figure 18: Parameters g (gain, top) and a (spectral deviation, bottom) as a function of the hammer velocity for the experimental setup signals, original (+) and modeled (dashed).

where d is the modeled deviation. The term g corresponds to the gain (independent of the frequency) and the term a f corresponds to the spectral deviation. The variables g and a depend on the hammer velocity. To get a usable source model, we must consider the parameters' behavior with different dynamics. Figure 18 shows the two parameters for several hammer velocities. The model is consistent since their behavior is regular. But the tilt increases with the hammer velocity, showing an asymptotic and nonlinear behavior. This observation can be directly related to the physics of the hammer. As we have seen, when the felt is compressed, it becomes harder and thus gives more energy to high frequencies. But, for high velocities, the felt is totally compressed and its hardness is almost constant. Thus, the amplitude of the corresponding string wave increases further but its spectral content is roughly the same. We have fitted this asymptotic behavior by an exponential model (see Figure 18), for each parameter g and a,

$$g(v) = \alpha_g - \beta_g \exp\left(-\gamma_g v\right), \qquad a(v) = \alpha_a - \beta_a \exp\left(-\gamma_a v\right), \qquad (15)$$

where αi (i = g, a) is the asymptotic value, βi (i = g, a) is the deviation from the asymptotic value at zero velocity (the dynamic range), and γi (i = g, a) is the velocity exponential coefficient, governing how sensitive the attribute is to a velocity change. The parameters of this exponential model were found using a nonlinear weighted curve fit.
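Such a fit can be reproduced with a standard nonlinear least-squares routine. In this sketch the data points are invented placeholders and the weighting used for the actual fit is omitted; the fit for a(v) proceeds identically.

```python
import numpy as np
from scipy.optimize import curve_fit

def asymptotic(v, alpha, beta, gamma):
    """Asymptotic exponential law of eq. (15)."""
    return alpha - beta * np.exp(-gamma * v)

# v: hammer velocities; g_meas: gains estimated via eqs. (13)-(14)
v = np.array([0.8, 1.2, 1.7, 2.3, 3.0, 3.8])            # illustrative data
g_meas = np.array([36.0, 40.5, 43.5, 45.8, 47.3, 48.2])  # illustrative data
(p_alpha, p_beta, p_gamma), _ = curve_fit(asymptotic, v, g_meas,
                                          p0=(50.0, 20.0, 1.0))
```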

3.2.3. Resynthesis of the excitation signal

For a given velocity, the excitation signal can now be recreated using (13), (14), and (15). Multiplying this source model by the transfer function of the resonator and taking the inverse Fourier transform leads to a realistic sound of a string struck by a hammer. The increase in brightness with the dynamics is well reproduced. But from a resynthesis point of view, this model is not satisfactory: the reproduced signal differs from the original one; it sounds too regular and monotonous. To understand this drawback of our model, we calculated the error made by dividing the original excitation signal by the modeled one for each velocity. The corresponding curves are shown in Figure 19 for three velocities.

Notice that this error term does not depend on the hammer velocity, meaning that our static spectrum model is too simple and does not take into account the irregularities of the original spectrum. Irregularities are due to many phenomena, including the width of the hammer-string contact, hysteretic phenomena in the felt, nonlinear phenomena in the string, and mode resonances of the hammer. To obtain a more realistic sound with our source model, we include this error term in the static spectrum. The resulting original and resynthesized signals are shown in Figure 20. The deviations of the resulting excitations are perceptually insignificant. The synthesized sound obtained is then close to the original one.
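Combining (12)-(15) with the stored error term, the excitation spectrum for one velocity can be rebuilt as follows (a sketch; the array inputs are assumed precomputed as described above).

```python
import numpy as np

def resynthesize_excitation(f, Es, err, a, g):
    """Excitation spectrum for one hammer velocity, eqs. (12)-(15).

    f   : frequency grid (Hz)
    Es  : static spectrum of eq. (12) evaluated on f
    err : velocity-independent error term folded into the static part
    a, g: spectral deviation and gain from the fits of eq. (15)
    """
    return Es * err * np.exp(a * f + g)   # deviation of eq. (14) applied to Es
```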

3.3. Behavior and control of the source through measurements on a real piano

The source model parameters were calculated for a subset of the data for the piano, namely the notes A0, F1, B1, G2, C3, G3, D4, E5, and F6. Each note has approximately ten velocities, from about 0.4 m/s up to between 3 and 6 m/s. For all notes, the source extracted from the signals measured on the piano behaves, with respect to the hammer velocity, like the data obtained with the experimental setup. The dynamic deviation is well modeled by the gain g and the spectral deviation parameter a. As in Section 3.2, their behavior as a function of the velocity is well fitted using an asymptotic exponential curve.

Figure 19: Example of the error spectrum. The large errors generally fall in the weak parts of the spectrum.

Figure 20: Original and modeled excitation spectrum for three different hammer velocities for the experimental setup signals.

From a perceptual point of view, an increased hammer velocity corresponds both to an increased loudness and to a relative increase in high frequencies, leading to a brighter tone. Equations (15) make it possible to resynthesize the excitation signal for a given note and hammer velocity. However, the parameters g and a used in the modeling are linked in a complex way to the two most important perceptual features of the tone, that is, loudness and brightness. Thus, without a thorough knowledge of the model, the user will not be able to adjust the parameters of the virtual piano to obtain a satisfactory tone. To get an intuitive control of the model, the user needs to be provided access to these perceptual parameters, loudness and brightness, closely corresponding to energy and spectral centroid. The energy En is directly correlated to the perception of loudness and the spectral centroid Ba to the perception of brightness [32]. These parameters are given by

$$E_n = \frac{1}{T}\int_0^{F_s/2} E^2(f)\,df, \qquad B_a = \frac{\int_0^{F_s/2} E(f)\,f\,df}{\int_0^{F_s/2} E(f)\,df}, \qquad (16)$$

where f is the frequency and Fs the sampling frequency. To synthesize an excitation signal having a given energy and spectral centroid, we must express the parameters g and a as functions of Ba and En. The centroid actually depends only on a:

$$B_a = \frac{\int_0^{F_s/2} E_s(f)\,e^{af}\,f\,df}{\int_0^{F_s/2} E_s(f)\,e^{af}\,df}. \qquad (17)$$

We numerically calculate the expression of a as a function of Ba and store the solution in a table. Alternatively, assuming that the brightness change is unaffected by the shape of the static spectrum Es, the spectral deviation parameter a can be calculated directly from the given brightness [35].
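In code, (16) and the table-based inversion of (17) could be sketched as follows; the trapezoidal-rule integration is our choice, and T denotes the duration constant appearing in (16).

```python
import numpy as np

def energy_and_centroid(f, E, T):
    """Energy En and spectral centroid Ba of an excitation spectrum, eq. (16)."""
    En = np.trapz(E**2, f) / T
    Ba = np.trapz(E * f, f) / np.trapz(E, f)
    return En, Ba

def centroid_table(f, Es, a_grid):
    """Tabulate the centroid of Es(f) e^{a f} over a grid of a values, eq. (17).

    The table is monotonic in a, so eq. (17) can be inverted by interpolation:
    a = np.interp(Ba_target, table, a_grid).
    """
    return np.array([np.trapz(Es * np.exp(a * f) * f, f)
                     / np.trapz(Es * np.exp(a * f), f) for a in a_grid])
```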

Knowing a, we can calculate g from the energy En by the relation

$$g = \frac{1}{2}\log\left(\frac{E_n T}{\int_0^{F_s/2} E_s^2(f)\,e^{2af}\,df}\right). \qquad (18)$$

The behavior of Ba and En as a function of the hammer velocity will then determine the dynamic range of the instrument, and it must be defined by the user.

Figure 21 shows the behavior of the spectral centroid and the energy for several notes. The curves have similar behavior and differ mainly by a multiplicative constant. We have fitted their asymptotic behavior by an exponential model, similarly to what was done with (15). These functions are applied to the synthesis of each excitation signal and thus characterize the dynamic range of the virtual instrument. The user can easily modify this dynamic range by changing the shape of these functions.

Figure 21: Spectral centroid (a) and energy (b) for several notes as a function of the hammer velocity, original (plain) and modeled (dotted).

Calculating the excitation signal is then done as follows. To a given note and velocity, we associate a spectral centroid Ba and an energy En (using the asymptotic exponential fit); a is then obtained from the spectral centroid and g from the energy (equation (18)). One finally gets the spectral deviation which, multiplied by the static spectrum, allows the excitation signal to be calculated.
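Putting the pieces together, this control pipeline can be sketched as below; the velocity-to-centroid and velocity-to-energy fits and the inversion table are assumed precomputed, and all names are ours.

```python
import numpy as np

def excitation_from_note(v, f, Es, Ba_of_v, En_of_v, a_grid, Ba_table, T):
    """Sketch of the control pipeline of Section 3.3 (names are ours).

    Ba_of_v, En_of_v : asymptotic exponential fits mapping velocity to the
                       target centroid and energy (cf. eq. (15))
    a_grid, Ba_table : precomputed inversion table for eq. (17)
    T                : duration constant of eqs. (16) and (18)
    """
    Ba_target, En_target = Ba_of_v(v), En_of_v(v)
    a = np.interp(Ba_target, Ba_table, a_grid)       # invert eq. (17) by lookup
    denom = np.trapz(Es**2 * np.exp(2 * a * f), f)
    g = 0.5 * np.log(En_target * T / denom)          # eq. (18)
    return Es * np.exp(a * f + g)                    # deviation x static spectrum
```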

4. CONCLUSION

The reproduction of the piano bridge vibration is undoubtedly the first most important step for piano sound synthesis. We show that a hybrid model consisting of a resonant part and an excitation part is well adapted for this purpose. After accurate calibration, the sounds obtained are perceptually close to the original ones for all notes and velocities. The resonator, which simulates the phenomena intervening in the strings themselves, is modeled by a digital waveguide model that is very efficient in simulating the wave propagation. The resonator model exhibits physical parameters such as the string tension and the inharmonicity coefficient, allowing physically relevant control of the resonator. It also takes into account the coupling effects, which are extremely relevant for perception. The source is extracted using a deconvolution process and is modeled using a subtractive signal model. The source model consists of three parts (static spectrum, spectral deviation, and gain) that are dependent on the velocities and the notes played. To get intuitive control of the source model, we exhibited two parameters: the spectral centroid and the energy, strongly related to the perceptual parameters brightness and loudness. This perceptual link permits easy control of the dynamic characteristics of the piano.

Thus, the tone of a given piano can be synthesized using a hybrid model. This model is currently implemented in real time in the Max-MSP software environment.

APPENDIX

INVERSE PROBLEM, THREE-COUPLED DIGITAL WAVEGUIDE

We show in this appendix how the parameters of a three-coupled digital waveguide model can be expressed as functions of the modal parameters. This method is an extension of the method presented in [29].


The signal measured at the bridge level is the result of the vibration of three coupled strings. Each partial is actually constituted by at least three components, having frequencies which are slightly different from the frequencies of each individual string. We write the measured signal as a sum of exponentially damped sinusoids:

$$s(t) = \sum_{k=1}^{\infty} a_{1k}\,e^{-\alpha_{1k}t}e^{i\omega_{1k}t} + a_{2k}\,e^{-\alpha_{2k}t}e^{i\omega_{2k}t} + a_{3k}\,e^{-\alpha_{3k}t}e^{i\omega_{3k}t}, \qquad (A.1)$$

with a1k, a2k, and a3k the initial amplitudes, α1k, α2k, α3k the damping coefficients, and ω1k, ω2k, ω3k the frequencies of the components of the kth partial. The Fourier transform of s(t) is

$$S(\omega) = \sum_{k=1}^{\infty} \frac{a_{1k}}{\alpha_{1k} + i\left(\omega - \omega_{1k}\right)} + \frac{a_{2k}}{\alpha_{2k} + i\left(\omega - \omega_{2k}\right)} + \frac{a_{3k}}{\alpha_{3k} + i\left(\omega - \omega_{3k}\right)}. \qquad (A.2)$$

We identify this expression locally in frequency with the output T(ω) of the three-coupled waveguide model (see Figure 3):

$$T(\omega) = \frac{N_1}{N_2} \qquad (A.3)$$

with

$$N_1 = F_1 + F_2 + F_3 + 2\left[\left(C_a - 1\right)\left(F_1F_2 + F_2F_3\right) + \left(C_e - 1\right)F_1F_3\right] + F_1F_2F_3\left[3 + 4C_eC_a - 4C_a - 2C_e - C_e^2\right],$$

$$N_2 = 1 - \left(F_1 + F_2 + F_3\right) + \left(F_1F_2 + F_2F_3\right)\left(1 - C_a^2\right) + F_1F_3\left(1 - C_e^2\right) + F_1F_2F_3\left(2C_a^2 + C_e^2 - 2C_a^2C_e - 1\right), \qquad (A.4)$$

where Fi (i = 1, 2, 3) are the loop filters of the digital waveguides Gi (i = 1, 2, 3) (without loss of generality, one can assume that D1 = D2 = D3 = D, since the difference in delays can be taken into account in the phase of the filters Fi). For this purpose, since T(ω) is a rational fraction of third-order polynomials in e^{−iωD} (see (6)), it can be decomposed into a sum of three rational fractions of first-order polynomials in e^{−iωD}:

$$T(\omega) = \frac{P(\omega)\,e^{-i\omega D}}{1 - X(\omega)\,e^{-i\omega D}} + \frac{Q(\omega)\,e^{-i\omega D}}{1 - Y(\omega)\,e^{-i\omega D}} + \frac{R(\omega)\,e^{-i\omega D}}{1 - Z(\omega)\,e^{-i\omega D}}. \qquad (A.5)$$

The vibrations generated by the model are assimilated to a superposition of three series of partials whose frequencies and decay times are governed by the quantities X(ω), Y(ω), and Z(ω). By identification between (A.3) and (A.5), we determine the following system of 6 equations:

$$P + Q + R = F_1 + F_2 + F_3, \qquad (A.6)$$

$$PY + PZ + QX + QZ + RX + RY = 2F_1F_2\left(1 - C_a\right) + 2F_1F_3\left(1 - C_e\right) + 2F_2F_3\left(1 - C_a\right), \qquad (A.7)$$

$$PYZ + QXZ + RXY = F_1F_2F_3\left(4C_aC_e - 4C_a - 2C_e - C_e^2 + 3\right), \qquad (A.8)$$

$$X + Y + Z = F_1 + F_2 + F_3, \qquad (A.9)$$

$$XY + XZ + YZ = F_1F_2\left(1 - C_a^2\right) + F_2F_3\left(1 - C_a^2\right) + F_1F_3\left(1 - C_e^2\right), \qquad (A.10)$$

$$XYZ = F_1F_2F_3\left(1 - 2C_a^2 - C_e^2 + 2C_a^2C_e\right). \qquad (A.11)$$

We identify (A.2) with the excitation signal times the transfer function T (equation (A.5)):

S(ω) = E(ω)T(ω). (A.12)

Assuming that two successive modes do not overlap (this assumption is verified for piano sounds) and writing

$$X(\omega) = \left|X(\omega)\right|e^{i\Phi_X(\omega)}, \qquad Y(\omega) = \left|Y(\omega)\right|e^{i\Phi_Y(\omega)}, \qquad Z(\omega) = \left|Z(\omega)\right|e^{i\Phi_Z(\omega)}, \qquad (A.13)$$

we express (A.12) near each double resonance as

$$\frac{a_{1k}}{\alpha_{1k} + i\left(\omega - \omega_{1k}\right)} + \frac{a_{2k}}{\alpha_{2k} + i\left(\omega - \omega_{2k}\right)} + \frac{a_{3k}}{\alpha_{3k} + i\left(\omega - \omega_{3k}\right)} \simeq \frac{E(\omega)P(\omega)\,e^{-i\omega D}}{1 - \left|X(\omega)\right|e^{-i\left(\omega D - \Phi_X(\omega)\right)}} + \frac{E(\omega)Q(\omega)\,e^{-i\omega D}}{1 - \left|Y(\omega)\right|e^{-i\left(\omega D - \Phi_Y(\omega)\right)}} + \frac{E(\omega)R(\omega)\,e^{-i\omega D}}{1 - \left|Z(\omega)\right|e^{-i\left(\omega D - \Phi_Z(\omega)\right)}}. \qquad (A.14)$$

We identify term by term the members of this equation, taking, for example,

$$\frac{a_{1k}}{\alpha_{1k} + i\left(\omega - \omega_{1k}\right)} \simeq \frac{E(\omega)P(\omega)\,e^{-i\omega D}}{1 - \left|X(\omega)\right|e^{-i\left(\omega D - \Phi_X(\omega)\right)}}. \qquad (A.15)$$

The resonance frequencies ω1k, ω2k, and ω3k of each triplet correspond to the minima of the three denominators

$$1 - \left|X(\omega)\right|e^{-i\left(\omega D - \Phi_X(\omega)\right)}, \qquad 1 - \left|Y(\omega)\right|e^{-i\left(\omega D - \Phi_Y(\omega)\right)}, \qquad 1 - \left|Z(\omega)\right|e^{-i\left(\omega D - \Phi_Z(\omega)\right)}. \qquad (A.16)$$

If we assume that the moduli |X(ω)|, |Y(ω)|, and |Z(ω)| are close to one (this assumption is realistic because the propagation is weakly damped), we determine the values of ω1k, ω2k, and ω3k:

$$\omega_{1k} = \frac{\Phi_X\left(\omega_{1k}\right) + 2k\pi}{D}, \qquad \omega_{2k} = \frac{\Phi_Y\left(\omega_{2k}\right) + 2k\pi}{D}, \qquad \omega_{3k} = \frac{\Phi_Z\left(\omega_{3k}\right) + 2k\pi}{D}. \qquad (A.17)$$

Taking ω = ω1k + ε with ε arbitrarily small,

$$\frac{a_{1k}}{\alpha_{1k} + i\varepsilon} \simeq \frac{E\left(\omega_{1k} + \varepsilon\right)P\left(\omega_{1k} + \varepsilon\right)e^{-i\Phi_X\left(\omega_{1k}+\varepsilon\right)}e^{-i\varepsilon D}}{1 - \left|X\left(\omega_{1k} + \varepsilon\right)\right|e^{-i\varepsilon D}}. \qquad (A.18)$$

A limited expansion $e^{-i\varepsilon D} \simeq 1 - i\varepsilon D + O(\varepsilon^2)$ around ε = 0 (at the zeroth order for the numerator and at the first order for the denominator) gives

$$E\left(\omega_{1k} + \varepsilon\right)P\left(\omega_{1k} + \varepsilon\right)e^{-i\Phi_X\left(\omega_{1k}+\varepsilon\right)}e^{-i\varepsilon D} \simeq E\left(\omega_{1k}\right)P\left(\omega_{1k}\right)e^{-i\Phi_X\left(\omega_{1k}\right)},$$

$$1 - \left|X\left(\omega_{1k} + \varepsilon\right)\right|e^{-i\varepsilon D} \simeq 1 - \left|X\left(\omega_{1k}\right)\right|\left(1 - i\varepsilon D\right). \qquad (A.19)$$

Assuming that P(ω) and |X(ω)| are locally constant (in the frequency domain), we identify term by term (the two members are considered as functions of the variable ε). We deduce the expressions of |X(ω)|, |Y(ω)|, and |Z(ω)| as functions of the amplitudes and damping coefficients of each mode:

$$\left|X\left(\omega_{1k}\right)\right| = \frac{1}{\alpha_{1k}D + 1}, \qquad \left|Y\left(\omega_{2k}\right)\right| = \frac{1}{\alpha_{2k}D + 1}, \qquad \left|Z\left(\omega_{3k}\right)\right| = \frac{1}{\alpha_{3k}D + 1}. \qquad (A.20)$$

We also get the relations

$$E\left(\omega_{1k}\right)P\left(\omega_{1k}\right) = a_{1k}D\,X\left(\omega_{1k}\right), \qquad E\left(\omega_{2k}\right)Q\left(\omega_{2k}\right) = a_{2k}D\,Y\left(\omega_{2k}\right), \qquad E\left(\omega_{3k}\right)R\left(\omega_{3k}\right) = a_{3k}D\,Z\left(\omega_{3k}\right). \qquad (A.21)$$

From the measured signal, we estimate the modal parameters a1k, a2k, a3k, α1k, α2k, α3k, ω1k, ω2k, and ω3k. Using (A.17) and (A.20), we calculate X, Y, and Z. We still have 9 unknown variables P, Q, R, E, Ca, Ce, F1, F2, and F3. But we also have a system of 9 equations ((A.6), (A.7), (A.8), (A.9), (A.10), (A.11), and (A.21)). Assuming that the three resonance frequencies are close and that the variables P, Q, R, E, Ca, Ce, F1, F2, F3, X, Y, and Z have a locally smooth behavior, we then express the waveguide parameters as functions of the temporal parameters. For the sake of simplicity, we write Ek = E(ω1k) = E(ω2k) = E(ω3k).

Using (A.6) and (A.9), we obtain Pk + Qk + Rk = Xk + Yk + Zk. Thanks to (A.21), we finally get the expression of the excitation signal at the resonance frequencies:

$$E_k = \frac{D\left(a_{1k}X_k + a_{2k}Y_k + a_{3k}Z_k\right)}{X_k + Y_k + Z_k}. \qquad (A.22)$$
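For one partial, (A.20) and (A.22) can be evaluated directly from the estimated modal parameters. The sketch below uses the moduli of (A.20) in (A.22), which is an approximation: in the full method the complex values of Xk, Yk, Zk, with the phases implied by (A.17), enter the expression.

```python
import numpy as np

def waveguide_from_modes(a, alpha, D):
    """Per-partial estimates from modal parameters, eqs. (A.20) and (A.22).

    a, alpha : length-3 arrays of modal amplitudes and damping coefficients
               for the kth partial (one entry per string)
    D        : loop delay shared by the three waveguides
    """
    XYZ = 1.0 / (np.asarray(alpha) * D + 1.0)            # |X|, |Y|, |Z|, eq. (A.20)
    # Approximation: the phases of X, Y, Z (cf. eq. (A.17)) are omitted here.
    Ek = D * np.sum(np.asarray(a) * XYZ) / np.sum(XYZ)   # eq. (A.22)
    return XYZ, Ek
```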

In the case of a two-coupled digital waveguide, the corresponding system admits analytical solutions (see [29]). But in the case of the three-coupled digital waveguide, we have not found analytical expressions for the variables P, Q, R, Ca, Ce, F1, F2, and F3. We have therefore solved the system numerically.

REFERENCES

[1] J. Bensa, K. Jensen, R. Kronland-Martinet, and S. Ystad, "Perceptual and analytical analysis of the effect of the hammer impact on piano tones," in Proc. International Computer Music Conference, pp. 58–61, Berlin, Germany, August 2000.

[2] J. Bensa, F. Gibaudan, K. Jensen, and R. Kronland-Martinet, "Note and hammer velocity dependance of a piano string model based on coupled digital waveguides," in Proc. International Computer Music Conference, pp. 95–98, Havana, Cuba, September 2001.

[3] A. Askenfelt, Ed., Five Lectures on the Acoustics of the Piano, Royal Swedish Academy of Music, Stockholm, Sweden, 1990. Lectures by H. A. Conklin, A. Askenfelt and E. Jansson, D. E. Hall, G. Weinreich, and K. Wogram, http://www.speech.kth.se/music/5_lectures/.

[4] S. Ystad, Sound modeling using a combination of physical and signal models, Ph.D. thesis, Université de la Méditerranée, Marseille, France, 1998.

[5] S. Ystad, "Sound modeling applied to flute sounds," Journal of the Audio Engineering Society, vol. 48, no. 9, pp. 810–825, 2000.

[6] A. Askenfelt and E. V. Jansson, "From touch to string vibrations. II: The motion of the key and hammer," Journal of the Acoustical Society of America, vol. 90, no. 5, pp. 2383–2393, 1991.

[7] A. Askenfelt and E. V. Jansson, "From touch to string vibrations. III: String motion and spectra," Journal of the Acoustical Society of America, vol. 93, no. 4, pp. 2181–2195, 1993.

[8] X. Boutillon, "Le piano: Modélisation physique et développements technologiques," in Congrès Français d'Acoustique, Colloque C2, pp. 811–820, Lyon, France, 1990.

[9] P. Schaeffer, Traité des objets musicaux, Édition du Seuil, Paris, France, 1966.

[10] J. Laroche and J. L. Meillier, "Multichannel excitation/filter modeling of percussive sounds with application to the piano," IEEE Trans. Speech and Audio Processing, vol. 2, no. 2, pp. 329–344, 1994.

[11] J. O. Smith III, "Physical modeling using digital waveguides," Computer Music Journal, vol. 16, no. 4, pp. 74–91, 1992.

[12] G. Borin, D. Rocchesso, and F. Scalcon, "A physical piano model for music performance," in Proc. International Computer Music Conference, pp. 350–353, Computer Music Association, Thessaloniki, Greece, September 1997.

[13] B. Bank, "Physics-based sound synthesis of the piano," M.S. thesis, Budapest University of Technology and Economics, Budapest, Hungary, 2000. Published as Report 54, Helsinki University of Technology, Laboratory of Acoustics and Audio Signal Processing, http://www.mit.bme.hu/~bank.

[14] J. O. Smith III, "Efficient synthesis of stringed musical instruments," in Proc. International Computer Music Conference, pp. 64–71, Computer Music Association, Tokyo, Japan, September 1993.

[15] J. O. Smith III and S. A. Van Duyne, "Commuted piano synthesis," in Proc. International Computer Music Conference, pp. 335–342, Computer Music Association, Banff, Canada, September 1995.

[16] S. A. Van Duyne and J. O. Smith III, "Developments for the commuted piano," in Proc. International Computer Music Conference, pp. 319–326, Computer Music Association, Banff, Canada, September 1995.

[17] A. Chaigne and A. Askenfelt, "Numerical simulations of struck strings. I. A physical model for a struck string using finite difference methods," Journal of the Acoustical Society of America, vol. 95, no. 2, pp. 1112–1118, 1994.

[18] X. Boutillon, "Model for piano hammers: Experimental determination and digital simulation," Journal of the Acoustical Society of America, vol. 83, no. 2, pp. 746–754, 1988.

[19] C. Valette and C. Cuesta, Mécanique de la corde vibrante, Traité des nouvelles technologies, série Mécanique, Hermès, Paris, France, 1993.

[20] D. E. Hall and A. Askenfelt, "Piano string excitation V: Spectra for real hammers and strings," Journal of the Acoustical Society of America, vol. 83, no. 6, pp. 1627–1638, 1988.

[21] J. Bensa, S. Bilbao, R. Kronland-Martinet, and J. O. Smith III, "The simulation of piano string vibration: from physical model to finite difference schemes and digital waveguides," Journal of the Acoustical Society of America, vol. 114, no. 2, pp. 1095–1107, 2003.

[22] A. Chaigne and V. Doutaut, "Numerical simulations of xylophones. I. Time-domain modeling of the vibrating bars," Journal of the Acoustical Society of America, vol. 101, no. 1, pp. 539–557, 1997.

[23] H. Fletcher, E. D. Blackham, and R. Stratton, "Quality of piano tones," Journal of the Acoustical Society of America, vol. 34, no. 6, pp. 749–761, 1962.

[24] D. A. Jaffe and J. O. Smith III, "Extensions of the Karplus-Strong plucked-string algorithm," Computer Music Journal, vol. 7, no. 2, pp. 56–69, 1983.

[25] G. Weinreich, "Coupled piano strings," Journal of the Acoustical Society of America, vol. 62, no. 6, pp. 1474–1484, 1977.

[26] V. Välimäki, J. Huopaniemi, M. Karjalainen, and Z. Jánosy, "Physical modeling of plucked string instruments with application to real-time sound synthesis," Journal of the Audio Engineering Society, vol. 44, no. 5, pp. 331–353, 1996.

[27] B. Bank, "Accurate and efficient modeling of beating and two-stage decay for string instrument synthesis," in Proc. Workshop on Current Research Directions in Computer Music, pp. 134–137, Barcelona, Spain, November 2001.

[28] D. Rocchesso and J. O. Smith III, "Generalized digital waveguide networks," IEEE Trans. Speech and Audio Processing, vol. 11, no. 3, pp. 242–254, 2003.

[29] M. Aramaki, J. Bensa, L. Daudet, P. Guillemain, and R. Kronland-Martinet, "Resynthesis of coupled piano string vibrations based on physical modeling," Journal of New Music Research, vol. 30, no. 3, pp. 213–226, 2002.

[30] K. Steiglitz and L. E. McBride, "A technique for the identification of linear systems," IEEE Trans. Automatic Control, vol. 10, pp. 461–464, 1965.

[31] J. Beauchamp, "Synthesis by spectral amplitude and "brightness" matching of analyzed musical instrument tones," Journal of the Audio Engineering Society, vol. 30, no. 6, pp. 396–406, 1982.

[32] S. McAdams, S. Winsberg, S. Donnadieu, G. de Soete, and J. Krimphoff, "Perceptual scaling of synthesized musical timbres: Common dimensions, specificities, and latent subject classes," Psychological Research, vol. 58, pp. 177–192, 1992.

[33] K. Jensen, "Musical instruments parametric evolution," in Proc. International Symposium on Musical Acoustics, pp. 319–326, Computer Music Association, Mexico City, Mexico, December 2002.

[34] M. Laurson, C. Erkut, V. Välimäki, and M. Kuuskankare, "Methods for modeling realistic playing in acoustic guitar synthesis," Computer Music Journal, vol. 25, no. 3, pp. 38–49, 2001.

[35] K. Jensen, Timbre models of musical sounds, Ph.D. thesis, Department of Datalogy, University of Copenhagen, Copenhagen, Denmark, DIKU Tryk, Technical Report No. 99/7, 1999.

Julien Bensa obtained in 1998 his Master's degree (DEA) in acoustics, signal processing, and informatics applied to music from the Pierre et Marie Curie University, Paris, France. He received in 2003 a Ph.D. in acoustics and signal processing from the University of Aix-Marseille II for his work on the analysis and synthesis of piano sounds using physical and signal models (available online at http://www.lma.cnrs-mrs.fr/~bensa). He currently holds a postdoc position at the Laboratoire d'Acoustique Musicale, Paris, France, and works on the relation between the parameters of synthesis models of musical instruments and the perceived quality of the corresponding tones.

Kristoffer Jensen got his Master's degree in computer science at the Technical University of Lund, Sweden, and a DEA in signal processing at the ENSEEIHT, Toulouse, France. His Ph.D. was delivered and defended in 1999 at the Department of Datalogy, University of Copenhagen, Denmark, treating analysis/synthesis, signal processing, classification, and modeling of musical sounds. Kristoffer Jensen has a broad background in signal processing, including musical, speech recognition, and acoustic antenna topics. He has been involved in synthesizers for children, state-of-the-art next-generation effect processors, and signal processing in music informatics. His current research topic is signal processing with musical applications and related fields, including perception, psychoacoustics, physical models, and expression of music. He currently holds a position at the Department of Datalogy as Assistant Professor.

Richard Kronland-Martinet received a Ph.D. in acoustics from the University of Aix-Marseille II, France, in 1983. He received a "Doctorat d'État ès Sciences" in 1989 for his work on analysis and synthesis of sounds using time-frequency and time-scale representations. He is currently Director of Research at the National Center for Scientific Research (CNRS), Laboratoire de Mécanique et d'Acoustique in Marseille, where he is the head of the group "Modeling, Synthesis and Control of Sound and Musical Signals." His primary research interests are in analysis and synthesis of sounds with a particular emphasis on musical sounds. He has recently been involved in a multidisciplinary research project associating sound synthesis processes and brain imaging techniques (functional Nuclear Magnetic Resonance, fNMR) to better understand the way the brain processes sounds and music.


EURASIP Journal on Applied Signal Processing 2004:7, 1036–1044
© 2004 Hindawi Publishing Corporation

Warped Linear Prediction of Physical Model Excitations with Applications in Audio Compression and Instrument Synthesis

Alexis Glass
Department of Acoustic Design, Graduate School of Design, Kyushu University, 4-9-1 Shiobaru, Minami-ku, Fukuoka 815-8540, Japan
Email: [email protected]

Kimitoshi Fukudome
Department of Acoustic Design, Faculty of Design, Kyushu University, 4-9-1 Shiobaru, Minami-ku, Fukuoka 815-8540, Japan
Email: [email protected]

Received 8 July 2003; Revised 13 December 2003

A sound recording of a plucked string instrument is encoded and resynthesized using two stages of prediction. In the first stage of prediction, a simple physical model of a plucked string is estimated and the instrument excitation is obtained. The second stage of prediction compensates for the simplicity of the model in the first stage by encoding either the instrument excitation or the model error using warped linear prediction. These two methods of compensation are compared with each other and with the case of single-stage warped linear prediction, adjustments are introduced, and their applications to instrument synthesis and MPEG-4 audio compression within the structured audio format are discussed.

Keywords and phrases: warped linear prediction, audio compression, structured audio, physical modelling, sound synthesis.

1. INTRODUCTION

Since the discovery of the Karplus-Strong algorithm [1] and its subsequent reformulation as a physical model of a string, a subset of the digital waveguide [2], physical modelling has seen the rapid development of increasingly accurate and disparate instrument models. Not limited to string model implementations of the digital waveguide, such as the kantele [3] and the clavichord [4], models for brass, woodwind, and percussive instruments have made physical modelling ubiquitous.

With the increasingly complex models, however, the task of parameter selection has become correspondingly difficult. Techniques for calculating the loop filter coefficients and excitation for basic plucked string models have been refined [5, 6] and can be computed quickly. However, as the one-dimensional model gave way to models with weakly interacting transverse and vertical polarizations, research has looked to new ways of optimizing parameter selection. These new methods use neural networks or genetic algorithms [7, 8] to automate tasks which would otherwise take human operators an inordinate amount of time. This research has yielded more accurate instrument models, but for some applications it also leaves a few problems unaddressed.

The MPEG-4 structured audio codec allows for the implementation of any coding algorithm, from linear predictive coding to adaptive transform coding to, at its most efficient, the transmission of instrument models and performance data [9]. This coding flexibility means that MPEG-4 has the potential to implement any coding algorithm and to be within an order of magnitude of the most efficient codec for any given input data set [10]. Moreover, for sources that are synthetic in nature, or can be closely approximated by physical or other instrument models, structured audio promises levels of compression orders of magnitude better than what is currently possible using conventional pure signal-based codecs.

Current methods used to parameterize physical models from recordings require, however, a great deal of time for complex models [8]. They also often require very precise and comprehensive original recordings, such as recordings of the impulse response of the acoustic body [5, 11], in order to achieve reproductions that are indistinguishable from the original. Given current processor speeds, these limitations preclude the use of genetic algorithm parameter selection techniques for real-time coding. Real-time coding is also made exceedingly difficult in such cases where body impulse responses are not available or playing styles vary from model expectations.

This paper proposes a solution to this real-time parameterization and coding problem for string modelling in the marriage of two common techniques: the basic plucked string physical model and warped linear prediction (WLP) [12].

The justifications for this approach are as follows. Most string recordings can be analyzed using the techniques developed by Smith, Karjalainen et al. [2, 6] in order to parameterize a basic plucked string model, and a considerable prediction gain can be achieved using these techniques. The excitation signal for the plucked string model consists of an attack transient that represents the plucking of the string according to the player's style and plucking position [11], followed by a decay component. This decay component includes the body resonances of the instrument [11, 13], beating introduced by the string's three-dimensional movement, and further excitation caused by the player's performance. Additional excitations from the player's performance include deliberate expression through vibrato or even unintentional influences, such as scratching of the string or the rattling caused by the string vibrating against the fret under weak fingering pressure. The body resonances and contributions from the three-dimensional movement of the string mean that the excitation signal is strongly correlated and therefore a good candidate for WLP coding. Furthermore, while residual quantization noise in a warped predictive codec is shaped so as to be masked by the signal's spectral peaks [12], in one of the proposed topologies, the noise in the physical model's excitation signal is likewise shaped into the modelled harmonics. This shaping of the noise by the physical model results in distortion that, if audible, is neither unnatural nor distracting, thereby allowing codec sound quality to degrade gracefully with decreasing bit rate. In the ideal case, we imagine that at the lowest bit rate, the guitar would be transmitted using only the physical model parameters, and that with increasing excitation bit rate, the reproduced guitar timbre would approach the original.

This paper is composed of six sections. Following the introduction, the second section describes the plucked string model used in this experiment and the analysis methods used to parameterize it. The third section describes the recording of a classic guitar and an electric guitar for testing. The coding of the guitar tones using a combination of physical modelling and warped linear predictive coding is outlined in Section 4. Section 5 analyzes the results from simulated coding scenarios using the recorded samples from Section 3 and the topologies of Section 4, while investigating methods of further improving the quality of the codec. Section 6 concludes the paper.

2. MODEL STRUCTURE

A simple linear string model extended from the Karplus-Strong algorithm by Jaffe and Smith [14] was used in this study, comprising one delay line $z^{-L}$ with a first-order allpass fractional delay filter $F(z)$ and a single-pole low-pass loop filter $G(z)$, as shown in Figure 1, where

$$F(z) = \frac{a + z^{-1}}{1 + az^{-1}}, \qquad (1)$$

$$G(z) = \frac{g\,(1 + a_1)}{1 + a_1 z^{-1}}, \qquad (2)$$

and the overall transfer function of the system can be expressed as

$$H(z) = \frac{1}{1 - F(z)G(z)z^{-L}}. \qquad (3)$$

Figure 1: Topology of a basic plucked string physical model.
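For concreteness, the loop of Figure 1 can be realized directly in the time domain. The following minimal sketch implements (1)–(3) sample by sample, using a circular buffer for the integer part of the delay; the function and parameter names (`L_int` for the integer delay, `a` for the allpass coefficient, `a1` and `g` for the loop filter) are illustrative and not taken from the authors' implementation.

```python
import numpy as np

def pluck_synth(excitation, L_int, a, a1, g, n_samples):
    """Time-domain realization of Figure 1: delay line z^-L (integer part),
    first-order allpass F(z) from eq. (1), and one-pole loop filter G(z)
    from eq. (2), summed with the excitation at the loop input."""
    y = np.zeros(n_samples)
    delay = np.zeros(L_int)       # circular buffer implementing z^-L_int
    ap_x1 = ap_y1 = 0.0           # allpass state: previous input and output
    lp_y1 = 0.0                   # loop filter state: previous output
    for n in range(n_samples):
        x = excitation[n] if n < len(excitation) else 0.0
        d_out = delay[n % L_int]  # sample written L_int steps ago
        # F(z) = (a + z^-1)/(1 + a z^-1): ap[n] = a*d[n] + d[n-1] - a*ap[n-1]
        ap_out = a * d_out + ap_x1 - a * ap_y1
        ap_x1, ap_y1 = d_out, ap_out
        # G(z) = g(1 + a1)/(1 + a1 z^-1): lp[n] = g(1+a1)*ap[n] - a1*lp[n-1]
        lp_out = g * (1.0 + a1) * ap_out - a1 * lp_y1
        lp_y1 = lp_out
        y[n] = x + lp_out         # summing junction closing the loop
        delay[n % L_int] = y[n]   # feed the loop output back into the delay
    return y
```

Driving such a loop with the windowed excitation of Section 4.2 and the parameters obtained from the analysis stage reproduces the basic resynthesis path used throughout the paper.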

This string model is very simple, and much more accurate and versatile models have been developed since [6, 11, 15]. For the purposes of this study, however, it was required only that the model could be quickly and accurately parameterized without the use of complex or time-consuming algorithms, and it was sufficient that it offer a reasonable first-stage coding gain. The algorithms used to parameterize the first-order model are described in detail in [15] and will only be outlined here as they were implemented for this study.

In the first stage of the model parameterization, the pitch of the target sound was detected from the target's autocorrelation function. The length of the delay line $z^{-L}$ and the fractional delay filter $F(z)$ were determined by dividing the sampling frequency (44.1 kHz) by the pitch of the target. Next, the magnitudes of up to the first 20 harmonics were tracked using short-term Fourier transforms (STFTs). The magnitude of each harmonic versus time was recorded on a logarithmic scale after the attack transient of the pluck was determined to have dissipated and until the harmonic had decayed 40 dB or disappeared into the noise floor.
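The delay-line step above can be sketched as follows. The allpass coefficient formula is the standard first-order fractional delay approximation; the paper does not state which fit it used, so this is an assumption.

```python
import numpy as np

def delay_parameters(f0, fs=44100.0):
    """Split the loop delay fs/f0 into an integer delay-line length and a
    first-order allpass coefficient approximating the fractional part."""
    L = fs / f0                      # total loop delay in samples
    L_int = int(np.floor(L))
    frac = L - L_int                 # fractional delay left for F(z)
    a = (1.0 - frac) / (1.0 + frac)  # standard first-order allpass fit
    return L_int, frac, a
```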

A linear regression was performed on each harmonic's decay to determine its slope, $\beta_k$, as shown in Figure 2, and the measured loop gain for each harmonic, $G_k$, was calculated according to the following equation:

$$G_k = 10^{\beta_k L / (20H)}, \qquad k = 1, 2, \ldots, N_h, \qquad (4)$$

where $L$ is the length of the delay line (including the fractional component) and $H$ is the hop size (adjusted to account for hop overlap). The loop gain at DC, $g$, was estimated to equal the loop gain of the first harmonic, $G_1$, as in [15]. Because the target guitar sounds were arbitrary and nonideal, the harmonic envelope trajectories were quite noisy in some cases, so additional measures had to be introduced to stop tracking harmonics when their decays became too erratic or, as in some cases, negative. In such cases as when the guitar fret was held with insufficient pressure, additional transients occurred after the first attack transient, and this tended to raise the gain factor in the loop filter, resulting in a model that did not accurately reflect string losses. For the purposes of this study, such effects were generally ignored as long as a positive decay could be measured from the harmonics tracked.

Figure 2: The temporal envelopes of the lowest four harmonics of a guitar pluck (dashed) and their estimated decays (solid).
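A compact version of the per-harmonic gain measurement of (4) might look like the sketch below, which assumes each harmonic's dB envelope is sampled once per (overlap-adjusted) hop; that data layout is our assumption, not the paper's.

```python
import numpy as np

def measure_loop_gains(env_db, L, H):
    """Per-harmonic loop gain via eq. (4): fit a line to each dB envelope
    (slope beta_k in dB per frame) and convert to a gain per loop period."""
    gains = []
    for env in env_db:                        # one dB trajectory per harmonic
        frames = np.arange(len(env))
        beta = np.polyfit(frames, env, 1)[0]  # linear-regression slope
        gains.append(10.0 ** (beta * L / (20.0 * H)))
    return np.array(gains)
```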

The first-order loop filter coefficient $a_1$ was estimated by minimizing the weighted error between the target loop gains $G_k$, as calculated in (4), and candidate filters $G(z)$ from (2). A weighting function $W_k$, suggested by [15] and defined as

$$W_k = \frac{1}{1 - G_k}, \qquad (5)$$

was used such that the error could be calculated as follows:

$$E(a_1) = \sum_{k=1}^{N_h} W_k \left( G_k - \left| G\left(e^{j\omega_k}, a_1\right) \right| \right), \qquad (6)$$

where $\omega_k$ is the frequency of the harmonic being evaluated and $0 < a_1 < 1$. This error function is roughly quadratic in the vicinity of the minimum, and parabolic interpolation was found to yield accurate values for the minimum in less time than iterative methods.
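The fit can be sketched as a coarse grid search refined by parabolic interpolation. Note that the error is squared inside the search below so that the one-dimensional minimum is well defined; this squaring is our assumption and is not stated in (6).

```python
import numpy as np

def estimate_a1(Gk, omega_k, g, grid=None):
    """Fit the loop filter coefficient a1 by grid search plus parabolic
    interpolation over a squared form of the eq. (6) weighted error."""
    if grid is None:
        grid = np.linspace(0.01, 0.99, 99)
    W = 1.0 / (1.0 - Gk)                       # weighting of eq. (5)
    def err(a1):
        # |G(e^{j w_k}, a1)| from eq. (2), evaluated at each harmonic
        mag = np.abs(g * (1.0 + a1) / (1.0 + a1 * np.exp(-1j * omega_k)))
        return np.sum(W * (Gk - mag) ** 2)
    e = np.array([err(a) for a in grid])
    i = int(np.argmin(e))
    if 0 < i < len(grid) - 1:
        denom = e[i - 1] - 2.0 * e[i] + e[i + 1]
        if denom > 0:
            # parabola through the three points bracketing the grid minimum
            offset = 0.5 * (e[i - 1] - e[i + 1]) / denom
            return grid[i] + offset * (grid[1] - grid[0])
    return grid[i]
```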

For controlled calibration of the loop filter extraction algorithm, synthesized plucked string samples were created using the extended Karplus-Strong algorithm and the model as described by Valimaki [11], with two string polarizations and a weak sympathetic coupling between the strings.

3. DATA ACQUISITION

The purpose of the algorithms explored in this research was to resynthesize real, nontrivial plucked string sounds using the combination of the basic plucked string model and WLP coding. No special care was taken, therefore, in the selection of the instruments to be used or the nature of the guitar tones to be analyzed and resynthesized, beyond requiring that they were monophonic, recorded in an anechoic chamber, and that each pluck was preceded by silence to facilitate the analysis process. A schematic of the recording environment and signal flow for the classic guitar is pictured in Figure 3.

Figure 3: Schematic for classic guitar pluck recording.

Two guitars were recorded. The first, a classic guitar, was recorded in an anechoic chamber with the guitar held approximately 50 cm from a Bruel & Kjaer type 4191 free-field 1/2′′ microphone, the output of which was amplified by a Falcon Range 1/2′′ type 2669 microphone preamp with a Bruel & Kjaer type 5935 power supply and fed into a PC through a Layla 24/96 multitrack recording system. The electric guitar was recorded through its line out and a Yamaha O3D mixer into the Layla. A variety of plucking styles were recorded in both cases, along with the application of vibrato, string scratching, and several cases where insufficient finger pressure on the frets led to further string excitation (i.e., a rattling of the string) after the initial pluck.

After capturing approximately 8 minutes of playing with each guitar, suitable candidates for the study were selected on the basis of their unique timbres, durations, and potential difficulty for accurate resynthesis using existing plucked string models. More explicitly, in the case of the classic guitar, bright plucks of E1 (82 Hz) were recorded along with several recordings of B1 (124 Hz) where weak finger pressure led to a rattling of the string. Another sample selected involved this weak finger pressure leading to an early damping of the string by the fret hand, though without the nearly instantaneous subsequent decay that a fully damped string would yield. A third, higher pitch was recorded with an open string at E3 (335 Hz). In the case of the electric guitar, two samples were used: one of a slapped E1 (82 Hz) with almost no decay and another of E2 (165 Hz) with some vibrato applied.

Page 123: Model-Based Sound Synthesis - downloads.hindawi.comdownloads.hindawi.com/journals/specialissues/614764.pdf · Model-based sound synthesis has become one of the most active research

Figure 4: The decomposition of an excitation into (a) attack and (b) decay. The attack window is 200 milliseconds long. In this case, decay refers to the portion of the pluck where the greatest attenuation is a result of string losses. Because the string is not otherwise damped, it may also be considered to be the sustain segment of the envelope.

4. ANALYSIS/RESYNTHESIS ALGORITHMS

4.1. Warped linear prediction

Frequency warping methods [16] can be used with linear prediction coding so that the prediction resolution closely matches the human auditory system's nonuniform frequency resolution. Harma found that WLP realizes a basic psychoacoustic model [12]. As a control for the study, the target signal was therefore first processed using a twentieth-order WLP coder of lattice structure.

The lattice filter’s reflection coefficients were not quan-tized, and after inverse filtering, the residual was split intotwo sections, attack and decay, which were quantized using amid-riser algorithm. The step size in the mid-riser quantizerwas set such that the square error of the residual was mini-mized. The number of bits per sample in the attack residual(BITSA) was set to each of BITSA = 16, 8, 4 for each ofthe bits per sample in the decay residual BITSD = 2, 1.The frame size for the coding was set to equal two periods ofthe guitar pluck being coded, and the reflection coefficientswere linearly interpolated between frames. The bit allocationmethod was used in order to match the case of the topolo-gies that use a first-stage physical model predictor, wheremore bits were allocated to the attack excitation than thedecay excitation. Harma found in [12] that near transpar-ent quality could be achieved with 3 bits per sample usinga WLP codec. It is therefore reasonable to suggest that the

WLP used here could have been optimized by distributingthe high number of bits used in the attack throughout thelength of the sound to be coded. However, since similar op-timizations could also be made in the two-stage algorithms,only the simplest method was investigated in this study.
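As an illustration of the quantization step, a mid-riser quantizer with an error-minimizing step-size search might be sketched as follows; the search range and the grid of 64 candidate step sizes are arbitrary assumptions, not details from the paper.

```python
import numpy as np

def midriser_quantize(x, bits):
    """Mid-riser quantization of a residual, with the step size chosen by
    a coarse search minimizing the squared error, as in Section 4.1."""
    levels = 2 ** bits
    peak = np.max(np.abs(x)) or 1.0
    best_q, best_err = None, np.inf
    for step in np.linspace(peak / levels, 4.0 * peak / levels, 64):
        q = step * (np.floor(x / step) + 0.5)   # mid-riser reconstruction levels
        lim = step * (levels // 2)              # representable range for `bits`
        q = np.clip(q, -lim + step / 2, lim - step / 2)
        err = np.sum((x - q) ** 2)
        if err < best_err:
            best_q, best_err = q, err
    return best_q
```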

4.2. Windowed excitation

As the most basic implementation of the physical model, the residual from the string model's inverse filter can be windowed and used as the excitation for the model. In this study, the excitation was first coded using a warped linear predictive coder of order 20 with BITSA bits of quantization for each sample of the residual. In many cases, the first 100 milliseconds of the excitation contain enough information about the pluck and the guitar's body resonances for accurate resynthesis [13, 15]. The beating caused by the slight three-dimensional movement of the string and the rattling caused by the energetic plucks used in the study, however, were significant enough that a longer excitation was used.

Specifically, the window used was unity for the first 100 milliseconds of the excitation and then decayed as the second half of a Hanning window over the following 100 milliseconds. An example of this windowed excitation can be seen in the top of Figure 4. This windowed excitation, considered as the attack component, was input to the string model for comparison to the WLP case and used in the modified extended Karplus-Strong algorithm which will now be described.
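A direct construction of that window, using the sample rate and segment lengths given in the text (the optional zero-padding to the excitation length is our addition):

```python
import numpy as np

def attack_window(fs=44100, flat_ms=100, fade_ms=100, total_len=None):
    """Excitation window of Section 4.2: unity for flat_ms milliseconds,
    then the decaying half of a Hanning window over fade_ms milliseconds."""
    flat = np.ones(int(fs * flat_ms / 1000))
    n_fade = int(fs * fade_ms / 1000)
    fade = np.hanning(2 * n_fade)[n_fade:]       # second (decaying) half
    w = np.concatenate([flat, fade])
    if total_len is not None:
        w = np.pad(w, (0, max(0, total_len - len(w))))
    return w
```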

4.3. Two-stage coding topologies

As described in [9], structured audio allows for the parameterization and transmission of audio using arbitrary codecs. These codecs may be comprised of instrument models, effect models, psychoacoustic models, or combinations thereof. The most common methods used for the psychoacoustic compression of audio are transform codecs, such as MP3 [17] and ATRAC [18], and time-domain approaches such as WLP [12]. Because the specific application being considered here is that of the guitar, the first stage of our codec is the simple string model described in Section 2. The second stage of coding was then approached using one of two methods:

(1) the model’s output signal error (referred to as modelerror) could be immediately coded using WLP, or

(2) the model’s excitation could be coded using WLP, withthe attack segment of the excitation receiving more bitsas in the WLP case of Section 4.2.

The topologies of these two strategies are illustrated in Figure 5.

Both topologies require the inverse filtering of the target pluck sound in order to extract the excitation. The decomposition of the excitation into attack and decay components for the first topology, as formerly proposed by Smith [19] and implemented by Valimaki and Tolonen in [13], reflects the wideband, high-amplitude portion which marks the beginning of the excitation signal and the decay which typically contains lower-frequency components from body resonances or from the three-dimensional movement of the string. However, whereas the authors of [13] synthesized the decay excitation at a lower sampling rate, justified by its predominantly lower frequency components, the excitations in our study often contained wideband excitations following the initial attack, and no such multirate synthesis was therefore used. A typical attack and decay decomposition of an excitation is shown in Figure 4. The high-frequency decay components are a result of the mismatch between the string model and the source recording.

Figure 5: The WLP coding of model error (WLPCME) topology (top) and WLP coding of model excitation (WLPCMX) topology (bottom). Here, $s$ represents the plucked string recording to be coded and $\hat{s}$ the reconstructed signal. In this diagram, WLPC indicates the WLP coder, or inverse filter, and WLPD indicates the WLP decoder. Q is the quantizer, with BITSA and BITSD being the number of bits with which the respective signals are quantized.

4.4. Warped linear prediction coding of model error

The WLPCME topology from Figure 5 was implemented such that WLP was applied to the model error as follows:

$$s_{\mathrm{wex}} = h * x_{\mathrm{attack}},$$
$$e_{\mathrm{model}} = s - s_{\mathrm{wex}},$$
$$\hat{s} = s_{\mathrm{wex}} + \hat{e}_{\mathrm{model}}, \qquad (7)$$

where $s$ is the recorded plucked string input, $h$ is the impulse response of the plucked string model derived from (3), $x_{\mathrm{attack}}$ is the WLP-coded windowed excitation introduced in Section 4.2, $s_{\mathrm{wex}}$ is the pluck resynthesized using only the windowed excitation, and $e_{\mathrm{model}}$ is the model error. $\hat{e}_{\mathrm{model}}$ is the model error coded using WLP with BITSD bits per sample, and $\hat{s}$ is the reconstructed pluck.

4.5. Warped linear prediction coding of model excitation

In this case, the model excitation was coded instead of the model error. Following the string model inverse filtering, the excitation is whitened using a twentieth-order WLP inverse filter. Next, the signal is quantized, with BITSA bits per sample allotted to the residual in the attack and BITSD bits per sample for the decay residual. This process can be expressed in the following terms:

$$x_{\mathrm{full}} = h^{-1} * s,$$
$$\hat{x}_{\mathrm{attack}} = q_{\mathrm{BITSA}}\left( p^{-1} * x_{\mathrm{full}} \cdot w_{\mathrm{attack}} \right),$$
$$\hat{x}_{\mathrm{decay}} = q_{\mathrm{BITSD}}\left( p^{-1} * x_{\mathrm{full}} \cdot w_{\mathrm{decay}} \right),$$
$$\hat{x}_{\mathrm{full}} = p * \left( \hat{x}_{\mathrm{attack}} + \hat{x}_{\mathrm{decay}} \right),$$
$$\hat{s} = h * \hat{x}_{\mathrm{full}}, \qquad (8)$$

where $s$ is the original instrument recording being modelled, $h^{-1}$ is the string model's inverse filter, and $x_{\mathrm{full}}$ is thus the model excitation. $\hat{x}_{\mathrm{attack}}$ is the string model excitation whitened by the WLP inverse filter $p^{-1}$, windowed, and quantized to BITSA bits, while $\hat{x}_{\mathrm{decay}}$ is likewise whitened and quantized to BITSD bits. The sum of the attack and decay residuals is then resynthesized by the WLP decoder $p$. The resulting $\hat{x}_{\mathrm{full}}$ is subsequently used as the excitation to the string model $h$ to form the resynthesized plucked string sound $\hat{s}$.
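The signal flow of (8) can be summarized in a few lines if the filters and quantizers are treated as black boxes; the callable names below are illustrative rather than the authors' code, and the windows are assumed to be zero-padded to the signal length.

```python
def wlpcmx(s, h, h_inv, p, p_inv, w_attack, w_decay, q_a, q_d):
    """Signal flow of eq. (8). h/h_inv are the string model and its inverse,
    p/p_inv the WLP decoder and whitening inverse filter (all callables),
    and q_a/q_d the attack and decay quantizers (e.g., mid-riser)."""
    x_full = h_inv(s)                 # string-model excitation
    white = p_inv(x_full)             # WLP-whitened residual
    x_attack = q_a(white * w_attack)  # attack residual, BITSA bits
    x_decay = q_d(white * w_decay)    # decay residual, BITSD bits
    x_hat = p(x_attack + x_decay)     # WLP resynthesis of the excitation
    return h(x_hat)                   # drive the string model
```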

5. SIMULATION RESULTS AND DISCUSSION

In order to evaluate the effectiveness of the two proposed topologies, a measure of the sound quality was required. Informal listening tests suggested that the WLPCMX topology offered slightly improved sound quality and a more musical coding at lower bit rates, although this came at the cost of a much brighter timbre. At very low bit rates, WLPCMX introduced considerable distortion, especially for sound sources that were poorly matched by the string model. WLPCME, on the other hand, was equivalent in sound quality to WLPC and sometimes worse. Resynthesis using the windowed excitation yielded passable guitar-like timbres, but in none of the test cases came close to reproducing the nuance or fullness of the original target sounds.


For a more formal evaluation of the simulated codecs' sound quality, an objective measure was calculated by measuring the spectral distance between the frequency-warped STFTs, $S_k$, of the original pluck recording and those, $\hat{S}_k$, of the resynthesized output created using the codecs. The frequency-warped STFT sequences were created by first warping each successive frame of each signal using cascaded all-pass filters [16], followed by a Hanning window and a fast Fourier transform (FFT). The Bark spectral distance (BSD) was measured as follows:

$$\mathrm{BSD}_k = \sqrt{ \frac{1}{N} \sum_{n=0}^{N-1} \left( 20\log_{10}\left|S_k(n)\right| - 20\log_{10}\left|\hat{S}_k(n)\right| \right)^2 }, \qquad (9)$$

with the mean BSD for the whole sample being the unweighted mean over all frames $k$. A typical profile of BSD versus time is shown in Figure 6 for the three cases WLPC, WLPCMX, and WLPCME.
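Given the frequency-warped STFT frames of both signals, the measure of (9) reduces to a few lines; the small `eps` guarding the logarithm is our addition.

```python
import numpy as np

def mean_bsd(S, S_hat, eps=1e-12):
    """Mean Bark spectral distance per eq. (9). S and S_hat hold the
    frequency-warped STFT frames (frames x bins) of the original and
    resynthesized signals."""
    db = 20.0 * np.log10(np.abs(S) + eps)
    db_hat = 20.0 * np.log10(np.abs(S_hat) + eps)
    bsd_k = np.sqrt(np.mean((db - db_hat) ** 2, axis=1))  # per-frame BSD_k
    return float(bsd_k.mean())                            # unweighted mean
```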

In the first round of simulations, all six input samples described in Section 3 were processed using each of the algorithms described in Section 4. The resulting mean BSDs were calculated and are shown in Figure 7.

Subjective evaluation of the simulated coding revealed that as the bit rate decreased, the WLPCMX topology maintained a timbre that, while brighter than the target, was recognizably that of a guitar. In contrast, the other methods became noisy and synthetic. Objective evaluation of these same results reveals that both topologies using a first-stage physical model predictor have greater spectral distortion than the case of WLPC, particularly in the case of the recordings with very slow decays (i.e., with a high DC loop gain $g$). In identifying the cause of this distortion, we must first consider the model prediction. The degradation occurs for the following reasons in each of the two topologies.

(A) In the case of WLPCME, the beating that is caused by the three-dimensional vibration of the string causes considerable phase deviation from the phase of the modelled pluck, and the model error often becomes greater in magnitude than the original signal itself. This leads to a noisier reconstruction by the resynthesizer. Additionally, small model parameterization errors in pitch and the lack of vibrato in the model result in phase deviations.

(B) In the case of WLPCMX, with a low bit rate in the residual quantization stage of the linear predictor, a small error in the coding of the excitation is magnified by the resynthesis filter (string model). In addition to this, as noted in [15], the inverse filter may not have been of sufficiently high order to cancel all harmonics, and high-frequency noise, magnified by the WLP coding, may have been further shaped by the plucked string synthesizer into bright higher harmonics.

The distortion caused by the topology in (A) seems impossible to improve significantly without using a more complex model that considers the three-dimensional vibration of the string, such as the model proposed by Valimaki et al. [11] and previously raised in Section 2. Performance control, such as vibrato, would also have to be extracted from the input for a locked phase to be achieved in the resynthesized pluck. The topology of (B), however, allows for some improvement in the reconstructed signal quality by compromising between the prediction gain of the first stage and the WLP coding of the second stage. More explicitly, if the loop filter gain were decreased, then the cumulative error introduced by the quantization in the WLP stage would be correspondingly decreased.

Figure 6: Bark scale spectral distortion (dB) versus time (seconds). WLPC is solid, WLPCMX is dashed-dotted, and WLPCME is dashed.

Figure 7: Mean Bark scale spectral distortion (dB) using each of WLPC, WLPCME, and WLPCMX (left to right) for (1) E3 classic, (2) E1 classic, (3) B1 classic (rattle 1), (4) B1 classic (rattle 2), (5) E1 electric, and (6) E2 electric. Simulation parameters were BITSA = 4 and BITSD = 1.


Such a downwards adjustment of the loop filter gain in order to minimize coding noise results in a physical model that represents a plucked string with an exaggerated decay. This almost makes the physical model prediction stage appear more like the long-term pitch predictor in a more conventional linear prediction (LP) codec targeted at speech. However, there is still the critical difference that the physical model contains the low-pass component of the loop filter and can still be thought of as modelling the behaviour of a (highly damped) guitar string.

To obtain an appropriate value for the loop gain, multiplier tests were run on all six target samples. The electric guitar recordings and the recordings of the classical guitar at E3 represented "ideal" cases: there were no rattles subsequent to the initial pluck, and there were negligible changes in pitch throughout their lengths. Amongst the remaining recordings, the two rattling guitar recordings represented two timbres very difficult to model without a lengthy excitation or a much more complex model of the guitar string. The mean BSD measure for the electric guitar at E1 is shown in Figure 8.

As can be seen from Figure 8, reducing the loop gain of the physical model predictor increased the performance of the codec, yielding superior BSD scores for loop gain multipliers between 0.1 and 0.9. The greater the model mismatch, as in the case of the recordings with rattling strings, the less the string model predictor lowered the mean BSD. Models which did not closely match also reached their minimal mean BSDs at lower loop gains (e.g., 0.5 to 0.7). The simulation used to produce Figure 7 was performed again using a single, approximately optimal, loop gain multiplier of 0.7. The results from this simulation are pictured in Figure 9.

The decreased BSD for all the samples in Figure 9 confirms the efficacy of the two-stage codec. Informal subjective listening tests, described briefly at the beginning of this section, also confirmed that decreasing the bit rate reduced the similarity of the reproduced timbre to the original timbre without obscuring the fact that it was a guitar pluck, and without the "thickening" of the mix that occurs due to the shaped noise in the WLPC codec. This improvement offered by the two-stage codec becomes even more noticeable at lower bit rates, such as with a constant 1 bit per sample quantization of the WLP residual over both attack and decay.

To evaluate the utility of the proposed WLPCMX, it is important to compare it to the alternatives. Existing purely signal-based approaches such as MP3 and WLPC have proven their usefulness for encoding arbitrary wideband audio signals at low bit rates while preserving transparent quality. As an example, Harma found that wideband audio could be coded using WLPC at 3 bits per sample (= 132.3 kbps at 44.1 kHz) with good quality [12]. These models can be implemented in real time with minimal computational overhead but, like sample-based synthesis, do not represent the transmitted signal parametrically in a form that is related to the original instrument. Pure signal-based approaches, using psychoacoustic models, are thus limited to the extent to which they can remove psychoacoustically redundant data from an audio stream.

Figure 8: Mean Bark scale spectral distortion versus loop gain multiplier. WLPCMX is solid and WLPC is dashed-dotted.

Figure 9: Mean Bark scale spectral distortion (dB) using each of WLPC and WLPCMX (left to right) for (1) E3 classic, (2) E1 classic, (3) B1 classic (rattle 1), (4) B1 classic (rattle 2), (5) E1 electric, and (6) E2 electric. Simulation parameters were BITSA = 4 and BITSD = 1.

On the other hand, increasingly complex physical models can now reproduce many classes of instruments with excellent quality. Assuming a good calibration or, in the best case, a performance made using known physical modelling algorithms, transmission of model parameters and continuous controllers would result in a bit rate at least an order of magnitude lower than that of pure signal-based methods. As an example, if we consider an average score file from a modern sequencing program using only virtual instruments and software effects, the file size (including simple instrument and effect model algorithms) is on the order of 500 kB. For an average song length of approximately 4 minutes, this leads to a bit rate of approximately 17 kbps. For optimized scores and simple instrument models, the bit rate could be lower than 1 kbps. Calibration of these complex instrument models to resynthesize acoustic instruments remains an obstacle for real-time use in coding, however. Likewise, parametric models are flexible within the class for which they are designed, but an arbitrary performance may contain elements not supported by the model. Such a performance cannot be reproduced by the pure physical model and may, indeed, result in poor model calibration for the performance as a whole.

This preliminary study of the WLPCMX topology offers a compromise between pure physical-model-based approaches and pure signal-based approaches. For the case of the monophonic plucked string considered in this study, a lower spectral distortion was realized using the model-based predictor. Because more bits were assigned to the attack portion of the string recording, the actual long-term bit rate of the codec is related to the frequency of plucks; in the worst case it is limited by the rate of the WLP stage (assuming a loop gain multiplier of 0), and in the best case, given a close match between model and recording, it approaches the physical-model-only case. For recordings that were well modelled by the string model, such as the electric guitar at E1 and E2 and the E3 classic guitar sample, subjective tests suggested that equivalent quality could be achieved with 1 bit per sample less than in the WLPC case. Limitations of the string model prevent it from capturing all the nuances of a recording, such as the rattling of the classical guitar's string, but these unmodelled features are successfully encoded by the WLP stage. Because the predictor reflects the acoustics of a plucked string, the degradation in quality at lower bit rates sounds more natural.

6. CONCLUSIONS

The implementation of a two-stage audio codec using a physical model predictor followed by WLP was simulated, and the subjective and objective sound quality was analyzed. Two codec topologies were investigated. In the first topology, the instrument response was estimated by windowing the first 200 milliseconds of the excitation, and this estimate was subtracted from the target sample, with the difference being coded using WLP. In the second topology, the excitation to the plucked string physical model was coded using WLP before being reconstructed by reapplying the coded excitation to the string model shown in Figure 1. Tests revealed that the limitations of the physical model caused the model error in the first topology to be of greater amplitude than the target sound, and the codec therefore operated with inferior quality to the WLPC control case.

The second topology, however, showed promise in subjective tests, whereby a decrease in the bits allocated to the coding of the decay segment of the excitation reduced the similarity of the timbre without changing its essential likeness to a plucked string. A further simulation was performed wherein the loop gain of the physical model was reduced in order to limit the propagation of the excitation's quantization error due to the physical model's long time constant. This improved objective measures of the sound quality beyond those achieved by the similar WLPC design while maintaining the codec's advantages exposed by the subjective tests. Whereas the target plucks became noisy when coded at 1 bit per sample using WLPC, the allocation of quantization noise to higher harmonics in the second topology meant that the same plucks took on a drier, brighter timbre when coded at the same bit rate.

WLP can easily be performed in real time, and it could thus be applied to coding model excitations in both audio coders and real-time instrument synthesizers. Analysis of polyphonic scenes is still beyond the scope of the model, however, and the realization of highly polyphonic instruments would entail a corresponding increase in the computational demands of the WLP in the decoding of the excitation.

Future exploration of the two-stage physical model/WLP coding schemes should be investigated using more accurate physical models, such as the vertical/transverse string model mentioned in Section 1, which might allow the first topology investigated in this paper to realize coding gains. Implementation of more complicated models reintroduces, however, the difficulties of accurately parameterizing them, though this increased complexity is partially offset by the increased tolerance for error that the excitation coding allows.

ACKNOWLEDGMENTS

The authors would like to thank the Japanese Ministry of Education, Culture, Sports, Science and Technology for funding this research. They are also grateful to Professor Yoshikawa for his guidance throughout, and to the students of the Signal Processing Lab for their assistance, particularly in making the guitar recordings.

REFERENCES

[1] K. Karplus and A. Strong, "Digital synthesis of plucked-string and drum timbres," Computer Music Journal, vol. 7, no. 2, pp. 43–55, 1983.

[2] J. O. Smith, "Physical modeling using digital waveguides," Computer Music Journal, vol. 16, no. 4, pp. 74–91, 1992.

[3] C. Erkut, M. Karjalainen, P. Huang, and V. Valimaki, "Acoustical analysis and model-based sound synthesis of the kantele," Journal of the Acoustical Society of America, vol. 112, no. 4, pp. 1681–1691, 2002.

[4] V. Valimaki, M. Laurson, C. Erkut, and T. Tolonen, "Model-based synthesis of the clavichord," in Proc. International Computer Music Conference, pp. 50–53, Berlin, Germany, August–September 2000.

[5] V. Valimaki and T. Tolonen, "Development and calibration of a guitar synthesizer," Journal of the Audio Engineering Society, vol. 46, no. 9, pp. 766–778, 1998.

[6] M. Karjalainen, V. Valimaki, and T. Tolonen, "Plucked-string models: From the Karplus-Strong algorithm to digital waveguides and beyond," Computer Music Journal, vol. 22, no. 3, pp. 17–32, 1998.

[7] A. Cemgil and C. Erkut, "Calibration of physical models using artificial neural networks with application to plucked string instruments," in Proc. International Symposium on Musical Acoustics, Edinburgh, UK, August 1997.


[8] J. Riionheimo and V. Valimaki, "Parameter estimation of a plucked string synthesis model using a genetic algorithm with perceptual fitness calculation," EURASIP Journal on Applied Signal Processing, vol. 2003, no. 8, pp. 791–805, 2003.

[9] B. L. Vercoe, W. G. Gardner, and E. D. Scheirer, "Structured audio: Creation, transmission, and rendering of parametric sound representations," Proceedings of the IEEE, vol. 86, no. 5, pp. 922–940, 1998.

[10] E. D. Scheirer, "Structured audio, Kolmogorov complexity, and generalized audio coding," IEEE Transactions on Speech and Audio Processing, vol. 9, no. 8, pp. 914–931, 2001.

[11] M. Karjalainen, V. Valimaki, and Z. Janosy, "Towards high-quality sound synthesis of the guitar and string instruments," in Proc. International Computer Music Conference, pp. 56–63, Tokyo, Japan, 1993.

[12] A. Harma, Audio coding with warped predictive methods, Licentiate thesis, Helsinki University of Technology, Espoo, Finland, 1998.

[13] V. Valimaki and T. Tolonen, "Multirate extensions for model-based synthesis of plucked string instruments," in Proc. International Computer Music Conference, pp. 244–247, Thessaloniki, Greece, September 1997.

[14] D. Jaffe and J. O. Smith, "Extensions of the Karplus-Strong plucked-string algorithm," Computer Music Journal, vol. 7, no. 2, pp. 56–69, 1983.

[15] V. Valimaki, J. Huopaniemi, M. Karjalainen, and Z. Janosy, "Physical modeling of plucked string instruments with application to real-time sound synthesis," Journal of the Audio Engineering Society, vol. 44, no. 5, pp. 331–353, 1996.

[16] A. Harma, M. Karjalainen, L. Savioja, V. Valimaki, U. K. Laine, and J. Huopaniemi, "Frequency-warped signal processing for audio applications," Journal of the Audio Engineering Society, vol. 48, no. 11, pp. 1011–1031, 2000.

[17] K. Brandenburg and G. Stoll, "ISO/MPEG-audio codec: a generic standard for coding of high quality digital audio," Journal of the Audio Engineering Society, vol. 42, no. 10, pp. 780–791, 1994.

[18] K. Tsutsui, H. Suzuki, O. Shimoyoshi, M. Sonohara, K. Akagiri, and R. M. Heddle, "ATRAC: Adaptive transform acoustic coding for MiniDisc," reprinted from the 93rd Audio Engineering Society Convention, San Francisco, Calif, USA, 1992.

[19] J. O. Smith, "Efficient synthesis of stringed musical instruments," in Proc. International Computer Music Conference, pp. 64–71, Tokyo, Japan, September 1993.

Alexis Glass received his B.S.E.E. from Queen's University, Kingston, Ontario, Canada in 1998. During his bachelor's degree, he interned for nine months at Toshiba Semiconductor in Kawasaki, Japan. After graduating, he worked for a defense firm in Kanata, Ontario, and a videogame developer in Montreal, Quebec, before winning a Monbusho Scholarship from the Japanese government to pursue graduate studies at the Kyushu Institute of Design (KID, now Kyushu University, Graduate School of Design). In 2002, he received his Master of Design from KID and is currently a doctoral candidate there. His interests include sound, music signal processing, instrument modelling, and electronic music.

Kimitoshi Fukudome was born in Kagoshima, Japan in 1943. He received his B.E., M.E., and Dr.E. degrees from Kyushu University in 1966, 1968, and 1988, respectively. He joined Kyushu Institute of Design's Department of Acoustic Design as a Research Associate in 1971 and has been an Associate Professor there since 1990. With the October 1, 2003 integration of Kyushu Institute of Design into Kyushu University, his affiliation has changed to the Department of Acoustic Design, Faculty of Design, Kyushu University. His research interests include digital signal processing for 3D sound systems, binaural stereophony, engineering acoustics, and direction of arrival (DOA) estimation with sphere-baffled microphone arrays.