Objective and subjective comparison of electrodynamic and MAP...

13
Objective and subjective comparison of electrodynamic and MAP loudspeakers for Wave Field Synthesis Etienne Corteel 12 , Khoa-Van NGuyen 1 , Olivier Warusfel 1 , Terence Caulkins 1 , Renato Pellegrini 2 1 IRCAM, 1. Pl. Igor Stravinsky, 75004 Paris, France 2 sonic emotion, Eichweg 4, CH-8154 Oberglatt, Switzerland Correspondence should be addressed to Etienne Corteel ([email protected], [email protected]) ABSTRACT This article deals with the direct comparison of WFS rendering using either MAP or electrodynamic loud- speakers on an objective and a subjective level. Objective criteria are used to evaluate coloration and localisation cues that are perceived in an extended listening area. It is shown in a first listening test that the reduced spatial coherence of MAP loudspeakers partly explains the perceived differences between the two loudspeaker technologies. This ”diffuse” behavior can be artificially produced on electrodynamic loud- speakers using a diffuse filtering. The proposed diffuse filtering may also limit rendering artifacts above the spatial aliasing frequency. Finally, it is shown in a second listening experiment that MAP loudspeakers favor distance perception compared to electrodynamic loudspeakers. 0. INTRODUCTION Wave Field Synthesis (WFS) is a multichannel sound rendering technique that allows for the synthesis of phys- ical properties of sound fields within an extended listen- ing area [1]. It relies on a large number of closely spaced (typically 15-20 cm) loudspeakers (typically 15-20 cm) loudspeakers forming one or several linear an acoustic aperture through which the target sound field (as ema- nating from a target sound source) propagates into the listening environment. Practical implementation of WFS requires simplifica- tions to the underlying physical principles (Kirchhoff- Helmholtz and Rayleigh integrals). Real loudspeakers radiation characteristics may also contribute to alter the synthesized sound field compared to the target one. The perceptual impact of these inaccuracies relates to the more general problem of transparency of the sound ren- dering medium. Ideally, this medium should not be de- tected anywhere inside the listening area so as to create an illusion of non-mediation [2]. In this paper, we con- sider an extended definition of transparency that does not only include timbre but also spatial aspects such as local- ization cues (angular position, distance) as well as spa- tial impression (room-size, reverberance, ...). Moreover, transparency evaluation should account for non-acoustic cues [3] such as the feeling of wearing headphones or seeing loudspeakers. Two types of loudspeakers are used nowadays for Wave Fig. 1: MAP and electrodynamic loudspeakers Field Synthesis (see figure 1): array-mounted electrodynamic loudspeakers, AES 30 TH INTERNATIONAL CONFERENCE, Saariselk¨ a, Finland, 2007 MARCH 15–17 1

Transcript of Objective and subjective comparison of electrodynamic and MAP...

Page 1: Objective and subjective comparison of electrodynamic and MAP …articles.ircam.fr/textes/Corteel07b/index.pdf · 2007-07-23 · 1.1. Directivity Figures 2(a), 2(b), 2(c), and 2(d)

Objective and subjective comparison ofelectrodynamic and MAP loudspeakers forWave Field Synthesis

Etienne Corteel12, Khoa-Van NGuyen1, Olivier Warusfel1, Terence Caulkins1, Renato Pellegrini2

1IRCAM, 1. Pl. Igor Stravinsky, 75004 Paris, France

2sonic emotion, Eichweg 4, CH-8154 Oberglatt, Switzerland

Correspondence should be addressed to Etienne Corteel ([email protected],[email protected])

ABSTRACTThis article deals with the direct comparison of WFS rendering using either MAP or electrodynamic loud-speakers on an objective and a subjective level. Objective criteria are used to evaluate coloration andlocalisation cues that are perceived in an extended listening area. It is shown in a first listening test thatthe reduced spatial coherence of MAP loudspeakers partly explains the perceived differences between thetwo loudspeaker technologies. This ”diffuse” behavior can be artificially produced on electrodynamic loud-speakers using a diffuse filtering. The proposed diffuse filtering may also limit rendering artifacts above thespatial aliasing frequency. Finally, it is shown in a second listening experiment that MAP loudspeakers favordistance perception compared to electrodynamic loudspeakers.

0. INTRODUCTION

Wave Field Synthesis (WFS) is a multichannel soundrendering technique that allows for the synthesis of phys-ical properties of sound fields within an extended listen-ing area [1]. It relies on a large number of closely spaced(typically 15-20 cm) loudspeakers (typically 15-20 cm)loudspeakers forming one or several linear an acousticaperture through which the target sound field (as ema-nating from a target sound source) propagates into thelistening environment.Practical implementation of WFS requires simplifica-tions to the underlying physical principles (Kirchhoff-Helmholtz and Rayleigh integrals). Real loudspeakersradiation characteristics may also contribute to alter thesynthesized sound field compared to the target one. Theperceptual impact of these inaccuracies relates to themore general problem of transparency of the sound ren-dering medium. Ideally, this medium should not be de-tected anywhere inside the listening area so as to createan illusion of non-mediation [2]. In this paper, we con-sider an extended definition of transparency that does notonly include timbre but also spatial aspects such as local-ization cues (angular position, distance) as well as spa-

tial impression (room-size, reverberance, ...). Moreover,transparency evaluation should account for non-acousticcues [3] such as the feeling of wearing headphones orseeing loudspeakers.Two types of loudspeakers are used nowadays for Wave

Fig. 1: MAP and electrodynamic loudspeakers

Field Synthesis (see figure 1):

• array-mounted electrodynamic loudspeakers,

AES 30TH INTERNATIONAL CONFERENCE, Saariselka, Finland, 2007 MARCH 15–171

Page 2: Objective and subjective comparison of electrodynamic and MAP …articles.ircam.fr/textes/Corteel07b/index.pdf · 2007-07-23 · 1.1. Directivity Figures 2(a), 2(b), 2(c), and 2(d)

Corteel et al. Comparison of loudspeakers for WFS

• Multi-Actuator Panels (MAP).

MAP loudspeakers have been recently proposed as an al-ternative to electrodynamic loudspeakers for WFS [4].Thanks to their low visual profile, MAP loudspeakerswere originally thought as a way to facilitate the integra-tion of tens to hundreds of loudspeakers in an existingenvironment. Informal listening tests with researchersand composers, made at IRCAM and at the Forum NeuesMusiktheater in 2005, indicated that MAP loudspeakersmay improve transparency, especially in terms of dis-tance perception. It was thus decided to conduct sys-tematic objective and subjective studies to compare theacoustic properties of both loudspeaker types for WFSrendering.The goal of this paper is to compare, at an objective and asubjective level, the transparency of Wave Field Synthe-sis rendering using electrodynamic or MAP loudspeak-ers. The influence of non-acoustic factors such as visualcues are beyond the scope of this paper.In a first section, the radiation properties (directivity,spatial coherence [5]) of both electrodynamic and MAPloudspeakers are shown. Since MAP loudspeakers relyon Distributed Mode Loudspeaker (DML) technology,they exhibit a ”diffuse” behavior (reduced spatial coher-ence), especially at high frequency. In a second section,diffuse filtering for WFS is introduced. It is meant as away to replicate the diffuse properties of MAP on elec-trodynamic loudspeakers so as to validate their potentialbenefit on transparency. In a third section, simple ob-jective criteria are proposed. They account for acousticdimensions (coloration, localization cues) that may con-tribute to the transparency of the sound reproduction sys-tem within an extended listening area. This objectiveanalysis is finally completed with two subjective listen-ing experiments: on the discrimination of loudspeakerand filtering types in an ABX test, and on the perceiveddistance in a pair-comparison test.

1. LOUDSPEAKERS FOR WAVE FIELD SYN-THESISIn this section, the radiation properties of both types ofloudspeaker are described and compared considering asingle transducer. The electrodynamic loudspeaker ismanufactured by Kef, model KHT 2005. The MAP loud-speaker is manufactured by sonic emotion.

1.1. DirectivityFigures 2(a), 2(b), 2(c), and 2(d) display octave band di-rectivity polar plots of both electrodynamic and MAP

loudspeakers. Measurements were achieved in an ane-choic chamber using 48 regularly spaced microphone po-sitions (7.5◦ precision) at 1.4 m distance from the loud-speakers. In these plots, the levels are normalized in ref-erence to the frontal direction (0◦) so as to display onlydirectivity information. Figures 2(a) and 2(b) display po-lar plots at ”low frequencies” (125 Hz, 250 Hz, 500 Hz,1000 Hz). Figures 2(c) and 2(d) display polar plots at”high frequencies” (2 kHz, 4 kHz, 8 kHz, 16 kHz).MAP loudspeakers have more complex directivity prop-erties at low frequencies compared to electrodynamicloudspeakers. This can be problematic for WFS since thetheory assumes that loudspeakers have ideal omnidirec-tional behavior. Multichannel equalization may be usedto compensate for such non ideal loudspeakers directiv-ity [6] [7]. This method also reduces artifacts inherent toWFS at low frequencies (near-field artifacts, diffraction).At higher frequencies electrodynamic loudspeakers be-come more and more directive whereas MAP loudspeak-ers exhibit an irregular although more ”omnidirectional”behavior in average.

−40

−40

−30

−30

−20

−20

−10

−10

0

0

10 dB

10 dB

90o

60o

30o0o

−30o

−60o

−90o

−120o

−150o

180o150o

120o

125 Hz250 Hz500 Hz1 kHz

(a) Electrodynamic, low fre-quencies

−40

−40

−30

−30

−20

−20

−10

−10

0

0

10 dB

10 dB

90o

60o

30o0o

−30o

−60o

−90o

−120o

−150o

180o150o

120o

125 Hz250 Hz500 Hz1 kHz

(b) MAP, single exciter, lowfrequencies

−40

−40

−30

−30

−20

−20

−10

−10

0

0

10 dB

10 dB

90o

60o

30o0o

−30o

−60o

−90o

−120o

−150o

180o150o

120o

2 kHz4 kHz8 kHz16 kHz

(c) Electrodynamic, high fre-quencies

−40

−40

−30

−30

−20

−20

−10

−10

0

0

10 dB

10 dB

90o

60o

30o0o

−30o

−60o

−90o

−120o

−150o

180o150o

120o

2 kHz4 kHz8 kHz16 kHz

(d) MAP, single exciter, highfrequencies

Fig. 2: Loudspeaker radiation pattern

1.2. Spatial coherence

Figures 3(a), 3(b), 3(c), and 3(d) display octave bandCross-Correlation Function (CCF) polar plots of bothelectrodynamic and MAP loudspeakers. CCF polar plotswere introduced by Gontcharov and Hill in [5] to de-

AES 30TH INTERNATIONAL CONFERENCE, Saariselka, Finland, 2007 MARCH 15–17Page 2 of 13

Page 3: Objective and subjective comparison of electrodynamic and MAP …articles.ircam.fr/textes/Corteel07b/index.pdf · 2007-07-23 · 1.1. Directivity Figures 2(a), 2(b), 2(c), and 2(d)

Corteel et al. Comparison of loudspeakers for WFS

scribe diffuse radiation properties of Distributed ModeLoudspeakers (DML). They are obtained by extractingthe maximum of the cross-correlation function calculatedfrom the pair of band-pass filtered impulse responsesmeasured in the considered and the frontal (0◦) direction.Such a quantity thus provides an estimate of the ”spatialcorrelation” of the loudspeaker radiation.Figures 3(a) and 3(b) display polar plots at ”low frequen-cies” (125 Hz, 250 Hz, 500 Hz, 1000 Hz). Figures 3(c)and 3(d) display polar plots at ”high frequencies” (2 kHz,4 kHz, 8 kHz, 16 kHz).Both MAP and electrodynamic loudspeakers have highspatial correlation at 125 Hz and 250 Hz. For MAP loud-speakers, CCF values reduce above 500 Hz due to themore complex directivity characteristics of these loud-speakers and their ”diffuse” properties. This is poten-tially detrimental for WFS which relies on phase inter-ference of coherent wave fronts emitted by loudspeakers.

0.2

0.2

0.4

0.4

0.6

0.6

0.8

0.8

1

1

90o

60o

30o0o

−30o

−60o

−90o

−120o

−150o

180o150o

120o

125 Hz250 Hz500 Hz1 kHz

(a) Electrodynamic, low fre-quencies

0.2

0.2

0.4

0.4

0.6

0.6

0.8

0.8

1

1

90o

60o

30o0o

−30o

−60o

−90o

−120o

−150o

180o150o

120o

125 Hz250 Hz500 Hz1 kHz

(b) MAP, single exciter, lowfrequencies

0.2

0.2

0.4

0.4

0.6

0.6

0.8

0.8

1

1

90o

60o

30o0o

−30o

−60o

−90o

−120o

−150o

180o150o

120o

2 kHz4 kHz8 kHz16 kHz

(c) Electrodynamic, high fre-quencies

0.2

0.2

0.4

0.4

0.6

0.6

0.8

0.8

1

1

90o

60o

30o0o

−30o

−60o

−90o

−120o

−150o

180o150o

120o

2 kHz4 kHz8 kHz16 kHz

(d) MAP, single exciter, highfrequencies

Fig. 3: Loudspeaker spatial correlation pattern

2. DIFFUSE FILTERING AND ITS CONSE-QUENCES ON THE SPATIAL RESPONSE OFTHE LOUDSPEAKER ARRAYIn this section, diffuse filtering for WFS rendering isfirst introduced. The impact of diffuse filtering is thenevaluated by computing the spatial response of the loud-speaker array. In this section, ideal omnidirectional loud-speakers are used.

2.1. Diffuse filtering

Figures 4(a), 4(b) and 4(c) display impulse responses offilters that may be used for WFS rendering. The impulseresponse of figure 4(a) is a ”classical” WFS filter (a de-layed, possibly attenuated dirac pulse).Figure 4(b) displays the impulse response of a ”diffuse”filter that will be referred to as ”Full Diffuse” (FD). Thisfilter is generated from a time limited white noise thatis generated independently for each loudspeakers in or-der to obtain uncorrelated outputs. The temporal enve-lope of the noise is modified by applying a window witha sharp attack and a decay slope. The short length ofthe noise and the modification of its temporal structuremay introduce coloration artifacts. A whitening processis thus applied. It consists in adjusting the level in audi-tory (ERBN) frequency bands [8].A third type of filter is displayed in figure 4(c). It appearsas a combination of the ”discrete” and the ”diffuse” fil-ter and is therefore referred to as Discrete-Diffuse (DD)filter. This filter is designed in such a way that half ofthe energy is provided by the discrete part, the other halfby the diffuse part. This defines a more general class ofdiffuse filtering for which the total energy is divided intothe discrete and the diffuse part.

0 10 20−80

−60

−40

−20

0log response

Leve

l in

dB

Time in ms

(a) Discrete filter

0 10 20−80

−60

−40

−20

0log response

Leve

l in

dB

Time in ms

(b) Diffuse filter (FD, Full Dif-fuse)

0 10 20−80

−60

−40

−20

0log response

Leve

l in

dB

Time in ms

(c) Discrete-diffuse filter (DD)

Fig. 4: Various diffusing filters used at high frequencies

2.2. Spatial response

The spatial response of a loudspeaker array may be ob-

AES 30TH INTERNATIONAL CONFERENCE, Saariselka, Finland, 2007 MARCH 15–17Page 3 of 13

Page 4: Objective and subjective comparison of electrodynamic and MAP …articles.ircam.fr/textes/Corteel07b/index.pdf · 2007-07-23 · 1.1. Directivity Figures 2(a), 2(b), 2(c), and 2(d)

Corteel et al. Comparison of loudspeakers for WFS

−6 −4 −2 0 2 4 6

0

2

4

6

8

10

x position, in m

y po

sitio

n, in

m

Fig. 5: Test configuration, loudspeakers (black *), mi-crophones (red o), virtual source (blue .), top view.

tained by first measuring (simulating) each channel on alinear microphone array parallel to the loudspeaker array.The measured (or simulated) impulse responses are thenconvolved with the calculated filters for the synthesis ofa given source. The response of the loudspeaker arrayis finally obtained by summing the contributions of allloudspeakers at each measurement position.We consider here a linear array comprising 48 ideal om-nidirectional loudspeakers with 15 cm spacing (see fig-ure 5). The microphone array is composed of 96 idealomnidirectional microphones with 10 cm spacing. It issituated 2 m from the loudspeaker array. The target vir-tual source is centered and located 6 m behind the loud-speaker array (see figure 5). Filters are calculated at lowfrequencies using the previously mentioned multichan-nel equalization method [7].One of the main limitations of Wave Field Synthesis isknown as spatial aliasing. Spatial aliasing is due to thespatial sampling of the loudspeaker distribution so as tolimit the number of channels [9]. The correspondingNyquist frequency is referred to as the spatial aliasingfrequency. Below the spatial aliasing frequency, contri-butions from all loudspeakers of the loudspeaker arrayfuse into a single target wave front. This is not the caseabove the aliasing frequency and the sound field cannotbe controlled in an extended listening area.Above the aliasing frequency, the three type of filters areused (Discrete, Full-Diffuse FD, Discrete-Diffuse DD).The combination of multichannel equalization at low fre-quencies and discrete filter above the aliasing frequencyis normally used for multichannel equalization [7]. Thisfiltering process will therefore be referred to as ”MEQ”.

2.2.1. Impulse response

Figures 6(a), 6(b), and 6(c) display spatial temporal re-sponse of the loudspeaker array. It can be seen that it

(a) Discrete filter (b) Diffuse filter (FD, Full Dif-fuse)

(c) Discrete-diffuse filter (DD)

Fig. 6: Impulse responses, configuration displayed in fig-ure 5, dependency on diffuse filter characteristics

significantly differs from a delayed and attenuated diracpulse (ideal response). Above the spatial aliasing fre-quency, all loudspeaker contributions create a temporallyspread impulse response. However, it can be shown thatthe first wave front (attack of the response) remains con-sistent with the expected propagation time given the dis-tance between the virtual source and the measurementposition.For the MEQ filter, the length of the non-null temporalresponse (effective length) varies from 10 ms at centerpositions to almost 20 ms to the sides (see figure 6(a)).This is related to the finite length of the loudspeaker ar-ray [10] [6].For DD and FD filters the temporal response is longer(about 25 ms along asides) because of the temporallyspread response of the filters (see figures 6(b) and 6(c)).However, the attack of the DD filter response attack ap-pears somewhat sharper than that of the FD filter re-sponse because of its ”discrete” component.

2.2.2. Frequency response

Figures 7(a), 7(b), and 7(c) display the spatial frequencyresponse of the loudspeaker array for the three types offiltering. The influence of the absolute level at each po-sition has been removed by the applying a normalizationfactor. These responses therefore only display the devia-tion from the target level (0 dB).

AES 30TH INTERNATIONAL CONFERENCE, Saariselka, Finland, 2007 MARCH 15–17Page 4 of 13

Page 5: Objective and subjective comparison of electrodynamic and MAP …articles.ircam.fr/textes/Corteel07b/index.pdf · 2007-07-23 · 1.1. Directivity Figures 2(a), 2(b), 2(c), and 2(d)

Corteel et al. Comparison of loudspeakers for WFS

(a) Discrete filter

(b) Diffuse filter (FD, Full Dif-fuse)

(c) Diffuse-discrete filter (DD)

Fig. 7: Frequency responses, level in dB, color scale in-dicated in the color bar on top of each figure, configu-ration displayed in figure 5, dependency on diffuse filtercharacteristics

It can be seen that, at low frequencies, the response isalmost flat at all microphone positions with only smalloscillations that remain within± 0.5 dB. Above the alias-ing frequency spatial responses are more disturbed. Theyexhibit a ”fast varying” response along both frequencyand space axis. Above the aliasing frequency, interfer-ence patterns can easily be noticed for the MEQ filter(see figure 7(a)). These may be detected with listener’smovements within the listening area [11]. These patternsdo not appear with DD and FD filters (see figures 7(b)and 7(c)) since the diffuse filtering limits correlation ofthe loudspeakers’ contributions to the synthesized soundfield. In the case of the DD filter, only half of the en-ergy is ”diffuse” but this seems to be sufficient to avoidvisually noticeable patterns in the spatial frequency re-sponse. This could be verified in informal listening tests(cf. section 3.3.3).

3. OBJECTIVE COMPARISON

In this section, we propose an evaluation of the render-ing system transparency focusing on two acoustic dimen-sions: timbre and angular localisation. The evaluationrelies on objective criteria calculated from free-field spa-tial response of loudspeaker arrays. These objective cri-

teria are derived from psychoacoustical experiments andphysiological studies.

3.1. Test setup

−4 −2 0 2 40

2

4

6

8

10

12

X position, in m

Y p

ositi

on, i

n m

Left sourceRight source

2 m 3 m

Fig. 8: Test setup: loudspeakers (black *), microphones(red 0), source (blue .), top view

The loudspeaker array layout is identical to the 48 chan-nel system described in the previous section. Electro-dynamic or MAP loudspeakers are compared using thesame configuration. Both ”real” arrays are measured ina large room sufficiently far away from any reflectingsurface so as to window out any reflected contributionsand therefore extract the free-field radiation of each loud-speaker. A 24 each loudspeaker. A 24 channel, 2.4 mlong, microphone array is spanning 9.6 m along 2 linesparallel to the loudspeaker array at 2 and 3 m distance(see figure 8).Two test sources are studied in this experiment (figure8). 8). The ”left source” is positioned 3 m to the leftloudspeaker array. The ”right source” is positioned 3 mto the right and 8 m behind the loudspeaker array. Filtersare designed using the multichannel equalization methoddescribed in [6] and [7] based on the free-field radiationmeasurements at 2 m. Diffuse energy is introduced in thefilters only above the aliasing frequency.The spatial response of the loudspeaker arrays is thenestimated from the calculated filters convolved with thefree field measurements at 3 m distance. As a compar-ison, we will also consider simulated responses of anidentical array composed of ideal omnidirectional loud-speakers at the same distance.Given the test sources, the loudspeaker array geometryand the listening positions, one may expect an averagespatial aliasing frequency of about 1200 Hz. This cornerfrequency is used throughout the objective comparisonas a limit between ”low” frequency and ”high” frequencyanalysis.

AES 30TH INTERNATIONAL CONFERENCE, Saariselka, Finland, 2007 MARCH 15–17Page 5 of 13

Page 6: Objective and subjective comparison of electrodynamic and MAP …articles.ircam.fr/textes/Corteel07b/index.pdf · 2007-07-23 · 1.1. Directivity Figures 2(a), 2(b), 2(c), and 2(d)

Corteel et al. Comparison of loudspeakers for WFS

3.2. Estimation model

We propose to evaluate and compare the impulse re-sponse of the system hΨ(x,y, t) (HΨ(x,y, f ) in the fre-quency domain) with an ”ideal” WFS response aΨ(x,y, t)(AΨ(x,y, f ) in the frequency domain) corresponding toan infinite continuous linear array (i.e. no aliasing, nodiffraction) which is expressed as:

a(x,y, t) =

√dL

M +dLΨ

dLM

1dM

Ψδ

(t− dM

Ψc

+ τeq

), (1)

where dLM is the distance between the listening position

(x, y) and the loudspeaker array, dLΨ the distance of the

virtual source Ψ to the loudspeaker array, and dMΨ the

distance between the virtual source and the listening po-sition. This expression accounts for the propagation timeof waves emitted by the virtual source and the modifiedattenuation law due to the use of a linear loudspeaker ar-ray for WFS. The same formula defines the target soundfield in the multichannel equalization method [7]. The ydependency will be omitted for clarity since all responsesare evaluated at 3 m from the loudspeaker array.

A direct comparison between the cues associated to theideal (a(x, t)) and synthesized (h(x, t)) sound field mayallow to determine acoustic factors that influence thetransparency of the sound reproduction medium.Alternatively, a quality function qΨ(x, t) (QΨ(x, f ) in thefrequency domain) may be defined as:

QΨ(x, f ) =HΨ(x, f )AΨ(x, f )

(2)

Ideally, the time domain quality function should be adirac (constant level of 1 in the frequency domain). Itincorporates all deviations (time and frequency based)between the ideal and synthesized response, and gets ridof absolute level and propagation time.

3.3. Coloration

The coloration analysis employs criteria derived fromobjective/subjective studies based on deviations of thefrequency response. We study both the coloration whichmay be perceived at a fixed position and the spatial colorvariation while wandering in the sound installation.

3.3.1. Coloration at a fixed position

An objective criterion of coloration was recently pro-posed by Moore and Tan in [12]. This criterion estimatescoloration as the variation of excitation level (energy in

frequency bands) across ERBN bands in an impulse re-sponse measured at a given position. For the sake of sim-plicity, we extract excitation levels in ERBN band i fromthe quality function QΨ(x, f ) at position x as:

EQi(x) = 10log10

∫ c f (i+0.5)c f (i−0.5) |QΨ(x, f )|2d f

c f (i+0.5)− c f (i−0.5)

, (3)

where c f (i) is the center frequency of ERBN band i.A first factor D1 is defined as the standard deviation ofexcitation levels across ERBN bands:

D1(x) = σ(W (i)×EQi(x)), (4)

where W (i) are weights that are applied to account forthe limited importance of certain frequency bands in thecoloration estimation (below ∼ 100 Hz and above ∼ 10kHz).A second factor D2 is defined as the standard deviationof the difference between excitation levels in successiveERBN bands:

D2(x) = σ(W (i)× (EQi+1(x)−EQi(x))). (5)

D1 allows to observe general variations around the meandifference across frequency whereas D2 estimates spec-tral ripple density. The overall weighted excitation pat-tern difference D is used to measure the coloration intro-duced by an electroacoustical system. It is defined as aweighted combination of D1 and D2:

D(x) = w×D1(x)+(1−w)×D2(x), (6)

where w is set to 0.4 [12].

Figure 9 shows D values estimated from the loudspeakerarray response at 3 m for the synthesis of both sources.The upper left graph shows mean values of D calculatedover all frequency bands. The upper right graph showsstandard deviation of D across microphone positions. Itcan thus be seen that D values vary little across micro-phone positions. There is no significant difference be-tween DD and FD filtering. These type of filtering how-ever give lower D values than MEQ filtering. The small-est D values are obtained for ideal loudspeakers. For”real” transducers, they are lower for MAP than for elec-trodynamic loudspeakers when using MEQ filters but theopposite occurs for the DD and FD filters. However,these differences are very small (∼ 0.1 dB).Similar tendencies are observed for mean values of D

AES 30TH INTERNATIONAL CONFERENCE, Saariselka, Finland, 2007 MARCH 15–17Page 6 of 13

Page 7: Objective and subjective comparison of electrodynamic and MAP …articles.ircam.fr/textes/Corteel07b/index.pdf · 2007-07-23 · 1.1. Directivity Figures 2(a), 2(b), 2(c), and 2(d)

Corteel et al. Comparison of loudspeakers for WFS

Id Elec MAP0

2

4

D v

alue

, in

dB

Mean, all frequencies

Id Elec MAP0

2

4

D v

alue

, in

dB

Std, all frequencies

MEQFDDD

Id Elec MAP0

2

4

D v

alue

, in

dB

Mean, below 1200 Hz

Id Elec MAP0

2

4

D v

alue

, in

dBMean, above 1200 Hz

Fig. 9: D factor dependency on loudspeaker and filter-ing types, calculated from spatial responses obtained at 3m from the loudspeaker array, Id: ideal omnidirectionalloudspeakers, Elec: electrodynamic loudspeakers

obtained by selecting frequency bands above 1200 Hz(lower right part of figure 9). The lower left part of figure9 shows D values calculated using frequency bands be-low 1200 Hz. In this frequency range, ideal loudspeakershave very low D values. However, these values increasewith electrodynamic (D∼ 0.8 dB) and MAP loudspeak-ers (D∼ 1.4 dB). This is due to their non ideal radiationcharacteristics that can only be partly compensated forusing the multichannel equalization method [7].

3.3.2. Spatial color variation

Sound color variations may be experienced while mov-ing the head or deambulating within the sound installa-tion. This may happen in WFS installations, especiallyabove the aliasing frequency where frequency/spatialpatterns may appear (see figure 7(a)).De Bruijn introduced the Spatial Color Variation Index(SCVI) [11]. We propose here a modified criterion ex-pressed as:

SCV I(x) =1

2× J

J

∑j=−J

√I

∑i=1

(EQi(x)−EQi(x+ j∆x))2.

(7)We define J = 2 and ∆x = 10 cm. This criterion is basedon differences in ERBN band i between excitation levelsEQi(x) at a position x and excitation levels EQi(x+ j∆x)at 4 positions on each side of the position x. The finalcriterion is then obtained by computing the mean valuefor all frequency bands.

Id Elec MAP0

2

4

SC

VI v

alue

, in

dB

Mean, all frequencies

Id Elec MAP0

2

4

SC

VI v

alue

, in

dB

Std, all frequencies

MEQFDDD

Id Elec MAP0

2

4

SC

VI v

alue

, in

dB

Mean, below 1200 Hz

Id Elec MAP0

2

4

SC

VI v

alue

, in

dB

Mean, above 1200 Hz

Fig. 10: SCVI dependency on loudspeaker and filter-ing types, calculated from spatial responses obtained at 3m from the loudspeaker array, Id: ideal omnidirectionalloudspeakers, Elec: electrodynamic loudspeakers

Compared to de Bruijn’s definition, this criterion extractssymmetrical positions on each side of the average listen-ing position instead of relative positions in the same di-rection. It therefore avoids asymmetries of SCVI valuesalong the x axis that could be noticed with the originalcriterion [11]. We also use the quality function insteadof the original response of the loudspeaker array. Thisremoves the influence of absolute level in the calculationthat should not be accounted for in the calculation of acoloration index.

Figure 10 shows SCVI values estimated from the theloudspeaker array response at 3 m for both sources. fig-ure the same as that of figure 9. We also observe similartendencies:

• little or no position dependency,

• values are almost null at low frequencies and higher(SCVI ∼ 3 dB) above 1200 Hz,

• diffusion reduces values of SCVI above 1200 Hzbut there is no significant difference between DDand FD,

3.3.3. Discussion

The D and SCVI factors tend to show that there is a pos-itive impact of diffusion on coloration above the aliasing

AES 30TH INTERNATIONAL CONFERENCE, Saariselka, Finland, 2007 MARCH 15–17Page 7 of 13

Page 8: Objective and subjective comparison of electrodynamic and MAP …articles.ircam.fr/textes/Corteel07b/index.pdf · 2007-07-23 · 1.1. Directivity Figures 2(a), 2(b), 2(c), and 2(d)

Corteel et al. Comparison of loudspeakers for WFS

frequency. Diffusion may be introduced by natural prop-erties of MAP loudspeakers or diffuse filtering. Only acertain amount of diffusion seems to be necessary. Usingthe FD filter which contains only ”diffuse” energy doesnot decrease D or SCVI values compared to the DD fil-ter.However, the employed criteria are still questionablesince they do not account for time and only partly forspace (SVCI). Time related factors may be particularlyimportant for WFS since the ”effective length” of im-pulse responses may approach 10 to even 20 ms. In-formal listening tests showed that using the FD filter onelectrodynamic loudspeakers provides clearly noticeablecoloration artifacts at high frequencies which tend to dis-appear almost completely when using DD filtering.Similarly, above the aliasing frequency, the frequency re-sponse varies significantly given small changes of micro-phone position (5 to 10 cm)[13]. Audible and possiblydisturbing coloration changes are experienced with MEQfiltering of the electrodynamic array while wandering inthe sound installation if pink noise is used as an inputsignal. These coloration changes are reduced using thesame type of filter on MAP loudspeakers and become al-most inaudible using the DD filter on electrodynamic orMAP loudspeakers.

3.4. Spatial cues

The analysis proposed here considers a deviation estima-tion of time-related localisation cues. It is not intended asa localization model. Its role is rather to provide an eval-uation of the deviation from ”ideal” localization cues.The evaluation is performed on the free field measure-ments using omnidirectional microphones. 94 pairs ofmicrophones with 20 cm spacing are extracted fromthe microphone array. This corresponds to a simplifiedmodel of the localization cues provided to a listener fac-ing the loudspeaker array.Temporal localization cues are then estimated in ERBNfrequency bands from the loudspeaker array response oneach microphone pair. The normalized interaural cross-correlation function at position x in frequency band i(ICFi(x)) is calculated as:

ICFi(x,τ) =∫ t0

0 hi(x+∆x, t)hi(x−∆x, t + τ)dt√∫ t00 [hi(x+∆x, t)]2dt

∫ t00 [hi(x−∆x, t)]2dt

(8)where ∆x = 10 cm and hi(x, t) is the band-pass filtered re-sponse in band i of the loudspeaker array response h(x, t)

using Gammatone filters [14]. In total 24 frequencychannels are used to cover the audible range.The Interaural Correlation coefficient ICi(x) can be ex-tracted in frequency band i as:

ICi(x) = maxτ

ICFi(x,τ). (9)

The corresponding Interaural Time Difference (ITD) infrequency band i is then defined as:

IT Di(x) = {τ | ICFi(x,τ) = ICi(x)} (10)

The latter is the only known human localization cue forlow and mid frequencies (below 500 Hz) [15]. It is ex-tracted from phase comparison between signals enteringthe left and the right ears. Above about 1200 Hz, ICi(x)and IT Di(x) are estimated from the signal envelope toaccount for fiber transduction in the auditory system. Aclassical way to extract the signal envelope and mimicthe auditory system is to use half wave rectification andlow-pass filtering (below 800 Hz) [15].

3.4.1. Interaural correlation

Figure 11 displays interaural correlation values belowand above 1200 Hz (mean aliasing frequency). Mean val-ues are computed from frequency bands below and above1200 Hz for both virtual sources and all microphonepairs (upper left and right part of figure 11). Standarddeviation values are computed across frequency and av-eraged for both virtual sources and all microphone pairs(lower left and right part of figure 11). Values are dis-played for the 3 types of filtering and loudspeaker.It can be seen that, as expected, the interaural correlationis very high below 1200 Hz (IC∼ 1) independently of theconsidered frequency band (std ∼ 0). Above the aliasingfrequency, IC values are still high. They are even closeto 1 for the MEQ filter (IC ∼ 0.95) and remain around0.9 for the DD filter and 0.85 for the FD filter. Moreover,they vary little across frequency (std ∼ 0.05).It should be noted that mean IC values are similar for allloudspeaker types given a filtering type. Nonetheless itcan be noted that for the MEQ filter, IC values increasefrom MAP (IC=0.94) to electrodynamic (IC = 0.95) toideal (IC = 0.96).

3.4.2. Interaural time difference

Figure 12 displays ITD errors calculated below andabove the corner frequency 1200 Hz. The ITD error iscomputed by subtracting ITD values estimated off of theideal spatial response from ITD values estimated off of

AES 30TH INTERNATIONAL CONFERENCE, Saariselka, Finland, 2007 MARCH 15–17Page 8 of 13

Page 9: Objective and subjective comparison of electrodynamic and MAP …articles.ircam.fr/textes/Corteel07b/index.pdf · 2007-07-23 · 1.1. Directivity Figures 2(a), 2(b), 2(c), and 2(d)

Corteel et al. Comparison of loudspeakers for WFS

Id Elec MAP0

0.2

0.4

0.6

0.8

1

IAC

C

Mean, below 1200 Hz

Id Elec MAP0

0.2

0.4

0.6

0.8

1

IAC

C

Std, below 1200 Hz

MEQFDDD

Id Elec MAP0

0.2

0.4

0.6

0.8

1

IAC

C

Mean, above 1200 Hz

Id Elec MAP0

0.2

0.4

0.6

0.8

1

IAC

C

Std, above 1200 Hz

Fig. 11: IACC below and above 1200 Hz, Id: ideal om-nidirectional loudspeakers, Elec: electrodynamic loud-speakers

the response being considered. The definition of ”mean”and ”std” in figure 12 are the same as in the previous part.It can be seen that for both frequency regions (belowand above 1200 Hz), the mean ITD error remains closeto zero independently of filtering and loudspeaker type.The standard deviation value of ITD error at low frequen-cies is also close to 0. However, this value significantlydiffer from 0 above 1200 Hz. The standard deviationdepend on loudspeaker and filtering type. Lower valuesof standard deviation are obtained for ideal omnidirec-tional loudspeakers (MEQ: 0.14 ms, DD: 0.24 ms) thanfor electrodynamic (MEQ: 0.2 ms, DD: 0.26 ms) andMAP loudspeakers (MEQ: 0.21 ms, DD: 0.28 ms).

3.4.3. Discussion

This analysis shows that, despite the complexity of thesynthesized sound field above the aliasing frequency,the provided time-based localization cues remain at leastpartly consistent with target ones. IC values are high andmean ITD error is almost null. This can be attributedto the fact that the first peak of the impulse responsewhich corresponds the loudspeaker located in the direc-tion of the virtual source generates the first peak of theimpulse response (verified at all listening positions if thevirtual source is situated behind the loudspeaker array).For virtual sources within the listening environment (fo-cused sources), the situation is different. In this case, thefirst loudspeaker contribution may come from the side ofthe loudspeaker array though at a fairly low level. More

Id Elec MAP−0.5

−0.25

0

0.25

0.5

itd e

rror

, in

ms

Mean, below 1200 Hz

Id Elec MAP0

0.2

0.4

0.6

itd e

rror

, in

ms

Std, below 1200 Hz

MEQFDDD

Id Elec MAP−0.5

−0.25

0

0.25

0.5

itd e

rror

, in

ms

Mean, above 1200 Hz

Id Elec MAP0

0.2

0.4

0.6

itd e

rror

, in

ms

Std, above 1200 Hz

Fig. 12: ITD error below and above 1200 Hz, Id:ideal omnidirectional loudspeakers, Elec: electrody-namic loudspeakers

studies are required to evaluate these types of sources.The perceptual consequences of the variability of the es-timated ITD at high frequencies have yet to be evalu-ated. They should be accompanied by an evaluation ofInteraural Level Difference (ILD) in extended area us-ing dummy head measurements. Early experiments wereachieved in a reduced number of frequency bands in [6].They show potential deviations of ILD for lateral listen-ing positions. However, such inaccuracies may not be animpediment to localization of broadband signals exhibit-ing strong low frequency content. After Wightman andKistler [16], low frequency ITD may dominate localiza-tion estimation for such signal. This is consistent withsound localization experiments conducted by Start. Inhis PhD [9], he showed a good match between localiza-tion performances for real and WFS reproduced sourcesusing broadband white noise. In the same run of exper-iments, Start used high-pass filtered white noise (above1200 Hz). He showed an increased mean error (5◦ in-stead of 2.5◦) and standard deviation (3◦ instead of 1.5◦).This error remains however limited and may be assimi-lated to an increased Auditory Source Width (ASW).ASW is classically estimated using IC values. For a re-view, the reader is referred to [17]. In the present study,the observed deviation of IC from 1 within frequencybands is very limited considering the MEQ filter (IC ∼0.95) and appears only at high frequencies (above 1200Hz). Such values are close to the Just Noticeable Dif-ference (JND) and little is known about strictly high fre-

AES 30TH INTERNATIONAL CONFERENCE, Saariselka, Finland, 2007 MARCH 15–17Page 9 of 13

Page 10: Objective and subjective comparison of electrodynamic and MAP …articles.ircam.fr/textes/Corteel07b/index.pdf · 2007-07-23 · 1.1. Directivity Figures 2(a), 2(b), 2(c), and 2(d)

Corteel et al. Comparison of loudspeakers for WFS

quency variations of IC. More variations can be noticedat high frequencies on the estimated ITD. These may alsopoint to an enhanced ASW. Experiments on similar crite-ria were recently achieved by Hirvonen and Pulkki [18].They divided a signal into frequency bands that were pre-sented using horizontally distributed loudspeakers in ananechoic environment. These test signals elicited differ-ent ITD values in each frequency band with strong in-teraural correlation. However, their study is limited tolow frequencies (below 1200 Hz) and would need to beextended to higher frequencies.

4. SUBJECTIVE COMPARISON

In this section, we present two subjective experimentsthat were carried out during August 2006 at IRCAM. Thefirst experiment is a basic discrimination test on loud-speaker and filtering type. The second experiment is apair comparison test on perceived distance which wasquoted by the subjects as a first order attribute duringthe discrimination test.

4.1. Test setup

Fig. 13: Test setup

The test setup used for the listening test is composed oftwo arrays on top of each other (MAPs above, electrody-namic below, see figure 13). This arrangement is chosenso as to minimize potential discrimination of loudspeak-ers with height difference. They are positioned in theEspace de Projection, a 22(l)×15(w)×11(h) m3 variableacoustic concert hall at IRCAM. All surfaces (periactes)are set to absorptive (walls and ceiling). The obtained re-verberation time is therefore below 1 s at all frequencies.The choice of a 4.5 m distance between the subject andthe loudspeakers is a tradeoff. On one hand, the distanceshould be sufficiently small distance so as to limit the in-fluence of the room effect. On the other hand, it shouldbe sufficiently large so as to avoid discrimination based

on elevation difference. The apparent elevation differ-ence of both loudspeaker arrays is 5 degrees which isbelow the localization blur for frontal position reportedby Blauert in [15] (±9◦).Subjects are not centered but positioned 1.5 m to the left.An acoustically transparent curtain is placed between thesubject and the loudspeaker array. To further limit visualfeedback, all lights are dimmed. The main light sourceis the computer screen situated in front of the listener,approximately 40 cm below the ear level.

4.2. Discrimination test

4.2.1. Test protocol

The subjective discrimination between loudspeakersand filtering types was evaluated using an ABX test.All 6 combinations of loudspeaker (E: electrodynamic,M:MAPs) and filtering types (MEQ, DD, FD) were usedforming 15 comparison pairs. 60 triplets were thenformed using 4 replicates according to the ABX test pro-tocol (ABB, ABA, BAB, BAA).Each triplet was presented at the two test virtual sourcepositions depicted in figure 8 using two different soundmaterials (voice and guitar). The total of 240 triplets (60”ABX” triplets × 2 positions × 2 sound materials) werepresented in random order. The triplets were given onlyonce, no repetition was possible. The presentation of onetriplet took approximately 10 s (three times about 2.5 s,1 s between configurations).The 14 subjects completed the test in 1 hour on average.They had an initial training session with a maximum of30 stimuli. Subjects were free to quit this training sessionbefore end.

4.2.2. Results

Figures 14(a) and 14(b) show mean discrimination ratesfor all stimuli pairs. Figure 14(a) shows within loud-speaker type results for different filtering. It can gen-erally be seen that the discrimination rate is only slightlythough significantly above chance level and that it isgenerally higher for electrodynamic than for MAP loud-speakers (about 5% higher). As expected, discriminationrate is higher for the pair ”MEQ versus FD” than for theother two combinations.

Figure 14(b) shows discrimination rates for pairs hav-ing both different loudspeaker and filtering types. It canbe seen that MAP and electrodynamic loudspeakers areclearly discriminated (80 to 95 % discrimination rate).Highest discrimination rates are obtained for MAP withFD filtering versus electrodynamic loudspeakers with

AES 30TH INTERNATIONAL CONFERENCE, Saariselka, Finland, 2007 MARCH 15–17Page 10 of 13

Page 11: Objective and subjective comparison of electrodynamic and MAP …articles.ircam.fr/textes/Corteel07b/index.pdf · 2007-07-23 · 1.1. Directivity Figures 2(a), 2(b), 2(c), and 2(d)

Corteel et al. Comparison of loudspeakers for WFS

MEQvsDD DDvsFD MEQvsFD50

60

70

80

90

100

disc

rimin

atio

n ra

te, i

n %

ElecMAP

(a) Filtering dependency (MEQversus DD, FD versus DD,MEQ versus DD)

MMEQ MDD MFD50

60

70

80

90

100

disc

rimin

atio

n ra

te, i

n %

EMEQEDDEFD

(b) Filtering loudspeaker de-pendence

Fig. 14: ABX test discrimination rates (E: electrody-namic, M:MAP)

MEQ filtering. More generally, this configuration ex-hibits lowest discrimination rates against other filteringtypes on MAP loudspeakers.A single way ANOVA did not show a significant effect ofthe sound material (p ∼ 0.2) neither did it for the sourceposition (p ∼ 0.6).

−0.2 −0.1 0 0.1 0.2 0.3

−0.2

−0.1

0

0.1

0.2MDS; confidence index = 2.2499e−006

EMeq

EDD

EFD

MMeq

MDD

MFD

Fig. 15: MDS analysis of results for ABX test

A multidimensional scaling analysis was performed onthe results using mdscale function of Matlab. Figure15 displays the obtained space. The axes clearly de-fine loudspeaker type (x axis) and filtering type (y axis).Stimuli are more spread along the loudspeaker axis thanalong the filtering axis, which is consistent with the re-sults displayed in figures 14(a) and 14(b). The ”EFD”stimulus (electrodynamic loudspeakers with FD filter-ing) appears slightly isolated since it always presentshigh discrimination rates.

4.2.3. Discussion

The general outcome of this test is that electrodynamicand MAP loudspeakers are well discriminated. A mod-erate amount of diffusion (DD filter) on electrodynamicloudspeakers reduces discrimination rate when com-pared to MAP loudspeakers which however remains

above 80 %.After the listening test, the subjects were asked to rank(from 1 to 5, most important to less important) an en-semble of five perceptual attributes to discriminate stim-uli within the triplets. Average ranks given by subjects tothese perceptual were computed. According to them, themost important attribute is distance (ranked 2.2 on av-erage), then comes elevation (2.6) and coloration (2.9).The two last attributes (azimuth: 4 and ASW: 4.2) areranked low. This may be due to the fact that the em-ployed sound material are broadband. More experimentsmay be required on these specific attributes, possibly us-ing more critical sound material (eg. high-pass filterednoise).Even though the elevation difference is below 5◦, thesubjects were still discriminating the vertical positionsof the loudspeaker arrays. This artificially increases theobtained discrimination rates between both loudspeakertypes. Concerning coloration, the subjects may haveused both coloration differences between loudspeakertypes and coloration introduced by FD filter which mayappear as severe.The importance of distance confirmed impressions of theauthors during early informal listening session. It wasthus decided to conduct an additional listening test onthis particular attribute.

4.3. Distance judgement

4.3.1. Test protocol

A pair comparison direct scaling method is used for thetest on distance perception. Two successive stimuli arepresented to the subjects. Their task is to indicate on acontinuous scale if the second stimulus is closer, at thesame distance, further than the first stimulus.In this test, the FD filter was discarded since it was foundto exhibit too much coloration that could be disturbingfor subjects. This forms 4 stimuli and 6 pairs which arepresented in both orders (12 and 21). The sound stimuliare rendered on both source positions.A virtual room processor is used in order to elicit threelevels of distance (close: no additional room effect, middistance, far distance) [19]. The room effect is renderedusing three virtual loudspeakers on the WFS array and6 side and rear loudspeakers. The goal of this is to ver-ify that the differences in distance perception betweenloudspeaker and/or filtering types remains consistent forvarious elicited distances. Only the guitar sound mater-ial was chosen since it was used in [19] to validate thevirtual room model on a binaural setup.

AES 30TH INTERNATIONAL CONFERENCE, Saariselka, Finland, 2007 MARCH 15–17Page 11 of 13

Page 12: Objective and subjective comparison of electrodynamic and MAP …articles.ircam.fr/textes/Corteel07b/index.pdf · 2007-07-23 · 1.1. Directivity Figures 2(a), 2(b), 2(c), and 2(d)

Corteel et al. Comparison of loudspeakers for WFS

Md_Mm Ed_Em Mm_Ed Md_Ed Mm_Em Md_Em0

10

20

30

40

50

60

70

80

90

100

Occ

urre

nces

, in

%

GreaterSameSmaller

Fig. 16: Results for distance test. Occurrences of re-sponses indicating greater, same, or smaller distance ofstimulus 1 versus stimulus 2 (Em: electrodynamic MEQfiltering, Ed: electrodynamic DD filtering, Mm: MAPMEQ filtering, Md: MAP DD filtering).

A total of 72 pairs are presented only once in randomorder with no possible repetition. 11 subjects completedthe test in an average of 20 minutes.

4.3.2. Results

The results of the test are simply analyzed by extractingthe number of occurrences indicating a greater, smaller,or same distance for each of the 6 pairs. These are pre-sented in figure 16. MAP loudspeakers are shown to sig-nificantly increase perceived distance compared to elec-trodynamic loudspeakers. Diffusion has a similar but lesspronounced effect, especially considering the electrody-namic loudspeakers.Single way analysis of variance did not show a signifi-cant influence of either pair order (p > 0.9) or eliciteddistance with virtual room effect (p > 0.7). Only a looseinfluence of source position can be noticed (p ∼ 0.2).

4.3.3. Discussion

The perception of distance is classically linked to level,direct to reverberant energy ratio [19]. The more di-rective characteristics of electrodynamic loudspeakers athigh frequencies may limit the interaction of the loud-speaker system with the listening room. A more com-plete analysis of produced room effect is thus necessary.In room measurements have been carried out but theiranalysis is beyond the scope of this article.Another aspect concerns diffusion which seems toslightly increase the perceived distance. The objectivedata presented in section 3.4 show that diffusion reduces

the precision of localization cues which may be inter-preted as a factor increasing distance. However, this isnoticeable only at high frequencies. More studies wouldbe required on this aspect.

5. CONCLUSION

In this article, perceptual aspects of Wave Field Synthesisrendering using electrodynamic compared to MAP loud-speakers are investigated. It is hypothesized that the re-duced spatial coherence that is observed with MAP loud-speakers may enhance the transparency of sound repro-duction. Diffuse filtering for Wave Field Synthesis isthen introduced so as to mimic the behavior of MAP onelectrodynamic loudspeakers and potentially limit per-ceptual artifacts above the aliasing frequency. Objec-tive criteria are used to evaluate perceptual dimensionslinked to reproduction transparency (coloration and lo-calisation cues). Diffusion is shown to potentially reducecoloration and sound color variation. Above the aliasingfrequency, it is also shown that rather consistent localisa-tion cues are available although the synthesized impulseresponse of the loudspeaker array is complex. Two sub-jective experiments showed that diffusion may accountfor some but not all differences between MAP and elec-trodynamic loudspeakers for WFS rendering. In partic-ular, more studies are required to explain the increasedperceived distance using MAP compared to electrody-namic loudspeakers.

6. REFERENCES

[1] A. J. Berkhout, D. de Vries, and P. Vogel. Acousticcontrol by wave field synthesis. Journal ofthe Acoustical Society of America, 93:2764–2778,1993.

[2] M. Lombard and T. Ditton. At the heart of it all:The concept of presence. Journal of Computer-Mediated Communication, 3(2), September 1997.

[3] D. Begault. Auditory and non-auditory factors thatpotentially influence virtual acoustic imagery. InProc. AES 16th Int. Conf. on Spatial Sound Repro-duction, pages 13–26, Rovaniemi, Finland, 1999.

[4] M. M. Boone. Multi-actuator panels (maps) asloudspeaker arrays for wave field synthesis. Jour-nal of the Audio Engineering Society, 52(7/8):712–723, July/August 2004.

AES 30TH INTERNATIONAL CONFERENCE, Saariselka, Finland, 2007 MARCH 15–17Page 12 of 13

Page 13: Objective and subjective comparison of electrodynamic and MAP …articles.ircam.fr/textes/Corteel07b/index.pdf · 2007-07-23 · 1.1. Directivity Figures 2(a), 2(b), 2(c), and 2(d)

Corteel et al. Comparison of loudspeakers for WFS

[5] V. Gontcharov and N. Hill. Diffusivity propertiesof distributed mode loudspeakers. In 108th Con-vention of the Audio Engineering Society, February2000.

[6] E. Corteel. Caractrisation et Extensionsde la Wave Field Synthesis en conditionsrelles d’coute. PhD thesis, Universit deParis VI, Paris, France, 2004. available athttp://mediatheque.ircam.fr/articles/textes/Corteel04a/.

[7] E. Corteel. Equalization in extended area usingmultichannel inversion and wave field synthesis.accepted for publication in Journal of the AudioEngineering Society, 2006.

[8] B. J. C. Moore. An Introduction to the Psychologyof Hearing, 5th ed. Academic Press, San Diego,CA, 2003.

[9] E. W. Start. Direct Sound Enhancement by WaveField Synthesis. PhD thesis, TU Delft, Delft, PaysBas, 1997.

[10] E. Corteel. On the use of irregularly spaced loud-speaker arrays for wave field synthesis, potentialimpact on spatial aliasing frequency. In 9th Int.Conference on Digital Audio Effects (DAFx-06),Montral, 2006.

[11] W. de Bruijn. Application of Wave Field Synthesisin Videoconferencing. PhD thesis, TU Delft, Delft,Pays Bas, 2004.

[12] B. J. C. Moore and C. T. Tan. Development andvalidation of a method for predicting the perceivednaturalness of sounds subjected to spectral distor-tion. Journal of the Audio Engineering Society,52(9):900–914, September 2004.

[13] E. Corteel, U. Horbach, and R. S. Pellegrini. Mul-tichannel inverse filtering of multiexciter distrib-uted mode loudspeaker for wave field synthesis. In112th Convention of the Audio Engineering Soci-ety, Munich, Allemagne, May 2002. Preprint Num-ber 5611.

[14] M. Slaney. An efficient implementation of thepatterson-holdsworth filter bank. Apple ComputerInc. - Technical Report number 35, 1993.

[15] J. Blauert. Spatial Hearing, The Psychophysics ofHuman Sound Localization. MIT Press, 1999.

[16] F. L. Wightman and D. J. Kistler. The dominantrole of low-frequency interaural time differences insound localization. The Journal of the AcousticalSociety of America, 91(3):1648–1641, March 1992.

[17] R. Mason, T. Brooks, and F. Rumsey. Fre-quency dependency of the relationship betweenperceived auditory source width and the interau-ral cross-correlation coefficient for time-invariantstimuli. Journal of the Acoustical Society of Amer-ica, 117(3):1337–1350, March 2005.

[18] T. Hirvonen and V. Pulkki. Perception and analy-sis of selected auditory events with frequency-dependent directions. Journal of the Audio Engi-neering Society, 54(9):803–814, September 2006.

[19] R. Pellegrini and U. Horbach. Perception-based de-sign of virtual rooms for sound reproduction. In112th convention of the Audio Engineering Society,Munich, Germany, May 2002.

AES 30TH INTERNATIONAL CONFERENCE, Saariselka, Finland, 2007 MARCH 15–17Page 13 of 13