Virtual, Synthetic, and Entertainment Audio

804 J. Audio Eng. Soc., Vol. 50, No. 10, 2002 October Virtual, Synthetic, and Entertainment Audio


AES 22nd International Conference, Espoo, Finland, June 15-17, 2002

The south of Finland in the middle of summer is quite different from north-of-the-Arctic-Circle Rovaniemi, the site of the AES 16th International Conference, in April 1999. During the AES 22nd International Conference in Espoo in June there were warm sea breezes off the Gulf of Finland instead of snowdrifts; instead of huskies and snowmobiles there were sailboats and catamarans nearby. The conference organizing team, however, undeterred by their earlier experience of running an AES conference, were familiar faces drawn chiefly from the AES Finnish Section.

[Photo: Jyri Huopaniemi (left) and Nick Zacharov, conference cochairs]

The HUT (Helsinki University of Technology) acoustics laboratory of Professor Matti Karjalainen, technical chair of the conference, continues to spawn enthusiastic and high-quality researchers, not least among these the two energetic conference cochairs, Jyri Huopaniemi and Nick Zacharov. They created a highly topical theme for this event, aiming to bring together scientists working in the related fields of virtual, synthetic, and entertainment audio for a peer-reviewed conference of exceptionally good quality. Unlike the organizers of most AES conferences, they decided to require authors to submit full papers instead of just abstracts. Papers Chair Vesa Välimäki arranged for anonymous reviewers to read and select the most suitable, offer comments, and suggest revisions before publication. This, they believed, would be a key factor in increasing the quality of the conference, and by all accounts this bold move was rewarded. The proceedings contain 47 papers selected from 57 submissions, including four invited papers from key experts in the field: Jens Blauert, Jürgen Herre, Xavier Rodet, and Peter Svensson.

Held in the futuristic-looking computer science building at HUT in Espoo, a major city just across the water from Helsinki, the conference ran June 15–17 and consisted of paper sessions, posters, demonstrations, and social events. Additional support was provided by sponsors Bang & Olufsen, Genelec, HUT, Nokia, and Yamaha (F-Music). The conference provided ample opportunities for delegates from 23 countries to benefit from each other’s knowledge and expertise. The theme of the conference emphasized vital new areas of research concerned with audio creation and processing in the artificial rather than the natural sound domain. Here one could see the future of the audio industry leaping ahead before one’s eyes into virtual worlds, augmented reality, synthetic 3-D environments, and new conjunctions of art, cognitive science, and engineering.

OPENING DINNER
On the Friday evening before the start of the conference, delegates were treated to an excellent buffet dinner at Nokia Research Center in Helsinki. The dinner followed an introduction to the conference during which a novel musical instrument, the electric kantele, was described and played. This traditional Finnish stringed instrument has been electrified as part of a national project to promote and preserve it. The inventor of the electric kantele and his young son performed on different instruments to show how the kantele can be adapted to rock music as well as traditional music.

VIRTUAL AND AUGMENTED REALITY
Virtual worlds require virtual acoustics, and the ways in which computers can be used to model virtual acoustic spaces were admirably explained in an invited paper by Peter Svensson and Ulf Kristiansen of the Norwegian University of Science and Technology. Svensson explained how different methods can be used to simulate reflections from room surfaces of different types, outlining the relative merits of these as well as approaches to auralization, the creation of a rendered audible example of sound in the modeled space.

Also in this session Olivier Delerue from IRCAM in France described work undertaken in a European project called Listen, concerned with the artificial enhancement of natural auditory environments. He explained how virtual sound scenes can be used in museum spaces to modify the experience of the visitors by tracking their positions and rendering binaural signals, such as audio enhancements of paintings, to their earphones. A listener can approach a painting and hear related auditory stimuli that appear to be coming from the painting.

Another example of augmented reality, in which a form of hybrid sound reproduction integrates sound reproduced over loudspeakers with that reproduced through earphones, was described by Christian Müller-Tomfelde of Fraunhofer. This produces a public signal heard by everyone in the environment and a private signal heard only by the individual wearing headphones. Natural localization, for example, could be augmented by artificial room impression. Miniature hearing-aid-style earphones were discussed as a future practical tool that might be used to reproduce the required signals, coupled wirelessly by something like Bluetooth to a mobile computing device as a means of avoiding the need for obtrusive wired headphones. The concept of a science-fiction-like auditory implant and receiver can be envisaged as a step further from this, by which natural listening could be coupled with artificial reproduction of remotely transmitted signals.

SOUND SYNTHESIS
Xavier Rodet of IRCAM is a name well known to experts and novices alike in the field of sound synthesis. Among the highlights of this session, his invited paper “Present State and Future Challenges of Synthesis and Processing of the Singing Voice” provided an illuminating review of the field with entertaining examples of different approaches used over the years. Showing examples of both solo voice and choral singing synthesis, Rodet explained the need for a complex interaction between science, musical creativity, and engineering, a theme that was to recur many times throughout the conference both in informal discussions and formal presentations. Naturalness of the synthetic creation was held to be an important goal by many, although some challenged the need for artificial creations to be natural, proposing that perhaps some sounds are only perceived to be unnatural because we are unfamiliar with them.

“Modeling Bill’s Gait: Analysis and Parametric Synthesis of Walking Sounds” was the tongue-in-cheek title of the presentation by another of the well-known figures in music synthesis, Perry Cook of Princeton University. In a rapid-fire and captivating talk Cook showed how a parametric synthesis approach called PhOLIE (Physically Oriented Library of Interactive Effects) can be used to model walking sounds like footsteps on gravel, for use in movie sound effects. He developed a novel, hand-held control box interfaced to his laptop computer that enabled various parameters of the synthesizer to be adjusted so as to alter the nature of the crunching sounds resulting from the statistical modeling of virtual particles rattling around in a virtual compartment or against each other. Sounds such as that of maracas can also be created by shaking the device because it contains an accelerometer to detect motion. He explained the value of the approach in efficient sound creation for games, where PCM samples might require too much memory or processing time.

[Photos: invited authors Jürgen Herre, Xavier Rodet, Peter Svensson, and Jens Blauert]

In one of the demo rooms running during the conference Mikael Laurson of the Sibelius Academy together with Vesa Välimäki and Cumhur Erkut of HUT showed a novel approach to the production of virtual acoustic guitar music. This used a physical synthesis model to generate the guitar sound and an original score-based interface for controlling the compositional input.

3-D AUDIO TECHNOLOGIES
Spatial-representation techniques are a key factor in the creation and analysis of artificial environments. HRTF (head-related transfer function) estimation from head models and interpolation in moving sound were among the themes developed in papers during this session. There was also an interesting paper from Ole Kirkeby of Nokia on a simple stereo-widening network for allowing reproduction of loudspeaker stereo material over headphones. The network uses basic crossfeed-with-delay processing to simulate the interaural crosstalk that takes place in loudspeaker listening, with the aim of getting the stereo image out of the head. In fact during the conference a live subjective experiment was conducted, comparing different stereo-enhancement algorithms for headphone listening, all designed in various ways to improve the reproduction of loudspeaker stereo over headphones. Unfortunately, as Gaetan Lorho of Nokia explained later in the conference in his paper on a round robin test using such systems, none of a range of so-called enhancement algorithms succeeded in being rated more highly on average than an unprocessed stereo signal, perhaps because of listener familiarity. Some discussion ensued between the delegates about the reasons why similar circuits had not proved more popular over the years, and it was agreed that they had possibly arrived before their time during an era when headphone use was not so widespread.
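The crossfeed-with-delay idea mentioned above can be sketched in a few lines. This is only a minimal illustration and not the network presented at the conference: the function name `crossfeed`, the 13-sample delay (roughly 0.3 ms at 44.1 kHz), and the gain of 0.5 are assumptions chosen for the example, and practical designs typically also filter the cross path.

```python
def crossfeed(left, right, delay_samples=13, gain=0.5):
    """Add a delayed, attenuated copy of each channel to the opposite ear.

    This mimics, very roughly, the interaural crosstalk of loudspeaker
    listening. delay_samples and gain are illustrative assumptions.
    """
    if len(left) != len(right):
        raise ValueError("left and right must have the same length")
    out_left, out_right = [], []
    for i in range(len(left)):
        j = i - delay_samples  # index of the delayed sample in the other channel
        from_right = right[j] if j >= 0 else 0.0
        from_left = left[j] if j >= 0 else 0.0
        out_left.append(left[i] + gain * from_right)
        out_right.append(right[i] + gain * from_left)
    return out_left, out_right


# An impulse in the left channel only: the right ear receives it later and quieter.
impulse_left = [1.0, 0.0, 0.0, 0.0, 0.0]
silence_right = [0.0, 0.0, 0.0, 0.0, 0.0]
demo_left, demo_right = crossfeed(impulse_left, silence_right, delay_samples=2, gain=0.5)
```

With a hard-panned impulse the opposite ear now hears a delayed, quieter copy, which is the cue that loudspeaker listening provides and plain headphone listening lacks.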

Carlos Avendano’s paper on frequency domain techniques for 2-to-5 channel upmixing provided a fascinating insight into what can be done to separate ambience information from discretely panned sources in stereo signals. He demonstrated fine control over an algorithm that could separate ambience to the rear loudspeakers of a 5-channel mix as well as repan individual amplitude-panned sources in the front image. The ambience extraction process even enabled reverberation levels to be increased or reduced overall in the mix.

AUDIO CODING TECHNIQUES
Jürgen Herre of Fraunhofer, a leading figure in the field of low bit-rate audio coding, presented the invited paper “Audio Coding - An All-Round Entertainment Technology.” Concentrating primarily on MPEG, he showed how recent extensions to the standard enabled one to recover the high-frequency (HF) content of lower sampling rate material using limited helper information transmitted alongside the main audio. The improvement in quality was quite noticeable in 48-kbit/s stereo material compared with basic MP3 coding. He pointed out that there was little prospect of further noticeable savings in the bit rate for transparent-quality coding: 128 kbit/s for stereo using MPEG-2 AAC was still the best that could be achieved. Nonetheless, good-quality coding still has further potential for bit-rate savings. With regard to audio transmission over wireless connections, he pointed out that the robustness of the channel (in fact, the lack of it) is the main problem, so audio coding schemes need to build in extra robustness of their own to accommodate this, independent of the transport stream. This is something that can be accommodated within MPEG-4.

[Photo: Delegates check in at registration desk.]
[Photo: Stadium-style seating in lecture hall provided excellent visibility.]
[Photo: Author Perry Cook]
[Photo: Friday evening welcome reception at Nokia Research Center]

Michael Goodwin of Creative Advanced Technology Center proceeded to explain ways in which coded audio material can be postprocessed in computing devices, since it is likely that coded audio will need to coexist with an increasing range of processes such as spatialization, mixing, and effects. He showed that some of this processing can be achieved in the so-called compressed domain by unpacking the coded audio data and processing it in the frequency domain before transforming the resulting mixed audio signals back to the time domain. In this way fewer transforms might be needed than the number of source signals originally received. Parametric coding and resynthesis techniques that are only just beginning to be exploited provide further opportunities for savings in computational complexity at the decoder.
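The saving in transforms rests on the linearity of the synthesis transform: a weighted sum of spectra, transformed once, equals the weighted sum of the individually transformed signals. The sketch below illustrates only that property. It assumes a plain DFT in place of the windowed, overlapped transforms that real coders use, and every function name in it is hypothetical rather than taken from the paper.

```python
import cmath


def dft(x):
    """Naive forward DFT (O(n^2)); stands in for a coder's analysis transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Naive inverse DFT returning real samples; stands in for the decoder's synthesis transform."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]


def mix_then_transform(spectra, gains):
    """Weight and sum the spectra bin by bin, then run ONE inverse transform."""
    n = len(spectra[0])
    mixed = [sum(g * s[k] for g, s in zip(gains, spectra)) for k in range(n)]
    return idft(mixed)


def transform_then_mix(spectra, gains):
    """Reference path: one inverse transform PER source, then mix in the time domain."""
    signals = [idft(s) for s in spectra]
    n = len(signals[0])
    return [sum(g * sig[t] for g, sig in zip(gains, signals)) for t in range(n)]


source_a = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
source_b = [1.0, 0.5, 0.25, 0.125, 0.0, 0.0, 0.0, 0.0]
spectra = [dft(source_a), dft(source_b)]
gains = [0.7, 0.3]
single_transform_mix = mix_then_transform(spectra, gains)
per_source_mix = transform_then_mix(spectra, gains)
```

Both paths give the same samples up to rounding, but the first needs one inverse transform regardless of how many sources are mixed, whereas the second needs one per source.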

A remarkable example of a 3-D virtual environment called EVE (Experimental Virtual Environment) was demonstrated by Lauri Savioja and his team from HUT. A Silicon Graphics Onyx computer the size of a small refrigerator with 2 Gbytes of RAM drove a number of projectors to create a 3-D rendering of virtual spaces within a small cube, provided the viewer wore suitable shutter glasses. The viewer could navigate around a virtual lecture theater or rove around the inside of a chemical molecule after getting over the initial sense of spatial disorientation or vertigo. A VBAP (vector-base amplitude-panning) spatialization system with 14 loudspeakers had also been installed for audio rendering, enabling viewers to direct the position of a virtual ball in the space, with audio that followed it.
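For readers unfamiliar with VBAP, the core calculation can be sketched for the simplest case, a single pair of loudspeakers in the horizontal plane. The source direction is written as a weighted sum of the two loudspeaker direction vectors, and the weights, scaled to constant power, become the gains. This is an illustrative assumption about the general technique, not a description of the EVE installation, which used 14 loudspeakers in three dimensions; the function names and angles are hypothetical.

```python
import math


def unit_vector(angle_deg):
    """Unit vector in the horizontal plane for an azimuth given in degrees."""
    a = math.radians(angle_deg)
    return math.cos(a), math.sin(a)


def vbap_pair_gains(source_deg, spk1_deg, spk2_deg):
    """Solve p = g1*l1 + g2*l2 for the gains, then normalize to g1^2 + g2^2 = 1."""
    px, py = unit_vector(source_deg)
    l1x, l1y = unit_vector(spk1_deg)
    l2x, l2y = unit_vector(spk2_deg)
    det = l1x * l2y - l2x * l1y
    if abs(det) < 1e-12:
        raise ValueError("loudspeaker directions must not be collinear")
    # Cramer's rule for the 2x2 system whose columns are l1 and l2.
    g1 = (px * l2y - l2x * py) / det
    g2 = (l1x * py - px * l1y) / det
    norm = math.hypot(g1, g2)
    return g1 / norm, g2 / norm


# Loudspeakers at +30 and -30 degrees; a centered source gets equal gains.
center_gains = vbap_pair_gains(0.0, 30.0, -30.0)
# A source placed exactly at the first loudspeaker uses that loudspeaker alone.
edge_gains = vbap_pair_gains(30.0, 30.0, -30.0)
```

A negative gain from this function indicates that the source lies outside the arc spanned by the pair, in which case a full system would select a different pair (or, in 3-D, a different loudspeaker triplet).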

POSTER SESSION
A lively poster session was conducted on Sunday afternoon. Authors presented their posters to describe their research and provide small demonstrations, made possible in many cases through the convenience of laptop computers and headphones. A number of interesting crossover projects could be observed involving artistic and technical concepts, including Andrew Paterson’s “Stratigraphical Sound in 4-D Space,” which provides a novel archaeological approach to the use of layered strata in the authoring of sounds for different phases of an interactive audio-visual experience.

SUBJECTIVE AND OBJECTIVE EVALUATION
The human receiver is the ultimate judge of the quality of any audio system, and the papers in this session dealt with experiments that involved subjective evaluation of sound quality or spatialization. The paper by Matti Gröhn and Tapio Lokki of HUT on static and dynamic sound source localization in a virtual room dealt with the ability of subjects to track a moving sound source that traced various shapes in the virtual 3-D environment; this concept was discussed earlier in the description of the EVE demonstration. Using the VBAP algorithm they found that there was a tendency for the auditory judgments of listeners to be elevated compared with the visual stimulus; also the trajectories of sources tended to be pulled toward the loudspeaker positions. They found that localization blur was greater with panned sources than with sources originating from a single loudspeaker, which is not unexpected.

[Photo: Vesa Välimäki, papers chair]
[Photo: Matti Karjalainen, technical chair]
[Photo: Consulting at registration desk: from left, Martti Rahkila, webmaster and IT officer; Roger Furness, AES executive director; Juha Merimaa, secretary; and David Isherwood, publications officer.]
[Photo: Tapio Lokki, venue officer]

COMPUTATIONAL AUDITORY SCENE ANALYSIS
Arguably the most distinguished participant of the 22nd Conference was Professor Jens Blauert of the Institute for Communication Acoustics. His work has spawned numerous Ph.D. theses as well as his own masterwork, Spatial Hearing. Blauert’s invited paper, “Instrumental Analysis and Synthesis of Auditory Scenes: Communication Acoustics,” was an inspiring evaluation of the current state of acoustics as it relates to information technology. At the end of the 1800s Lord Rayleigh’s exposition of the physical laws of acoustics was considered complete, but then the vacuum triode came along and audio could be amplified, bringing with it a whole new field of electroacoustics. Then computers came along and changed things again. Blauert was the first professor in his laboratory to have a computer, and many of his colleagues could not understand why he would want one, but his foresight paid off.

Computational auditory scene analysis (CASA) is said to be the analysis equivalent of virtual reality (VR), which is concerned with synthesis of scenes. Both deal with a parametric version of the world. CASA requires cognitive models of human perceptual processes, and the audio world needs to embrace cognitive engineering if it is to move forward. Referring to the problem of integrating psychology with engineering, Blauert pointed out that in the liberal arts you are the king of the hill if you find a new problem, whereas in engineering the goal is solving the problem. We somehow need to bridge the divide between these paradigms.

[Photo: Carlos Avendano gives demo on frequency domain techniques.]
[Photo: Numerous, high-quality papers on display at Sunday afternoon’s poster session offer delegates more food for thought.]
[Photo: Excellent lunches were enjoyed throughout conference at Helsinki University of Technology.]
[Photo: Delegates enjoy refreshments at coffee break.]

The remainder of the session on CASA was concerned primarily with issues of blind identification of features in audio signals, such as the spatial qualities of early and late reverberation and the beat of music.

SOCIAL EVENTS
The Finns know how to enjoy themselves, and the conference organizers did their best to ensure that delegates did too. The Finnish Experience, anticipated by some with trepidation, proved to be a thoroughly enjoyable evening for all. It started with a concert given by students from the Sibelius Academy. They played a variety of Finnish music of different styles and with different combinations of players who were joined by a singer for some Sibelius songs. The environment for this was the relatively new recital hall of the Helsinki Conservatory, whose acoustic designer, Henryk Møller, was on hand to describe how he had created hard diffusing surfaces molded out of gypsum that also had an interesting visual appearance. The relatively high reverberation time and lateral efficiency of the hall gave it a pleasing spaciousness that enhanced the delightful playing of the enthusiastic and talented musicians.

The clubhouse of the Finnish Sauna Society on the shore of the Gulf of Finland was the locale for the remainder of the evening. Delegates signed up for one of two sessions: the first group using international rules wore swimming suits, while later in the evening the second group of sauna veterans went “au naturel” according to Finnish rules. Consisting of three smoke saunas, two wood saunas, and a number of more modern systems, the clubhouse provided an ideal venue for the assembled company to experience one of the great pleasures of Finnish life. After allowing the penetrating heat of the sauna to soothe and relax aching muscles, delegates could take a bracing swim in the frigid Gulf of Finland. The food and drink of an outdoor barbeque provided abundant nourishment for repeated sauna-swims.

[Photo: At banquet: clockwise from above, AES President Garry Margolis (inset) and Roger Furness compliment conference committee; band called Underground Hammers provide musical entertainment; Nick Zacharov and Jyri Huopaniemi listen as Martti Rahkila tells delegates that conference planning racked up tens of thousands of email messages.]

On the following evening, the team arranged a boat trip to the nearby island of Suomenlinna. A representative of the city of Espoo thanked the AES for its choice of the city for the conference. The restaurant in the fort on the island proved an excellent venue for the conference banquet and the subsequent entertainment, which was provided by an interesting band called Underground Hammers. After a sumptuous dinner, Zacharov and Huopaniemi praised the numerous crew members and the other members of the conference committee: Juha Merimaa, secretary; David Isherwood, publications officer; Martti Rahkila, webmaster and IT officer; Tapio Lokki, venue officer; Juha Backman, sponsors coordinator; and Kari Jaksola, treasurer. They thanked the delegates, the authors, invited speakers, and AES officers for their support.

As Jens Blauert implied at the end of his invited paper, the topics raised at the conference and the range of problems yet to be addressed in the integration of cognitive science, information technology, art, audio engineering, and acoustics prove that there is a long and interesting future ahead of us.

A CD-ROM and a printed version of The Proceedings of the AES 22nd International Conference are available for order on www.aes.org, or from any AES office.

[Photo: Delegates enjoy toast on boat ride to Suomenlinna Island.]
[Photo: Finnish Experience: after exhilarating saunas and refreshing swim in Gulf of Finland, delegates enjoy barbeque at Finnish Sauna Society.]
[Photo: Musical highlight of conference was concert at Helsinki Conservatory by students from Sibelius Academy.]
[Photo: Entire team was recognized for producing outstanding conference: from left, Tapio Lokki, David Isherwood, Jyri Huopaniemi, Mikka Tikander, Matti Karjalainen, Vesa Välimäki, Juha Merimaa, Nick Zacharov, Henri Penttinen, Martti Rahkila, Juha Backman, Marko Takanen, Matti Airas, Ville Pulkki, Paulo Esquef, Kalle Koivuniemi, and Hanna Järveläinen. (Kari Jaksola, Miikka Huttunen, Petri Mäntysalo, Sampo Vesa, and Raimo Lauren missed photo.)]