
Future Human Interfaces to Computer Controlled Sound Systems 3735 (A2-PM-3)

Craig Rosenberg
University of Washington
Seattle, WA, USA

Bob Moses
Rane Corporation
Mukilteo, WA 98275, USA

Presented at the 95th Convention
1993 October 7-10
New York

This preprint has been reproduced from the author's advance manuscript, without editing, corrections or consideration by the Review Board. The AES takes no responsibility for the contents. Additional preprints may be obtained by sending request and remittance to the Audio Engineering Society, 60 East 42nd St., New York, New York 10165-2520, USA. All rights reserved. Reproduction of this preprint, or any portion thereof, is not permitted without direct permission from the Journal of the Audio Engineering Society.

AN AUDIO ENGINEERING SOCIETY PREPRINT


Future Human Interfaces to Computer Controlled Sound Systems

Craig Rosenberg, Human Factors Lab, University of Washington
Bob Moses, Rane Corporation

Abstract

Computer controlled sound systems are among the most active research topics within the AES community. We are entering an age of interoperability, in which devices and the human operator work as a cohesive team. This paper examines human-machine interface issues pertaining to computer controlled sound systems. Traditional human interfaces are analyzed, and found inappropriate for computer controlled equipment. New human interface technologies are presented, such as spatial position tracking, eye tracking, tactile feedback, and head mounted displays. We describe how these technologies work, and their applications in sound system control and musical performance.


Introduction

Sound systems are rapidly incorporating advanced forms of computer control. Since the early 1980s, a number of computer control systems have flourished: MIDI, IQ, MediaLink, PA-422, MindNet, and others. The proliferation of these computer control systems is fueled by a number of technological advances: the personal computer, digital signal processing, and local area networking.

Of course, audio is not the only field to benefit from these technologies. The "information age" is upon us, and is changing almost everything in our daily lives: how we bank, how we shop, how we communicate, how we entertain, and so on. As information technologies expand, traditional paradigms for human-machine interaction are stretched beyond their limits. How does a person, for example, effectively manipulate a spreadsheet with millions of cells of information? Or, alternatively, how does a person operate a large distributed sound system in real time, with hundreds of channels of audio and nearly a thousand audio signal processing functions?

Human factors researchers are studying these questions and devising new human-machine interface techniques and technologies. Some of these technologies have become components of "virtual reality" (VR) systems, while others live a more humble life in less "trendy" applications. This paper provides a review of human interface technologies and their possible applications in computer controlled sound systems.

The Boon of Digital Communications

The primary component of computer controlled systems is a communications channel between devices in the system. This communications channel can take the form of a master-slave bus, or a peer-to-peer local area network. These systems provide at least two valuable benefits:

Remote Control. Computer controlled systems allow an operator to control devices from a remote location, through the bus or network. As a result, equipment can be distributed or centralized, whichever is most convenient. System operation is simplified since a single operator has access to all devices in the system. Non-human operators (i.e. computers) can take over many of the routine tasks, such as watching clip indicators and VU meters and adjusting the levels appropriately. This frees the human operator to concentrate on more of the creative and fun tasks. Few would argue that remote control is not highly desired; just look at the average American coffee table!

Interoperability. Computer controlled systems based on local area networks (and to a certain degree, buses) provide an architecture of interconnected devices. These devices have the opportunity to interact, share resources, and work together as a team rather than a collection of autonomous entities. This capability has not yet been fully exploited by any of the computer control systems in the industry. In the future, DSP processing modules will share their CPUs to allow flexible (and powerful) distributed parallel processing. Controls on one device might be mapped to other functions in other devices (e.g. an amplifier volume control might control an equalizer to dial in a "loudness" curve as the level is turned down; see the sketch at the end of this section), and so on. Interoperability will have a profound impact on the performance and flexibility of systems, much more so than we are aware of today.

In this paper, we present a third important benefit of computer controlled systems: the opportunity to implement an improved human interface to the system. New human interface techniques can be implemented within personal computers, or in dedicated hardware designed to interact directly with a person and their senses. New human interfaces have the opportunity to be more intuitive, natural, efficient, and fun. We will explain how, later in this paper. But first, it is instructive to examine some traditional human interfaces to gain perspective.
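To make the interoperability example above concrete, here is a minimal sketch in Python of one control mapped to another: an amplifier volume control that dials in a "loudness" curve on an equalizer as the level is turned down. The class names, direct wiring, and dB values are all hypothetical illustrations, not any shipping control protocol.

```python
# Hypothetical sketch: an amplifier volume control mapped to an equalizer,
# dialing in a "loudness" (bass/treble boost) curve as the level drops.

class Equalizer:
    def __init__(self):
        self.low_shelf_db = 0.0
        self.high_shelf_db = 0.0

class Amplifier:
    def __init__(self, eq):
        self.eq = eq
        self.volume_db = 0.0

    def set_volume(self, volume_db):
        self.volume_db = volume_db
        # Interoperability: the amplifier drives the equalizer. The quieter
        # the level, the more loudness compensation it requests.
        attenuation = max(0.0, -volume_db)        # dB below reference
        self.eq.low_shelf_db = min(10.0, attenuation * 0.35)
        self.eq.high_shelf_db = min(6.0, attenuation * 0.20)

eq = Equalizer()
amp = Amplifier(eq)
amp.set_volume(-20.0)                     # turning the level down...
print(eq.low_shelf_db, eq.high_shelf_db)  # ...raises the loudness curve: 7.0 4.0
```

In a networked system the `set_volume` call would arrive over the bus or network rather than a direct object reference, but the mapping logic is the same.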


Traditional Human Interfaces

Analog Controls

Most traditional analog-based controls interact with the human through potentiometers, switches, etc., that are connected directly to the audio signal path in the device. This is very straightforward, but not very flexible. Devices with a large number of controllable functions, such as a mixer, are necessarily large and heavy. The moving parts involved with analog controls are often inaccurate, unreliable, and expensive to automate. To compensate for these shortcomings, manufacturers are now introducing digitally-controlled analog devices, which omit moving parts and place functions under the control of a microprocessor.

Touch Keypads

Digitally-controlled analog (and fully digital) devices often interact with the human operator via a touch keypad. Keypads can be either alphanumeric or function-based. Alphanumeric keypads (keyboards) are used with personal computers. The user types commands into the computer, which are then processed and routed to the devices under computer control. Function-based keypads are often found on sound equipment and provide a higher level interface to the device's internal functions. Typical functions include: edit, store, recall, and so on.

Touch keypads usually provide an improvement in reliability, accuracy, and size over analog controls, but are frequently more expensive. Touch keypads can provide a simpler, improved human interface to a device, but often a user is left wondering which keys to hit to evoke a desired action. More importantly, many sound engineers are accustomed to interacting with certain devices via traditional analog controls, for example, a graphic equalizer. For these reasons, it should be noted that a keypad does not necessarily provide a more intuitive or natural means of controlling the device than physical knobs and sliders.

Alphanumeric Displays

Alphanumeric displays come in two common flavors: Light Emitting Diode (LED) and Liquid Crystal Display (LCD). LED displays are usually small, containing less than ten 7-segment style digits. LCD displays typically provide 32 or more dot matrix characters. LED displays can present simple numeric readouts of a device's parameters. LCD displays communicate with a user through written language, and are very flexible. The primary disadvantage of these displays is that they force the user to translate all meaning from written representations. As one can imagine, the relative positions of faders on a mixer can provide a much better representation of "the mix" than a row of numbers on an alphanumeric display.

Problems With Traditional Human Interfaces

Analog controls, alphanumeric displays, and touch keypads all have limitations, as discussed in the previous sections. In general, none of these interface components provides an economical, intuitive, and flexible interface to the system. Often, the user is left with an interface that is incapable of translating creative intentions into the proper commands that the system is able to understand. Another frequent problem is that the user is not able to effectively understand the outputs that the system is trying to communicate. For these reasons (and others), the user may not be able to accurately control or understand the system. This often results in operator errors and even damage to the system.

Searching for the Ideal Human Interface to Sound Systems

A variety of human interface equipment exists that couples a human operator with computer systems. This equipment can be classified into two major groups: input devices (presenting human actions to the computer), and output devices (presenting information from the computer to the human).


One aspect of human factors research investigates the design and use of computer interface equipment. The field of "virtual reality" is fueled by advances within the field of human factors. It is the physical interface equipment, in combination with graphical user interface software, that can empower people within a virtual environment. In the following section, advanced computer interface techniques and their associated issues and applications are presented and discussed.

The Human Factors of Advanced Human Interfaces

In this paper, the human factors research areas are divided into four sections: user input and computer recognition, tactile and force feedback, visual display systems, and hearing and spatial sound. Issues are presented along with possible applications of the technologies toward the goal of improving the human interface to computer controlled sound systems.

User Input and Computer Recognition

User input refers to the computer being able to recognize the actions and intentions of the user. A device that recognizes some form of human expression, and translates that input into numerical data that the computer can understand, is sometimes called a behavioral transducer. When the user invokes an action that the system is able to recognize (for example: turns a knob or moves a fader), the system may be equipped to recognize and respond to the user's input.

The greater the degree of communication from the user to the computer system, the greater the capability of the system to respond to the user's intentions. However, the bandwidth of control, alone, is not enough to ensure a powerful and intuitive user interface. The methods of control, and the mappings between the user's actual inputs and their meaning to the system, are of great importance.

There are many forms of user input available to designers of advanced systems. A variety of new and unique input devices have been developed specifically for advanced human-computer interaction. The following sections detail many of these advanced user interface techniques and present the issues associated with using them. In the final section, the applicability of these new interface tools within sound systems is discussed.

Voice Input

Spoken words are the most common form of communication between human beings. Speech is also the most rapid form of natural communication. To take advantage of these human capabilities, voice recognition systems have been developed to recognize spoken words. There are several aspects that characterize the performance of voice recognition systems.

Voice recognition systems are either speaker dependent or speaker independent. Anyone can use a speaker independent system without having to train the system. Speaker dependent systems must be trained to recognize the unique voice of the particular speaker using the system.

Another variable that characterizes voice recognition systems is the size of the vocabulary (the number of words) that the system is able to recognize. The greater the vocabulary of the system, the more flexible the system is in recognizing spoken input. Words can be recognized and assigned to commands that the system is able to execute. Words can also be used as modifiers to commands. In this way, the operator of the system is able to speak directly to the program to issue commands to the system (a minimal sketch of such a mapping appears below).

Some systems are able to understand continuously spoken words, as opposed to only discretely spoken words. Systems that recognize continuous speech are far more flexible and easier to use than discrete systems. This is because people speak continuously, as opposed to uttering discrete words interspersed by silence. Discrete voice recognition systems are therefore laborious to use. There are still technical problems associated with continuous word recognition; however, there is significant research being done in this area. The maximum benefit associated with voice input will be attained from speaker independent, continuous word voice recognition systems.
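As a concrete illustration of words assigned to commands and words used as modifiers, here is a minimal sketch. The command vocabulary, number words, and parsing rules are invented for illustration, and are far simpler than any real recognizer's grammar.

```python
# Hypothetical sketch: recognized words mapped to sound-system commands,
# with following words treated as modifiers ("mute channel three").

COMMANDS = {"mute", "unmute", "recall", "store"}
NUMBERS = {"one": 1, "two": 2, "three": 3, "four": 4}

def interpret(recognized_words):
    """Turn a recognizer's word list into (command, argument) pairs."""
    actions = []
    words = iter(recognized_words)
    for word in words:
        if word in COMMANDS:
            arg = None
            for nxt in words:          # scan ahead for a modifier
                if nxt in NUMBERS:
                    arg = NUMBERS[nxt]
                    break
            actions.append((word, arg))
    return actions

print(interpret(["please", "mute", "channel", "three"]))  # [('mute', 3)]
```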


Eye Tracking

Eye tracking is another form of computer input that can be employed to recognize intention from the user. Eye tracking involves recognizing where the user is looking at any given instant and transmitting that information to the computer. Eye tracking hardware is available which can transmit to the computer precise indications of where a person is looking at any given instant.

Eye tracking technology is conceptually simple. An infrared light beam is aimed at the cornea of the eye, while a small TV camera monitors the eye. By tracking the position of the pupil with respect to the fixed position of the reflected infrared light, the computer is able to compute the instantaneous direction of gaze of the user's eye.

A possible example of eye tracking within the control of sound systems involves a computer display of several components within a system. The user's eye could be tracked to determine which component of the system is being watched at any given moment. Dwell (the time elapsed while staring at the same point on the screen) can effectively be used to select devices or options within a device. To select a component, you just look at it for a brief moment. After the device is selected, you could select a control within the device using your eyes. After the control on the device is selected, you could look up or down to adjust the level of the control under your visual command.
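A minimal sketch of the dwell-based selection just described, assuming the eye tracker delivers timestamped gaze coordinates and that screen regions map to devices. The dwell threshold and hit-test regions are invented for illustration.

```python
# Hypothetical sketch: dwell-based selection from an eye tracker. A device
# is "selected" once gaze stays on it longer than DWELL_SECONDS.

DWELL_SECONDS = 0.8

def dwell_select(gaze_samples, hit_test):
    """gaze_samples: iterable of (t, x, y); hit_test(x, y) -> device or None."""
    current, since = None, None
    for t, x, y in gaze_samples:
        target = hit_test(x, y)
        if target != current:
            current, since = target, t       # gaze moved to a new target
        elif current is not None and t - since >= DWELL_SECONDS:
            return current                   # stared long enough: select it
    return None

# Screen regions for two devices in the control display (illustrative).
def hit_test(x, y):
    return "equalizer" if x < 0.5 else "amplifier"

samples = [(i * 0.1, 0.7, 0.4) for i in range(12)]  # 1.2 s on the amplifier
print(dwell_select(samples, hit_test))              # -> "amplifier"
```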

Gesture Recognition

We use our hands constantly for almost all physical tasks we do in the world around us. Why not use our hands for input to the computer as well? There are a variety of glove-like devices that turn the hand into an input device to a computer. Gloves work by measuring the bend angle of several joints of each finger. By recognizing certain combinations of bend angles, the computer is able to recognize a gesture (i.e. the "peace" sign, the "thumbs-up" sign, or the "ok" sign). The computer is also able to compare positions of bend angles over time to deduce a moving gesture (like "let-the-fingers-do-the-walking").

Various technologies such as fiber optics, mechanical joints, strain gauges, and Hall effect sensors are used to measure the bending angles of the joints of the fingers. Glove input devices are being used for advanced and intuitive human-computer interaction, telerobotics applications, sign language interpretation, and hand injury evaluation. An application using a glove interface to a sound system is discussed in the last section.
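A minimal sketch of static gesture recognition from glove bend angles, along the lines described above. The gesture templates, the per-joint tolerance, and the one-angle-per-finger simplification are assumptions for illustration; real gloves report many more joints.

```python
# Hypothetical sketch: recognizing static hand gestures by matching glove
# bend angles (degrees, thumb..pinkie) against stored templates.

GESTURES = {
    "fist":      (90, 90, 90, 90, 90),
    "peace":     (90,  0,  0, 90, 90),
    "thumbs-up": ( 0, 90, 90, 90, 90),
}
TOLERANCE = 20  # degrees of allowed deviation per joint (illustrative)

def classify(bend_angles):
    for name, template in GESTURES.items():
        if all(abs(a - b) <= TOLERANCE
               for a, b in zip(bend_angles, template)):
            return name
    return None

print(classify((5, 85, 95, 88, 92)))  # -> "thumbs-up"
```

A moving gesture, as the text notes, would be deduced the same way by matching a sequence of such readings over time.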

Spatial Tracking

Several types of devices have been invented that are able to track the location and orientation of one object with respect to another object. In Cartesian space these devices track the x, y, and z position and the x, y, and z orientation (sometimes called yaw, pitch, and roll). These devices typically use magnetic induction, ultrasonic sound, mechanical, inertial, or optical tracking to determine spatial location and orientation with respect to some origin.

Systems that accomplish spatial tracking by means of magnetic induction have a transmitter with three internal orthogonal wire coils that induce a current through a receiver (also with three orthogonal wire coils) that is proportional to the component distances and the relative angles between the receiver and the transmitter. Mechanical methods rely on directly connected mechanical linkages between a stable origin and the mobile object that is being tracked. Some mechanical trackers also incorporate optically encoded disks at bend positions, which are very accurate. Inertial systems use gyroscopes and accelerometers, but are susceptible to drift and need to be recalibrated. Optical methods usually rely on cameras, light emitting diodes, and image recognition software to accomplish the spatial tracking task.


Six dimensional spatial trackers are characterized by the accuracy with which they can collect spatial data (the resolution of the tracker). Repeatability refers to the degree to which the tracker drifts over time. Latency refers to the timeliness of the data. Lastly, the data rate refers to the number of positions per second that can be sent from the tracker to the computer. Some trackers have the capability to track multiple sources concurrently. Six dimensional spatial tracking is necessary for virtual reality applications, as the computer must always know where your head and hand are in three dimensions, as well as which direction they are pointing, measured in three dimensions. Applications of spatial tracking within sound systems are discussed in the last section.

The Haptic Channel

Skin is the largest sensory system, with a surface area of about 2 square meters. Mechanoreception is the neural events and sensations that result from mechanical displacement of cutaneous (skin) tissue. This includes repetitive displacements such as vibration, single displacements such as touch and pressure, and tangential movement of a stimulus along the surface of the skin.

Tactile stimulation in the hands results from information being passed to the spinal cord through two principal nerve tracks, the Median and Ulnar nerves. The Median nerve covers the majority of the palm, all of the thumb, index and middle fingers, and half of the fourth finger. The Ulnar nerve covers the remainder of the palm, half the fourth finger, and the pinkie.

These two nerve fibers contain four nerve types: slowly adapting fibers, rapidly adapting fibers, punctate fibers, and diffuse fibers. Slowly adapting fibers respond as something touches the skin, continue to show activity as long as the pressure is applied, and then taper off. Rapidly adapting fibers respond with a rapid burst of activity as soon as pressure is applied, and then level off. Rapidly adapting fibers respond again when pressure is released. Punctate fibers have small oval shaped receptor fields with distinct boundaries that tell the brain where the sensation is coming from. Diffuse fibers possess large receptor fields with vague boundaries.

Tactile and Force Feedback, Proprioception

Tactile and force feedback can be exceptionally helpful, even necessary, in human-computer interfaces. In virtual world applications, it is disconcerting to see and hear something but not be able to touch or feel it. There are several devices that have been constructed to provide tactile and force feedback for advanced human-computer interfaces. Tactile output devices can give the user the sensation of pressure, vibration, and heat, as well as the shape of an object.

Force feedback is different from tactile feedback, and involves resisting force applied by a human operator. Through force feedback alone, we can tell if we are holding an apple or a sponge, based on the weight and the resistance to our hand closing around it. The information that we are using is called proprioceptive cues. Proprioceptive cues are pieces of information gathered from our skin, muscles, and tendons. Proprioceptive cues give information about the shape of objects, the firmness of objects, the position of the body, as well as the forces that the body is subjected to. Proprioceptive cues are necessary for most hands-on real world tasks, and are also desirable for most virtual world applications.

A variety of tactile and force feedback devices exist; however, many of the devices that have been built are prototypes and are not available as standard manufactured models. Some examples are the Argonne Remote Manipulator (ARM), the Portable Dextrous Master, the PER-Force hand controller, the TeleTact tactile/force feedback system, the Begej Glove Controller, the TiNi Alloy tactile feedback system, and the Sandpaper system developed at MIT.


Visual Display Systems

Visual display systems are a component of computer controlled sound systems. Therefore, the issues associated with designing visual display systems will influence the usability of the sound systems to which they are attached. There are many different issues associated with the use of electro-optical display systems for human computer interaction. In addition, there are several major types of visual display systems: opaque or translucent head mounted displays, projectors, common computer monitors, and active matrix displays. This section looks at some of the human factors issues associated with visual display systems.

Display Resolution

Resolution refers to the number of picture elements (pixels) of which the display is composed. A greater number of pixels within the display provides a higher resolution image. Screen-based resolution is usually measured in dots per inch. The horizontal display resolution is computed by dividing the number of pixels horizontally by the width of the display; the vertical display resolution is computed similarly (a worked example follows below).

If a display has a large number of pixels, it can display scenes with a higher degree of complexity. In addition, a higher resolution display can show smaller objects. It has also been shown that the higher the resolution of the display, the less chance of eye fatigue after prolonged use.
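A worked example of the computation just described, with illustrative screen dimensions:

```python
# Worked example: resolution in dots per inch is pixel count divided by
# physical size. The 1280x1024 monitor and 13.3 x 10.6 inch screen are
# illustrative values, not taken from the paper.

pixels_h, pixels_v = 1280, 1024
width_in, height_in = 13.3, 10.6

dpi_h = pixels_h / width_in   # ~96 dots per inch horizontally
dpi_v = pixels_v / height_in  # ~97 dots per inch vertically
print(round(dpi_h, 1), round(dpi_v, 1))
```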

Color vs. Black and White

Even though color display systems are a relatively recent advancement in visual display technology, their use has proliferated rapidly. One of the ways in which a color display is characterized is by the number of simultaneous colors the device can produce. In general, it is desirable to have a color display (instead of a monochrome display) unless this results in decreased image resolution. Almost all real world scenes contain color, and a color display can reproduce the natural color within a scene. In addition, color displays can encode different types of information with different colors, increasing the information content and therefore the effectiveness of the display.

Field of View

Field of view refers to the horizontal and vertical angular extent that the image subtends at the retina of the user's eyes. The field of view is a measure of the area of visual stimulation which the display occupies. As the display size increases, the field of view increases. As display distance from the user increases, the field of view decreases. Field of view is very important because as the field of view becomes larger, the cognitive sense of being included within the scene also increases. In addition, more information can be included in a display that has a large field of view than in a display that has a small field of view, given that the resolution is the same. Display optics can also be employed to increase the field of view of a given display.
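The geometry described above can be made concrete with the standard subtended-angle formula for a flat display viewed straight on; the display width and viewing distances below are illustrative.

```python
# Sketch of the geometry described above: the angle subtended by a display
# grows with its size and shrinks with viewing distance.

import math

def field_of_view_deg(display_width, viewing_distance):
    """Horizontal field of view; both arguments in the same units."""
    return math.degrees(2 * math.atan(display_width / (2 * viewing_distance)))

print(field_of_view_deg(0.40, 0.60))  # 0.40 m screen at 0.60 m: ~37 degrees
print(field_of_view_deg(0.40, 0.30))  # move twice as close: ~67 degrees
```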

Geometric Field of View

Geometric field of view refers to the degree of magnification or minification of the image due to the specific perspective viewing parameters. The distinction between field of view and geometric field of view can be described by imagining two photographs: one picture taken with a telephoto lens and the other taken with a wide angle lens. One can hold both pictures at arm's length and they will both occupy the same field of view, but the picture taken with the wide angle lens has a much greater geometric field of view than the image taken with the telephoto lens. The image taken with the wide angle lens (large geometric field of view) has more of the scene within the frame of the picture. The image taken with the telephoto lens has less of the scene within the frame of the picture, but it has a greater degree of magnification.


Stereoscopic versus Monoscopic

Monoscopic display systems, such as standard computer monitors, provide each eye with the same image. Stereoscopic display systems provide each eye with a different image. The images in a stereoscopic display system are horizontally offset, just as the user's eyes are horizontally offset. Stereoscopic display systems are advantageous because they can provide an intuitive sense of depth to the user of the system. Binocular retinal disparity refers to each eye receiving a different image of the scene. The brain is able to synthesize both images into a cohesive scene containing intuitive depth information.

Stereoscopic presentation can be accomplished by means of head mounted displays as well as CRT and projection systems. Methods of presenting stereoscopic images include: time multiplexing the images on the screen, polarization of light, and chromatic separation.

Many studies have shown the advantage of stereoscopic displays over monoscopic displays for estimating depth information. With a stereoscopic viewing situation there is both an increase in the number of depth cues available to the viewer and an increase in the effectiveness of many of the depth cues that are already available in monoscopic viewing situations. Because of advances in video display technology, stereoscopic presentation is more available and affordable than in the past. Within the domain of computer generated images, stereoscopic presentation can greatly enhance most visual situations, due to the increased depth provided by the binocular presentation. The principal disadvantage of stereoscopic presentation is the increased cost of the stereo viewing hardware, as well as the increased computational cost associated with generating two views (one for each eye) for each stereoscopic display.

One possible application of stereoscopic visual presentation within the domain of sound systems involves placement of three dimensional sounds. The operator of the system can use a spatial tracker to interactively position computer graphics objects that represent actual sounds in three dimensions. The user receives stereoscopic visual feedback corresponding to the locations of the sounds, as well as the three dimensional audio feedback coming from the virtual sound sources.

Hearing and Spatial Sound

Besides sight, sound is a primary way in which humans collect information from their environment. The physiology of hearing involves the pinna (outer ear), the ear canal, the eardrum, the hammer, anvil, and stirrup, the cochlea, the organ of Corti, and the auditory nerve. Each of these systems has a unique function in the perception of sound.

Sound localization in the horizontal plane is accomplished primarily by means of interaural time differences and interaural intensity differences at the listener's two eardrums. When a sound occurs to one side of your head, the sound reaches the closer ear sooner than it reaches the farther ear. This is the interaural time difference. In addition, the sound will be louder in the closer ear. This is referred to as an interaural intensity difference. The brain is able to interpret differences in timing and loudness of the sound received at the two ears and determine the location from which the sound originated (a worked sketch follows below).

In addition, there are other cues to sound localization that help a listener determine the location from which a sound originated. The head forms an acoustic shadow that filters the frequencies received in the occluded ear. When a sound occurs to the right of your head, the right ear receives the full frequency range of the sound, whereas the occluded ear only receives frequencies of approximately 1000 Hz and less. The head, in effect, acts as a low pass filter. Echolocation refers to the ability to judge the size of a room by the amount of reverberance of the room. The pinna of the ear plays a crucial role in our ability to localize sounds in elevation. The pinna performs a frequency dependent filtering of the sound depending on the elevation from which the sound originated. Sounds originating from above the ear sound higher pitched than sounds originating below the ear.
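A sketch of the interaural time difference described above, using the classic Woodworth spherical-head approximation (a standard model, not given in the paper); the head radius and azimuths are illustrative.

```python
# Sketch: interaural time difference (ITD) for a distant source, using the
# Woodworth spherical-head approximation. Values are illustrative.

import math

HEAD_RADIUS = 0.0875    # meters, a common approximation
SPEED_OF_SOUND = 343.0  # meters per second

def itd_seconds(azimuth_deg):
    """ITD for a far source at the given azimuth
    (0 = straight ahead, 90 = directly to one side)."""
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS / SPEED_OF_SOUND) * (theta + math.sin(theta))

print(itd_seconds(0))   # 0.0 -- sound arrives at both ears together
print(itd_seconds(90))  # ~0.00066 s, roughly the maximum for a human head
```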


When using computers to recreate the directional components of sound, it is important to first acquire the "earprint" of the user of the system. An earprint is also called a head-related transfer function, and refers to the characteristics of how the sound changes as the location of the sound relative to the listener changes. Earprints are traditionally measured by placing miniature microphones in the subject's ears and then recording white noise originating from many different azimuthal and elevational directional combinations. The sound received at the user's eardrums is digitized, and a mathematical technique called Fourier analysis is used to collect coefficients that closely describe the resulting waveform. By collecting a list of coefficients corresponding to directional combinations, it is possible to recreate and simulate the directional components of sound using a convolution engine. A convolution engine uses the Fourier coefficients to perform the frequency and time dependent filtering that is present in reality.
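A minimal sketch of the convolution engine idea: one mono source filtered through a left-ear and a right-ear impulse response for a single direction. The toy impulse responses below are placeholders standing in for measured earprint data.

```python
# Minimal sketch of a convolution engine: a mono signal filtered through
# per-ear impulse responses for one source direction. The impulse
# responses here are placeholders; a real system would use the listener's
# measured head-related responses.

import numpy as np

def spatialize(mono, hrir_left, hrir_right):
    """Return (left, right) signals for one source direction."""
    return np.convolve(mono, hrir_left), np.convolve(mono, hrir_right)

mono = np.random.randn(1000)           # stand-in for an audio block
hrir_l = np.array([0.0, 1.0, 0.3])     # toy: source nearer the left ear
hrir_r = np.array([0.0, 0.0, 0.6])     # toy: later and quieter on the right

left, right = spatialize(mono, hrir_l, hrir_r)
print(left.shape, right.shape)
```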

An obvious application of three dimensional sound in computer controlled sound systems lies in the ability of the system operator (human and computer) to interactively position sounds in three dimensions. The ability to spatialize sound can be used effectively in recording environments as well as live sound.

Applications of Advanced Human Computer Interaction

Architecture, Engineering, and Computer Aided Design

Currently, it is very popular for architects, engineers, and designers to use computer aided design systems to aid them in their work. Virtual interface technology allows architects and designers to intuitively and easily explore their creations by flying through them. The term virtual walk-through is used to describe the use of a computer system to virtually experience walking (or flying) through a simulation of a building. The simulation of this model can be presented to the user both visually and auditorially, by means of a stereoscopic head-mounted display (HMD) and headphones presenting three dimensional spatialized sound. Virtual interface technology can also greatly aid in constructing mechanical and architectural models, as the designer can reside inclusively within the space during the design process.

Scientific Visualization

In the same way that architects can visualize buildings, scientists can visualize data and processes. In addition, scientists are able to use the computer to visualize multidimensional data as graphic forms with changing attributes such as color, size, position, orientation, etc. Applications of scientific visualization of data include mathematics, molecular chemistry, meteorology, atomic physics, astronomy, thermodynamics, and fluid flow analysis, as well as financial visualizations.

Visualizing multidimensional data spaces enables the scientist to obtain a much greater understanding of the properties and interrelationships of the system under investigation. As an example related to the audio industry, an acoustical engineer could use sophisticated three dimensional computer graphics to visualize sound pressure levels and room modes in an inclusive stereoscopic graphical simulation displaying the acoustics of a performance hall under design. The engineer would be able to see graphical changes in the room modes as she moves walls and baffles in the computer graphic simulation.

Training

The military first used simulators during World War I to train pilots. Since then, the use of simulators has grown substantially within military and civilian markets.

There are many benefits associated with using simulators to train and practice. It is much less expensive to "fly" a flight simulator than it is to fly a real plane. Maneuvers can be practiced that require a high degree of precision; these maneuvers would be dangerous to attempt unpracticed. In addition, dangerous situations can be simulated that would rarely be encountered in a real aircraft, such as an engine failure.


Conceptual Application: Live Sound at the Ear Canal Night Club

This section describes a conceptual night club, the Ear Canal, which has a computer controlled sound system incorporating many of the advanced human interface components discussed earlier.

Human Interface Adaptability to System Operator Skill Level

The sound system at the Ear Canal can be adapted to the skill level of the person(s) operating it. In particular, the system has three standard interface modes: novice, expert, and privileged.

Novice. In novice mode, most of the sound system's functions are hidden from the operator. A very simple interface is provided, with controls resembling a standard home entertainment system.

Expert. In expert mode, all sound system functions are available to the operator, though some functions have restricted operating ranges. For example, the power amplifiers can not be adjusted beyond safe levels, limiters can not be uncalibrated, anti-feedback equalization can not be adjusted, and so on.

Privileged. In privileged mode, all sound system functions are fully accessible. The operator may adjust any parameter in the system, through its full range. Privileged mode is typically reserved for the chief engineer of the system, and is restricted from typical operators (even experts) to protect the calibrated settings of the system.

The adaptable human interface guarantees that any operator will feel comfortable operating the system. It also protects the system from unintentional (or intentional) abuse.
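A minimal sketch of how the three modes might gate parameter ranges; the parameter names, ranges, and clamping policy are invented for illustration.

```python
# Hypothetical sketch: each mode exposes a safe range per parameter, and
# requests outside that range are clamped. A zero-width range models a
# function hidden from novices.

MODES = ("novice", "expert", "privileged")
MODE_LIMITS = {
    # parameter: (novice range, expert range, privileged range)
    "amp_level_db": ((-40, -10), (-60, 0), (-60, 10)),
    "eq_band_db":   ((0, 0),     (-6, 6),  (-15, 15)),
}

def set_parameter(mode, name, value):
    lo, hi = MODE_LIMITS[name][MODES.index(mode)]
    return min(max(value, lo), hi)   # clamp to the mode's safe range

print(set_parameter("expert", "amp_level_db", 6))      # clamped to 0
print(set_parameter("privileged", "amp_level_db", 6))  # allowed: 6
```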

Expert Systems

Recently, an expert system was added to the human interface. The expert system uses artificial intelligence to learn how people operate the system, and will eventually take over many of the routine and non-creative tasks of the system. For example, the expert system has already learned that whenever the clip lights on an amplifier illuminate for an extended time, the operator turns down the level in that channel. In the future, the expert system can perform that task automatically, freeing the system operator to perform other tasks. The expert system has also learned that strong spectral energy in a very small bandwidth (which the human operator knows as feedback) is generally notched out with a parametric equalizer. The expert system could carry out this operation automatically as well (both behaviors are sketched below).

In the future, after the expert system has learned many more tricks of the trade, the novice operator will be able to operate the system with even better results, as the underlying expert system automatically performs most of the work. The expert operator benefits as well, since more time is available to be creative and less time is required for the logistics of controlling the system.
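A minimal sketch of the two learned behaviors described above, written as simple hand-coded rules rather than a learning system; the thresholds, data shapes, and function names are invented for illustration.

```python
# Hypothetical sketch of the two learned behaviors: pull a channel's level
# down when its clip indicator stays lit, and notch narrowband energy that
# looks like feedback. All thresholds are illustrative.

def watch_clips(channel_clip_seconds, levels_db):
    """Turn down any channel whose clip light has been on too long."""
    for ch, lit in channel_clip_seconds.items():
        if lit > 2.0:                 # "extended time", per the operator
            levels_db[ch] -= 3.0      # the routine fix a human would apply
    return levels_db

def watch_feedback(spectrum):
    """Return frequencies to notch: narrow peaks far above their neighbors."""
    notches = []
    for i in range(1, len(spectrum) - 1):
        freq, level = spectrum[i]
        neighbors = (spectrum[i - 1][1] + spectrum[i + 1][1]) / 2
        if level - neighbors > 20:    # strong energy in a very small band
            notches.append(freq)
    return notches

print(watch_clips({"vocal": 3.5, "bass": 0.0}, {"vocal": 0.0, "bass": -6.0}))
print(watch_feedback([(900, -30), (1000, -5), (1100, -28)]))  # -> [1000]
```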

Motion Sensors and Spatial Trackers

The human interface to the Ear Canal sound system uses a number of motion sensors and spatial trackers. These components are used by the main house engineer, the monitor engineer, the lighting engineer, and performers.

House Engineer. The house engineer wears one six dimensional spatial tracker on his hand. This tracker reports hand movement to the sound system control computer. Data from the spatial tracker is used to operate many of the parameters and functions in the sound system. For example, equalization is adjusted by moving the hand horizontally (to select frequency) and vertically (to select level at the current frequency); a sketch of this mapping appears at the end of this section. The mix is adjusted by pointing at a sound source and raising the arm. Virtual knobs can be turned by twisting the wrist. Almost any physical (real) control can be represented by a virtual one and adjusted by tracking hand movement.

Monitor Engineer. Like the house engineer, the monitor engineer wears a six dimensional spatial tracker on his hand, allowing him to adjust signal processing and virtual controls. In addition, he wears another spatial tracker on his head so the sound system can determine which direction he is looking. Since the monitor engineer is located close to the stage, discrete readings of head position can be correlated to performers on the stage. Therefore, when the monitor engineer looks at a performer, his head position reveals who that performer is, and the sound system knows which monitor mix to apply the corresponding control operations to.

Lighting Engineer. The lighting engineer stands in front of a small video camera, which feeds an image to a personal computer. The computer processes the video image, and tracks the motion of the lighting engineer's body. As the engineer dances to the music, the computer recognizes his movements and controls the lights over the network. The human interface to the system allows the engineer to control the system creatively. Indeed, control of the system is an art in itself. Sometimes the camera is aimed at the audience, and the crowd's motion takes control of the lights. This is a very popular effect, and always gets the crowd excited.

Performers. Performers use motion sensors and spatial trackers in many configurations. The video motion sensor used by the lighting engineer can be applied to algorithmic music composition, effects, and so on. Drummers especially enjoy drumming in front of the camera to trigger virtual percussive devices through MIDI. Six dimensional spatial trackers can be applied to performers' bodies and instruments to control many different aspects of the performance. A favorite application of one regular rock band is to affix a six dimensional spatial tracker to the neck of the lead guitar. As the lead guitarist leans back, sustain is automatically turned up. As she tilts the neck up, level, equalization, and distortion are adjusted to create a piercing lead rock guitar sound.
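Referring back to the house engineer's equalizer gesture, here is a minimal sketch of mapping tracked hand position to an equalizer frequency and level; the ranges and the log-frequency mapping are assumptions for illustration.

```python
# Hypothetical sketch of the house engineer's EQ gesture: hand position
# from the spatial tracker selects frequency (horizontal) and boost/cut
# (vertical). Ranges and mapping are invented for illustration.

import math

FREQ_LO, FREQ_HI = 20.0, 20000.0     # audible band, hertz
GAIN_RANGE_DB = 12.0                 # +/- 12 dB of boost or cut

def hand_to_eq(x_norm, y_norm):
    """x_norm, y_norm in [0, 1]: tracker position scaled to the work area."""
    # Sweep frequency logarithmically so each octave gets equal hand travel.
    freq = FREQ_LO * math.exp(x_norm * math.log(FREQ_HI / FREQ_LO))
    gain_db = (y_norm - 0.5) * 2 * GAIN_RANGE_DB
    return freq, gain_db

print(hand_to_eq(0.5, 0.75))  # mid-travel, hand raised: ~632 Hz, +6 dB
```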

Transparent Head Mounted Displays (THMDs)

Sound engineers perform several tasks at one time: adjusting mix levels, watching meters, adjusting parameters on signal processors, watching performers for visual cues, and so on. It is not possible to see all things simultaneously, since they are located in different places. For this reason, house and monitor engineers at the Ear Canal wear THMDs.

The THMD decouples visual information from the actual sources, so the operator can be located anywhere (perhaps in the "sweet spot" in the live sound arena). The operator sees the real world through the display (after all, it is transparent), with overlaid live or computer-generated images. These images can be human interface menus, representations of components of the system, indicators, or live video of action in another part of the venue. The sound system operator can interact with the sound system equipment and watch the performers simultaneously.

A THMD combined with a data glove equipped with a six dimensional spatial tracker provides the engineers with a very powerful human interface to the system. The THMD provides visual feedback from the system, while the glove allows the operator to communicate her intentions to the system. For example, the engineer may see a picture of each sound source, scaled in size to represent its current volume setting. The operator can then point to a sound source, which is recognized by virtue of the six dimensional spatial tracker. When she raises her hand, the volume of the selected source increases. When she lowers her hand, the volume decreases.

Spatial Sound

The Ear Canal has a state-of-the-art three dimensional sound localization system. Sounds are placed in three dimensional space by positioning icons representing each source (viewed in the THMD) with the data glove. There are also a number of preset effects, such as ping pong, wave, spiral, random walk, etc., which move sounds around the venue. True three dimensional spatialized sound is always a crowd pleaser at the Ear Canal.

Three dimensional sound is not only used as an effect; it is also helpful for remote control of the system. The house engineer can monitor the sound in any position in the room by localizing her headphones within the virtual space. A sophisticated room model is incorporated to include the known reflections and modes of the room, so the artificially localized sound is quite accurate.

Remote Control

Since all equipment in the Ear Canal sound system is digitally controlled, it can also be remotely controlled through the communications network. A standard modem allows the entire system to be remotely controlled from an off-site location. The modem allows an expert located off-site to monitor the work of inexperienced operators, adding helpful input as necessary, as well as allowing the system's chief engineer to perform weekly diagnostic tests from his lab across town. House calls to reset ailing equipment take on a whole new meaning.

Summary

The age of computer controlled sound systems is upon us, with far reaching benefits. However, until new, improved human interfaces are evolved and incorporated into sound systems, the potential power of computer control will not be fully realized. Today's human interface paradigms do not allow the human to enjoy complete creative freedom within the domain of controlling the system.

Human factors researchers are inventing and investigating new and unique technologies that allow humans to interact with computers through the human's natural senses. In the near future, many of these interface technologies will be available for controlling sound systems. Ultimately, the human operator will be liberated from routine, logistical tasks, and will be free to perform creative tasks easily and intuitively.

The technologies discussed in this paper can improve the human interface to sound systems, with tremendous benefit to the entire system. As costs come down and more durable implementations become available, advanced human interfaces for sound system control will proliferate, and sound systems will become mature members of the information age.

Acknowledgments

The authors acknowledge many stimulating conversations related to the topics in this paper with the following individuals: Colin Bricken, William Bricken, Garrott Cobarr, Geoff Coco, Brian Kart, Mark Lacas, Philip "Random" Reay, Rick Spirtes, Steve Tumidge, and David Warman.
