
Auraglyph
Handwriting Input for Computer-based Music Composition and Design

Spencer Salazar
[email protected]

Ge Wang
[email protected]

Center for Computer Research in Music and Acoustics (CCRMA)
Stanford University
Stanford, CA 94305

Figure 1: Auraglyph in use.

ABSTRACT
Effective software interaction design must consider all of the capabilities and limitations of the platform for which it is developed. To this end, we propose a new model for computer music design on touchscreen devices, combining both pen/stylus input and multitouch gestures. Such a model surpasses the barrier of touchscreen-based keyboard input, preserving the primary interaction of touch and direct manipulation throughout the development of a complex musical program. We have implemented an iPad software application applying these principles, called “Auraglyph.” Auraglyph offers a number of fundamental audio processing and control operators, as well as facilities for structured input and output. All of these software objects are created, parameterized, and interconnected via stylus and touch input. To enable these interactions, we have employed a pre-existing handwriting recognition framework, LipiTk, which is capable of recognizing both alphanumeric characters and arbitrary figures, shapes, and patterns.

Keywords
handwriting input, computer music systems, handwriting recognition, touchscreen, tablet computing

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
NIME’14, June 30 – July 03, 2014, Goldsmiths, University of London, UK.
Copyright remains with the author(s).

1. INTRODUCTION
Touch interaction has profoundly altered the landscape of mainstream computing in the early 21st century. Since the introduction of the iPhone in 2007 and the iPad in 2010, numerous touchscreen devices have entered the popular consciousness – mobile phones, tablet computers, smart watches, and desktop computer screens, to name a few. New human-computer interaction paradigms have accompanied these hardware developments, addressing the complex shift from classical keyboard-and-mouse computing to multitouch interaction.

We propose a new model for touchscreen interaction with musical systems: the combined use of stylus-based handwriting input and direct touch manipulation. This approach provides a number of advantages over existing touchscreen paradigms for music. Stylus input, complemented by modern digital handwriting recognition techniques, serves several roles in this scheme. Firstly, it replaces the traditional role of keyboard-based text/numeric entry with handwritten letters and numerals. It additionally allows for modal entry of generic shapes and glyphs, for example, canonical oscillator patterns (sine wave, sawtooth wave, square wave, etc.) or other abstract symbols. Finally, the stylus provides precise graphical free-form input for data such as filter transfer functions, envelopes, and parameter automation curves. Multitouch finger input continues to provide functionality that has become expected of touch-based software, such as direct movement of on-screen objects, interaction with conventional controls (sliders, buttons, etc.), and other manipulations. Herein we discuss the design, prototyping, and evaluation of such a system, which we have named “Auraglyph.”


2. RELATED WORK
Handwriting recognition has previously found limited use in interactive computer music research. Most notably, Garcia et al. studied the use of handwritten sketches and outlines as a preliminary exercise in electronic composition [5]. This study was then used as the basis for mapping the strokes of a camera-augmented pen to real-time musical synthesis, aiding the compositional process. Graphical manipulation of computational data via pen or stylus, augmented by computer intelligence, goes back as far as Sutherland’s Sketchpad [13] and the RAND Tablet by Davis et al. [3], which incorporated the Graphical Input Language (GRAIL) by Ellis et al. [4]. GRAIL is the first system to our knowledge in which computer programs are created via stylus input. In GRAIL, hand-drawn figures are automatically recognized and converted to meaningful symbols within the system, which may then be further modified with stylus gestures, such as drawing connections between nodes and scratching out symbols to erase them.

Recent developments in sketch-oriented computing include Landay’s SILK, a software prototyping system in which user interfaces are hand-sketched with a digital tablet [7]. Where appropriate, sketched graphical widgets (buttons, sliders, etc.) are recognized as such, and converted into actual interactive widgets. Interestingly, recognized widgets are not transformed into any canonical visual representation; rather, they retain the idiosyncrasies and hand-drawn characteristics of the original sketched figures. The SILK system further employs stylus gestures to describe transitions between different parts of the interface, offering an extent of programmability for GUI interactions.

Object graph-based programming languages such as Pure Data [12] and Max/MSP [19] have tremendously influenced the space of visual computer music design and composition. More recently, the Kronos programming language extended functional programming models to a block diagram-based visual space in the context of real-time music composition and performance [11]. Mira, an iPad application, dynamically replicates the interface of desktop-based Max/MSP programs, joining conventional music software development with touch interaction [14]. The Reactable [6] and the work of Davidson and Han [2] have both provided significant foundational work in interacting with digital music via touch and direct manipulation.

Languages designed (or repurposed) for live-coding, such as SuperCollider [9] and ChucK [15], suggest interesting possibilities for the expressive use of natural input for music programming. Such systems also constitute the modern incarnation of unit generator-based sound design and synthesis, to which this work owes much of its technical underpinnings. Live-coding practice in general is discussed by Collins et al. [1] and Wang and Cook [17].

Work from the commercial iPhone developer Smule has explored the space of possibilities for musically enabling mobile touchscreen devices [18]. Smule’s Ocarina is both an iPhone musical instrument and a musical/social experience, in which performers tune in to each others’ musical renditions around the world [16]. Ocarina’s feature set (four finger-sized tone-holes on the touchscreen, breath input for dynamic control, tilt-based vibrato, and location-based social networking) is tailored specifically to the technical capabilities, physical dimensions, and limitations of the iPhone. This approach to design — the deliberate consideration of what interactions leverage the characteristic features of a technological platform, and of what interactions are distinctly unsuitable — has directly influenced the development of Auraglyph.

3. A CASE FOR HANDWRITING INPUT IN COMPUTER MUSIC DESIGN

This work is motivated by the desire to better understand the distinguishing capabilities and limitations of touchscreen technology, and, using these as guiding principles, to enable expressive musical interactions on such devices. Complex software developed for a given interaction model — such as keyboard-and-mouse — may not successfully cross over to a different interaction model — such as a touchscreen device.

The initial insight leading to this work was that numeric and text input on a touchscreen might be more effectively handled by recognizing hand-drawn numerals and letters, rather than an on-screen keyboard. We soon realized that we could use handwriting recognition to analyze a substantial number of handwritten figures and objects beyond just letters and numbers. A user might then draw entire object graphs and audio topologies to be realized as an audio system or musical composition by the underlying software application in real-time.

As evidenced by the previously mentioned work of Garcia et al., handwritten planning is a vital component of many compositional processes, whether the resulting musical product is electronic, acoustic, or both. More generally, writing and drawing with a pen or pencil on a flat surface is a foundational expressive interaction for an incredible number of individuals; this activity is continuously inculcated from early childhood around the world. Therefore, sketching, as a natural interaction for expressing ideas in the real world, might be apt for realizing them in a virtual world. The original work described herein seeks to apply this to the context of computer music and audio design. By shortening the distance between a user’s abstract musical thought and the audible realization of that thought, handwriting input might arm its users to more effectively express those thoughts.

Another distinct advantage of handwriting input in this context is the ability to run and evaluate musical constructs in real-time. As in Landay’s SILK, sketches in Auraglyph are reified into entities specific to the system (e.g., a drawn object might be converted to a sine wave generator, or a timer). These entities can then present controls and interfaces for direct touch manipulation or stylus gestures, customized to that object type or mode. This level of real-time creation and manipulation affords a composer or programmer expressive control similar to live-coding.

4. INTERACTING WITH AURAGLYPH
As a framework to prototype and evaluate the concepts discussed above, and for use in composition and sound design, we developed Auraglyph, a software application for iPad. A user interacts with Auraglyph using an iPad-compatible stylus (many models of which are widely available) and traditional touch interaction.

The basic environment of Auraglyph is an open, scrollable canvas in which the user freely draws. Using a variety of pen strokes, a user creates interactive objects (such as unit generators, control rate processors, and input/output controls), sets parameters of these objects, and forms connections between them. Collectively, these objects and their interconnections form a patch. After a user completes a pen stroke (a single contour between touching the pen to the screen and lifting it off the screen), it is matched against the set of base object glyphs available in the main canvas, via a handwriting recognition algorithm (discussed in Section 4.2). Main canvas objects whose glyphs can be matched include an audio rate processor (unit generator), control rate processor, input, or output (see Section 4.1). If the stroke matches an available glyph, the user’s stroke is replaced by the actual object.


Figure 2: A full Auraglyph patch, with annotations.

Unmatched strokes remain on the canvas, allowing the user to embellish the canvas with freehand drawings.
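The paragraph above describes the complete lifecycle of a pen stroke: classify it, and either replace it with a base object or keep it as drawing. The following minimal C++ sketch illustrates that flow; the Recognizer callback, confidence threshold, and Canvas structure are illustrative assumptions, not Auraglyph’s actual API.

    #include <functional>
    #include <string>
    #include <vector>

    struct Point { float x, y; };
    using Stroke = std::vector<Point>;

    struct Match { std::string glyph; float confidence; };   // e.g. {"circle", 0.93}
    using Recognizer = std::function<Match(const Stroke &)>;

    struct CanvasObject { std::string type; Point position; };

    struct Canvas {
        std::vector<CanvasObject> objects;   // recognized base objects
        std::vector<Stroke> freehand;        // unmatched strokes kept as drawings
    };

    // Called once per completed pen stroke (pen down ... pen up).
    void onStrokeCompleted(const Stroke &stroke, const Recognizer &recognize,
                           Canvas &canvas)
    {
        const float kMinConfidence = 0.7f;   // assumed acceptance threshold
        Match m = recognize(stroke);
        if (!stroke.empty() && m.confidence >= kMinConfidence) {
            // Replace the stroke with the matched base object (Section 4.1).
            canvas.objects.push_back({m.glyph, stroke.front()});
        } else {
            // Unmatched strokes remain on the canvas as freehand drawing.
            canvas.freehand.push_back(stroke);
        }
    }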

Tapping and holding an object will open a list of parameters for that object (Figure 3). Selecting a parameter from this list opens a control in which the user writes a number to set the parameter’s value. This value can then be accepted or discarded, or the user can cancel setting the parameter entirely.
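As a rough illustration of this accept/discard flow, the sketch below accumulates recognized digit glyphs into a string and commits the parsed value only on acceptance; the names and structure are hypothetical, not drawn from Auraglyph’s source.

    #include <cstdlib>
    #include <string>

    struct ParameterEditor {
        std::string digits;   // handwritten characters recognized so far, e.g. "440"

        void onGlyphRecognized(const std::string &glyph) { digits += glyph; }  // "0"-"9" or "."

        // Accept: parse the handwritten number and write it to the parameter.
        bool accept(float &parameter) const {
            if (digits.empty()) return false;        // nothing written yet
            parameter = std::strtof(digits.c_str(), nullptr);
            return true;
        }

        // Discard/cancel: clear the buffer; the parameter is left untouched.
        void discard() { digits.clear(); }
    };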

Every base object may have inputs, an output, or both. These appear visually as small circles, or nodes, on the perimeter of the object. Drawing a stroke from an input node to an output node, or vice versa, forms a connection between those two objects. For example, connecting a sawtooth object’s output to the freq input of a sine object creates a simple FM (frequency modulation) patch, with the sine as the carrier wave and the sawtooth as the modulator. Most objects have only one output source, but an input node may have several destinations within that object (e.g., frequency, amplitude, or phase of a given oscillator). In such cases, a pop-up menu appears from the node to display the options a user has for the input destination.
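The connection just described defines a small signal-flow graph: the sawtooth’s output feeds the freq destination of the sine’s input node. A minimal sketch of that rendering relationship is shown below; the object structure, naive waveforms, and modulation depth are assumptions for illustration only, not Auraglyph’s synthesis code.

    #include <cmath>
    #include <vector>

    const float kSampleRate = 44100.0f;
    const float kTwoPi = 6.2831853f;

    struct Saw {                       // modulator
        float freq = 2.0f, phase = 0.0f;
        float tick() {
            float out = 2.0f * phase - 1.0f;          // naive sawtooth in [-1, 1]
            phase += freq / kSampleRate;
            if (phase >= 1.0f) phase -= 1.0f;
            return out;
        }
    };

    struct Sine {                      // carrier
        float freq = 440.0f, phase = 0.0f;
        float tick(float freqMod) {
            float out = std::sin(kTwoPi * phase);
            phase += (freq + freqMod) / kSampleRate;
            if (phase >= 1.0f) phase -= 1.0f;
            return out;
        }
    };

    // Render one block: the saw's output is routed to the sine's freq input,
    // exactly as the drawn connection specifies (100 Hz deviation is assumed).
    std::vector<float> renderBlock(Saw &modulator, Sine &carrier, int nFrames) {
        std::vector<float> block(nFrames);
        for (int i = 0; i < nFrames; ++i)
            block[i] = carrier.tick(100.0f * modulator.tick());
        return block;
    }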

Objects and freehand drawings can be moved around on the canvas by touching and dragging them, a familiar gesture in the touchscreen software ecosystem. While dragging an object, moving the pen over a delete icon in the corner of the screen will remove that object, along with destroying any connections between it and other objects. Connections can be removed by grabbing them with a touch and then dragging them until they “break.” The entire canvas may be scrolled through using a two-finger touch, allowing for patches that extend well beyond the space of the tablet’s screen.

4.1 Objects
Four base types of objects can be drawn to the main canvas: audio-rate processors (unit generators; represented by a circle), control-rate processors (represented by a square), inputs (a downward-pointing triangle), and outputs (an upward triangle) (Figure 4). Unit generators currently available include basic oscillators (sine, sawtooth, square, and triangle waves, plus hand-drawn wavetables), an audio file player, envelopes (linear and ADSR), filters (resonant low-pass, high-pass, band-pass, band-reject, parametric EQ, or hand-drawn transfer function), and a reverberator. Control-rate processors include timers, counters, mathematical formulae, discrete time-step sequencers, a pitch-to-frequency converter, and hand-drawn parameter control curves. Inputs consist of on-screen keyboard controllers, graphical sliders, and buttons. Outputs consist of numeric displays and binary “LED” indicators.

Figure 3: Modifying the “freq” parameter of a unit generator with handwritten numeric input.

Figure 4: Base objects (left to right): unit generator, output, input, control-rate processor.
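For reference, the four base object types and the glyphs that create them can be summarized as a simple mapping, as in the sketch below; the enum and glyph labels are illustrative rather than taken from Auraglyph’s source.

    #include <map>
    #include <string>

    enum class BaseType {
        UnitGenerator,      // audio-rate processor, drawn as a circle
        ControlProcessor,   // control-rate processor, drawn as a square
        Input,              // drawn as a downward-pointing triangle
        Output              // drawn as an upward-pointing triangle
    };

    const std::map<std::string, BaseType> kGlyphToBaseType = {
        { "circle",        BaseType::UnitGenerator },
        { "square",        BaseType::ControlProcessor },
        { "triangle_down", BaseType::Input },
        { "triangle_up",   BaseType::Output },
    };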

After creating a base object, a drawing area and menu open to allow the user to select an object sub-type. In some cases, a glyph for the sub-type can be directly drawn by the user to select that object. For example, basic oscillators and filter types lend themselves to simple glyph patterns. A user can create a sine wave oscillator by simply drawing a circle (creating a generic unit generator object) and then drawing a sine wave within it.

Some object sub-types do not have straightforward glyph mnemonics, for example a sequencer or a reverberator. To support such objects, a menu appears after creating a base object, displaying a list of sub-types to select from. For completeness, objects that can be drawn as corresponding glyphs are also listed in this menu.

Some objects have more advanced needs for modifying values beyond the standard parameter editor. For example, the wavetable oscillator’s table parameter brings up a large input space for precisely drawing the desired waveform, as does the xferFn (transfer function) parameter of the hand-drawn filter. The formula object eschews the standard parameter editor entirely, instead directly soliciting handwriting input of a sequence of mathematical operations (currently addition, subtraction, multiplication, and division are supported, with one input and one output).
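The behavior of the formula object as described above, a single input transformed by a left-to-right chain of the four operations, could be modeled as in the following sketch; this is an illustration of the described behavior, not Auraglyph’s implementation.

    #include <vector>

    struct Op { char symbol; float operand; };   // symbol is '+', '-', '*', or '/'

    // Apply the handwritten operations to the object's single input, left to right.
    float evalFormula(float input, const std::vector<Op> &ops) {
        float value = input;
        for (const Op &op : ops) {
            switch (op.symbol) {
                case '+': value += op.operand; break;
                case '-': value -= op.operand; break;
                case '*': value *= op.operand; break;
                case '/': value /= op.operand; break;
            }
        }
        return value;
    }
    // e.g. evalFormula(x, {{'*', 2.0f}, {'+', 60.0f}}) computes (x * 2) + 60.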

4.2 Implementation of Handwriting Recognition
The concepts presented herein are generally invariant to the underlying algorithms and framework for handwriting recognition, but these topics merit discussion with regard to our own implementation in Auraglyph. The handwriting recognition engine used by Auraglyph is LipiTk [8], a comprehensive open-source project for handwriting recognition research. LipiTk is not natively designed to function with iPad applications, but we extended it to do so with straightforward code changes and additions.

LipiTk’s default configuration uses dynamic time warping (DTW) [10] and nearest-neighbor classification (k-NN) to match pen strokes to a pre-existing training set of possible figures.


The result of this procedure is one or more “most likely” matches, along with confidence ratings for each match. We have found the speed and accuracy of LipiTk in this configuration to be satisfactory for real-time usage, though a slight but noticeable delay exists between finishing a stroke and the successful recognition of that stroke.
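For readers unfamiliar with this matching scheme, the sketch below shows dynamic time warping used as a stroke-to-stroke distance and a simple nearest-neighbor search over the training set. It is a compact illustration of the general approach only; LipiTk’s actual pipeline (feature extraction, preprocessing, k greater than 1, and confidence estimation) is more involved.

    #include <algorithm>
    #include <cmath>
    #include <limits>
    #include <string>
    #include <vector>

    struct Point { float x, y; };
    using Stroke = std::vector<Point>;

    static float pointDistance(const Point &a, const Point &b) {
        return std::hypot(a.x - b.x, a.y - b.y);
    }

    // Classic O(n*m) dynamic time warping over raw stroke points.
    float dtwDistance(const Stroke &a, const Stroke &b) {
        const size_t n = a.size(), m = b.size();
        const float INF = std::numeric_limits<float>::infinity();
        std::vector<std::vector<float>> D(n + 1, std::vector<float>(m + 1, INF));
        D[0][0] = 0.0f;
        for (size_t i = 1; i <= n; ++i)
            for (size_t j = 1; j <= m; ++j)
                D[i][j] = pointDistance(a[i - 1], b[j - 1]) +
                          std::min({ D[i - 1][j], D[i][j - 1], D[i - 1][j - 1] });
        return D[n][m];
    }

    struct Example { Stroke stroke; std::string label; };   // one training rendition

    // 1-nearest-neighbor classification against the training set.
    std::string classifyStroke(const Stroke &query, const std::vector<Example> &training) {
        std::string best;
        float bestDistance = std::numeric_limits<float>::infinity();
        for (const Example &e : training) {
            float d = dtwDistance(query, e.stroke);
            if (d < bestDistance) { bestDistance = d; best = e.label; }
        }
        return best;
    }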

Before they can be used to classify figures of unknown types, the recognition algorithms incorporated into LipiTk must be primed with a set of “training examples” for each possible figure to be matched. This training set is typically created before the software is released by test users, who draw multiple renditions of each figure into a specialized training program. This training program serializes the salient features of each figure into a database, which is distributed with the application itself.

In our experience, LipiTk’s recognition accuracy depends strongly on the quality, size, and diversity of the training set. For instance, a version of our handwriting database trained solely by right-handed users suffered reduced accuracy when used by a left-handed user. A comprehensive training set would need to encompass strokes from a range of individuals of varying handedness and writing style. Interestingly, though, LipiTk’s algorithms are able to adapt dynamically to new training examples. An advanced system might gradually adjust to a particular user’s handwriting eccentricities over time, forming an organically personalized software interaction. Auraglyph takes advantage of this feature to a small degree, allowing a user to add new training strokes via a separate training interface.
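The adaptation path mentioned above can be as simple as appending the user’s new stroke to the training set, so that subsequent nearest-neighbor lookups can match that user’s writing style. The sketch below repeats the stroke types from the previous listing so it stands alone, and adds a fixed-length resampling step as a stand-in for whatever preprocessing the real toolkit performs; persistence of the database is omitted.

    #include <string>
    #include <vector>

    struct Point { float x, y; };
    using Stroke = std::vector<Point>;
    struct Example { Stroke stroke; std::string label; };

    // Resample a stroke to a fixed number of points (simple index-based decimation).
    Stroke resample(const Stroke &s, size_t n) {
        Stroke out;
        if (s.empty() || n == 0) return out;
        for (size_t i = 0; i < n; ++i) {
            size_t j = (n > 1) ? i * (s.size() - 1) / (n - 1) : 0;
            out.push_back(s[j]);
        }
        return out;
    }

    // A stroke written in the training interface becomes a new example,
    // immediately available to the classifier.
    void addTrainingExample(std::vector<Example> &training,
                            const Stroke &stroke, const std::string &label) {
        training.push_back({ resample(stroke, 64), label });
    }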

5. CONCLUSIONS
While effective in replacing the keyboard as a textual/numeric input device, touchscreen stylus input also affords a broader set of interactions apt for computer music. We have developed an iPad software application, Auraglyph, to leverage these principles into an interactive music environment. We intend to release Auraglyph under an open source license when it is reasonably mature and bug-free, via the project website.¹

¹ https://ccrma.stanford.edu/~spencer/auraglyph/

This system has fulfilled our goals of expressivity and versatility, but we believe this conceptual framework provides many more opportunities for computer music. Stylus input might be extended to traditional music notation, more complex mathematical formulae, or even Turing-complete programming code, whose digital representations can then be refined, augmented, and extended with direct manipulation and touch input. We believe that this is only the beginning for handwritten computer music.

6. REFERENCES
[1] N. Collins, A. McLean, J. Rohrhuber, and A. Ward. Live coding in laptop performance. Organised Sound, 8(3):321–330, 2003.
[2] P. L. Davidson and J. Y. Han. Synthesis and control on large scale multi-touch sensing displays. In Proceedings of the International Conference on New Interfaces for Musical Expression, pages 216–219. IRCAM/Centre Pompidou, 2006.
[3] M. R. Davis and T. Ellis. The RAND tablet: A man-machine graphical communication device. In Proceedings of the October 27–29, 1964, Fall Joint Computer Conference, Part I, AFIPS '64 (Fall, Part I), pages 325–331, New York, NY, USA, 1964. ACM.
[4] T. O. Ellis, J. F. Heafner, and W. Sibley. The GRAIL language and operations. Technical report, DTIC Document, 1969.
[5] J. Garcia, T. Tsandilas, C. Agon, W. Mackay, et al. InkSplorer: Exploring musical ideas on paper and computer. In Proceedings of the International Conference on New Interfaces for Musical Expression, 2011.
[6] S. Jorda, G. Geiger, M. Alonso, and M. Kaltenbrunner. The reacTable: Exploring the synergy between live music performance and tabletop tangible interfaces. In Proceedings of the 1st International Conference on Tangible and Embedded Interaction, TEI '07, pages 139–146, New York, NY, USA, 2007. ACM.
[7] J. A. Landay. Interactive sketching for the early stages of user interface design. PhD thesis, Carnegie Mellon University, 1996.
[8] S. Madhvanath, D. Vijayasenan, and T. M. Kadiresan. LipiTk: A generic toolkit for online handwriting recognition. In ACM SIGGRAPH 2007 Courses, page 13. ACM, 2007.
[9] J. McCartney. Rethinking the computer music language: SuperCollider. Computer Music Journal, 26(4):61–68, 2002.
[10] R. Niels, L. Vuurpijl, et al. Using dynamic time warping for intuitive handwriting recognition. In Advances in Graphonomics, Proceedings of the 12th Conference of the International Graphonomics Society, pages 217–221, 2005.
[11] V. Norilo. Visualization of signals and algorithms in Kronos. In Proceedings of the International Conference on Digital Audio Effects, York, U.K., 2012.
[12] M. Puckette et al. Pure Data: Another integrated computer music environment. In Proceedings of the Second Intercollege Computer Music Concerts, pages 37–41, 1996.
[13] I. E. Sutherland. Sketch Pad: A man-machine graphical communication system. In Proceedings of the SHARE Design Automation Workshop, pages 6–329. ACM, 1964.
[14] S. Tarakajian, D. Zicarelli, and J. K. Clayton. Mira: Liveness in iPad controllers for Max/MSP. In Proceedings of the International Conference on New Interfaces for Musical Expression, Daejeon, Korea, 2013.
[15] G. Wang. The ChucK Audio Programming Language: A Strongly-timed and On-the-fly Environ/Mentality. PhD thesis, Princeton University, Princeton, NJ, USA, 2008.
[16] G. Wang. Ocarina: Designing the iPhone's magic flute. Computer Music Journal, 38(2), 2014.
[17] G. Wang and P. R. Cook. On-the-fly programming: Using code as an expressive musical instrument. In Proceedings of the International Conference on New Interfaces for Musical Expression, pages 138–143. National University of Singapore, 2004.
[18] G. Wang, G. Essl, J. Smith, S. Salazar, P. Cook, R. Hamilton, R. Fiebrink, J. Berger, D. Zhu, M. Ljungstrom, et al. Smule = sonic media: An intersection of the mobile, musical, and social. In Proceedings of the International Computer Music Conference, pages 16–21, 2009.
[19] D. Zicarelli. An extensible real-time signal processing environment for MAX. In Proceedings of the International Computer Music Conference, 1998.
