R Programming for Music Informatics

Donald Byrd

rev. 21 March 2008

Copyright © 2006-08, Donald Byrd

Introducing R

• R is very interactive: instead of programming, can use as powerful graphing calculator
  – => easier to experiment with & learn, & useful that way
• R was originally designed for statistics
• Why R?
  – easy to do simple things with it
  – easy to do many fairly complex things, incl. graphs & handling audio files
    • probably not good for really complex programs
  – free, & available for all popular operating systems
  – very interactive => easy to experiment
  – has good documentation
  – In use in other Music Informatics classes, & standardizing is good

Getting started with R

• To get R
  – Web site: http://cran.us.r-project.org/
  – Has lots of documentation (tutorials, manuals, etc.), too… though most isn’t for beginners
  – Versions for Linux, Mac OS X, Windows
  – On all(?) STC computers
• Tutorial:
  • http://www.informatics.indiana.edu/donbyrd/Teach/RTools+Docs//R_tutorial_DAB.txt
• Can use R interactively as a powerful graphing, musicing, etc. calculator
• …but it’s not perfect: sometimes very cryptic

Programming in General (1)

• Details are often vital (& errors are costly)
  – A great many details really are. Commonly:
    • Quote marks, including single vs. double
    • Capitalization
      – “Wav” & “wav” are different
  – TIP: “steal” as much as possible!
    • Via Copy & Paste is ideal: avoids typos
• Programs tend to be very hard to understand
  – TIP: include useful, readable comments
  – TIP: choose variable names for clarity
    • “wavdata” isn’t good; how about “samples”?
  – TIP: consistency helps clarity and correctness
    • Don’t mix “v = expr”, “v <- expr”, and “expr -> v”
    • Use the same variable name for something in every prog.
• Program defensively
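
A minimal sketch of the details mentioned above (capitalization, quote marks, and the three assignment forms). The variable names are made up for illustration only.

```r
# R is case-sensitive: these are two unrelated variables.
wav <- 1
Wav <- 2

# All three assignment forms do the same thing; mixing them hurts readability.
a = 10
b <- 10
10 -> c3

# Single and double quotes both delimit strings, but the pair must match.
ext1 <- "wav"
ext2 <- 'wav'
identical(ext1, ext2)    # TRUE: same string either way
identical(ext1, "Wav")   # FALSE: capitalization differs
```

Picking one assignment form and using it throughout makes it easier to spot the lines where a variable changes.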

Programming in General (2)

• Comments
  – Classic example of a bad comment
    • x <- x+1 # add 1 to x
  – Doesn’t explain anything!
• Good commenting style (thanks to Ed Wolf)
    # Using the Add Sines Demo, create and play a wave at G3,
    # then do the same for a wave at 5/4 this frequency. Finally,
    # normalize the sum of the two waves and listen to result.
    …
    # create and play first sound wave
    sndW <- sine(f, duration=secs, samp.rate=sr, bit=16, xunit="time")
    play(sndW)
    …

Programming in General (3)

• Block comments (w/ overall description) more important than comments on single stmts
• Ideal: say just the right things: not too much or too little
  – Basic principle of all human communication
  – …including this slide show & music notations (CMN, tablature, etc.)
  – …and comments in a program
• Other aspects of formatting & style
  – Variable names
    • Choose variable names for clarity
    • camelCase is helpful
  – Space around operators
    • “v <- f(expr)”, not “v<-f(expr)”

Programming in R (1)

• R offers to save workspace when you quit
  – Are you sure it’s what you want?
  – TIP: Just say no.
    • Can restore original with ‘load(".RData")’ or menu command
  – TIP: Use a text editor & files to save work
    • If real text editor (not word processor) file, can run with R “source” command
    • Regardless, can Copy & Paste, even just part of file
• setwd() to correct path for your computer
  – Depends on where you have files
  – Can be tricky, esp. in Windows
    • Typical Windows ex.: setwd("C:/Documents and Settings/donbyrd.ADS/Teaching/N560")
    • On Mac (& Windows?), can use “~/Teaching/N560”
    • …or drag & drop
    • …or use R GUI “Change Working Directory” menu command!
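
The workflow above (keep work in a plain-text file, set the working directory, run the file with source) can be sketched as follows. The script name is hypothetical, and tempdir() stands in for a real course folder so that the example runs anywhere.

```r
# Hypothetical script name; tempdir() stands in for your own folder.
workDir <- tempdir()
scriptPath <- file.path(workDir, "myFirstScript.R")

# Write a two-line program as plain text, as a text editor would save it.
writeLines(c("nNotes <- 12", "semitoneV <- 0:(nNotes - 1)"), scriptPath)

oldDir <- setwd(workDir)      # setwd() returns the previous working directory
source("myFirstScript.R")     # runs the file as if its lines had been typed in
setwd(oldDir)                 # restore the original working directory

nNotes                        # variables created by the script are now available
length(semitoneV)
```

Saving the value returned by setwd() makes it easy to go back to where you started.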

Programming in R (2)

• R has many useful built-in functions
  – Many of them handle vectors (no loop needed)
    • diff(v): vector of consecutive differences
    • sum(v): sum of vector elements
  – Random numbers with various distributions: runif (uniform), rnorm (normal), etc.
  – read.table, table (and related functions)
  – fft
  – tuneR adds sine, square, noise, bind, mono, etc.
• R (and tuneR) have excellent on-line help
  – Type either ‘help(sine)’ (e.g.) or ‘?sine’
    • …but NB: sometimes need ‘help("sine")’
  – TIP: Copy & Paste from help window!
  – Caveat: terminology is statistics oriented
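
A short sketch of a few of the built-in functions listed above, using base R only. The note numbers and variable names are illustrative assumptions, not taken from the course materials.

```r
noteV <- c(60, 62, 64, 65, 67)    # MIDI note numbers: C4 D4 E4 F4 G4

intervalV <- diff(noteV)          # consecutive differences, in semitones
totalSpan <- sum(intervalV)       # distance from first note to last

set.seed(1)                       # make the random numbers repeatable
randNoteV <- floor(runif(8, min = 60, max = 72))   # 8 random notes within one octave
countTbl <- table(randNoteV)      # how many times each note number occurred

intervalV
totalSpan
countTbl
```

None of these calls needs a loop: each function works on the whole vector at once.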

Programming in R (3)

• Besides built-in, functions can be user-written
  – Hard for many beginners; why?
  – Probably mostly confusion about variables (including parameters & return values)
• A simple but realistic example
    # Convert MIDI note number to frequency in Hertz.
    MIDINum2Freq <- function(noteNum) {
      freq <- 440*2^((noteNum-69)/12)
      return(freq)
    }
• Calling it
  – fr <- MIDINum2Freq(57) # Sets fr = 220
  – Inside function, parameter noteNum = 57, freq = 220; fr doesn’t exist (it’s out of scope)
  – Outside function, noteNum & freq don’t exist
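
The example above can be run as is. This sketch repeats it and adds two checks: that the parameter name is not visible outside the function, and that the same function accepts a whole vector of note numbers (the vector call is an addition, not part of the original slide).

```r
# Convert MIDI note number to frequency in Hertz.
MIDINum2Freq <- function(noteNum) {
  freq <- 440 * 2^((noteNum - 69) / 12)
  return(freq)
}

fr <- MIDINum2Freq(57)
fr                     # 220

# The parameter lives only inside the function.
exists("noteNum")      # FALSE (assuming you have not defined noteNum yourself)

# Because the arithmetic is vectorized, a vector of notes works too.
freqV <- MIDINum2Freq(c(57, 69, 81))
freqV                  # 220 440 880
```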

Programming in R (4)

• Introducing loops
  – Loops also hard for many beginners
  – Main reason is probably confusion re control variable
  – A very simple (though pointless) example
      mnnV <- 1:6 # make mnnV a 6-place vector
      mnnV # see what mnnV is before loop
      for (n in 1:6) {
        mnnV[n] <- n+59
      }
      mnnV # ...and after
  – Instead of “in 1:6”, can use any vector!
  – n (control variable) is meant for use only inside the loop (in R it actually still exists afterwards, holding its last value)
  – C, Perl, etc. users can put the vector in the “for”
    • for (n in seq(1, 6)) { …
  – Loop is a type of control statement
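
A runnable sketch of the loop above, followed by a loop over an arbitrary vector and a loop-free equivalent. The names scaleV, offset, and mnnNoLoopV are made up for illustration.

```r
mnnV <- 1:6                     # make mnnV a 6-place vector
for (n in 1:6) {
  mnnV[n] <- n + 59             # replace each element with a MIDI note number
}
mnnV                            # 60 61 62 63 64 65

# The loop can run over any vector, not just 1:6.
scaleV <- c(0, 2, 4, 5, 7)      # hypothetical offsets in semitones
for (offset in scaleV) {
  cat("MIDI note", 60 + offset, "\n")
}

# The first loop is "pointless" because vector arithmetic does the same job.
mnnNoLoopV <- (1:6) + 59
```

When a built-in vector operation exists, it is usually shorter and less error-prone than the loop.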

Software Engineering & Debugging (1)

• Experience: all complex programs have bugs
  – Judge in Florida e-voting case: claim that voting machine software was buggy is speculation
  – True, but… !
    • Disclaimer: I don’t know any hard evidence
• Expect bugs & program defensively
• True stories
  – The program that failed only on Wednesdays! Why?
    • Hint: “Wednesday” has 9 characters
  – Weeks of debugging to find a “1” that should have been “i”

Software Engineering & Debugging (2)

• Good engineering (design, coding, comments, etc.) => less debugging & more robust (reliable & flexible) programs

• Don’t underengineer
• …but don’t overengineer, either!
• Underengineering is much bigger danger for inexperienced programmers
• Main factors
  – Complexity of problem
  – Is program or code it includes likely to be used for very long?
    • If so, how expert are future programmers likely to be?

Software Engineering & Debugging (3)

• Standard technique: zero in on problem code
• Debug on short/simple cases, not long/complex ones
  – Makes it practical to look at results of several print statements
  – Reduces or eliminates long delays to see results
  – “short/simple” often means simply not much data
  – Can easily reduce days of debugging to hours
• Usually easy to do by turning lots of data into a little data
  – Real situation: nThemes <- 3500, or 20 sec. audio file
  – For testing: use nThemes <- 4 (say), or 1 sec. audio
  – Caveat! the “little” data may not show the bug
  – …and if bug results from a design problem, fixing it may be very hard
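
One way to make the switch between “lots of data” and “a little data” painless is a single flag near the top of the program. This is only a sketch: debugMode, themeLenV, and the random stand-in data are assumptions for illustration.

```r
debugMode <- TRUE                         # set to FALSE for the real run

nThemes <- if (debugMode) 4 else 3500     # number of themes to process
secs    <- if (debugMode) 1 else 20       # length of audio, in seconds
sr      <- 44100                          # sampling rate (Hz)

# Stand-in for real data: one made-up length per theme.
set.seed(42)
themeLenV <- sample(5:20, nThemes, replace = TRUE)
nSamples  <- secs * sr

cat("nThemes =", nThemes, " nSamples =", nSamples, "\n")
```

With the flag on, every print statement produces a handful of lines instead of thousands, so the output can actually be read.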

Debugging in R (1)

• Basic technique: zero in on bug with print or cat
  – E.g., before & after doing something questionable
      print(c("max before scaling=", max(notesW@left)))
      notesW <- notesW*2.5
      cat("max after scaling=", max(notesW@left), "\n")
  – cat merges its arguments, gets rid of the extra parens
  – …but doesn’t end the line => do it yourself with “\n”
  – If you use “source” (& inside loops?), just naming variable doesn’t work; must use print or cat
• A variation: use plot instead of print/cat
  – The right picture is worth 10,000 words; the wrong one, zero (cf. Tufte on the Challenger disaster)
  – …but the right picture for debugging is often simple & obvious
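
A sketch of the print/cat technique on a plain numeric vector, so that it runs without tuneR. The sample values are made up. It also shows that merely naming a variable inside a loop displays nothing.

```r
sampleV <- c(0.1, -0.4, 0.25)                  # made-up sample values

cat("max before scaling =", max(sampleV), "\n")
sampleV <- sampleV * 2.5                       # the "questionable" step
cat("max after scaling =", max(sampleV), "\n")

for (n in 1:3) {
  sampleV[n]           # just naming the value: nothing is displayed inside a loop
  print(sampleV[n])    # print (or cat) is needed to see it
}
```

Printing the same quantity before and after the suspect line narrows the search to that one line.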

Debugging in R (2)

• More advanced technique: use a good debugger
  – Allows setting breakpoints, looking at variables, etc., while program is running
  – Especially helpful w/ complex programs
  – …or learning a new language
  – To some extent, R’s interactivity accomplishes same thing
• R has a debugger
  – One student (an experienced programmer) tried & liked it! Anyone else?
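
The slide does not say which debugging facilities were meant. As an assumption, this sketch uses base R’s debug(), isdebugged(), and undebug() to mark a function for step-by-step execution. The stepping itself only happens in an interactive session, so it is described in comments rather than run.

```r
# Convert MIDI note number to frequency in Hertz.
MIDINum2Freq <- function(noteNum) {
  freq <- 440 * 2^((noteNum - 69) / 12)
  return(freq)
}

debug(MIDINum2Freq)          # mark the function: the next call will pause inside it
isdebugged(MIDINum2Freq)     # TRUE

# In an interactive session, calling MIDINum2Freq(57) would now stop at a
# browser prompt. There, "n" runs the next line, "c" continues to the end,
# "Q" quits, and typing a variable name (e.g. noteNum) shows its value.

undebug(MIDINum2Freq)        # back to normal behavior
isdebugged(MIDINum2Freq)     # FALSE
```

Base R also offers browser(), which pauses wherever you place it in your code, and traceback(), which shows the chain of calls after an error.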

Dangers of R (1)

• More danger of nasty bugs in R than many programming languages & environments
  – No explicit types => can’t warn of questionable usage
  – No variable declarations => catches fewer typos (only a problem in old versions of R?)
  – Both above like Perl (e.g.), but Java (e.g.) is great on both => Java programmers likely to be careless!
• Defensive programming
  – E.g., add “sanity checks” as you work, use conventions for variable names, etc.
  – Always important: a subtle bug can waste a huge amount of time and/or money
    • Ex: weeks of debugging to find a “1” that should have been “i”
    • Ex: period instead of comma => missile had to be destroyed
  – …but especially in dangerous environments like R
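
One simple form of “sanity check” in R is stopifnot(), which halts with an error as soon as an assumption is violated. The function below is a hypothetical example, not taken from the course materials.

```r
# Scale samples so that the largest absolute value is 1.
NormalizeSamples <- function(sampleV) {
  # Sanity checks: fail early and loudly rather than produce nonsense later.
  stopifnot(is.numeric(sampleV), length(sampleV) > 0, !anyNA(sampleV))
  peak <- max(abs(sampleV))
  stopifnot(peak > 0)              # an all-zero vector cannot be normalized
  return(sampleV / peak)
}

normV <- NormalizeSamples(c(0.5, -2, 1))
normV                              # 0.25 -1.00 0.50

# A bad argument is caught immediately instead of causing trouble downstream.
result <- tryCatch(NormalizeSamples("oops"), error = function(e) "caught")
result                             # "caught"
```

Since R has no type declarations, checks like these are one way to get back some of the warnings that other languages give for free.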

Dangers of R (2)

• “Gotchas” in R (all from real life)
  – Surprising operator precedence, esp. in “for” statement
    • In sets, need parentheses to get addition before “:”
    • E.g., say “start:(start+5)”, not “start:start+5” !
  – “;” is usually ignored, but not always
  – Line break sometimes starts a new statement, but not always
    • cf. “LineBreaksInRStatements.r” example
  – Referring to a column of a table different ways gives same data but can behave very differently
    • “noteTbl$Cum.time” & “noteTbl[,1]” are vectors of integers; “noteTbl[1]” is a list
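
Three of these gotchas can be reproduced in a few lines. The contents of noteTbl here are invented for illustration; only the column name Cum.time comes from the slide.

```r
# 1. Operator precedence: ":" binds more tightly than "+".
start <- 3
start:(start + 5)     # 3 4 5 6 7 8
start:start + 5       # 8, i.e. (3:3) + 5

# 2. Line breaks: a trailing operator continues the statement, a leading one does not.
x <- 1 +
  2                   # one statement: x is 3
y <- 1
  + 2                 # two statements: y is 1, and "+ 2" is evaluated separately

# 3. Three ways to refer to a column of a table.
noteTbl <- data.frame(Cum.time = c(0L, 480L, 960L), Pitch = c(60L, 62L, 64L))
is.list(noteTbl$Cum.time)    # FALSE: a plain integer vector
is.list(noteTbl[, 1])        # FALSE: also a plain integer vector
is.list(noteTbl[1])          # TRUE: a one-column data frame, which is a list
```

Functions that expect a numeric vector, such as diff, are safest with the first two forms.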

Dangers of R & tuneR

• Other real-life examples from Don’s classes
  – Undeclared variable: “allNotes” vs. “allnotes” (only a problem in old versions of R?)
  – Call a function that returns a value but ignore the value
• Danger much worse because R & tuneR often give lousy feedback for errors or likely errors
  – tuneR square & sawtooth functions fail w/o error message if frequency isn’t an integer, and the manual doesn’t say it has to be an integer!
  – Exception: tuneR play w/ unnormalized values => very helpful error message
  – Nonexistent named params. sometimes give error, not always
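
Two of these dangers can be shown with base R alone. The function Transpose and the misspelled argument names are hypothetical, chosen only to trigger the behavior.

```r
# 1. Calling a function that returns a value, but ignoring the value.
noteV <- c(64, 60, 67)
sort(noteV)              # the sorted copy is computed and then thrown away
noteV                    # still 64 60 67
noteV <- sort(noteV)     # the result must be assigned to have any effect

# 2. Nonexistent named parameters: sometimes silently accepted...
sum(1:3, na.rm = TRUE)   # 6
sum(1:3, na.rmm = TRUE)  # 7: the misspelled argument is treated as a value to add

# ...and sometimes an error.
Transpose <- function(noteV, semitones) noteV + semitones
outcome <- tryCatch({
  Transpose(noteV, semitones = 2, octave = 1)   # "octave" is not a parameter
  "no error"
}, error = function(e) "error")
outcome                  # "error"
```

The silent case arises with functions that accept arbitrary extra arguments, so a misspelling is not recognized as a mistake.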

Programming in R with tuneR

• On OS X (and Linux): play() problem
  – Must say what program to use to play Waves
    • Either setWavPlayer once, or add 2nd param. to each play()
  – OS X can use QuickTime Player
    • It’s on every OS X machine, & it works, but…
      – Usually gives scary error messages; must hit the escape key to get R to continue; leaves open more & more QuickTime Players. A serious nuisance.
  – OS X alternative: playRWave
    • Works fine, but…
      – Not pre-installed; you must get & install it
    • Available (with instructions) at:
      – http://www.informatics.indiana.edu/donbyrd/Teach/Rtools+Docs/

Don’s Coding Conventions (1)

• Chris Raphael’s & Don Byrd’s styles are very different
  – Partly a matter of taste, partly reflects goals
  – Consistency & readability are important
  – Consistency helps clarity and correctness
  – But flexibility is important too; these are guidelines only!
• Variable names
  – General: long enough to be clear, but no longer
  – Ex.: use “nNotes”, not “noteCount” or just “notes”
  – “Hungarian notation”: suffix “V” = vector, “W” = wave
  – Common examples: nNotes, sampleV, noteW, sr
• Operators
  – Always use “<-” for assignment
    • Reason: with “=” for named parameters and “==” for tests, using “=” for assignment is too confusing

Don’s Coding Conventions (2)

• Use of whitespace
  – Put space before & after assignment operators
  – Separate parens & curly braces from adjacent things with space
  – Put several spaces before, at least one after “#”
• Program organization
  1. Initial stuff (libraries, etc.), setting “parameters” likely to change
  2. Definitions of functions
  3. Main program (calls the functions, if there are any)
• Specific to audio: creating simple waveforms
  – When possible, use tuneR sine function
  – Create samples directly only when tuneR sine isn’t flexible enough (for glissandi, vibrato, other waveforms, etc.)
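
A sketch of a small program laid out in the three-part organization above. It creates samples directly for a tone with vibrato, one of the cases where a plain sine isn’t flexible enough. It uses base R only; the function name, the parameter values, and the way the vibrato is computed are illustrative assumptions, not the course’s own code.

```r
# ---- 1. Initial stuff: "parameters" likely to change ----
sr <- 8000            # sampling rate (Hz)
secs <- 0.5           # duration (seconds)
centerFreq <- 220     # center frequency (Hz)
vibRate <- 5          # vibrato rate (Hz)
vibDepth <- 3         # vibrato depth (Hz above & below center)

# ---- 2. Definitions of functions ----
# Create samples of a sine wave whose frequency wobbles around centerFreq.
# The phase at each sample is the running sum of the instantaneous frequency.
MakeVibratoSamples <- function(centerFreq, vibRate, vibDepth, secs, sr) {
  nSamples <- round(secs * sr)
  timeV <- (0:(nSamples - 1)) / sr
  freqV <- centerFreq + vibDepth * sin(2 * pi * vibRate * timeV)
  phaseV <- 2 * pi * cumsum(freqV) / sr
  return(sin(phaseV))
}

# ---- 3. Main program ----
sampleV <- MakeVibratoSamples(centerFreq, vibRate, vibDepth, secs, sr)
cat("nSamples =", length(sampleV), " peak =", max(abs(sampleV)), "\n")
```

Keeping the parameters together at the top means a test run with a shorter duration or lower sampling rate needs a change in only one place.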