Computer Audio and Music
Perry R. Cook
Princeton Computer Science
(also Music)
Music/Sound Overview
• Basic Audio storage/playback (sampling)
• Human Audio Perception
• Sound and Music Compression
and Representation
• Sound Synthesis
• Music Control and Expression
Waveform Sampling and Playback
• Sample and Hold
Sample Rate vs. Aliasing
• Quantize
Word Size vs. Quantization Noise
• Reconstruct: Hold and Smooth (filter)
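The sample rate vs. aliasing trade-off can be made concrete with a small sketch. It assumes an ideal sampler with no anti-aliasing filter; the function name `alias_frequency` is made up for illustration.

```python
def alias_frequency(f_hz: float, sample_rate_hz: float) -> float:
    """Apparent frequency of a sine at f_hz after sampling at sample_rate_hz."""
    # Frequencies repeat every sample_rate_hz, so reduce modulo the rate first.
    f = f_hz % sample_rate_hz
    # Anything above half the sample rate (Nyquist) folds back down.
    if f > sample_rate_hz / 2.0:
        return sample_rate_hz - f
    return f


# At the CD rate, a 30 kHz tone would be heard as a 14.1 kHz tone.
for tone_hz in (1000.0, 30000.0, 43100.0):
    print(tone_hz, "Hz ->", alias_frequency(tone_hz, 44100.0), "Hz")
```

This is why the input is low-pass filtered before sampling: once a component has folded into the audible band it cannot be told apart from a real tone at that frequency.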
Waveform Sampling: Quantization
Quantization Introduces Noise
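A rough way to see word size vs. quantization noise is to quantize a near-full-scale sine and measure the signal-to-noise ratio. The sketch below assumes a uniform mid-rise quantizer and compares the measurement with the usual rule of thumb of about 6.02 dB per bit plus 1.76 dB; the test frequency and sample count are arbitrary choices.

```python
import math


def quantize(x: float, bits: int) -> float:
    """Uniform mid-rise quantizer for x in [-1, 1) with 2**bits levels."""
    step = 2.0 / (2 ** bits)
    index = math.floor(x / step)
    # Clamp so inputs at or beyond full scale map to the outermost level.
    half = 2 ** (bits - 1)
    index = max(-half, min(half - 1, index))
    return (index + 0.5) * step


def sine_snr_db(bits: int, num_samples: int = 50_000) -> float:
    """Measured SNR (dB) of a near-full-scale sine after quantization."""
    cycles_per_sample = math.sqrt(2.0) / 100.0  # not locked to the sample grid
    signal_power = 0.0
    noise_power = 0.0
    for n in range(num_samples):
        x = 0.999 * math.sin(2.0 * math.pi * cycles_per_sample * n)
        error = quantize(x, bits) - x
        signal_power += x * x
        noise_power += error * error
    return 10.0 * math.log10(signal_power / noise_power)


for bits in (8, 16):
    rule_of_thumb = 6.02 * bits + 1.76
    print(bits, "bits: measured", round(sine_snr_db(bits), 1),
          "dB, rule of thumb", round(rule_of_thumb, 1), "dB")
```

Each extra bit halves the step size, which is where the roughly 6 dB per bit comes from.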
Compression and Representation (Why Bother??)
So Many Bits, So Little Time (Space)
• CD audio rate: 2 * 2 * 8 * 44100 = 1,411,200 bps
• CD audio storage: 10,584,000 bytes / minute
• A CD holds only about 70 minutes of audio
• An ISDN line can only carry 128,000 bps
• Even a cable modem might carry only 1Mbps
Security: Best representation removes everything
recognizable about the original sound
Graphics people get all the bandwidth, cycles, memory
Expression, composition, interaction wanted too!
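The arithmetic behind these numbers, as a short worked sketch. The constants come from the bullets above (2 channels, 2 bytes per sample, 44,100 samples per second); the helper name `required_compression_ratio` is made up for illustration.

```python
CHANNELS = 2
BYTES_PER_SAMPLE = 2
BITS_PER_BYTE = 8
SAMPLE_RATE_HZ = 44_100

# 2 * 2 * 8 * 44100 bits every second
bits_per_second = CHANNELS * BYTES_PER_SAMPLE * BITS_PER_BYTE * SAMPLE_RATE_HZ
bytes_per_minute = bits_per_second // BITS_PER_BYTE * 60


def required_compression_ratio(channel_bps: float) -> float:
    """How much CD audio must shrink to stream in real time over a channel."""
    return bits_per_second / channel_bps


print("CD bit rate:", bits_per_second, "bps")
print("CD storage:", bytes_per_minute, "bytes per minute")
print("ISDN (128,000 bps) needs about",
      round(required_compression_ratio(128_000), 1), ": 1")
print("1 Mbps cable modem needs about",
      round(required_compression_ratio(1_000_000), 2), ": 1")
```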
Views of Sound
– Sound is Perceived: Perception-Based Psychoacoustically Motivated Compression
– Sound is Produced: Production-Based Physics/Source Model Motivated Compression
– Music (Sound) is Performed/Published/Represented: Event-Based Compression
– Sound is a Waveform / Statistical Distribution / etc.
(these are not very good ideas in general,
unless we get lucky (LPC))
Psychoacoustics
Human sound perception:
• Ear: receive 1-D waves
• Cochlea: convert to frequency dependent nerve firings
• Auditory cortex: further refine time & frequency information
• Brain: higher level cognition, object formation, interpretation
Perceptual Models
Exploit masking, etc., to discard
perceptually irrelevant information.
• Example: Quantize soft sounds more accurately,
loud sounds less accurately
Benefits: Generic, does not require assumptions
about what produced the sound
Drawbacks: Highest compression is difficult to achieve
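One well-known way to quantize soft sounds more accurately than loud ones is mu-law companding, as used in telephony. The slides do not name a specific scheme, so treat this as an illustrative stand-in: compress the amplitude logarithmically, quantize uniformly, then expand. The 8-bit word size and mu = 255 are the conventional choices, assumed here.

```python
import math

MU = 255.0  # conventional value for 8-bit mu-law


def mu_compress(x: float) -> float:
    """Map x in [-1, 1] to [-1, 1] with finer resolution near zero."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_expand(y: float) -> float:
    """Inverse of mu_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def mu_law_quantize(x: float, bits: int = 8) -> float:
    """Quantize uniformly in the compressed domain, then expand back."""
    levels = 2 ** (bits - 1) - 1
    code = round(mu_compress(x) * levels)
    return mu_expand(code / levels)


for x in (0.01, 0.1, 0.9):
    error = abs(mu_law_quantize(x) - x)
    print("input", x, "absolute error", error)
```

The reconstruction levels are packed tightly near zero and spread out near full scale, so the error grows with the signal level, where it is more likely to be masked by the signal itself.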
Production Models
Build a model of the sound production system,
then fit the parameters
• Example: If signal is speech, then a well
parameterized vocal model can yield
highest quality and compression ratio
Benefits: Highest possible compression
Drawbacks: Signal source(s) must be
assumed, known, or identified
Audio Compression
Classical Data Compression View:
Take advantage of
• Redundancy/Correlation
• Statistics (Local/Global)
• Assumptions / Models
Problem: Much of this doesn’t work
directly on sound waveform data
Transform (Subband) Coders
Split signal into frequency subbands, then
allocate bits to regions adaptively,
based on where ear is most sensitive
Lossless (variable bit rate and compression ratio)
Lossy (fixed rate and ratio), e.g. MP3
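A minimal sketch of adaptive bit allocation across subbands. It is not the MP3 algorithm; it assumes a psychoacoustic model has already produced a signal-to-mask ratio (in dB) for each band, and that each bit given to a band lowers its quantization noise by about 6.02 dB. The function name and the sample numbers are invented for illustration.

```python
DB_PER_BIT = 6.02  # approximate noise reduction from one extra bit


def allocate_bits(smr_db, total_bits, max_bits_per_band=15):
    """Greedily hand out bits to the band whose noise is most audible.

    smr_db: signal-to-mask ratio per subband, in dB (assumed input).
    Returns a list with the number of bits given to each subband.
    """
    bits = [0] * len(smr_db)
    for _ in range(total_bits):
        # Noise-to-mask ratio: positive means the noise is still audible.
        nmr = [s - DB_PER_BIT * b for s, b in zip(smr_db, bits)]
        candidates = [i for i in range(len(bits))
                      if nmr[i] > 0.0 and bits[i] < max_bits_per_band]
        if not candidates:
            break  # all noise is masked; remaining bits are not needed
        worst = max(candidates, key=lambda i: nmr[i])
        bits[worst] += 1
    return bits


# Four hypothetical subbands; the third is already fully masked.
print(allocate_bits([30.0, 12.0, -5.0, 20.0], total_bits=8))
```

Bands whose signal already sits below the masking threshold receive no bits at all, which is where most of the savings come from.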
Production Models
Build a parametric model of the
production system, then either
• Fit the parameters to a given signal
(use signal processing techniques to
extract parameters), or
• Drive the parameters directly (no encode/decode)
Examples: Rule system to drive speech synthesizer
MIDI file to drive music synthesizer
Speech Coders (production)
Assume speech is produced by a source-filter system
(vocal folds/noise + vocal tract tube)
Identify filter, type of source, then code parameters
Takes advantage of slowly varying nature of vocal tract
shape and other speech parameters
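Linear predictive coding (LPC) is the classic instance of "identify the filter, then code parameters". The sketch below assumes the autocorrelation method with the Levinson-Durbin recursion and an all-pole filter; a real speech coder would also window each frame, estimate pitch and voicing for the source, and quantize the coefficients. All function names are illustrative.

```python
def autocorrelation(x, max_lag):
    """r[lag] = sum of x[n] * x[n + lag] over the frame."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve for a[1..order] so that x[n] is predicted by sum a[k] * x[n-k]."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k  # remaining prediction error energy
    return a[1:], err


def lpc(x, order):
    """Estimate all-pole filter coefficients for one frame of signal."""
    return levinson_durbin(autocorrelation(x, order), order)


def synthesize(a, excitation):
    """Decoder side: drive the all-pole filter with a source signal."""
    y = []
    for n, e in enumerate(excitation):
        y.append(e + sum(a[k] * y[n - 1 - k]
                         for k in range(len(a)) if n - 1 - k >= 0))
    return y


# Round trip: a known two-pole "tract" excited by a single impulse.
true_a = [1.2, -0.81]
frame = synthesize(true_a, [1.0] + [0.0] * 399)
estimated_a, residual_energy = lpc(frame, 2)
print(estimated_a)
```

Only the few filter coefficients and a description of the source need to be sent per frame, and because the vocal tract changes slowly, they can be updated at a low rate.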
Future: Multi-Model
Parametric Compressors?
Analysis front end identifies source(s)
Audio is (separated and) sent to optimal model(s)
Benefits:
High compression
Other knowledge
Drawbacks:
We don’t know how
to do all this yet
Sound Analysis and Classification
Cochlear Modeling
Multi-feature analysis (Tzanetakis)
Segmentation, Classification, Annotation, Thumbnails
MIDI and Other ‘Event’ Models
Musical Instrument Digital Interface
Represents Music as Notes and Events
and uses a synthesis engine to “render” it.
An Edit Decision List (EDL) is another example.
A history of source materials, transformations,
and processing steps is kept. Operations can
be undone or recreated easily. Intermediate
non-parametric files are not saved.
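To show what "music as notes and events" looks like on the wire, here is a small sketch that builds MIDI Note On and Note Off messages. The status bytes (0x90 and 0x80 plus the channel number) follow the MIDI 1.0 channel voice messages; the timed event list is a made-up container for illustration, not the Standard MIDI File format.

```python
def _check(value: int, limit: int, name: str) -> int:
    if not 0 <= value < limit:
        raise ValueError(f"{name} must be in 0..{limit - 1}")
    return value


def note_on(channel: int, note: int, velocity: int) -> bytes:
    """Three-byte Note On: status 0x9n, note number, velocity."""
    return bytes([0x90 | _check(channel, 16, "channel"),
                  _check(note, 128, "note"),
                  _check(velocity, 128, "velocity")])


def note_off(channel: int, note: int, velocity: int = 0) -> bytes:
    """Three-byte Note Off: status 0x8n, note number, release velocity."""
    return bytes([0x80 | _check(channel, 16, "channel"),
                  _check(note, 128, "note"),
                  _check(velocity, 128, "velocity")])


def note_to_hz(note: int) -> float:
    """Equal-tempered pitch, with note 69 tuned to A = 440 Hz."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)


# Hypothetical event list: (time in seconds, message bytes).
score = [
    (0.0, note_on(0, 60, 100)),   # middle C
    (0.5, note_off(0, 60)),
    (0.5, note_on(0, 64, 100)),   # E above middle C
    (1.0, note_off(0, 64)),
]
for time_s, message in score:
    print(time_s, message.hex())
```

The messages say which note to play and how hard, not what it sounds like; the synthesis engine that renders them decides that.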
Speaking of MIDI and scores,
a brief aside on Computing History:
History of Programmable Machines
• First “programmable system” was the early
printing process developed in China circa 800 C.E.
• First “program” was perhaps the Chinese translation
of the Buddhist Canon (the Tipitaka)
History (cont.)
• Gutenberg’s Printing Press
(circa 1450)
• Main Contribution:
• Just a few “basic
instructions” (smaller
alphabet size) suffice.
History (cont.)
Jacquard’s Loom
(circa 1810)
• Punched Cards
stored program
for weaving patterns.
Wait: Gutenberg 1450
Jacquard 1810
Are we missing something here?
Nothing happened in 350 years?
History (cont.)
• Huygens’ Pendulum
Clock
(circa 1650)
• Main Contribution:
• Timing, “clock ticks”
increase accuracy
History (cont.)
• Musical Machines:
Barrel Organs (1500!)
Music boxes (between)
Player Pianos (c. 1700)
• Main Contributions:
• Drive cylinder or disk with
“pins” (bits!!) which play
notes at the right time
• Change disk -> change song
(programmable!)
History (cont.)
• Jacquard’s Loom
(circa 1810)
• Punched Cards
stored program
for weaving patterns.
History (cont.)
• Charles Babbage (1822-64):
• Input -- Punched Cards
• Hardware -- general-purpose
mechanical mathematical system
• (Analytical Engine) -- never built
• Could be programmed
• punched card could say:
“Go back 5 punched cards”
• Instructions could be Executed
repeatedly, or in different order.
The Modern Computer
• Basic Idea still the same:
A machine that can execute certain
instructions.
• Machine instructions represented by
sequences of 0’s and 1’s
(Machine Language)
• Instructions stored in Memory
von Neumann (1945)
Princeton, NJ
Anyway,
Event Based Music Representation
MIDI and Other Scorefiles
• A Musical Score is a very compact
representation of music
• Even the score itself can be compressed further
Benefits: Highest possible compression
Encodes “expression”
Drawbacks: Cannot guarantee the “performance”
Cannot assure the quality of the sounds
Cannot make arbitrary sounds (yet)
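A back-of-the-envelope comparison of how compact a score is next to CD audio. Every number about the piece is an assumption chosen only for illustration: a 3-minute piece with 2,000 notes, each note stored as two events (on and off) of roughly 4 bytes including timing.

```python
PCM_BYTES_PER_SECOND = 2 * 2 * 44_100  # stereo, 16-bit, 44.1 kHz


def pcm_bytes(seconds: int) -> int:
    """Size of uncompressed CD-quality audio for the given duration."""
    return PCM_BYTES_PER_SECOND * seconds


def event_bytes(num_notes: int, bytes_per_event: int = 4) -> int:
    """Size of an event stream: one on and one off event per note (assumed)."""
    return num_notes * 2 * bytes_per_event


duration_s = 180   # assumed piece length
num_notes = 2000   # assumed note count
audio = pcm_bytes(duration_s)
score_size = event_bytes(num_notes)
ratio = audio / score_size
print("audio:", audio, "bytes")
print("score:", score_size, "bytes")
print("ratio: about", round(ratio), ": 1")
```

Under these assumptions the score is smaller by three orders of magnitude, but only because the sound itself is left to whatever synthesizer renders it.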
MIDI
Event Based Representation
Enter General MIDI
• Guarantees a base set of instrument sounds,
• and a means for addressing them,
• but doesn’t guarantee any quality
Better Yet, Downloadable Sounds
• Download samples for instruments
• Benefits: Does more to guarantee quality
• Drawbacks: Samples aren’t reality
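The "means for addressing" instruments is the MIDI Program Change message combined with the General MIDI numbering. A small sketch, assuming the usual convention that General MIDI lists programs as 1 to 128 while the byte sent is 0 to 127; only a few entries of the sound set are included here.

```python
# A few entries from the General MIDI Level 1 sound set (1-based numbers).
GM_PROGRAMS = {
    "Acoustic Grand Piano": 1,
    "Violin": 41,
    "Trumpet": 57,
    "Flute": 74,
}


def program_change(channel: int, gm_program: int) -> bytes:
    """Two-byte Program Change: status 0xCn, then program number minus one."""
    if not 0 <= channel < 16:
        raise ValueError("channel must be in 0..15")
    if not 1 <= gm_program <= 128:
        raise ValueError("General MIDI program must be in 1..128")
    return bytes([0xC0 | channel, gm_program - 1])


for name, number in GM_PROGRAMS.items():
    print(name, program_change(0, number).hex())
```

The message selects "a violin" by number; whether that violin is a good sample or a thin approximation depends entirely on the receiving synthesizer.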
Event Based Representation
Downloadable Algorithms
• Specify the algorithm,
the synthesis engine runs it,
and we just send parameter changes
• Part of “Structured Audio” (MPEG-4)
Benefits: Can upgrade algorithms later
Can implement scalable synthesis
Drawbacks: Different algorithm for each class of sounds
(but can always fall back on samples)
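The split between a downloaded algorithm and a stream of parameter changes can be sketched as follows. This is not MPEG-4 Structured Audio syntax; the `SineVoice` class, the `render` function, and the event tuples are all hypothetical, and a sine oscillator stands in for a real synthesis algorithm.

```python
import math


class SineVoice:
    """The 'algorithm': sent once, then driven only by parameter changes."""

    def __init__(self, sample_rate: float):
        self.sample_rate = sample_rate
        self.params = {"freq": 440.0, "amp": 0.0}
        self.phase = 0.0

    def set_param(self, name: str, value: float) -> None:
        if name not in self.params:
            raise KeyError(name)
        self.params[name] = value

    def tick(self) -> float:
        """Compute one output sample and advance the oscillator."""
        out = self.params["amp"] * math.sin(self.phase)
        self.phase += 2.0 * math.pi * self.params["freq"] / self.sample_rate
        return out


def render(voice, events, num_samples):
    """The 'engine': apply (sample_index, name, value) events while running."""
    pending = sorted(events)
    next_event = 0
    out = []
    for n in range(num_samples):
        while next_event < len(pending) and pending[next_event][0] <= n:
            _, name, value = pending[next_event]
            voice.set_param(name, value)
            next_event += 1
        out.append(voice.tick())
    return out


# Three small events describe 200 samples of audio.
control_stream = [(0, "amp", 0.5), (0, "freq", 440.0), (100, "amp", 0.0)]
samples = render(SineVoice(8000.0), control_stream, 200)
print(len(samples), "samples from", len(control_stream), "events")
```

Swapping in a better algorithm later only requires replacing the voice class; the control stream stays the same.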
Physical Modeling for Music
Strings (plucked, struck, bowed)
Winds (clarinet, flute, brass), voice
Plates, membranes, bar percussion
Shakers, scrapers
The Voice
Sound Effects (PhOLISE)
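As a minimal example of a plucked-string model, here is the Karplus-Strong algorithm: a delay line one pitch period long, filled with noise (the pluck) and fed back through a two-point average (the string losses). It is a much simpler model than the ones the slides refer to, and the decay factor, sample rate, and seed are arbitrary assumptions.

```python
import random


def pluck(frequency_hz: float, duration_s: float,
          sample_rate: int = 44_100, decay: float = 0.996,
          seed: int = 0) -> list:
    """Karplus-Strong plucked string; returns samples in [-1, 1]."""
    rng = random.Random(seed)
    period = max(2, int(round(sample_rate / frequency_hz)))
    # The initial noise burst plays the role of the pluck.
    delay_line = [rng.uniform(-1.0, 1.0) for _ in range(period)]
    out = []
    for n in range(int(duration_s * sample_rate)):
        i = n % period
        j = (i + 1) % period
        sample = delay_line[i]
        out.append(sample)
        # Averaging adjacent samples is a gentle low-pass filter, so high
        # harmonics die away faster than low ones, as on a real string.
        delay_line[i] = decay * 0.5 * (sample + delay_line[j])
    return out


tone = pluck(220.0, 0.5, sample_rate=8000)
print(len(tone), "samples")
```

The whole instrument is controlled by a handful of parameters (pitch, decay, pluck shape), which is what makes physical models attractive as a compact representation.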
Physical Modeling: the “Real World”
Synthesizing Solids
O’Brien, Cook,
and Essl
SIGGRAPH 01
Composition and Creation
Garton, “Rough Raga Riffs”
Lansky, “mild und leise”
Music for Unprepared Piano
Bargar, Choi, Betts, Cook
Expression and Control
Trueman: BoSSA
Cook/Morrill Trumpet
Other Controllers
PICOs (musical and
“real-world” sonic controllers)
K-Frog
J-Mug
P-Pedal
PhilGlas
P-Grinder
T-shoe
T-bourine
Pico Glove
P-Ray’s Cafe
Audio and Computer Music
Questions?