CS1315: Introduction to Media Computation Sound Encoding and Manipulation.


Sound is made when something vibrates

The vibration disturbs the air around it and makes changes in air pressure

These changes in air pressure move through the air as sound waves

The sound waves cause pressure changes against our eardrums, sending nerve impulses to our brain

Sound waves are pressure waves

Each air molecule oscillates back and forth, affecting its neighbors…air molecules themselves don’t travel from the vibration source to your ear

Creates areas of high and low pressure, called compressions and rarefactions

Sound waves are longitudinal

Representation of sound by a sine wave is merely an attempt to illustrate the sinusoidal nature of the pressure-time fluctuations

Properties of sound waves

Pressure fluctuation comes in cycles

Frequency of a wave is the number of cycles per second (cps), or hertz (Hz)

(Complex sounds have more than one frequency in them.)

Amplitude is the maximum height of the wave (aka maximum pressure fluctuation)

Each repetition of a waveform is called a cycle

Volume and pitch: Psychoacoustics, the psychology of sound

Our perception of pitch is related (logarithmically) to frequency

Higher frequencies are perceived as higher pitches

A above middle C is 440 Hz

Our perception of volume is related (logarithmically) to changes in amplitude

If the intensity doubles, it’s about a 3 decibel (dB) change (doubling the pressure amplitude, which quadruples the intensity, is about 6 dB)

Humans and Pitch

In general we can hear frequencies between 20 Hz and 20,000 Hz

Hz = hertz = cycles per second

20,000 Hz = 20 kHz (kilohertz)

The older we get, the less we can perceive really high frequencies (recall those commercials for ‘silent’ ring-tones for teens)

Many animals can make sounds and hear frequencies that are beyond what we can hear – hence the creation of dog whistles

“Logarithmically?”

It’s strange, but our hearing works on ratios, not differences, e.g., for pitch

We hear the difference between 200 Hz and 400 Hz as the same as 500 Hz and 1000 Hz

Similarly, 200 Hz to 600 Hz, and 1000 Hz to 3000 Hz

Intensity (volume) is measured in watts per square meter (W/m²)

A change from 0.1 W/m² to 0.01 W/m² sounds the same to us as 0.001 W/m² to 0.0001 W/m²
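One way to check the "ratios, not differences" claim for pitch is to measure each interval in octaves, the base-2 logarithm of the frequency ratio. This is a small sketch in plain Python 3 (not JES); the function name octaves_between is made up for illustration.

```python
import math

def octaves_between(f1_hz, f2_hz):
    """Return the interval from f1_hz up to f2_hz in octaves (log2 of the ratio)."""
    return math.log2(f2_hz / f1_hz)

# Both pairs are a 2:1 ratio, so both are one octave,
# even though the differences are 200 Hz and 500 Hz.
interval_a = octaves_between(200, 400)
interval_b = octaves_between(500, 1000)

# Both of these pairs are a 3:1 ratio, so they also match each other.
interval_c = octaves_between(200, 600)
interval_d = octaves_between(1000, 3000)
```

Equal ratios give equal intervals, which is why the pairs on the slide sound like the same step in pitch.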

Decibel is a logarithmic measure

A decibel is a ratio between two intensities: 10 * log10(I1/I2)

As an absolute measure, it’s in comparison to the threshold of audibility

0 dB can’t be heard. Normal speech is 60 dB. A shout is about 80 dB
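The decibel formula above is easy to try out. This is a minimal sketch in plain Python 3 (not JES); the function name intensity_ratio_to_db is an assumption chosen for this example.

```python
import math

def intensity_ratio_to_db(i1, i2):
    """Return the level of intensity i1 relative to i2, in decibels."""
    return 10 * math.log10(i1 / i2)

# Both pairs differ by a factor of ten in intensity, so both are about 10 dB apart.
step_loud = intensity_ratio_to_db(0.1, 0.01)
step_quiet = intensity_ratio_to_db(0.001, 0.0001)

# Doubling the intensity is a change of roughly 3 dB.
doubling = intensity_ratio_to_db(2.0, 1.0)
```

Because only the ratio I1/I2 matters, the two tenfold steps come out the same, even though the absolute intensities differ by a factor of 100.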

Digitizing Sound: How do we get that into numbers?

Waves are analog (continuous)

Remember in calculus, estimating the curve by creating rectangles?

We can do the same to estimate the sound curve

Analog-to-digital conversion (ADC) will give us the amplitude at an instant as a number: a sample
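To make "the amplitude at an instant as a number" concrete, here is a rough sketch that samples a mathematical sine wave rather than a real microphone signal. It is plain Python 3 (not JES). The function name sample_sine and the 440 Hz / 22,050 samples-per-second values are illustrative assumptions.

```python
import math

def sample_sine(freq_hz, sampling_rate, n_samples):
    """Return n_samples amplitudes (floats between -1 and 1) of a sine wave.

    Sample number n is taken at time n / sampling_rate seconds.
    """
    samples = []
    for n in range(n_samples):
        t = n / sampling_rate  # time of this sample, in seconds
        samples.append(math.sin(2 * math.pi * freq_hz * t))
    return samples

# Ten samples of a 440 Hz tone taken 22,050 times per second.
tone = sample_sine(440, 22050, 10)
```

Each entry in tone is one "rectangle" of the estimate: the height of the curve at one instant.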

How many samples do we need?

Remember…pictures are continuous. How do we represent them digitally?

Understanding how computers represent sound

Consider how a film represents motion…

A movie is made by taking still photos in rapid sequence at a constant rate, usually twenty-four frames per second

When the photos are displayed in sequence at that same rate, it fools us into thinking we are seeing continuous motion, even though we are actually seeing twenty-four discrete images per second

http://music.arts.uci.edu/dobrian/digitalaudio.htm

http://en.wikipedia.org/wiki/Animation

A collection of still frames

How computers represent sound

Digital recording of sound works on the same principle

We take many discrete samples of the sound wave's instantaneous amplitude, store that information, then later reproduce those amplitudes at the same rate to create the illusion of a continuous wave


How often should we take samples?

Many, many samples must be taken per second, far more than are necessary for filming a visual image

In fact, we need to take more than twice as many samples per second as the highest frequency we wish to record…enter the Nyquist Theorem

The number of samples taken per second is known as the sampling rate


Nyquist Theorem* (will be on exam)

We need at least twice as many samples per second as the maximum frequency in order to represent (and recreate, later) the original sound.

The number of samples recorded per second is the sampling rate

If we capture 8000 samples per second, the highest frequency we can capture is 4000 Hz

That’s how phones work

CD quality is 44,100 samples per second…what is the max frequency a CD can represent?

Nyquist Theorem

Think back to the example of a film camera, which shoots 24 frames per second

If we're shooting a movie of a car, and the car wheel spins at a rate greater than 12 revolutions per second, it's exceeding half the "sampling rate" of the camera… the wheel completes more than 1/2 revolution per frame.

If, for example, it actually completes 18/24 of a revolution per frame, it will appear to be going backward at a rate of 6 revolutions per second

In other words, if we don't witness what happens between samples, a 270-degree revolution of the wheel is indistinguishable from a -90-degree revolution. The samples we obtain in the two cases are precisely the same
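The wheel example can be worked through numerically. This sketch in plain Python 3 folds the true rotation per frame into the range the camera can distinguish, from half a turn backward up to (but not including) half a turn forward. The function name apparent_revs_per_frame is made up for illustration.

```python
def apparent_revs_per_frame(true_revs_per_frame):
    """Fold a true rotation per frame into the range [-0.5, 0.5) revolutions.

    Whole turns between frames are invisible to the camera, so only the
    leftover fraction, measured in the nearest direction, is what we see.
    """
    return (true_revs_per_frame + 0.5) % 1.0 - 0.5

FRAMES_PER_SECOND = 24

# The wheel really turns 18/24 of a revolution (270 degrees) per frame...
seen_per_frame = apparent_revs_per_frame(18 / 24)

# ...but it looks like -1/4 revolution (-90 degrees) per frame,
# which is 6 revolutions per second backward.
seen_per_second = seen_per_frame * FRAMES_PER_SECOND
```

A wheel turning less than half a revolution per frame comes through unchanged, which matches the "half the sampling rate" limit described above.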


Nyquist Theorem

For audio sampling, the phenomenon is practically identical

Consider a graph of a 4,000 Hz cosine wave being sampled at a rate of 22,050 Hz

A sample is taken every 0.045 milliseconds


Nyquist Theorem

Consider the same 4,000 Hz cosine wave sampled at 6,000 Hz

A sample is taken every 0.167 milliseconds

The wave completes more than 1/2 cycle per sample, and the resulting samples are indistinguishable from those that would be obtained from a 2,000 Hz cosine wave
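The claim that the two waves produce the same samples can be checked directly. This is a sketch in plain Python 3 (not JES); the function name sample_cosine and the choice of 12 samples are assumptions for the example.

```python
import math

def sample_cosine(freq_hz, sampling_rate, n_samples):
    """Return n_samples amplitudes of a cosine wave sampled at sampling_rate."""
    return [math.cos(2 * math.pi * freq_hz * n / sampling_rate)
            for n in range(n_samples)]

SAMPLING_RATE = 6000  # samples per second, as on the slide

high = sample_cosine(4000, SAMPLING_RATE, 12)  # above half the sampling rate
low = sample_cosine(2000, SAMPLING_RATE, 12)   # its alias, below half the rate

# Compare sample by sample, allowing for tiny floating-point rounding.
samples_match = all(abs(a - b) < 1e-9 for a, b in zip(high, low))
```

Once the sound has been sampled, nothing in the numbers tells us whether the original was 4,000 Hz or 2,000 Hz.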


Nyquist Theorem

The simple lesson to be learned from the Nyquist theorem: digital audio cannot accurately represent any frequency greater than half the sampling rate

If we want to record frequencies as high as 20,000 Hz, how many times per second would we need to sample the sound?
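The arithmetic behind the Nyquist limit fits in two one-line functions. This is a sketch in plain Python 3; the function names are made up, and the results are boundary values (the earlier slide notes that in practice we sample at more than twice the highest frequency).

```python
def max_frequency(sampling_rate):
    """Highest frequency (Hz) that a given sampling rate can represent."""
    return sampling_rate / 2

def min_sampling_rate(highest_freq_hz):
    """Lower bound on the sampling rate (samples per second) for a frequency."""
    return 2 * highest_freq_hz

phone_limit = max_frequency(8000)         # the telephone example: 4000 Hz
cd_limit = max_frequency(44100)           # CD quality
hearing_rate = min_sampling_rate(20000)   # top of the human hearing range
```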


Digitizing sound with the computer: bit depth

Each sample is a numerical value representing the instantaneous amplitude of the signal at the moment it was sampled

The range of possible numbers used by a computer depends on the number of binary digits (bits) used to store each number

As the number of bits increases, the range of possible numbers they can express grows as a power of two: each additional bit doubles it (n bits give 2^n values)

How many bits did we use per color channel per pixel? How many values could each color channel take on?


Digitizing sound with the computer

Each sample is stored as a number (two bytes = 16 bits)

What’s the range of available combinations?

16 bits, 2^16 = 65,536

But we want both positive and negative values, to indicate compressions and rarefactions.

What if we use one bit to indicate positive (0) or negative (1)?

That leaves us with 16 - 1 = 15 bits

15 bits, 2^15 = 32,768

One of those combinations will stand for zero

We’ll use a “positive” one, so that’s one less pattern for positives

+/- 32K

Each sample can be between -32,768 and 32,767

Compare this to 0..255 for light intensity

(i.e. 8 bits or 1 byte giving us 256 different values)

Why such a bizarre number?

Because 32,768 + 32,767 + 1 = 2^16

i.e. 16 bits, or 2 bytes
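The counting argument above can be written as a short calculation. This is a sketch in plain Python 3; the function names count_values and signed_sample_range are made up for illustration.

```python
def count_values(bits):
    """Number of distinct bit patterns that fit in the given number of bits."""
    return 2 ** bits

def signed_sample_range(bits):
    """Smallest and largest signed values, with zero counted among the non-negatives."""
    half = 2 ** (bits - 1)
    return -half, half - 1

sound_patterns = count_values(16)                # patterns in a 16-bit sample
sound_low, sound_high = signed_sample_range(16)  # -32768 and 32767
color_patterns = count_values(8)                 # 256 values, i.e. 0..255 per color channel
```

The negative side gets one more value than the positive side because zero takes one of the non-negative patterns.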

Sounds as arrays (called lists in Jython)

Samples are just stored one right after the other in the computer’s memory

That’s called an array

It’s an especially efficient (quickly accessed) memory structure

(Like pixels in a picture)

Working with sounds

We’ll use pickAFile and makeSound. We want .wav files (NOTE: not all .wav files will work)

We’ll use getSamples to get all the sample objects out of a sound

We can also get the value at any index with getSampleValueAt

Can get a sound’s length (getLength(sound))

Can get a sound’s sampling rate (getSamplingRate(sound))

Can save sounds with writeSoundTo(sound,"file.wav")
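Outside JES, the same ideas (length, sampling rate, saving as .wav) can be tried with Python 3's standard wave and struct modules. This is only a rough analogue of the JES functions above, not how JES implements them. The helper name write_tone, the 440 Hz tone, and the in-memory buffer standing in for a file are all assumptions for the example.

```python
import io
import math
import struct
import wave

def write_tone(file_obj, freq_hz=440, sampling_rate=22050, n_samples=2205):
    """Write a mono, 16-bit sine tone to file_obj in WAV format."""
    samples = [int(32767 * math.sin(2 * math.pi * freq_hz * n / sampling_rate))
               for n in range(n_samples)]
    with wave.open(file_obj, "wb") as writer:
        writer.setnchannels(1)           # mono
        writer.setsampwidth(2)           # 2 bytes = 16 bits per sample
        writer.setframerate(sampling_rate)
        # "<h" is a little-endian signed 16-bit integer, one per sample.
        writer.writeframes(struct.pack("<%dh" % n_samples, *samples))
    return n_samples

# Use an in-memory buffer in place of a real file such as "file.wav".
buffer = io.BytesIO()
written = write_tone(buffer)

# Read it back and ask for the length and sampling rate.
buffer.seek(0)
with wave.open(buffer, "rb") as reader:
    length = reader.getnframes()
    rate = reader.getframerate()
    first_value = struct.unpack("<h", reader.readframes(1))[0]
```

To write a real file, pass a filename string to wave.open instead of the buffer.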

Demonstrating Working with Sound in JES

>>> filename = pickAFile()
>>> print filename
/Users/monica/Desktop/MediaSources/preamble.wav
>>> sound = makeSound(filename)
>>> print sound
Sound of length 421109
>>> samples = getSamples(sound)
>>> print samples
Samples, length 421109
>>> print getSampleValueAt(sound, 1)
36
>>> print getSampleValueAt(sound, 2)
29

Demonstrating working with samples

>>> print getLength(sound)
220568
>>> print getSamplingRate(sound)
22050.0
>>> print getSampleValueAt(sound, 220568)
68
>>> print getSampleValueAt(sound, 220570) # note this is too high
I wasn't able to do what you wanted.
The error java.lang.ArrayIndexOutOfBoundsException has occured
Please check line 0 of 
>>> print getSampleValueAt(sound, 1)
36
>>> setSampleValueAt(sound,1, 12)
>>> print getSampleValueAt(sound, 1)
12

Working with Samples

We can get sample objects out of a sound with getSamples(sound) or getSampleObjectAt(sound, index)

A sample object is still part of the sound object, so if you change the sample object, the sound changes.

Sample objects understand getSampleValue(sample) and setSampleValue(sample, value)

Example: Manipulating Samples

>>> soundfile=pickAFile()
>>> sound=makeSound(soundfile)
>>> sample=getSampleObjectAt(sound, 1)
>>> print sample
Sample at 1 value at 59
>>> print sound
Sound of length 387573
>>> print getSound(sample)
Sound of length 387573
>>> print getSampleValue(sample)
59
>>> setSampleValue(sample, 29)
>>> print getSampleValue(sample)
29

“But there are thousands of these samples!”

How do we do something to these samples to manipulate them, when there are thousands of them per second?

We use a loop and get the computer to iterate in order to do something to each sample.

An example loop that just gets and reassigns the same value:

for sample in getSamples(sound):
    value = getSampleValue(sample)
    setSampleValue(sample, value)
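The same loop pattern can be tried outside JES on an ordinary Python 3 list of integers standing in for a sound. This is an illustrative sketch, not a JES function: the name scale_samples and the factor parameter are assumptions. A factor of 1 reproduces the "get and reassign the same value" loop above, and results are kept inside the 16-bit range.

```python
def scale_samples(samples, factor, low=-32768, high=32767):
    """Multiply every sample in the list by factor, in place.

    Values that would fall outside the 16-bit range are clipped to low/high.
    """
    for index in range(len(samples)):
        value = samples[index]                           # get the sample value
        new_value = int(value * factor)                  # do something to it
        samples[index] = max(low, min(high, new_value))  # store it back
    return samples

# A tiny stand-in "sound": a list of sample values.
unchanged = scale_samples([36, 29, -100, 30000], 1)
doubled = scale_samples([36, 29, -100, 30000], 2)
```

The last value in doubled is clipped because 60,000 does not fit in a 16-bit sample.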

N.B.: function name changes

setSampleValue(sample, 200) used to be called setSample(sample, 200)

getSampleValue(sample) used to be called getSample(sample)

So if you are using an old version of JES (pre-3.1) you may have to use setSample and getSample.

Properties of sound

Frequency

Amplitude

Wavelength


Properties of digital sound

Sampling rate/frequency – not to be confused with the frequency of the sound we are trying to capture!

Bit depth

