Chapter 12 Sound

Multimedia Systems

Key Points

Sound is a complex mixture of physical and psychological factors, which is difficult to model accurately.

Sounds can be characterized by their waveforms, which plot amplitude against time.

CD quality sound is sampled at 44.1 kHz, using a sample size of 16 bits. Multimedia productions may have to use lower sampling rates and smaller sample sizes.

Key Points

The quality of digitized sound can be improved by dithering: adding a small quantity of noise to randomize the quantization error.

Software can provide the functions of a recording studio, including multi-track recording, mixing and effects, on a desktop computer.

The most vexatious aspect of recording is getting the levels right.

Audio filters are used to remove noise and unwanted frequency components.

Key Points

Digital versions of established effects, such as reverb and envelope shaping, are used to alter the quality of sounds. Digital technology permits new kinds of alteration, including time stretching and pitch alteration.

Speech data can be compressed using established technology, including µ-law and A-law companding and ADPCM.

MPEG-1 Layer 3 audio (MP3) is a lossy method of audio compression that uses a psycho-acoustical model to determine which information to discard.

Key Points

Each of the three major platforms has its own sound file format: AIFF for MacOS, WAV for Windows, and AU for Unix. RealAudio is used for streaming audio.

MIDI (the Musical Instrument Digital Interface) provides a standard for controlling digital instruments and communicating between them and computers running sequencer programs.

When sound is combined with video, synchronization must be established and maintained.

The Nature of Sound

All sounds are produced by the conversion of energy into vibrations in the air or some other elastic medium.

Examples: tuning forks and guitars.

A good tuning fork produces a clean tone at a single frequency; most other sound sources vibrate in more complicated ways.

A single note is composed of several components at frequencies that are multiples of the fundamental pitch of the note.

Harmonics

The spectrum of a single note from a musical instrument usually has a set of peaks at (approximately) harmonic ratios.

That is, if the fundamental frequency is f, there are peaks at f, and also at (about) 2f, 3f, 4f, etc.

The pitch of a note refers to the fundamental frequency with which the source of the tone resonates.
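The harmonic relationship can be illustrated with a short Python sketch. It is only an illustration: the function names, the 440 Hz fundamental and the amplitude weights are assumptions chosen for the example, and real instruments have peaks only approximately at these ratios.

```python
import math

def harmonic_frequencies(fundamental_hz, count):
    """Return the first `count` harmonics f, 2f, 3f, ... of a fundamental."""
    return [fundamental_hz * k for k in range(1, count + 1)]

def synthesize_note(fundamental_hz, amplitudes, duration_s=0.01, sample_rate=44100):
    """Sum sine waves at the harmonic frequencies, weighted by `amplitudes`.

    amplitudes[0] scales the fundamental, amplitudes[1] the second harmonic, etc.
    """
    n_samples = round(duration_s * sample_rate)
    samples = []
    for n in range(n_samples):
        t = n / sample_rate
        value = sum(a * math.sin(2 * math.pi * fundamental_hz * (k + 1) * t)
                    for k, a in enumerate(amplitudes))
        samples.append(value)
    return samples

# Hypothetical example values: a 440 Hz note with three harmonics.
print(harmonic_frequencies(440.0, 4))  # [440.0, 880.0, 1320.0, 1760.0]
note = synthesize_note(440.0, [1.0, 0.5, 0.25])
```

Changing the weights changes the timbre of the note while the pitch, set by the fundamental, stays the same.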

http://omix6.omix.com/melosync/AboutSound/Frequency.html

Frequency Spectrum

Percussive sounds and most natural sounds do not even have a single identifiable fundamental frequency, but can still be decomposed into a collection of frequency components.

Frequency spectrum: the relative amplitudes of a sound's frequency components.
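As a rough sketch of how a frequency spectrum can be computed from samples, the following Python code applies a naive discrete Fourier transform. The function name and the test signal (two sine waves at 100 Hz and 300 Hz, sampled at 800 Hz) are assumptions made for the example; practical tools use a fast Fourier transform instead.

```python
import cmath
import math

def amplitude_spectrum(samples, sample_rate):
    """Naive DFT: return (frequency_hz, amplitude) pairs up to half the sample rate."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(samples))
        # Scale so that a sine wave of amplitude A appears as a peak of height A.
        scale = 1 / n if (k == 0 or 2 * k == n) else 2 / n
        spectrum.append((k * sample_rate / n, abs(coeff) * scale))
    return spectrum

# Hypothetical test signal: components at 100 Hz (amplitude 1) and 300 Hz (amplitude 0.5).
sample_rate = 800
signal = [math.sin(2 * math.pi * 100 * i / sample_rate)
          + 0.5 * math.sin(2 * math.pi * 300 * i / sample_rate)
          for i in range(80)]
peaks = [(f, round(a, 3)) for f, a in amplitude_spectrum(signal, sample_rate) if a > 0.1]
print(peaks)  # [(100.0, 1.0), (300.0, 0.5)]
```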

The Nature of Sound

The human ear is able to detect frequencies in the range between 20 Hz and 20 kHz.

The upper limit decreases with increasing age.

We can display the waveform of any sound by plotting its amplitude against time.

Figs. 12.1-7: some waveforms for a range of types of sound.

Speech

A speaker repeats the phrase “Feisty teenager” twice, then a more distant voice responds.

The second time it is spoken faster and with more emphasis.

The recording was made in the open air, so there is background noise.

Compressing speech: removing the silences.

Feisty teenager

Instruments

Figs. 12.2-5

Didgeridoo

Boogie-woogie

Violin, cello and piano

Men grow cold...

Water sounds

A trickling stream

The sea

Stereophony

One of the most useful illusions in sound perception is stereophony.

The brain identifies the source of a sound on the basis of the differences in intensity and phase between the signals received by the left and right ears.

Digitizing Sound

Sampling

The selection of the sampling rate:

If the limit of hearing is 20 kHz, a minimum rate of 40 kHz is required by the Sampling Theorem.

The sampling rate of audio CDs is 44.1 kHz.

22.05 kHz is commonly used for Internet audio.

11.025 kHz is used for speech.

DAT (digital audio tape): 48 kHz.

Sampling

How does sampling work in a computer system?

Sound card

Digital audio inputs are uncommon.

The analogue line output of a DAT or CD player is re-digitized by the sound card.

Incompatible rates: re-sampling is needed.

Jitter: drift in the intervals between samples.

Sampling

If the sampling rate is 40 kHz, inaudible components above 20 kHz will manifest themselves as aliasing when the signal is reconstructed.

A filter is used to remove any frequencies higher than half the sampling rate before the signal is sampled.
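Aliasing can be demonstrated numerically. The Python sketch below uses assumed example values (a 30 kHz tone and a 40 kHz sampling rate) and hypothetical function names; it shows that the samples of the 30 kHz tone are indistinguishable, apart from sign, from those of a 10 kHz tone, which is why the filter must be applied before sampling.

```python
import math

def alias_frequency(frequency_hz, sample_rate):
    """Frequency at which a sampled sine of `frequency_hz` appears after sampling."""
    nearest_multiple = round(frequency_hz / sample_rate) * sample_rate
    return abs(frequency_hz - nearest_multiple)

def sample_sine(frequency_hz, sample_rate, count):
    """Take `count` samples of a unit-amplitude sine wave."""
    return [math.sin(2 * math.pi * frequency_hz * n / sample_rate)
            for n in range(count)]

SAMPLE_RATE = 40000  # Hz, the rate discussed on the slide

print(alias_frequency(30000, SAMPLE_RATE))  # 10000
print(alias_frequency(10000, SAMPLE_RATE))  # 10000 (below half the rate: unchanged)

# The 30 kHz tone yields the same samples as an inverted 10 kHz tone.
high = sample_sine(30000, SAMPLE_RATE, 8)
low = sample_sine(10000, SAMPLE_RATE, 8)
same_up_to_sign = all(abs(h + l) < 1e-9 for h, l in zip(high, low))
```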

Digitizing Sound

Quantization

CD audio usually uses 16-bit samples, giving 65536 quantization levels.

Undersampling a pure sine wave: with too few quantization levels, an analogue signal will be coarsely approximated by samples that jump between just a few quantized values.

Dithering: a small amount of random noise is added to the analogue signal before sampling.
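The effect of dithering can be sketched in Python. This is a simplified model under stated assumptions: signals are floats in the range -1 to 1, only 5 quantization levels are used so that the effect is visible, and the dither is uniform noise of one quantization step in width. The function names are hypothetical.

```python
import math
import random

def quantize(value, levels):
    """Map a value in [-1, 1] to the nearest of `levels` evenly spaced values."""
    step = 2 / (levels - 1)
    clamped = max(-1.0, min(1.0, value))
    return round((clamped + 1) / step) * step - 1

def dither_and_quantize(signal, levels, rng):
    """Add uniform noise of one quantization step in width, then quantize."""
    step = 2 / (levels - 1)
    return [quantize(x + rng.uniform(-step / 2, step / 2), levels) for x in signal]

rng = random.Random(0)  # seeded so that the sketch is repeatable

# A quiet sine wave whose amplitude is less than half a quantization step.
quiet = [0.2 * math.sin(2 * math.pi * n / 50) for n in range(200)]

plain = [quantize(x, 5) for x in quiet]          # every sample becomes 0: the signal is lost
dithered = dither_and_quantize(quiet, 5, rng)    # jumps between -0.5, 0 and 0.5
```

Without dither the quiet signal disappears entirely. With dither the output is noisy, but its local average follows the original waveform, so the quantization error behaves like random noise instead of distortion.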

Quantization

Undersampling a pure sine wave

Dithering

Sampling and dithering on the frequency spectrum

Processing Sound

Modern multi-track recording studio

There is presently no single sound application that has de facto standard status.

MIDI sequencing

Multi-track recording

Video editing packages include some integrated sound editing and processing facilities.

Recording and Importing Sound

Sampling rate and sample size

If the level of the signal is too low, the resulting recording will be quiet.

If the level is too high, clipping will occur.

Fig. 12.10

Gain control can be used to alter the level.

Automatic gain control
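The interaction between gain and clipping can be sketched as follows. The sketch assumes that the input is a list of floats nominally in the range -1 to 1 and that samples are stored as 16-bit signed integers; the `record` function and the test wave are hypothetical.

```python
import math

INT16_MIN, INT16_MAX = -32768, 32767

def record(analogue_samples, gain):
    """Scale the input by `gain` and store it as 16-bit integers.

    Returns the recorded samples and the number of samples that were clipped.
    """
    recorded = []
    clipped = 0
    for x in analogue_samples:
        value = round(x * gain * INT16_MAX)
        if value > INT16_MAX or value < INT16_MIN:
            clipped += 1
            # Out-of-range values are flattened to the largest storable value.
            value = max(INT16_MIN, min(INT16_MAX, value))
        recorded.append(value)
    return recorded, clipped

# Hypothetical input: one cycle of a sine wave at half of full scale.
wave = [0.5 * math.sin(2 * math.pi * n / 20) for n in range(20)]

quiet_rec, quiet_clipped = record(wave, 1.0)   # fits comfortably, no clipping
loud_rec, loud_clipped = record(wave, 3.0)     # peaks exceed the range and are clipped
```

The clipped recording has flat tops where the peaks of the wave should be, which is heard as distortion.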

Sound Editing and Effects

Interface: timeline

Tracks

Creation of loops

Very short loops are needed to create voices for the electronic musical instruments known as samplers.

Longer loops are used in certain styles of dance music.

Post-production

Correct defects, enhance quality, or otherwise modify the character of sounds.

Premiere’s effects plug-in format is widely used.

Professional level: Cubase VST, DigiDesign ProTools

Removal of unwanted noise

Noise gate

Eliminates all samples whose value falls below a specified threshold.

Specify a minimum time that must elapse before a sequence of low-amplitude samples counts as a silence, and a similar limit before a sequence whose values exceed the threshold counts as sound.

This prevents the gate being turned on or off by transient glitches (brief bursts of electromagnetic interference).
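A noise gate with the two minimum-time limits described above might be sketched like this. It is a simplified, causal version under assumptions of my own: the time limits are expressed in samples, the gate starts open, and samples already passed are not silenced retroactively. The names are hypothetical.

```python
def noise_gate(samples, threshold, min_silence, min_sound):
    """Silence the signal while the gate is closed.

    The gate closes only after `min_silence` consecutive samples below the
    threshold, and reopens only after `min_sound` consecutive samples at or
    above it, so short glitches do not switch it.
    """
    gate_open = True  # assumption: the recording starts with the gate open
    run = 0           # consecutive samples that disagree with the gate state
    out = []
    for s in samples:
        loud = abs(s) >= threshold
        if loud != gate_open:
            run += 1
            needed = min_sound if loud else min_silence
            if run >= needed:
                gate_open = loud
                run = 0
        else:
            run = 0
        out.append(s if gate_open else 0)
    return out

signal = [9, 8, 1, 9, 1, 1, 1, 1, 9, 1, 9, 9, 9]
gated = noise_gate(signal, threshold=5, min_silence=3, min_sound=2)
print(gated)  # [9, 8, 1, 9, 1, 1, 0, 0, 0, 0, 0, 9, 9]
```

In the example, the single quiet sample at index 2 does not close the gate, and the single loud sample at index 8 does not reopen it.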

Noise Gate

Since a noise gate has no effect on the speaker’s words, the background noise will cut in and out as he speaks.

Noise is combined with the signal.

A noise gate performs all-or-nothing filtering.

Low-pass, high-pass and notch filters

Specialized filters

De-esser: removes the sibilance (hissing sound) that results from speaking or singing into a microphone placed too close to the performer.

Click repairer: removes clicks from recordings taken from damaged or dirty vinyl records.

A single effect may be used in different ways depending on the values of its parameters.

Reverb effect

Small delay and low reflectivity: the inside of a small room.

Longer reverb times: a concert hall or stadium.

Graphic Equalization

Transforms the spectrum of a sound using a bank of filters, each controlled by its own slider and each affecting a fairly narrow band of frequencies.

Envelope Shaping

Changes the outline of a waveform.

Allows the user to draw a new envelope around the waveform, altering its attack and decay and introducing arbitrary fluctuations of amplitude.

Faders: a specialized version of envelope shaping that allows the volume to be gradually increased or decreased.

Tremolo: causes the amplitude to oscillate periodically from zero to its maximum value.
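Envelope shaping amounts to multiplying each sample by a time-varying gain. The Python sketch below illustrates a fade-in and a tremolo in those terms; the function names, the linear shape of the fade and the cosine shape of the tremolo are assumptions made for the example.

```python
import math

def apply_envelope(samples, envelope):
    """Multiply each sample by the envelope value at the same position."""
    return [s * e for s, e in zip(samples, envelope)]

def fade_in(length):
    """Linear ramp from 0 to 1, as produced by gradually raising a fader."""
    return [n / (length - 1) for n in range(length)]

def tremolo(length, rate_hz, sample_rate):
    """Envelope that oscillates periodically between 0 and 1."""
    return [0.5 * (1 - math.cos(2 * math.pi * rate_hz * n / sample_rate))
            for n in range(length)]

steady = [1.0] * 5
print(apply_envelope(steady, fade_in(5)))  # [0.0, 0.25, 0.5, 0.75, 1.0]

# One full tremolo cycle: 1 Hz at an (unrealistically low) rate of 8 samples per second.
wobble = tremolo(9, rate_hz=1, sample_rate=8)
```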

Time stretching and pitch alteration are two closely related effects.

With analogue recordings, the duration can only be changed by altering the speed at which the recording is played back, and this alters the pitch.

With digital sound, the duration can be changed without altering the pitch by inserting or removing samples.

The pitch can be altered without affecting the duration.

Time stretching is required when sound is being synchronized to video or another sound.

Compression

3 minutes of CD-quality stereo sound: over 25 MBytes uncompressed

Huffman coding

Run-length coding: silence
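The storage figure and the idea of run-length coding silence can both be checked with a short Python sketch. It assumes CD-quality parameters (44.1 kHz, 16 bits, two channels) and treats silence as samples that are exactly zero; the function names and the output representation are hypothetical.

```python
def uncompressed_size_bytes(duration_s, sample_rate=44100, bits_per_sample=16, channels=2):
    """Size in bytes of uncompressed sampled audio."""
    return duration_s * sample_rate * (bits_per_sample // 8) * channels

size = uncompressed_size_bytes(3 * 60)
print(size)  # 31752000, comfortably over 25 MBytes

def run_length_encode_silence(samples):
    """Replace each run of zero samples by a ("silence", run_length) pair."""
    encoded = []
    run = 0
    for s in samples:
        if s == 0:
            run += 1
        else:
            if run:
                encoded.append(("silence", run))
                run = 0
            encoded.append(s)
    if run:
        encoded.append(("silence", run))
    return encoded

print(run_length_encode_silence([5, 0, 0, 0, 0, -3, 7, 0, 0]))
# [5, ('silence', 4), -3, 7, ('silence', 2)]
```

Run-length coding only pays off where the recording really does contain long runs of identical values, which in practice means silence that has been gated to zero.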

Speech Compression

Telephone companies, 1960s

Companding: compressing/expanding

Non-linear quantization: Fig. 12.11

G.711: µ-law (North America and Japan, Sun) and A-law

ADPCM: adaptive differential pulse code modulation

Differential pulse code modulation

Linear Predictive Coding

Uses a mathematical model of the state of the vocal tract as its representation of speech.

2.4 kbps, machine-like quality
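The idea behind companding can be sketched with the continuous µ-law curve, using µ = 255. This is an illustration only: it applies the smooth logarithmic formula followed by uniform 8-bit quantization, not the bit-exact segmented encoding specified by G.711, and the function names are hypothetical.

```python
import math

MU = 255  # the value of mu used for 8-bit mu-law

def mu_law_compress(x, mu=MU):
    """Map x in [-1, 1] to [-1, 1], stretching small amplitudes."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)

def mu_law_expand(y, mu=MU):
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)

def encode_8bit(x):
    """Compress, then quantize uniformly to one of 256 codes."""
    return round((mu_law_compress(x) + 1) / 2 * 255)

def decode_8bit(code):
    """Turn an 8-bit code back into an amplitude in [-1, 1]."""
    return mu_law_expand(code / 255 * 2 - 1)

quiet_sample = 0.01
companded_error = abs(decode_8bit(encode_8bit(quiet_sample)) - quiet_sample)

# For comparison: plain uniform 8-bit quantization of the same sample.
uniform_code = round((quiet_sample + 1) / 2 * 255)
uniform_error = abs(uniform_code / 255 * 2 - 1 - quiet_sample)
```

Because the curve devotes more of the 256 codes to low amplitudes, the quiet sample is reproduced more accurately than with uniform quantization, at the cost of coarser steps for loud samples.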

Perceptually Based Compression

Threshold of hearing: the minimum level at which a sound can be heard.

Fig. 12.12, the threshold of hearing

Very low or high frequency sounds must be much louder than a mid-range tone to be heard.

Psycho-acoustical model: a mathematical description of aspects of the way the ear and brain perceive sounds.

Loud tones can obscure softer tones that occur at the same time.

The effect depends on the relative frequencies of the two tones.

Masking

Masking is a modification of the threshold of hearing curve in the region of a loud tone.

Fig. 12.13: the threshold is raised in the neighborhood of the masking tone.

The raised portion, or masking curve, is non-linear and asymmetrical, rising faster than it falls.

Any sound that lies within the masking curve will be inaudible, even though it rises above the unmodified threshold of hearing.

Because masking hides noise as well as some components of the signal, quantization noise can be masked.

Where a masking sound is present, the signal can be quantized relatively coarsely, using fewer bits than would otherwise be needed, because the resulting quantization noise can be hidden under the masking curve.

Compression

Use a bank of filters to split the signal into bands of frequencies; 32 bands are commonly used.

The average signal level in each band is calculated, and using these values and a psycho-acoustical model, a masking level for each band is computed.

MPEG Audio

3 layers

Layer 1: 192 kbps for each channel

Layer 2: 128 kbps for each channel

Layer 3: 64 kbps for each channel

MP3 = MPEG-1 Layer 3

Compression ratio: about 10:1
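The quoted bit rates can be related to the compression ratio with a small calculation. The sketch assumes that the uncompressed source is CD-quality audio (44.1 kHz, 16 bits per sample); the constant and function names are chosen for the example.

```python
# Uncompressed CD-quality bit rate for one channel, in kbps.
CD_KBPS_PER_CHANNEL = 44100 * 16 / 1000  # 705.6

# Per-channel bit rates for the three layers, as listed on the slide.
LAYER_KBPS_PER_CHANNEL = {1: 192, 2: 128, 3: 64}

def compression_ratio(layer):
    """Ratio of the uncompressed CD bit rate to the layer's bit rate."""
    return CD_KBPS_PER_CHANNEL / LAYER_KBPS_PER_CHANNEL[layer]

for layer in (1, 2, 3):
    print(layer, round(compression_ratio(layer), 2))
# 1 3.67
# 2 5.51
# 3 11.03
```

Layer 3 at 64 kbps per channel therefore corresponds to a ratio of roughly 11:1, in line with the approximate 10:1 figure.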

Formats

AIFF for MacOS

WAV for Windows

AU for Unix

Each can store audio data at a variety of commonly used sampling rates and sample sizes.

Each supports uncompressed or compressed data with a range of compressors.

Streaming Audio

Sound is delivered over a network and played as it arrives, without having to be stored on the user’s machine first.

Because of the lower bandwidth required by audio, streaming is more successful for sound than it is for video.

Real Networks’ RealAudio

Streaming QuickTime

Play on demand

MIDI

The Musical Instrument Digital Interface

Standard protocol for communicating between electronic instruments, such as synthesizers, samplers, and drum machines.

MIDI allowed instruments to be controlled automatically by devices that could be programmed to send out sequences of MIDI instructions.

MIDI Messages

A MIDI message is an instruction that controls some aspect of the performance of an instrument.

Status byte: the type of message, followed by one or two bytes giving the values of parameters.

Examples: Note On, Note Off, Key Pressure

Running status

MIDI data is transmitted using a 10-bit packet that includes a start and stop bit

The MIDI message Note On is followed by two data bytes, as is the Note Off message.
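The message layout described above can be sketched in Python. The status values 0x90 (Note On) and 0x80 (Note Off) and the serial rate of 31,250 bits per second come from the MIDI 1.0 standard; the helper names and the example notes are hypothetical, and the sketch only builds byte strings without sending anything to a device.

```python
NOTE_OFF = 0x80
NOTE_ON = 0x90
BAUD_RATE = 31250            # MIDI serial speed in bits per second
BITS_PER_BYTE_ON_WIRE = 10   # start bit + 8 data bits + stop bit

def note_on(channel, note, velocity):
    """Status byte (message type plus channel) followed by two data bytes."""
    return bytes([NOTE_ON | (channel & 0x0F), note & 0x7F, velocity & 0x7F])

def note_off(channel, note, velocity=0):
    """Note Off has the same shape as Note On, with a different status byte."""
    return bytes([NOTE_OFF | (channel & 0x0F), note & 0x7F, velocity & 0x7F])

def transmission_time_ms(message):
    """Time needed to send the message, given 10 bits on the wire per byte."""
    return len(message) * BITS_PER_BYTE_ON_WIRE / BAUD_RATE * 1000

def apply_running_status(messages):
    """Omit the status byte when it repeats that of the previous message."""
    out = bytearray()
    last_status = None
    for msg in messages:
        if msg[0] == last_status:
            out += msg[1:]
        else:
            out += msg
            last_status = msg[0]
    return bytes(out)

middle_c = note_on(0, 60, 100)
print(middle_c.hex())  # 903c64
chord = apply_running_status([note_on(0, 60, 100), note_on(0, 64, 100)])
print(len(chord))      # 5 bytes instead of 6
```

Running status saves one byte for every repeated message type, which matters on a link where each three-byte message takes just under a millisecond to send.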

General MIDI and QuickTime

General MIDI specifies 128 standard voices, Table 12.1

Drum machines and percussion samplers

Drum kits, Table 12.2

There is no guarantee that identical sounds will be generated for each name by different instruments.

A good sampler may use high quality samples of the corresponding real instruments.

QuickTime: MIDI-like functionality

MIDI Software

MIDI sequencing programs

Capture and editing functions equivalent to those of video editing software.

Multiple tracks

Composition

Music can be captured as it is played from MIDI controllers attached to a computer via a MIDI interface.

Punch in: the start and end points of a defective passage are marked, the sequencer starts playing before the beginning, and then switches to record mode, allowing a new version of the passage to be recorded to replace the original.

Sequencers

Quantize tempo during recording, fitting the length of notes to exact sixteenth notes, or eighth note triplets, or whatever duration is specified.

Most programs allow music to be entered using classical music notation.

Some allow printed sheet music to be scanned, and will perform optical character recognition to transform the music into MIDI.

The opposite transformation, from MIDI to a printed score, is also often provided, enabling transcriptions of performed music to be made automatically.
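The quantization of note timing performed by a sequencer can be sketched as snapping times to a grid. The sketch assumes that times are measured in beats with a quarter note equal to one beat, so that a sixteenth note is 1/4 beat and an eighth note triplet is 1/3 beat; the function names and the rule that a note lasts at least one grid unit are assumptions made for the example.

```python
from fractions import Fraction

SIXTEENTH = Fraction(1, 4)        # of a beat, assuming a quarter note is one beat
EIGHTH_TRIPLET = Fraction(1, 3)

def quantize_time(time_beats, grid):
    """Snap a time in beats to the nearest multiple of the grid size."""
    return round(Fraction(time_beats) / grid) * grid

def quantize_note(start, duration, grid):
    """Snap both the start and the length of a note to the grid.

    A note is never shortened to nothing: its length is at least one grid unit.
    """
    q_start = quantize_time(start, grid)
    q_duration = max(grid, quantize_time(duration, grid))
    return q_start, q_duration

# A slightly early note played a little short of a sixteenth.
print(quantize_note(0.98, 0.22, SIXTEENTH))   # (Fraction(1, 1), Fraction(1, 4))
print(quantize_time(0.70, EIGHTH_TRIPLET))    # 2/3
```

Exact fractions are used for the grid so that triplet positions such as 1/3 and 2/3 of a beat are represented without rounding error.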

Piano-roll interface, Fig. 12.14

Major limitations of MIDI

Impossibility of representing vocals

MIDI can be transformed into audio.

The reverse transformation is sometimes supported, although it is more difficult to implement.

Computer Sequencing Software

Music Notation Software

Combining Sound and Picture

Voice-overs should match the picture they describe, music will often be related to edits, and natural sounds will be associated with events on screen.

Synchronization, timecode

If sound and video are physically independent, synchronization will sometimes be lost.

Audio and video data streams must carry the equivalent of timecode, so that their synchronization can be checked.

Audio and video played from a local hard disk

For short clips, it is possible to load the entire sound track into memory before playback begins.

This is impractical for movies. For these, it is normal to interleave the audio and video.
