Media Processing – Audio Part


Transcript of Media Processing – Audio Part

Page 1: Media Processing – Audio Part


Media Processing – Audio Part

Dr Wenwu Wang

Centre for Vision Speech and Signal Processing

Department of Electronic Engineering

[email protected]

http://personal.ee.surrey.ac.uk/Personal/W.Wang/teaching.html

Page 2: Media Processing – Audio Part


Approximate outline

Week 6: Fundamentals of audio

Week 7: Audio acquisition, recording, and standards

Week 8: Audio processing, coding, and standards

Week 9: Audio production and reproduction

Week 10: Audio perception and audio quality assessment

Page 3: Media Processing – Audio Part

Concepts and topics to be covered:

Audio recording

Microphones

Directional response of microphones

Omni pattern

Figure-eight pattern

Cardioid pattern

A/D conversion

Sampling

Quantisation

D/A conversion

Digital audio recording formats/standards

Free audio recording software


Page 4: Media Processing – Audio Part


Audio recording / processing / production chain

Sound source → Recording/acquisition system → Production system (speaker) → Listener

Page 5: Media Processing – Audio Part


Microphone

A microphone, whose function is the opposite of a loudspeaker's, is a transducer that converts acoustical sound energy into electrical form.

The three most common operating principles for microphones are the dynamic (i.e. moving-coil), the ribbon, and the capacitor (or condenser).

Page 6: Media Processing – Audio Part


Dynamic microphone

It consists of a rigid diaphragm, typically 20-30 mm in diameter, suspended in front of a magnet.

The coil sits in the gap of a strong permanent magnet.

Sound waves cause the diaphragm to vibrate, and hence the coil to move in the magnet's gap, which induces an alternating current in the coil, producing the electrical output.

Such a microphone is useful in applications such as drum miking and hand-held vocal use, owing to its robustness. Its disadvantage is its limited frequency response (a fairly rapid fall-off in response above 8 or 10 kHz).

Source: Francis Rumsey and Tim McCormick (1994)

Page 7: Media Processing – Audio Part


Ribbon microphone

The ribbon microphone consists of a long thin strip of conductive metal foil, magnetic poles, and a transformer.

The foil is pleated to give it 'spring' and is lightly tensioned between two end clamps. The magnetic poles create a magnetic field across the ribbon; when the ribbon is excited by sound waves, a current is generated in it.

A transformer is used to step up the electrical output of the ribbon, which is very small. Note that the standard output impedance is 200 ohms, as in the dynamic microphone.

Source: Francis Rumsey and Tim McCormick (1994)

Page 8: Media Processing – Audio Part

Capacitor microphone

The capacitor microphone consists of a flexible diaphragm and a rigid back plate, separated by an insulator. The diaphragm is free to vibrate with the sound waves. When one plate (the diaphragm) is free to move with respect to the other (the earthed back plate), the capacitance, i.e. the ability to hold electrical charge, varies.

The DC phantom power charges the capacitor via a very high resistance. A DC blocking capacitor prevents the phantom power from entering the head amplifier, allowing only audio signals to pass.

Sound waves cause the diaphragm to move, and thus the capacitance, and hence the voltage across the capacitor, change proportionally (as the high resistance allows only a very slow leakage of charge from the diaphragm).

The head amplifier converts the very high impedance voltage output of the capacitor to a much lower impedance. The transformer further balances the signal for output.

Source: Francis Rumsey and Tim McCormick (1994)

Page 9: Media Processing – Audio Part


Directional responses of microphone and polar diagrams

Microphones are designed to have a certain directional response pattern, often described by a 'polar diagram': a two-dimensional contour map showing the magnitude of the microphone's output at different angles of incidence of a sound wave.

The distance of the polar plot from the centre of the diagram is usually measured in decibels (dB). The further the plot is from the centre, the greater the output of the microphone at that angle.

A nominal 0 dB is usually marked for the response at zero degrees at 1 kHz.

Page 10: Media Processing – Audio Part

Omnidirectional pattern

An omnidirectional microphone picks up sound equally from all directions, i.e. it has a response of 1 at all angles.

The polar diagram of an ideal omnidirectional microphone is shown below, where the microphone response is omnidirectional for all frequencies.

Such a pattern can be achieved by leaving the microphone diaphragm open at the front, but completely enclosed at the rear, so that it responds only to the change of air pressure caused by the sound waves.

Source: Francis Rumsey and Tim McCormick (1994)

Page 11: Media Processing – Audio Part


Omnidirectional pattern (cont.)

The polar diagram of a (typical) real omnidirectional microphone at a number of frequencies is shown in the figure below.

Source: Francis Rumsey and Tim McCormick (1994)

Page 12: Media Processing – Audio Part


Omnidirectional pattern (cont.)

For this microphone, the response is perfectly omnidirectional for frequencies up to 2 kHz. For frequencies between 3 and 6 kHz, the sensitivity at 180 degrees (i.e. at the rear of the microphone) drops by about 6 dB compared with the lower frequencies (up to 2 kHz). For frequencies above 8 kHz, the response at 180 degrees can drop by as much as 15 dB. Therefore, sounds picked up by such a microphone from the rear will lose a considerable amount of their treble (high-frequency) content.

The smaller the dimensions of the microphone, the better the polar response at high frequencies; mics with quarter-inch diaphragms, for example, maintain a good response up to 10 kHz.

Omni microphones are usually the most immune to mic movement and wind noise (as compared to the other types discussed later), as they are only sensitive to the absolute sound pressure.

Page 13: Media Processing – Audio Part

Bidirectional (figure-eight) pattern

A bidirectional (or figure-eight) microphone has a polar response proportional to the cosine of the angle of incidence of the sound wave.

At 90 degrees, no sound is picked up. At 0 degrees, sound is picked up by a front lobe, and at 180 degrees by a rear lobe, whose output is 180 degrees out of phase with that of the front lobe.

Source: Francis Rumsey and Tim McCormick (1994)


Page 14: Media Processing – Audio Part

Bidirectional pattern (cont.)

In such a microphone, for example the traditional ribbon microphone, the diaphragm operates on the pressure-gradient principle, i.e. it responds to the difference in pressure between the front and the rear of the microphone. Therefore, for a sound from a direction 90 degrees off axis, the sound pressure is of equal magnitude on both sides of the diaphragm, causing no movement of the diaphragm and hence no output.

For a sound arriving at the microphone from the front at 0 degrees, a phase difference arises between the front and the rear of the diaphragm, due to the small additional distance travelled by the wave. The resulting difference in pressure moves the diaphragm and hence gives an output (response).

For very low frequencies, the phase difference between the front and rear becomes very small (due to the long wavelengths), and the output level falls accordingly.

The polar response of a bidirectional mic tends to be very uniform across frequency, except for a slight narrowing above approximately 10 kHz.

In practice, such microphones must be correctly oriented in use.

Page 15: Media Processing – Audio Part

Unidirectional (cardioid) pattern

The unidirectional (also known as cardioid) pattern is described mathematically as 1 + cos(phi), where phi is the angle of incidence of the sound signal.

An idealised polar diagram of a unidirectional microphone is shown in the figure below.

Source: Francis Rumsey and Tim McCormick (1994)

Page 16: Media Processing – Audio Part

Unidirectional pattern (cont.)

The response of the unidirectional microphone can be regarded as a combination of the omnidirectional and bidirectional responses, as shown in the figure below.

At 0 degrees, both polar responses are of equal amplitude and phase; added together, they produce a total output twice that of either separately. At 180 degrees, they cancel each other out due to their opposite phase.

Source: Francis Rumsey and Tim McCormick (1994)

Page 17: Media Processing – Audio Part


Unidirectional pattern (cont.)

Such microphones can be obtained by leaving the diaphragm open at the front, but introducing various acoustic labyrinths at the rear which cause sound to reach the back of the diaphragm in various combinations of amplitude and phase, resulting in a cardioid response.

A typical polar diagram of a unidirectional microphone at low (LF), middle (MF) and high (HF) frequencies is shown in the figure on the right.

The polar response at mid frequencies is very good, but it tends to degenerate towards omni at low frequencies (which are picked up quite uniformly), and becomes more directional than is desirable at high frequencies (sounds arriving from the rear will not be completely attenuated).

Source: Francis Rumsey and Tim McCormick (1994)

Page 18: Media Processing – Audio Part


Hypercardioid pattern

The hypercardioid response is described mathematically as 0.5 + cos(phi), where phi is the angle of incidence of the sound signal.

It can be considered as a combination of an omni response (attenuated by 6 dB), and a figure-eight response. The shape of the response lies in between the cardioid and figure-eight patterns, having a relatively small rear lobe which is out of phase with the front lobe.

The hypercardioid microphone has the highest direct-to-reverberant ratio of the patterns, implying that the ratio between the level of on-axis sound and the level of reflected sounds picked up from other angles is very high. As a result, it is good for excluding unwanted sounds (such as room reverberations or unwanted noise).

Demo for polar patterns:
http://www.youtube.com/watch?v=_MMHi8bQVv0
http://www.youtube.com/watch?v=TUHpLqvw9AA
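As a quick numerical check of the polar-pattern formulas quoted on the preceding slides, the Python sketch below (written for these notes, not taken from the slides) evaluates each pattern at a few angles of incidence.

```python
import numpy as np

def pattern_response(phi_deg, kind):
    """Polar response at angle of incidence phi (degrees), using the
    formulas quoted on the slides (not normalised to 0 dB on axis)."""
    phi = np.radians(np.asarray(phi_deg, dtype=float))
    if kind == "omni":            # equal pick-up from all directions
        return np.ones_like(phi)
    if kind == "figure-eight":    # proportional to cos(phi)
        return np.cos(phi)
    if kind == "cardioid":        # 1 + cos(phi)
        return 1.0 + np.cos(phi)
    if kind == "hypercardioid":   # 0.5 + cos(phi)
        return 0.5 + np.cos(phi)
    raise ValueError(kind)

angles = [0, 90, 120, 180]
for kind in ("omni", "figure-eight", "cardioid", "hypercardioid"):
    print(kind, np.round(pattern_response(angles, kind), 2))
# cardioid: 2 on axis, 0 at the rear; figure-eight: 0 at 90 degrees and
# opposite phase (-1) at the rear; hypercardioid: small out-of-phase rear lobe.
```

The negative value of the hypercardioid at 180 degrees corresponds to its small out-of-phase rear lobe mentioned above.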

Page 19: Media Processing – Audio Part


Examples of microphones: switchable polar patterns

Two identical diaphragms are used, placed one on each side of a central rigid plate. Perforations in the central plate give both diaphragms a cardioid response.

When the polarising voltage of one side is opposite to that of the other, the combined output gives a figure-eight response, as the cardioids are out of phase. When the polarising voltage of both sides is the same, the combined output gives an omnidirectional response, as the cardioids are in phase. Intermediate combinations give cardioid and supercardioid polar responses (see the sketch below).
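To see why the in-phase and out-of-phase combinations behave as stated, the minimal sketch below models the two capsules as back-to-back cardioids, 1 + cos(phi) facing forward and 1 - cos(phi) facing backward, and simply adds or subtracts them (an illustration for these notes, not the actual electronics of such a microphone).

```python
import numpy as np

phi = np.radians([0, 90, 180])
front = 1.0 + np.cos(phi)   # forward-facing cardioid capsule
rear = 1.0 - np.cos(phi)    # rear-facing cardioid (pattern rotated by 180 degrees)

in_phase = front + rear     # same polarising voltage on both sides
out_of_phase = front - rear # opposite polarising voltages

print("in phase:    ", in_phase)       # constant 2 at every angle -> omnidirectional
print("out of phase:", out_of_phase)   # 2*cos(phi) -> figure-eight
```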

A typical double-diaphragm microphone with switchable polar patterns: the AKG C414B-ULS.

Page 20: Media Processing – Audio Part


Examples of microphones: stereo microphones

Two microphones are built into a single casing where one capsule is rotatable with respect to the other so that the angle between the two can be adjusted.

Each capsule can be switched to give the desired polar response, such as a pair of figure-eight microphones or a pair of cardioids.

A typical stereo microphone: the Neumann SM69

Page 21: Media Processing – Audio Part


Examples of microphones: stereo microphones

The sum-and-difference microphone is another type of stereo microphone, in which the sum (the middle signal, i.e. (L+R)/2 of the conventional stereo signal) and the difference (the side signal, i.e. (L-R)/2) are combined in a matrix box to produce a left-right stereo signal.
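As a quick sketch of the matrixing step (illustrative only, not the circuit of any particular microphone), the middle and side signals are formed from left/right, and the matrix box then recovers L = M + S and R = M - S:

```python
import numpy as np

# Hypothetical left/right sample values, purely for illustration.
L = np.array([0.5, -0.2, 0.8])
R = np.array([0.3, 0.1, -0.4])

M = (L + R) / 2     # middle (sum) signal
S = (L - R) / 2     # side (difference) signal

# The matrix box recovers the left-right stereo signal from M and S.
L_out, R_out = M + S, M - S
assert np.allclose(L_out, L) and np.allclose(R_out, R)
```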

A typical sum-and-difference stereo microphone: the Shure VP88

Page 22: Media Processing – Audio Part


Examples of microphones: stereo microphones

An example of a sophisticated stereo microphone is the AMS Soundfield microphone, shown below.

In this microphone, each channel is fully adjustable from omni through cardioid to figure-eight, and angles between the capsules are also fully adjustable. These are controlled electronically by a remotely sited control unit.

Source: Francis Rumsey and Tim McCormick (1994)

Second generation AMS

First generation AMS

Page 23: Media Processing – Audio Part

A/D conversion

An A/D converter is used to convert the analogue audio signal (a time-varying electrical voltage, say the output of a microphone) into a series of 'samples', which are 'snapshots' of the analogue signal taken at periodic intervals (known as the sampling period).

It usually consists of sampling and quantisation steps.

Page 24: Media Processing – Audio Part

Sampling

In this process, measurements (i.e. samples) are taken from the analogue audio signal (shown in the left sub-plot below) at regular intervals in time. This is usually achieved by a sample-and-hold circuit.

To represent the fine detail of the signal (or to reconstruct the analogue signal perfectly from the samples), it is necessary to take a large number of samples per second. As dictated by the Shannon sampling theorem, at least two samples must be taken per audio cycle (i.e. period). In other words, the sampling frequency should be at least twice the frequency of the highest frequency component in the signal.

Sample period: T (in seconds); sampling frequency: f = 1/T (in Hz).

Page 25: Media Processing – Audio Part

Aliasing effect due to undersampling

In subplot (a) of the figure below, enough samples have been taken and the signal can be perfectly reconstructed from the samples.

In subplot (b), fewer than two samples per cycle are taken from the wave; as a result, the signal may be reconstructed as a different signal (the dashed curve) instead of the signal that was originally sampled (the solid curve). This is known as the aliasing effect.

Source: Francis Rumsey and Tim McCormick (1994)
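A minimal numerical illustration of aliasing (a sketch for these notes, using the figures from the frequency-domain slides that follow): a 17 kHz tone sampled at 30 kHz produces exactly the same sample values as a 13 kHz tone, so after sampling the two are indistinguishable.

```python
import numpy as np

fs = 30_000            # sampling frequency (Hz)
f_in = 17_000          # input tone above the Nyquist frequency of 15 kHz
f_alias = fs - f_in    # the 13 kHz alias it will be mistaken for

n = np.arange(16)      # a few sample indices
x_in = np.cos(2 * np.pi * f_in * n / fs)
x_alias = np.cos(2 * np.pi * f_alias * n / fs)

print(np.allclose(x_in, x_alias))   # True: the sampled values are identical
```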

Page 26: Media Processing – Audio Part


Frequency domain interpretation of sampling

The sampling process can also be considered as a modulation process, called pulse amplitude modulation (PAM), where a series of pulses of constant amplitude is amplitude-modulated by the analogue audio waveform. In other words, the amplitude of each pulse is modified by the instantaneous amplitude of the analogue audio signal.

Page 27: Media Processing – Audio Part


Frequency domain interpretation of sampling (cont.)

Source: Francis Rumsey and Tim McCormick (1994)

Page 28: Media Processing – Audio Part


Frequency domain interpretation of sampling (cont.)

(a) The unmodulated sample pulses display a typical harmonic series of components at integer multiples of fs (fs = 30 kHz in this case).

(b) When a 1 kHz sine wave is sampled at fs = 30 kHz, it generates sideband components on either side of fs (i.e. 29 kHz = fs - 1 and 31 kHz = fs + 1) and of its multiples (i.e. 59 kHz = 2fs - 1 and 61 kHz = 2fs + 1).

(c) When a 17 kHz sine wave is sampled at fs = 30 kHz, it generates sideband components on either side of fs (i.e. 13 kHz = fs - 17 and 47 kHz = fs + 17) and of its multiples (i.e. 43 kHz = 2fs - 17 and 77 kHz = 2fs + 17). As the 13 kHz sideband falls within the frequency range of the baseband (i.e. the spectrum of the original audio signal), it will also be audible.
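A small arithmetic check of the numbers in (b) and (c) above (a sketch written for these notes): the sidebands sit at n*fs ± f, and any component landing below the Nyquist frequency fs/2 falls back into the audio baseband.

```python
fs = 30_000   # sampling frequency (Hz)

def sidebands(f, n_max=2):
    """Sideband frequencies n*fs +/- f (in Hz) for n = 1..n_max."""
    return sorted(n * fs + s * f for n in range(1, n_max + 1) for s in (-1, +1))

for f in (1_000, 17_000):
    bands = sidebands(f)
    in_baseband = [b for b in bands if b < fs / 2]   # components aliased into the audio band
    print(f, "Hz input -> sidebands:", bands, "| below Nyquist:", in_baseband)
# 1 kHz  -> 29, 31, 59, 61 kHz (none falls in the baseband)
# 17 kHz -> 13, 43, 47, 77 kHz (the 13 kHz component lands in the baseband)
```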

Page 29: Media Processing – Audio Part

Anti-aliasing

One way to avoid aliasing is to make sure the sampling frequency is at least twice the highest frequency in the signal.

An alternative is to use an anti-aliasing filter to remove those frequency components of the signal that lie above half the sampling frequency (usually called the Nyquist frequency), as shown below.

Source: Francis Rumsey and Tim McCormick (1994)

Demo for aliasing effects and anti-aliasing:
http://www.youtube.com/watch?v=YB9nALmwSL8
http://www.youtube.com/watch?v=EQ-ovLnVTIM

Page 30: Media Processing – Audio Part

Quantisation

In the quantisation process, each sample is assigned a value from a range of fixed possibilities, as shown in the example below, where a scale from 1 to 10 is used for both the positive and negative ranges (i.e. a decimal system). Each sample is represented by an integer on this scale; if the amplitude obtained from the sampling process falls between two integers, it is rounded to the nearest integer during quantisation.

Source: Francis Rumsey and Tim McCormick (1994)

The quantised sequence: -3, 1, 5, 7, …, -5, -7, -9

Page 31: Media Processing – Audio Part


Quantisation (cont.)

The difference between the sample amplitude represented by the numbers and the original amplitude of the sample is called quantisation error.

The maximum quantisation error is half of the quantisation step size, Q.

In subplot (a) there are fewer quantisation steps, so the quantisation error is larger than in subplot (b).

Source: Francis Rumsey and Tim McCormick (1994)
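A minimal sketch of uniform quantisation by rounding to the nearest step (written for these notes), confirming that the error never exceeds half a step, Q/2:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 10_000)   # analogue-like sample amplitudes in [-1, 1)

def quantise(x, n_bits):
    """Round each sample to the nearest level of a uniform n_bits scale over [-1, 1)."""
    q = 2.0 / (2 ** n_bits)          # quantisation step size Q
    return np.round(x / q) * q, q

for n_bits in (4, 8, 16):
    xq, q = quantise(x, n_bits)
    max_err = np.max(np.abs(xq - x))
    print(f"{n_bits:2d} bits: Q = {q:.6f}, max |error| = {max_err:.6f} <= Q/2 = {q / 2:.6f}")
```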

Page 32: Media Processing – Audio Part

Quantisation (cont.)

In digital audio systems, a binary (rather than decimal) number system is used to quantise the samples, as shown below: (a) a binary number consists of a number of bits; (b) each bit represents a power of two; (c) binary numbers can be represented electrically in pulse-code modulation (PCM) by a string of high and low voltages.

Source: Francis Rumsey and Tim McCormick (1994)

Page 33: Media Processing – Audio Part

Quantisation (cont.)

A 4-bit binary quantisation scale: two's complement. The leftmost bit is the most significant bit (MSB), which determines whether the number is positive or negative.

Source: Francis Rumsey and Tim McCormick (1994)
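As an illustration of such a 4-bit two's-complement scale (a sketch for these notes), the snippet below maps the integers -8..7 to their bit patterns; an MSB of 1 marks a negative value.

```python
def twos_complement(value, n_bits=4):
    """Return the n_bits two's-complement bit string for a signed integer."""
    assert -(2 ** (n_bits - 1)) <= value < 2 ** (n_bits - 1), "value out of range"
    return format(value & (2 ** n_bits - 1), f"0{n_bits}b")

for v in range(7, -9, -1):
    print(f"{v:+d} -> {twos_complement(v)}")
# +7 -> 0111, 0 -> 0000, -1 -> 1111, -8 -> 1000
```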

Page 34: Media Processing – Audio Part

Quantisation (cont.)

The quantisation error (noise) can be considerably reduced by oversampling (placing the Nyquist frequency above the upper limit of the audio band), which essentially spreads the quantisation noise over a wide range of frequencies, giving about 3 dB of in-band noise reduction per octave of oversampling (i.e. per doubling of the sampling frequency). It is therefore key to improving digital audio quality in both A/D and D/A converters. 'Decimation' is then performed to reduce the sampling rate and increase the bit depth of the quantised samples obtained at the high sampling rate.

Source: Francis Rumsey and Tim McCormick (1994)
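The 'about 3 dB per octave' figure can be checked with a short sketch (my own, assuming the quantisation noise power is spread uniformly up to the new Nyquist frequency, so the in-band noise power scales as 1/OSR):

```python
import math

for octaves in range(1, 6):               # 2x, 4x, ..., 32x oversampling
    osr = 2 ** octaves                     # oversampling ratio
    reduction_db = 10 * math.log10(osr)    # in-band quantisation-noise reduction
    print(f"{osr:2d}x oversampling: about {reduction_db:.1f} dB less in-band noise")
# roughly 3 dB per doubling of the sampling frequency
```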

Page 35: Media Processing – Audio Part

Quantisation (cont.)

The dynamic range of digital audio is limited at the high-level end of the quantisation scale. Any sample amplitude outside this range will be clipped, and, as a result, the signal will be distorted.
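A tiny illustration of this clipping behaviour (a sketch for these notes): samples that exceed full scale are simply limited to the top of the scale, flattening the waveform peaks.

```python
import numpy as np

full_scale = 1.0                                      # top of the (normalised) quantisation scale
x = 1.4 * np.sin(2 * np.pi * np.arange(16) / 16)      # signal driven 40% beyond full scale

clipped = np.clip(x, -full_scale, full_scale)         # out-of-range samples are flattened

print(np.round(x, 2))
print(np.round(clipped, 2))   # the peaks are squared off, i.e. the signal is distorted
```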

Source: Francis Rumsey and Tim McCormick (1994)

Demos for quantisation noise:

http://www.youtube.com/watch?v=_cRFBBnUFug

Page 36: Media Processing – Audio Part

D/A conversion

The audio sample words are converted back into a staircase-like chain of electrical levels corresponding to the sample values.

Resampling is used to reduce the width of the pulses, in order to reduce the so-called aperture effect (equalisation is required to correct for the aperture effect).

Finally, a low-pass smoothing filter is used to reconstruct the audio signal.

Source: Francis Rumsey and Tim McCormick (1994)
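A rough sketch of the reconstruction idea (my own simplification, not the exact circuit in the figure): a zero-order hold turns the sample values into a staircase, and a crude moving average stands in for the low-pass smoothing filter.

```python
import numpy as np

samples = np.sin(2 * np.pi * np.arange(16) / 8)   # toy sample sequence (2 cycles of a sine)

hold = 8                                          # output points per sample period
staircase = np.repeat(samples, hold)              # zero-order hold: staircase-like levels

kernel = np.ones(hold) / hold                     # crude low-pass 'smoothing filter'
smoothed = np.convolve(staircase, kernel, mode="same")

print(np.round(staircase[:12], 2))
print(np.round(smoothed[:12], 2))                 # staircase edges are rounded off
```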

Page 37: Media Processing – Audio Part


Earlier digital audio recording formats

Digital tape recording (a magnetic tape data storage format introduced by Sony in the 1980s)

Hard-disk recording (a digital magnetic data storage format, introduced by IBM in 1956, used for audio recording in 1976 by Sony)

Compact-disc (CD) recording (an optical disc used originally to store digital audio data, commercially available in 1982)

DVD recording (an optical disc storage format, developed by Philips, Sony, Toshiba and Panasonic in 1995, offering higher storage capacity than CD while having the same dimensions)

Page 38: Media Processing – Audio Part


Examples of digital audio recorder

Sony’s PCM-F1, digital tape recording, sampling rate 44.1 kHz.

Source: Francis Rumsey and Tim McCormick (1994)

Page 39: Media Processing – Audio Part


Examples of digital audio recorder (cont.)

Sony’s PCM-1610 digital tape recorder, sampling rate 44.1 kHz.

Source: Francis Rumsey and Tim McCormick (1994)

Page 40: Media Processing – Audio Part


Examples of digital audio recorder (cont.)

A Sony portable DAT digital tape recorder, sampling rates 44.1 and 48 kHz.

Source: Francis Rumsey and Tim McCormick (1994)

Page 41: Media Processing – Audio Part


Examples of digital audio recorder (cont.)

Lynx Digital Audio Recorder (containing the A/D, D/A converters)

Demos: http://www.youtube.com/watch?v=gwLTr8v01AI

http://www.youtube.com/watch?v=OVauM51sLYw

Page 42: Media Processing – Audio Part


Recent developments in digital audio recording (since 2000s)

Super Audio CD (a high-resolution optical disc format for audio storage)

DVD-A (a digital format for delivering high-fidelity audio content on DVD)

Blu-ray Disc (an optical disc storage medium, a competitor of HD DVD)

HD DVD (a high-density optical disc format using a blue-violet laser for recording)

Internet radio webcasting (audio service transmitted over the internet)

Podcasting (a non-streamed webcast: audio downloaded from a web feed (on a remote server) through client software called a podcatcher)

Page 43: Media Processing – Audio Part


Free digital audio recording and editing software

Audacity

Audacity is free, open-source software for recording and editing sounds. It allows you to record live audio, convert tapes and records into digital recordings or CDs, and edit Ogg Vorbis, MP3, WAV or AIFF sound files. You can also cut, copy, split or mix sounds together with Audacity. Built-in effects are provided to remove static, hiss, hum or other constant background noise.

Power Sound Editor

Power Sound Editor is a visual audio editing and recording solution, which supports many advanced and powerful operations with audio data.

MP3DirectCut

mp3DirectCut is a fast and extensive audio editor and recorder for compressed MP3. You can directly cut, copy, paste or change the volume with no need to decompress your files for audio editing. Using cue sheets, pause detection or Auto cue, you can easily divide long files.

Page 44: Media Processing – Audio Part


Free digital audio recording and editing software (cont.)

Music Editor Free

Music Editor Free (MEF) is a multi-award-winning music editor software tool. MEF helps you to record and edit music and sounds. It lets you make and edit music, voice and other audio recordings. When editing audio files you can cut, copy and paste parts of recordings and, if required, add effects like echo, amplification and noise reduction.

Wavosaur

Wavosaur is a free sound, audio and WAV editor for editing, processing and recording sounds, WAV and MP3 files. Wavosaur has all the features needed to edit audio (cut, copy, paste, etc.), produce music loops, analyse, record, and batch convert. Wavosaur supports VST plugins, ASIO drivers, multichannel WAV files and real-time effect processing. The program has no installer and doesn't write to the registry. Use it as a free MP3 editor, for mastering, or for sound design.

Ardour

Ardour is a digital audio workstation. You can use it to record, edit and mix multi-track audio. You can produce your own CDs, mix video soundtracks, or just experiment with new ideas about music and sound.

Source: http://www.hongkiat.com/blog/25-free-digital-audio-editors/, where more free audio recording and editing software can be found.

Page 45: Media Processing – Audio Part


Reference

F. Rumsey and T. McCormick, Sound and Recording: An Introduction, 2nd Edition, 1994.