ITC UNIT II

8/8/2019 ITC UNIT II

1/31


2/31


3/31


4/31


5/31

SOURCE CODING- AUDIO

Perceptual Coding (PC) PC are designed for compression of general audio such as that

associated with a digital television broadcast.

sampled segments of the source audio waveform are analysed

but only those features that are perceptible to the ear are

transmitted

E.g although the human ear is sensitive to signals in the range

15Hz to 20 kHz, the level of sensitivity to each signal is non-linear; that is the ear is more sensitive to some signals than

others.


6/31

When multiple signals are present as in audio a

strong signalmay reduce the level of sensitivity ofthe ear to other signals which are near to it in

frequency. The effect is known as frequency

masking

When the ear hears a loud sound it takes a short but

a finite time before it could hear a quieter sound

.This effect is known as temporal masking


7/31

Sensitivity of the ear

The dynamic range of ear is defined as the ratio of themaximum amplitude of the signal to the minimumamplitude and is measured in decibels(dB)

Itis the ratio of loudestsound itcan hearto the quietestsound

Sensitivity of the ear varies with the frequency of the signal The ear is most sensitive to signals in the range 2-5kHz

hence the signals in this band are the quietest the ear issensitive to

Vertical axis gives all the other signal amplitudes relative to

this signal (2-5 kHz) Signal A is above the hearing threshold and B is below the

hearing threshold


8/31

Audio Compression Perceptual properties of the

human ear

Perceptual encoders have been designed for the

compression of general audio such as that associated with a

digital television broadcast


9/31

Audio Compression Perceptual properties of the

human ear

When an audio sound consists of multiple frequency signals

is present, the sensitivity of the ear changes and varies with

the relative amplitude of the signal


10/31

Signal B is larger than signal A. This causes the basic

sensitivity curve of the ear to be distorted in the region ofsignal B

Signal A will no longer be heard as it is within the distortion

band


11/31

Variation with frequency of effect of frequency

masking

The width of each curve at a particular signal level is known

as the critical bandwidth for that frequency


12/31

Variation with frequency of effect of

frequency masking

The width of each curve at aparticular signallevel is known

as the critical bandwidth

It has been observed that for frequencies less than 500Hz,

the critical bandwidth is around 100Hz, however, forfrequencies greater than 500Hz then bandwidth increases

linearly in multiples of 100Hz

Hence if the magnitude of the frequency components that

make up an audio sound can be determined, it becomes

possible to determine those frequencies that will be

masked and do not therefore need to be transmitted


13/31

Temporal masking

After the ear hears a loud sound it takes a further short

time before it can hear a quieter sound

This is known as the temporal masking

After the loud sound ceases it takes a short period of time

for the signal amplitude to decay

During this time, signals whose amplitudes are less than

the decay envelope will not be heard and hence need not

be transmitted

In order to achieve this the input audio waveform must beprocessed over a time period that is comparable with that

associated with temporal masking


14/31

Audio Compression Temporal masking caused by

loud signal

After the ear hears a loud signal, it takes a further short

time before it can hear a quieter sound (temporal masking)


15/31

Audio Compression MPEG perceptual coder

schematic


16/31

MPEG audio coder

The audio input signal is first sampled and quantized using

PCM The bandwidth available for transmission is divided into a

number of frequency subbands using a bank of analysisfilters

The bank of filters maps each set of 32 (time related) PCM

samples into an equivalent set of 32 frequency samples Processing associated with both frequency and temporal

masking is carried out by the psychoacoustic model

In basic encoder the time duration of each sampled

segment of the audio input signal is equal to the time toaccumulate 12 successive sets of 32 PCM

12 sets of 32 PCM are converted into frequencycomponents using DFT


17/31

MPEG audio coder

The output of the psychoacoutic model is a set of what are

known as signal-to-mask ratios (SMRs) and indicate thefrequency components whose amplitude is below the

audible components

This is done to have more bits for highest sensitivity regions

compared with less sensitive regions

In an encoder all the frequency components are carried in

a frame


18/31

Audio Compression MPEG perceptual coder

schematic


19/31

MPEG audio coder frame format

The header contains information such as the samplingfrequency that has been used

The quantization is performed in two stages using a form ofcompanding

The peak amplitude level in each subband is first quantizedusing 6 bits and a further 4 bits are then used to quantizethe 12 frequency components in the subband relative tothis level

Collectively this is known as the subband sample (SBS)format

The ancillary data fieldat the end of the frame is optional.

it is used to carry additional coded samples associatedwith the surround-sound that is present with some digital

video broadcasts


20/31

MPEG audio coder frame format

At the decoder section the dequantizers will determine the

magnitude of each signal The synthesis filters will produce the PCM samples at the

decoders


21/31

HUMAN VOICE

The human voice consists of sounds generated by the opening

and closing of the glottis by the vocal cords, which produces a

periodic waveform with many harmonics.

This basic sound is then filtered by the nose and throat toproduce differences in harmonic content (formants) in a

controlled way

The unvoiced sounds are not modified by the mouth in the

same fashion.


22/31

(Extremely) Simplified Model of Speech

Production


23/31

The Vocoder

The vocoder examines speech by finding this basic carrier

wave, which is at the fundamental frequency, and measuring

how its spectral characteristics are changed over time.

This results in a series of numbers representing thesemodified frequencies at any particular time as the user

speaks.

In doing so, the vocoder dramatically reduces the amount of

information needed to store speech. To recreate speech, the vocoder simply reverses the process,

creating the fundamental frequency in an oscillator, then

passing it through a stage that filters the frequency content

based on the original.


24/31

CHANNEL VOCODER


25/31


26/31


27/31


28/31


29/31


30/31


31/31

ITC UNIT II

Documents

Transcript of ITC UNIT II