8 Speech Coding.doc

8/11/2019 8 Speech Coding.doc

1/13

Speech Coding

Analog to Digital ConversionIn order to fully understand speech and channel coding it is easier to start from the very beginning of the process. The first step in speech coding isto transform the sound waves of our voices (and other ambient noise) into an electrical signal. This is done by a microphone . A microphone consistsof a diaphragm, a magnet, and a coil of wire. When you speak into it, sound waves created by your voice vibrate the diaphragm which is connectedto the magnet which is inside the coil of wire. These vibrations cause the magnet to move inside the coil at the same fre uency as your voice. Amagnet moving in a coil of wire creates an electric current. This current which is at the same fre uency as the sound waves is carried by wires towhereever you wish it to go like an amplifier, transmitter, etc. !nce it gets to its destination the process is reversed and it comes out as sound."peakers basically being the opposite of microphones. The signal created by a microphone is an analog signal. "ince #"$ is an all digital system,

this analog signal is not suitable for use on a #"$ network. The analog signal must be converted into digital form. This is done by using an Analogto %igital &onverter (A%&).

In order to reduce the amount of data needed to represent the sound wave, the analog signal is first inputted into a band pass filter . 'and passmeans that the filter only allows signal that fall within a certain fre uency range to pass through it, and all other signals are cut off, or attenuated .The ' filter only allows fre uencies between **+ and .- k+ to pass through it. This limits the amount of data that the Analog %igital&onverter is re uired to process.


2/13

Band Pass Filter

The filtered signal is inputted into the analog digital converter. The analog digital converter performs two tasks. It converts an analog signal into adigital signal and it does the opposite, converts a digital signal into an analog signal.

In the case of a cell phone, the analog signal created by a microphone is passed to the analog digital converter. The A % converter measures theanalog signal, or samples it /*** times per second. This means that the A%& takes a sample of the analog signal every .012 sec (012 3s). 4achsample is uantified with a 0 5bit data block. If we calculate 0 bits per sample at /*** samples per second, we determine a data rate of 0*-,***

bits per second, or 0*- kb s.


3/13

Analog/Digital Converter

A data rate of 0*- kbps is far too large to be economically handled by a radio transmitter. In order to reduce the bitrate, the signal is inputted into aspeech encoder.A speech encoder is a device that compresses the data of a speech signal. There are many types of speech encoding schemesavailable. The speech encoder used in #"$ is called 6inear redictive &oding (6 &) and 7egular ulse 48citation (7 4). 6 & is a verycomplicated and math5heavy process, so it will only be summari ed here.

Linear Predictive Coding (LPC)7emember that the A%& uantifies each audio sample with a 0 5bit 9word9. In 6 &, 0:* of the 0 5bit samples from the converter are saved up andstored into short5term memory. 7emember that a sample is taken every 012 3s, so 0:* samples covers an audio block of 1*ms. This 1*ms audio

block consists of 1*/* bits. 6 &57 4 analyes each 1*ms set of data and determines / coefficients used for filtering as well as an e8citation signal.


4/13

6 & basically identifies specific bits that correspond to specific aspects of human voice, such as vocal modifiers (teeth, tongue, etc.) and assignscoefficients to them. The e8citation signal represents things like pitch and loudness. 6 & identifies a number of correlations of human voice andredundancies in human speech and removes them.

The 6 & 7 4 se uence is then fed into the 6ong5Term rediction (6T ) Analysis function. The 6T function compares the se uence it receiveswith earlier se uences stored in its memory and selects the se uence that most resembles the current se uence. The 6T function then calculates thedifference between the two se uences. ;ow the 6T function only has to translate the difference value as well as a pointer indicating which earlierse uence it used for comparison. 'y doing this is prevents encoding redundant data.


5/13

Speech Encoding

This bitrate of 0 kbps is known as >ull 7ate "peech (>"). There is another method for encoding speech called +alf 7ate "peech (+"), which

results in a bit rate of appro8imately 2.:kbps. The e8planations in the remainder of this tutorial are based on a full5rate speech bitrate (0 kbps).

Calculate the net data rate:

Description Formula Result

&onvert ms to sec 1* ms ? 0*** .*1 seconds

&alculate bits per second 1:* bits ? .*1 seconds 0 ,*** bits per second (bps)

&onvert bits to kilobits 0 ,*** bps ? 0*** 0 kilobits per sec (kbps)

As we all know, the audio signal must be transmitted across a radio link from the handset to the 'ase "tation Transceiver ('T"). The signal on thisradio link is sub@ect to atmospherics and fading which results in a large amount of data loss and degrades the audio. In order to prevent degradationof audio, the data stream is put through a series of error detection and error correction procedures called channel coding . The first phase of channelcoding is called block coding .

Block CodingA single 1:*5bit (1*ms) audio block is delivered to the block5coder. The 1:* bits are divided up into classes according to their importance inreconstructing the audio. &lass I are the bits that are most important in reconstructing the audio. The class II bits are the less important bits. &lass I

bits are further divided into two categories, Ia and Ib.


6/13

Classes of Bits

The class Ia bits are protected by a cyclic code. The cyclic code is run on the 2* Ia bits and calculates parity bits which are then appended to theend of the Ia bits. !nly the class Ia bits are protected by this cyclic code. The Ia and Ib bits are then combined and an additional - bits are added tothe tail of the class I bits (Ia and Ib together). All four bits are eros (****) and are needed for the ne8t step which is 9convolutional coding9. Thereis no protection for class II bits. As you can see, block coding adds seven bits to the audio block, parity bits and - tail bits, therefore, a 1:*5bit

block becomes a 1: 5bit block.


7/13

Block Coding

Convolutional CodingThis 1: 5bit block is then inputted into a convolutional code. &onvolutional coding allows errors to be detected and to be corrected to a limiteddegree. The class I 9protected9 bits are inputted into a comple8 convolutional code that outputs 1 bits for every bit that enters it. The second bit that

is produced is known as a redundancy bit. The number of class I bits is doubled from 0/B to /.


8/13

This coding uses 2 consecutive bits to calculate the redundancy bit, this is why there are - bits added to the class I bits when the cyclic code wascalculated. When the last data bit enters the register, it uses the remaining four bits to calculate the redundancy bit for the last data bit. The class II

bits are not run through the convolutional code. After convolutional coding, the audio block is -2: bits

Convolutional Coding


9/13

Reordering, Partitioning, and Interleaving ;ow, one problem remains. All of this error detection and error correction coding will not do any good if the entire -2:5bit block is lost or garbled.In order to alleviate this, the bits are reordered and partioned onto eight separate sub5blocks. If one sub5block is lost then only one5eighth of thedata for each audio block is lost and those bits can be recovered using the convolutional code on the receiving end. This is known as interleaving .

4ach -2:5bit block is reordered and partitioned into / sub5blocks of 2 bits eachThese eight 2 5bit sub5blocks are then interleaved onto / separate bursts. As you remember from the T%$A Tutorial, each burst is composed oftwo 2 5bit data blocks, for a total data payload of 00- bits.

The first four sub5blocks (* through ) are mapped onto the even bits of four consecutive bursts. The last four sub5blocks (- through ) are mappedonto the odd bits of the ne8t - consecutive bursts. "o, the entire block is spread out across / separate bursts.

Taking a look at the diagram below we see three -2:5bit blocks, labeled A, ', and &. 4ach block is sub5divided into eight sub5blocks numbered *5 .6etCs take a look at 'lock '. We can see that each sub5block is mapped to a burst on a single time5slot. 'lock ' is mapped onto / separate bursts ortime5slots. >or illustrative purposes, the time5slots are labeled " through D.

6etCs e8pand time5slot E for a close5up view. We can see how the bits are mapped onto a burst. The bits from 'lock ', sub5block (' ) are mappedonto the even numbered bits of the burst (bits *,1,-....0*/,00*,001).


10/13

Reordering, Partitioning, and Interleaving

In the following diagram, we e8amine time5slot W. We see that bits from '- are mapped onto the odd5number bits (bits 0, ,2....0*B,000,00 ) andwe would see bits from &0 mapped onto the even number bits (bits *,1,-....0*/,00*,001). This process continues indefinitely as data is transmitted.Time5slots W, F,


11/13

Interleaving

The process of interleaving effectively distributes a single -2: bit audio block over / separate bursts. If one burst is lost, only 0 / of the data is lost,and the missing bits can be recovered using the convolutional code.

;ow, you might notice that the data it takes to represent a 1*ms (-2:5bits) audio block is spread out across / timeslots. If you remember that eachT%$A frame is appro8imately -.:02ms, we can determine that it takes about ms to transmit one single -2:5bit block. It seems like transmitting1*ms worth of audio over a period of ms would not work. +owever, this is not what is truly happening. If you look at a series of blocks as theyare mapped onto time5slots you will notice that one sub5block ends every four time5slots, which is appro8imately 0/ms. The only effect this has isthat the audio stream is effectively delayed by 1*ms, which is truly negligible.

In the diagram below, we can see how this works. The diagram shows 0: bursts. 7emember that a burst occurs on a single time5slot and the the


12/13

duration of a time5slot is 2 3s. 4ight time5slots make up a T%$A frame, which is -.:02ms. "ince a single resource is only given one time5slot inwhich to transmit, we only get to transmit once every T%$A frame. Therefore, we only get to transmit one burst every -.:02ms.G If this is not clear, please review the T%$A Tutorial.

%uring each time5slot, a burst is transmitted that carries data from two different -2:5bit blocks. In the diagram below, 'urst 0 carries data from Aand ', burst 2 has ' and &, burst B has & and %, etc. 6ooking at the diagram, we can see that it does take appro8imately ms for 'lock ' totransmit all of its data, (bursts 05/). +owever, in bursts 25/, data from block & is also being transmitted. !nce block ' has finished transmitting allof its data (burst /), block & has already transmitted half of its data and only re uires - more bursts to complete its data.

'lock A completes transmitting its data at the end of the fourth burst. 'lock ' finishes in the eighth, block &, in the 01th, and block % in the 0:th.Eiewing it this way shows us that every fourth burst comepletes the data for one block, which takes appro8imately 0/ms.

The following diagram illustrates the entire process, from audio sampling to partitioning and interleaving.


13/13

%ata and signalling messages will be covered in a future tutorial.

8 Speech Coding.doc

Documents

Transcript of 8 Speech Coding.doc