A STUDY OF DESIGN COMPROMISES FOR SPEECH CODERS IN PACKET NETWORKS




1. INTRODUCTION

In voice over packet networks, the coding gain achieved by prediction-based speech coders is offset by packet losses. Concealment must be applied to the missing packets, which reduces quality for two main reasons :

• not all missing packets can be concealed, especially when concealment uses only the past signal (e.g. onsets, transients)

• the concealment error can propagate over several frames, even frames received correctly (culprit : desynchronisation of the excitation content, LTP)

We propose to compare two approaches for alleviating this problem :

• Adding redundancy to increase the robustness of a baseline predictive encoder (G.729)

• Using a speech coding model which does not have interframe dependencies (iLBC)

• To be compared, solutions should have comparable bit rates

2. ADDED REDUNDANCY versus FRAME INDEPENDENCE

[Figure: MOS (1 to 5) versus Frame Erasure Rate (0 to 25 %). Curves: ORIGINAL, G.729E (0% FER), G.729-4, G.729-3, iLBC, G.729-1, G.729-2, G.729-0]

6. LISTENING TEST RESULTS

7. CONCLUSIONS

Codec      R (kbps)   D (ms)
G.729-4    16         45
G.729-3    14.1       45
iLBC       15.2       25
G.729-1    12         35
G.729-2    14.1       25
G.729-0    8          25

3. PROPOSED APPROACHES FOR ADDING REDUNDANCY

4. EFFECT ON ERROR PROPAGATION

[Figure: signal amplitude (x 10^4) versus sample index (0 to 1000), panels (a) to (g)]

5. SUBJECTIVE EXPERIMENT

A formal listening test was conducted to compare the different solutions for increasing the robustness in case of missing packets. The main features of this test are :

• clean speech, narrowband, IRS filtered

• 4 male, 4 female speakers

• 32 naive listeners

• listening using binaural headphones

• following guidelines of ITU-T Rec. P.800

• 36 conditions in total, including MNRU and other reference conditions

• 0 – 20% random packet losses, synchronized between iLBC and G.729

20 ms packet, 3rd packet lost

G.729 synthesis

G.729-0 error at decoder

G.729-1 error at decoder

G.729-2 error at decoder

G.729-3 error at decoder

G.729-4 error at decoder

iLBC error at decoder (compared to iLBC synthesis without frame loss)

20 ms frame encoded in « absolute »

Consider only G.729 at 8 kbps (baseline predictive coder) and add redundancy to obtain bit rates similar to iLBC at 15.2 kbps.

Content of each 20-ms packet :

G.729-0 : 20 ms packet (two G.729 frames)
  Pk-1 : F2k-2 F2k-1
  Pk   : F2k F2k+1
  Pk+1 : F2k+2 F2k+3

G.729-1 :
  Pk-1 : F2k-2 F2k-1 F2k
  Pk   : F2k F2k+1 F2k+2
  Pk+1 : F2k+2 F2k+3 F2k+4

G.729-2 / G.729-3 :
  Pk-1 : F2k-2 F2k-1 F’2k-3 F’2k-4
  Pk   : F2k F2k+1 F’2k-1 F’2k-2
  Pk+1 : F2k+2 F2k+3 F’2k+1 F’2k

G.729-4 :
  Pk-1 : F2k-2 F2k-1 F2k-3 F2k-4
  Pk   : F2k F2k+1 F2k-1 F2k-2
  Pk+1 : F2k+2 F2k+3 F2k+1 F2k

Bit rate and algorithmic delay

[Figure: delay (ms, 0 to 50) versus bit rate (kbps, 0 to 20) for G.729-0, G.729-1, G.729-2, iLBC, G.729-3 and G.729-4. Point size proportional to quality at 10 % FER.]

In G.729-2 and G.729-3, F’k denotes Fk but without the 18 LSF bits and pitch parity bit (hence, frame F’k has 19 bits fewer than frame Fk). The missing LSFs have to be extrapolated at the decoder when a missing frame occurs.
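The bit rates quoted for the redundancy schemes can be checked from the packet contents. The sketch below assumes 80 bits per G.729 frame (8 kbps over a 10 ms frame) and takes the per-packet frame counts from the packet diagrams; the names `LAYOUTS` and `payload_kbps` are illustrative only and are not part of the original work.

```python
# Sketch: payload bit rate implied by each 20 ms packet layout.
FRAME_BITS = 80                  # G.729: 8 kbps x 10 ms frame
REDUCED_BITS = FRAME_BITS - 19   # F': no 18 LSF bits, no pitch parity bit
PACKET_MS = 20

# (full frames, reduced F' frames) per packet, read from the packet diagrams
LAYOUTS = {
    "G.729-0": (2, 0),
    "G.729-1": (3, 0),
    "G.729-2": (2, 2),
    "G.729-3": (2, 2),
    "G.729-4": (4, 0),
}

def payload_kbps(full, reduced):
    """Payload bit rate in kbps (bits per millisecond) for one packet layout."""
    bits = full * FRAME_BITS + reduced * REDUCED_BITS
    return bits / PACKET_MS

RATES = {name: payload_kbps(*counts) for name, counts in LAYOUTS.items()}

for name, rate in RATES.items():
    print(f"{name}: {rate:.1f} kbps")
```

Under these assumptions the computed rates are 8, 12, 14.1, 14.1 and 16 kbps, which agree with the R (kbps) values reported for G.729-0 to G.729-4.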

G.729-2 and G.729-3 differ at the decoder :

G.729-2 : Decode packet Pk when it arrives (do not wait for packet Pk+1). If packet Pk is missing, then apply concealment followed by resynchronisation of filter memories using F’2k and F’2k+1 that are received when packet Pk+1 arrives. Then, start decoding packet Pk+1.

G.729-3 : Decode packet Pk only after packet Pk+1 has arrived (additional delay of 20 ms). If packet Pk was missing, then just use F’2k and F’2k+1 that are added as redundancy in packet Pk+1. No concealment is applied in this case.

G.729-4 : At the decoder, wait for packet Pk+1 before decoding packet Pk.

G.729-0 : Every missing 20-ms packet implies that two consecutive 10-ms frames of G.729 are lost. Concealment and propagation introduce large artefacts.

G.729-1 : Every missing 20-ms packet reduces to a single 10-ms frame loss in G.729. Concealment is more effective, and propagation is reduced.

G.729-2 : Concealment followed by approximate resynchronisation of filter memories.

G.729-3 : Limited concealment (there would be no concealment if F’ were equal to F).

G.729-4 : No effective loss for any single packet loss.

iLBC : Concealment, but limited error propagation (only due to post-filtering at decoder to smooth frame transitions).
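The effect of a single lost packet on each G.729 variant can be illustrated with a small bookkeeping sketch. It assumes the packet contents shown in the section 3 diagrams and only reports which copy of each lost frame is available from the neighbouring packets; it does not model concealment, resynchronisation, or decoding delay. The function names are hypothetical.

```python
# Sketch: best available copy of each frame of a lost packet P_k.
def build_packet(scheme, k):
    """Return {frame_index: kind} for packet P_k; kind is 'full' or 'reduced'."""
    frames = {2 * k: "full", 2 * k + 1: "full"}
    if scheme == "G.729-1":
        frames[2 * k + 2] = "full"        # copy of the next packet's first frame
    elif scheme in ("G.729-2", "G.729-3"):
        frames[2 * k - 1] = "reduced"     # F' copies of the previous packet
        frames[2 * k - 2] = "reduced"
    elif scheme == "G.729-4":
        frames[2 * k - 1] = "full"        # full copies of the previous packet
        frames[2 * k - 2] = "full"
    return frames

def frames_after_loss(scheme, lost_k):
    """Map each frame of the lost packet to 'full', 'reduced' or 'missing'."""
    rank = {"missing": 0, "reduced": 1, "full": 2}
    best = {2 * lost_k: "missing", 2 * lost_k + 1: "missing"}
    for k in (lost_k - 1, lost_k + 1):    # only the neighbours can help
        for idx, kind in build_packet(scheme, k).items():
            if idx in best and rank[kind] > rank[best[idx]]:
                best[idx] = kind
    return best

for scheme in ("G.729-0", "G.729-1", "G.729-2", "G.729-3", "G.729-4"):
    print(scheme, frames_after_loss(scheme, lost_k=5))
```

With packet P5 lost, this bookkeeping leaves both frames missing for G.729-0, one frame missing for G.729-1, two reduced (F’) copies for G.729-2 and G.729-3, and two full copies for G.729-4. Whether a recovered copy avoids concealment still depends on the decoder waiting for packet Pk+1, as described for G.729-2 versus G.729-3.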

From the test results, we can make the following conclusions :

• In clean channel conditions, iLBC at 15.2 kbps has equivalent quality to G.729 at 8 kbps (i.e. a much higher bit rate is necessary in a « frame-independent » coder to increase the quality in both clean channel and frame loss conditions). Extreme example : G.711 at 64 kbps.

• The best quality in frame loss conditions was achieved by using a low-rate CELP coder with added redundancy and delay (G.729-4), with a total bit rate close to iLBC (16 kbps compared to 15.2 kbps)

• The approaches studied to increase robustness represent only a subset of all possible combinations. Only solutions based on a standard CELP coder (G.729) were considered, and some of them are not optimal (e.g. G.729-2). Improved results could be expected by designing a solution without the constraint of using standard core codecs.

• The G.729 RTP payload can already support solutions G.729-1 and G.729-4.

Roch Lefebvre, Philippe Gournay
University of Sherbrooke
Sherbrooke, Quebec, Canada

Redwan Salami
VoiceAge Corp.
Montreal, Quebec, Canada

[Figure: quality (robustness to frame loss) versus % FER for Codec_P, Codec_FI or Codec_P + R, and Codec_P + R + Delay]

[Figure: total payload bit rate of Codec_P plus redundancy R, compared with Codec_FI]

Approach 1 : Use a lower bit rate, predictive (CELP) coder, and add channel redundancy to improve robustness to missing frames.

Approach 2 : Use a higher bit rate, non-predictive or « frame-independent » codec, to improve robustness to missing frames in the core codec itself.

Anticipated gains in quality

[Figure: 10 ms frames, each using long-term prediction from the past excitation]

Codec_P : G.729 (CELP-based)

Codec_FI : iLBC (frame-independent)
