
Dingli (27/7/2013) DL1AMOSWP Rev1 1 / 37

Speech Quality Testing Solution

(MOS) Whitepaper


Revision History

Date        Version  Author         Description
2013-05-06  1.0      Geng Xiaoming  First Edition
2013-07-27  1.1      Zhang Qifei    Change document organization, optimize MOS testing solution, and add POLQA parameters


Contents

1 Introduction
1.1 Document Organization
1.2 Abbreviations and Acronyms
2 Development of Speech Quality Testing
3 PESQ Algorithm
3.1 Introduction
3.2 PESQ Algorithm Structure
4 POLQA Algorithm
4.1 Introduction
4.2 Evolution of POLQA Based on PESQ
4.3 Super-Wideband and Narrowband in POLQA
4.4 POLQA Perceptual Model and Structure
4.5 Ideal POLQA MOS Values in Different Network Environment
5 Dingli MOS Testing Solution
5.1 Product History
5.2 Added Value of Dingli MOS Testing System
5.3 Dingli MOS Test System
5.4 MOS Testing Solutions
5.4.1 MOS Testing Solution Based on Pilot Pioneer
5.4.2 MOS Analysis Solution Based on Pilot Navigator
5.4.3 MOS Testing Solution Based on Pilot Fleet (RCU)
5.4.4 MOS Testing Solution Based on Pilot Walktour
5.5 Details About Dingli MOS Testing Solution
5.5.1 Impact of Network Factors on PESQ MOS Value
5.5.2 Parameters in POLQA Testing Result
5.5.3 Reference PESQ-LQ Value: Some practical examples
6 Appendix 1: ITU-T P.862 Standard Relevant Information
7 References


1 Introduction

This document describes the development and evolution of speech quality testing technologies in telecommunications networks, focusing on two objective testing methods: PESQ and POLQA.

1.1 Document Organization

Chapter 1: Document objectives, document organization, and abbreviations and acronyms

Chapter 2: Basics and development of speech quality testing

Chapter 3: The PESQ algorithm, an objective speech quality testing method

Chapter 4: The POLQA algorithm, the newer speech quality testing method recommended by ITU

Chapter 5: Dingli's speech quality evaluation solutions

1.2 Abbreviations and Acronyms

Table 1 Abbreviations and acronyms

Item     Full Spelling                                       Description
ITU      International Telecommunication Union
MOS      Mean Opinion Score
MOS-LQO  MOS Listening Quality Objective
AMR      Adaptive Multi-Rate Codec
AAC      Advanced Audio Coding
FER      Frame Error Rate
BER      Bit Error Rate
POTS     Plain Old Telephone Service
GSM      Global System for Mobile Communications
CDMA     Code Division Multiple Access
LTE      Long Term Evolution
PAMS     Perceptual Analysis Measurement System              Objective model developed by BT; not an ITU-T Recommendation
PSQM     Perceptual Speech Quality Measure                   Recommendation of ITU-T P.861
PESQ     Perceptual Evaluation of Speech Quality             Recommendation of ITU-T P.862
POLQA    Perceptual Objective Listening Quality Assessment   Recommendation of ITU-T P.863
VoIP     Voice over IP
P.56     ITU-T Rec. P.56                                     Recommendation of ITU-T P.56


2 Development of Speech Quality Testing

Testing speech quality on different networks against a unified standard is challenging because the networks use different communication technologies. For example, GSM networks evaluate speech quality with RxQual (based on BER), whereas CDMA networks use FER. Moreover, even within the same network, a single RxQual or FER value cannot represent the speech quality that users actually experience. A professional speech quality testing system should be end-to-end and user-centric. A general testing method is therefore required that allows different networks to be compared directly.
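RxQual is only a coarse quantization of the measured BER into eight levels, which is one reason a single value says little about perceived speech quality. The sketch below maps a BER percentage to an RxQual level using the band boundaries of 3GPP TS 45.008 (0.2 %, 0.4 %, 0.8 %, and so on, doubling up to 12.8 %); the function name is hypothetical, exact-threshold handling is simplified, and the band table should be checked against the specification before use.

```python
# Minimal sketch: map a GSM bit error rate (BER, in percent) to an RxQual
# level 0..7 using the BER bands of 3GPP TS 45.008.

# Upper BER bound (percent, exclusive) for RxQual levels 0..6.
# Anything at or above the last bound is RxQual 7.
RXQUAL_UPPER_BER_PERCENT = [0.2, 0.4, 0.8, 1.6, 3.2, 6.4, 12.8]


def rxqual_from_ber(ber_percent: float) -> int:
    """Return the RxQual level (0 = best, 7 = worst) for a BER given in percent."""
    if ber_percent < 0:
        raise ValueError("BER cannot be negative")
    for level, upper in enumerate(RXQUAL_UPPER_BER_PERCENT):
        if ber_percent < upper:
            return level
    return 7


if __name__ == "__main__":
    for ber in (0.1, 0.5, 2.0, 15.0):
        print(f"BER {ber:5.1f}% -> RxQual {rxqual_from_ber(ber)}")
```

Two calls in the same RxQual band can still sound very different, because RxQual reflects only the radio link and ignores the codec and any impairment introduced elsewhere in the end-to-end path.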

Depending on who or what performs the assessment, speech quality testing falls into two categories: subjective testing and objective testing.

Subjective testing relies on listeners to evaluate communication system performance. According to ITU-T P.800 and ITU-T P.830, about 40 to 60 listeners are required to compare the reference signal and the degraded signal perceptually against detailed criteria. Each listener scores the degraded signal according to the MOS scoring standard (a scale from 1 to 5), and the final MOS value is the average of all listeners' scores.

Table 2 MOS scoring standard

Degree     MOS Value  User Satisfaction
Excellent  5.0        Very clear; no distortion; no delay
Good       4.0        Clear; small delay; little noise
Fair       3.0        Unclear; some delay; some noise; some distortion
Poor       2.0        Unclear; loud noise or interruptions; serious distortion
Bad        1.0        Silence or totally unclear; very loud noise
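Arithmetically, a MOS is the mean of the individual ratings given on the five-point scale in Table 2. A minimal sketch under that assumption; the names are hypothetical, and the panel ratings in the usage example are made up for illustration, not measured data.

```python
# Category labels of the five-point scale in Table 2.
ACR_LABELS = {5: "Excellent", 4: "Good", 3: "Fair", 2: "Poor", 1: "Bad"}


def mos(ratings: list[int]) -> float:
    """Mean opinion score: the arithmetic mean of individual listener ratings,
    each an integer category from 1 (Bad) to 5 (Excellent)."""
    if not ratings:
        raise ValueError("at least one rating is required")
    for r in ratings:
        if r not in ACR_LABELS:
            raise ValueError(f"rating {r!r} is not on the 1..5 scale")
    return sum(ratings) / len(ratings)


if __name__ == "__main__":
    panel = [4, 5, 3, 4, 4, 3]  # illustrative ratings, not measured data
    print(f"MOS = {mos(panel):.2f}")
```

Because individual ratings are integers while the mean generally is not, a MOS such as 3.8 sits between two categories of Table 2 rather than naming one of them.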

Subjective testing gives the most reliable results, and it can evaluate network performance and speech quality for any speech coding mode. Its disadvantages, however, are obvious. Factors such as the listening environment and the listeners must be strictly controlled, and the speech material must be carefully selected; otherwise, the final result may be biased. All this makes subjective testing time-consuming, laborious, difficult to organize, and poorly repeatable. In practice, a more efficient and repeatable method is therefore required: objective testing.

Objective testing compares parameters of the reference and degraded speech signals in the time and frequency domains, and the test result is calculated by hardware or software. Early objective methods such as PAMS and PSQM emerged from research on objective speech quality testing, but they have significant limitations: their results depend on the particular speech codec, and in some cases differ considerably from subjective MOS values.

ITU-T P.862 (2001) replaced them with the PESQ algorithm, which combines the strengths of the earlier algorithms. PESQ results are very close to subjective MOS values, and PESQ has been widely accepted and applied.

Later, as new communication technologies developed, the POLQA algorithm was introduced to support new speech codecs and super-wideband speech, and to handle the time-scaling effects found in VoIP. Compared with previous algorithms, POLQA significantly improves accuracy and reliability in today's unified, complex communication networks. Its ability to handle time-scaling effects also makes it applicable to virtually any speech quality testing scenario. POLQA has two modes, NB (narrowband) and SWB (super-wideband), corresponding to speech samples of different bandwidths. POLQA is the algorithm recommended in ITU-T P.863 (2011).

Figure 2.1 shows the evolution of ITU-T recommendations for speech quality testing. The

information is obtained from http://www.polqa.info/index.html.

Figure 2.1 Evolution of ITU-T recommendations for speech quality testing


3 PESQ Algorithm

3.1 Introduction

The PESQ algorithm became an ITU-T recommendation (P.862) for objective speech quality testing in February 2001. Owing to its innovative features and its accurate correlation with subjective scores, PESQ quickly became the mainstream speech quality testing method. PESQ is designed for end-to-end speech quality testing. It takes into account various perceptual factors (such as codec distortion, errors, packet loss, variable delay, jitter, and filtering) to test speech quality objectively, and provides a fully quantifiable speech quality testing method.

Figure 3.1 PESQ reference signal and degraded signal

3.2 PESQ Algorithm Structure

Figure 3.2 shows the overall structure of the PESQ algorithm. The model begins by level-aligning both signals to a standard listening level. They are then filtered (using an FFT) with an input filter that models a standard telephone handset. The signals are aligned in time and processed through an auditory transform similar to that of PSQM. The transformation also equalizes for linear filtering in the system and for gain variation. Two distortion parameters are extracted from the disturbance (the difference between the transforms of the two signals), aggregated in frequency and time, and mapped to a prediction of subjective MOS. In general, the greater the difference between the degraded signal and the reference signal, the lower the speech quality score.
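The first two stages of Figure 3.2, level alignment and time alignment, can be illustrated with a toy sketch using NumPy. It is a simplification, not the P.862 procedure: it normalizes whole-signal RMS instead of band-limited power, and it estimates one global delay from a cross-correlation peak, whereas P.862 refines the delay per utterance. The function names and the target RMS of 1.0 are assumptions for illustration.

```python
import numpy as np


def level_align(signal: np.ndarray, target_rms: float = 1.0) -> np.ndarray:
    """Scale a signal so that its RMS equals target_rms.

    Simplified stand-in for PESQ level alignment (P.862 uses band-limited power).
    """
    rms = np.sqrt(np.mean(signal ** 2))
    if rms == 0:
        raise ValueError("cannot level-align a silent signal")
    return signal * (target_rms / rms)


def estimate_delay(reference: np.ndarray, degraded: np.ndarray) -> int:
    """Return the delay, in samples, of `degraded` relative to `reference`,
    taken from the peak of their full cross-correlation."""
    corr = np.correlate(degraded, reference, mode="full")
    # In "full" mode, index len(reference) - 1 corresponds to zero lag.
    return int(np.argmax(corr)) - (len(reference) - 1)


def time_align(reference: np.ndarray, degraded: np.ndarray):
    """Shift the signals by the estimated delay and trim them to a common
    length. Returns (reference_part, degraded_part, delay_in_samples)."""
    delay = estimate_delay(reference, degraded)
    if delay >= 0:
        ref, deg = reference, degraded[delay:]   # degraded lags: drop its head
    else:
        ref, deg = reference[-delay:], degraded  # degraded leads: drop reference head
    n = min(len(ref), len(deg))
    return ref[:n], deg[:n], delay


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    reference = rng.standard_normal(8000)  # noise standing in for 1 s of speech at 8 kHz
    degraded = np.concatenate([np.zeros(120), 0.3 * reference])[:8000]
    ref, deg, delay = time_align(level_align(reference), level_align(degraded))
    print("estimated delay (samples):", delay)
```

Only after both steps do the two signals line up sample by sample, which is what allows the later stages to attribute any remaining difference to distortion rather than to level or timing offsets.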


Figure 3.2 PESQ algorithm structure (blocks: reference and degraded signals from the system under test; level align; input filter; time align and equalise; auditory transform; disturbance processing; identify bad intervals; re-align bad intervals; cognitive modelling; prediction of perceived speech quality)

PSQM and measuring normalizing blocks (MNB) were only recommended for narrowband codec assessment and were known to produce inaccurate predictions with certain types of codecs, with background noise, and with end-to-end effects such as filtering and variable delay. The scope of PESQ is therefore much wider. In addition, PESQ correlates significantly better with subjective opinion than the P.861 models PSQM and MNB. Results indicate that it gives accurate predictions of subjective quality in a very wide range of conditions, including those with background noise, analogue filtering, and/or variable delay.

Table 3 Correlation of different speech quality testing methods

Correlation coefficient      PESQ    PAMS    PSQM    PSQM+   MNB
Mobile network, average      0.962   0.954   0.924   0.935   0.884
Mobile network, worst case   0.905   0.895   0.843   0.859   0.731
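The coefficients in Table 3 quantify how closely each model's predictions track subjective MOS. Assuming they are Pearson correlation coefficients between objective and subjective scores per test condition (the table does not say so, and published figures often apply a mapping to the objective scores before correlating), the statistic itself can be sketched as follows; the function name is hypothetical.

```python
import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equally long score lists,
    e.g. objective predictions and subjective MOS for the same conditions."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(vx * vy)
```

A value of 1.0 means the objective scores lie on a perfect straight line against the subjective ones; the worst-case row in Table 3 reports the lowest such value observed across the evaluated experiments.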

According to ITU-T, the PESQ algorithm provides very accurate predictions and was applicable to all network technologies known at the time (such as GSM, CDMA, and 3G).

At the time of its standardization, PESQ was the most sophisticated and accurate speech quality testing method, and its results agree closely with users' subjective perceptions.

There are three kinds of PESQ speech quality values:

PESQ MOS: mean opinion score of perceptual evaluation of speech quality (value range: 1.0 to 5.0, where 1.0 represents the lowest quality)

PESQ SCORE: score of perceptual evaluation of speech quality (value range: -0.5 to 4.5, where -0.5 represents the lowest quality)

PESQ LQ: listening quality of perceptual evaluation of speech quality (value range: 1.0 to 4.5, where 1.0 represents the lowest quality)

The three values differ as follows. PESQ SCORE is calculated directly by the algorithm, whereas PESQ MOS is expressed on the subjective mean opinion score scale. When speech quality is poor, PESQ SCORE is always higher than PESQ MOS, which is unrealistic. PESQ LQ was therefore introduced by ITU; its value is closer to the subjective value. In other words, PESQ SCORE is the raw value calculated by the algorithm, and PESQ LQ is a more realistic value obtained by adjusting it.
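The idea of adjusting a raw score toward subjective values can be illustrated with the mapping function that ITU-T published in P.862.1, which converts a raw P.862 score into MOS-LQO. This document does not state whether the PESQ LQ value in Dingli tools uses exactly this function, so treat the sketch as an illustration of the P.862.1 mapping only; the function name is hypothetical.

```python
import math


def p862_1_mos_lqo(raw_pesq: float) -> float:
    """Map a raw P.862 PESQ score (-0.5..4.5) to MOS-LQO using the logistic
    function of ITU-T P.862.1."""
    return 0.999 + (4.999 - 0.999) / (1.0 + math.exp(-1.4945 * raw_pesq + 4.6607))


if __name__ == "__main__":
    for raw in (-0.5, 1.0, 2.0, 3.0, 3.5, 4.0, 4.5):
        print(f"raw {raw:4.1f} -> MOS-LQO {p862_1_mos_lqo(raw):.2f}")
```

The function pulls low raw scores down (a raw 2.0 maps to about 1.63), consistent with the observation above that raw scores overstate quality when speech quality is poor, while the top of the scale ends near 4.55.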

Based on simulations and field tests, Figure 3.3 shows ideal PESQ values under various network conditions and codecs. These results assume transmission without errors or packet loss; in real networks, test results may approach these values, depending on the test environment.

Figure 3.3 Typical PESQ score under various network conditions


4 POLQA Algorithm

4.1 Introduction

The POLQA standard was developed between 2006 and 2010 by leading experts in a competition run by ITU-T Study Group 12 to define a technology update for the P.862 PESQ standard. POLQA, which stands for Perceptual Objective Listening Quality Analysis, offers a new level of benchmarking capability for determining the speech quality of mobile network services. The POLQA perceptual measurement algorithm was jointly developed by OPTICOM, SwissQual, and TNO. POLQA was selected as the new ITU-T speech quality testing standard, P.863, and is intended for use with HD Voice, 3G, and 4G/LTE.

4.2 Evolution of POLQA Based on PESQ

PESQ itself covers a very wide range of applications, such as fixed and wireless network testing, POTS (Plain Old Telephone Service), VoIP, and 3G. POLQA adds a variety of improvements for scenarios to which PESQ does not apply.

The major improvements of POLQA are as follows:

Applicable to new speech codecs (such as AAC and MP3) in 3G/LTE/4G network technologies

Applicable to speech enhancement systems (such as VQE and VED) that use non-linear processing to increase intelligibility

Applicable to codec technologies (such as SBR) that change or extend voice bandwidth

Supports testing with very high background noise

Correctly models the effects of variable sound presentation levels

Provides NB (narrowband) and SWB (super-wideband) modes

Handles time-scaling and time-warping as seen in VoIP and 3G

Applies correct weighting of reverberation and of linear and non-linear filtering

Supports direct comparison between AMR-encoded (GSM/WCDMA) and EVRC-encoded (CDMA) speech

Applying POLQA to today's complex, unified networks significantly improves accuracy and reliability compared with earlier standards. Because it can handle time-scaling effects, POLQA can be used in virtually any scenario today: from video telephony to lab testing, and from codec or network optimization to the development and maintenance of unified communications and collaboration services.

Like all methods of ITU-T's P.86x series, POLQA compares a known speech signal with the degraded voice signal by simulating human hearing. Unlike methods that estimate quality from network parameters or packet data analysis, this approach is independent of the underlying network technology, which gives it high accuracy and broad applicability and allows it to measure true end-to-end quality.

POLQA is thus not only a full replacement for PESQ, but it is also a significant

enhancement to the applicability of end-to-end speech quality testing. Since it is not

considerably more complex than PESQ, POLQA is a logical upgrade path for all

customers using PESQ today.

4.3 Super-Wideband and Narrowband in POLQA

The telecom industry is now moving from narrowband telephony to wideband speech transmission, and wideband codecs are already available. Current voice codec developments target the so-called super-wideband range (up to 14,000 Hz) or go even higher (full-band, up to approx. 24,000 Hz). For human speech, however, the perceived difference between super-wideband and full-band is negligible.

This evolution raises a corresponding bandwidth issue for speech quality testing. In traditional telephony scenarios, the expectation is a perfect narrowband voice signal. A signal that is close or identical to such a signal is scored by human listeners with a high quality value (usually a MOS-LQ of around 4.5 on a five-point scale). In a super-wideband scenario, the situation is different: the expectation of excellent quality is a perfect super-wideband speech signal. Since the same five-point scale is used, such a perfect super-wideband signal is also scored close to excellent, in the range of 4.5. A narrowband signal in that super-wideband context therefore does not fulfil the expectation of high quality because of its band limitation, and it is scored lower.

Since the range of the scores is the same but their meaning depends on the context, the two are treated as different scales: narrowband and super-wideband. Broadly, narrowband signals are scored lower in a super-wideband context than in narrowband experiments, because the band limitation counts as a degradation. Scores given on the two scales must therefore not be mixed or directly compared.

To cover both application areas, narrowband telephony and super-wideband communication, POLQA supports two operational modes. POLQA is applied in exactly the same way in both cases: switching between the two modes only requires setting a control flag, and POLQA makes all required adjustments internally. In narrowband mode, POLQA scores on a five-point narrowband scale; in super-wideband mode, on a five-point super-wideband scale. The MOS-LQO value ranges of the two modes differ: 1 to 4.75 in super-wideband mode and 1 to 4.5 in narrowband mode.
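In analysis code, the rule that NB and SWB scores must not be mixed can be enforced by tagging each score with its mode. A minimal sketch with hypothetical names (`PolqaScore`, `mean_score`), using only the MOS-LQO ranges stated above:

```python
from dataclasses import dataclass

# MOS-LQO ranges per POLQA mode, as stated in this section.
MODE_RANGES = {"NB": (1.0, 4.5), "SWB": (1.0, 4.75)}


@dataclass(frozen=True)
class PolqaScore:
    value: float
    mode: str  # "NB" or "SWB"

    def __post_init__(self):
        if self.mode not in MODE_RANGES:
            raise ValueError(f"unknown POLQA mode {self.mode!r}")
        lo, hi = MODE_RANGES[self.mode]
        if not lo <= self.value <= hi:
            raise ValueError(f"{self.value} is outside the {self.mode} range {lo}..{hi}")


def mean_score(scores: list[PolqaScore]) -> PolqaScore:
    """Average scores that share one scale; refuse to mix NB and SWB."""
    if not scores:
        raise ValueError("no scores")
    modes = {s.mode for s in scores}
    if len(modes) != 1:
        raise ValueError("NB and SWB scores must not be mixed or directly compared")
    return PolqaScore(sum(s.value for s in scores) / len(scores), modes.pop())
```

Rejecting mixed input outright, rather than silently converting, reflects the point above: there is no single correction that turns a narrowband-context score into a super-wideband-context score.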

4.4 POLQA Perceptual Model and Structure

POLQA uses an advanced psycho-acoustic model to emulate human perception and to transform the sound into an internal neuronal representation. As a full-reference approach, POLQA compares the input (high-quality reference) signal with the associated degraded signal under test, as shown in Figure 4.1. POLQA takes into account the masking effects of human hearing and idealizes both input signals in multiple steps. This ensures that only the relevant speech information is used for the comparison and that unwanted signal components are discarded.

Figure 4.1 shows the POLQA algorithm structure. The first module time-aligns the reference signal and the degraded signal, estimating the delay and the sample rate difference between them. Once the correct delay has been determined and the sample rate difference compensated, the signals and the delay information are passed to the core model, which calculates the perceptibility and annoyance of the distortions and maps them to a MOS scale.
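Why a sample rate difference can be estimated during time alignment: if the degraded signal is stretched in time by a factor of (1 + ε), the delay measured at successive points in the signal grows linearly, delay(t) = d0 + ε·t. The sketch below fits that line to per-segment delay measurements; it is a simplified illustration, not POLQA's alignment algorithm, and the function names and input measurements (which could come from a cross-correlation per segment) are hypothetical.

```python
def fit_delay_drift(times_s: list[float], delays_ms: list[float]) -> tuple[float, float]:
    """Least-squares fit of delay(t) = d0 + slope * t.

    Returns (d0 in ms, slope in ms of extra delay per second of signal)."""
    n = len(times_s)
    if n != len(delays_ms) or n < 2:
        raise ValueError("need at least two (time, delay) pairs of equal length")
    mean_t = sum(times_s) / n
    mean_d = sum(delays_ms) / n
    s_tt = sum((t - mean_t) ** 2 for t in times_s)
    if s_tt == 0:
        raise ValueError("measurement times must not all be equal")
    s_td = sum((t - mean_t) * (d - mean_d) for t, d in zip(times_s, delays_ms))
    slope = s_td / s_tt
    return mean_d - slope * mean_t, slope


def relative_rate_difference(slope_ms_per_s: float) -> float:
    """A delay growing by `slope` ms every second means the degraded signal is
    stretched in time by a factor of (1 + slope / 1000)."""
    return slope_ms_per_s / 1000.0
```

For example, a delay that drifts by 1 ms per second corresponds to a relative sample rate difference of 0.1 %, which has to be compensated before the core model compares the two signals.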


Figure 4.1 POLQA algorithm structure

4.5 Ideal POLQA MOS Values in Different Network Environment

Figure 4.2 shows typical values to be expected from POLQA. These values were confirmed by subjective listening experiments. In actual networks, MOS values may approach them, depending on environmental factors.

Figure 4.2 Typical values to be expected from POLQA


5 Dingli MOS Testing Solution

5.1 Product History

Domestic partner of the PESQ patent owner, and the earliest manufacturer to purchase the PESQ patent

Started R&D on speech quality testing in 2002

Developed PESQ speech quality testing for drive tests in June 2003

Developed PESQ speech quality testing in Pilot Pioneer based on multi-test mobile phones and commercial mobile phones in 2004

Released the Pilot Pioneer test tool with POLQA support at the end of 2012

As an integrated test platform, Pilot Pioneer can be upgraded to a version with MOS test functions simply by adding a separate audio MOS box.

Note: In the remainder of this document, MOS value refers to the speech quality score based on PESQ or POLQA unless stated otherwise, and MOS box refers to the independent hardware system that Dingli uses to test speech quality.

5.2 Added Value of Dingli MOS Testing System

In addition to providing the latest speech quality test tools, Dingli also focuses on speech quality optimization in practice. Dingli's solution analyzes the exact impact of wireless environment factors on MOS values to provide accurate and credible theoretical and practical reference information for network optimization. The main research topics include:

Impact of different voice coding rates on the MOS value

Impact of handover frequency and handover failure on the MOS value

Impact of DTX (Discontinuous Transmission) on the MOS value

Impact of C/I (Carrier/Interference) on the MOS value within the same coverage

Impact of signal strength on the MOS value when C/I is good

Impact of transmission quality on the MOS value

Impact of different network technologies on the MOS value

MOS testing and optimization methods and workflow

The Dingli MOS test system is a comprehensive solution covering both automatic and traditional drive testing, and it has been widely used by various operators. It helps operators test, locate, and solve speech quality issues, and the experience gained in MOS testing has helped to optimize the MOS test system itself.

5.3 Dingli MOS Test System

The Dingli MOS box is an accessory specially designed for wireless network speech quality testing. Users can combine Pilot Pioneer and the MOS box with different test terminals to test speech quality on various networks in various scenarios. In practice, the MOS box offers several technical advantages, listed in Table 4.

Table 4 Technical advantages of Dingli MOS box

Item                   Description
Compact                45 (L) x 13 (H) x 38 (W) cm
                       Standard weight: 5 kg
                       Power: built-in battery or external power supply
                       Easy to carry
Protective Material    Aluminum alloy casing, resistant to compression and shock
                       Mobile phone slots made of plastic with protective film
                       Protects test terminals from damage
Terminal Distance      Test terminals are at least 15 cm apart, which avoids mutual
                       interference between the terminals and ensures test accuracy.
Simple Panel           Test terminal data and headphone cables run in the lower section
                       of the MOS box; only the connectors are visible on the upper layer.
                       This keeps the box tidy and reduces the risk of twisted or broken
                       cables during testing.
Easy Operation         All test terminals connected to the MOS box are connected to a
                       computer through only three cables, which simplifies setup.
User-Friendly Design   Test terminals can be elevated up to 75° in a fixed slot, which can
                       improve received signal strength and help dissipate heat from the
                       mobile phones.

Table 5 describes the technical specifications of the Dingli MOS box.

Table 5 Technical specifications of Dingli MOS box

Item                   Description
Physical features      Input voltage: 10 V to 18 V
                       Power: 2.1 W
                       Working temperature: 0 °C to 55 °C
                       Storage temperature: -20 °C to 85 °C
                       S/N ratio (signal-to-noise ratio): 42 dB
                       Record or playback distortion: 3%
                       Output power: 50 mW (headphone drive)
                       Record or playback echo rejection ratio: 40 dB
System features        Supports mobile-to-mobile and mobile-to-land tests
                       Supports up to eight test terminals simultaneously
                       Provides standard 8 s, 12 s, and 4 s male speech sample files
                       (bit rate: 128 kbit/s)
                       Displays the current audio file waveform and MOS value, saves audio
                       files in real time, and saves either the entire audio file or the
                       portion defined in the settings
                       Provides a high-performance recording and playback device to avoid noise
                       Supports device abnormality alarms (audio or notification)
                       Displays in-depth analysis parameters provided by the MOS algorithm
                       Supports multi-network comparison (such as GSM / CDMA / WCDMA /
                       TD-SCDMA / LTE CSFB)
                       Supports various terminal types (such as SAMSUNG S3, SAMSUNG Note I,
                       ZTE V8860+, Huawei 8800 series)
Background processing  Voice call report (coverage, connection rate, call-drop rate,
                       call setup time)
                       Voice call details (outgoing and incoming call details)
                       Diagnostic event details (such as dropped calls, lost calls, and
                       low MOS values)
                       Parameter list details (sample rates of various parameters)
                       Display of MOS values on a map
                       Graphical display of various parameters

5.4 MOS Testing Solutions

Dingli offers three MOS test solutions, based on Pilot Pioneer, Pilot RCU, and Pilot Walktour respectively, and one analysis solution based on Pilot Navigator. All three test solutions support both the PESQ and POLQA algorithms; users only need to select PESQ or POLQA during configuration.

5.4.1 MOS Testing Solution Based on Pilot Pioneer

1. Mobile-to-Mobile

This mode supports speech quality and benchmarking tests for operators, on up to four networks simultaneously. Test terminals can be combined freely across network technologies (such as 2G1C or 3W1C).

In this mode, one mobile phone calls another. After the call is connected, the calling party plays a speech sample, which reaches the called party through the base station. The called party records the received speech and compares it with the standard speech sample to obtain the uplink value of the calling party (which is also the downlink value of the called party). The called party then plays the speech sample, which reaches the calling party through the base station. The calling party records it and compares it with the standard speech sample to obtain the uplink value of the called party (which is also the downlink value of the calling party). The test terminals can alternate roles and repeat the test in a continuous loop.
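The uplink and downlink bookkeeping described above can be expressed in a few lines: every scored leg counts once as the sender's uplink value and once as the receiver's downlink value. This is a hypothetical sketch, not Pilot Pioneer's data model; the scores stand in for the MOS computed by whichever algorithm is configured.

```python
from dataclasses import dataclass, field


@dataclass
class Terminal:
    name: str
    uplink: list[float] = field(default_factory=list)
    downlink: list[float] = field(default_factory=list)


def record_leg(sender: Terminal, receiver: Terminal, score: float) -> None:
    """One direction of the test: `sender` plays the sample, `receiver` records
    and scores it. The score is the sender's uplink value and, equally, the
    receiver's downlink value."""
    sender.uplink.append(score)
    receiver.downlink.append(score)


def run_loop(caller: Terminal, callee: Terminal, scores: list[tuple[float, float]]) -> None:
    """Each cycle scores caller -> callee first, then callee -> caller."""
    for caller_to_callee, callee_to_caller in scores:
        record_leg(caller, callee, caller_to_callee)
        record_leg(callee, caller, callee_to_caller)
```

Because each leg is stored twice, the caller's uplink list and the callee's downlink list are always identical, which matches the statement that one measurement serves both parties.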

Note: The above description applies to PESQ. For POLQA, Pilot Pioneer currently provides the POLQA score only for the calling party.


Figure 5.1 Pioneer+MOS (2W1C)

2. Mobile-to-Land

Users may conduct mobile-to-land MOS test based on network type, or customize

the solution by defining the terminal type and quantity according to network type.

Currently, this test mode supports a maximum of four networks simultaneously.

Note: When the POLQA algorithm is used, Pilot Pioneer currently supports only 8 kHz speech sampling and the downlink POLQA score.

Figure 5.2 Mobile-to-Land

5.4.2 MOS Analysis Solution Based on Pilot Navigator

Pilot Navigator is a professional post-processing tool developed by Dingli Communications. It provides multiple professional analysis reports, including speech quality test results.

1. Speech Quality Report

The Speech Quality Report displays the mean value and the middle (median) value of the PESQ or POLQA scores. Figure 5.3 shows an example.
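The mean and middle values in this report are, respectively, the arithmetic mean and the median of the collected MOS samples; reading the "Middle Value" and "Medium Value" labels in Figure 5.3 as the median is an interpretation. A minimal sketch with a hypothetical function name:

```python
from statistics import mean, median


def summarize(scores: list[float]) -> dict[str, float]:
    """Return the mean and the median ("middle value") of a list of MOS samples."""
    if not scores:
        raise ValueError("no MOS samples")
    return {"mean": mean(scores), "median": median(scores)}
```

Reporting both is useful because a handful of very low samples, for example around a dropped call, pull the mean down much more than the median.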


Figure 5.3 Speech Quality Report provided by Pilot Navigator

Result                        Speech Quality (General)   Speech Quality (MOC)   Speech Quality (MTC)
PESQ MOS Mean Value           3.63                       3.63                   3.63
PESQ MOS Middle Value         3.71                       3.71                   3.71
POLQA ScoreNB Mean Value      0.00                       0.00                   0.00
POLQA ScoreNB Middle Value    -                          -                      -
POLQA ScoreSWB Mean Value     0.00                       0.00                   0.00
POLQA ScoreSWB Middle Value   -                          -                      -
PESQ Score Average Value      3.58                       3.58                   3.58
PESQ Score Medium Value       3.62                       3.62                   3.62
PESQ LQ Average Value         3.58                       3.58                   3.58
PESQ LQ Medium Value          3.67                       3.67                   3.67
BLER


multiple channels for speech quality testing, multiple RCU MOS test modules can

simultaneously call the server number. In this way, multi-channel MOS values are

obtained. Since the called party is a fixed-line phone number, the transmission loss is

extremely small. As a result, the MOS values obtained on the server side and the

RCU side can be considered as pure uplink and downlink MOS values.

Note: For POLQA, this mode supports only the downlink score and a maximum

of three dual-core RCU test modules simultaneously.

Figure 5.5 Calling from an RCU MOS Test Module to server

2. From an RCU MOS Test Module to an RCU MOS Test Module

Figure 5.6 shows the general structure of speech quality testing from one RCU MOS test module to another. Users may initiate calls from one MOS test module to another to perform MOS comparison testing: the speech sample is played at one end and recorded at the other, and the MOS value is then calculated by the MOS algorithm. Users may also initiate calls between two test modules in the same RCU. This testing method is relatively simple because no speech quality testing server needs to be installed, which makes it suitable for speech quality testing in remote communication.


Figure 5.6 Calling from an RCU MOS Test Module to an RCU MOS Test Module

The testing process is almost the same as for calls from an RCU test module to a server. The difference is that the calls are made between two RCU test modules, or between two test modules in the same RCU; this solution also supports multi-channel MOS testing.

Note: This mode supports a maximum of two dual-core RCU test modules to

dial to each other when POLQA algorithm is used.

5.4.4 MOS Testing Solution Based on Pilot Walktour

1. Mobile to Mobile

As shown in Figure 5.7, Dingli Pilot Walktour supports mobile-to-mobile MOS testing. Using the software kernel and MOS algorithm integrated on the mobile phone, one phone plays the speech sample and the other records the received voice to perform speech quality testing.

Note: In this mode, the iOS Walktour POLQA score can be viewed only after analysis with Pilot Navigator, not on the mobile phone; Android Walktour provides the POLQA score only on the calling party's mobile phone.


Figure 5.7 MOS testing based on Pilot Walktour

2. Mobile-to-Land

Users may conduct mobile-to-land MOS test based on network type, or customize

the solution by defining the terminal type and quantity according to network type.

Note: In this mode, iOS Walktour provides the POLQA score only after analysis with Pilot Navigator, not on the mobile phone. Android Walktour provides the POLQA score only on the calling party's mobile phone.

5.5 Details About Dingli MOS Testing Solution

5.5.1 Impact of Network Factors on PESQ MOS Value

1. Impact of Voice Coding Mode on MOS Value

GSM supports multiple voice coding modes, such as Enhanced Full Rate (EFR), Full Rate (FR), and Half Rate (HR). From the coding principles, the expected ordering is:

MOS value in EFR mode > MOS value in FR mode > MOS value in HR mode

The following speech quality test results (mobile phone to fixed-line phone) were obtained without environmental interference:

Uplink PESQ MOS value: EFR mode (max 4.20) > FR mode (max 3.94) > HR mode (max 3.69)

Downlink PESQ MOS value: EFR mode (max 4.255) > FR mode (max 3.940) > HR mode (max 3.728)
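As a quick illustration, the sketch below encodes the maximum values listed above, checks that they follow the expected EFR > FR > HR ordering, and reports each codec's shortfall relative to EFR. The dictionary layout and function names are illustrative assumptions, not part of any Dingli tool.

```python
# Maximum PESQ MOS values measured without environmental interference
# (mobile phone to fixed-line phone), as quoted in the text.
MAX_PESQ_MOS = {
    "uplink": {"EFR": 4.20, "FR": 3.94, "HR": 3.69},
    "downlink": {"EFR": 4.255, "FR": 3.940, "HR": 3.728},
}

# Expected ordering from the coding principle: EFR > FR > HR.
CODEC_ORDER = ("EFR", "FR", "HR")


def follows_codec_ordering(values: dict[str, float]) -> bool:
    """Return True if MOS strictly decreases in EFR > FR > HR order."""
    scores = [values[codec] for codec in CODEC_ORDER]
    return all(a > b for a, b in zip(scores, scores[1:]))


def gap_to_best(values: dict[str, float]) -> dict[str, float]:
    """MOS shortfall of each codec relative to EFR, rounded to 3 decimals."""
    best = values["EFR"]
    return {codec: round(best - values[codec], 3) for codec in CODEC_ORDER}


for direction, values in MAX_PESQ_MOS.items():
    print(direction, follows_codec_ordering(values), gap_to_best(values))
```

With these figures, HR mode trails EFR by roughly 0.5 MOS in both directions.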


In commercial networks, network interference and other factors cause the results obtained with different coding rates to differ considerably from those obtained in an ideal environment. Table 6 shows example average values from an actual speech quality test in an urban environment.

Table 6 Impact of HR in GSM Network on PESQ Result

PESQ Result | HR Percentage = 0% | HR Percentage = 40% | HR Percentage = 100%

Uplink PESQ MOS value | 3.588 | 3.428 | 3.331

Downlink PESQ MOS value | 3.418 | 3.325 | 3.259
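To estimate the average MOS for an HR share that was not measured, one option is piecewise-linear interpolation between the Table 6 points. This is a minimal sketch assuming MOS varies linearly between the measured points; the source does not give such a model, and `estimate_mos` is a hypothetical helper.

```python
from bisect import bisect_right

# Average PESQ MOS values from Table 6, as (HR percentage, MOS) pairs.
TABLE_6 = {
    "uplink": [(0, 3.588), (40, 3.428), (100, 3.331)],
    "downlink": [(0, 3.418), (40, 3.325), (100, 3.259)],
}


def estimate_mos(direction: str, hr_percent: float) -> float:
    """Piecewise-linear estimate of average PESQ MOS for an HR percentage.

    Assumption: MOS varies linearly between the measured Table 6 points.
    This is an illustrative interpolation, not a model from the source.
    """
    points = TABLE_6[direction]
    xs = [x for x, _ in points]
    if not xs[0] <= hr_percent <= xs[-1]:
        raise ValueError("HR percentage must be within the measured range 0-100")
    # Pick the segment [xs[i], xs[i + 1]] containing hr_percent;
    # clamp so that hr_percent == 100 uses the last segment.
    i = min(bisect_right(xs, hr_percent) - 1, len(xs) - 2)
    (x0, y0), (x1, y1) = points[i], points[i + 1]
    return y0 + (y1 - y0) * (hr_percent - x0) / (x1 - x0)
```

For example, `estimate_mos("uplink", 20)` lies halfway between the 0% and 40% values, giving about 3.508. Values outside the measured 0% to 100% range are rejected rather than extrapolated.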

2. Impact of Link and Routing on MOS Value

In speech quality testing, links and routing are also important factors affecting the PESQ MOS value. For example, the result may differ if the called party is on a different network (e.g., a CDMA-to-GSM call) rather than on the same radio access network.

In one live network test, the measured MOS values were lower than usual. After equipment and radio causes were ruled out, the problem was traced to long-distance switching and routing. In this scenario, the PESQ test calls were carried over two routes, so each call was locked to one route in turn to determine which route produced the lower MOS values. One of the routes was found to be introducing bit errors, which lowered the MOS value. After the issue was resolved, the MOS values returned to normal.

3. Impact of Abis Compression on MOS Values

In January 2005, Dingli tested the impact of Abis compression (bandwidth saving on the Abis interface) on MOS values in a live network in Beijing. The test had three parts: no compression, 5-time-slot compression, and 6-time-slot compression. See Table 7.

Table 7 Impact of Abis compression on MOS value

Item | Average Downlink Value | Average Uplink Value

No compression | 3.964 | 3.938

5 time-slot compression | 3.424 | 3.464

6 time-slot compression | 3.364 | 3.367


Conclusion: The greater the compression, the lower the MOS value.
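This conclusion can be quantified from Table 7 by computing the absolute and relative MOS drop of each compression setting against the uncompressed baseline. A minimal sketch; the helper name and output layout are illustrative.

```python
# Average MOS values from Table 7 (live network, Beijing, January 2005).
TABLE_7 = {
    "No compression": {"downlink": 3.964, "uplink": 3.938},
    "5 time-slot compression": {"downlink": 3.424, "uplink": 3.464},
    "6 time-slot compression": {"downlink": 3.364, "uplink": 3.367},
}


def drop_vs_baseline(table, baseline="No compression"):
    """Return {item: {direction: (absolute drop, relative drop in %)}}."""
    base = table[baseline]
    result = {}
    for item, values in table.items():
        result[item] = {
            direction: (
                round(base[direction] - mos, 3),
                round(100 * (base[direction] - mos) / base[direction], 1),
            )
            for direction, mos in values.items()
        }
    return result


for item, drops in drop_vs_baseline(TABLE_7).items():
    print(item, drops)
```

With the Table 7 values, 6-time-slot compression lowers the average downlink MOS by 0.6, roughly 15% of the uncompressed value.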

4. Impact of Network Quality on MOS Value

Network quality (RxQual Sub for GSM, FFER for CDMA) is correlated with the PESQ MOS value:

When GSM RxQual Sub is within [0, 2], the PESQ MOS value is within [3, 4.1].

When CDMA FFER is within [0, 3%], the PESQ MOS value is within [3, 4.1].

Generally, if the RxQual Sub/FFER value is high, the MOS value is low. However, if

the MOS value is low, the RxQual Sub/FFER value is not necessarily high (the MOS

value may be affected by other factors).
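Because a low MOS does not imply poor radio quality, a drive-test analyst might triage low-MOS samples in two steps. The sketch below is a hypothetical helper, not a Pilot feature; the thresholds (RxQual Sub 2, FFER 3%, MOS 3.0) come from the ranges above, and treating them as hard cut-offs is an assumption.

```python
from enum import Enum


class Suspect(Enum):
    NONE = "MOS within the expected range"
    RADIO = "poor radio quality (high RxQual Sub / FFER)"
    OTHER = "radio quality looks fine; check codec, routing, Abis, handover"


# Assumed cut-offs derived from the text: RxQual Sub in [0, 2] (GSM) or
# FFER in [0, 3%] (CDMA) corresponds to a PESQ MOS of roughly [3, 4.1].
GOOD_QUALITY_LIMIT = {"GSM": 2.0, "CDMA": 0.03}
LOW_MOS = 3.0


def triage(tech: str, quality: float, mos: float) -> Suspect:
    """Classify one sample (hypothetical helper).

    `quality` is RxQual Sub for GSM, or FFER as a fraction (0.03 == 3%)
    for CDMA. A low MOS with good radio quality points to non-radio causes.
    """
    if mos >= LOW_MOS:
        return Suspect.NONE
    if quality > GOOD_QUALITY_LIMIT[tech]:
        return Suspect.RADIO
    return Suspect.OTHER
```

The `OTHER` branch reflects the asymmetry described above: the remaining factors in this section (coding mode, routing, Abis compression, handover) are the next candidates to check.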

A good network environment is the basis of good wireless communication. Low C/I (carrier-to-interference) and C/A (carrier-to-adjacent) values result in a high BER, which degrades call quality or triggers dropped calls. High BER caused by frequency interference has long been a major concern in network optimization. From the user's perspective, a transient high BER does not noticeably affect the listening experience, while a sustained high BER causes frame loss and seriously impairs listening.

Practical tests led to the following conclusions:

In EFR mode, when the downlink RXQUAL value exceeds 4.8, the downlink PESQ MOS value decreases; when it exceeds 5.4, the downlink PESQ MOS value falls below 3.3; when it exceeds 6, the downlink PESQ MOS value drops sharply to an intolerable level.

In FR mode, when the downlink RXQUAL value is greater than 5.1, the





In HR mode, when the downlink RXQUAL value is greater than 4.8, the






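The EFR thresholds above can be encoded as a simple lookup, for example to label drive-test samples. This is a hypothetical sketch; only the EFR case is encoded, and the labels paraphrase the text rather than predict exact MOS values.

```python
def efr_downlink_expectation(rxqual: float) -> str:
    """Qualitative downlink PESQ MOS expectation in EFR mode.

    Thresholds from the practical test results: above 4.8 the MOS
    decreases, above 5.4 it falls below 3.3, above 6 it becomes
    intolerable. FR and HR modes are not modelled here.
    """
    if rxqual > 6:
        return "intolerable"
    if rxqual > 5.4:
        return "below 3.3"
    if rxqual > 4.8:
        return "decreased"
    return "no RXQUAL-related drop reported"
```

The checks run from the highest threshold down, so each value falls into exactly one band.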
5. Impact of Network Handover on MOS Value

Handover is one of the most basic and important features of GSM. During handover, TCH (Traffic Channel) frames are stolen and used as the FACCH (Fast Associated Control Channel) to carry handover signaling. This temporary interruption maintains the connection and completes the handover to a more suitable cell, but it comes at the expense of voice continuity and affects speech quality to some extent. When frequent ping-pong handovers occur, the successive frames taken for the FACCH cause voice interruptions similar to those caused by frame loss.


Handover has a serious impact on the PESQ MOS value. As handover frequency increases, speech quality may drop to an intolerable level, resulting in low MOS values.

In EFR mode, if a handover occurs once every six seconds during uplink and downlink voice transfer, the average uplink and downlink PESQ MOS value is 1 lower than the maximum value. If handovers occur twice every six seconds, the average uplink and downlink PESQ MOS value is 2 lower than the maximum value.

In FR mode, if a handover occurs once every six seconds during uplink and downlink voice transfer, the average uplink and downlink PESQ MOS value is 1 lower than the maximum value. If handovers occur twice every six seconds, the average uplink and downlink PESQ MOS value is 1.5 lower than the maximum value.

In EFR, FR, and HR modes, when handover failures occur (e.g., in a pattern similar to ping-pong handover), the PESQ MOS value drops further as the handover count increases, possibly falling close to 1.
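The reported handover figures can be kept as a lookup table for rough expectations during analysis. A minimal sketch; the zero-handover entry (no drop) is an assumption, only the EFR and FR combinations reported above are supported, and `expected_mos` is a hypothetical helper.

```python
# Reported average PESQ MOS reduction below the maximum value for a given
# number of handovers per six-second speech transfer.
# Assumption: zero handovers means no handover-related drop.
HANDOVER_MOS_DROP = {
    "EFR": {0: 0.0, 1: 1.0, 2: 2.0},
    "FR": {0: 0.0, 1: 1.0, 2: 1.5},
}


def expected_mos(max_mos: float, codec: str, handovers_per_6s: int) -> float:
    """Estimate average PESQ MOS after the handover impact.

    The result is floored at 1.0, the bottom of the MOS scale.
    """
    try:
        drop = HANDOVER_MOS_DROP[codec][handovers_per_6s]
    except KeyError:
        raise ValueError(
            f"no reported figure for {codec} with {handovers_per_6s} handovers per 6 s"
        ) from None
    return max(1.0, max_mos - drop)
```

For example, starting from the uplink EFR maximum of 4.20 measured earlier, two handovers every six seconds would leave an average around 2.2.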

6. Impact of Signal Strength on MOS Value

When signal strength changes but BER/FER remains 0, the RXQUAL Sub value also remains 0. In this case, speech quality is theoretically unaffected.


When the C/I value does not change, signal strength fluctuation does not affect the PESQ MOS value; when signal strength fluctuation changes the C/I value, the PESQ MOS value is affected as well.
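Because C/I in dB is the difference between the carrier and interference levels in dBm, a fade that affects both signals equally leaves C/I unchanged, while a fade of the carrier alone reduces it. The sketch below illustrates this with hypothetical levels, not measurements from the source.

```python
def carrier_to_interference_db(carrier_dbm: float, interference_dbm: float) -> float:
    """C/I in dB: difference between carrier and interference levels in dBm."""
    return carrier_dbm - interference_dbm


# Illustrative levels (hypothetical values, not measurements from the source).
before = carrier_to_interference_db(-70, -85)        # 15 dB
common_fade = carrier_to_interference_db(-80, -95)   # both drop 10 dB: still 15 dB
carrier_fade = carrier_to_interference_db(-80, -85)  # only the carrier drops: 5 dB
print(before, common_fade, carrier_fade)
```

Only the third case changes C/I, so only that kind of fluctuation would be expected to affect the PESQ MOS value.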

5.5.2 Parameters in POLQA Testing Result

In theory, the factors that affect PESQ MOS values also affect POLQA test results. POLQA research is still ongoing. This section describes the POLQA test interface and parameters in Pilot Pioneer.

Figure 5.8 shows the POLQA test interface in Pilot Pioneer.


Figure 5.8 POLQA test interface in Pilot Pioneer

The reference and degraded waveforms are displayed in the upper part of the interface, and the POLQA test results are displayed in the lower part, including Library Version, Processing Mode (NB/SWB), Mean Delay, Minimum Delay, Maximum Delay, and so on. For details, see Table 8.

Table 8 POLQA parameters in Pilot Pioneer

Parameter | Description

Library Version | POLQA OEM library version.

P863 Version | ITU-T P.863 reference version code.

Processing Mode | Processing mode: Narrowband (NB) or Super Wideband (SWB).

Sample Rate (Hz) | Sample rate of the reference speech signal in Hz. Typical sample rates for NB and SWB are 8 kHz and 48 kHz, respectively.

POLQA Score | POLQA score according to P.863: 1 to 4.5 in NB mode, 1 to 4.75 in SWB mode.

Mean Delay (ms) | The average delay between the reference and the test signal in ms.

Minimum Delay (ms) | The minimum delay between the reference and the test signal in ms.

Maximum Delay (ms) | The maximum delay between the reference and the test signal in ms.

Attenuation (dB) | The attenuation between the two input signals in dB.

R value (E-Model) | The POLQA score mapped as a G.107 (E-Model) Ie value.

Estimated Sample Rate (Hz) | The sample rate of the degraded signal as measured by POLQA.

Resampling Applied | If the sample rates of the reference and degraded signals differ by more than 0.5%, POLQA downsamples the signal with the higher sample rate. In that case, Resampling Applied is set to 1.

Direction | Uplink or Downlink. Downlink means the device is playing the reference signal.

Level Reference (dBov) | The level of the reference signal in dBov (averaged over the entire signal).

Level Degraded (dBov) | The level of the degraded signal in dBov (averaged over the entire signal).

ASL Reference (dBov) | The active speech level of the reference signal in dBov, measured similarly to P.56.

ASL Degraded (dBov) | The active speech level of the degraded signal in dBov, measured similarly to P.56.

Pause Level Reference (dBov) | The silence level of the reference signal in dBov.

Pause Level Degraded (dBov) | The silence level of the degraded signal in dBov.

SNR Reference (dB) | The SNR of the reference signal in dB.

SNR Degraded (dB) | The SNR of the degraded signal in dB.

ASR Reference | The Active Speech Ratio (ASR) of the reference signal. ASR is the ratio of the length of the active speech parts to the total signal length, in the range [0, 1].

ASR Degraded | The Active Speech Ratio (ASR) of the degraded signal. ASR is the ratio of the length of the active speech parts to the total signal length, in the range [0, 1].

Pitch Reference (Hz) | The average pitch frequency of the reference signal.

Pitch Degraded (Hz) | The average pitch frequency of the degraded signal.

Record File | The recorded degraded signal file and its location.
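Some of the Table 8 definitions can be checked when post-processing exported results. The sketch below covers the Resampling Applied rule, the per-mode score range, and the ASR definition. The function names are hypothetical, and computing the 0.5% difference relative to the higher sample rate is an assumption, since the table does not say which rate the percentage refers to.

```python
# POLQA score ranges per P.863, as listed in Table 8.
SCORE_RANGE = {"NB": (1.0, 4.5), "SWB": (1.0, 4.75)}


def resampling_applied(ref_rate_hz: float, deg_rate_hz: float) -> int:
    """Return 1 if the sample rates differ by more than 0.5%, else 0.

    Assumption: the difference is taken relative to the higher rate.
    """
    higher = max(ref_rate_hz, deg_rate_hz)
    lower = min(ref_rate_hz, deg_rate_hz)
    return int((higher - lower) / higher > 0.005)


def score_in_range(score: float, mode: str) -> bool:
    """True if a POLQA score lies within the P.863 range for the mode."""
    low, high = SCORE_RANGE[mode]
    return low <= score <= high


def active_speech_ratio(active_s: float, total_s: float) -> float:
    """ASR: length of active speech divided by total signal length, in [0, 1]."""
    if total_s <= 0 or not 0 <= active_s <= total_s:
        raise ValueError("need total_s > 0 and 0 <= active_s <= total_s")
    return active_s / total_s
```

For instance, a 4.6 score is valid in SWB mode but outside the NB range, which makes `score_in_range` a cheap sanity check on the Processing Mode field.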

5.5.3 Reference PESQ-LQ Value: Some Practical Examples

Table 9, Table 10, and Table 11 list the mean PESQ-LQ value and the excellent-value ratio (excellent range: 3.0 to 4.5) measured in a single test on three live networks in different scenarios.

Table 9 Testing result of Operator A

Urban DT Voice PESQ-LQ Mean Value | Urban DT Voice 3.0-4.5 Percentage | Highway DT Voice PESQ-LQ Mean Value | Highway DT Voice 3.0-4.5 Percentage

3.27 | 88.17% | 3.25 | 86.00%

Table 10 Testing result of Operator B

PESQ-LQ Mean Value | 3.0-4.5 Percentage | PESQ-LQ Mean Value | 3.0-4.5 Percentage

3.49 | 79.69% | 3.36 | 75.64%

Table 11 Testing result of Operator C

PESQ-LQ Mean Value | 3.0-4.5 Percentage | PESQ-LQ Mean Value | 3.0-4.5 Percentage

3.46 | 78.42% | 3.43 | 78.11%
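The two metrics in Tables 9 to 11 can be computed from per-call PESQ-LQ scores as sketched below. This assumes the "3.0-4.5 Percentage" is the share of individual samples falling inside the excellent range with both ends inclusive; `pesq_lq_summary` is a hypothetical helper, and the sample list in the usage note is illustrative only.

```python
from statistics import mean


def pesq_lq_summary(scores, low=3.0, high=4.5):
    """Return (mean PESQ-LQ, percentage of samples within [low, high]).

    Assumption: the excellent range is inclusive at both ends and the
    percentage is taken over individual call samples.
    """
    if not scores:
        raise ValueError("no samples")
    in_range = sum(1 for s in scores if low <= s <= high)
    return round(mean(scores), 2), round(100 * in_range / len(scores), 2)
```

For the illustrative samples `[2.5, 3.0, 3.5, 4.0]`, the summary is a mean of 3.25 with 75% of samples in the excellent range, since only 2.5 falls outside it.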


6 Appendix 1: ITU-T P.862 Standard Relevant Information

1. Factors for which PESQ had demonstrated acceptable accuracy

2. Factors for which PESQ had demonstrated acceptable accuracy (concluded)

3. PESQ is known to provide inaccurate predictions when used in conjunction with these variables, or is otherwise not intended to be used with these variables

4. For further study: factors, technologies, and applications for which PESQ has not currently been validated

5. For further study: factors, technologies, and applications for which PESQ has not currently been validated (concluded)

6. ITU-T temporal structure and duration of source material


7 References

[1] OPTICOM GmbH & SwissQual AG, Perceptual Objective Listening Quality Analysis, Technical White Paper.

[2] Psytechnics Limited, PESQ: An Introduction, White Paper, September 2001.

[3] ITU-T Recommendation P.862, Perceptual Evaluation of Speech Quality (PESQ): An Objective Method for End-to-End Speech Quality Assessment of Narrowband Telephone Networks and Speech Codecs, February 2001.

[4] ITU-T Recommendation P.862.1, Mapping Function for Transforming P.862 Raw Result Scores to MOS-LQO, November 2003.

[5] ITU-T Recommendation P.863, Perceptual Objective Listening Quality Assessment, January 2011.

[6] ITU-T Recommendation P.862, Table 1/P.862, Table 2/P.862, Table 3/P.862, and Section 8.1.2 (Temporal structure and duration of source material), February 2001.


Dingli Communications Corp., Ltd

Address: No.8, the Fifth Technology Road, Gangwan Avenue, Zhuhai,

P.R.China, 519085.

Tel: 0756-3391933 Fax: 0756-3391900

Technical Hotline: 4008878822

Email: [email protected]
