
  • Dingli (27/7/2013) DL1AMOSWP Rev1 1 / 37

    Speech Quality Testing Solution

    (MOS) Whitepaper


    Revision History

    Date        Version  Author         Description
    2013-05-06  1.0      Geng Xiaoming  First Edition
    2013-07-27  1.1      Zhang Qifei    Change document organization, optimize MOS
                                        testing solution, and add POLQA parameters


    Contents

    1 Introduction
        1.1 Document Organization
        1.2 Abbreviations and Acronyms
    2 Development of Speech Quality Testing
    3 PESQ Algorithm
        3.1 Introduction
        3.2 PESQ Algorithm Structure
    4 POLQA Algorithm
        4.1 Introduction
        4.2 Evolution of POLQA Based on PESQ
        4.3 Super-Wideband and Narrowband in POLQA
        4.4 POLQA Perceptual Model and Structure
        4.5 Ideal POLQA MOS Values in Different Network Environment
    5 Dingli MOS Testing Solution
        5.1 Product History
        5.2 Added Value of Dingli MOS Testing System
        5.3 Dingli MOS Test System
        5.4 MOS Testing Solutions
            5.4.1 MOS Testing Solution Based on Pilot Pioneer
            5.4.2 MOS Analysis Solution Based on Pilot Navigator
            5.4.3 MOS Testing Solution Based on Pilot Fleet (RCU)
            5.4.4 MOS Testing Solution Based on Pilot Walktour
        5.5 Details About Dingli MOS Testing Solution
            5.5.1 Impact of Network Factors on PESQ MOS Value
            5.5.2 Parameters in POLQA Testing Result
            5.5.3 Reference PESQ-LQ Value: Some practical examples
    6 Appendix 1: ITU-T P.862 Standard Relevant Information
    7 References


    1 Introduction

    This document describes the development and evolution of speech quality testing
    technologies in telecommunications networks, and focuses on two objective
    testing methods: PESQ and POLQA.

    1.1 Document Organization

    Chapter 1: Introduction to document objectives, document organization, and

    abbreviations and acronyms

    Chapter 2: Basics and development of speech quality testing

    Chapter 3: The PESQ algorithm, an objective speech quality testing method

    Chapter 4: The POLQA algorithm, the newer speech quality testing method recommended by the ITU

    Chapter 5: Dingli's speech quality evaluation solutions

    1.2 Abbreviations and Acronyms

    Table 1 Abbreviations and acronyms

    Item     Full Spelling                                      Description
    ITU      International Telecommunication Union
    MOS      Mean Opinion Score
    MOS-LQO  MOS Listening Quality Objective
    AMR      Adaptive Multi-Rate Codec
    AAC      Advanced Audio Coding
    FER      Frame Error Rate
    BER      Bit Error Rate
    POTS     Plain Old Telephone Service
    GSM      Global System for Mobile Communications
    CDMA     Code Division Multiple Access
    LTE      Long Term Evolution
    PAMS     Perceptual Analysis Measurement System             Recommendation of ITU-T P.800
    PSQM     Perceptual Speech Quality Measure                  Recommendation of ITU-T P.861
    PESQ     Perceptual Evaluation of Speech Quality            Recommendation of ITU-T P.862
    POLQA    Perceptual Objective Listening Quality Assessment  Recommendation of ITU-T P.863
    VoIP     Voice over IP
    P.56     ITU-T Rec. P.56                                    Recommendation of ITU-T P.56


    2 Development of Speech Quality Testing

    Applying a unified speech quality standard across networks is challenging because
    the networks use different communication technologies. For example, GSM networks
    use RxQual (based on BER) to evaluate speech quality, while CDMA networks use FER.
    Moreover, even within a single network, a single RxQual or FER value cannot
    represent the true speech quality. A professional speech quality testing system
    should be end-to-end and user-based. A general testing method is therefore
    required to compare different networks directly.

    Based on the test subject, speech quality testing can be divided into two
    categories: subjective and objective testing.

    Subjective testing relies on listeners to evaluate communication system
    performance. According to ITU-T P.800 and ITU-T P.830, about 40 to 60 trained
    listeners perceptually compare the reference signal with the degraded signal,
    following detailed criteria. Each listener scores the degraded signal according
    to the MOS scoring standard (a scale from 1 to 5), and the scores are averaged
    to obtain the final MOS value.

    Table 2 MOS scoring standard

    Degree     MOS Value  User Satisfaction
    Excellent  5.0        Very clear; no distortion; no delay
    Good       4.0        Clear; small delay; little noise
    Fair       3.0        Unclear; some delay; some noise; some distortion
    Poor       2.0        Unclear; loud noise or interruptions; serious distortion
    Bad        1.0        Silence or totally unclear; very loud noise
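    As a minimal sketch of how a subjective MOS value is obtained, the snippet below
    averages the ratings of a listener panel on the 1 to 5 scale. The function name
    and the panel ratings are hypothetical and only illustrate the arithmetic; a real
    subjective test also controls the listening environment and speech material.

```python
from statistics import mean


def mean_opinion_score(scores):
    """Average listener ratings given on the 1-5 MOS scale."""
    if not scores:
        raise ValueError("at least one listener score is required")
    for s in scores:
        if not 1 <= s <= 5:
            raise ValueError(f"score {s} is outside the 1-5 MOS scale")
    return mean(scores)


# Hypothetical ratings from a small panel for one degraded sample.
panel = [4, 4, 3, 5, 4, 3, 4, 4]
print(f"MOS = {mean_opinion_score(panel):.2f}")
```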

    The subjective testing result is the most reliable, and this method can be used to
    evaluate network performance and speech quality with any speech coding mode.
    However, its disadvantages are obvious. Factors such as the evaluation environment
    and the listeners must be strictly controlled, and the speech material must be
    carefully selected; otherwise, the final result may be affected. All of this makes
    the test time-consuming, laborious, difficult to organize, and poorly repeatable.
    Practical testing therefore requires a more efficient and repeatable approach: an
    objective testing method.

    In practice, objective testing compares parameters of the reference and degraded
    speech signals in the time and frequency domains, and the test result is calculated
    by hardware or software. Objective testing methods such as PAMS and PSQM were
    introduced during early research on objective speech quality testing. However,
    these methods have significant limitations: the test result is affected by the
    particular speech codec, and in some cases it differs considerably from the MOS
    value obtained in subjective testing. In ITU-T P.862 (2001), the core speech
    quality testing method was upgraded to the PESQ algorithm, which integrates the
    advantages of the previous algorithms. The PESQ test result is very close to the
    MOS value from subjective testing, and the PESQ algorithm is widely accepted and
    applied.

    Later, as new communication technologies evolved, the POLQA algorithm was developed
    to support new speech codecs and super-wideband speech, and to handle time-scaling
    effects in VoIP. Compared with previous algorithms, applying POLQA to unified and
    complex communication networks achieves significant improvements in accuracy and
    credibility. In addition, its ability to handle time scaling makes POLQA applicable
    to almost any speech quality testing scenario. POLQA includes two modes, NB
    (narrowband) and SWB (super-wideband), corresponding to speech samples with
    different frequency ranges. POLQA became the recommended algorithm in ITU-T P.863
    (2011).

    Figure 2.1 shows the evolution of ITU-T recommendations for speech quality testing. The

    information is obtained from http://www.polqa.info/index.html.

    Figure 2.1 Evolution of ITU-T recommendations for speech quality testing


    3 PESQ Algorithm

    3.1 Introduction

    The PESQ algorithm became an ITU-T recommendation (P.862) for objective speech
    quality testing in February 2001. Owing to its innovative features and correlation
    accuracy, PESQ quickly became the mainstream speech quality testing method. It
    applies to end-to-end speech quality testing: it accounts for various factors that
    affect perceived quality (such as codec distortion, errors, packet loss, variable
    delay, jitter, and filtering) and provides a fully quantifiable measure of speech
    quality.

    Figure 3.1 PESQ reference signal and degraded signal

    3.2 PESQ Algorithm Structure

    Figure 3.2 shows the entire PESQ algorithm structure. The model begins by level

    aligning both signals to a standard listening level. They are filtered (using an FFT)

    with an input filter to model a standard telephone handset. The signals are aligned in

    time and then processed through an auditory transform similar to that of PSQM. The

    transformation also involves equalizing for linear filtering in the system and for gain

    variation. Two distortion parameters are extracted from the disturbance (the difference

    between the transforms of the signals), and are aggregated in frequency and time and

    mapped to a prediction of subjective MOS. Generally, the greater the difference

    between the degraded signal and the reference signal, the lower the speech quality

    score.
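    The first step above, level alignment, can be illustrated with a minimal sketch
    that scales both signals to the same RMS level. This is not the P.862 procedure
    itself: the target level of 0.1 and the sample values are assumptions chosen for
    illustration, and the standard defines its own filtering and constants.

```python
import math


def rms(signal):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def level_align(signal, target_rms=0.1):
    """Scale samples so their RMS equals target_rms (hypothetical target)."""
    current = rms(signal)
    if current == 0:
        return list(signal)  # silence cannot be scaled to a target level
    gain = target_rms / current
    return [x * gain for x in signal]


# Hypothetical reference and an attenuated copy standing in for the degraded signal.
reference = [0.5, -0.4, 0.3, -0.2, 0.1]
degraded = [0.25 * x for x in reference]

aligned_ref = level_align(reference)
aligned_deg = level_align(degraded)
print(round(rms(aligned_ref), 6), round(rms(aligned_deg), 6))
```

    After alignment both signals sit at the same level, so a pure gain difference
    between them no longer counts as a disturbance in later stages.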


    Figure 3.2 PESQ algorithm structure

    (Block diagram. Blocks: reference signal, system under test, degraded signal,
    level align, input filter, time align and equalise, auditory transform,
    disturbance processing, cognitive modelling, identify bad intervals, re-align bad
    intervals, prediction of perceived speech quality.)

    PSQM and measuring normalizing blocks (MNB) were only recommended for use in

    narrowband codec assessment and were known to produce inaccurate predictions

    with certain types of codec, background noise, and end-to-end effects such as filtering

    and variable delay. The scope of PESQ is therefore very much wider. In addition,

    PESQ provides significantly higher correlation with subjective opinion than the
    P.861 models, PSQM and MNB. Results indicate that it gives accurate

    predictions of subjective quality in a very wide range of conditions, including those with

    background noise, analogue filtering, and/or variable delay.

    Table 3 Correlation of different speech quality testing methods

    Type (Corr. Coeff.)        PESQ   PAMS   PSQM   PSQM+  MNB
    Mobile Network average     0.962  0.954  0.924  0.935  0.884
    Mobile Network worst-case  0.905  0.895  0.843  0.859  0.731

    According to related ITU-T information, the PESQ algorithm provides very accurate
    predictions and is applicable to all network technologies known at that time
    (such as GSM, CDMA, and 3G).

    Among the methods compared in Table 3, PESQ is the most sophisticated and accurate
    speech quality testing method, and its test results conform most closely to users'
    subjective perception.

    There are three kinds of PESQ speech quality testing values:

    PESQ MOS: mean opinion score of perceptual evaluation of speech quality (value range: 1.0 to 5.0, where 1.0 represents the lowest quality)

    PESQ SCORE: score of perceptual evaluation of speech quality (value range: -0.5 to 4.5, where -0.5 represents the lowest quality)

    PESQ LQ: listening quality of perceptual evaluation of speech quality (value range: 1.0 to 4.5, where 1.0 represents the lowest quality)

    The differences between the three values are as follows:

    The PESQ SCORE is calculated directly by the algorithm, whereas the PESQ MOS is a
    subjective mean opinion score. When speech quality is poor, the PESQ SCORE is
    always higher than the PESQ MOS, which is unreasonable. PESQ LQ was therefore
    introduced by the ITU; its value is closer to the subjective value. In other words,
    PESQ SCORE is the ideal value calculated by the algorithm, and PESQ LQ is a more
    reasonable value obtained through adjustment.
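    One published example of such an adjustment is the mapping function in ITU-T
    P.862.1, which converts a raw P.862 score to MOS-LQO. The sketch below implements
    that function. Whether the PESQ LQ value reported by the tools described in this
    document uses exactly this mapping is an assumption, and the function name is
    illustrative.

```python
import math


def pesq_raw_to_mos_lqo(raw_score):
    """Map a raw P.862 score (-0.5 to 4.5) to MOS-LQO per ITU-T P.862.1."""
    return 0.999 + (4.999 - 0.999) / (1.0 + math.exp(-1.4945 * raw_score + 4.6607))


# The mapping compresses the raw range into roughly 1.02 to 4.55.
for raw in (-0.5, 1.0, 2.5, 3.5, 4.5):
    print(f"raw {raw:+.1f} -> MOS-LQO {pesq_raw_to_mos_lqo(raw):.2f}")
```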

    Based on simulation and actual tests, Figure 3.3 shows ideal PESQ values under
    various network conditions and codecs. These results assume transmission without
    errors or packet loss. In real networks, test results may approach these values,
    depending on the test environment.

    Figure 3.3 Typical PESQ score under various network conditions


    4 POLQA Algorithm

    4.1 Introduction

    The POLQA standard was developed from 2006 to 2010 by leading experts in a
    competition carried out by ITU-T Study Group 12, in order to define a technology
    update for the P.862 PESQ standard. POLQA, which stands for Perceptual Objective
    Listening Quality Analysis, offers a new level of benchmarking capability for
    determining the speech quality of mobile network services. The POLQA perceptual
    measurement algorithm is a joint development by OPTICOM, SwissQual and TNO. POLQA
    was selected to form the new ITU-T speech quality testing standard, P.863, and is
    intended for use with HD Voice, 3G and 4G/LTE.

    4.2 Evolution of POLQA Based on PESQ

    PESQ itself covers a very wide range of applications, such as fixed and wireless
    network testing, POTS (Plain Old Telephone Service), VoIP, and 3G. Compared with
    PESQ, POLQA makes a variety of improvements to suit scenarios for which PESQ is
    not applicable.

    The major improvements of POLQA are as follows:

    Applicable to new speech codecs (such as AAC and MP3) in 3G/LTE/4G network technologies

    Applicable to speech enhancement systems (such as VQE and VED) that use non-linear processing to increase intelligibility

    Applicable to speech codecs (such as SBR) that change or extend the voice bandwidth

    Supports testing with very high background noise

    Correctly models effects caused by variable sound presentation levels

    Provides NB (narrowband) and SWB (super-wideband) modes

    Handles time-scaling and time-warping as seen in VoIP and 3G

    Applies correct weighting of reverberation and of linear and non-linear filtering

    Supports direct comparison between AMR-encoded (GSM/WCDMA) and EVRC-encoded (CDMA) speech

    Applying POLQA to today's complex, unified networks will give a significant boost
    in accuracy and reliability compared to current standards. Due to its ability to
    handle time-scaling effects, POLQA can be used in virtually any scenario today,
    from video telephony to lab testing, and from codec or network optimization to the
    development and maintenance of Unified Collaboration and Communication services.

    As with all methods of the ITU-T P.86x series, POLQA compares a known speech signal

    to the degraded voice signal by simulating human hearing. Unlike methods estimating

    quality based on network parameters or packet data analysis, this guarantees utmost

    accuracy and applicability independent of the underlying network technology and

    therefore provides measurement of true end-to-end quality.

    POLQA is thus not only a full replacement for PESQ, but it is also a significant

    enhancement to the applicability of end-to-end speech quality testing. Since it is not

    considerably more complex than PESQ, POLQA is a logical upgrade path for all

    customers using PESQ today.

    4.3 Super-Wideband and Narrowband in POLQA

    The telecom industry is now beginning the evolution from narrowband telephony to
    wideband speech transmission, and the codecs for wideband are ready. Current voice
    codec development targets so-called super-wideband (up to 14,000 Hz) or even
    full-band (up to approximately 24,000 Hz) audio. For human speech, however, the
    perceived difference between super-wideband and full-band is negligible.

    In speech quality testing, users face a corresponding bandwidth problem. In

    traditional telephony scenarios, the expectation is set to a perfect narrow-band voice

    signal. A signal that is close or identical to such a signal is scored subjectively by

    human listeners with a high quality value (usually a MOS-LQ of around 4.5 on a

    five-point scale). Within a super-wideband scenario the situation is different. The

    expectation of excellent quality is a perfect super-wideband speech signal. Since the

    same five-point scale is used, such a perfect super-wideband signal is also

    subjectively scored close to excellent in the range of 4.5. Obviously, a narrow-band

    signal in that super-wideband context will not fulfil the expectation of high quality due to

    its band limitation. Consequently, it will be scored lower in this context.

    Since the range of the scores is the same but the meaning is different depending on

    the context, the two are named as different scales: narrow-band or super-wideband.

    Broadly the main difference is that narrow-band signals will be scored lower in a

    super-wideband context than in narrow-band experiments, since the band-limitation is

    scored as degradation. Hence, scores given on the two different scales must not be


    mixed or directly compared.

    To cover both application areas (narrowband telephony and super-wideband
    communication), POLQA supports two operational modes. POLQA is applied in exactly
    the same way in both cases; switching between the two modes only requires a
    control flag, and POLQA makes all required adjustments internally. Consequently,
    in narrowband mode POLQA scores on a five-point narrowband scale, and in
    super-wideband mode on a five-point super-wideband scale. The MOS-LQO value ranges
    of the two modes differ: 1 to 4.75 for super-wideband mode and 1 to 4.5 for
    narrowband mode.
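    The rule that scores on the two scales must not be mixed can be enforced in
    analysis code by tagging every score with its mode. The sketch below is one
    possible way to do this; the class and function names are hypothetical and are
    not part of any POLQA software. The range limits are the ones stated above.

```python
from dataclasses import dataclass

# Upper MOS-LQO limits per mode, as stated in the text above.
MAX_SCORE = {"NB": 4.5, "SWB": 4.75}


@dataclass(frozen=True)
class PolqaScore:
    value: float
    mode: str  # "NB" (narrowband) or "SWB" (super-wideband)

    def __post_init__(self):
        if self.mode not in MAX_SCORE:
            raise ValueError(f"unknown mode: {self.mode}")
        if not 1.0 <= self.value <= MAX_SCORE[self.mode]:
            raise ValueError(f"{self.value} is outside the {self.mode} range")


def score_difference(a, b):
    """Difference of two scores; refuses to compare across scales."""
    if a.mode != b.mode:
        raise ValueError("scores on different scales must not be compared")
    return a.value - b.value


print(score_difference(PolqaScore(4.0, "NB"), PolqaScore(3.5, "NB")))
```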

    4.4 POLQA Perceptual Model and Structure

    POLQA uses an advanced psycho-acoustic model to emulate human perception and to
    transform the sound into an internal neuronal representation. As a full-reference
    approach, POLQA compares the input (the high-quality reference signal) with the
    associated degraded signal under test. This process is shown in Figure 4.1. POLQA
    takes into account the masking effects of human hearing and idealizes both input
    signals in multiple steps. This ensures that only the relevant speech information
    is used for comparison and that any unwanted signal components are discarded.

    Figure 4.1 shows the POLQA algorithm structure. The model first time-aligns the
    reference and degraded signals, estimating the delay and the sample rate
    difference between them. Once the correct delay has been determined and the sample
    rate difference compensated, the signals and the delay information are passed to
    the core model, which calculates the perceptibility and the annoyance of the
    distortions and maps them to a MOS scale.
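    To illustrate the idea of delay estimation, the sketch below finds a single
    constant delay between two signals by brute-force cross-correlation. It is a
    simplified stand-in and not the POLQA alignment itself, which also handles
    variable delay and sample rate differences. The function name and the sample
    values are hypothetical.

```python
def estimate_delay(reference, degraded, max_lag):
    """Return the lag (in samples) at which degraded best matches reference."""
    best_lag, best_corr = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        corr = 0.0
        for i, r in enumerate(reference):
            j = i + lag
            if 0 <= j < len(degraded):
                corr += r * degraded[j]
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag


# Hypothetical test signal: the degraded copy is attenuated and delayed by 3 samples.
reference = [0.0, 0.0, 1.0, -1.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0]
degraded = [0.0] * 3 + [0.8 * x for x in reference[:-3]]
print(estimate_delay(reference, degraded, max_lag=5))
```

    A positive result means the degraded signal lags the reference; shifting it back
    by that many samples lines the two signals up before any perceptual comparison.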


    Figure 4.1 POLQA algorithm structure

    4.5 Ideal POLQA MOS Values in Different Network Environment

    Figure 4.2 shows typical values to be expected from POLQA. These were confirmed by
    subjective auditory experiments. In a real network, MOS values may approach these
    values, depending on environmental factors.

    Figure 4.2 Typical values to be expected from POLQA


    5 Dingli MOS Testing Solution

    5.1 Product History

    Domestic partner of the PESQ patent owner and the earliest manufacturer to
    purchase the PESQ patent

    Started the R&D of speech quality testing in 2002

    Developed PESQ speech quality testing in Drive Test in June 2003

    Developed PESQ speech quality testing in Pilot Pioneer based on multi-test

    mobile phones and commercial mobile phones in 2004

    Released Pilot Pioneer test tool with POLQA support at the end of 2012

    As an integrated test platform, Pilot Pioneer can be upgraded to the version with MOS

    test functions just by adding a separate audio MOS box.

    Note: In the remainder of this document, MOS value refers to the speech quality testing

    score based on PESQ or POLQA unless stated otherwise, and MOS box refers to the

    independent hardware system used by Dingli to test speech quality.

    5.2 Added Value of Dingli MOS Testing System

    In addition to providing the latest speech quality test tool, Dingli also focuses on

    speech quality optimization in practice. Dingli's solution analyzes and explores the

    exact impact of wireless environment factors on the MOS values to provide the most

    accurate and credible theoretical and practical reference information for network

    optimization. The main research content includes:

    Impact of different voice coding rates on the MOS value

    Impact of handover frequency and handover failure on the MOS value

    Impact of DTX (Discontinuous Transmission) functions on the MOS value

    Impact of C/I (Carrier/Interference) on the MOS value within the same coverage

    Impact of signal strength on the MOS value when the C/I is good

    Impact of transmission quality on the MOS value

    Impact of different network technologies on the MOS value

    MOS testing, optimization methods and flow

    The Dingli MOS test system is a comprehensive solution that includes both automatic
    and traditional drive test systems and has been widely used by various operators.
    It helps operators test, locate, and solve speech quality issues, and it has
    accumulated extensive MOS testing experience that helps to optimize the MOS test
    system.

    5.3 Dingli MOS Test System

    Dingli MOS box is an accessory specially designed for the wireless network speech

    quality testing. Users may use Pilot Pioneer and MOS box with different test terminals

    to complete speech quality testing for various networks in various scenarios. In
    practice, the MOS box has a variety of technical advantages. See Table 4.

    Table 4 Technical advantages of Dingli MOS box

    Item                 Description
    Compact              45 (L) x 13 (H) x 38 (W) cm
                         Standard weight: 5 kg
                         Power: built-in battery or external power supply
                         Easy to carry
    Protective Material  Aluminum alloy casing, resistant to compression and shock
                         Mobile phone slots made of plastic and protective film
                         Protects test terminals from damage
    Terminal Distance    The distance between test terminals is at least 15 cm, which
                         avoids mutual interference between the terminals and ensures
                         test accuracy.
    Simple Panel         The test terminals' data and headphone cables are located in
                         the lower section of the MOS box; only the connectors are
                         visible in the upper layer. This keeps the MOS box tidy and
                         reduces the impact of twisted or broken cables during testing.
    Easy Operation       All test terminals connected to the MOS box are connected to a
                         computer via three cables, which simplifies the connection.
    Humanized Design     Test terminals can be elevated up to 75° in a fixed slot,
                         which can improve the received signal strength and help
                         dissipate heat from the mobile phones.

    Table 5 describes the technical specifications of Dingli MOS box.

    Table 5 Technical specifications of Dingli MOS box

    Item                   Description
    Physical features      Input voltage: 10 V to 18 V
                           Power: 2.1 W
                           Working temperature: 0 to 55 °C
                           Storage temperature: -20 to 85 °C
                           S/N ratio (signal-to-noise ratio): 42 dB
                           Record or playback distortion: 3%
                           Output power: 50 mW (headphone drive)
                           Record or playback echo rejection ratio: 40 dB
    System features        Supports mobile-to-mobile and mobile-to-land tests
                           Supports a maximum of eight test terminals simultaneously
                           Provides standard 8S, 12S and 4S male speech sample files
                           (bit rate: 128 K)
                           Displays the current audio file waveform and MOS value,
                           saves audio files in real time, and saves either the entire
                           audio file or the audio file selected by settings
                           Provides a high-performance recording and playback device to
                           avoid noise
                           Supports device abnormality alarms (audio or notification)
                           Shows the in-depth analysis parameters provided by the MOS
                           algorithm
                           Supports multi-network comparison (such as GSM / CDMA /
                           WCDMA / TD-SCDMA / LTE CSFB)
                           Supports various types of terminals (such as SAMSUNG S3,
                           SAMSUNG Note I, ZTE V8860+, Huawei 8800 series)
    Background processing  Voice call report (Coverage, Connection Rate, Call-drop
                           Rate, Call Setup Time)
                           Voice call details (outgoing and incoming call details)
                           Diagnostic event details (such as details on dropped calls,
                           lost calls, and low MOS values)
                           Parameter list details (sample rates of various parameters)
                           Display of MOS values on a map
                           Graphical display of various parameters

    5.4 MOS Testing Solutions

    MOS testing comprises three test solutions, based on Pilot Pioneer, Pilot RCU, and
    Pilot Walktour respectively, and one analysis solution based on Pilot Navigator.
    All three test solutions support both the PESQ and POLQA algorithms; users only
    need to select PESQ or POLQA during configuration.

    5.4.1 MOS Testing Solution Based on Pilot Pioneer

    1. Mobile-to-Mobile

    This mode supports speech quality testing and benchmarking for operators, with a
    maximum of four networks tested simultaneously. The test terminals can be combined
    freely across network technologies (such as 2G1C and 3W1C).

    Users may initiate a call from one mobile phone to another. After the call
    connects, the calling party plays a speech sample, which is transmitted to the
    called party through the base station. The called party records the speech and
    compares it with the standard speech sample to obtain the uplink value of the
    calling party (which is also the downlink value of the called party). Then the
    called party plays the speech sample, which is transmitted to the calling party
    through the base station. The calling party records the speech and compares it
    with the standard speech sample to obtain the uplink value of the called party
    (which is also the downlink value of the calling party). Users may alternate the
    test terminals and run the test in a continuous loop.

    Note: The description above applies to PESQ. For POLQA, Pilot Pioneer currently
    supports the POLQA score only for the calling party.


    Figure 5.1 Pioneer+MOS (2W1C)

    2. Mobile-to-Land

    Users may conduct mobile-to-land MOS test based on network type, or customize

    the solution by defining the terminal type and quantity according to network type.

    Currently, this test mode supports a maximum of four networks simultaneously.

    Note: When the POLQA algorithm is used, Pilot Pioneer currently supports only 8K
    speech sampling and the downlink POLQA score.

    Figure 5.2 Mobile-to-Land

    5.4.2 MOS Analysis Solution Based on Pilot Navigator

    Pilot Navigator is a professional post processing tool developed by Dingli

    Communications. It provides multiple professional analysis reports which include

    speech quality test results.

    1. Speech Quality Report

    In the Speech Quality Report, the PESQ or POLQA Mean Value and Middle Value

    are displayed. Figure 5.3 is an example.


    Figure 5.3 Speech Quality Report provided by Pilot Navigator

    Item                          Speech Quality     Speech Quality   Speech Quality
                                  (General) Result   (MOC) Result     (MTC) Result
    PESQ MOS Mean Value           3.63               3.63             3.63
    PESQ MOS Middle Value         3.71               3.71             3.71
    POLQA ScoreNB Mean Value      0.00               0.00             0.00
    POLQA ScoreNB Middle Value
    POLQA ScoreSWB Mean Value     0.00               0.00             0.00
    POLQA ScoreSWB Middle Value
    PESQ Score Average Value      3.58               3.58             3.58
    PESQ Score Medium Value       3.62               3.62             3.62
    PESQ LQ Average Value         3.58               3.58             3.58
    PESQ LQ Medium Value          3.67               3.67             3.67
    BLER
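
    The Mean Value and Middle Value rows of such a report can be reproduced from the
    per-sample scores of a test. The sketch below is only an illustration: it assumes
    that "Middle Value" denotes the statistical median, and the function name and the
    sample values are hypothetical, not taken from Pilot Navigator.

```python
from statistics import mean, median


def summarize_mos(samples):
    """Return (mean, median) of a list of MOS samples, rounded to 2 decimals.

    Assumption: the report's "Middle Value" is the median of the samples.
    """
    if not samples:
        raise ValueError("at least one MOS sample is required")
    return round(mean(samples), 2), round(median(samples), 2)


# Hypothetical per-sample PESQ MOS values from one test run.
example_samples = [3.2, 3.9, 3.7, 3.5, 3.8]
mean_value, middle_value = summarize_mos(example_samples)
print(f"Mean Value: {mean_value}, Middle Value: {middle_value}")
```

    A middle value above the mean, as in Figure 5.3, suggests that a few low-scoring
    samples pull the average down.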

    multiple channels for speech quality testing, multiple RCU MOS test modules can

    simultaneously call the server number. In this way, multi-channel MOS values are

    obtained. Since the called party is a fixed-line phone number, the transmission loss is

    extremely small. As a result, the MOS values obtained on the server side and the

    RCU side can be considered as pure uplink and downlink MOS values.

    Note: For POLQA, this mode supports only the downlink score and a maximum

    of three dual-core RCU test modules simultaneously.

    Figure 5.5 Calling from an RCU MOS Test Module to server

    2. From an RCU MOS Test Module to an RCU MOS Test Module

    Figure 5.6 shows the general structure of the speech quality testing from an RCU

    MOS test module to an RCU MOS test module. Users may initiate calls from one

    MOS test module to another to perform MOS comparison testing. The speech can be

    replayed at one end and recorded at the other end, and then the MOS value can be

    calculated through the MOS algorithm. In addition, users may initiate calls from one

    test module to another in the same RCU. This testing method is relatively simple

    because no speech quality testing server needs to be installed at the server side.

    It is therefore suitable for speech quality testing in remote communication.

    Figure 5.6 Calling from an RCU MOS Test Module to an RCU MOS Test Module

    The testing process is almost the same as that of calling from an RCU test module

    to a server. The difference is that the calls are made between two RCU test modules

    or between two test modules in the same RCU. This solution also supports

    multi-channel MOS testing.

    Note: When the POLQA algorithm is used, this mode supports a maximum of two

    dual-core RCU test modules dialing each other.

    5.4.4 MOS Testing Solution Based on Pilot Walktour

    1. Mobile to Mobile

    As shown in Figure 5.7, Dingli Pilot Walktour supports calling from a mobile phone to

    a mobile phone to perform MOS testing. By using the mobile phone integrated

    software kernel and MOS algorithm, users may use one mobile phone to replay the

    speech sample and the other to record the voice, and perform speech quality testing.

    Note: In this mode, the iOS Walktour POLQA score cannot be viewed on the mobile

    phone; it is available only after analysis with Pilot Navigator. Android Walktour

    supports the POLQA score only on the mobile phone of the calling party.

    Figure 5.7 MOS testing based on Pilot Walktour

    2. Mobile-to-Land

    Users may conduct mobile-to-land MOS test based on network type, or customize

    the solution by defining the terminal type and quantity according to network type.

    Note: In this mode, the iOS Walktour POLQA score cannot be viewed on the mobile

    phone; it is available only after analysis with Pilot Navigator. Android Walktour

    supports the POLQA score only on the mobile phone of the calling party.

    5.5 Details About Dingli MOS Testing Solution

    5.5.1 Impact of Network Factors on PESQ MOS Value

    1. Impact of Voice Coding Mode on MOS Value

    In GSM network technology, there are multiple voice coding modes such as

    Enhanced Full Rate (EFR), Full Rate (FR), and Half-Rate (HR). Based on the coding

    principle, it is possible to obtain:

    MOS value in the EFR mode > MOS value in the FR mode > MOS value in the HR

    mode

    The following is a group of speech quality testing results (from a mobile phone to a

    fixed-line phone) without environmental interference.

    Uplink PESQ MOS value: value in the EFR mode (max 4.20) > value in the

    FR mode (max 3.94) > value in the HR mode (max 3.69)

    Downlink PESQ MOS value: value in the EFR mode (max 4.255) > value in

    the FR mode (max 3.940) > value in the HR mode (max 3.728)

    In commercial networks, because of network interference and other factors, the test

    results obtained with different coding rates differ considerably from the results

    obtained in an ideal environment. Table 6 shows the average values from an actual

    speech quality test in an urban environment.

    Table 6 Impact of HR in GSM Network on PESQ result

    PESQ Result               HR Percentage = 0%   HR Percentage = 40%   HR Percentage = 100%
    Uplink PESQ MOS value     3.588                3.428                 3.331
    Downlink PESQ MOS value   3.418                3.325                 3.259
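
    To make the effect in Table 6 easier to read, the MOS loss relative to the 0% HR
    case can be computed directly from the table values. This is a small illustrative
    sketch; the dictionary layout and the function name mos_drop are assumptions made
    for the example.

```python
# Values copied from Table 6: HR percentage -> average PESQ MOS value.
TABLE_6 = {
    "uplink": {0: 3.588, 40: 3.428, 100: 3.331},
    "downlink": {0: 3.418, 40: 3.325, 100: 3.259},
}


def mos_drop(direction, hr_percentage):
    """MOS loss relative to the HR = 0% case, rounded to 3 decimals."""
    row = TABLE_6[direction]
    return round(row[0] - row[hr_percentage], 3)


for direction in TABLE_6:
    for hr in (40, 100):
        print(f"{direction}, HR = {hr}%: MOS drop {mos_drop(direction, hr):.3f}")
```

    In this data set the uplink loses more than the downlink at both HR percentages.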

    2. Impact of Link and Routing on MOS Value

    In speech quality testing, different links and routings are also important factors that

    affect the PESQ MOS value. For example, if the called party is from a different

    network (e.g. CDMA to GSM) instead of the same Radio Access Network, the result

    may differ.

    In an actual network test, the measured MOS value was found to be lower than usual.

    After equipment and wireless causes were ruled out, the problem was traced to

    long-distance switching and routing. In this scenario, the PESQ speech quality test

    used two routes. Each call was then locked to a separate route to determine which

    one reported the lower MOS value. One of the routes was in fact introducing bit

    errors, which led to the lower MOS value. The reported MOS value returned to

    normal after the issue was resolved.

    3. Impact of Abis Compression on MOS Values

    In January 2005, Dingli conducted a test on a live network in Beijing to measure the

    impact of Abis compression (bandwidth savings at the Abis level) on the MOS value.

    The test was divided into three parts: a test with no compression, a test with 5-slot

    compression, and a test with 6-slot compression. See Table 7.

    Table 7 Impact of Abis compression on MOS value

    Item                      Average Downlink Value   Average Uplink Value
    No compression            3.964                    3.938
    5 Time-slot compression   3.424                    3.464
    6 Time-slot compression   3.364                    3.367

    Conclusion: The greater the compression, the lower the MOS value.
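
    The conclusion can be checked against the Table 7 values. The sketch below verifies
    that both directions decrease from row to row and expresses the loss as a percentage
    of the uncompressed value. The helper names are hypothetical, and the row order is
    assumed to run from least to most compression.

```python
# Rows copied from Table 7: (item, average downlink value, average uplink value).
TABLE_7 = [
    ("No compression", 3.964, 3.938),
    ("5 time-slot compression", 3.424, 3.464),
    ("6 time-slot compression", 3.364, 3.367),
]


def is_strictly_decreasing(values):
    """True if every value is lower than the one before it."""
    return all(a > b for a, b in zip(values, values[1:]))


def percent_loss(baseline, value):
    """Loss relative to the baseline, in percent, rounded to 1 decimal."""
    return round(100.0 * (baseline - value) / baseline, 1)


downlink = [row[1] for row in TABLE_7]
uplink = [row[2] for row in TABLE_7]

print("Downlink decreasing:", is_strictly_decreasing(downlink))
print("Uplink decreasing:", is_strictly_decreasing(uplink))
for item, dl, ul in TABLE_7[1:]:
    print(item, percent_loss(downlink[0], dl), percent_loss(uplink[0], ul))
```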

    4. Impact of Network Quality on MOS Value

    There is a certain relationship between the network quality (RxQual Sub for GSM,

    FFER for CDMA) and PESQ MOS value.

    When the GSM RxQual Sub value range is [0, 2], the PESQ MOS value

    range is [3, 4.1].

    When the CDMA FFER value range is [0, 3%], the PESQ MOS value range is

    [3, 4.1].

    Generally, if the RxQual Sub/FFER value is high, the MOS value is low. However, if

    the MOS value is low, the RxQual Sub/FFER value is not necessarily high (the MOS

    value may be affected by other factors).

    A good network environment is the basis of good wireless communication. Low C/I

    and C/A (Carrier/Adjacent) values result in a high BER, which degrades the call

    quality or triggers dropped calls. High BER caused by network frequency

    interference has been a major concern in network optimization. From the user's

    perceptual point of view, a transient rise in BER does not affect the listening

    experience, while a sustained high BER causes frame loss and seriously affects

    listening.

    The following conclusions were drawn from practical tests:

    In EFR mode, when the downlink RXQUAL value is greater than 4.8, the

    downlink PESQ MOS value starts to decrease; when the downlink RXQUAL value

    is greater than 5.4, the downlink PESQ MOS value is lower than 3.3; when the

    downlink RXQUAL value is greater than 6, the downlink PESQ MOS value

    drops sharply to an intolerable level.

    In FR mode, when the downlink RXQUAL value is greater than 5.1, the

    downlink PESQ MOS value starts to decrease; when the downlink RXQUAL value

    is greater than 5.6, the downlink PESQ MOS value is lower than 3.3; when the

    downlink RXQUAL value is greater than 6, the downlink PESQ MOS value

    drops sharply to an intolerable level.

    In HR mode, when the downlink RXQUAL value is greater than 4.8, the

    downlink PESQ MOS value starts to decrease; when the downlink RXQUAL value

    is greater than 5.2, the downlink PESQ MOS value is lower than 3.3; when the

    downlink RXQUAL value is greater than 6, the downlink PESQ MOS value

    drops sharply to an intolerable level.
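
    The per-codec thresholds above can be collected into a small lookup table, for
    example to flag measurement samples during post-processing. This is a minimal
    sketch, not part of any Dingli tool: the function name and the category labels are
    hypothetical, and the input is assumed to be an averaged downlink RXQUAL value.

```python
# Thresholds from the text above, per codec:
# (MOS starts to decrease, MOS falls below 3.3, MOS becomes intolerable).
RXQUAL_THRESHOLDS = {
    "EFR": (4.8, 5.4, 6.0),
    "FR": (5.1, 5.6, 6.0),
    "HR": (4.8, 5.2, 6.0),
}


def classify_downlink(codec, rxqual):
    """Map an averaged downlink RXQUAL value to a hypothetical quality label."""
    start, below_3_3, intolerable = RXQUAL_THRESHOLDS[codec]
    if rxqual > intolerable:
        return "intolerable"
    if rxqual > below_3_3:
        return "below 3.3"
    if rxqual > start:
        return "degrading"
    return "normal"


print(classify_downlink("EFR", 5.0))  # between 4.8 and 5.4 for EFR
print(classify_downlink("FR", 5.0))   # not above the FR threshold of 5.1
```

    The same RXQUAL value can therefore fall into different categories depending on
    the codec in use.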

    5. Impact of Network Handover on MOS Value

    Handover is the most basic and important feature in GSM network technology. During

    the handover process, TCH (traffic channel) frames are used as FACCH (Fast

    Associated Control Channel) to transfer the relevant handover signaling. This

    temporary interruption maintains the network connection and completes the

    handover to a more suitable cell. However, it comes at the expense of voice

    continuity and affects the speech quality to a certain extent. When frequent

    ping-pong handovers occur, the successive frames used as FACCH cause voice

    interruptions similar to those caused by frame loss.

    The following conclusions were drawn from practical tests:

    Handover has a serious impact on the PESQ MOS value. In addition, as the

    handover frequency increases, the speech quality may drop to an intolerable

    degree, resulting in a low MOS value.

    In EFR mode, if the handover occurs once every six seconds in transferring

    uplink and downlink voice, the average uplink and downlink PESQ MOS

    value is 1 lower than the maximum value. If the handover occurs twice every

    six seconds in transferring uplink and downlink voice, the average uplink and

    downlink PESQ MOS value is 2 lower than the maximum value.

    In FR mode, if the handover occurs once every six seconds in transferring

    uplink and downlink voice, the average uplink and downlink PESQ MOS

    value is 1 lower than the maximum value. If the handover occurs twice every

    six seconds in transferring uplink and downlink voice, the average uplink and

    downlink PESQ MOS value is 1.5 lower than the maximum value.

    In EFR / FR / HR mode, when handover failures occur (e.g. in a pattern similar

    to ping-pong handover), the PESQ MOS value drops further as the handover

    count increases (the PESQ MOS value may drop close to 1).
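
    The handover figures above can be turned into a rough estimate of the average MOS
    value. The sketch below is illustrative only. It assumes that "maximum value" means
    the codec maximum measured without interference (for example, the uplink maxima of
    4.20 for EFR and 3.94 for FR quoted earlier in this section), and it clamps the
    result at 1 because the text notes that the value may drop close to 1. The function
    name is hypothetical.

```python
# Average MOS reduction relative to the maximum value, keyed by
# (codec, handovers per six seconds), as stated in the text above.
HANDOVER_PENALTY = {
    ("EFR", 1): 1.0,
    ("EFR", 2): 2.0,
    ("FR", 1): 1.0,
    ("FR", 2): 1.5,
}


def expected_average_mos(codec, handovers_per_6s, max_mos):
    """Estimate the average PESQ MOS value under periodic handover.

    Assumption: the result is clamped at 1.0; only the cases listed in
    HANDOVER_PENALTY (and zero handovers) are covered.
    """
    if handovers_per_6s == 0:
        return max_mos
    penalty = HANDOVER_PENALTY[(codec, handovers_per_6s)]
    return round(max(1.0, max_mos - penalty), 3)


print(expected_average_mos("EFR", 2, 4.20))
print(expected_average_mos("FR", 2, 3.94))
```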

    6. Impact of Signal Strength on MOS Value

    When the signal strength changes but BER / FER remains 0, the RxQual Sub value

    also remains 0. In this case, the speech quality is theoretically not affected.

    The following conclusions were drawn from practical tests:

    When the C/I value is not changed, signal strength fluctuation does not affect the

    PESQ MOS value; when the signal strength fluctuation affects C/I value, the PESQ

    MOS value will also be affected.

    5.5.2 Parameters in POLQA Testing Result

    Theoretically, the parameters affecting PESQ MOS values will have an impact on the

    POLQA test results. POLQA research is currently ongoing. This section describes the

    detailed information about the POLQA test interface and parameters in Pilot Pioneer.

    Figure 5.8 shows the POLQA test interface in Pilot Pioneer.

    Figure 5.8 POLQA test interface in Pilot Pioneer

    The reference wave and the degraded wave are displayed in the upper part of the

    interface, and the POLQA test results are displayed in the lower part of the interface,

    including Library Version, Processing Mode (NB/SWB), Mean Delay, Minimum

    Delay, Maximum Delay, and so on. For detailed information, see Table 8.

    Table 8 POLQA parameters in Pilot Pioneer

    Parameter                      Description
    Library Version                POLQA OEM lib version.
    P863 Version                   ITU P.863 reference version code.
    Processing Mode                Mode, either Narrowband (NB) or Super Wideband (SWB).
    Sample Rate (Hz)               Sample rate of the reference speech signal in Hz.
                                   Typical sample rates of NB and SWB are 8 kHz and
                                   48 kHz respectively.
    POLQA Score                    POLQA score according to P.863, 1 to 4.5 for NB mode,
                                   1 to 4.75 for SWB mode.
    Mean Delay (ms)                The average delay between the reference and the test
                                   signal in ms.
    Minimum Delay (ms)             The minimum delay between the reference and the test
                                   signal in ms.
    Maximum Delay (ms)             The maximum delay between the reference and the test
                                   signal in ms.
    Attenuation (dB)               The attenuation between the two input signals in dB.
    R value (E Model)              The POLQA score mapped as a G.107 (E-Model) Ie value.
    Estimated Sample Rate (Hz)     The sample rate of the degraded signal as measured by
                                   POLQA.
    Resampling Applied             If the sample rates of the reference and the degraded
                                   signal differ by more than 0.5%, POLQA will down sample
                                   the signal with the higher sample rate. If this happens,
                                   Resampling Applied will be set to 1.
    Direction                      Uplink or Downlink. Downlink means the device is
                                   playing the reference signal.
    Level Reference (dBov)         The level of the reference signal in dBov (averaged
                                   over the entire signal).
    Level Degraded (dBov)          The level of the degraded signal in dBov (averaged
                                   over the entire signal).
    ASL Reference (dBov)           The active speech level of the reference signal in
                                   dBov, measured similar to P.56.
    ASL Degraded (dBov)            The active speech level of the degraded signal in
                                   dBov, measured similar to P.56.
    Pause Level Reference (dBov)   The silence level of the reference signal in dBov,
                                   measured similar to P.56.
    Pause Level Degraded (dBov)    The silence level of the degraded signal in dBov,
                                   measured similar to P.56.
    SNR Reference (dB)             The SNR in dB of the reference signal.
    SNR Degraded (dB)              The SNR in dB of the degraded signal.
    ASR Reference                  The Active Speech Ratio (ASR) of the reference signal.
                                   ASR is the ratio of the length of the active speech
                                   signal parts and the total signal length [0...1].
    ASR Degraded                   The Active Speech Ratio (ASR) of the degraded signal.
                                   ASR is the ratio of the length of the active speech
                                   signal parts and the total signal length [0...1].
    Pitch Reference (Hz)           The average pitch frequency of the reference signal.
    Pitch Degraded (Hz)            The average pitch frequency of the degraded signal.
    Record file                    The recorded degraded signal file, along with its
                                   location.
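
    When exported POLQA results are post-processed, a few of the Table 8 parameters
    lend themselves to simple consistency checks. The sketch below is an assumption-laden
    illustration, not a Pilot Pioneer interface: the class and field names are
    hypothetical, and the 0.5% sample rate deviation is assumed to be measured relative
    to the reference sample rate.

```python
from dataclasses import dataclass

# POLQA score ranges per processing mode, as listed in Table 8.
SCORE_RANGE = {"NB": (1.0, 4.5), "SWB": (1.0, 4.75)}


@dataclass
class PolqaResult:
    """Hypothetical container for a few of the Table 8 parameters."""

    processing_mode: str             # "NB" or "SWB"
    polqa_score: float
    sample_rate_hz: float            # sample rate of the reference signal
    estimated_sample_rate_hz: float  # sample rate measured on the degraded signal

    def score_in_range(self):
        """Check the score against the range for the processing mode."""
        low, high = SCORE_RANGE[self.processing_mode]
        return low <= self.polqa_score <= high

    def resampling_expected(self):
        """True if the sample rates differ by more than 0.5%.

        Assumption: the deviation is relative to the reference sample rate.
        """
        deviation = abs(self.estimated_sample_rate_hz - self.sample_rate_hz)
        return deviation / self.sample_rate_hz > 0.005


result = PolqaResult("NB", 3.8, 8000, 8000.0)
print(result.score_in_range(), result.resampling_expected())
```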

    5.5.3 Reference PESQ-LQ Value: Some practical examples

    Table 9, Table 10, and Table 11 show the mean value and the excellent value ratio

    (the percentage of samples in the excellent value range of 3.0 to 4.5) measured in a

    single test on three live networks in different scenarios.

    Table 9 Testing result of Operator A

    Urban DT Voice                            Highway DT Voice
    PESQ-LQ Mean Value   3.0-4.5 Percentage   PESQ-LQ Mean Value   3.0-4.5 Percentage
    3.27                 88.17%               3.25                 86.00%

    Table 10 Testing result of Operator B

    Urban DT Voice                            Highway DT Voice
    PESQ-LQ Mean Value   3.0-4.5 Percentage   PESQ-LQ Mean Value   3.0-4.5 Percentage
    3.49                 79.69%               3.36                 75.64%

    Table 11 Testing result of Operator C

    Urban DT Voice                            Highway DT Voice
    PESQ-LQ Mean Value   3.0-4.5 Percentage   PESQ-LQ Mean Value   3.0-4.5 Percentage
    3.46                 78.42%               3.43                 78.11%
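
    The two figures reported per scenario in Tables 9 to 11 can be derived from the
    individual PESQ-LQ samples of a drive test. The following sketch shows one way to
    compute them. The sample values are invented for illustration, the function name is
    hypothetical, and treating both bounds of the 3.0 to 4.5 range as inclusive is an
    assumption.

```python
def summarize_pesq_lq(samples, low=3.0, high=4.5):
    """Return (mean value, percentage of samples within [low, high]).

    Assumption: both range bounds are inclusive.
    """
    if not samples:
        raise ValueError("at least one PESQ-LQ sample is required")
    mean_value = sum(samples) / len(samples)
    in_range = sum(1 for s in samples if low <= s <= high)
    return round(mean_value, 2), round(100.0 * in_range / len(samples), 2)


# Hypothetical PESQ-LQ samples from a single drive test.
example_samples = [3.4, 2.8, 3.9, 3.1, 2.5, 3.6, 3.3, 3.0]
mean_value, excellent_ratio = summarize_pesq_lq(example_samples)
print(f"PESQ-LQ Mean Value: {mean_value}, 3.0-4.5 Percentage: {excellent_ratio}%")
```

    As Tables 9 and 10 illustrate, a higher mean value does not imply a higher
    percentage of samples in the excellent range, so both figures are worth reporting.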

    6 Appendix 1: ITU-T P.862 Standard Relevant Information

    1. Factors for which PESQ had demonstrated acceptable accuracy

    2. Factors for which PESQ had demonstrated acceptable accuracy (concluded)

    3. PESQ is known to provide inaccurate predictions when used in conjunction

    with these variables, or is otherwise not intended to be used with these variables

    4. For further study. Factors, technologies and applications for which PESQ has

    not currently been validated

    5. For further study. Factors, technologies and applications for which PESQ has

    not currently been validated (concluded)

    6. ITU-T Temporal structure and duration of source material

    7 References

    [1]. OPTICOM GmbH & SwissQual AG, Perceptual Objective Listening Quality Analysis,

    Technical White Paper.

    [2]. Psytechnics Limited, PESQ An Induction White Paper, September 2001.

    [3]. ITU-T Recommendation P.862, Perceptual Evaluation of Speech Quality (PESQ), an

    Objective Method for End-To-End Speech Quality Assessment of Narrowband

    Telephone Networks and Speech Codecs, February 2001.

    [4]. ITU-T Recommendation P.862.1, Mapping Function for Transforming P.862 Raw

    Result Scores to MOS-LQO, November 2003.

    [5]. ITU-T Recommendation P.863, Perceptual objective listening quality assessment,

    January 2011.

    [6]. ITU-T Recommendation P.862, Table 1/P.862, Table 2/P.862, Table 3/P.862, and

    Temporal structure and duration of source material (8.1.2), February 2001.

    Dingli Communications Corp., Ltd

    Address: No.8, the Fifth Technology Road, Gangwan Avenue, Zhuhai,

    P.R.China, 519085.

    Tel: 0756-3391933 Fax: 0756-3391900

    Technical Hotline: 4008878822

    Email: [email protected]