Tems Voice Service Quality Evaluation Techniques and Polqa

  • 7/28/2019 Tems Voice Service Quality Evaluation Techniques and Polqa

    1/13

    Prepared by: Dr. Irina Cotanis
    Date: 3 November 2010
    Document: NT11-1037

    Ascom (2010) All rights reserved. TEMS is a trademark of Ascom. All other trademarks are the property of their respective holders.

    Voice Service Quality Evaluation Techniques and the New Technology, POLQA

    White Paper

    Contents

    1 Today's Voice Service Challenges
    2 Speech Quality Evaluation Techniques
        2.1 Intrusive Techniques
        2.2 Non-Intrusive Techniques
        2.3 Standardization Status and Evolution Related to the Listening Quality of Voice Service
    3 POLQA Technology
        3.1 POLQA Algorithms Overview
        3.2 Operability Requirements
        3.3 Telecommunication Test and Application Scenarios
        3.4 Understanding POLQA Limitations
        3.5 POLQA Algorithms Performance Evaluation
    4 Beyond the MOS Score
    5 Ascom Network Testing Presence in the Standardization Work on Objective Evaluation Metrics for Listening Speech Quality
    6 Conclusions
    7 References

    1 Today's Voice Service Challenges

    Almost 10 years ago, operators and infrastructure vendors were struggling to provide speech quality on 2G networks at the level expected by users accustomed to PSTN levels of quality. Network optimization and troubleshooting, as well as advanced speech processing techniques and an in-depth understanding of speech transport on wireless networks, helped operators bring the level of speech quality on 2G networks to that of fixed networks. With the 3G network evolution, the move to all IP, and the transition from narrowband (NB) to wideband (WB) speech, wireless voice services were expected to surpass even traditional PSTN quality.

    However, today's voice services still raise a set of challenges for operators as they attempt to continue meeting their users' expectations. The roots of these challenges lie mainly in the convergence and coexistence of voice, data, and multimedia application services, which involve a multitude of factors that invariably produce new types of distortions that dynamically, variably, and sometimes randomly affect speech quality.

    These factors range from the increased demand for capacity generated by high and dynamic traffic with various application-dependent patterns, to low and adaptive bit rate codecs with different bandwidths (NB, WB, and super wideband (SWB)) and complex error concealment solutions, to voice enhancement devices (e.g., noise suppressors, automatic gain control, echo cancellers). These devices are designed to counter speech degradation with speech processing techniques that, if not well designed and implemented, could have the opposite of the desired effect on speech quality.

    In addition, with next generation network (NGN) (LTE/SAE-SON) evolution, network vendors as well as operators face a challenging transition from traditional CS to VoIP, and then from VoIP to VoIP over IMS (VoLTE). Details on these challenges can be found in [1].

    2 Speech Quality Evaluation Techniques

    Providing voice service on NGNs at the quality level demanded by subscribers, while supporting backward compatibility with 3G/2G networks and integrating voice with a myriad of multimedia and data services, increases the need for voice quality testing. Likewise, ensuring that the testing and evaluation of speech quality is itself of high quality comes with its own series of challenges.

    The need for cost-efficient speech quality evaluation techniques that can replace subjective testing while ensuring high accuracy across a larger variety of network configurations and conditions, codecs, bandwidths, and applications continues to drive network testing tool and infrastructure vendors, as well as operators and standardization organizations, to collaborate on speech quality evaluation techniques.

    Extensive work has been performed during the last decade by both the ITU-T and the telecommunication industry in developing speech quality evaluation algorithms designed to accurately evaluate the impact of any network degradation on subscriber perception as well as to cope with the complex testing conditions of the 3G environment.

    These speech quality evaluation algorithms have been developed with different scopes and applications. They can be intrusive perceptual solutions, which perform end-to-end speech quality evaluation [2], [3], [6] on different types of networks (wireless, VoIP, or fixed) based on the speech signal. Alternatively, they can be non-intrusive perceptual (single-ended) algorithms [4] or non-intrusive parametric algorithms [5], which evaluate speech quality at different nodes of the network (including the end node) based on the degraded speech signal and on network parameters, respectively.

    2.1 Intrusive Techniques

    These algorithms provide speech quality scores by comparing reference (transmitted) and degraded (received) speech samples. Therefore, intrusive assessment techniques require access to both the transmission and reception ends of communication. Comparing time-frequency processed reference and degraded speech samples based on human perception and cognition models facilitates an accurate estimation of the subjective perception of speech quality received by the terminal. This accuracy, however, comes at the cost of sending test samples through the network under test. The connection under test is therefore withdrawn from normal service and rendered unavailable to the customer. During peak hours, and for some technologies and certain areas, this situation may generate artificially low quality scores.

    Intrusive perceptual metrics estimate end-to-end speech quality, and thus are useful and meaningful to network operators for monitoring the quality of experience (QoE) of their voice service subscribers.

    2.2 Non-Intrusive Techniques

    Non-intrusive metrics can be network parameter based or speech based. Parametric methods can use RF and/or IP parameters for predicting quality. Their limitation is that these algorithms can predict only quality as affected either by the radio access network or by the IP core network; only a few ongoing studies are investigating the possibility of combining the effects of both RF and IP parameters on speech quality. The non-intrusive speech based methods must predict the transmitted original speech from the degraded signal. Strong degradations could easily affect the accuracy of these predictions and, therefore, the overall speech quality evaluation. As a result, even though they are based on the processing of the speech signal using human perception and cognition models, these algorithms are recommended only when a large number of samples is available for averaging [4].

    Although less accurate than intrusive perceptual metrics, non-intrusive perceptual and parametric algorithms have an important role in network monitoring for SLAs as well as in troubleshooting and optimization of different network elements.
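
    The role of averaging in non-intrusive evaluation can be illustrated with a minimal sketch. The per-sample scores and the function name `summarize` below are invented for illustration and are not the output of P.563 or P.564; the point is only that a mean over many samples is more stable than any single estimate, while it can also hide a short problem.

```python
from math import sqrt
from statistics import mean, stdev


def summarize(scores):
    """Return (mean, standard error of the mean, minimum) of the scores."""
    n = len(scores)
    avg = mean(scores)
    # The standard error shrinks with sqrt(n): more samples, steadier mean.
    sem = stdev(scores) / sqrt(n) if n > 1 else float("nan")
    return avg, sem, min(scores)


# Hypothetical per-sample estimates on a MOS-like scale. One sample (1.8)
# represents a short problem among otherwise good samples.
scores = [4.1, 4.0, 4.2, 3.9, 1.8, 4.1, 4.0, 4.2, 4.0, 4.1]
avg, sem, worst = summarize(scores)
print(f"mean={avg:.2f}  sem={sem:.2f}  worst={worst:.1f}")
```

    In this example the mean stays fairly close to the typical score, whereas the minimum exposes the problem sample. This is one way to read the remark that averaging can smooth problems out.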

    2.3 Standardization Status and Evolution Related to the Listening Quality of Voice Service

    Techniques for objective and subjective evaluation of voice service quality are developed within ITU-T Study Group 12, "Performance, QoS and QoE". Standardization organizations such as ETSI/3GPP and other industry forums work in liaison with ITU-T.

    For almost a decade, the intrusive perceptual solution for listening speech quality evaluation has been the PESQ standard P.862 (along with P.862.1, P.862.2, and P.862.3) [2]. With the 3G network evolution towards all IP, particularly NGN (LTE/SAE-SON), ITU-T recognized the industry's immediate need for a new standard that would both improve current PESQ performance under certain specific network conditions (e.g., CDMA networks, EVRC codecs) and cover 3G network evolution for voice service: from traditional CS to VoIP and VoIP over IMS, from NB to WB and SWB, and from low codec rates to very low and adaptive codec rates. As a result, POLQA was developed [3], [6].

    POLQA development and the wireless technology evolution toward NGN showed that infrastructure vendors and operators need more than a subjective mean opinion score (MOS) to understand subscriber perception and to appropriately troubleshoot and optimize their networks for the voice service. Details related to new study items initiated in ITU-T are presented in [1].

    The non-intrusive solution is covered by the perceptual metric P.563 and by the IP parametric based P.564.

    Comprehensive summaries of standardized speech quality evaluation metrics, their characteristics, and their applications are presented in Figure 1 for perceptual based metrics and in Figure 2 for parametric based metrics.

    Figure 1. Perceptual (Signal-Based)

    Intrusive (uses the original test signal and the degraded speech signal to provide a quality score)
    - Application: E2E QoE monitoring
    - Advantages: highly accurate estimator of subscribers' opinion; reflects the quality ensured by the entire network as perceived by users; requires access only to the end point
    - Disadvantages: uses test stimuli that could artificially load the network; limited space-time granularity, defined by the speech/video sample length requirement
    - Algorithms: ITU-T P.862 and the P.862.1-3 series (PESQ); ITU-T P.863 (POLQA) (consented by ITU-T on 17 September 2010)

    Non-intrusive (uses impaired, received speech to predict quality)
    - Application: troubleshooting in correlation with the perception metric
    - Advantages: normal usage of the network; troubleshooting the problem generating node; high time and space granularity
    - Disadvantages: low accuracy (high-order averaging is required and therefore possible problems could be smoothed out)
    - Algorithms: ITU-T P.563

    Figure 2. No Reference (Parametric Based)

    Non-intrusive (uses IP / transport parameters, or could possibly use RF, too)
    - Application: troubleshooting in correlation with the network parameters and the perceptual metric
    - Advantages: normal usage of the network; troubleshooting the problem generating node (if access enabled); high time and space granularity; possibility for quick correlation with network behavior
    - Disadvantages: low accuracy (high-order averaging is required and therefore possible problems could be smoothed out); quality evaluation is one-dimensional, taking into consideration metrics belonging to a single segment of the entire network (such as IP)
    - Algorithms: ITU-T P.564 (IP parameter based)

    3 POLQA Technology

    Today, voice service quality is determined by more than the speech codecs used or the frames lost. Networks and devices now integrate many new components ranging from voice enhancement devices (e.g., automatic gain controllers, noise reduction, and smart loss concealment schemes) to new techniques and features such as time scaling (stretching and compression of the speech signals in the time domain). All these components have been designed to ensure, maintain, and possibly even increase user experience of the perceived voice service quality. However, due to the complexity of the speech processing involved, these components might cause new and unexpected degradation effects. POLQA is especially designed to handle disruptive effects caused by these multicomponent distortions.

    3.1 POLQA Algorithms Overview

    As an intrusive perceptual metric, POLQA processes and compares the transmitted original speech signal and the degraded received speech signal in order to provide a prediction of the quality that would be perceived by subjects (regular subscribers) in a subjective listening test. The high-level architecture of the algorithm is presented in Figure 3.

    POLQA processes both the original signal and the degraded signal before performing the comparison. The original signal is processed because subjective testing is carried out without a direct comparison against an original (Absolute Category Rating), so the ideal signal on which the subject bases his or her opinion is unknown during the test. The processing of the degraded signal is related to high-level cognitive processes (e.g., relative insensitivity to linear frequency response distortion and to steady state wideband noise [3]).

    POLQA runs a time alignment of the degraded signal against the original speech signal before the comparison process. The determined delay is used both for estimating and applying the proper sampling frequency and for delay compensation in the comparison process, which is based on a perceptual model [3]. The accuracy of the comparison process is determined by the transformation of the original and degraded signals into an internal representation that is similar to the psychophysical representation of audio signals in the human auditory system. The transformation is applied in the perceptual frequency (Bark) and loudness (Sone) domains, and runs in several steps: time alignment, level alignment to a calibrated listening level, time-frequency mapping, frequency warping, and compressive loudness scaling [3].
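
    The idea of aligning the degraded signal to the original before comparison can be sketched with a plain cross-correlation search. This is a generic textbook technique used here as an assumption; it is not the alignment procedure specified in P.863, and it handles only a single constant delay (no variable delay or sampling frequency estimation). The names `estimate_delay` and `compensate` are hypothetical.

```python
def estimate_delay(reference, degraded, max_lag):
    """Return the lag (in samples) that maximizes the cross-correlation.

    A positive result means the degraded signal lags the reference.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n, ref_sample in enumerate(reference):
            m = n + lag
            if 0 <= m < len(degraded):
                score += ref_sample * degraded[m]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def compensate(degraded, lag):
    """Shift the degraded signal by -lag samples, zero-padding the gap."""
    if lag >= 0:
        return degraded[lag:] + [0.0] * lag
    return [0.0] * (-lag) + degraded[:lag]


# Toy signals: the degraded copy is delayed by 3 samples and attenuated.
reference = [0.0, 0.0, 1.0, -0.5, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0]
degraded = [0.0, 0.0, 0.0, 0.0, 0.0, 0.8, -0.4, 0.2, 0.0, 0.0]

lag = estimate_delay(reference, degraded, max_lag=5)
aligned = compensate(degraded, lag)
print(lag, aligned)
```

    After compensation the two signals line up sample by sample, which is the precondition for any frame-wise comparison of their internal representations.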

    The internal representation takes into account several factors impacting the perceived quality, such as playback level mapping from the digital signal representation level, local gain variations, rapid variations, linear filtering, and noise levels. In addition, it applies different levels of compensation for these factors depending on their final contribution to the overall perceptual disturbance. Therefore, minor and stationary differences between the original and degraded speech signals are compensated, while more severe effects known to have a greater impact on the perceived quality are only partially compensated [3].

    At the output, the final quality perception module calculates the difference between the original and degraded internal representations based on a small number of quality indicators that are used to model all related subjective effects. The cognitive model calculates the following parameters: frequency response indicator, noise indicator, room reverberation indicator, and three more indicators describing the internal differences in the time-pitch-loudness domain. All these indicators are combined to give an objective listening quality expressed by the raw POLQA score [3].

    The raw POLQA score is then mapped to the subjective MOS domain, MOS-LQO. The mapping is a third order polynomial developed on a large set of databases (tens of thousands of speech samples) containing a broad range of network types (fixed, IP, and mobile) and conditions (simulated error patterns and live degradations), codecs (e.g., AMR NB/WB, G.722.1, iLBC, EVRC, EVRC-WB, EVRC-A/B, AAC/AAC LD, Skype, MP3 low bit rate, G.726, EFR), various background noise (BGN) types and levels, different languages (American and British English, German, Swedish, French, Dutch, Czech, Chinese, and Japanese), and three speech bandwidths (NB, WB, and SWB).
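
    The shape of such a mapping can be sketched as follows. The coefficients are hypothetical, chosen only so that the example is monotonic on the range shown; they are not the coefficients standardized in P.863. The clamping range of 1 to 5 is likewise an assumption made for the illustration.

```python
def map_raw_to_mos_lqo(raw, coeffs, lo=1.0, hi=5.0):
    """Evaluate a third-order polynomial and clamp the result to [lo, hi].

    coeffs = (a0, a1, a2, a3) for a0 + a1*x + a2*x**2 + a3*x**3.
    """
    a0, a1, a2, a3 = coeffs
    value = a0 + raw * (a1 + raw * (a2 + raw * a3))  # Horner's scheme
    return max(lo, min(hi, value))


# Hypothetical coefficients for illustration only (NOT from P.863).
EXAMPLE_COEFFS = (0.5, 0.6, 0.1, -0.01)

for raw in (0.0, 1.0, 2.5, 4.5):
    print(raw, round(map_raw_to_mos_lqo(raw, EXAMPLE_COEFFS), 3))
```

    A practical mapping needs to be monotonic over the range of raw scores so that a better raw score never yields a worse MOS-LQO; the example coefficients satisfy this on the interval 0 to 5.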

    Figure 3. High Level Architecture of POLQA Algorithm

    [Block diagram with the following labeled elements: original speech; degraded speech; time alignment (delay estimates); perceptual model (psycho-acoustic model); internal representation of original (transmitted) speech signal; internal representation of degraded (received) speech signal; difference between internal representations (user perceived); cognitive model (environment modeling; listening conditions / cognitive perception); raw POLQA; mapping to subjective domain; speech databases (NB/WB/SWB; variety of codecs; wireless / VoIP simulated / live conditions; acoustic / electrical; BGN conditions; languages); POLQA MOS-LQO; possibly various speech based diagnostics (e.g., delay, gain levels, noise).]

    3.2 Operability Requirements

    The POLQA algorithm is designed to predict overall listening speech quality under NB, WB, and SWB (50 to 14,000 Hz) conditions in 3G/4G (LTE-SAE) networks, including advanced speech processing technologies, acoustical interfaces, and hands-free applications. POLQA has two operational modes: SWB and NB. The main difference is the bandwidth of the original speech signal used by the model. In SWB mode, the received (and potentially degraded) speech signal is compared with an SWB reference. Therefore, band limitations are considered to be degradations and are scored accordingly. The listening quality is modelled as perceived by a human listener using a diffuse-field equalized headphone with diotic presentation (same signal at both ear-caps). In NB mode, the received (and potentially degraded) speech signal is compared with an NB (300 to 3400 Hz) original. Thus, normal telephone band limitations are not considered to be severe degradations. NB mode maintains compatibility with the previously developed ITU-T Recommendation P.862.1 (PESQ) [2]. The listening quality is modelled as perceived by a human listener using a loosely coupled IRS type handset at one ear (monotic presentation).
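
    The practical consequence of the two operational modes can be summarized in a small lookup, sketched below. The band edges and listening conditions are taken from the description of the SWB and NB modes; the class, the dictionary, and the yes/no rule are hypothetical simplifications, since the actual model scores band limitation on a continuous scale rather than as a flag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperationalMode:
    name: str
    reference_band_hz: tuple  # (low, high) bandwidth of the reference signal
    presentation: str


MODES = {
    "SWB": OperationalMode(
        "SWB", (50, 14000), "diotic, diffuse-field equalized headphone"
    ),
    "NB": OperationalMode(
        "NB", (300, 3400), "monotic, loosely coupled IRS type handset"
    ),
}


def band_limit_counts_as_degradation(mode_name, degraded_band_hz):
    """True if the degraded signal is narrower than the mode's reference band."""
    ref_low, ref_high = MODES[mode_name].reference_band_hz
    deg_low, deg_high = degraded_band_hz
    return deg_low > ref_low or deg_high < ref_high


# A telephone-band signal is penalized in SWB mode but not in NB mode.
print(band_limit_counts_as_degradation("SWB", (300, 3400)))
print(band_limit_counts_as_degradation("NB", (300, 3400)))
```

    The same narrowband call can therefore receive different scores depending on the mode, so scores from the two modes should not be mixed in one comparison.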

    3.3 Telecommunication Test and Application Scenarios

    The telecommunication scenarios include current transmission technologies [3]:

    - Public switched networks (e.g., fixed wire PSTN, GSM, WCDMA, CDMA)
    - Push-over-Cellular, Voice over IP, and PSTN-to-VoIP interconnections, TETRA
    - Commonly used speech processing components (e.g., codecs such as AMR NB/WB, G.722.1, iLBC, EVRC, EVRC-WB, EVRC-A/B, AAC/AAC LD, Skype, MP3 low bit rate, G.726, and EFR; noise reduction systems for different types of BGN such as office, street, car, and babble; adaptive gain control; comfort noise; and other types of voice enhancement devices) and their combinations

    The tested distortion types [3] cover:

    - Single speech codecs and speech codecs used in tandem, as currently used in telecommunication scenarios
    - Packet loss and concealment strategies (packet-switched connections)
    - Frame errors and bit errors (wireless connections)
    - Interruptions (such as unconcealed packet loss or handover in GSM)
    - Front-end clipping (temporal clipping)
    - Amplitude clipping (overload, saturation)
    - Variable delay (VoIP, video-telephony) / time warping
    - Gain variations
    - Influence of linear distortions (spectral shaping), including time-variant ones
    - Non-linear distortions produced by the microphone / transducer at acoustical interfaces
    - Reverberations caused by hands-free test setups in defined acoustical environments

    The application scenarios cover both electrical and acoustical measuring interfaces as well as different terminal types (handset, headphone, or hands-free).

    3.4 Understanding POLQA Limitations

    It should be noted that there are several conditions and applications for which POLQA was not designed. POLQA scores obtained under these conditions are not reliable and should not be considered for any kind of speech quality evaluation. These conditions include:

    - Other dimensions of speech quality such as conversational aspects and talking quality.
    - Speech quality per call. POLQA is not intended to score longer sequences of speech. It is focused on prediction of quality for shorter speech utterances of 6 to 12 seconds.
    - Noisy listening environments. POLQA does not predict perceived speech quality in these environments; it is designed in accordance with P.800 ACR testing [7].
    - Music (including multimedia).
    - Evaluation of performance or ranking of voice enhancement devices (e.g., noise suppressors).
    - Other technologies or components such as speech storage formats, or non-telephony applications such as public safety networks or professional mobile radio connections.

    Although not yet tested or evaluated, POLQA could be cautiously applied to the following:

    - Other languages (e.g., Russian, Arabic)
    - Longer speech samples

    Subjective tests for confirming POLQA performance in these types of applications are recommended.
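
    In a test tool, some of these limitations could be turned into simple pre-checks before a sample is scored. The sketch below is one hypothetical way to do so; the function name, its parameters, and the warning texts are assumptions, and only the 6 to 12 second range and the exclusion of music come from the list above.

```python
def check_sample(duration_s, content="speech"):
    """Return a list of warnings for a test sample; empty means none."""
    warnings = []
    if content != "speech":
        warnings.append("content is not speech (e.g., music is out of scope)")
    if not 6.0 <= duration_s <= 12.0:
        warnings.append("duration is outside the 6 to 12 s utterance range")
    return warnings


print(check_sample(8.0))            # within the intended range
print(check_sample(30.0))           # long sample: apply with caution
print(check_sample(3.0, "music"))   # two separate concerns
```

    Warnings rather than hard failures are used here because longer samples are listed as an application where POLQA could be cautiously applied.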

    3.5 POLQA Algorithms Performance Evaluation

    Understanding POLQA performance as an estimator of subscriber perception relies on the fact that results from a subjective experiment reflect the relative quality between the tested speech samples, while the absolute values could vary from experiment to experiment depending on the listener group and the design of the subjective test.

    Unlike subjective results, POLQA is independent of test context and individual voter behavior. POLQA estimates the average subjective score obtained from a group of voters listening to the same speech sample. Although it does not provide the exact absolute score of an individual experiment, POLQA does reproduce the relative quality ranking [3].

    Therefore, POLQA performance evaluation involves comparison with subjective scores as well as consideration of the variability that exists within a listening panel. In addition, the differences between individual subjective experiments must be removed. This is achieved by determining and applying an optimal regression function (3rd order polynomial) between the subjective and objective scores.
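
    A per-experiment third order regression of this kind can be sketched with an ordinary least-squares fit. The direction of the fit (objective scores mapped onto subjective scores) and the data are assumptions made for illustration. The sketch also does not enforce monotonicity, which a practical mapping would likely need.

```python
def fit_cubic(x, y):
    """Least-squares fit of y ~ c0 + c1*x + c2*x**2 + c3*x**3.

    Solves the 4x4 normal equations by Gaussian elimination with partial
    pivoting and returns [c0, c1, c2, c3].
    """
    size = 4
    # Augmented normal-equation matrix [A^T A | A^T y].
    rows = []
    for r in range(size):
        row = [sum(xi ** (r + c) for xi in x) for c in range(size)]
        row.append(sum(yi * xi ** r for xi, yi in zip(x, y)))
        rows.append(row)
    # Forward elimination.
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(col + 1, size):
            factor = rows[r][col] / rows[col][col]
            for c in range(col, size + 1):
                rows[r][c] -= factor * rows[col][c]
    # Back substitution.
    coeffs = [0.0] * size
    for r in reversed(range(size)):
        known = sum(rows[r][c] * coeffs[c] for c in range(r + 1, size))
        coeffs[r] = (rows[r][size] - known) / rows[r][r]
    return coeffs


def apply_cubic(coeffs, x):
    """Evaluate the fitted polynomial at x."""
    return sum(c * x ** k for k, c in enumerate(coeffs))


# Hypothetical per-condition scores from one subjective experiment.
objective = [1.4, 1.9, 2.5, 3.0, 3.4, 3.9, 4.2, 4.4]
subjective = [1.2, 1.6, 2.4, 3.1, 3.5, 4.1, 4.3, 4.4]

coeffs = fit_cubic(objective, subjective)
mapped = [apply_cubic(coeffs, o) for o in objective]
print([round(m, 2) for m in mapped])
```

    The mapped objective scores, not the raw ones, are then compared with the subjective scores, so that offsets specific to one listening experiment do not count against the model.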

    Due to the large number and variety of databases, as well as the variability of their content, a rigorous and extensive evaluation procedure has been developed for POLQA testing. A series of different statistical metrics as well as statistical significance testing have been used [3], but the core metric against which the algorithm has been optimized is the epsilon insensitive root mean square error (rmse*). This metric brings statistical significance and accuracy in the sense that it best emulates the usability of POLQA and its performance in real life scenarios. It expresses the POLQA error against the average MOS of individual voters, considering only differences that fall outside an epsilon-wide band around the target average value. The uncertainty of a MOS panel is thereby taken into account, with the epsilon value defined as the 95% confidence interval of the averaged MOS.

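    \[
    \mathrm{rmse}^{*} = \sqrt{\frac{1}{N - d} \sum_{i=1}^{N} P_{\mathrm{error}}(i)^{2}}
    \]

    The Perror is defined as:

    \[
    P_{\mathrm{error}}(i) = \max\bigl(0,\; \lvert \mathrm{MOSLQS}(i) - \mathrm{MOSLQO}(i) \rvert - ci_{95}(i)\bigr)
    \]

    where the index i denotes the condition of the speech sample, N denotes the number of conditions or speech samples, and d denotes the degrees of freedom (d = 4 in the case of a 3rd order regression). MOSLQS(i) and MOSLQO(i) are the subjective and objective scores for condition i, and ci95(i) is the 95% confidence interval of the averaged subjective score.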
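
    The metric is straightforward to compute once the per-condition scores and confidence intervals are available. The following is a minimal sketch under the definitions given here, with Perror(i) = max(0, |MOSLQS(i) - MOSLQO(i)| - ci95(i)) and rmse* = sqrt(sum of Perror(i)^2 / (N - d)). The function names and the example data are hypothetical.

```python
from math import sqrt


def perror(mos_lqs, mos_lqo, ci95):
    """Error beyond the 95% confidence interval; zero inside the band."""
    return max(0.0, abs(mos_lqs - mos_lqo) - ci95)


def rmse_star(mos_lqs, mos_lqo, ci95, d=4):
    """Epsilon insensitive rmse over N conditions with d degrees of freedom."""
    n = len(mos_lqs)
    if not (n == len(mos_lqo) == len(ci95)):
        raise ValueError("all inputs must have the same length")
    if n <= d:
        raise ValueError("need more conditions than degrees of freedom")
    total = sum(perror(s, o, c) ** 2 for s, o, c in zip(mos_lqs, mos_lqo, ci95))
    return sqrt(total / (n - d))


# Hypothetical scores for six conditions (invented for illustration).
subjective = [4.2, 3.5, 2.8, 2.0, 3.9, 1.5]
objective = [4.0, 3.6, 2.2, 2.1, 3.9, 2.0]
ci95 = [0.1, 0.2, 0.2, 0.15, 0.1, 0.1]

print(round(rmse_star(subjective, objective, ci95), 3))
```

    Conditions whose prediction falls within the confidence interval of the listening panel contribute nothing to the sum, so the model is penalized only for errors that exceed the uncertainty of the subjective scores themselves.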

    The results reported in [12] provide general information on POLQA performance on a broad range of databases containing a large variety of technologies, codecs, and bandwidths. Because they represent overall performance, these results might be misleading to a certain extent. Due to the variety of databases and the statistical aggregation procedure applied to the results [3], [12], weaker or better performance for a specific application and/or bandwidth could be smoothed out or hidden. Therefore, additional, more detailed analysis is needed for a particular application. This analysis is planned by ITU-T during the POLQA characterization phase, and the results are expected to be published in the forthcoming POLQA Application Guide (estimated for June 2011).

    4 Beyond the MOS Score

    Due to the complexity of the NGN environment, as well as the challenges in supporting voice service on LTE-SAE/SON networks, several solutions for providing voice service are currently envisioned. Therefore, testing and evaluation of speech quality in the NGN environment must be comprehensive. In order to understand and cost-efficiently control the speech degradation of different implementation solutions, evaluation techniques need to go beyond the MOS score.

    To a large extent, as in the PESQ case, interim calculations of POLQA as well as the six degradation parameters used as input to the POLQA algorithm's cognitive model would allow some network diagnosis based on speech quality evaluation. Details are discussed in [1], but generally the main diagnosis could cover aspects such as latency, jitter (variable delay), gain variations, speech signal and BGN level measurements, level clipping, dropouts (e.g., generated by packet loss), operability of VAD, and short-term spectra (linear degradations caused by the frequency response of the devices and/or by the VoIP landline connection).
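
    Two of the diagnostic quantities named here, signal level and level clipping, can be illustrated with generic signal measurements. The sketch below is not a POLQA interim calculation; the function names, the full-scale convention of 1.0, and the clipping threshold are assumptions.

```python
from math import log10, sqrt


def level_dbfs(samples, full_scale=1.0):
    """RMS level relative to full scale, in dB."""
    rms = sqrt(sum(s * s for s in samples) / len(samples))
    return float("-inf") if rms == 0 else 20 * log10(rms / full_scale)


def clipping_ratio(samples, full_scale=1.0, threshold=0.999):
    """Fraction of samples at or above `threshold` of full scale."""
    limit = threshold * full_scale
    return sum(1 for s in samples if abs(s) >= limit) / len(samples)


# Toy signals (normalized to a full scale of 1.0).
clean = [0.5, -0.5, 0.5, -0.5]
clipped = [1.0, -1.0, 0.2, 0.1]

print(round(level_dbfs(clean), 2), clipping_ratio(clean))
print(round(level_dbfs(clipped), 2), clipping_ratio(clipped))
```

    Reported alongside the MOS-LQO, measurements of this kind could help indicate whether a low score is more likely due to overload, low level, or some other cause.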

    5 Ascom Network Testing Presence in the Standardization Work on Objective Evaluation Metrics for Listening Speech Quality

    For more than 10 years, Ascom Network Testing has been an active member within ITU-T Study Group 12, which develops objective speech quality evaluation metrics. Our contributions to the standardization work cover different areas and stages of objective metric development.

    Ascom Network Testing contributed live recorded speech databases needed for accurate training and tuning of the algorithms running in real life scenarios typical of network troubleshooting, optimization, and operation applications performed by operators. Within ITU-T, we were the initiator and developer of the statistical evaluation procedure for objective metrics that was first applied to PESQ and later applied in a modified form to POLQA [8]. Recently, based on our initial work as well as work performed for POLQA performance evaluation, Ascom Network Testing introduced a new study item within ITU-T on a more general statistical evaluation procedure to be applied to various types of objective metrics [9]. This type of evaluation is increasingly becoming a must for all kinds of objective metrics (e.g., speech, video, audio, multimedia) that are designed for testing in real life networks and therefore for implementation in network testing tools. We also developed a technique for calibrating objective quality metrics to the MOS scale. As a result, we co-authored two standards in relation to PESQ: P.862.1 (Mapping PESQ to MOS domain) and P.862.3 (Guidance for PESQ usage) [2].

    Additionally, Ascom Network Testing recently wrote a white paper contribution [10] on aspects related to POLQA implementation in field testing tools, as well as a white paper contribution related to topics that are required to be studied during the POLQA characterization phase [11].

    6 Conclusions

    The convergence and coexistence of voice, data, and multimedia application services involve a multitude of factors that invariably produce new types of distortions that dynamically, variably, and sometimes randomly affect voice service quality. Today, speech quality is determined by more than the speech codecs used or the frames lost. Networks and devices now integrate many new components ranging from voice enhancement devices to new techniques such as time scaling.

    Extensive work has been performed during the past decade by both the ITU-T and the telecommunication industry in developing speech quality evaluation algorithms designed to accurately evaluate the impact of any network degradation on subscriber perception as well as to cope with the complex testing conditions of 3G networks and beyond. The new technology POLQA was developed to cope with the evolving networks' complexities. As with all new technologies, extensive live testing is expected to complete the picture of the POLQA algorithm's performance. Ascom Network Testing, a proven veteran in ITU-T in the evaluation of objective quality metrics, continues to play an active role in the standardization work on this topic.

    7 References

    [1] I. Cotanis, "Voice Services in the Next Generation Networks/LTE-SON as Perceived by Users," Ascom Network Testing white paper, November 2010.

    [2] ITU-T P.862.x series: P.862 (PESQ algorithm), P.862.1 (Mapping to MOS domain), P.862.2 (WB-PESQ), P.862.3 (PESQ application guide).

    [3] ITU-T P.863, "Perceptual Objective Listening Quality Assessment (POLQA)," Geneva, January 2011.

    [4] ITU-T P.563, "Single-ended method for objective speech quality assessment in narrow-band telephony applications."

    [5] ITU-T P.564, "Conformance testing for voice over IP transmission quality assessment models."

    [6] ITU-T TD SG 12 Gen 345, "Final report of Working Party 2," Geneva, May 2010.

    [7] ITU-T P.800, "Subjective testing of overall listening speech quality."

    [8] I. Cotanis, ITU-T SG12/Q9 C137, "A procedure for statistical evaluation of the objective quality metrics performance," May 2008.

    [9] I. Cotanis, ITU-T C151, "Proposal on statistical evaluation framework for objective quality algorithms," submitted for ITU-T January 2011 meeting.

    [10] I. Cotanis, ITU-T SG 12 C112, "Some aspects related to P.OLQA standard," May 2010.

    [11] I. Cotanis, ITU-T C142, "Proposed study items for POLQA characterization phase," September 2010.

    [12] Opticom, TNO, SwissQual, ITU-T C148, "Performance of the joint POLQA model," September 2010.

    [13] POLQA coalition, www.polqa.info, July 2010.