Huawei Full HD Voice White Paper



Full HD Voice

Enterprise VoIP

“Just like speaking face to face…”

Full High Definition (Full HD) Voice refers to the next generation of voice quality for telephony audio. It delivers crystal-clear voice quality compared with digital telephony "toll quality" and even compared with HD voice. Full HD Voice extends the frequency range of audio signals up to 20000 Hz, which covers the whole range of the human voice and of human hearing.

The AMR-WB codec, which represents HD voice quality, was completed by 3GPP (the 3rd Generation Partnership Project) in 2001, and codec technology has developed significantly since then. Codecs such as ITU-T G.718 have shown enhanced performance in poor radio channels, and codecs such as 3GPP AMR-WB+ have demonstrated better quality for music signals. In March 2010, 3GPP completed a study item on use cases for Enhanced Voice Services (EVS) over the Evolved Packet System (EPS) of LTE. This study [1] led directly to the development of the EVS Codec, which was completed in 3Q2014.

The EVS Codec represents a huge improvement in speech/audio quality and functionality compared with existing conversational (low-delay) codecs. For the first time, a 3GPP conversational codec combines high-quality speech and music performance across four bandwidths: Narrowband (NB = 200–4000 Hz), Wideband (WB = 50–8000 Hz), Superwideband (SWB = 50–16000 Hz) and Fullband (FB = 50–20000 Hz). This level of performance exceeds that of all existing 3GPP codecs, in particular the AMR-WB codec, which led to the creation of the GSMA HD Voice Logo. That Logo has been successful in encouraging the deployment of AMR-WB services.

The EVS Codec is also able to compete directly in over-the-top (OTT) VoIP applications with codecs such as the recently introduced Opus. Because EVS is available in both fixed-point and floating-point versions, it is suitable for low-power devices as well as PCs.

This document first presents the services and features of the existing 3GPP Wideband Codec (AMR-WB) and describes the current HD Voice Logo. Over-the-top codecs such as Opus are then described, followed by the performance and features of the EVS Codec. Finally, we examine a new Full HD Voice Logo and the prospects for an immersive sound experience in the future.

Full HD Voice

Huawei

October 2014


Full HD Voice

2014-10-20 Page 2/16


Contents

Introduction
HD Voice and the 3GPP AMR-WB Codec
Over the Top Conversational Codecs
Full HD voice and new EVS Codec for VoLTE
Features and Performance of the EVS Codec
Why Operators should deploy EVS
EVS impact on VoLTE
Full HD Voice proposal in GSMA
Future Voice: EVS Beyond 3GPP Release 12
References


Introduction

In March 2010, 3GPP completed a study item on use cases for Enhanced Voice Services (EVS) over the Evolved Packet System of LTE. This study [1] led directly to the development of the EVS Codec, which was completed in 3Q2014. After a competitive qualification phase, a consortium of all the qualified codec developers, including Huawei Technologies, was formed, and the Selection phase became a collaborative development.

This document first presents the services and features of existing 3GPP and over-the-top codecs and describes the current HD Voice Logo. The performance and features of the EVS Codec are then examined. Finally, we examine a new Full HD Voice Logo and opportunities for Huawei to lead in the deployment of the EVS Codec.

HD Voice and the 3GPP AMR-WB Codec

The better voice quality of HD voice improves the call experience over conventional Narrowband, allowing people to be better understood, share their feelings, do business and communicate ideas more easily. HD voice transmits more of the human voice spectrum, making conversations more natural and easier to understand. HD voice also helps people hear better in noisy environments.

HD voice helps operators differentiate their voice service offerings and enables high-quality services, for example for voice-dependent businesses such as call centers, information services and emergency services. HD voice is much better for conference calls and can contribute to a reduction in business travel, raising productivity while reducing environmental impact. Calls that are easier to hear and understand also reduce the fatigue often associated with long conference calls.

Orange R&D studies of HD voice customers confirmed that 96% of customers are satisfied with HD voice calls [2].

The HD Voice Logo of the GSMA (Global System for Mobile Communications Association) has been successful in encouraging both operators and manufacturers to provide AMR-WB and EVRC-NW based services.

Both the 3GPP AMR-WB and the 3GPP2 EVRC-NW codecs are essentially speech codecs. They achieve a degree of performance for music signals at their higher bit rates, but they were not designed to provide more than a tolerable rendering of music.


Initially, take-up of the AMR-WB codec and wideband speech services was slow, partly because either tandem-free operation (TFO) or transcoder-free operation (TrFO) had to be available in the network, but once these innovations were in place the service started to take off.

Many well-established operators and major manufacturers have signed up as licensees of the HD Voice Logo (see Figure 1), and in March 2014 the Global Mobile Suppliers Association announced that one hundred operators worldwide have enabled mobile HD Voice services in 73 countries [3] (see Figure 2).

Currently, the HD Voice Logo requirements for GSM/UMTS mandate the use of AMR-WB and those for CDMA2000 mandate the use of EVRC-NW; both are wideband speech codecs (50 Hz to 7000 Hz). This is well aligned with the conventional definition of “HD Voice”, which is synonymous with wideband speech services (50 Hz to 7000 Hz) and matches the frequency response of these two codecs.

Figure 1: GSMA HD Voice Logo

Figure 2: GSMA HD Voice Logo Licensees

HD Voice Operator Licensees


HD Voice Manufacturer Licensees


The group responsible for developing the HD Voice Logo Requirements within the GSMA, TSG VLR, is in the process of determining priorities for version 3.0; version 2.0 was approved in 2013 [3].

Over the Top Conversational Codecs

Over-the-top (OTT) service providers such as Skype have been providing point-to-point VoIP services for several years. The flexibility and processing power of the PC platform, combined with IP and little or no legacy infrastructure, allowed these services to shift easily from conventional NB to WB and even SWB using proprietary codecs such as SILK. Broadband IP networks do not suffer the same radio resource constraints as wide-area mobile networks, so the drive for high quality at lower bit rates is less obvious there. Nevertheless, such services are already threatening the capacity and revenue streams of mobile operators. Many operators attempt to control their use by deep packet inspection or other profiling methods, but smartphones using WiFi connections can easily bypass the mobile network.

The recently standardized Opus codec, specified in IETF RFC 6716 [5], represents a performance benchmark that is hard to ignore for conventionally standardized codecs. This codec, a hybrid of the Skype SILK voice codec and the CELT audio codec, spans bit rates from 6 kbit/s to 510 kbit/s. At lower bit rates its performance is somewhat limited and the coded bandwidth is less than SWB. Opus may not live up to all of its claims as “a totally open, royalty-free audio codec”, but it is a high-quality codec at and above 24 kbit/s, where it codes more of the SWB bandwidth [6]. Unfortunately, or perhaps fortunately, 24 kbit/s is a rather high bit rate for efficient use of the radio resource for speech/audio in mobile systems.
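To put the 24 kbit/s figure in perspective, it helps to convert a bit rate into the payload of a single speech frame. The Python sketch below does only that arithmetic. It assumes 20 ms frames (the EVS frame length and a common Opus setting, although Opus also supports other frame durations) and ignores RTP/UDP/IP header overhead, so the figures are illustrative rather than radio-resource measurements. The helper names are hypothetical.

```python
def bits_per_frame(rate_kbps: float, frame_ms: float = 20.0) -> float:
    """Payload bits in one frame: kbit/s multiplied by ms gives bits directly."""
    return rate_kbps * frame_ms


def bytes_per_frame(rate_kbps: float, frame_ms: float = 20.0) -> float:
    """Same payload expressed in bytes (8 bits per byte)."""
    return bits_per_frame(rate_kbps, frame_ms) / 8


# Assumption: 20 ms frames for every codec; header overhead is ignored.
for label, rate in [("Opus", 24.0), ("EVS", 13.2), ("EVS", 24.4)]:
    print(f"{label} at {rate} kbit/s: "
          f"{bits_per_frame(rate):.0f} bits ({bytes_per_frame(rate):.1f} bytes) per 20 ms frame")
```

Under these assumptions, 24 kbit/s means 480 bits in every 20 ms frame, roughly 1.8 times the 264 bits carried by EVS at 13.2 kbit/s.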


Full HD voice and new EVS Codec for VoLTE

Full HD Voice will go beyond the quality of current HD Voice to deliver unrivalled quality to mobile users and provide even greater benefits. The better voice quality of Full HD voice will improve the call experience still further, allowing people to experience calls just as if they were speaking face to face. Full HD voice transmits almost the entire human voice spectrum, making conversations completely natural and as understandable as possible. Like HD voice, Full HD Voice will also help people hear better in noisy environments.

Full HD voice will give operators additional means to differentiate their voice service offerings and will enable even higher-quality services. The additional error robustness of Full HD Voice will also mean that these higher-quality services are provided over more of the coverage area of an operator’s network, increasing end-user satisfaction.

The features of Full HD Voice cannot be provided using existing speech and audio codecs, so a new codec is needed. The AMR-WB codec was completed by 3GPP in 2001, and codec technology has developed significantly since then. Codecs such as the 3GPP2 VMR-WB codec, ITU-T G.718 and 3GPP AMR-WB+ have built upon the best features of AMR-WB and have been shown to provide enhanced performance in poor radio channels and better quality for music signals.

Over the same period, codecs have been developed that encode more and more of the audio spectrum. Codecs such as ITU-T G.719, and the Superwideband extensions to ITU-T G.718 and G.729.1, have demonstrated that the audio bandwidth above 7 kHz does not require much extra data to encode well.

There have also been significant developments in mobile system infrastructure. With the deployment of the Internet Protocol (IP) based infrastructure known as IMS, in conjunction with LTE, which is also a packet-based air interface technology, new codecs can be introduced much more easily than in the past. This is because fewer changes are required within the infrastructure to support them: the data packets can remain intact from one handset to the other throughout a call.

The transmission of voice packets over the LTE air interface is known as Voice over LTE (VoLTE), by analogy with VoIP. VoLTE has been rolled out in Korea, with more general deployment throughout 2014.

In response to these developments, 3GPP initiated a study item on use cases for Enhanced Voice Services (EVS) over the Evolved Packet System of LTE, which was completed in March 2010. This study [1] led directly to the development of the EVS Codec, which was completed in 3Q2014.

Features and Performance of the EVS Codec

The Enhanced Voice Services (EVS) Codec represents a very significant milestone in speech/audio quality and functionality compared with existing conversational (low-delay) codecs. For the first time, a 3GPP conversational codec combines high-quality speech and music performance across four bandwidths: Narrowband (NB = 200–4000 Hz), Wideband (WB = 50–8000 Hz), Superwideband (SWB = 50–16000 Hz) and Fullband (FB = 50–20000 Hz). These wider audio bandwidths, combined with improved quality for music and mixed-content signals, are at the heart of what constitutes Full HD Voice.
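The four bandwidth classes can be captured as a small lookup table. The Python sketch below uses the band edges quoted above. The helper `narrowest_bandwidth` is hypothetical: it returns the narrowest class whose upper edge reaches a given audio frequency. This shows why a 7000 Hz signal (the HD Voice range) fits in WB, while content reaching 14 kHz needs SWB.

```python
# Band edges (Hz) as quoted in this paper: name -> (lower edge, upper edge).
EVS_BANDWIDTHS = {
    "NB": (200, 4000),
    "WB": (50, 8000),
    "SWB": (50, 16000),
    "FB": (50, 20000),
}


def narrowest_bandwidth(upper_hz: float) -> str:
    """Return the narrowest bandwidth class whose upper edge reaches upper_hz."""
    for name, (_low, high) in sorted(EVS_BANDWIDTHS.items(), key=lambda kv: kv[1][1]):
        if upper_hz <= high:
            return name
    raise ValueError(f"{upper_hz} Hz is above the Fullband limit")


print(narrowest_bandwidth(7000))   # HD Voice (AMR-WB) upper edge
print(narrowest_bandwidth(14000))  # SWB speech target
print(narrowest_bandwidth(20000))  # Fullband
```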

The 3GPP Work Item Description (WID) for the EVS Codec lists the objectives of the new codec as follows:

1. Enhanced quality and coding efficiency for narrowband (NB) and wideband (WB) speech services, leading to improved user experience and system efficiency. This should also be achieved in interoperation with pre-Rel-10 systems and services employing WB voice.

2. Enhanced quality by the introduction of super-wideband (SWB) speech, leading to improved user experience.

3. Enhanced quality for mixed content and music in conversational applications (for example, in-call music), leading to improved user experience for cases when selection of dedicated 3GPP audio codecs is not possible.

4. Robustness to packet loss and delay jitter, leading to optimized behavior in IP application environments like MTSI within the EPS.

5. Backward interoperability to the 3GPP AMR-WB codec by having some WB EVS modes supporting the AMR-WB codec format used throughout 3GPP conversational speech telephony service (including CS). The AMR-WB interoperable operation modes of the EVS codec may be either identical to those in the AMR-WB codec or different but bitstream interoperable with them.

Many of the improvements in NB and WB represent a capacity boost for mobile systems while delivering the same audio quality. It is also clear that the EVS codec will provide improvements to the Wideband speech services that are at the heart of the HD Voice Logo Terminal Requirements (WID Items 1, 3, 4 & 5).

Perhaps the main enhancement to voice services provided by EVS, though, will be SWB speech and in-call music (WID Item 2 in combination with Items 3 & 4), which goes beyond the wideband frequencies up to 7 kHz and covers frequencies up to at least 14 kHz. In fact, the current frequency masks used within the EVS standardization exercise extend beyond 15000 Hz at certain bit rates. The Fullband audio mode of EVS, operating from 16.4 kbit/s, will provide even greater improvement. As mentioned previously, it is these broader audio bandwidths that will define Full HD Voice.

Table 1: Source codec bit-rates for the EVS codec (from draft TS 26.441)

Source codec bit-rate (kbit/s)   Signal bandwidths supported   Source Controlled Operation available
5.9 (SC-VBR)                     NB, WB                        Yes (always on)
7.2                              NB, WB                        Yes
8                                NB, WB                        Yes
9.6                              NB, WB, SWB                   Yes
13.2                             NB, WB, SWB                   Yes
13.2 (Channel Aware)             WB, SWB                       Yes
16.4                             NB, WB, SWB, FB               Yes
24.4                             NB, WB, SWB, FB               Yes
32                               WB, SWB, FB                   Yes
48                               WB, SWB, FB                   Yes
64                               WB, SWB, FB                   Yes
96                               WB, SWB, FB                   Yes
128                              WB, SWB, FB                   Yes

There have been conversational SWB and FB codecs before, both in ITU-T and in VoIP applications such as Skype, but the EVS Codec achieves this with SWB coding from as low as 9.6 kbit/s and FB coding from 16.4 kbit/s, as shown in Table 1. The SWB coding of EVS comes close to reproducing the quality and bandwidth of broadcast FM radio, while Fullband coding comes close to HiFi bandwidths and systems such as MP3. See Figure 3.
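Table 1 can also be read as data. The Python sketch below transcribes the table, omitting the 13.2 kbit/s Channel Aware row for simplicity. It confirms the two thresholds quoted above: SWB is available from 9.6 kbit/s and FB from 16.4 kbit/s. The dictionary and helper names are hypothetical, and the values come from the draft TS 26.441 table, so they should be checked against the final specification.

```python
from typing import List

# Table 1 as data: bit rate (kbit/s) -> supported signal bandwidths.
# 5.9 kbit/s is the source-controlled VBR (SC-VBR) mode; the 13.2 kbit/s
# Channel Aware row (WB, SWB) is omitted in this sketch.
EVS_TABLE_1 = {
    5.9: ("NB", "WB"),
    7.2: ("NB", "WB"),
    8.0: ("NB", "WB"),
    9.6: ("NB", "WB", "SWB"),
    13.2: ("NB", "WB", "SWB"),
    16.4: ("NB", "WB", "SWB", "FB"),
    24.4: ("NB", "WB", "SWB", "FB"),
    32.0: ("WB", "SWB", "FB"),
    48.0: ("WB", "SWB", "FB"),
    64.0: ("WB", "SWB", "FB"),
    96.0: ("WB", "SWB", "FB"),
    128.0: ("WB", "SWB", "FB"),
}


def rates_for(bandwidth: str) -> List[float]:
    """All Table 1 bit rates that support the given bandwidth, ascending."""
    return sorted(rate for rate, bws in EVS_TABLE_1.items() if bandwidth in bws)


def lowest_rate_for(bandwidth: str) -> float:
    """Lowest Table 1 bit rate that supports the given bandwidth."""
    return rates_for(bandwidth)[0]


for bw in ("NB", "WB", "SWB", "FB"):
    print(bw, "from", lowest_rate_for(bw), "kbit/s, up to", rates_for(bw)[-1], "kbit/s")
```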


From a quality perspective, the EVS codec provides unrivalled quality not only for clean speech but also for noisy speech and music/audio across the entire bit rate range, and particularly at bit rates up to 24.4 kbit/s. This, combined with better capacity and excellent robustness to frame erasures, makes the EVS codec exceptionally well suited to mobile applications.

Figure 3: Bandwidths of 3GPP Codecs

The EVS codec also includes an example jitter buffer manager (JBM) solution, which evens out the packet delay variation experienced by speech data packets transported over IMS, a voice over IP (VoIP) system.
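To make the JBM's role concrete, the Python sketch below implements the simplest possible fixed-delay jitter buffer. It is a hypothetical illustration, not the EVS JBM, which adapts its buffering delay to the observed network jitter. In this sketch, each packet carries a sequence number, and the playout schedule is anchored on the first packet to arrive. Frame k is due a fixed delay plus k frame periods later. Packets are reordered by sequence number, and packets that arrive after their playout time are dropped, leaving the decoder to conceal the gap. The 60 ms delay and 20 ms frame period are assumed values.

```python
import heapq
from typing import List, Optional, Tuple


class SimpleJitterBuffer:
    """Fixed-delay playout buffer (hypothetical sketch, not the EVS JBM)."""

    def __init__(self, playout_delay_ms: float = 60.0, frame_ms: float = 20.0):
        self.delay = playout_delay_ms
        self.frame_ms = frame_ms
        self.base: Optional[float] = None         # schedule anchor from first arrival
        self.heap: List[Tuple[int, bytes]] = []   # (sequence number, payload)
        self.dropped_late = 0

    def deadline(self, seq: int) -> float:
        """Scheduled playout time (ms) of frame `seq`."""
        return self.base + self.delay + seq * self.frame_ms

    def push(self, seq: int, arrival_ms: float, payload: bytes) -> bool:
        """Buffer a packet; return False if it arrived after its playout time."""
        if self.base is None:
            self.base = arrival_ms - seq * self.frame_ms
        if arrival_ms > self.deadline(seq):
            self.dropped_late += 1
            return False
        heapq.heappush(self.heap, (seq, payload))  # heap keeps sequence order
        return True

    def play(self, seq: int) -> Optional[bytes]:
        """Payload for frame `seq`, or None if missing (decoder would conceal)."""
        while self.heap and self.heap[0][0] < seq:
            heapq.heappop(self.heap)               # discard frames already past
        if self.heap and self.heap[0][0] == seq:
            return heapq.heappop(self.heap)[1]
        return None


# Packets arrive out of order (2 before 1) and frame 4 arrives too late.
jb = SimpleJitterBuffer()
for seq, arrival in [(0, 0.0), (2, 45.0), (1, 50.0), (3, 95.0), (4, 200.0)]:
    jb.push(seq, arrival, f"frame{seq}".encode())
print([jb.play(k) for k in range(5)], "late:", jb.dropped_late)
```

A larger playout delay drops fewer late packets but adds end-to-end delay. An adaptive JBM exists to manage this trade-off continuously.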

The quality of the EVS codec operating in its SWB modes can be seen in Figure 4, which shows the performance of the codec for clean speech (Figure 4a), clean speech with frame losses (Figure 4b), noisy speech (Figure 4c) and music/mixed content (Figure 4d). The tests were performed as part of the independent evaluation of the codec in the EVS Selection Phase.

In almost all cases, the EVS Codec is superior to the reference codecs used to define the requirements. Note that in Figure 4d the reference codecs, although operating at the same bit rate, have significantly longer delays, which makes them unsuitable for conversational applications. Similar performance against the references is achieved in NB and WB.


This level of performance exceeds that of all existing 3GPP codecs, in particular the AMR-WB codec, which led to the creation of the GSMA HD Voice Logo; after all, HD Voice is synonymous with Wideband audio.

Figure 4: Quality of the EVS Codec operating in SWB (Selection test results)

Why Operators should deploy EVS

As described above, the EVS Codec provides a quantum leap in quality and efficiency, which results in business benefits for operators.

As previous studies of HD voice customers have shown, customers notice the difference when they are provided with high-quality voice calls [2], and this naturally leads to longer calls. This extended use brings greater user satisfaction and leads to less churn and/or greater ARPU.

Competition from OTT services such as Skype has naturally been limited by the universal addressing provided by the unique ITU-T E.164 address space. Yet these services have flourished thanks to enhanced audio quality and lower cost. EVS gives mobile operators a real opportunity to devalue the proposition of these OTT providers. Operators can offer a highly competitive audio quality package to both consumers and business/enterprise customers, in addition to the convenience of E.164 addressing.

In addition to its primary modes, the EVS codec has modes that allow it to interoperate with the 3GPP AMR-WB codec while achieving enhanced quality and robustness to packet loss (see Figure 5). This feature allows EVS-enabled phones to communicate directly with AMR-WB VoLTE phones and 2G/3G phones. It also gives operators the flexibility to roll out VoLTE handsets featuring the EVS codec as an alternative to AMR-WB. During this initial phase of EVS deployment, operators will also benefit from enhanced performance of their AMR-WB service.

Figure 5: The EVS Codec operation in AMR-WB I/O Mode
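The interoperation described above amounts to a per-call choice of operating mode. The Python sketch below is a hypothetical simplification, not a 3GPP procedure. It uses EVS primary modes only when both ends support EVS, uses the EVS AMR-WB IO mode when only one end does, and otherwise uses plain AMR-WB. A real EVS handset could equally use a separate AMR-WB implementation, whereas this sketch always prefers the IO mode. The nine bit rates listed are the standard AMR-WB codec modes, with which the AMR-WB IO mode is designed to be bitstream interoperable.

```python
from typing import Optional, Set

# Standard AMR-WB codec modes (kbit/s).
AMR_WB_RATES = (6.6, 8.85, 12.65, 14.25, 15.85, 18.25, 19.85, 23.05, 23.85)


def wideband_capable(caps: Set[str]) -> bool:
    """An EVS endpoint can always speak the AMR-WB format via its AMR-WB IO mode."""
    return "EVS" in caps or "AMR-WB" in caps


def select_mode(caller: Set[str], callee: Set[str]) -> Optional[str]:
    """Hypothetical per-call mode choice from two endpoints' codec capabilities."""
    if "EVS" in caller and "EVS" in callee:
        return "EVS primary"        # full EVS feature set on both ends
    if wideband_capable(caller) and wideband_capable(callee):
        if "EVS" in caller or "EVS" in callee:
            return "EVS AMR-WB IO"  # EVS end talks to a legacy AMR-WB end
        return "AMR-WB"             # legacy HD Voice call
    return None                     # no common wideband codec in this sketch


print(select_mode({"EVS", "AMR-WB"}, {"EVS"}))  # new EVS handsets on both ends
print(select_mode({"EVS"}, {"AMR-WB"}))         # EVS phone calling a 2G/3G phone
```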

EVS impact on VoLTE

To enable EVS services in emerging LTE networks, some network nodes need to be updated in two respects:

1. Media handling enhanced for the EVS codec: SBC, MGW
2. Signaling handling enhanced for SDP Offer/Answer: SBC, AS, MGCF

Figure 6 highlights the necessary network node changes for EVS over VoLTE.
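On the signaling side, the core of SDP Offer/Answer codec negotiation can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions. The offer is a minimal fragment rather than a complete SDP session description, and the dynamic payload types 96/97 and the port are arbitrary. The `EVS/16000` rtpmap entry is assumed to follow the EVS RTP payload format, so check 3GPP TS 26.445 for the exact media-type parameters. The answerer here simply picks the first offered codec it supports, which is one common policy under RFC 3264.

```python
from typing import Dict, List, Optional, Set, Tuple


def offered_codecs(sdp: str) -> List[Tuple[int, str]]:
    """Return (payload type, encoding name) pairs in m=audio preference order."""
    order: List[int] = []
    names: Dict[int, str] = {}
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("m=audio"):
            # m=audio <port> <proto> <fmt> <fmt> ...
            order = [int(pt) for pt in line.split()[3:]]
        elif line.startswith("a=rtpmap:"):
            pt, encoding = line[len("a=rtpmap:"):].split(" ", 1)
            names[int(pt)] = encoding.split("/")[0]
    return [(pt, names.get(pt, "")) for pt in order]


def choose_answer(offer_sdp: str, supported: Set[str]) -> Optional[Tuple[int, str]]:
    """Pick the first offered codec that the answerer supports (simplified policy)."""
    for pt, name in offered_codecs(offer_sdp):
        if name in supported:
            return pt, name
    return None


# Minimal, illustrative offer fragment from an EVS-capable VoLTE handset.
OFFER = """\
m=audio 49152 RTP/AVP 96 97
a=rtpmap:96 EVS/16000
a=rtpmap:97 AMR-WB/16000/1
"""

print(choose_answer(OFFER, {"EVS", "AMR-WB"}))  # EVS-capable peer
print(choose_answer(OFFER, {"AMR-WB"}))         # legacy AMR-WB peer
```

Listing EVS ahead of AMR-WB in the offer expresses a preference without losing the ability to reach legacy peers.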


Full HD Voice proposal in GSMA

The most significant enhancement to VoLTE services provided by the EVS codec will be SWB speech (and in-call music), which goes beyond the wideband audio frequencies associated with the current HD Voice Logo. What’s more, the EVS codec will also be capable of FB speech. This extra audio bandwidth will make a really significant improvement to the user experience of VoLTE systems. If marketed well, it could also provide a valuable selling point for LTE systems and handsets. However, for this strategy to succeed, the new EVS SWB and FB services need to be differentiated from the current HD Voice Logo service in the minds of network operators and consumers alike.

Figure 6: Network Enhancement to Support Full HD Voice (EVS codec) in VoLTE
[Diagram of the VoLTE network (LTE and 2G/3G access, EPC, IMS Core, CS domain, PLMN/PSTN) with the affected nodes highlighted. Media handling enhanced for EVS codec: SBC, MGW. Signaling handling enhanced for SDP O/A: SBC, AS, MGCF.]

The group responsible for developing the HD Voice Logo Requirements within the GSMA, TSG VLR, is in the process of determining priorities for version 3.0; version 2.0 was approved in 2013. The timescales for version 3.0 are well aligned with the Release 12 completion of the EVS Codec standard. The Huawei Media Lab has been actively working within TSG VLR to encourage a new enhancement to the HD Voice Logo that promotes the deployment of SWB services with the EVS Codec.


Figure 7: Example New Logos proposed for SWB and FB variants of the HD Voice Logo in GSMA.

The rationale for a new Logo is that the existing Logo is very well suited to the WB speech services provided by AMR-WB, but the significant improvements in user experience enabled by EVS go far beyond this. Good progress toward this goal has been made, and there is good support for the initiative within the TSG VLR group. The marketing and project management groups within the GSMA are now considering the proposal.

Figure 8: Example GSMA HD Voice Logo with Tag-line.

The proposal made to, and accepted by, TSG VLR was not to adopt a completely new logo. Instead, it builds on the success of the original logo by creating a slightly modified logo, as shown in Figure 7. As an alternative, a tag-line beneath the current logo could be considered, as shown in Figure 8.


Figure 9: Relationship between EVS, Enhancements to EVS and 3GPP 5G Radio Standards.

Future Voice: EVS Beyond 3GPP Release 12

The main players in 3GPP EVS plan to develop a stereo variant of the EVS codec for Release 13 or perhaps Release 14. The preferred approach is for the stereo extensions to be built upon the EVS mono codec operating modes in an embedded manner.
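"Embedded" here means that the stereo information is layered on top of an intact mono bitstream, so a mono-only decoder can discard the extension and still decode the core. No EVS stereo design existed when this paper was written. The Python sketch below therefore uses textbook mid/side coding purely as a hypothetical illustration of the layering idea. The frame layout and function names are assumptions, and no actual quantization or coding is performed.

```python
from typing import Dict, List, Tuple

Frame = Dict[str, List[float]]


def split_embedded(left: List[float], right: List[float]) -> Frame:
    """Build a layered frame: mono core (mid) plus optional stereo extension (side)."""
    mid = [(lv + rv) / 2 for lv, rv in zip(left, right)]
    side = [(lv - rv) / 2 for lv, rv in zip(left, right)]
    return {"core": mid, "stereo_ext": side}


def decode(frame: Frame, stereo_capable: bool = True) -> Tuple[List[float], List[float]]:
    """Stereo decoders use both layers; mono decoders ignore the extension."""
    mid = frame["core"]
    if not stereo_capable or "stereo_ext" not in frame:
        return mid, mid  # play the mono core on both channels
    side = frame["stereo_ext"]
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


frame = split_embedded([1.0, 0.5, -0.25], [0.0, 0.5, 0.25])
print(decode(frame))                        # original left/right restored
print(decode(frame, stereo_capable=False))  # mono-only receiver
```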

The development timescales of EVS and of the extensions to EVS, in relation to the 3GPP developments towards 5G, can be seen in Figure 9.

Beyond stereo, one of the next key areas likely to enhance the perceived audio quality of communication will be binaural rendering and immersive audio. In such a system, the user experiences the full effect of immersion within a recreated sound field. This requires 3-D head-tracking so that, as users move their heads, the sources of all sounds within the sound field change position naturally. This technology is already used in virtual reality gaming, and it represents the next logical step in the evolution of audio and speech communication. The goal is to get ever closer to…

“Just like speaking face to face”.
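The role of head-tracking can be illustrated with the geometry alone. In the hypothetical Python sketch below, a talker has a fixed azimuth in the virtual room. The renderer subtracts the listener's head yaw so that the sound stays put as the head turns. Only yaw (left/right rotation) is handled, and angles are in degrees. The subsequent binaural filtering step, such as selecting a head-related transfer function for the resulting direction, is not shown.

```python
def relative_azimuth(source_az_deg: float, head_yaw_deg: float) -> float:
    """Source direction relative to where the listener faces, wrapped to (-180, 180]."""
    rel = (source_az_deg - head_yaw_deg) % 360.0  # non-negative for a positive modulus
    return rel - 360.0 if rel > 180.0 else rel


# A talker fixed at 30 degrees while the listener turns their head.
for yaw in (0.0, 30.0, 90.0):
    print(f"head yaw {yaw:5.1f} -> render talker at {relative_azimuth(30.0, yaw):6.1f} deg")
```

As the head turns towards the talker, the rendered direction moves the opposite way. This is what keeps the talker anchored in the recreated sound field.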


References

[1] 3GPP TR 22.813, "Study of Use Cases and requirements for enhanced voice codecs for the EPS", v.10.0.0, March 2010.

[2] http://www.gsacom.com/downloads/pdf/GSA_mobile_hd_voice_020614.php4, June 2014.

[3] http://www.gsacom.com/news/gsa_407.php, June 2014.

[4] ftp://ftp.3gpp2.org/TSGAC/Working/2014/20140318_Kyoto/TSG-AC-2014-03-Kyoto/WG1/14_01_20_Position/AC10-20140120-010A_HD-Voice-Annex-C-Minimum-Requirements-with-GSM-UMTS.pdf

[5] http://tools.ietf.org/html/rfc6716

[6] http://www.opus-codec.org/