A Novel Frequency Domain BWE with Relaxed Synchronization and Associated BWE Switching Lei Miao,...


Page 1: A Novel Frequency Domain BWE with Relaxed Synchronization and Associated BWE Switching Lei Miao, Zexin Liu, Xingtao Zhang, Chen Hu, Jon Gibbs Huawei Technologies.

A Novel Frequency Domain BWE with Relaxed Synchronization and Associated BWE Switching

Lei Miao, Zexin Liu, Xingtao Zhang, Chen Hu, Jon Gibbs
Huawei Technologies Co. Ltd, Beijing, China

Kihyun Choo, Eunmi Oh
Samsung Electronics Co., Ltd., Seoul, Korea

Václav Eksler
VoiceAge Corp., Montreal, QC, Canada

Page 2: Agenda

3GPP EVS codec overview

Prior-art review of BWE techniques

Multi-mode FD BWE

Multi-Mode FD BWE with Relaxed Synchronization

BWE Switching Mechanism

Quality Evaluation

Conclusions

Page 3: 3GPP EVS codec overview

The 3GPP Enhanced Voice Services (EVS) codec was standardized in Sep. 2014 and significantly improves the user experience.

The EVS codec encodes the signal based on its content:

Time-Domain LP coding techniques: ACELP and GSC.

GSC (Generic audio Signal Coding) is used to improve the quality of music/mixed segments in the LP domain.

Frequency-Domain coding techniques

Page 4: Prior-art Review of BWE techniques

BWE exploits the intrinsic correlation between the low and high frequency parts of a signal's spectrum in order to reconstruct the high frequency part.

Prior art includes Spectral Band Replication (SBR) in MPEG-4 HE-AAC and a multi-mode bandwidth extension scheme in ITU-T Recommendations G.711.1 Annex D and G.722 Annex B.

SWB is a key feature of EVS as a new 3GPP codec, so extending the bandwidth from WB to SWB/FB is important.

Time domain (TD) or frequency domain (FD) BWE is used for different types of input signals.

This paper focuses on FD BWE on top of either ACELP or GSC. This switched bandwidth extension (BWE) approach improves the coding efficiency of the LP-based coding in EVS.

Page 5: Multi-mode FD BWE

Concept standardized for the first time in ITU-T Recommendations G.711.1 Annex D and G.722 Annex B.

A transient detector identifies rapid variations of the high band signal over time.

Frames are classified into four classes: TRANSIENT (TS), HARMONIC (HM), NORMAL (NM) or NOISE (NS).

A combination of adaptive spectral envelope and time envelope coding, derived from the high band signal, is used:

TRANSIENT frames: four spectral envelopes and four time envelopes.

Non-TRANSIENT frames: fourteen spectral envelopes and no time envelope.

The high frequency band excitation is generated either by normalizing the selected region of the low frequency band with an adaptive normalization length, or by random noise.
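The excitation generation described on this slide can be illustrated with a minimal sketch. It is not the standardized algorithm: the helper names (`normalize_excitation`, `noise_excitation`, `apply_envelopes`) are hypothetical, "normalization" is assumed to mean dividing each low band coefficient by the mean absolute value over a window of L neighbouring coefficients, and the decoded spectral envelopes are assumed to be per-band RMS targets over equal-width sub-bands. The real band layout and the choice of the selected low band region are not given in the slides.

```python
import math
import random


def normalize_excitation(lowband, L):
    """Flatten a low band spectrum by dividing each coefficient by the mean
    absolute value over a window of L neighbouring coefficients.
    Hypothetical helper: the window placement is an assumption."""
    n = len(lowband)
    L = max(1, min(int(L), n))  # window length in coefficients
    out = []
    for k in range(n):
        # Centre the window on k, shifting it inward at the spectrum edges.
        lo = min(max(0, k - L // 2), n - L)
        window = lowband[lo:lo + L]
        mean_abs = sum(abs(x) for x in window) / L
        out.append(lowband[k] / mean_abs if mean_abs > 0 else 0.0)
    return out


def noise_excitation(n, seed=0):
    """Random-noise excitation, the alternative to the normalized low band."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]


def apply_envelopes(excitation, envelopes):
    """Scale equal-width sub-bands so that each band's RMS equals its
    envelope value. Assumes len(excitation) is a multiple of len(envelopes)."""
    band = len(excitation) // len(envelopes)
    out = []
    for b, env in enumerate(envelopes):
        seg = excitation[b * band:(b + 1) * band]
        rms = math.sqrt(sum(x * x for x in seg) / band)
        gain = env / rms if rms > 0 else 0.0
        out.extend(x * gain for x in seg)
    return out
```

Normalizing first removes the low band's own spectral shape, so the envelope step alone determines the high band energy distribution; a longer L preserves more of the local fine structure (e.g. harmonics) relative to the overall level.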

Page 6: FD BWE constraints in EVS

The design constraints imposed on the EVS codec specify that its total algorithmic delay must not exceed 32 ms.

Transform used by FD BWE: an Asymmetric Low Delay Optimized (ALDO) window with a time support of 40 ms and a non-zero window length of 28.75 ms.

This still leaves insufficient delay allowance for the FD BWE to meet the overall 32 ms delay requirement.

The multi-mode BWE in ITU-T G.711.1 Annex D and G.722 Annex B serves as the baseline. The new low delay FD BWE builds on it with a relaxed synchronization scheme.

Page 7: Relaxed synchronization

The relaxed synchronization of multi-mode FD BWE is achieved by exploiting the time difference between the high frequency band excitation and the high frequency band envelope.

Assume that D1 is the delay of the low frequency band coding and D2 is the delay of the high frequency band coding, where D2 is introduced by the windowing prior to the MDCT.

The total algorithmic delay of prior bandwidth extension algorithms is (D1+D2). With the proposed scheme, the delay may be adaptively reduced to any value in the range [max(D1, D2), (D1+D2)] with minimal impact on perceptual quality. When the spectrum is relatively stable, the phase of the high frequency band signal is of relatively minor perceptual significance compared to its energy.

While the delay is reduced, the time alignment between the energy envelopes of the low and high frequency band signals is maintained, whereas a short time misalignment (a relaxation) of the synchronization between the high frequency band excitation and its envelopes is permitted.

Page 8: Time alignment

An asymmetric low delay optimized (ALDO) window is used for FD BWE.

Assume that the target overall delay is Dt, which lies in the range [max(D1, D2), (D1+D2)]. The input time domain signal may then be delayed by (Dt − D2).

[Figure: encoder and decoder timing for frames m−1, m and m+1. Encoder: input signal, low frequency band signal delayed by Dt − D2, high frequency band excitation and high frequency band spectral envelope of frame m. Decoder: decoded low frequency band signal and decoded high frequency band signal of frame m. Marked delays: Dt − D2, Dt, D1 and D1+D2.]

The time misalignment between the high frequency band excitation and the high frequency band envelope is (D1+D2) − Dt. Consequently, the proposed FD BWE achieves a lower delay than (D1+D2).
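The delay bookkeeping of the relaxed synchronization scheme can be sketched as below. This is an illustration only: the function and class names are hypothetical, the input delay Dt − D2 is applied as a plain integer-sample delay line, and the sampling rate used for the ms-to-samples conversion is an assumption supplied by the caller.

```python
def relaxed_sync_delays(d1_ms, d2_ms, dt_ms):
    """Return (input_delay_ms, misalignment_ms) for a target overall delay Dt.

    d1_ms: delay of the low frequency band coding (D1)
    d2_ms: delay of the windowing prior to the MDCT in the FD BWE (D2)
    dt_ms: target overall delay Dt, required to lie in [max(D1, D2), D1 + D2]
    """
    lo, hi = max(d1_ms, d2_ms), d1_ms + d2_ms
    if not lo <= dt_ms <= hi:
        raise ValueError(f"Dt = {dt_ms} ms is outside [{lo}, {hi}] ms")
    input_delay = dt_ms - d2_ms             # delay applied to the input signal
    misalignment = (d1_ms + d2_ms) - dt_ms  # excitation vs. envelope offset
    return input_delay, misalignment


def ms_to_samples(ms, fs_hz):
    """Convert a delay in ms to a whole number of samples at rate fs_hz."""
    return round(ms * fs_hz / 1000)


class InputDelay:
    """Fixed delay line that delays the encoder input frame by frame."""

    def __init__(self, delay_samples):
        self.buf = [0.0] * delay_samples  # samples carried over between frames

    def process(self, frame):
        joined = self.buf + list(frame)
        out = joined[:len(frame)]        # oldest samples leave first
        self.buf = joined[len(frame):]   # keep the newest delay_samples samples
        return out
```

With the EVS values quoted in this deck (D1 = 9.6875 ms, D2 = 8.75 ms, Dt = 12 ms), `relaxed_sync_delays` returns an input delay of 3.25 ms and a misalignment of 6.4375 ms; at an assumed 32 kHz input rate the input delay corresponds to 104 samples.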

Page 9: Transient frames

The time envelope is calculated on the delayed high frequency band time domain signal:

$$ t_{\mathrm{rms}}(j) = \sqrt{\frac{1}{80}\sum_{n=0}^{79} s_{hb}^{2}(n + 80j)}, \quad j = 0,\dots,3 $$

It is then adjusted by an attenuation factor R, which represents the energy attenuation of the low frequency band due to the LP-based low band coding (s_syn and s_ori are the synthesized and original low frequency band signals of length N_LF):

$$ R = \frac{\sum_{n=0}^{N_{LF}-1} s_{syn}^{2}(n)}{\sum_{n=0}^{N_{LF}-1} s_{ori}^{2}(n)} $$

Finally, the time envelope is adjusted:

$$ t_{\mathrm{rms}}(j) = \begin{cases} 1.5\,R\,t_{\mathrm{rms}}(j), & \text{if } R < 0.5 \\ R\,t_{\mathrm{rms}}(j), & \text{else if } R < 1 \\ t_{\mathrm{rms}}(j), & \text{otherwise} \end{cases}, \quad j = 0,\dots,3 $$

Spectral envelopes: multi-stage split VQ. The envelopes at even positions are quantized by split VQs. The prediction errors at odd positions are calculated with interpolation and quantized by another stage of split VQ.
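The time envelope computation for a transient frame can be sketched as follows, assuming (as the "rms" subscript suggests) that t_rms is a root mean square over four 80-sample sub-frames of the delayed high band signal. The function name is hypothetical, and the attenuation factor R and the final adjustment are deliberately left out of this sketch.

```python
import math


def time_envelope(s_hb, sub_len=80, n_sub=4):
    """RMS time envelope of one frame of the delayed high band signal:
    t_rms(j) = sqrt((1/sub_len) * sum over n of s_hb(n + sub_len*j)^2),
    for j = 0 .. n_sub-1 (four 80-sample sub-frames by default)."""
    if len(s_hb) < sub_len * n_sub:
        raise ValueError("frame shorter than sub_len * n_sub samples")
    env = []
    for j in range(n_sub):
        seg = s_hb[j * sub_len:(j + 1) * sub_len]
        env.append(math.sqrt(sum(x * x for x in seg) / sub_len))
    return env
```

A frame whose four quarters have constant amplitudes 1, 2, 0 and −3 yields the envelope [1.0, 2.0, 0.0, 3.0], which is exactly the kind of rapid level change the transient class is meant to track.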

Page 10: Other techniques

Non-Transient frames:

Energy control in each sub-band prevents unpleasant distortion. The distortion may occur due to mismatched characteristics of the original and the generated spectra. Energy control adjusts the energies depending on a comparison of the tonalities of the two spectra, to avoid this distortion.

Adaptive normalization length to generate the high frequency band excitation: the more harmonic the high frequency band is, the longer the normalization length. The length depends on n_h, the number of low frequency band sub-bands whose peak-to-average ratio is larger than a threshold:

$$ L = \begin{cases} 4 + 0.25\,n_h, & \text{if mode} = \mathrm{TS} \\ 8 + 0.5\,n_h, & \text{if mode} = \mathrm{NM} \text{ or } \mathrm{NS} \\ \max(32,\ 24 + 2\,n_h), & \text{if mode} = \mathrm{HM} \end{cases} $$

A pre-echo reduction is performed to improve the performance of fricatives for Non-Transient frames.
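The adaptive normalization length can be sketched as below, assuming the mode-dependent rule reads L = 4 + 0.25 n_h for TS, 8 + 0.5 n_h for NM or NS, and max(32, 24 + 2 n_h) for HM. The HM branch in particular is a reading of the slide's formula and should be checked against the EVS specification. The sub-band length and the peak-to-average threshold used to count n_h are hypothetical parameters, since the deck does not give their values, and how a fractional L is rounded is not stated.

```python
def count_peaky_subbands(lowband, band_len, threshold):
    """n_h: number of low band sub-bands whose peak-to-average ratio of
    magnitudes exceeds `threshold` (band_len and threshold are hypothetical)."""
    n_h = 0
    for start in range(0, len(lowband) - band_len + 1, band_len):
        mags = [abs(x) for x in lowband[start:start + band_len]]
        avg = sum(mags) / band_len
        if avg > 0 and max(mags) / avg > threshold:
            n_h += 1
    return n_h


def normalization_length(n_h, mode):
    """Adaptive normalization length L for mode "TS", "NM", "NS" or "HM"."""
    if mode == "TS":
        return 4 + 0.25 * n_h
    if mode in ("NM", "NS"):
        return 8 + 0.5 * n_h
    if mode == "HM":
        return max(32, 24 + 2 * n_h)
    raise ValueError(f"unknown mode {mode!r}")
```

Under this reading, L grows with n_h in every mode and is longest for HARMONIC frames, which matches the stated intent that more harmonic high bands use a longer normalization length.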

Page 11: EVS implementation details

The proposed relaxed synchronization FD BWE scheme is applied in the EVS codec for WB at 13.2 kbps and for SWB at 13.2 kbps and 32 kbps.

The BWE bit budget for SWB is 31 bits, while it is 6 bits for WB. For WB coding, fewer spectral envelopes and no time envelopes are encoded, since the WB FD BWE covers only the 6-8 kHz range.

In the EVS codec, the delay parameters are Dt = 12 ms (overall delay constraint minus the frame length), D1 = 9.6875 ms (encoder look-ahead plus encoder resampling delay) and D2 = 8.75 ms (overlap length).

This results in a time misalignment of 6.4375 ms between the high frequency band excitation and the high frequency band envelope.
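A quick check of the numbers on this slide, assuming the 20 ms EVS frame length and assuming the BWE bit budgets are per frame (the slide does not state this explicitly); the percentages are relative to the 13.2 kbps operating point:

```latex
\begin{aligned}
D_t &= 32\ \mathrm{ms} - 20\ \mathrm{ms} = 12\ \mathrm{ms} \\
D_1 + D_2 &= 9.6875\ \mathrm{ms} + 8.75\ \mathrm{ms} = 18.4375\ \mathrm{ms} \\
B_{\mathrm{SWB}} &= \frac{31\ \mathrm{bits}}{20\ \mathrm{ms}} = 1.55\ \mathrm{kbit/s} \approx 11.7\,\% \text{ of } 13.2\ \mathrm{kbit/s} \\
B_{\mathrm{WB}} &= \frac{6\ \mathrm{bits}}{20\ \mathrm{ms}} = 0.3\ \mathrm{kbit/s} \approx 2.3\,\% \text{ of } 13.2\ \mathrm{kbit/s}
\end{aligned}
```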

Page 12: BWE Switching Mechanism

In general, TD BWE on top of ACELP performs well when encoding active speech segments and FD BWE on top of GSC performs well when encoding inactive and mixed/music segments. However, some mixed/music segments are better coded with ACELP coding and FD BWE.

If the input signal is classified as music, or the low frequency band signal is classified as inactive, multi-mode FD BWE is used irrespective of whether the low frequency band is coded with ACELP or GSC. Otherwise, if the input signal is classified as speech, TD BWE is used regardless of how the low frequency band has been coded.

When the high frequency band signal is judged to contain inactive or mixed/music content, FD BWE is used as the high band coding technology.

[Figure: three panels plotting bandwidth against coding technology, showing ACELP or GSC low band coding combined with TD-BWE or FD-BWE high band coding.]
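The switching rule above can be written as a small decision function. This is a sketch under stated assumptions: the classifier outputs are taken as booleans, the names `Bwe` and `select_bwe` are hypothetical, and the case that is neither music, inactive nor speech is returned as `None` because the slide does not describe it.

```python
from enum import Enum
from typing import Optional


class Bwe(Enum):
    TD = "TD-BWE"
    FD = "FD-BWE"


def select_bwe(is_music: bool, lowband_inactive: bool,
               is_speech: bool) -> Optional[Bwe]:
    """High band technology choice following the switching rule.
    The core coder (ACELP or GSC) deliberately does not enter the decision."""
    if is_music or lowband_inactive:
        return Bwe.FD   # music or inactive low band: multi-mode FD BWE
    if is_speech:
        return Bwe.TD   # speech: TD BWE, however the low band was coded
    return None         # case not covered by the rule as described
```

Keeping the core coder out of the signature makes the key design point explicit: the BWE choice follows the signal class, so music coded with ACELP still receives FD BWE.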

Page 13: Quality Evaluation

MUSHRA test with 95% confidence intervals, 16 expert listeners, 16 mixed content items and 16 music items. Two variants were evaluated:

Low delay (LD) FD BWE configured to have overall delay Dt = 12 ms.

High delay (HD) FD BWE configured to have overall delay of Dt = (D1+D2) = 18.4375 ms.

EVS SWB at 13.2 kbps.

Low delay FD BWE is statistically equivalent to the high delay FD BWE.

Page 14: Conclusions

A novel multi-mode FD BWE scheme with relaxed synchronization, optimized for inactive and mixed/music content signals, is presented.

It forms part of the LP-based coding of the 3GPP EVS codec.

High subjective quality and low algorithmic delay are achieved by relaxing the time alignment constraints between the high frequency band excitation and its envelope.

A switching mechanism between the two BWE technologies (TD BWE and FD BWE) provides a performance advantage.

Page 15: Thank you!