
Perceptual wideband speech and audio quality measurement

Dr Antony Rix, Psytechnics Limited

ETSI wideband workshop, 8-9 June 2004

Copyright (c) Psytechnics Limited, 2004.


Agenda

Background

Perceptual models
• BS.1387 PEAQ
• P.862 PESQ
• Scope
• Extension to wideband

Performance of wideband PESQ
• Results for speech
• Results for audio
• Next steps – discussion

AMR-WB case study


Psytechnics background

• Solutions for measuring/monitoring speech, audio, video quality

• Extensive subjective testing background
• Main products are objective quality models (software)
  – Intrusive (P.862 PESQ, …) – for testing
  – Non-intrusive (P.VTQ/psyVoIP, P.563 SEAM/NiQA, P.562 CCI) – for monitoring
• Experience in wideband in both subjective testing and objective models (PAMS, PESQ).


BS.1387 PEAQ

• High-quality audio model for small impairments
• Comparable with BS.1116 subjective tests
• General audio model, not designed or optimised for “wideband speech”
• Mobile/IP multimedia is at edge of or outside scope
• Some issues with accuracy (see BS.1387 for results).

Not currently applicable to 16kHz wideband speech


P.862 PESQ

• Speech quality model for telephony applications
• Comparable with P.800 subjective tests
• Assumes listening through narrowband IRS handset
• Was not extensively tested on perceptual waveform codecs (e.g. MP3, AAC) or with non-speech signals

Not currently applicable to 16kHz wideband speech or audio


P.862 PESQ – scope

[Figure: PESQ block diagram. Labelled elements: reference signal, system under test, degraded signal; level align and input filter (applied to both signals); time align and equalise; auditory transform (applied to both signals); disturbance processing; cognitive modelling; identify bad intervals; re-align bad intervals; prediction of perceived speech quality.]


[Figure: PESQ input filter response. Vertical axis: gain (dB), −50 to 20. Horizontal axis: 0 to 4000 (units not preserved, presumably frequency in Hz). Shown with the same PESQ block diagram as on the previous slide.]

Scope assumes narrowband telephone handset listening, and speech signals


Extending PESQ for wideband speech & audio

[Figure: PESQ wideband input filter response. Vertical axis: gain (dB), −50 to 20. Horizontal axis: 0 to 8000 (units not preserved, presumably frequency in Hz). Shown with the same PESQ block diagram as on the earlier slides.]

Modification proposed in COM12-D7:

Input filter replaced by 100Hz high-pass with 9dB additional gain. No other changes (e.g. same psychoacoustic model).
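As a rough illustration of what "100Hz high-pass with 9dB additional gain" means in signal terms, the sketch below applies a first-order high-pass followed by a fixed gain. The filter order, the recursive design, and the function names are assumptions made for this example only; the slides do not specify the filter design, and this is not the filter used in WPESQ.

```python
import math

def highpass_with_gain(samples, fs_hz, cutoff_hz=100.0, gain_db=9.0):
    """First-order high-pass followed by a fixed gain.

    Illustrative only: the order and design are assumptions,
    not the actual WPESQ input filter.
    """
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)   # analogue RC time constant
    dt = 1.0 / fs_hz                         # sample period
    alpha = rc / (rc + dt)                   # recursive filter coefficient
    gain = 10.0 ** (gain_db / 20.0)          # dB to linear amplitude
    out = []
    prev_x = 0.0
    prev_y = 0.0
    for x in samples:
        y = alpha * (prev_y + x - prev_x)    # high-pass difference equation
        out.append(gain * y)
        prev_x, prev_y = x, y
    return out

def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))

def steady_state_gain(freq_hz, fs_hz=16000, seconds=2):
    """Ratio of output RMS to input RMS for a sine tone, skipping the transient."""
    n = fs_hz * seconds
    tone = [math.sin(2.0 * math.pi * freq_hz * i / fs_hz) for i in range(n)]
    filtered = highpass_with_gain(tone, fs_hz)
    half = n // 2                            # measure over the second half only
    return rms(filtered[half:]) / rms(tone[half:])

# A tone well above the cutoff is amplified; one well below it is attenuated.
print("gain at 1 kHz:", steady_state_gain(1000.0))
print("gain at 10 Hz:", steady_state_gain(10.0))
```

A tone well inside the passband comes out close to the linear equivalent of 9dB (about 2.8 times the input amplitude), while content far below 100Hz is strongly reduced.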


Use of WPESQ

• Select wideband mode whenever headphone listening is used

• Also operates at 8kHz sampling rate (same filter frequency response)

• Be careful about mixing narrowband and wideband PESQ – binaural headphone listening is more sensitive, so the results are different

• Reference signal should normally be full bandwidth


WPESQ results – speech

[Figure: two scatter plots of mapped condition average WPESQ against subjective condition MOS, both axes 1 to 5.
Panel "P.905 PESQ vs. subjective quality, exp1" (Eurescom P905 exp1, multiple audio bandwidths): ρ=95.2%. Legend: wideband codec, narrowband codec, wideband MNRU, narrowband MNRU.
Panel "P.905 PESQ vs. subjective quality, exp2a" (Eurescom P905 exp2a, 8kHz conditions only): ρ=98.1%. Legend: Codec A error-free, Codec A packet loss, Codec B error-free, Codec B packet loss, narrowband MNRU.]
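The ρ values quoted on these plots are correlations between per-condition subjective MOS and per-condition mapped WPESQ scores. A minimal sketch of that calculation follows, assuming a Pearson correlation and assuming that averaging per condition and mapping have already been done. The function name and the example numbers are hypothetical and are not data from the experiments.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical per-condition values, for illustration only (not from the slides).
subjective_mos = [1.5, 2.3, 3.1, 3.8, 4.4]
mapped_wpesq = [1.7, 2.2, 3.3, 3.6, 4.5]

rho = pearson(subjective_mos, mapped_wpesq)
print(f"correlation: {100.0 * rho:.1f}%")
```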


WPESQ results – speech

[Figure: two scatter plots of mapped condition average WPESQ against subjective condition MOS, both axes 1 to 5.
Panel "P.905 PESQ vs. subjective quality, exp2b" (Eurescom P905 exp2b, 16kHz conditions only): ρ=97.7%. Legend: Codec C error-free, Codec C packet loss, Codec D error-free, Codec D packet loss, wideband MNRU.
Panel "BT AES experiment" (multiple audio bandwidths, all conditions): ρ=94.9%.]


WPESQ results – NTT

• Morioka & Takahashi have published an independent evaluation of wideband PESQ
  – Wideband results: 91.2% correlation
  – Main issue is slight offset between G.722.1 and other conditions – will be investigated further
  – Problem with analysis – used narrow-band PESQ for 8kHz (wideband headphone) conditions although WPESQ should be used for this.
  – This caused offset between 8kHz and 16kHz conditions
    • Wideband PESQ is more critical than narrowband
  – 8kHz and overall results not included here.


WPESQ results – audio

• New subjective test by Psytechnics using:
  – 8 audio signals representative of PC and mobile multimedia (advertisement, movies, news documentary, pop music, speech, sports), of duration 8-12sec
  – 20 conditions
  – Range of codecs (AAC, AMR, G.711, G.722, and direct)
  – Range of bandwidths (8, 11.025, 12, 16kHz sample rates)
  – Presented to subjects and model at 16kHz, mono
  – Wideband binaural free field equalised headphones at 76dB SPL
  – Bit-rates from 4.75-256kbit/s


WPESQ results – overall

Test                                         R %
P905 exp 1 (speech)                          95.2
P905 exp 2a (speech)                         98.1
P905 exp 2b (speech)                         97.7
AES107 (speech)                              94.9
NTT wideband results (speech)                91.2
Psytechnics multimedia (16kHz mono audio)    95.2
Overall mean                                 95.4
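The overall mean can be checked directly from the six per-test correlations listed above, assuming it is a plain unweighted average (the slide does not say how it was computed). The variable names are chosen for this example.

```python
# Per-test correlations (R %) as listed in the results summary.
results = {
    "P905 exp 1 (speech)": 95.2,
    "P905 exp 2a (speech)": 98.1,
    "P905 exp 2b (speech)": 97.7,
    "AES107 (speech)": 94.9,
    "NTT wideband results (speech)": 91.2,
    "Psytechnics multimedia (16kHz mono audio)": 95.2,
}

# Unweighted arithmetic mean over the six tests.
overall_mean = sum(results.values()) / len(results)
print(f"{overall_mean:.1f}")  # prints 95.4
```

The unweighted average reproduces the 95.4 figure, so each test contributes equally regardless of how many conditions it contained.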


WPESQ discussion

• WPESQ shows excellent correlation with MOS, comparing favourably with narrowband PESQ.

• Explore issues identified in P905 exp1 and NTT test:
  – Bandwidth and context effect
  – G.722.1 codec

• Can be used for both wideband speech and 16kHz mono audio – e.g. mobile multimedia applications

• Mapping between WPESQ and subjective MOS is required (like P.862.1 MOS-LQO).
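For orientation, the narrowband mapping in P.862.1 is a logistic function from the raw PESQ score to MOS-LQO. The sketch below implements that published narrowband function only to show the form such a mapping takes. The slides do not give a mapping for WPESQ, so these coefficients should not be read as a wideband mapping. The function name is chosen for this example.

```python
import math

def pesq_to_mos_lqo(raw_score):
    """Narrowband P.862.1 mapping from raw PESQ score to MOS-LQO.

    Shown as an example of the form of mapping; a WPESQ mapping
    would need its own coefficients fitted to wideband subjective data.
    """
    return 0.999 + (4.999 - 0.999) / (1.0 + math.exp(-1.4945 * raw_score + 4.6607))

# Raw P.862 scores lie between -0.5 and 4.5.
for raw in (-0.5, 1.0, 2.5, 4.5):
    print(f"raw {raw:4.1f} -> MOS-LQO {pesq_to_mos_lqo(raw):.2f}")
```

The logistic shape compresses the ends of the raw scale, so the mapped score stays within the 1 to 5 opinion scale used in subjective tests.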


Case study – Validation of AMR-WB (G.722.2) floating-point codec

• Fixed-point AMR-WB codec had been approved; needed to validate non-bit-exact floating-point version

• Used WPESQ to compare speech quality of codecs over 1280 test cases.
  – Identified bug in fixed-point codec mode-switching
  – Showed bug was corrected in floating-point and modified fixed-point codecs
  – Found no significant difference in quality between (corrected) fixed-point and floating-point codecs.
  – Took just 2 days of processing and analysis.


Conclusions

• BS.1387 PEAQ and P.862 PESQ not originally designed for wideband speech quality measurement

• By changing PESQ to use an appropriate input filter, WPESQ is able to make accurate quality measurements of wideband speech and 16kHz audio

• WPESQ allows interesting new applications in wideband speech and 16kHz audio quality testing, such as codec development, multimedia quality

• Some issues with subjective tests remain to be explored and further testing is desirable.


References

ITU-T P.800. Methods for subjective determination of transmission quality. Aug 1996.

Rix, A. W. and Hollier, M. P. Perceptual speech quality assessment from narrowband telephony to wideband audio. 107th AES Convention, New York, preprint 5018, September 1999.

ITU-R BS.1387. Method for objective measurements of perceived audio quality. January 1999.

ITU-T P.862. Perceptual evaluation of speech quality (PESQ): An objective method for end-to-end speech quality assessment of narrow-band telephone networks and speech codecs. Feb 2001.

Eurescom P905. AQUAVIT - Assessment of Quality for Audio-Visual signals over Internet and UMTS.

Rix, A. W. et al. Proposed modification to draft P.862 to allow PESQ to be used for quality assessment of wideband speech. ITU-T COM12-D007, Feb 2001.

Morioka, C. and Takahashi, A. Performance evaluation of the wideband PESQ algorithm. ITU-T COM12-D187, April 2004.

Barrett, P. A. and Rix, A. W. Verification of floating-point implementation of AMR-WB using Wideband-PESQ. 3GPP Tdoc S4 (02)0049r1 and S4 (02)0124, Feb 2002.

Dr Antony Rix, Psytechnics Limited
