Subjective Sound Quality Assessment of Mobile Phones for Production Support
Thorsten Drascher, Martin Schultes
Workshop on Wideband Speech Quality in Terminals and Networks: Assessment and Prediction, 8-9 June 2004, Mainz, Germany
Subjective Audio Quality Assessment, June 2004 — © Siemens, 2004
Introduction
The goal of the tests presented in this talk is to ensure customer acceptance of audio quality based on statistically validated data.
Customers rate the sum of:
Echo cancellation, noise reduction, automatic gain control, …
This conflicts with the ancillary conditions of:
Short test time (no waste of production capacity)
Low cost
Objective measurements correlate only to a limited extent with subjective sound perception.
Subjective audio quality tests are executed before the release for unrestricted serial production.
Former results were often not reliable, due to friendly users and too few tests to guarantee statistical validity.
Presentation Outline
Test Design
Laboratory or in-situ tests?
Laboratory test design
Conversational task
Statistical reliability
First Test Presentation
Overall Quality
Most Annoying Properties
Discussion & Outlook
Test Design
Typical conversation situations for a mobile phone
Single Talk
Double talk
Two different test subject groups
Naive users
Expert users
Different recommended test methods
Absolute category rating
Comparative category rating
Degradation category rating
Threshold Method
Quantal-response detectability tests
Test Design (ctd.)
Iterative procedure:
Naive user tests, carried out as single talk and double talk: absolute category rating of overall quality, plus collection of the most annoying properties
Evaluation of the results
If the results are not satisfying: trained user tests with comparative category rating of different parameter sets on the most annoying properties (with further parameter alteration in parallel), followed by re-evaluation
If the results are satisfying: release for unrestricted serial production
Laboratory or in-situ tests?
In-situ
+ Nothing is more real than reality
+ More interesting for test subjects
- Large effort
- Difficult to control
- Time-intensive
Laboratory
+ Good control
+ Small effort
+ Reproducible conditions
+ Easy control of environmental conditions
- Some effects have to be neglected
- Psychological influence of the laboratory environment on test results
Laboratory tests are much more cost-effective than in-situ tests.
But: how closely can reality be rebuilt in the laboratory?
There should be at least one comparison between laboratory and in-situ tests.
Laboratory test design
Terminal A: fixed network, handheld, specified, silent office environment (e.g. according to ITU-T P.800)
Terminal B: mobile or car kit under test
Reproducible playback of previously recorded environmental noises as a diffuse sound field: silence, car noise, babble noise
Single talk and double talk tests are carried out using different noise levels
Roles within the tests are interchanged
Rating interview with both test subjects
Conversational Tasks
Properties of short conversation test scenarios (SCTs):
Typical conversation tasks, e.g. ordering pizza or booking a flight
Conversation lasts about 2½ min, extended to about 4 min by the following interview
SCTs are judged as natural by test subjects
Formal structure (turns between caller and called person):
Greeting
Enquiry
Question
Precision
Offer
Order
Information / treating of the order
Discussion of open questions
Farewell
[S. Möller, 2000]
Statistical Reliability
Moments of interest are the mean and the error of the mean
The error of the mean is a function of the standard deviation
Worst-case approximation:
The error of the mean is maximised if the highest and lowest possible ratings are each given with a relative frequency of 50%
In this worst case, an error of the mean below 10% of the rating interval width is guaranteed after 30 tests
30 tests of 4 min each result in an overall test duration of about 2 hours
Tests with 3 different background noises at 3 different levels plus a silent environment, over 2 different networks, can be carried out in 40 h (1 week): (3 × 3 + 1) conditions × 2 networks × 2 h = 40 h
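The worst-case bound can be sketched in a few lines. Assuming ratings normalised to an interval of width 1: if half the ratings sit at each scale endpoint, the standard deviation is half the interval width, so the standard error of the mean is 1/(2·√n).

```python
import math

def worst_case_sem_fraction(n: int) -> float:
    """Worst-case standard error of the mean, as a fraction of the
    rating interval width, for n ratings split 50/50 between the two
    scale endpoints (standard deviation = half the interval width)."""
    return 1.0 / (2.0 * math.sqrt(n))

# 30 tests push the worst-case error below 10% of the interval width.
print(worst_case_sem_fraction(30))  # ~0.091
```

With fewer than 25 tests the worst-case bound exceeds 10%, which is why 30 tests are scheduled per condition.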
First Test Presentation
Internal fair at the beginning of May
Not representative, just "testing the test"
Background: babble noise at ~70 dB(A)
Terminal under test:
Known to be too quiet (not known to the test subjects or the experimenter)
Development concluded
Interview only with the mobile terminal user (19 subjects)
Naive user tests with two questions:
What is your opinion of the overall quality of the connection you have just been using?
What were the most annoying properties of the connection you have just been using?
Results given as:
Numbers on a scale from 0 to 120
Predefined answers without technical terms (adding new ones was possible)
Overall Quality
Numbers invisible to test subjects
Average overall rating: 74 ± 4, i.e. (62 ± 3)% of the rating interval width
Start value 60 had the highest relative frequency
To compare the internal scale with standard MOS ratings, a normalisation is required
Rating scale: 0 to 120, labelled Bad / Poor / Fair / Good / Excellent

TS  Rating
 1    38
 2   103
 3    95
 4    60
 5    60
 6    82
 7    81
 8    60
 9    67
10    72
11    90
12    74
13   103
14    73
15    93
16    38
17    60
18    82
19    78
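The summary statistics on this slide can be reproduced from the table; a minimal sketch (ratings transcribed from the table above, error of the mean taken as sample standard deviation over √n):

```python
import math
import statistics

# Overall-quality ratings of the 19 test subjects, transcribed from the table above.
ratings = [38, 103, 95, 60, 60, 82, 81, 60, 67, 72, 90, 74,
           103, 73, 93, 38, 60, 82, 78]

mean = statistics.mean(ratings)
sem = statistics.stdev(ratings) / math.sqrt(len(ratings))  # error of the mean

print(round(mean), round(sem))  # 74 4 — i.e. 74 ± 4, as on the slide
```

Divided by the interval width of 120, this is roughly 62% ± 3% of the rating interval.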
Overall Quality
MOSc: MOS rating intervals with the scale labels in the centre of each interval
Extreme value 5 assigned 5 times (>25%)
Extreme value 1 never assigned
Average overall rating: 3.8 ± 0.2, i.e. (70 ± 5)% of the rating interval width
Rating scale: 0 to 120, labelled Bad / Poor / Fair / Good / Excellent; MOSc on a scale of 1 to 5

TS  Rating  MOSc
 1    38     2
 2   103     5
 3    95     5
 4    60     3
 5    60     3
 6    82     4
 7    81     4
 8    60     3
 9    67     3
10    72     4
11    90     5
12    74     4
13   103     5
14    73     4
15    93     5
16    38     2
17    60     3
18    82     4
19    78     4
Overall Quality
MOSl: MOS rating intervals with the scale labels at the lower end of each interval
The complete range is used
Extreme value 5 assigned twice
Average overall rating: 3.3 ± 0.2, i.e. (58 ± 5)% of the rating interval width
Rating scale: 0 to 120, labelled Bad / Poor / Fair / Good / Excellent; MOSl on a scale of 1 to 5

TS  Rating  MOSl
 1    38     1
 2   103     5
 3    95     4
 4    60     3
 5    60     3
 6    82     4
 7    81     4
 8    60     3
 9    67     3
10    72     3
11    90     4
12    74     3
13   103     5
14    73     3
15    93     4
16    38     1
17    60     3
18    82     4
19    78     3
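The MOSc/MOSl difference can be illustrated by mapping the same 0–120 ratings through two different partitions of the scale. The bin edges below are illustrative assumptions, not a documented part of the study (though the second set happens to reproduce the MOSl column of the table above); the point is that the choice of edges alone shifts the resulting MOS mean.

```python
# Ratings and MOSl values transcribed from the table above.
ratings = [38, 103, 95, 60, 60, 82, 81, 60, 67, 72, 90, 74,
           103, 73, 93, 38, 60, 82, 78]
mosl = [1, 5, 4, 3, 3, 4, 4, 3, 3, 3, 4, 3, 5, 3, 4, 1, 3, 4, 3]

def to_mos(x, upper_edges):
    """Map a 0..120 rating to MOS 1..5 via the given upper bin edges."""
    for mos, upper in enumerate(upper_edges, start=1):
        if x < upper:
            return mos
    return 5

equal_edges = [24, 48, 72, 96]     # five equal intervals (assumed)
shifted_edges = [40, 60, 80, 100]  # alternative partition (assumed);
                                   # reproduces the MOSl column above

assert [to_mos(x, shifted_edges) for x in ratings] == mosl

mean_equal = sum(to_mos(x, equal_edges) for x in ratings) / len(ratings)
mean_shifted = sum(to_mos(x, shifted_edges) for x in ratings) / len(ratings)
print(round(mean_equal, 2), round(mean_shifted, 2))  # 3.63 3.32
```

The gap of about 0.3 MOS between the two conventions is comparable in size to the difference between the MOSc mean of 3.8 and the MOSl mean of 3.3 reported on the slides, which is why the normalisation is called non-trivial.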
Most Annoying Properties
My partner's voice was too quiet
Loud noise during the call
I heard my own voice as an echo
My partner's voice was reverberant
My partner's voice sounded robotic
I heard artificial sounds
*My partner's voice sounded modulated
*My partner's voice was too deep
I heard my partner's voice as an echo
My partner's voice was too loud
*) Properties added during the test
About 50% of the test subjects regarded the partner's voice as too quiet (known beforehand, but not to the subjects or the experimenter)
7 of 8 test subjects regarded the environmental noise as an annoying property
(Bar chart of mentions per property: the two dominant properties were named 9 and 8 times; the remaining properties were named once each.)
Discussion & Outlook
A time-efficient yet intensive subjective test method and a first test were presented.
After ratings from 19 test subjects:
the error of the mean overall quality was about 3% of the rating interval width
the terminal being too quiet was statistically confirmed
Questions and predefined answers have to be chosen very carefully
Normalising scale ratings to MOS is a non-trivial problem
Next steps:
Comparison of laboratory and in-situ tests
Tests of terminals and car kits currently under development