Impact of the Content on Subjective Evaluation of Audiovisual Quality:
What dimensions influence our perception?
J. Lassalle, L. Gros (Orange Labs), G. Coppin (Télécom Bretagne) & T. Morineau (UBS)
Current Context
Audiovisual (AV) perceived quality results from the interaction between sound quality and image quality:
what are the impacts of audio (A) and/or video (V) degradations on perceived AV quality (AVQ) and, consequently, on overall Quality of Experience (QoE)?
Several studies have shown that perceived quality depends on the test material:
Consequently, methods for evaluating AV quality should take the influence of content into account (audio, video, and the relationship between audio and video) to avoid uncontrolled effects.
Today, only the ITU-T P.911 method is dedicated to AV quality evaluation in a passive viewing context:
It suggests a classification of test sequences, but only on the basis of the characteristics of the audio and video content taken separately.
It therefore provides no recommendations on characterizing the AV event as a whole (i.e., the semantic link between audio and video).
VQEG june 2012
Experiment 1: Expert Characterization
Corpus: 5 contents (dance, theatre, opera, sport, documentary)
Expert characterization: extraction of 9 low-level descriptors (Gwinner & Lalaurette, 2004 (MPEG-7); Amiar, 1995)
5 semantic descriptors: audio-visual relationship/diegesis (sound-in, sound-off, off-screen sound), sound expression (speech, music, sound effects), number of characters (few, some, many), content dynamic (low, moderate, high) and dominant modality (A, V, AV)
4 technical descriptors: brightness (low, moderate, high), color temperature (hot, moderate, cool), camera dynamic (low, moderate, high) and level of detail (low, moderate, high)
=> The entire corpus was characterized using these descriptors.
Experiment 1: Description Material Used
Descriptor               Modes                            Dim.   Level  Used by expert
Brightness               low, moderate, high              Tech.  Low    X
Color temperature        hot, moderate, cool              Tech.  Low    X
Details                  low, moderate, high              Tech.  Low    X
Camera dynamic           low, moderate, high              Tech.  Low    X
AV diegesis              sound-in, sound-off, off-screen  Sem.   Low    X
Sound type               speech, music, sound effects     Sem.   Low    X
Content dynamic          low, moderate, high              Sem.   Low    X
Dominant modality        A, V or AV                       Sem.   Low    X
Number of characters     few, some, many                  Sem.   Low    X
Comprehension            low, moderate, high              Sem.   High
Quantity of information  low, moderate, high              Sem.   High
Interest                 low, moderate, high              Hed.   High
Valence                  9-point scale                    Hed.   High
Arousal                  9-point scale                    Hed.   High
Dim. = dimension (Tech. = technical, Sem. = semantic, Hed. = hedonic); Level = low- or high-level descriptor; X = used in the expert characterization
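For tooling that stores or checks sequence annotations, the descriptor table can be encoded as a small data structure. The sketch below is only an illustration: the names `Descriptor`, `DESCRIPTORS`, `BY_NAME` and `validate` are hypothetical, and encoding the 9-point valence and arousal scales as the strings "1" to "9" is an assumption.

```python
from dataclasses import dataclass

LMH = ("low", "moderate", "high")
# Assumption: the 9-point SAM scales (valence, arousal) are stored as "1".."9"
NINE_POINT = tuple(str(i) for i in range(1, 10))


@dataclass(frozen=True)
class Descriptor:
    """One row of the description-material table (hypothetical encoding)."""
    name: str
    modes: tuple          # allowed values for this descriptor
    dimension: str        # "technical", "semantic" or "hedonic"
    level: str            # "low" or "high"
    used_by_expert: bool  # True for the 9 descriptors of the expert characterization


DESCRIPTORS = [
    Descriptor("brightness", LMH, "technical", "low", True),
    Descriptor("color temperature", ("hot", "moderate", "cool"), "technical", "low", True),
    Descriptor("details", LMH, "technical", "low", True),
    Descriptor("camera dynamic", LMH, "technical", "low", True),
    Descriptor("AV diegesis", ("sound-in", "sound-off", "off-screen"), "semantic", "low", True),
    Descriptor("sound type", ("speech", "music", "sound effects"), "semantic", "low", True),
    Descriptor("content dynamic", LMH, "semantic", "low", True),
    Descriptor("dominant modality", ("A", "V", "AV"), "semantic", "low", True),
    Descriptor("number of characters", ("few", "some", "many"), "semantic", "low", True),
    Descriptor("comprehension", LMH, "semantic", "high", False),
    Descriptor("quantity of information", LMH, "semantic", "high", False),
    Descriptor("interest", LMH, "hedonic", "high", False),
    Descriptor("valence", NINE_POINT, "hedonic", "high", False),
    Descriptor("arousal", NINE_POINT, "hedonic", "high", False),
]
BY_NAME = {d.name: d for d in DESCRIPTORS}


def validate(tags):
    """Return the problems found in one sequence's descriptor tags (empty list if valid)."""
    problems = []
    for name, value in tags.items():
        descriptor = BY_NAME.get(name)
        if descriptor is None:
            problems.append(f"unknown descriptor: {name}")
        elif value not in descriptor.modes:
            problems.append(f"{name}: {value!r} not in {descriptor.modes}")
    return problems
```

For example, `validate({"sound type": "speech", "dominant modality": "AV"})` returns an empty list, while `validate({"brightness": "dark"})` reports one problem.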
Experiment 1: Protocol
Participants: 28 non-experts
Corpus:
20 sequences (8-10 s), extracted from the 5 contents characterized by the expert
Task:
Quality evaluation (P.911 9-point scale):
• AVQ, VQ and AQ (audiovisual, video and audio quality)
Evaluation of 4 of the expert low-level descriptors (the other criteria, being of an unchanging nature, were not re-evaluated):
• dominant modality, color, brightness and content dynamic
Evaluation of five additional high-level descriptors:
• 3 hedonic descriptors: interest, valence and arousal (Self-Assessment Manikin scales)
• 2 semantic descriptors: comprehension and quantity of information
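The slides do not say how the MOS values in the results were computed. A common approach, assumed here, is to average all participants' 9-point ratings for a sequence and report a 95 % confidence interval. This sketch uses a normal approximation (z ≈ 1.96). A Student t quantile (about 2.05 for 28 participants) would give a slightly wider interval. The function name `mos_with_ci` is hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def mos_with_ci(scores, confidence=0.95):
    """Return (MOS, half-width of the confidence interval) for one sequence.

    scores: one rating per participant on the 9-point scale (1..9).
    The interval uses a normal approximation (an assumption, see text).
    """
    if len(scores) < 2:
        raise ValueError("at least two ratings are needed to estimate the spread")
    if any(not 1 <= s <= 9 for s in scores):
        raise ValueError("ratings must lie on the 1..9 scale")
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95 %
    half_width = z * stdev(scores) / math.sqrt(len(scores))
    return mean(scores), half_width
```

The reported interval is then MOS ± half-width.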
Experiment 1: Results
1. Verify the relevance of the descriptors
All semantic, technical and hedonic descriptors, as well as the quality scores, depend significantly on the sequence and, more generally, on the content.
2. Obtain a corpus of AV sequences representing these different descriptors
Experiment 1: Results
Impact of the “Sequence” condition on MOS for AV, A and V qualities
[Figure: MOS (1-9 scale) for AVQ, AQ and VQ as a function of the sequence (1-20)]
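The slides report that descriptors and quality scores depend significantly on the sequence but do not name the test. As an assumption, the sketch below computes a one-way ANOVA F statistic with “sequence” as the factor. Each participant rated every sequence, so a repeated-measures ANOVA would be the more appropriate analysis. This between-groups version only illustrates the sums of squares involved. The p-value can then be read from the F distribution, for example with `scipy.stats.f.sf(F, df_between, df_within)`.

```python
from statistics import mean


def one_way_anova_f(groups):
    """One-way ANOVA F statistic.

    groups: one list of scores per sequence.
    Returns (F, df_between, df_within).
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)
    group_means = [mean(g) for g in groups]
    # Variation of the sequence means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Variation of individual scores around their own sequence mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within
```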
Experiment 2: Content and Degradations, Which Interactions?
Participants: 35 non-experts
Corpus:
200 sequences: 20 sequences (from Experiment 1) × 10 degradation conditions
• AV asynchrony (1500 ms audio delay) (AsynC)
• Audio bitrate variation (64 kbps / 8 kHz) (AbV)
• Video bitrate variations (from 93 to 1600 kbps) (VbV)
• Freezing of frame packets (VfrZ)
• Audio packet loss (10%) (ApL)
• Combined A and V degradations (VbV_ApL, VbV_AbV, VfrZ_ApL, VfrZ_AbV)
together with the undegraded reference (Ref.)
Task:
AV quality assessment (as recommended by P.911: overall AV quality only)
The SEOVQ software was used to run the test and collect participants' judgments.
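The Experiment 2 corpus is a full factorial crossing of sequences and degradation conditions. The sketch below builds the 200 trials and shuffles them for each participant. The condition labels are those used on the result figures. The sequence names S1 to S20, the function `build_plan` and the per-participant random ordering are assumptions, since the slides do not describe how SEOVQ ordered the presentations.

```python
import itertools
import random

SEQUENCES = [f"S{i}" for i in range(1, 21)]  # 20 sequences from Experiment 1 (hypothetical names)
CONDITIONS = ["Ref.", "AbV", "ApL", "VbV", "VfrZ", "AsynC",
              "VbV_ApL", "VbV_AbV", "VfrZ_ApL", "VfrZ_AbV"]  # 10 conditions


def build_plan(participant_id):
    """Return the 20 x 10 = 200 (sequence, condition) trials in a random order.

    Seeding a private generator with the participant id makes each order
    reproducible without touching the global random state.
    """
    trials = list(itertools.product(SEQUENCES, CONDITIONS))
    random.Random(participant_id).shuffle(trials)
    return trials
```

Every participant sees the same 200 trials. Only their order changes.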
Experiment 2: Results
1. Study the potential impacts of low-level descriptors on the overall perceived AV quality, in interaction with various degradations
=> For the asynchrony degradation, discomfort is greater for “verbal” sequences than for “non-verbal” sequences (sound effects / music).
[Figure: Effect of Sound Type. MOS_AV (1-9 scale) per degradation condition (Ref., AbV, ApL, VbV, VfrZ, AsynC, VbV_ApL, VbV_AbV, VfrZ_ApL, VfrZ_AbV) for sequences S9 (speech), S10 (sound effects), S13 (music) and S14 (speech)]
=> S14 is tagged both “speech” and “sound-off”: a diegesis effect.
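One way to quantify the sound-type effect is a difference-of-differences contrast. It takes the MOS drop a degradation causes for one group of sequences (reference minus degraded) and subtracts the same drop for another group. The sketch below illustrates the idea and is not the authors' analysis. The function name and the dictionary layout of the MOS values are assumptions. A contrast on its own says nothing about statistical significance.

```python
def interaction_contrast(mos, group_a, group_b, condition, reference="Ref."):
    """Difference-of-differences contrast between two groups of sequences.

    mos: dict mapping (group, condition) -> MOS_AV,
         e.g. keys ("verbal", "Ref.") and ("verbal", "AsynC").
    Returns how many more MOS points `condition` costs group_a than group_b;
    a positive value means group_a is more affected by the degradation.
    """
    drop_a = mos[(group_a, reference)] - mos[(group_a, condition)]
    drop_b = mos[(group_b, reference)] - mos[(group_b, condition)]
    return drop_a - drop_b
```

With groups "verbal" and "non-verbal" and condition "AsynC", the reported finding corresponds to a positive contrast.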
Experiment 2: Results
2. Obtain a catalog of interactions between degradations and some semantic and/or technical descriptors
Effect of dynamic, modality and interest on AVQ scores
[Figure: MOS_AV (axis 3.9 to 4.7) as a function of content dynamic (low, moderate, high) and of dominant modality (V, A, AV)]
[Figure: MOS_AV (axis 3.9 to 4.7) as a function of interest (low, moderate, high)]
Interactions between dynamic, modality and interest and the “Degradation” factor
[Figure: Illustration of the modality × “Degradation” interaction. MOS_AV (1-9 scale) per degradation condition (Ref., AbV, ApL, VbV, VfrZ, AsynC, VbV_ApL, VbV_AbV, VfrZ_ApL, VfrZ_AbV) for A-, V- and AV-dominant sequences]
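The effect and interaction plots are tables of mean AV scores. They are grouped either by one descriptor (a main effect) or by a descriptor together with the degradation condition (an interaction). A minimal grouping helper is sketched below. It assumes one record per rating and one descriptor value per sequence. The record fields and the name `mos_by` are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def mos_by(trials, *keys):
    """Mean score for every combination of values of the given keys.

    trials: iterable of dicts, one per rating, e.g.
        {"sequence": "S9", "condition": "AsynC", "modality": "AV",
         "dynamic": "high", "interest": "moderate", "score": 6}
    keys: ("dynamic",) gives a main-effect table;
          ("modality", "condition") gives the modality x degradation table.
    """
    groups = defaultdict(list)
    for trial in trials:
        groups[tuple(trial[k] for k in keys)].append(trial["score"])
    return {group: mean(scores) for group, scores in groups.items()}
```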
Conclusions and Standardization Perspectives
It would be relevant to consider:
1. a complete content characterization that takes dominant modality, dynamic, sound type (including a sound-effects class) and diegesis (sound-in/sound-off/off-screen) into account
2. a multi-criteria evaluation in addition to the overall AVQ evaluation, with:
• a separate assessment of audio and video quality (as recommended in P.920)
• a specific question on asynchrony, allowing participants to express their discomfort with this kind of degradation
References
Test material dependency on perceived quality
D. S. Hands, “A basic multimedia quality model,” IEEE Trans. Multimedia, vol. 6, no. 6, pp. 806–816, December 2004
N. F. Dixon and L. Spitz, “The detection of auditory visual desynchrony,” Perception, vol. 9, pp. 719–721, 1980
M. P. Hollier, A. N. Rimell, D. S. Hands and R. M. Voelcker, “Multi-modal perception,” BT Technol. J., vol. 17, pp. 35–46, January 1999
A/V interaction
J. G. Beerends and F. E. De Caluwe, “The influence of video quality on perceived audio quality and vice versa,” J. Audio Eng. Soc., vol. 47, no. 5, pp. 355–362, May 1999
ITU-T Contribution COM 12-19-E, “Relation between audio, video and audiovisual qualities,” KPN, The Netherlands, December 1997
Current Context
Table A.1/P.911 – Video content categories
Category  Description
A         One person, mainly head and shoulders, limited detail and motion
B         One person with graphics and/or more detail
C         More than one person
D         Graphics with pointing
E         High object and/or camera motion beyond the range usually found in video teleconferencing
Current Context
Table A.2/P.911 – Audio content categories
Category  Description
I         Speech/one speaker
II        Speech/multiple speakers
III       Speech + background music
IV        Music/single instrument
V         Music/multiple instruments