Networked Audiovisual Media Technologies
VISNET
IST-2003-506946
D40
Review of the work done in Audio-Video fusion
VISNET/WP4.3/D40/V1.0 Page 1/70
Document description
Document's name: Review of the work done in Audio-Video fusion
Abstract: This deliverable presents the state of the art in multimodal analysis. Its main objective is to review the work already done in audio-video fusion for person location and identification and for video indexing, so that open research areas can be identified.
Document Identifier: D40
Document Class: Deliverable
Version: V 1.0
Authors:
  EPFL: Yousri Abdeljaoued
  INESC: Luis Gustavo
  PdM: Marco Marcon, Augusto Sarti
  TUB: Markus Schwab
  UPC: Toni Rama, Francesc Tarrés
Edited by UPC
Creation date: 15/04/2004
Last modification date: 31/05/2004
Status: Final
Destination: Consortium
WP n°: 4.3
TABLE OF CONTENTS
1. INTRODUCTION ..... 6
  1.1 OVERVIEW OF MULTIMODAL ANALYSIS: PROBLEM STATEMENT ..... 6
  1.2 OVERVIEW OF THE DELIVERABLE ..... 7
  1.3 BIBLIOGRAPHY ..... 8
2. MEASURING AUDIO FEATURES ..... 9
  2.1 INTRODUCTION ..... 9
  2.2 FRAME-LEVEL FEATURES ..... 10
    2.2.1 VOLUME – SHORT TIME ENERGY (STE) - LOUDNESS ..... 10
    2.2.2 ZERO CROSS RATE (ZCR) ..... 10
    2.2.3 BAND ENERGY (BE) AND BAND ENERGY RATIO (BER OR ERSB) ..... 11
    2.2.4 FREQUENCY CENTROID (FC) ..... 11
    2.2.5 BANDWIDTH (BW) ..... 12
    2.2.6 SPECTRAL ROLLOFF POINT ..... 12
    2.2.7 SPECTRAL FLATNESS MEASURES ..... 12
    2.2.8 CEPSTRAL COEFFICIENTS (CC) ..... 13
    2.2.9 MEL FREQUENCY CEPSTRUM COEFFICIENTS (MFCC) ..... 13
    2.2.10 PITCH OR FUNDAMENTAL FREQUENCY ..... 14
  2.3 CLIP-LEVEL FEATURES ..... 15
    2.3.1 VOLUME-BASED ..... 15
    2.3.2 ENERGY BASED ..... 16
    2.3.3 ZCR-BASED ..... 17
    2.3.4 NON-SILENCE RATIO (NSR) ..... 17
    2.3.5 NOISE FRAME RATIO (NFR) ..... 17
    2.3.6 PITCH-BASED ..... 17
    2.3.7 SPECTRUM FLUX (SF) ..... 18
    2.3.8 BAND PERIODICITY (BP) ..... 18
    2.3.9 LSP DISTANCE MEASURE ..... 19
    2.3.10 COMPRESSED DOMAIN AUDIO FEATURES ..... 19
  2.4 BIBLIOGRAPHY ..... 20
3. MEASURING VIDEO FEATURES ..... 22
  3.1 INTRODUCTION ..... 22
  3.2 COLOR ..... 22
  3.3 SHAPE ..... 22
  3.4 TEXTURE ..... 23
  3.5 MOTION ..... 23
  3.6 BIBLIOGRAPHY ..... 23
4. STATISTICAL PATTERN RECOGNITION: A REVIEW ..... 25
  4.1 INTRODUCTION ..... 25
  4.2 CLASSIFIERS ..... 25
    4.2.1 BAYESIAN APPROACH ..... 25
    4.2.2 DISCRIMINANT FUNCTIONS ..... 26
    4.2.3 LINEAR DISCRIMINANT FUNCTIONS ..... 26
    4.2.4 PIECEWISE LINEAR DISCRIMINANT FUNCTIONS ..... 28
    4.2.5 GENERALIZED LINEAR DISCRIMINANT FUNCTIONS ..... 28
  4.3 CLASSIFIER COMBINATION ..... 29
    4.3.1 COMBINATION SCHEMES ..... 29
    4.3.2 TRAINING METHODS OF INDIVIDUAL CLASSIFIERS TO ASSURE INDEPENDENCY ..... 31
  4.4 BIBLIOGRAPHY ..... 31
5. FUNDAMENTALS OF INFORMATION FUSION ..... 33
  5.1 INTRODUCTION ..... 33
  5.2 PRE-MAPPING FUSION ..... 34
    5.2.1 SENSOR DATA LEVEL FUSION ..... 34
    5.2.2 FEATURE LEVEL FUSION ..... 34
  5.3 POST-MAPPING FUSION ..... 35
    5.3.1 DECISION FUSION