  • Networked Audiovisual Media Technologies

    VISNET

    IST-2003-506946

    D40

    Review of the work done in Audio-Video fusion

    VISNET/WP4.3/D40/V1.0 Page 1/70

  • Document description

    Document’s name: Review of the work done in Audio-Video fusion

    Abstract: This deliverable presents a “state of the art” in multimodal analysis. The main objective of this document is to review the work already done in Audio-Video fusion in person location and identification and video indexing so that research areas can be identified.

    Document Identifier: D40
    Document Class: Deliverable
    Version: V 1.0
    Authors:
    EPFL: Yousri Abdeljaoued
    INESC: Luis Gustavo
    PdM: Marco Marcon, Augusto Sarti
    TUB: Markus Schwab
    UPC: Toni Rama, Francesc Tarrés
    Edited by UPC

    Creation date: 15/04/2004
    Last modification date: 31/05/2004
    Status: Final
    Destination: Consortium
    WP n°: 4.3

  • TABLE OF CONTENTS

    1. INTRODUCTION

    1.1 OVERVIEW OF MULTIMODAL ANALYSIS: PROBLEM STATEMENT
    1.2 OVERVIEW OF THE DELIVERABLE
    1.3 BIBLIOGRAPHY

    2. MEASURING AUDIO FEATURES

    2.1 INTRODUCTION
    2.2 FRAME-LEVEL FEATURES
    2.2.1 VOLUME – SHORT TIME ENERGY (STE) - LOUDNESS
    2.2.2 ZERO CROSS RATE (ZCR)
    2.2.3 BAND ENERGY (BE) AND BAND ENERGY RATIO (BER OR ERSB)
    2.2.4 FREQUENCY CENTROID (FC)
    2.2.5 BANDWIDTH (BW)
    2.2.6 SPECTRAL ROLLOFF POINT
    2.2.7 SPECTRAL FLATNESS MEASURES
    2.2.8 CEPSTRAL COEFFICIENTS (CC)
    2.2.9 MEL FREQUENCY CEPSTRUM COEFFICIENTS (MFCC)
    2.2.10 PITCH OR FUNDAMENTAL FREQUENCY
    2.3 CLIP-LEVEL FEATURES
    2.3.1 VOLUME-BASED
    2.3.2 ENERGY BASED
    2.3.3 ZCR-BASED
    2.3.4 NON-SILENCE RATIO (NSR)
    2.3.5 NOISE FRAME RATIO (NFR)
    2.3.6 PITCH-BASED
    2.3.7 SPECTRUM FLUX (SF)
    2.3.8 BAND PERIODICITY (BP)
    2.3.9 LSP DISTANCE MEASURE
    2.3.10 COMPRESSED DOMAIN AUDIO FEATURES
    2.4 BIBLIOGRAPHY

    3. MEASURING VIDEO FEATURES

    3.1 INTRODUCTION
    3.2 COLOR
    3.3 SHAPE
    3.4 TEXTURE
    3.5 MOTION
    3.6 BIBLIOGRAPHY

    4. STATISTICAL PATTERN RECOGNITION: A REVIEW

    4.1 INTRODUCTION
    4.2 CLASSIFIERS
    4.2.1 BAYESIAN APPROACH
    4.2.2 DISCRIMINANT FUNCTIONS
    4.2.3 LINEAR DISCRIMINANT FUNCTIONS

    4.2.4 PIECEWISE LINEAR DISCRIMINANT FUNCTIONS
    4.2.5 GENERALIZED LINEAR DISCRIMINANT FUNCTIONS
    4.3 CLASSIFIER COMBINATION
    4.3.1 COMBINATION SCHEMES
    4.3.2 TRAINING METHODS OF INDIVIDUAL CLASSIFIERS TO ASSURE INDEPENDENCY
    4.4 BIBLIOGRAPHY

    5. FUNDAMENTALS OF INFORMATION FUSION

    5.1 INTRODUCTION
    5.2 PRE-MAPPING FUSION
    5.2.1 SENSOR DATA LEVEL FUSION
    5.2.2 FEATURE LEVEL FUSION
    5.3 POST-MAPPING FUSION
    5.3.1 DECISION FUSION