Part 2 Visual Processing and Saliency -...


Page 1: Part 2 Visual Processing and Saliency - COGNIMUSEcognimuse.cs.ntua.gr/sites/default/files/ICASSP2017... · 2018-09-07 · Tutorial: Multimodal Signal Processing, Saliency and Summarization

Computer Vision, Speech Communication & Signal Processing Group, National Technical University of Athens, Greece (NTUA)

Robotic Perception and Interaction Unit,

Athena Research and Innovation Center (Athena RIC)

Part 2: Visual Processing and Saliency

Petros Koutras


Tutorial at IEEE International Conference on Acoustics, Speech and Signal Processing 2017, New Orleans, USA, March 5, 2017

Page 2

Visual Processing and Saliency

Diagram: spatio-temporal processing; visual saliency models; eye fixation prediction; framewise saliency

Page 3

Part 2: Outline

Visual Saliency and Attention

State-of-the-Art in Visual Saliency

Spatio-Temporal Framework for Visual Saliency

Applications: Eye Fixation Prediction, Framewise Saliency

Page 4

Visual Saliency and Attention

Visual attention: top-down, task-driven; high-level topics

Visual saliency: bottom-up, data-driven; low-level sensory cues

Applications: systems for selecting the most important regions of a large amount of visual data; movie summarization; visual front-end for other applications.

Page 5

Visual Saliency: Approaches, Measurements

Predict viewers' fixations in both space and time: eye-tracking data from different users (CRCNS, Eye-Tracking Movie Database (ETMD))

Detect salient objects: hand-annotated databases

Framewise saliency: find the frames that are more salient than the others; COGNIMUSE annotated database

Page 6

State-of-the-Art in Visual Saliency

Page 7

Feature Integration Theory (FIT)

[A. Treisman and G. Gelade, “A feature integration theory of attention”, Cognit. Psychol., 1980.][A. Treisman and S. Sato, “Conjunction search revisited”, J. of experimental psychology: human perception and performance, 1990.]

We can detect and identify separable features in parallel across a display. This early, parallel process of feature registration mediates texture segregation and figure-ground grouping. Locating any individual feature requires an additional operation.

Conjunctions require focal attention to be directed serially to each relevant location. They do not mediate texture segregation, and they cannot be identified without also being spatially localized.

Page 8

Saliency Map Concept

[C. Koch and S. Ullman, “Shifts in selective visual attention: towards the underlying neural circuitry”, Human Neurobiol., 1985.]
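In the Koch and Ullman scheme, a winner-take-all (WTA) stage selects the most salient location on the saliency map. Inhibition of return (IOR) then suppresses that location so attention moves on to the next one. The Python sketch below runs this loop on a precomputed map. The disk-shaped inhibition region, the function name `attend_sequence` and the `ior_radius` parameter are illustrative assumptions and do not come from the tutorial.

```python
import numpy as np

def attend_sequence(saliency, n_fixations=3, ior_radius=5):
    """Sketch of winner-take-all selection with inhibition of return (IOR)."""
    s = np.asarray(saliency, dtype=float).copy()
    rows, cols = np.indices(s.shape)
    fixations = []
    for _ in range(n_fixations):
        # Winner-take-all: the most salient remaining location is attended next.
        r, c = np.unravel_index(np.argmax(s), s.shape)
        fixations.append((int(r), int(c)))
        # Inhibition of return: suppress a disk around the attended location
        # so that attention shifts to the next most salient region.
        s[(rows - r) ** 2 + (cols - c) ** 2 <= ior_radius ** 2] = -np.inf
    return fixations
```

On a map with three isolated peaks, the peaks are visited in decreasing order of saliency.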

Page 9

Saliency Map

Figure: original image and estimated saliency map

Spatial Saliency Benchmarks: http://saliency.mit.edu/index.html

Page 10

First Computational Model (Itti et al. 1998)

[L. Itti, C. Koch and E. Niebur, “A model of saliency-based visual attention for rapid scene analysis”, IEEE Trans. PAMI, 1998.]
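The Itti et al. model computes center-surround differences between a fine "center" scale c and a coarser "surround" scale s = c + δ. It passes each difference through a normalization operator N(·) and sums the results into a conspicuity map. The sketch below covers only the intensity channel and rests on several assumptions. Center scales c ∈ {2, 3, 4} and offsets δ ∈ {3, 4} follow the usual setting of the paper. A full-resolution Gaussian stack (σ = 2^s) replaces the subsampled pyramid so that no resizing is needed. `normalize_map` is a simplified stand-in for N(·), and its neighbourhood `size` is arbitrary.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, maximum_filter

def normalize_map(fmap, size=15):
    """Simplified N(.) operator: promote maps with one strong peak over maps with many."""
    fmap = fmap - fmap.min()
    if fmap.max() > 0:
        fmap = fmap / fmap.max()                      # rescale to [0, 1]
    # Local maxima: pixels equal to the maximum of their size x size neighbourhood.
    is_peak = (fmap == maximum_filter(fmap, size=size)) & (fmap > 0)
    others = fmap[is_peak & (fmap < 1.0)]             # all local maxima except the global one
    m_bar = others.mean() if others.size else 0.0
    return fmap * (1.0 - m_bar) ** 2

def intensity_conspicuity(intensity, centers=(2, 3, 4), deltas=(3, 4)):
    """Sum of normalized center-surround differences |I(c) - I(c + delta)|."""
    img = np.asarray(intensity, dtype=float)
    scales = sorted(set(centers) | {c + d for c in centers for d in deltas})
    # Full-resolution Gaussian stack (sigma = 2**s) standing in for a subsampled pyramid.
    blurred = {s: gaussian_filter(img, sigma=2.0 ** s) for s in scales}
    total = np.zeros_like(img)
    for c in centers:
        for d in deltas:
            total += normalize_map(np.abs(blurred[c] - blurred[c + d]))
    return normalize_map(total)
```

Color and orientation channels would follow the same center-surround and normalization pattern on their own feature maps.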

Page 11

Spatio-temporal Extension (Itti et al.)

[L. Itti, N. Dhavale and F. Pighin, “Realistic avatar eye and head animation using a neurobiological model of visual attention”, SPIE 48th Annual International Symposium on Optical Science and Technology, 2003.]

5 feature maps: 3 static features (intensity, color, orientation) and 2 spatio-temporal features (flicker, motion)
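Of the two spatio-temporal features, flicker is the easier one to illustrate. A common formulation takes the absolute intensity change between consecutive frames, and the result then goes through the same center-surround and normalization stages as the static features. Below is a minimal Python sketch under that assumption, with intensity taken as the mean of the RGB channels. The motion feature is not shown.

```python
import numpy as np

def intensity(frames_rgb):
    """Intensity channel I = (r + g + b) / 3 for a clip shaped (T, H, W, 3)."""
    return np.asarray(frames_rgb, dtype=float).mean(axis=-1)

def flicker(frames_rgb):
    """Flicker feature: |I(t) - I(t-1)| for consecutive frames, shape (T-1, H, W)."""
    return np.abs(np.diff(intensity(frames_rgb), axis=0))
```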

Page 12

Graph-Based Visual Saliency (GBVS)

Markovian approach:

Dissimilarity between the pixels (i,j) and (p,q) of the feature map M(i,j): $d((i,j)\,\|\,(p,q)) = \left|\log \frac{M(i,j)}{M(p,q)}\right|$

Weight for the edge from node (i,j) to node (p,q): $w((i,j),(p,q)) = d((i,j)\,\|\,(p,q)) \cdot F(i-p,\, j-q)$, with $F(a,b) = \exp\left(-\frac{a^2+b^2}{2\sigma^2}\right)$

Define a Markov chain on the graph: normalize the weights of the outbound edges, so that nodes act as states and weights as transition probabilities. Find the equilibrium distribution of this chain.

Stages for visual saliency extraction:

Extraction: extract feature vectors (intensity, color, orientation)

Activation: form the activation maps from the feature vectors

Normalization/Combination: normalize the activation maps and combine them into a single map

[J. Harel, C. Koch and P. Perona, “Graph-based visual saliency”, NIPS 2006.]
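The GBVS activation step can be sketched in a few lines for a small feature map. Every pixel is a node. The edge weight is the dissimilarity |log(M(i,j)/M(p,q))| multiplied by the Gaussian distance factor F, outbound weights are normalized into transition probabilities, and the chain's equilibrium distribution serves as the activation map. Assumptions: the default σ of 0.15 times the map size, the small `eps` that keeps the logarithm finite, and the uniform fallback for nodes with no outbound weight. The dense n × n construction only suits coarse maps.

```python
import numpy as np

def gbvs_activation(fmap, sigma=None, eps=1e-12):
    """Equilibrium distribution of the GBVS Markov chain on a small feature map M."""
    M = np.asarray(fmap, dtype=float) + eps            # keep the logarithm finite
    H, W = M.shape
    n = H * W
    if sigma is None:
        sigma = 0.15 * max(H, W)                       # assumed neighbourhood scale
    rows, cols = np.indices((H, W))
    pos = np.stack([rows.ravel(), cols.ravel()], axis=1)
    vals = M.ravel()
    # Dissimilarity d((i,j)||(p,q)) = |log(M(i,j) / M(p,q))| for every node pair.
    d = np.abs(np.log(vals[:, None] / vals[None, :]))
    # Distance weighting F(i-p, j-q) = exp(-((i-p)^2 + (j-q)^2) / (2 sigma^2)).
    sq_dist = ((pos[:, None, :] - pos[None, :, :]) ** 2).sum(axis=2)
    w = d * np.exp(-sq_dist / (2.0 * sigma ** 2))
    # Normalize outbound edge weights: row i becomes the transition probabilities from node i.
    out = w.sum(axis=1, keepdims=True)
    P = np.where(out > 0, w / np.where(out > 0, out, 1.0), 1.0 / n)
    # Equilibrium: solve pi P = pi together with sum(pi) = 1.
    A = np.vstack([P.T - np.eye(n), np.ones((1, n))])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi = np.linalg.lstsq(A, b, rcond=None)[0]
    return pi.reshape(H, W)
```

Both d and F are symmetric, so in this sketch the equilibrium is proportional to each node's total edge weight, which makes a quick sanity check. The linear solve is kept anyway so the code mirrors the Markov-chain description.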

Page 13

Adaptive Whitening Saliency (AWS)

Diagram: chromatic decomposition; oriented multiscale decomposition with log-Gabor filters; whitening

[A. Garcia-Diaz, X.R. Fernandez-Vidal, X.M. Pardo and R. Dosil, “Saliency from hierarchical adaptation through decorrelation and variance normalization”, Image Vis. Comput., 2012.]
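AWS whitens its responses, meaning it decorrelates them and normalizes their variance. Distance from the bulk of the whitened distribution can then serve as saliency. The sketch below shows only this final step, in that spirit. It assumes the chromatic and log-Gabor decomposition has already produced a per-pixel feature stack of shape (H, W, K). A single global PCA whitening stands in for the hierarchical whitening of the paper, and the squared norm of the whitened vector (a Hotelling T²-like score) is the readout.

```python
import numpy as np

def whitened_saliency(features, eps=1e-12):
    """Squared norm of PCA-whitened per-pixel feature vectors; features is (H, W, K)."""
    H, W, K = features.shape
    X = features.reshape(-1, K).astype(float)
    X = X - X.mean(axis=0)                              # center each feature
    cov = np.atleast_2d(np.cov(X, rowvar=False))
    evals, evecs = np.linalg.eigh(cov)                  # decorrelating rotation
    Z = (X @ evecs) / np.sqrt(np.maximum(evals, eps))   # unit variance per component
    return (Z ** 2).sum(axis=1).reshape(H, W)           # distance from the bulk
```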

Page 14

Scene Context (GIST)

From very brief exposure to a scene, we can already extract a lot of information about its global structure, its category and some of its components.

[A. Torralba, A. Oliva, M. Castelhano and J. M. Henderson, “Contextual Guidance of Attention in Natural scenes: The role of Global features on object search”, Psychological Review, 2006.]

Page 15

Saliency Using Natural Scene Statistics (SUN)

Static model of natural image statistics: the statistics are learned from a collection of natural images rather than from the current image, which lends itself to a very fast computational framework

Spatio-temporal extension: SUNDAy (dynamic analysis of scenes)

[L. Zhang, M.H. Tong, T.K. Marks, H. Shan and G.W. Cottrell, “SUN: a Bayesian framework for saliency using natural statistics”, J. Vis., 2008.][L. Zhang, M.H. Tong and G.W. Cottrell, “SUNDAy: Saliency using natural statistics for dynamic analysis of scenes”, 31st annual cognitive science conference, 2009.]

Page 16

Bayesian and Surprise Models

[L. Itti and P. Baldi, “Bayesian surprise attracts human attention”, NIPS 2005.]

Page 17

Spatial Bayesian Surprise

Spatial surprise from an image region, to account for saliency due to contrast with its context:

prior: feature distribution in the spatial context (surroundings)

posterior: distribution after observing the region of interest

Extension of the visual attention model through the use of surprise values instead of raw feature maps

[I. Gkioulekas, G. Evangelopoulos and P. Maragos, “Spatial Bayesian Surprise for Image Saliency and Quality Assessment”, ICIP 2010.]
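Surprise is the KL divergence between posterior and prior beliefs. A rough sketch of the spatial variant relies on assumptions that are not taken from the cited paper. Feature values are quantized into histogram bins, the prior is a Dirichlet-smoothed histogram of a square surround ring, and the posterior adds the counts of the center window. Window sizes, `bins` and the pseudo-count `alpha` are illustrative.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def spatial_surprise(feat, center=3, surround=9, bins=16, alpha=1.0):
    """KL(posterior || prior) per pixel; prior from the surround ring, posterior adds the center."""
    f = np.asarray(feat, dtype=float)
    edges = np.linspace(f.min(), f.max(), bins + 1)
    q = np.clip(np.digitize(f, edges[1:-1]), 0, bins - 1)      # histogram bin of each pixel
    onehot = (q[..., None] == np.arange(bins)).astype(float)    # (H, W, bins)
    # Histogram counts in square windows via box filters (window mean times window area).
    c_cnt = uniform_filter(onehot, size=(center, center, 1)) * center ** 2
    s_cnt = uniform_filter(onehot, size=(surround, surround, 1)) * surround ** 2
    ring = np.clip(s_cnt - c_cnt, 0.0, None)                    # surround minus center
    prior = ring + alpha
    post = ring + c_cnt + alpha
    prior /= prior.sum(axis=-1, keepdims=True)
    post /= post.sum(axis=-1, keepdims=True)
    return (post * np.log(post / prior)).sum(axis=-1)
```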

Page 18

AIM (Attention by Information Maximization)

Independent (sparse) coding

Goal: quantify the likelihood of observing a local patch/region of the image

The likelihood is related to self-information via -log p(x)

[N. Bruce and J. Tsotsos, “Saliency based on information maximization”, NIPS 2005.]
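Self-information −log p(x) is large wherever a feature value is rare in the image. The Python sketch below computes it with one simplification: AIM estimates densities of learned independent (ICA) coefficients, whereas here each channel of a given feature array gets its own histogram. The channels are treated as independent, so their self-information values add. `bins` and the probability floor are assumptions.

```python
import numpy as np

def self_information(features, bins=32):
    """Sum over channels of -log p(x), with p estimated by a per-channel histogram."""
    f = np.asarray(features, dtype=float)
    if f.ndim == 2:
        f = f[..., None]
    info = np.zeros(f.shape[:2])
    for k in range(f.shape[-1]):
        x = f[..., k]
        hist, edges = np.histogram(x, bins=bins)
        p = hist / hist.sum()
        idx = np.clip(np.digitize(x, edges[1:-1]), 0, bins - 1)  # histogram bin of every pixel
        # Independence assumption: channel self-information values add up.
        info += -np.log(np.maximum(p[idx], 1.0 / x.size))
    return info
```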

Page 19

Incremental Coding Length

Measure entropy gain of each feature

Maximize entropy across sample features

Select features with large coding length increment

[X. Hou and L. Zhang, “Dynamic visual attention: searching for coding length increments”, NIPS 2009.]

Page 20

Discriminant / Decision Theoretic Saliency

Derived explicitly from a minimum Bayes error definition; the class label “c” applies to center/surround, but also to other classes (e.g. face vs. null hypothesis)

[D. Gao and N. Vasconcelos, “Discriminant saliency for visual recognition from cluttered scenes”, NIPS 2004.][D. Gao, S. Han and N. Vasconcelos, “Discriminant saliency, the detection of suspicious coincidences, and applications to visual recognition”, IEEE Trans. PAMI, 2009.]

Page 21

Rarity Based Saliency

Considers rarity of features (both local and global, including self-information)

Multi-scale approach reminiscent of Itti et al.

Normalization/Whitening across color inputs and across scale, weighted combination/fusion

[N. Riche, M. Mancas, M. Duvinage, M. Mibulumukini, B. Gosselin and T. Dutoit, “Rare2012: a multi-scale rarity-based saliency detection with its comparative statistical analysis”, Sig. Proc.: Im. Com., 2013.]

Page 22

Saliency by Self-Resemblance

Local structure represented by matrix of local descriptors (steering kernels robust to noise/image distortions)

Matrix cosine similarity forms a metric for the resemblance of a pixel to its surround

Amounts to an estimate of the likelihood of the local feature matrix given the feature matrices of the pixels in the surround

[H.J. Seo and P. Milanfar, “Static and space-time visual saliency detection by self-resemblance”, J. Vis., 2009.]
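Self-resemblance compares each pixel's local feature matrix with the matrices in its surround using matrix cosine similarity ρ (a Frobenius inner product of unit-norm matrices). Saliency is assumed here to take the form S_i = 1 / Σ_j exp((ρ(F_i, F_j) − 1)/σ²). The paper builds descriptors from steering kernels, but in this sketch any per-pixel descriptor array of shape (H, W, d) stands in for them. The window sizes `p` and `n` and the value of `sigma` are illustrative assumptions.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def local_matrices(feat, p=3):
    """Stack the descriptors in each p x p window into one flattened matrix per pixel."""
    pad = p // 2
    f = np.pad(feat, ((pad, pad), (pad, pad), (0, 0)), mode="reflect")
    win = sliding_window_view(f, (p, p), axis=(0, 1))          # (H, W, d, p, p)
    H, W, d = feat.shape
    return win.reshape(H, W, d * p * p)

def self_resemblance(feat, p=3, n=7, sigma=0.07):
    """S_i = 1 / sum_j exp((rho(F_i, F_j) - 1) / sigma^2) over an n x n surround."""
    F = local_matrices(np.asarray(feat, dtype=float), p)
    F = F / (np.linalg.norm(F, axis=2, keepdims=True) + 1e-12)  # unit Frobenius norm
    r = n // 2
    Fp = np.pad(F, ((r, r), (r, r), (0, 0)), mode="edge")
    H, W, _ = F.shape
    total = np.zeros((H, W))
    for dy in range(n):
        for dx in range(n):
            rho = (F * Fp[dy:dy + H, dx:dx + W]).sum(axis=2)  # matrix cosine similarity
            total += np.exp((rho - 1.0) / sigma ** 2)
    return 1.0 / total
```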

Page 23

Boolean Map Based Saliency

Generate a set of Boolean maps by randomly thresholding the input image’s feature maps in the CIE Lab color space (perceptually uniform)

Given a Boolean map B, BMS computes an attention map A(B) based on a Gestalt principle for figure-ground segregation: surrounded regions are more likely to be perceived as figures

All attention maps are linearly combined into a full resolution mean attention map

[J. Zhang and S. Sclaroff, “Saliency detection: A boolean map approach”, CVPR 2013.]
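The core of BMS is the surroundedness test: a connected region of a Boolean map, or of its complement, counts toward attention only if it does not touch the image border. The slide describes random thresholds. For reproducibility, this sketch instead uses evenly spaced thresholds and assumes the channels are already in CIE Lab. It also omits further refinements of the original method, and the final smoothing `sigma` is illustrative.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, label

def surrounded(mask):
    """Keep only the connected regions of a Boolean map that do not touch the border."""
    lab, _ = label(mask)
    border = np.unique(np.concatenate([lab[0], lab[-1], lab[:, 0], lab[:, -1]]))
    return mask & ~np.isin(lab, border)

def boolean_map_saliency(channels, n_thresholds=8, sigma=2.0):
    """Mean attention over Boolean maps (and their complements) of each feature channel."""
    att = np.zeros(np.shape(channels[0]), dtype=float)
    for ch in channels:
        ch = np.asarray(ch, dtype=float)
        # Evenly spaced thresholds strictly between the channel's min and max.
        for t in np.linspace(ch.min(), ch.max(), n_thresholds + 2)[1:-1]:
            b = ch > t
            att += surrounded(b)     # surrounded 'on' regions
            att += surrounded(~b)    # surrounded 'off' regions
    if att.max() > 0:
        att /= att.max()
    return gaussian_filter(att, sigma)
```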

Page 24

Spectral Saliency Estimation

Phase-only Fourier Transform (PFT): all you need is the phase!

Phase spectrum of Quaternion Fourier Transform (PQFT): computes the grayscale image, the color-opponent images and the frame-difference image in one quaternion transform.

[X. Hou and L. Zhang, “Saliency detection: a spectral residual approach”, CVPR 2007.][C. Guo, Q. Ma and L. Zhang, “Spatio-temporal saliency detection using phase spectrum of quaternion fourier transform”, CVPR 2008.] [C. Guo and L. Zhang, “A novel multiresolution spatiotemporal saliency detection model and its applications in image and video compression”, IEEE Trans. Image Process., 2010.]
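Phase-only saliency sets every Fourier amplitude to one, keeps the phase, inverts the transform and smooths the squared magnitude. A single-channel Python sketch follows. The quaternion (PQFT) version is not shown, and the smoothing `sigma` is an assumption.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def pft_saliency(img, sigma=2.0):
    """Phase-only Fourier transform saliency: keep the phase, drop the amplitude."""
    spectrum = np.fft.fft2(np.asarray(img, dtype=float))
    phase_only = np.exp(1j * np.angle(spectrum))       # unit amplitude at every frequency
    recon = np.fft.ifft2(phase_only)
    return gaussian_filter(np.abs(recon) ** 2, sigma)  # smoothed squared magnitude
```

Because spectral methods have no explicit scale parameter, the input would usually be resized to a small resolution such as 64x48 before calling this function.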

Page 25

More on Spectral Saliency

No scale parameter in spectral saliency? Scale is the size! [32x24], [64x48] and [128x96] are reasonable choices.

PQFT [Guo et al., CVPR 2008]: compute the frame difference as the “motion channel”, then apply spectral saliency (separately or using the quaternion).

Spectral saliency in the real domain:

Image Signature (SIG) [Hou et al., PAMI 2012]: ImageSignature = sign(dct2(img));

QDCT [Schauerte et al., ECCV 2012]: extends the image signature to the quaternion DCT.

Figure panels: 64x48 and 681x511

[C. Guo, Q. Ma and L. Zhang, “Spatio-temporal saliency detection using phase spectrum of quaternion fourier transform”, CVPR 2008.][X. Hou, J. Harel and C. Koch, “Image signature: highlighting sparse salient regions”, IEEE Trans. PAMI 2012.][B. Schauerte and R. Stiefelhagen, “Quaternion-based spectral saliency detection for eye fixation prediction”, ECCV 2012.]
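`ImageSignature = sign(dct2(img))` translates directly to Python. Saliency is taken as the smoothed squared reconstruction of the signature. This sketch assumes orthonormal DCTs from `scipy.fft`, a single grayscale channel and an illustrative smoothing `sigma`. The quaternion extension (QDCT) is not shown.

```python
import numpy as np
from scipy.fft import dctn, idctn
from scipy.ndimage import gaussian_filter

def image_signature_saliency(img, sigma=1.0):
    """ImageSignature = sign(dct2(img)); saliency = smoothed squared reconstruction."""
    signature = np.sign(dctn(np.asarray(img, dtype=float), norm="ortho"))
    recon = idctn(signature, norm="ortho")
    return gaussian_filter(recon ** 2, sigma)
```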

Page 26

Machine Learning Techniques

Still images [Judd et al., CVPR 2009]:

Features: low level (luminance, orientation, color); mid level (vanishing point, horizon line); high level (face detection, object detection)

Linear Support Vector Machine, tested on single features and on all features

Ensemble of Deep Networks (eDN) [Vig et al., CVPR 2014]: features from 1-3 layer networks; SVM-based training on fixated and non-fixated regions

Video saliency [Rudoy et al., CVPR 2013]:

Candidate extraction: static (GBVS), motion (optical flow, DoG), semantic (face and body estimation)

Modeling gaze dynamics: gaze transitions for training; learning transition probabilities

[T. Judd, K. Ehinger, F. Durand and A. Torralba, “Learning to predict where humans look”, CVPR 2009.][E. Vig, M. Dorr and D. Cox, “Large-scale optimization of hierarchical features for saliency prediction in natural images”, CVPR 2014.][D. Rudoy, D.B. Goldman, E. Shechtman and L. Zelnik-Manor, “Learning video saliency from human gaze using candidate selection”, CVPR 2013.]

Page 27

Task-specific Learning Techniques

Based on bottom-up saliency and gist descriptors

Employed for task-specific or multi-task eye-tracking prediction in spatio-temporal stimuli

[A. Borji, D.N. Sihite and L. Itti, “Probabilistic learning of task-specific visual attention”, CVPR 2012.][J. Li, Y. Tian, T. Huang and W. Gao, “Probabilistic multi-task learning for visual saliency estimation in video”, Int. J. Comp. Vis., 2010.]

Page 28

CNN-based Saliency Models

Adaptation of CNN models trained for visual recognition tasks

Linear combination of different layers and Gaussian blurring

Multiscale information

Objective functions that optimize common saliency evaluation metrics

[M. Kümmerer, L. Theis, and M. Bethge, “Deep Gaze I: Boosting Saliency Prediction with Feature Maps Trained on ImageNet”, ICLR Workshop 2015]

[X. Huang, C. Shen, X. Boix and Q. Zhao, “SALICON: Reducing the Semantic Gap in Saliency Prediction by Adapting Deep Neural Networks”, ICCV 2015.]

Page 29

Patch-based CNN Saliency Model

Extract fixation and non-fixation image regions to train an end-to-end binary multiresolution CNN

At test time, compose the maps of small image regions to construct the final saliency map

[N. Liu, J. Han, D. Zhang, S. Wen and T. Liu, “Predicting eye fixations using convolutional neural networks”, CVPR 2015.]

Page 30

Loss Functions for End-to-End Saliency Mapping

Saliency is a dense prediction problem:

Standard loss functions for regression

Losses based on probability distance measures

[S. Jetley, N. Murray and E. Vig, “End-to-end saliency mapping via probability distribution prediction”, CVPR 2016.]
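Treating the ground-truth fixation density and the prediction as probability distributions turns the loss into a distance between distributions. The sketch below contrasts a standard regression loss (MSE) with two probability distances, the KL divergence and the Bhattacharyya distance. The function names and the shift-and-normalize conversion of a map into a distribution are assumptions, not the cited paper's implementation.

```python
import numpy as np

def to_distribution(m, eps=1e-12):
    """Shift a map to be non-negative and normalize it to sum to one."""
    m = np.asarray(m, dtype=float)
    m = m - m.min() + eps
    return m / m.sum()

def mse_loss(gt, pred):
    """Standard regression loss on the raw maps."""
    return float(np.mean((np.asarray(gt, float) - np.asarray(pred, float)) ** 2))

def kl_loss(gt, pred):
    """KL(gt || pred) between the maps viewed as probability distributions."""
    p, q = to_distribution(gt), to_distribution(pred)
    return float(np.sum(p * np.log(p / q)))

def bhattacharyya_loss(gt, pred):
    """Negative log of the Bhattacharyya coefficient between the two distributions."""
    p, q = to_distribution(gt), to_distribution(pred)
    return float(-np.log(np.sum(np.sqrt(p * q))))
```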

Page 31

Fixation Prediction Evaluation Datasets

Spatial (still images): MIT, Bruce and Tsotsos (Toronto), Kootstra, CAT2000, SALICON, …

Spatio-temporal (videos): CRCNS, DIEM, Action in the Eye, Eye-Tracking Movie Database (ETMD), …

[A. Borji and L. Itti, “State-of-the-art in visual attention modeling”, IEEE Trans. PAMI, 2013.]

Page 32

Evaluation Results (Static Databases)

[A. Borji, D.N. Sihite and L. Itti, “Quantitative analysis of human-model agreement in visual saliency modeling: a comparative study”, IEEE Trans. Image Process., 2013.]

Page 33

Evaluation Results (Video Databases)

[A. Borji, D.N. Sihite and L. Itti, “Quantitative analysis of human-model agreement in visual saliency modeling: a comparative study”, IEEE Trans. Image Process., 2013.]

Page 34

Salient Object Detection

Labeled regions rather than fixation points, e.g. the Salient Objects Dataset (SOD) and the Extended Complex Scene Saliency Dataset (ECSSD)

These two kinds of evaluation (fixation prediction and salient object detection) can disagree with each other.

[A. Borji, M.M. Cheng, H. Jiang and J. Li, “Salient object detection: A benchmark”, IEEE Trans. Image Process., 2015.]

Page 35

Spatio-Temporal Framework for Visual Saliency

[P. Koutras and P. Maragos, “A Perceptually-based Spatio-Temporal Computational Framework for Visual Saliency Estimation”, Sig. Proc.: Im. Com., 2015.]

Page 36

Why Spatio-Temporal Saliency?

Figure panels: AWS; spatio-temporal energy

Page 37

Visual Saliency Representations

Diagram: spatio-temporal processing for visual saliency yields energy, STIP, saliency maps and visual curves, used for movie summarization, action recognition and eye fixation prediction

Page 38

Spatio-Temporal Frontend for Visual Saliency

Related to the cognitively inspired saliency methods based on the Koch & Ullman theory.

Uses biologically plausible spatio-temporal filters, such as oriented 3D Gabor filters, to extract visual features.

Detects both the fastest changes in the video stimuli (e.g. flicker) and the slowest motion changes related to action events.

Page 39

Spatio-Temporal Frontend for Visual Saliency: Overview

Color modeling: CIE-Lab or PCA-projected color space, yielding a luminance stream and a color stream

[P. Koutras and P. Maragos, “A Perceptually-based Spatio-Temporal Computational Framework for Visual Saliency Estimation”, Sig. Proc.: Im. Com., 2015.]

Page 40

Spatio-Temporal Dominant Analysis (STDA)

Extract 3 dominant energy volumes for each stream, each expressing a basic perceptual concept in visual saliency:

Spatio-temporal: related to motion

Static (or spatial): related to frame texture or edges

Low-pass: related to what other models call “intensity” (which can be in either the luminance or the color stream)

Page 41

Spatio-Temporal Gabor Filterbank

Page 42

Spatial Gabor Filterbank

Temporal frequencies: 5 positive and 5 negative

Full spatial filterbank (40 filters): 5 scales, 8 orientations

Reduced spatial filterbank (12 filters): 3 scales, 8 orientations

Page 43

Separable 3D Gabor Filters

Quadrature Pairs of Separable 3D Gabor Filters

[K. Maninis, P. Koutras and P. Maragos, “Advances on Action Recognition in Videos Using an Interest Point Detector Based on Multiband Spatio-Temporal Energies”, ICIP 2014.]
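A separable 3D Gabor filter is the product of a spatial 2D Gabor and a temporal 1D Gabor, so its quadrature pair can be computed with 2D and 1D convolutions only. The real and imaginary parts of the complex 3D response are assembled from the four even/odd separable combinations, and their squared sum is the quadrature-pair energy. The sign of `omega_t` selects one of the two motion directions, in line with the positive and negative temporal frequencies of the filterbank. This sketch is not the cited implementation. It assumes Gaussian envelopes truncated at 3σ, and the parameter names (`sigma_s`, `omega_s`, `theta`, `sigma_t`, `omega_t`) are illustrative.

```python
import numpy as np
from scipy.ndimage import convolve, convolve1d

def gabor_pair_2d(sigma, omega, theta):
    """Even (cosine) and odd (sine) spatial Gabor kernels tuned to frequency omega at angle theta."""
    r = int(np.ceil(3 * sigma))
    y, x = np.mgrid[-r:r + 1, -r:r + 1]
    u = x * np.cos(theta) + y * np.sin(theta)          # coordinate along the tuning direction
    env = np.exp(-(x ** 2 + y ** 2) / (2.0 * sigma ** 2))
    return env * np.cos(omega * u), env * np.sin(omega * u)

def gabor_pair_1d(sigma, omega):
    """Even and odd temporal Gabor kernels; the sign of omega selects the motion direction."""
    r = int(np.ceil(3 * sigma))
    t = np.arange(-r, r + 1)
    env = np.exp(-t ** 2 / (2.0 * sigma ** 2))
    return env * np.cos(omega * t), env * np.sin(omega * t)

def separable_gabor_energy(video, sigma_s, omega_s, theta, sigma_t, omega_t):
    """Quadrature-pair energy of a separable 3D Gabor filter on a (T, H, W) video."""
    v = np.asarray(video, dtype=float)
    cs, ss = gabor_pair_2d(sigma_s, omega_s, theta)
    ct, st = gabor_pair_1d(sigma_t, omega_t)
    # Spatial filtering of every frame (the kernel has length 1 along time).
    sp_even = convolve(v, cs[None], mode="reflect")
    sp_odd = convolve(v, ss[None], mode="reflect")
    # Temporal filtering; real + j*imag is the response of (cs + j*ss) x (ct + j*st).
    real = (convolve1d(sp_even, ct, axis=0, mode="reflect")
            - convolve1d(sp_odd, st, axis=0, mode="reflect"))
    imag = (convolve1d(sp_odd, ct, axis=0, mode="reflect")
            + convolve1d(sp_even, st, axis=0, mode="reflect"))
    return real ** 2 + imag ** 2
```

A full bank would loop this function over the spatial filters and the 10 temporal frequencies listed for the spatio-temporal filterbank.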

Page 44

Postprocessing

Quadrature-pair squared energy for each Gabor filter

Dominant energy selection

Center-surround difference for the low-pass energy

Temporal moving average (TMA) for all energy types
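The postprocessing steps can be sketched as follows. The inputs are assumed to be quadrature-pair energies (even² + odd²) stacked along a filter axis. Dominant energy selection is assumed to take the per-voxel maximum over the filterbank, and the center-surround operator is assumed to be a difference of spatial Gaussian blurs. The `sigma_*` and `window` values are illustrative.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, uniform_filter1d

def dominant_energy(energies):
    """Dominant energy: per-voxel maximum over a filterbank stack (n_filters, T, H, W)."""
    return np.asarray(energies, dtype=float).max(axis=0)

def center_surround(lowpass, sigma_center=1.0, sigma_surround=5.0):
    """|fine blur - coarse blur| of each frame of a (T, H, W) low-pass energy volume."""
    lp = np.asarray(lowpass, dtype=float)
    center = gaussian_filter(lp, sigma=(0, sigma_center, sigma_center))
    surround = gaussian_filter(lp, sigma=(0, sigma_surround, sigma_surround))
    return np.abs(center - surround)

def temporal_moving_average(volume, window=5):
    """Temporal moving average (TMA) along the time axis of a (T, H, W) volume."""
    return uniform_filter1d(np.asarray(volume, dtype=float), size=window, axis=0, mode="nearest")
```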

Page 45

Visual Saliency in Movie Videos - Demo

Panels: original RGB frames; luminance STDE; color contrast; low-pass energy

COGNIMUSE Database: Lord of the Rings: The Return of the King

Page 46

Eye Fixation Prediction

[P. Koutras and P. Maragos, “A Perceptually-based Spatio-Temporal Computational Framework for Visual Saliency Estimation”, Sig. Proc.: Im. Com., 2015.]

Page 47

Visual Saliency Demo

Panels: original RGB frames with eye tracking; luminance spatio-temporal dominant energy

Page 48

Eye-Tracking Movie Database - Examples

[P. Koutras, A. Katsamanis and P. Maragos, “Predicting Eyes' Fixations in Movie Videos: Visual Saliency Experiments on a New Eye-Tracking Database”, HCI 2014.]

Page 49

Center Bias – ETMD Fixations

Page 50

Evaluation Measures for Visual Saliency

Correlation Coefficient (CC): computed against a fixation map built by centering a 2D Gaussian at each viewer’s eye fixation

Normalized Scanpath Saliency (NSS): standardize the saliency map (zero mean, unit variance), read its values at each viewer fixation position, and take the mean over all viewers' fixations

Area Under Curve (AUC): area under the Receiver Operating Characteristic (ROC) curve (false positive rate vs. recall), treating evaluation as a binary classification problem (salient / non-salient regions)

[A. Borji, D.N. Sihite and L. Itti, “Quantitative analysis of human-model agreement in visual saliency modeling: a comparative study”, IEEE Trans. Image Process., 2013.]
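All three measures can be computed in a few lines each. Fixations are given as (row, column) pixel coordinates, and the Gaussian width `sigma` of the CC fixation map is an assumption. The AUC here is one common variant: fixated pixels are positives, every other pixel is a negative, and the area is computed from ranks, which is equivalent to integrating the ROC curve over all thresholds. Other variants, such as shuffled AUC, choose negatives differently.

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from scipy.stats import rankdata

def fixation_mask(shape, fixations):
    """Boolean map that is True at every fixated pixel."""
    mask = np.zeros(shape, dtype=bool)
    for r, c in fixations:
        mask[r, c] = True
    return mask

def nss(sal, fixations):
    """Mean of the standardized saliency map over all fixations."""
    s = np.asarray(sal, dtype=float)
    z = (s - s.mean()) / s.std()
    return float(np.mean([z[r, c] for r, c in fixations]))

def cc(sal, fixations, sigma=5.0):
    """Pearson correlation with a fixation map made of a 2D Gaussian at each fixation."""
    density = gaussian_filter(fixation_mask(np.shape(sal), fixations).astype(float), sigma)
    return float(np.corrcoef(np.ravel(sal), density.ravel())[0, 1])

def auc(sal, fixations):
    """ROC area: fixated pixels are positives, all other pixels are negatives."""
    pos = fixation_mask(np.shape(sal), fixations).ravel()
    ranks = rankdata(np.ravel(sal))                    # average ranks handle ties
    n_pos, n_neg = pos.sum(), (~pos).sum()
    return float((ranks[pos].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg))
```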

Page 51

Fixation Prediction Results – CRCNS ORIG

Compared against 15 state-of-the-art models; 3 spatio-temporal models; related to 3 basic approaches: cognitively inspired, statistical framework, frequency-domain analysis

Page 52

Fixation Prediction Results – ETMD

6 Oscar-winning Hollywood movies: Chicago, Crash, Departed, Finding Nemo, Gladiator, Lord of the Rings

2 short video clips (3-3.5 min) from each movie, with scenes containing both high action and dialogues

Eye-tracking human annotation: eye-tracking data from 10 different people, on both grayscale and color versions of each video; one fixation point per frame

Page 53

Framewise Saliency

[P. Koutras, A. Zlatintsi, E.Iosif, A. Katsamanis, P. Maragos and A. Potamianos, “Predicting Audio-Visual Salient Events Based on Visual, Audio and Text Modalities for Movie Summarization”, ICIP 2015.]

Page 54

Framewise Visual Saliency - Features

Visual features: 3D Gabor energy model, applied to both the luminance and the color streams:

Spatio-temporal dominant energies (filterbank of 400 3D Gabor filters)

Spatial dominant energies (filterbank of 40 spatial Gabor filters)

Page 55

Framewise Visual Saliency – Energy Curves (3D Gabor Energy Model)

Energy curves: a simple 3D-to-1D mapping that takes the mean value of each 2D frame slice of each 3D energy volume (STDE, SDE), yielding 4 temporal sequences of visual feature vectors (STDE and SDE for the luminance and color streams)

Page 56

Framewise Visual Saliency - Summarization

Features postprocessing: standardize the features (zero mean, unit covariance); compute 1st- and 2nd-order derivatives (deltas)

Classification based on KNN or GMM: binary classification problem (salient / non-salient video segments); confidence scores; median filtering of the saliency measurement; sorting the frames by the saliency measurement
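The framewise pipeline runs from energy curves through standardization, deltas and KNN confidence scores to median filtering and frame sorting. Below is an end-to-end sketch of the KNN branch. It assumes per-dimension standardization with training statistics (diagonal rather than full covariance), deltas computed with simple finite differences (`np.gradient`), a from-scratch KNN whose confidence is the fraction of salient neighbours, and illustrative values for `k` and `median_len`. The GMM branch is not shown.

```python
import numpy as np
from scipy.signal import medfilt

def energy_curves(volumes):
    """Mean of every 2D frame slice of each 3D energy volume (T, H, W) -> (T, n_volumes)."""
    return np.stack([np.asarray(v, dtype=float).mean(axis=(1, 2)) for v in volumes], axis=1)

def add_deltas(x):
    """Append 1st- and 2nd-order temporal derivatives (finite differences)."""
    d1 = np.gradient(x, axis=0)
    d2 = np.gradient(d1, axis=0)
    return np.hstack([x, d1, d2])

def knn_confidence(train_x, train_y, test_x, k=5):
    """Fraction of the k nearest training frames that are labeled salient."""
    dist = ((test_x[:, None, :] - train_x[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(dist, axis=1)[:, :k]
    return train_y[nearest].mean(axis=1)

def rank_frames(train_curves, train_labels, test_curves, k=5, median_len=5):
    """Confidence score per test frame and the frames sorted from most to least salient."""
    mu = train_curves.mean(axis=0)
    sd = train_curves.std(axis=0)
    sd[sd == 0] = 1.0
    train = add_deltas((train_curves - mu) / sd)     # standardize with training statistics
    test = add_deltas((test_curves - mu) / sd)
    conf = knn_confidence(train, np.asarray(train_labels, dtype=float), test, k)
    conf = medfilt(conf, kernel_size=median_len)     # median filtering of the measurement
    order = np.argsort(-conf, kind="stable")         # most salient frames first
    return conf, order
```

A summary of a given length could then be assembled by taking frames from the front of `order`.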

Page 57

Results on Hollywood Movies - ROC Curves

AUC:

TMM’13 (KNN): 0.603

ICIP’15 (KNN): 0.699

ICIP’15 GMM (M=10): 0.660

ICIP’15 GMM (M=10, Viterbi): 0.668

[G. Evangelopoulos, A. Zlatintsi, A. Potamianos, P. Maragos, K. Rapantzikos, G. Skoumas, Y. Avrithis, “Multimodal Saliency and Fusion for Movie Summarization based on Aural, Visual, and Textual Attention”, IEEE Trans.-MM, 2013.][P. Koutras, A. Zlatintsi, E. Iosif, A. Katsamanis, P. Maragos and A. Potamianos, “Predicting Audio-Visual Salient Events Based on Visual, Audio and Text Modalities for Movie Summarization”, ICIP 2015.]

Page 58

Part 2: Conclusions

Importance of spatio-temporal video processing for saliency estimation, event detection and summarization

Visual saliency:

Deep networks achieve state-of-the-art performance on standard benchmarks

Datasets: small-scale, center-biased, biased towards semantic objects

Spatio-temporal saliency networks

Eye fixation prediction

Framewise saliency

Tutorial slides: http://cognimuse.cs.ntua.gr/icassp17