ICME08 Visual Saliency - pdfs.semanticscholar.org · Detecting visual saliency from a video is...

45
Dynamic Visual Saliency Modeling Based on Spatiotemporal Analysis Duan-Yu Chen, Hsiao-Rong Tyan, Dun-Yu Hsiao, Shen-Wen Shih, and Hong-Yuan Mark Liao

Transcript of ICME08 Visual Saliency - pdfs.semanticscholar.org · Detecting visual saliency from a video is...

Page 1: ICME08 Visual Saliency - pdfs.semanticscholar.org · Detecting visual saliency from a video is considered a good way of understanding or annotating the video content. ICME 2008 Duan-Yu

Dynamic Visual Saliency Modeling Based on Spatiotemporal Analysis

Duan-Yu Chen, Hsiao-Rong Tyan, Dun-Yu Hsiao, Shen-Wen Shih, and Hong-Yuan Mark Liao

Page 2: ICME08 Visual Saliency - pdfs.semanticscholar.org · Detecting visual saliency from a video is considered a good way of understanding or annotating the video content. ICME 2008 Duan-Yu

ICME 2008 Duan-Yu Chen 2

Outline

IntroductionRelated WorksProposed Approach

Selective Visual Attention ModelingDetect Spatiotemporal Salient PointsCompute Motion Attention MapDetermine the Extent of Attended Regions

Experiment ResultsConclusion

Page 3: ICME08 Visual Saliency - pdfs.semanticscholar.org · Detecting visual saliency from a video is considered a good way of understanding or annotating the video content. ICME 2008 Duan-Yu

ICME 2008 Duan-Yu Chen 3

IntroductionMethods for annotating video content as well as related video retrieval techniques have attracted a great deal of attention in recent years.Video content contains much richer information than other medias and thus can be annotated for applications, such as video surveillance and entertainment. Detecting visual saliency from a video is considered a good way of understanding or annotating the video content.

Page 4: ICME08 Visual Saliency - pdfs.semanticscholar.org · Detecting visual saliency from a video is considered a good way of understanding or annotating the video content. ICME 2008 Duan-Yu

ICME 2008 Duan-Yu Chen 4

Introduction (cont.)The idea is from the father of American psychology, William James in 1980, who suggests that subjects selectively direct attention to objects in a scene using both bottom-up, image-based saliency cues and top-down, task-dependent cues.

Page 5: ICME08 Visual Saliency - pdfs.semanticscholar.org · Detecting visual saliency from a video is considered a good way of understanding or annotating the video content. ICME 2008 Duan-Yu

ICME 2008 Duan-Yu Chen 5

A Typical Model for the Control of Bottom-up Attention

Itti’01

Page 6: ICME08 Visual Saliency - pdfs.semanticscholar.org · Detecting visual saliency from a video is considered a good way of understanding or annotating the video content. ICME 2008 Duan-Yu

ICME 2008 Duan-Yu Chen 6

Introduction (c.1)Motion attention models can be classified into two categories

Motion EstimationE.g. block-based matching for motion vector estimationH. J. Zhang, et al., “A User Attention Model for Video Summarization,” ACM MM’02.

Structural tensor-based approachS. Li and M. C. Lee, “An Efficient Spatiotemporal Attention Model and Its Application to Shot Matching,” IEEE Trans. on CSVT, Vol.17, No.10, Oct. 2007.

Page 7: ICME08 Visual Saliency - pdfs.semanticscholar.org · Detecting visual saliency from a video is considered a good way of understanding or annotating the video content. ICME 2008 Duan-Yu

ICME 2008 Duan-Yu Chen 7

Motivation – using Structural Tensor

Spatio-temporal image data contains rich information

Emphasize local structures by spatio-temporal analysis over a set of consecutive video frames.

Observation:

Events in video are frequently characterized by non-constant motion and non-constant appearance

Traditional methods for video analysis include

• optical flow estimation

• tracking of features/models over time• Mubarak Shah, et al.,” Visual Attention Detection in Video Sequences

Using Spatiotemporal Cues,” ACM MM’06.

Page 8: ICME08 Visual Saliency - pdfs.semanticscholar.org · Detecting visual saliency from a video is considered a good way of understanding or annotating the video content. ICME 2008 Duan-Yu

ICME 2008 Duan-Yu Chen 8

Mubarak Shah, et al.,” Visual Attention Detection in Video Sequences Using Spatiotemporal Cues,” ACM MM’06.

Page 9: ICME08 Visual Saliency - pdfs.semanticscholar.org · Detecting visual saliency from a video is considered a good way of understanding or annotating the video content. ICME 2008 Duan-Yu

ICME 2008 Duan-Yu Chen 9

Mubarak Shah, et al.,” Visual Attention Detection in Video Sequences Using Spatiotemporal Cues,” ACM MM’06.

Temporal Attention ModelUse SIFT (Scale Invariant Feature Transformation) to compute correspondences between points in frame pairs.

Page 10: ICME08 Visual Saliency - pdfs.semanticscholar.org · Detecting visual saliency from a video is considered a good way of understanding or annotating the video content. ICME 2008 Duan-Yu

ICME 2008 Duan-Yu Chen 10

Temporal Attention Model

'33:'22:

'11:

3

2

1

→→→

HHHH: Homography

Use RANSAC to determine 321 ,, HHH

Page 11: ICME08 Visual Saliency - pdfs.semanticscholar.org · Detecting visual saliency from a video is considered a good way of understanding or annotating the video content. ICME 2008 Duan-Yu

ICME 2008 Duan-Yu Chen 11

Temporal Salience MapsExample

Page 12: ICME08 Visual Saliency - pdfs.semanticscholar.org · Detecting visual saliency from a video is considered a good way of understanding or annotating the video content. ICME 2008 Duan-Yu

ICME 2008 Duan-Yu Chen 12

S. Li and M. C. Lee, “An Efficient Spatiotemporal Attention Model and Its Application to Shot Matching,” IEEE Trans. on CSVT, Vol.17, No.10, Oct. 2007.

Spatiotemporal Attention Modeling Center-surround pyramids are used in motion attention detection to produce salience maps with different precisions.

Page 13: ICME08 Visual Saliency - pdfs.semanticscholar.org · Detecting visual saliency from a video is considered a good way of understanding or annotating the video content. ICME 2008 Duan-Yu

ICME 2008 Duan-Yu Chen 13

S. Li and M. C. Lee, “An Efficient Spatiotemporal Attention Model and Its Application to Shot Matching,” IEEE Trans. on CSVT, Vol.17, No.10, Oct. 2007. (c.1)

Center-Surround pyramidsThe weighting is realized by a Gaussian kernel. ∇g = (gx, gy, gt) is the space-time gradients of the intensity.

Given the spatiotemporal structural tensors M1 and M2of a local neighborhood Ω1 and a background Ω2respectively,

Page 14: ICME08 Visual Saliency - pdfs.semanticscholar.org · Detecting visual saliency from a video is considered a good way of understanding or annotating the video content. ICME 2008 Duan-Yu

ICME 2008 Duan-Yu Chen 14

Compute Salience Maps using Different Visual Cues

Page 15: ICME08 Visual Saliency - pdfs.semanticscholar.org · Detecting visual saliency from a video is considered a good way of understanding or annotating the video content. ICME 2008 Duan-Yu

ICME 2008 Duan-Yu Chen 15

T. Liu, et al., “Learning to Detect A Salient Object,” CVPR’07

• Detect A Salient Object in Still Images

Page 16: ICME08 Visual Saliency - pdfs.semanticscholar.org · Detecting visual saliency from a video is considered a good way of understanding or annotating the video content. ICME 2008 Duan-Yu

ICME 2008 Duan-Yu Chen 16

20 40 60 80 100 120 140 160

20

40

60

80

100

120

140

20 40 60 80 100 120 140 160

20

40

60

80

100

120

140

shape from: SaliencyMap

20 40 60 80 100 120 140 160

20

40

60

80

100

120

140

1

2

3

Static Visual Saliency –color, intensity, orientation

Page 17: ICME08 Visual Saliency - pdfs.semanticscholar.org · Detecting visual saliency from a video is considered a good way of understanding or annotating the video content. ICME 2008 Duan-Yu

ICME 2008 Duan-Yu Chen 17

20 40 60 80 100 120 140 160

20

40

60

80

100

120

140

20 40 60 80 100 120 140 160

20

40

60

80

100

120

140

20

40

60

80

100

120

140 20 40 60 80 100 120 140 160

20

40

60

80

100

120

1

1

1

1

2

2

22

3

3

3

3

Page 18: ICME08 Visual Saliency - pdfs.semanticscholar.org · Detecting visual saliency from a video is considered a good way of understanding or annotating the video content. ICME 2008 Duan-Yu

ICME 2008 Duan-Yu Chen 18

Proposed Approach

GoalTo design a measure that better captures the extent of motion saliency without using the clue of spatial saliency, even in a dynamic cluttered background.

Visual Saliency Maps IntegrationsDetect Salient Points to determine the rough boundary of a moving targetDetect Salient Regions with consistent motion to determine the regions inside the estimated boundary

Page 19: ICME08 Visual Saliency - pdfs.semanticscholar.org · Detecting visual saliency from a video is considered a good way of understanding or annotating the video content. ICME 2008 Duan-Yu

ICME 2008 Duan-Yu Chen 19

Salient points in space

(Harris and Stephenes 1988): image points with high variation of values in both image direction

⇒ High eigenvectors of the second-moment matrix integrated at the local neighbourhood

where Lx, Ly are Gaussian derivatives

⇒ Select points with maxima of the corner function

Structural Tensor

Page 20: ICME08 Visual Saliency - pdfs.semanticscholar.org · Detecting visual saliency from a video is considered a good way of understanding or annotating the video content. ICME 2008 Duan-Yu

ICME 2008 Duan-Yu Chen 20

Interest points in space-time

High variation of image values in both space and time

⇒ extend Harris corner function into 3D spatio-temporal domain; compute the second moment matrix

where Lx, Ly , Lt are Gaussian derivatives in space-time obtained by spatio-temporal convolution:

with

Page 21: ICME08 Visual Saliency - pdfs.semanticscholar.org · Detecting visual saliency from a video is considered a good way of understanding or annotating the video content. ICME 2008 Duan-Yu

ICME 2008 Duan-Yu Chen 21

Salient region in space-time

Function of Visual Saliency

where are eigenvalues of .

Page 22: ICME08 Visual Saliency - pdfs.semanticscholar.org · Detecting visual saliency from a video is considered a good way of understanding or annotating the video content. ICME 2008 Duan-Yu

ICME 2008 Duan-Yu Chen 22

Problem of only detecting salient regions based on 3D corner detection

Broken parts of moving targets

Why results in broken parts ?Regions inside a moving target are relatively smoothly changed within XYT video volumes.

20 40 60 80 100 120 140 160

20

40

60

80

100

120

140

X: 23 Y: 59Index: 0RGB: 0, 0, 0

Page 23: ICME08 Visual Saliency - pdfs.semanticscholar.org · Detecting visual saliency from a video is considered a good way of understanding or annotating the video content. ICME 2008 Duan-Yu

ICME 2008 Duan-Yu Chen 23

SolutionSmoothly changed regions physically mean that these regions are moving with consistent motionsWhat is the motion cue ?

The rank of structural tensor

Rank(μ)=3: multiple motionsRank(μ)=2: one dominant motion

Page 24: ICME08 Visual Saliency - pdfs.semanticscholar.org · Detecting visual saliency from a video is considered a good way of understanding or annotating the video content. ICME 2008 Duan-Yu

ICME 2008 Duan-Yu Chen 24

Solution (c.1)

Noise and FG/BG movements in video sequences would result in multiple motions within a neighborhood.

Rank(μ)=3 in almost all video volumes

A new measure of rank deficiency dST =

Page 25: ICME08 Visual Saliency - pdfs.semanticscholar.org · Detecting visual saliency from a video is considered a good way of understanding or annotating the video content. ICME 2008 Duan-Yu

ICME 2008 Duan-Yu Chen 25

Median Filter for dST

Goal: to drop off the regions with multiple motions and only keep regions with consistent motion

Use median filter to filter out two extreme cases

rank(μ)=1 and rank(μ)=3

20 40 60 80 100 120 140 160

20

40

60

80

100

120

140

X: 32 Y: 46Index: 0.0002428RGB: 0, 0, 0.563

mask:[30 30]

20 40 60 80 100 120 140 160

20

40

60

80

100

120

140

Page 26: ICME08 Visual Saliency - pdfs.semanticscholar.org · Detecting visual saliency from a video is considered a good way of understanding or annotating the video content. ICME 2008 Duan-Yu

ICME 2008 Duan-Yu Chen 26

Demo – 1WP with more complex background

Page 27: ICME08 Visual Saliency - pdfs.semanticscholar.org · Detecting visual saliency from a video is considered a good way of understanding or annotating the video content. ICME 2008 Duan-Yu

ICME 2008 Duan-Yu Chen 27

Visual Salient Regions Enclosure Computed Using Exponential Entropy

The most appropriate scale Se for each region centered at the seed (xs,ys) of a motion attention map is defined by

( )( ) ( )( ){ }tyxeWtyxeHmaxargS ssssee ,,,,,, expexp ×=

( )( ) ( ) ( )( )∑∈

−×=Dv

tyxevtyxevss sssspexpptyxeH ,,,,,,,,exp 1,,,

( )( ) ( )( ) ( )( )tyxeHtyxeHtyxeW ssexpssexpss ,,,1,,,,,,exp −−=

Page 28: ICME08 Visual Saliency - pdfs.semanticscholar.org · Detecting visual saliency from a video is considered a good way of understanding or annotating the video content. ICME 2008 Duan-Yu

ICME 2008 Duan-Yu Chen 28

Experimental Results

(a) (b)

Qualitative EvaluationQuantitative Evaluation – Precision & Recall

Page 29: ICME08 Visual Saliency - pdfs.semanticscholar.org · Detecting visual saliency from a video is considered a good way of understanding or annotating the video content. ICME 2008 Duan-Yu

ICME 2008 Duan-Yu Chen 29

Demo

Page 30: ICME08 Visual Saliency - pdfs.semanticscholar.org · Detecting visual saliency from a video is considered a good way of understanding or annotating the video content. ICME 2008 Duan-Yu

ICME 2008 Duan-Yu Chen 30

Page 31: ICME08 Visual Saliency - pdfs.semanticscholar.org · Detecting visual saliency from a video is considered a good way of understanding or annotating the video content. ICME 2008 Duan-Yu

ICME 2008 Duan-Yu Chen 31

Demo (c.1)

Page 32: ICME08 Visual Saliency - pdfs.semanticscholar.org · Detecting visual saliency from a video is considered a good way of understanding or annotating the video content. ICME 2008 Duan-Yu

ICME 2008 Duan-Yu Chen 32

Page 33: ICME08 Visual Saliency - pdfs.semanticscholar.org · Detecting visual saliency from a video is considered a good way of understanding or annotating the video content. ICME 2008 Duan-Yu

ICME 2008 Duan-Yu Chen 33

Demo (c.2)

Page 34: ICME08 Visual Saliency - pdfs.semanticscholar.org · Detecting visual saliency from a video is considered a good way of understanding or annotating the video content. ICME 2008 Duan-Yu

ICME 2008 Duan-Yu Chen 34

Page 35: ICME08 Visual Saliency - pdfs.semanticscholar.org · Detecting visual saliency from a video is considered a good way of understanding or annotating the video content. ICME 2008 Duan-Yu

ICME 2008 Duan-Yu Chen 35

Performance Comparison[Mubarak Shah, et al.,” Visual Attention Detection in Video Sequences Using Spatiotemporal Cues,” ACM MM’06.]

Page 36: ICME08 Visual Saliency - pdfs.semanticscholar.org · Detecting visual saliency from a video is considered a good way of understanding or annotating the video content. ICME 2008 Duan-Yu

ICME 2008 Duan-Yu Chen 36

Page 37: ICME08 Visual Saliency - pdfs.semanticscholar.org · Detecting visual saliency from a video is considered a good way of understanding or annotating the video content. ICME 2008 Duan-Yu

ICME 2008 Duan-Yu Chen 37

Performance Comparison[Mubarak Shah, et al.,” Visual Attention Detection in Video Sequences Using Spatiotemporal Cues,” ACM MM’06.]

Page 38: ICME08 Visual Saliency - pdfs.semanticscholar.org · Detecting visual saliency from a video is considered a good way of understanding or annotating the video content. ICME 2008 Duan-Yu

ICME 2008 Duan-Yu Chen 38

Performance Comparison[Mubarak Shah, et al.,” Visual Attention Detection in Video Sequences Using Spatiotemporal Cues,” ACM MM’06.]

Page 39: ICME08 Visual Saliency - pdfs.semanticscholar.org · Detecting visual saliency from a video is considered a good way of understanding or annotating the video content. ICME 2008 Duan-Yu

ICME 2008 Duan-Yu Chen 39

Quantitative Evaluation

0 10 20 30 40 50 60 700.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Frame

Prec

isio

n

OursZhaiOurs MeanZhai Mean

Page 40: ICME08 Visual Saliency - pdfs.semanticscholar.org · Detecting visual saliency from a video is considered a good way of understanding or annotating the video content. ICME 2008 Duan-Yu

ICME 2008 Duan-Yu Chen 40

Quantitative Evaluation (cont.)

0 10 20 30 40 50 60 700.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Rec

all

Frame

OursZhaiOurs MeanZhai Mean

Page 41: ICME08 Visual Saliency - pdfs.semanticscholar.org · Detecting visual saliency from a video is considered a good way of understanding or annotating the video content. ICME 2008 Duan-Yu

ICME 2008 Duan-Yu Chen 41

Quantitative Evaluation (cont.)

Our method Zhai and Shah[3]

Precision 71.8% 57.9%

Recall 80.6% 53.8%

Page 42: ICME08 Visual Saliency - pdfs.semanticscholar.org · Detecting visual saliency from a video is considered a good way of understanding or annotating the video content. ICME 2008 Duan-Yu

ICME 2008 Duan-Yu Chen 42

Conclusion and Future Work

Visual Salient Maps IntegrationsDetect Salient Points to determine the boundary of a moving targetDetect the appropriate Extent of Salient Regions in the motion attention map and refine using the exponential entropy-based center-surround approach.

Future WorkAnnotate the extent of salient regions automatically in general video sequences.

Page 43: ICME08 Visual Saliency - pdfs.semanticscholar.org · Detecting visual saliency from a video is considered a good way of understanding or annotating the video content. ICME 2008 Duan-Yu

ICME 2008 Duan-Yu Chen 43

Object Annotation

Hierarchical FilteringOrientation Filters

Examples

0 90 180 2700

1

2

3

4

5

6

7

8

9

10x 10

5 Car

Orientation

Orie

ntat

ion

Ene

rgy

0 90 180 2700

0.5

1

1.5

2

2.5

3x 10

6

Orientation

Orie

ntat

ion

Ene

rgy

People

Page 44: ICME08 Visual Saliency - pdfs.semanticscholar.org · Detecting visual saliency from a video is considered a good way of understanding or annotating the video content. ICME 2008 Duan-Yu

ICME 2008 Duan-Yu Chen 44

Page 45: ICME08 Visual Saliency - pdfs.semanticscholar.org · Detecting visual saliency from a video is considered a good way of understanding or annotating the video content. ICME 2008 Duan-Yu

ICME 2008 Duan-Yu Chen 45