How to Evaluate Foreground Maps? CVPR 2014 Poster

Transcript of the poster slides.

Page 1: How to Evaluate Foreground Maps ? CVPR2014 Poster.

How to Evaluate Foreground Maps?

CVPR 2014 Poster

Page 2: How to Evaluate Foreground Maps ? CVPR2014 Poster.

Outline

Introduction
Limitation of Current Measures
Solution
Experiments
Conclusions

Page 3: How to Evaluate Foreground Maps ? CVPR2014 Poster.

Introduction

The comparison of a foreground map against a binary ground-truth is common in various computer-vision problems:
salient object detection
object segmentation
foreground extraction

Several measures have been suggested to evaluate the accuracy of these foreground maps:
AUC measure
AP measure
F-measure
PASCAL measure

Page 4: How to Evaluate Foreground Maps ? CVPR2014 Poster.

Introduction

However, the most commonly-used measures for evaluating both non-binary and binary maps do not always provide a reliable evaluation.

Page 5: How to Evaluate Foreground Maps ? CVPR2014 Poster.

Introduction

Our contributions:

1. We identify three flawed assumptions in commonly-used measures.

2. We amend each of these flaws and suggest a novel measure that evaluates foreground maps with increased accuracy.

3. We propose four meta-measures to analyze the performance of evaluation measures.

Page 6: How to Evaluate Foreground Maps ? CVPR2014 Poster.

Introduction

Two appealing properties of our measure are:

1. It is a generalization of the Fβ-measure.

2. It provides a unified evaluation of both binary and non-binary maps.

Page 7: How to Evaluate Foreground Maps ? CVPR2014 Poster.

Limitation of Current Measures

Three flawed assumptions :

1. Interpolation flaw

2. Dependency flaw

3. Equal-importance flaw

Page 8: How to Evaluate Foreground Maps ? CVPR2014 Poster.

Limitation of Current Measures

Current Evaluation Measures
Evaluation of binary maps:

Four basic quantities:

1. TP (true-positive)

2. TN (true-negative)

3. FP (false-positive)

4. FN (false-negative)
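As a concrete illustration, here is a minimal NumPy sketch (illustrative names, not code from the paper) of how the four quantities are counted for a binary map D against a binary ground-truth G:

```python
import numpy as np

def basic_quantities(D, G):
    """Count TP, TN, FP, FN for a binary map D against a binary ground-truth G.

    D, G: boolean NumPy arrays of the same shape (True = foreground).
    """
    TP = np.sum(D & G)      # foreground pixels correctly detected
    TN = np.sum(~D & ~G)    # background pixels correctly rejected
    FP = np.sum(D & ~G)     # background pixels wrongly detected as foreground
    FN = np.sum(~D & G)     # foreground pixels that were missed
    return TP, TN, FP, FN
```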

Page 9: How to Evaluate Foreground Maps ? CVPR2014 Poster.

Limitation of Current Measures

Current Evaluation Measures
Evaluation of binary maps:

Common scores:

TPR = TP / (TP + FN)

FPR = FP / (FP + TN)
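Continuing the sketch above (names are illustrative), both rates follow directly from the four quantities:

```python
def tpr_fpr(TP, TN, FP, FN):
    """True-positive rate and false-positive rate from the four basic quantities."""
    TPR = TP / (TP + FN)    # fraction of the ground-truth foreground that was detected
    FPR = FP / (FP + TN)    # fraction of the background that was wrongly detected
    return TPR, FPR
```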

Page 10: How to Evaluate Foreground Maps ? CVPR2014 Poster.

Limitation of Current Measures

Current Evaluation Measures
Evaluation of non-binary maps:

AUC (Area-Under-the-Curve)
AP (Average-Precision)

Image Source: http://zh.wikipedia.org/wiki/File:Curvas.png
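Both scores are obtained by binarizing the non-binary map at many thresholds and aggregating the resulting curve. As a rough sketch (illustrative code, not the exact protocol of any benchmark), the ROC AUC can be approximated as follows:

```python
import numpy as np

def roc_auc(D, G, num_thresholds=256):
    """Approximate the ROC AUC of a non-binary map D (values in [0, 1])
    against a binary ground-truth G by sweeping thresholds.

    Assumes G contains both foreground and background pixels.
    """
    tprs, fprs = [], []
    for t in np.linspace(1.0, 0.0, num_thresholds):
        B = D >= t                      # binarize the map at threshold t
        TP = np.sum(B & G)
        FP = np.sum(B & ~G)
        FN = np.sum(~B & G)
        TN = np.sum(~B & ~G)
        tprs.append(TP / (TP + FN))
        fprs.append(FP / (FP + TN))
    return np.trapz(tprs, fprs)         # area under the ROC curve
```

AP is computed analogously, by accumulating precision over the recall axis of the precision-recall curve.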

Page 11: How to Evaluate Foreground Maps ? CVPR2014 Poster.

Interpolation flaw

The source of the interpolation flaw is the thresholding of the non-binary maps.

Page 12: How to Evaluate Foreground Maps ? CVPR2014 Poster.

Dependency flaw

Current measures ignore the dependency between false-negative pixels; the spatial arrangement of the misdetections is not taken into account.

Page 13: How to Evaluate Foreground Maps ? CVPR2014 Poster.

Equal-important flaw

Current measures ignore the location of the false-positives; all false detections are penalized equally, regardless of where they occur relative to the foreground.

Page 14: How to Evaluate Foreground Maps ? CVPR2014 Poster.

Solution

Resolving the Interpolation Flaw
Resolving the Dependency Flaw & the Equal-Importance Flaw
The New Measure: the weighted Fβ-measure (Fβw)

Page 15: How to Evaluate Foreground Maps ? CVPR2014 Poster.

Resolving the Interpolation Flaw

The key idea is to extend the four basic quantities, TP, TN, FP and FN, to deal with non-binary values.

G (1×N): the column-stack representation of the binary ground-truth, where N is the number of pixels in the image.

D (1×N): the non-binary map to be evaluated against the ground-truth.

Page 16: How to Evaluate Foreground Maps ? CVPR2014 Poster.

Resolving the Interpolation Flaw

For a binary map, pixel i is
correct if G[i] = D[i]
incorrect if G[i] ≠ D[i]

For a non-binary map, the pixel-wise error is defined as the absolute difference E = |G − D|, and the four quantities become:

TP = (1 − E) · G
TN = (1 − E) · (1 − G)
FP = E · (1 − G)
FN = E · G
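A minimal NumPy sketch of this extension (G as a 0/1 vector, D with values in [0, 1], both flattened; names are illustrative):

```python
import numpy as np

def extended_quantities(D, G):
    """Extend TP, TN, FP, FN to a non-binary map D evaluated against a binary G."""
    E = np.abs(G - D)                  # pixel-wise absolute error
    TP = np.sum((1 - E) * G)           # credit collected on foreground pixels
    TN = np.sum((1 - E) * (1 - G))     # credit collected on background pixels
    FP = np.sum(E * (1 - G))           # error mass on background pixels
    FN = np.sum(E * G)                 # error mass on foreground pixels
    return TP, TN, FP, FN
```

When D is binary these reduce to the usual counts, so a non-binary map can be evaluated directly, without any thresholding.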

Page 17: How to Evaluate Foreground Maps ? CVPR2014 Poster.

Resolving the Dependency Flaw & the Equal-Importance Flaw

Both assumptions deal with detection errors. Our key idea is to attribute different importance to different errors.

Reformulate the basic quantities:

Page 18: How to Evaluate Foreground Maps ? CVPR2014 Poster.

Resolving the Dependency Flaw & the Equal-Importance Flaw

We suggest applying a weighting function to the errors.

A (N×N): captures the dependency between pixels.

B (N×1): represents the varying importance of the pixels.
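The slide does not spell out the exact definitions of A and B. As one illustrative possibility (an assumption made only for this sketch, using SciPy's distance transform; the constants are not taken from the poster), B can give background pixels a weight that grows with their distance from the foreground, while A can be realized by smoothing the errors over a local Gaussian neighborhood of each foreground pixel instead of building the full N×N matrix:

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def importance_weights(G, alpha=np.log(0.5) / 5):
    """Illustrative importance map B (one value per pixel).

    G: 2-D boolean ground-truth (True = foreground).
    Foreground pixels get weight 1; background pixels get a weight that grows
    with their distance to the nearest foreground pixel, so false positives far
    from the object count more than those adjacent to it.
    (Assumed form for this sketch only.)
    """
    dist = distance_transform_edt(~G)   # distance of each pixel to the foreground
    return np.where(G, 1.0, 2.0 - np.exp(alpha * dist))
```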

Page 19: How to Evaluate Foreground Maps ? CVPR2014 Poster.

Resolving the Dependency Flaw & the Equal-Importance Flaw

Page 20: How to Evaluate Foreground Maps ? CVPR2014 Poster.

Resolving the Dependency Flaw & the Equal-Importance Flaw

Applying the weighting (A for dependency, B for importance) to the error E yields a weighted error Ew. Reformulate the basic quantities with the weighted error:

TPw = (1 − Ew) · G
TNw = (1 − Ew) · (1 − G)
FPw = Ew · (1 − G)
FNw = Ew · G
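Continuing the earlier NumPy sketches (Ew is assumed to have been computed beforehand from E, A and B), the weighted quantities mirror the non-binary extension:

```python
import numpy as np

def weighted_quantities(Ew, G):
    """Weighted TP, TN, FP, FN from a weighted pixel-wise error Ew and a binary G."""
    TPw = np.sum((1 - Ew) * G)
    TNw = np.sum((1 - Ew) * (1 - G))
    FPw = np.sum(Ew * (1 - G))
    FNw = np.sum(Ew * G)
    return TPw, TNw, FPw, FNw
```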

Page 21: How to Evaluate Foreground Maps ? CVPR2014 Poster.

The New Measure: the weighted Fβ-measure (Fβw)

Having dealt with all three flaws, we proceed to construct our evaluation measure.
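Since the measure generalizes the Fβ-measure (see page 6), the final score combines the weighted precision and recall in the usual Fβ form. A sketch continuing the ones above (β = 1 shown as an example):

```python
def weighted_fbeta(TPw, FPw, FNw, beta=1.0):
    """Weighted F-beta score computed from the weighted quantities."""
    precision_w = TPw / (TPw + FPw)     # weighted precision
    recall_w = TPw / (TPw + FNw)        # weighted recall
    b2 = beta ** 2
    return (1 + b2) * precision_w * recall_w / (b2 * precision_w + recall_w)
```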

Page 22: How to Evaluate Foreground Maps ? CVPR2014 Poster.

Experiments

Meta-measures:

1. The ranking of an evaluation measure should agree with the preferences of an application that uses the map as input.

2. A measure should prefer a good result by an algorithm that considers the content of the image, over an arbitrary map.

Page 23: How to Evaluate Foreground Maps ? CVPR2014 Poster.

Experiments

Meta-measures (continued):

3. The score of a map should decrease when using a wrong ground-truth map.

4. The ranking of an evaluation measure should not be sensitive to inaccuracies in the manually marked boundaries in the ground-truth maps.

Page 24: How to Evaluate Foreground Maps ? CVPR2014 Poster.

Experiments: Meta-measure (1)

Application Ranking

Page 25: How to Evaluate Foreground Maps ? CVPR2014 Poster.

Experiments: Meta-measure (2)

State-of-the-art vs. Generic

Page 26: How to Evaluate Foreground Maps ? CVPR2014 Poster.

Experiments: Meta-measure (3)

Ground-truth Switch

Page 27: How to Evaluate Foreground Maps ? CVPR2014 Poster.

Experiments: Meta-measure (4)

Annotation errors

Page 28: How to Evaluate Foreground Maps ? CVPR2014 Poster.

Conclusions

We analyzed currently-used evaluation measures and showed that they suffer from three flawed assumptions: interpolation, dependency, and equal-importance.

We suggested an evaluation measure that amends these flaws and offers a unified solution to the evaluation of both non-binary and binary maps.

The advantages of our measure were shown via four different meta-measures, both qualitatively and quantitatively.