Ranking Metrics and Evaluation Measures Jie Yu, Qi Tian, Nicu Sebe Presented by Jie Yu.


Outline

Introduction

Distance Metric Analysis

Boosting Distance Metrics

Experiments and Analysis

Conclusions and Discussions

Introduction: Similarity Evaluation

• An organizing process in which individuals classify objects, form concepts, and make generalizations

• In automated programs, similarity evaluation can be achieved by ranking the distances between objects.

The similarity between two vectors is often determined by computing the distance between them using a certain distance metric.

• Euclidean Distance (Sum of Squared Distance, L2)
• Manhattan Distance (Sum of Absolute Distance, L1)
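As a quick illustration (not part of the original slides), the two metrics can be computed as:

```python
import math

def l1_distance(x, y):
    # Manhattan distance: sum of absolute differences
    return sum(abs(a - b) for a, b in zip(x, y))

def l2_distance(x, y):
    # Euclidean distance: square root of the sum of squared differences
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

x, y = [1.0, 2.0, 3.0], [4.0, 6.0, 3.0]
print(l1_distance(x, y))  # 7.0
print(l2_distance(x, y))  # 5.0
```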

Which metric to use and why? Is there any better metric?

Distance Metric and Maximum Likelihood

Maximum likelihood theory allows us to relate a data distribution to a distance metric.

• The problem of finding the right measure for the distance comes down to the maximization of the similarity probability.

Given a specific distribution, the distribution mean can be determined according to Maximum Likelihood theory.

• Maximizing the likelihood, max P(X | θ̂), is equivalent to minimizing the summed distance, min Σ_i d(x_i, θ̂).

• M-Estimator: Any estimate θ̂ defined by a minimization problem of the form min Σ_i d(x_i, θ̂) is called an M-estimate.

• The sum over per-sample distances implies the additive distance model.
• In similarity estimation, the estimate θ̂ could be the query data y.

Distance Metric and Maximum Likelihood

Given a specific distribution, the distance metric d can be decided.

According to Maximum Likelihood theory, the M-estimator can be solved by

    min Σ_i d(x_i, θ̂)   =>   Σ_i ∂d(x_i, θ̂)/∂θ̂ = 0

• Exponential distribution: L1 => Median
• Gaussian distribution: L2 => Arithmetic Mean
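This correspondence can be checked numerically. The sketch below (hypothetical 1-D data, not from the slides) scans a grid of candidate estimates and confirms that the summed L1 distance is minimized at the median while the summed L2 distance is minimized at the arithmetic mean:

```python
import statistics

data = [1.0, 2.0, 2.0, 3.0, 10.0]  # hypothetical sample with an outlier

def sum_l1(theta):
    # Summed L1 distance of the sample to a candidate estimate theta
    return sum(abs(x - theta) for x in data)

def sum_l2(theta):
    # Summed squared (L2) distance of the sample to theta
    return sum((x - theta) ** 2 for x in data)

grid = [i / 100 for i in range(0, 1101)]  # candidate estimates 0.00 .. 11.00
l1_argmin = min(grid, key=sum_l1)
l2_argmin = min(grid, key=sum_l2)

print(l1_argmin, statistics.median(data))  # 2.0 2.0
print(l2_argmin, statistics.mean(data))    # 3.6 3.6
```

Note the robustness angle: the outlier at 10.0 drags the L2-optimal estimate (the mean) upward, while the L1-optimal estimate (the median) is unaffected.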

Distance Metric and Maximum Likelihood

New distance metrics are found by doing reverse engineering on harmonic and geometric mean estimation.

The distance function can be derived from a mean estimation given the restriction that d(x, x) = 0.

Distance Metric and Maximum Likelihood

Further study on generalized harmonic and geometric mean gives us more distance metrics
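The generalized (power) mean and the geometric mean underlying these metrics can be sketched as follows. This is an illustration of the mean estimators only; the p, q, r-parameterized distance functions derived from them in the paper are not reproduced here:

```python
import math

def power_mean(xs, p):
    # Generalized (power) mean M_p: p = -1 gives the harmonic mean,
    # p = 1 the arithmetic mean, p = 2 the quadratic mean.
    return (sum(x ** p for x in xs) / len(xs)) ** (1.0 / p)

def geometric_mean(xs):
    # Limit of the power mean as p -> 0.
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

xs = [1.0, 4.0, 4.0]
print(power_mean(xs, -1))   # harmonic mean: 2.0
print(power_mean(xs, 1))    # arithmetic mean: 3.0
print(geometric_mean(xs))   # geometric mean: ~2.52
```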

Distance Metric and Maximum Likelihood

Comparison of Distance Functions

Robust Distance Metric

For a specific distribution, a distance metric may fit it best.

In content-based image retrieval, feature elements are extracted for different statistical properties associated with entire digital images, or perhaps with specific regions of interest, e.g.:

• Color: color moment and color histogram feature
• Texture: wavelet feature
• Shape: water-filling feature

Images are represented by a concatenated feature vector of different feature elements.

• Distances between two image feature vectors are often evaluated by isotropic distances such as L1 or L2.

• Assumption: the feature elements from heterogeneous sources have the same distribution, e.g. Gaussian.

• The above assumption is often inappropriate.

Robust Distance Metric: Related Work

Mahalanobis distance

• d = (x_i − y_i)^T W (x_i − y_i)
• Solving for W involves estimation of d^2 parameters
• Assumes Gaussian distribution
• Sensitive to sample set size

Relevant Component Analysis (RCA)
• Applies the estimated distance along with equivalence constraints
• Assumes Gaussian distribution
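A minimal sketch of the quadratic form above (toy data; W would normally be estimated from training samples, which is where the d^2 parameters come from):

```python
def mahalanobis_sq(x, y, W):
    # Squared Mahalanobis distance: (x - y)^T W (x - y),
    # where W is a d x d weight (e.g. inverse covariance) matrix.
    diff = [a - b for a, b in zip(x, y)]
    d = len(diff)
    return sum(diff[i] * W[i][j] * diff[j] for i in range(d) for j in range(d))

# With W = identity, this reduces to the squared Euclidean (L2) distance.
I = [[1.0, 0.0], [0.0, 1.0]]
print(mahalanobis_sq([0.0, 0.0], [3.0, 4.0], I))  # 25.0
```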

Boosting Distance Metric

An anisotropic and heterogeneous distance metric may be more suitable for estimating the similarity between features.

We propose a boosted distance metric for similarity estimation, where the similarity function for a certain class of samples can be estimated by a generalization of different distance metrics on selected feature elements.
• In particular, we use AdaBoost with decision stumps and our distance metric analysis to estimate the similarity.

Boosting Distance Metric

Training set D consists of pair-wise distance vectors d between two samples given a specific distance metric m.

The corresponding label for a distance vector is defined as follows:

    l_d = 1 if x_i and x_j are from the same class, 0 otherwise

A weak classifier is defined by a distance metric m on a feature element f with estimated parameter(s) θ:

    h_{m,f,θ}(d) ∈ {0, 1}
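A compact sketch of this idea on toy data: one decision stump per distance-vector element, combined with standard AdaBoost re-weighting. The data and stump form are illustrative only; the paper's exact algorithm (including its per-element distance-metric selection) may differ in detail:

```python
import math

# Toy training set: pair-wise distance vectors d (one entry per feature
# element) with labels l_d = 1 (same-class pair) or 0 (different-class pair).
D = [([0.1, 2.0], 1), ([0.2, 1.5], 1), ([0.9, 0.2], 0), ([1.1, 0.4], 0)]

def stump_predict(d, f, theta, pol):
    # Predict "same class" (1) when pol * (d[f] - theta) <= 0.
    return 1 if pol * (d[f] - theta) <= 0 else 0

def best_stump(D, w):
    # Exhaustively pick the element f, threshold theta and polarity pol
    # with the lowest weighted error.
    best = None
    for f in range(len(D[0][0])):
        for theta in sorted({d[f] for d, _ in D}):
            for pol in (1, -1):
                err = sum(wi for (d, l), wi in zip(D, w)
                          if stump_predict(d, f, theta, pol) != l)
                if best is None or err < best[0]:
                    best = (err, f, theta, pol)
    return best

def boost(D, T=5):
    n = len(D)
    w = [1.0 / n] * n
    H = []
    for _ in range(T):
        err, f, theta, pol = best_stump(D, w)
        err = min(max(err, 1e-10), 1 - 1e-10)
        alpha = 0.5 * math.log((1 - err) / err)
        H.append((alpha, f, theta, pol))
        # Re-weight: emphasize misclassified pairs, then renormalize.
        w = [wi * math.exp(-alpha if stump_predict(d, f, theta, pol) == l else alpha)
             for (d, l), wi in zip(D, w)]
        s = sum(w)
        w = [wi / s for wi in w]
    return H

def predict(H, d):
    # Weighted vote of the weak classifiers, mapped to {-1, +1}.
    score = sum(alpha * (2 * stump_predict(d, f, theta, pol) - 1)
                for alpha, f, theta, pol in H)
    return 1 if score > 0 else 0

H = boost(D)
print(predict(H, [0.15, 1.8]))  # 1: judged a same-class pair
print(predict(H, [1.0, 0.3]))   # 0: judged a different-class pair
```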

Boosting Distance Metric Algorithm

Theoretical Analysis

Advantages:

• The similarity estimation uses only a small set of elements that is most useful for similarity estimation.

• For each element, the distance metric that best fits its distribution is learnt.

• It adds effectiveness and robustness to the classifier when we have a small training set compared to the number of dimensions.

Since the number of training iterations T is usually much smaller than the original data dimension, the boosted distance metric works as a non-linear dimension reduction technique, which keeps the elements most important to similarity judgment. It can be very helpful for overcoming the small sample set problem.

The proposed method is general and can be plugged into many similarity estimation techniques, such as the widely used K-NN.
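For instance, a K-NN classifier only needs a distance function, so a learned metric can be dropped in directly (hypothetical sketch, with L1 standing in for the boosted metric):

```python
from collections import Counter

def knn_predict(train, query, k, dist):
    # train: list of (feature_vector, label) pairs; dist: any distance function.
    neighbors = sorted(train, key=lambda s: dist(s[0], query))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

l1 = lambda x, y: sum(abs(a - b) for a, b in zip(x, y))
train = [([0.0, 0.0], 'a'), ([0.1, 0.2], 'a'),
         ([5.0, 5.0], 'b'), ([5.1, 4.9], 'b')]
print(knn_predict(train, [0.2, 0.1], k=3, dist=l1))  # a
```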

Experiments and Analysis

Are the new metrics better than L1 and L2?

• Stereo matching
• Motion tracking

Is the boosted distance metric better than a single distance metric?
• Benchmark datasets
• Image retrieval

Experiments and Analysis: Stereo Matching

Stereo matching: to find correspondences between entities in images with overlapping scene content.

• The images are taken from cameras at different viewpoints.
• The intensities of corresponding pixels are different.
• A stereo matcher based on the distance between different image regions is implemented.
• The optimal metric in this case will give the most accurate stereo matching performance.

Two standard stereo data sets, Castle and Tower, from Carnegie Mellon University are used for training and testing.

28 reference points in each frame

Experiments and Analysis: Stereo Matching

Distance-based Stereo Matcher

• Objective: To match a template (5×5) defined around one point from an image with the templates around points in the other images in order to find similarity.

• We search the templates centered at a 7×7 zone around the reference points.
• Using the distance metrics we discussed, we obtain the sum of distances between pixel values of two templates.
• If the reference point is in the predicted template in the next frame, we consider that we have a hit; otherwise, we have a miss.
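The matcher above can be sketched roughly as follows, on a synthetic image and with a single L1 pixel distance for brevity; the actual experiments swap in each metric under study:

```python
def template_distance(img, cy, cx, tmpl, dist):
    # Sum of per-pixel distances between a 5x5 template and the 5x5
    # patch of img centered at (cy, cx).
    return sum(dist(img[cy + dy][cx + dx], tmpl[dy + 2][dx + 2])
               for dy in range(-2, 3) for dx in range(-2, 3))

def match(img, tmpl, ry, rx, dist=lambda a, b: abs(a - b)):
    # Search a 7x7 zone around reference point (ry, rx) for the best match.
    candidates = [(ry + dy, rx + dx) for dy in range(-3, 4) for dx in range(-3, 4)]
    return min(candidates, key=lambda c: template_distance(img, c[0], c[1], tmpl, dist))

# Synthetic 16x16 image with a bright 5x5 blob centered at (8, 9).
img = [[0] * 16 for _ in range(16)]
for y in range(6, 11):
    for x in range(7, 12):
        img[y][x] = 255
tmpl = [[255] * 5 for _ in range(5)]
print(match(img, tmpl, ry=7, rx=8))  # (8, 9)
```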

Experiments and Analysis: Stereo Matching

Chi-Square Test
• To verify if the ground truth distance matches the modeled distribution most accurately.
• Chi-Square tests are run on the distances between ground truth and predicted templates.
• The smaller the value, the more similar the distributions.

The parameters p, q and r for the generalized harmonic and geometric distances are tested from -5 to 5 with step size 0.1.

Chi-square values: generalized geometric 0.0239, L2 0.0378.

Experiments and Analysis: Stereo Matching

L1 and L2 are outperformed by several metrics.

The generalized geometric distance gives the best accuracy.

According to the Chi-Square test, it also fits the distribution better.

Experiments and Analysis: Motion Tracking

In this experiment, distance metric analysis is tested on a motion tracking application.
• To trace moving facial expressions.
• A video sequence containing 19 images of a moving head in a static background.
• For each image in this video sequence, there are 14 points given as ground truth.
• The motion tracking algorithm between the test frame and another frame performs template matching to find the best match in a template around a central pixel.
• In searching for the corresponding pixel, we examine a region of width and height of 7 pixels centered at the position of the pixel in the previous frame.

Experiments and Analysis: Motion Tracking

This figure shows the prediction errors vs. the step between frames compared.

L1 and L2 are not the best choice.

Generalized geometric distance gives the best performance.

Boosted Distance Metric on Benchmark Database

We compare the performance of our boosted distance metric with several well-known traditional metrics.

• L1, L2, RCA, Mah, Mah-C
• The last 3 need parameter estimation.

• RCA-CD, Mah-D, Mah-CD
• A diagonal matrix is used to avoid the small sample size problem.

• Simple AdaBoost with Decision Stump and C4.5

Boosted Distance Metric on Benchmark Database

Database: 15 data sets from the UCI repository.

The first 4 datasets have large dimensionality.

20% of the data is used for training and 80% for testing.

Results are averaged over 100 runs.

Boosted Distance Metric on Benchmark Database

For the traditional metrics, only the one that gives the best performance is shown in the table.

Boosted Distance Metric on Benchmark Database

Our proposed metric performs the best on 13 out of 15 data sets.

Boosted Distance Metric in Image Retrieval

The boosted distance metric performs an element selection that is highly discriminant for similarity estimation.
• It doesn't suffer from the small sample set problem as LDA and other dimension reduction techniques do.

To evaluate the performance, we tested the boosted distance metric on image classification against some state-of-the-art dimension reduction techniques:
• PCA, LDA, NDA, and plain Euclidean distance in the original feature space using a simple NN classifier.

Boosted Distance Metric in Image Retrieval

Two data sets are used:

• A subset of the MNIST data set containing similar hand-written 1's and 7's

• A gender recognition database containing facial images from the AR database and the XM2VTS database

• The feature dimension for both databases is 784, while the size of the training set is fixed to 200, which is small compared to the dimensionality of the feature.

Boosted Distance Metric in Image Retrieval

For the boosted distance metric, the number of iterations T is equivalent to the dimensionality we reduce to.

Discussions

Our main contribution is to provide a general guideline for designing robust distance estimation that can adapt to data distributions automatically.

Novel distance metrics derived from the harmonic and geometric means and their generalized forms are presented and discussed.

• The creative component of our work is to start from an estimator and perform reverse engineering to obtain a metric.

• Some of the proposed metrics cannot be translated into a known probabilistic model.

Discussions

In similarity estimation the feature elements are often from heterogeneous sources. The assumption that the feature has a unified isotropic distribution is invalid.

• Unlike traditional anisotropic distance metrics, our proposed method does not make any assumption about the feature distribution. Instead, it learns distance metrics on each element to capture the underlying feature structure.

We examined the new metrics on several applications in computer vision, and the estimation of similarity can be significantly improved by the proposed distance metric analysis.

References

1. M. Zakai, "General distance criteria," IEEE Trans. on Information Theory, pp. 94-95, January 1964.

2. M. Swain and D. Ballard, "Color indexing," Intl. Journal of Computer Vision, vol. 7, no. 1, pp. 11-32, 1991.

3. R. M. Haralick, et al., "Texture features for image classification," IEEE Trans. on Sys. Man and Cyb., 1990.

4. B. M. Mehtre, et al., "Shape measures for content based image retrieval: a comparison," Information Proc. Management, 33(3):319-337, 1997.

5. R. Haralick and L. Shapiro, Computer and Robot Vision II, Addison-Wesley, 1993.

6. R. E. Schapire, Y. Singer, "Improved boosting using confidence-rated predictions," Machine Learning, 37(3):297-336, 1999.

7. C. Domeniconi, J. Peng, D. Gunopulos, "Locally adaptive metric nearest-neighbor classification," IEEE Trans. PAMI, 24(9):1281-1285, 2002.

8. J. Peng, et al., "LDA/SVM driven nearest neighbor classification," IEEE Proc. CVPR, 2001, pp. 940-942.

9. E. P. Xing, A. Y. Ng, M. I. Jordan, S. Russell, "Distance metric learning, with application to clustering with side-information," Proc. NIPS, 2003, pp. 505-512.


10. A. Bar-Hillel, T. Hertz, N. Shental, D. Weinshall, "Learning distance functions using equivalence relations," Proc. ICML, 2003, pp. 11-18.

11. T. Hertz, A. Bar-Hillel, D. Weinshall, "Learning distance functions for image retrieval," IEEE Proc. CVPR, 2004, pp. 570-577.

12. P. J. Huber, Robust Statistics, John Wiley & Sons, 1981.

13. L. Tang, et al., "Performance evaluation of a facial feature tracking algorithm," Proc. NSF/ARPA Workshop: Performance vs. Methodology in Computer Vision, 1994.

14. Y. LeCun, et al., MNIST database, http://yann.lecun.com/exdb/mnist/.

15. A. Martinez, R. Benavente, "The AR face database," Tech. Rep. 24, Computer Vision Center, 1998.

16. J. Matas, et al., "Comparison of face verification results on the XM2VTS database," Proc. ICPR, 1999, pp. 858-863.