rips-hk-lenovo (1)

Post on 13-Apr-2017


Transcript of rips-hk-lenovo (1)

Creation and Optimization of a Logo Recognition System

Haozhi Qi, Owen Richfield, Xiaohui Zeng, Michael Zhao

Academic Mentor: Dr. Albert Ku

Industrial Mentor: Mr. Sun Lin

August 6, 2015

Qi, Richfield, Zeng, Zhao

RIPS-HK: Lenovo

Problem Description

Problem: What if there were an app that could give a smartphone user information about a company just by recognizing that company’s logo in an image?

Goal: Create this app.


Outline

• Model Introduction
  - Bag of Features Model
  - Convolutional Neural Network
• Model Testing and Results
• Application Demonstration
• Conclusions and Future Work

Bag of Features Model

Feature Extraction

Feature Extraction and Description: SURF

• Interest point detection
  - Rotation- and scale-invariant features
• Interest point description
  - A good representation of the image content

SURF: Interest Point Detection

Use the determinant of the Hessian to detect blob-like structures.

Use box filters to approximate the second-order derivatives of the Gaussian.

Take advantage of the integral image to evaluate the box filters quickly.

Apply scale-space analysis to choose the appropriate scale for each point.
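To see why box filters are cheap, here is a minimal pure-Python sketch of the integral image: after one pass over the image, the sum over any axis-aligned rectangle costs four lookups regardless of the filter size. The function names are our own, and the 0.9 weight on the $D_{xy}$ response is the relative weight from the original SURF paper, not a value given on these slides.

```python
def integral_image(img):
    """Return ii with one row/column of zero padding: ii[y][x] = sum of img[0:y][0:x]."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def box_sum(ii, x0, y0, x1, y1):
    """Sum of img over rows y0..y1-1 and columns x0..x1-1, using four lookups."""
    return ii[y1][x1] - ii[y0][x1] - ii[y1][x0] + ii[y0][x0]


def hessian_det_approx(dxx, dyy, dxy, weight=0.9):
    """Approximate det(H) from box-filter responses; large values indicate blob-like structure."""
    return dxx * dyy - (weight * dxy) ** 2


img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
ii = integral_image(img)
print(box_sum(ii, 0, 0, 3, 3))  # 45, the whole image
print(box_sum(ii, 1, 1, 3, 3))  # 28, the bottom-right 2 x 2 block: 5 + 6 + 8 + 9
```

Because `box_sum` does not depend on the rectangle size, larger filters at coarser scales cost the same as small ones, which is what lets SURF scale the filter instead of downsampling the image.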

SURF: Interest Point Description

Calculate the dominant orientation from Haar wavelet responses.

Build the descriptor from a 4 × 4 grid of subregions.
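A sketch of the standard SURF descriptor layout, assuming the Haar responses dx and dy have already been sampled on a square window around the interest point and rotated to the dominant orientation (the window is 20 × 20 samples in standard SURF). Gaussian weighting and interpolation are left out, and `surf_like_descriptor` is a hypothetical name. Each of the 4 × 4 subregions contributes (Σdx, Σdy, Σ|dx|, Σ|dy|), giving 64 values.

```python
import math


def surf_like_descriptor(dx, dy, grid=4):
    """Build a SURF-style descriptor from Haar responses sampled on an n x n window.

    dx, dy: n x n lists of Haar wavelet responses, assumed already rotated to the
    dominant orientation. Returns grid * grid * 4 values (64 for grid=4),
    normalized to unit length.
    """
    n = len(dx)
    step = n // grid  # samples per subregion side; assumes n is divisible by grid
    desc = []
    for gy in range(grid):
        for gx in range(grid):
            sdx = sdy = sadx = sady = 0.0
            for y in range(gy * step, (gy + 1) * step):
                for x in range(gx * step, (gx + 1) * step):
                    sdx += dx[y][x]
                    sdy += dy[y][x]
                    sadx += abs(dx[y][x])
                    sady += abs(dy[y][x])
            desc.extend([sdx, sdy, sadx, sady])
    norm = math.sqrt(sum(v * v for v in desc))
    return [v / norm for v in desc] if norm > 0 else desc


# Usage: a uniform horizontal gradient over a 20 x 20 window.
dx = [[1.0] * 20 for _ in range(20)]
dy = [[0.0] * 20 for _ in range(20)]
d = surf_like_descriptor(dx, dy)
print(len(d))  # 64
```

Normalizing to unit length makes the descriptor invariant to contrast changes, so matching compares the pattern of gradients rather than their strength.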

BOW Training

Feature Vector Clustering

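Basics of K-means

• Clustering method in N-dimensional space
• Algorithmic steps:
  - Given a set of data, choose k cluster centers
  - Calculate the distance between each data point and each cluster center
  - Assign each point to the cluster with the minimum distance
  - Recalculate the cluster centers:
    $$v_i = \frac{1}{c_i} \sum_{j=1}^{c_i} x_j$$
    where $v_i$ is the new center of the ith cluster, $c_i$ is the number of data points in the ith cluster, and $x_j$ is the jth data point in the ith cluster
  - Repeat the assignment and update steps until the assignments no longer change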
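The steps above can be sketched in pure Python as Lloyd's k-means. Drawing the initial centers at random from the data and stopping once the assignments are unchanged are our assumptions, since the slides do not say how the team initialized or stopped the algorithm.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's k-means on a list of equal-length tuples; returns (centers, labels)."""
    rng = random.Random(seed)
    centers = [tuple(p) for p in rng.sample(points, k)]  # k initial centers drawn from the data
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster with the nearest center.
        new_labels = [min(range(k), key=lambda i: math.dist(p, centers[i])) for p in points]
        if new_labels == labels:
            break  # assignments unchanged, so the centers are stable
        labels = new_labels
        # Update step: v_i = (1 / c_i) * sum of the c_i points in cluster i.
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:  # keep the old center if a cluster becomes empty
                centers[i] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return centers, labels


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, labels = kmeans(points, k=2)
print(labels)
```

Which cluster gets index 0 depends on the random initialization, so compare cluster memberships rather than raw label values.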

K-means Clustering

Hierarchical K-means

Bag of Words and Hierarchical K-means

Figure: feature vectors recursively split into a tree of clusters (CL.) by hierarchical k-means.

Figure: histogram of matches per visual word (word1 to word5).
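A minimal sketch of how a vocabulary tree can be built by hierarchical k-means and used to turn an image's descriptors into a bag-of-words histogram. Each node is split into B = `branches` clusters and the recursion runs L = `levels` deep, so a full tree has B^L leaves (visual words). All function names, the fixed 20 k-means iterations per node, and the use of the root-to-leaf path as the word id are illustrative assumptions.

```python
import math
import random
from collections import Counter


def _kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means used to split the points at one tree node."""
    centers = [tuple(p) for p in rng.sample(points, k)]
    groups = []
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centers[i]))
            groups[nearest].append(p)
        # Recompute each center as the mean of its group (keep it if the group is empty).
        centers = [tuple(sum(c) / len(g) for c in zip(*g)) if g else centers[i]
                   for i, g in enumerate(groups)]
    return centers, groups


def build_tree(points, branches, levels, rng):
    """Hierarchical k-means: a node is a list of (center, child); child is None at a leaf."""
    if levels == 0 or len(points) < branches:
        return None
    centers, groups = _kmeans(points, branches, rng)
    return [(c, build_tree(g, branches, levels - 1, rng)) for c, g in zip(centers, groups)]


def quantize(tree, desc):
    """Descend the tree, taking the nearest center at each level; the path is the word id."""
    path = []
    node = tree
    while node is not None:
        i = min(range(len(node)), key=lambda j: math.dist(desc, node[j][0]))
        path.append(i)
        node = node[i][1]
    return tuple(path)


def bow_histogram(tree, descriptors):
    """Bag-of-words vector: how many of an image's descriptors fall into each word."""
    return Counter(quantize(tree, d) for d in descriptors)


rng = random.Random(0)
train = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
tree = build_tree(train, branches=2, levels=1, rng=rng)
print(bow_histogram(tree, [(0, 0.5), (0.5, 0), (10, 10.5)]))
```

Quantizing a descriptor costs B distance computations per level, L·B in total, instead of B^L for a flat vocabulary of the same size.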

Bag of Features Model

Inverted File Index

• word 1: image 1, image 3, image 5, ...
• word 2: image 4, image 9, image 16, ...
• word 3: image 4, image 12, image 13, ...
• word 4: image 1, image 5, image 7, ...
• word 5: image 2, image 3, image 9, ...
• word 6: image 7, image 12, image 17, ...
• ...

Classification: Inverted File Index

• Benefit: retrieval via the inverted file is faster than searching every image
• Drawback: lack of spatial accuracy, since only the visual words are compared, not where they occur in the image
• Additional verification is needed to re-rank the retrieved images
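A minimal sketch of an inverted file index over visual words. The vote count used to rank candidates is a simple stand-in: the slides do not say how candidates are scored, and systems of this kind commonly weight words by tf-idf instead. The function names are hypothetical, and `top_n=15` mirrors the number of returned images chosen later in parameter tuning.

```python
from collections import Counter, defaultdict


def build_inverted_index(image_words):
    """image_words: {image_id: iterable of visual-word ids} -> {word: set of image_ids}."""
    index = defaultdict(set)
    for image_id, words in image_words.items():
        for w in words:
            index[w].add(image_id)
    return index


def search(index, query_words, top_n=15):
    """Vote for every image that shares a word with the query; return the top_n image ids.

    Only the lists for the query's own words are touched, which is why this is
    faster than comparing the query with every image in the database.
    """
    votes = Counter()
    for w in set(query_words):
        for image_id in index.get(w, ()):
            votes[image_id] += 1
    return [image_id for image_id, _ in votes.most_common(top_n)]


index = build_inverted_index({1: [1, 4], 2: [5], 3: [1, 5], 4: [2, 3], 5: [1, 4]})
print(search(index, [1, 4]))  # images 1 and 5 share two words and rank ahead of image 3
```

The ranking says nothing about where the shared words sit in each image, which is the spatial-accuracy gap the re-ranking step has to close.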

Bag of Features Model

Re-ranking of Returned Images

• Match the descriptors of the query image to the descriptors of each image in the returned list.
• Simple algorithm:
  - Match each descriptor in the query image to its nearest-neighbor descriptor in the list image.
  - Compare the L2 distance of this pair with the distances between the query descriptor and every other descriptor in the list image.
  - If the nearest-neighbor distance is significantly smaller, count the pair as a “match”.
  - Sum the number of “matches” for each list image and divide by the total number of features.
• The returned list is then re-ranked by this “match ratio” and returned to the user.
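A sketch of the match-ratio re-ranking, assuming descriptors are tuples of floats and that "total number of features" means the number of query descriptors. "Significantly smaller" is written as a ratio test against the second-nearest distance with a hypothetical threshold `ratio=0.7`; because the second-nearest is the closest of all the other descriptors, beating it by that factor means beating every other descriptor.

```python
import math


def match_ratio(query_desc, list_desc, ratio=0.7):
    """Fraction of query descriptors whose nearest neighbour in list_desc is distinctive."""
    if not query_desc or len(list_desc) < 2:
        return 0.0
    matches = 0
    for q in query_desc:
        # Nearest and second-nearest L2 distances from q to the list image's descriptors.
        d1, d2 = sorted(math.dist(q, d) for d in list_desc)[:2]
        if d1 < ratio * d2:
            matches += 1
    return matches / len(query_desc)


def rerank(query_desc, candidates, ratio=0.7):
    """candidates: list of (image_id, descriptors) from the inverted-file search."""
    scored = [(match_ratio(query_desc, descs, ratio), image_id) for image_id, descs in candidates]
    scored.sort(key=lambda s: s[0], reverse=True)  # stable: ties keep search order
    return [image_id for _, image_id in scored]


query = [(0.0, 0.0), (10.0, 10.0)]
a = [(0.0, 0.1), (5.0, 5.0), (10.0, 10.1)]
b = [(3.0, 3.0), (3.1, 3.0)]
print(rerank(query, [("b", b), ("a", a)]))  # ['a', 'b']
```

Image `b` loses even though it has descriptors near the query: its two candidates are almost equally far, so neither counts as a distinctive match.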

Convolutional Neural Networks (CNNs)

Neural Networks

Figure: Neural network from http://www.texample.net/media/tikz/examples/PNG/neural-network.png

Convolutional Neural Networks

Convolutional neural networks are neural networks with an additional biological inspiration.

Their characteristic layers are of two basic types: convolution and pooling.

• Convolution is the process of convolving an image with a kernel. The idea comes from image processing, where it has been used for tasks such as edge detection. Here, we want to learn kernels specific to the data.
• Pooling provides a statistical summary of the outputs of several nearby “neurons”, e.g. by taking their average or maximum.
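The two layer types can be sketched in pure Python on nested lists. As CNN frameworks typically do, `conv2d` computes a "valid" cross-correlation (the kernel is not flipped); the `[-1, 1]` kernel is a hand-picked edge detector of the kind a CNN would instead learn from data. The function names are our own.

```python
def conv2d(image, kernel):
    """'Valid' 2-D convolution as CNN layers compute it (no kernel flip, i.e. cross-correlation)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[y + i][x + j] * kernel[i][j] for i in range(kh) for j in range(kw))
             for x in range(out_w)]
            for y in range(out_h)]


def max_pool(fmap, size=2):
    """Non-overlapping max pooling; rows/columns that do not fill a window are dropped."""
    return [[max(fmap[y + i][x + j] for i in range(size) for j in range(size))
             for x in range(0, len(fmap[0]) - size + 1, size)]
            for y in range(0, len(fmap) - size + 1, size)]


# A vertical edge, and a hand-picked horizontal-gradient kernel that responds to it.
image = [[0, 0, 1, 1] for _ in range(4)]
edges = conv2d(image, [[-1, 1]])
print(edges)            # [[0, 1, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0]]
print(max_pool(edges))  # [[1], [1]]
```

Pooling keeps the fact that an edge was found in each window while discarding its exact position, which makes later layers less sensitive to small shifts.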

Figure: Description of convolution process from http://www.songho.ca/dsp/convolution/files/conv2d_matrix.jpg.

Implementation and Architecture

For the CNN implementation, we used Caffe [?]. Since we had only around 16,000 images, we fine-tuned two pre-trained models:

• AlexNet [?], the winner of the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2012.
• GoogLeNet [?], the winner of ILSVRC 2014.

Both are provided in Caffe’s Model Zoo, together with a file storing the weights of each model after training on ImageNet.

AlexNet

Figure: Image of AlexNet architecture (from [?]). This also illustrates how the original network was split to train on two GPUs.

GoogLeNet

Figure: Image of GoogLeNet architecture (from [?]). It is deeper than AlexNet and has 12× fewer parameters.

Filter/Layer Visualization

Let’s do some filter/layer visualization!

• 143.89.75.120/filayer.html

Model Testing

Dataset Construction

We gathered a dataset of logo images for 167 brands using the Bing Search API (100 images per brand on average), with queries such as “<brand>”, “<brand> building”, and “<brand> <product>”.

One problem was that some downloaded images were mislabeled or irrelevant. We filtered the dataset using two methods:

• Compute the proportion of matching SIFT descriptors between the downloaded image and a reference image for that brand, and discard the image if the proportion falls below a threshold
• import ManualLabor

Testing the original pipeline

• Parameter tuning
• Cross-validation

Parameter Tuning

• BoW structure: how to choose the vocabulary size
  - words = $B^L$
  - B: number of branches; L: number of levels
  - Too large: lack of generalization, overfitting
  - Too small: lack of discrimination, mismatches

Parameter Tuning

• Vocabulary size
• How to choose the number of images returned by the inverted file index search
  - accuracy
  - computation time of re-ranking
• How to choose the number of images shown on the client side
  - accuracy
  - screen size of the mobile application

Parameter Tuning

• Vocabulary size
• Number of images returned by the search
• Number of images shown
• Re-ranking: how to determine the weight factor $w$ in the weighted function
  - $\text{score} = w \cdot I + (1 - w) \cdot F$
  - $I$: number of inliers
  - $F$: frequency of the brand in the returned images
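A sketch of the weighted score for each returned image. The slides do not say how $I$ and $F$ are put on a common scale, so this version assumes $I$ is divided by the largest inlier count in the list and $F$ is the fraction of returned images sharing the image's brand; the function name and the example data are hypothetical.

```python
from collections import Counter


def weighted_scores(returned, w=0.5):
    """returned: list of (image_id, brand, inliers). Returns [(score, image_id, brand)], best first.

    score = w * I + (1 - w) * F, with I = inliers / max inliers (our normalization assumption)
    and F = fraction of returned images that share this image's brand.
    """
    if not returned:
        return []
    brand_counts = Counter(brand for _, brand, _ in returned)
    max_inliers = max(inl for _, _, inl in returned) or 1  # avoid dividing by zero
    n = len(returned)
    scored = [(w * inl / max_inliers + (1 - w) * brand_counts[brand] / n, image_id, brand)
              for image_id, brand, inl in returned]
    return sorted(scored, key=lambda s: s[0], reverse=True)


returned = [("a", "lenovo", 10), ("b", "ibm", 20), ("c", "lenovo", 6), ("d", "lenovo", 4)]
print([image_id for _, image_id, _ in weighted_scores(returned, w=0.4)])
```

With a small $w$ the brand vote dominates, so several moderately matching images of one brand can outrank a single strong match; with $w = 1$ the ranking reduces to the inlier count alone.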

Parameters for Evaluation

• Vocabulary size
  - number of branches
  - number of levels
• Number of images returned by the search
• Number of images shown
• Weight factor $w$ in the weighted function
• Calculation of the accuracy
  - if at least one returned image is correct, the accuracy for that query is 1 (otherwise 0)
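The accuracy rule above is a top-k hit rate. A minimal sketch; the default `k=6` mirrors the number of images shown chosen later and is our assumption, and the brand names in the example are made up.

```python
def top_k_accuracy(ranked_predictions, true_labels, k=6):
    """Share of queries whose true brand appears among the first k results.

    Each query scores 1 if at least one of its top-k results is correct, else 0.
    """
    if not true_labels:
        return 0.0
    hits = sum(1 for ranked, truth in zip(ranked_predictions, true_labels) if truth in ranked[:k])
    return hits / len(true_labels)


preds = [["nike", "puma"], ["ibm", "hp"], ["dell", "acer"]]
truth = ["puma", "lenovo", "dell"]
print(top_k_accuracy(preds, truth, k=1))  # 1 of 3 queries correct at rank 1
print(top_k_accuracy(preds, truth, k=2))  # 2 of 3 queries correct within the top 2
```

With k = 1 and k = 5 this is the same Top-1 and Top-5 accuracy reported for the CNN models.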

Cross Validation

• Application
  - model selection
  - model assessment
• Procedure

Cross Validation

• Randomly divide the data into $K$ equal-sized parts.
• Leave out part $k$, fit the model to the other $K-1$ parts (combined), and then obtain predictions for the left-out $k$th part.
• Do this in turn for each part $k = 1, 2, \dots, K$, and then combine the results.
• We chose $K = 5$.
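A sketch of the K-fold procedure with K = 5, including the mean and standard deviation later used to judge stability. `fit_and_score` is a hypothetical callback standing in for training and evaluating the pipeline; the slides do not say whether the sample or population standard deviation was reported, so `statistics.stdev` (sample) is an assumption.

```python
import random
import statistics


def k_fold_indices(n, k=5, seed=0):
    """Shuffle indices 0..n-1 and deal them into k folds of (nearly) equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(n, fit_and_score, k=5, seed=0):
    """Leave out each fold in turn, fit on the other k-1 folds, score on the held-out one.

    fit_and_score(train_idx, test_idx) -> float is a caller-supplied callback.
    Returns (per-fold scores, mean, sample standard deviation).
    """
    folds = k_fold_indices(n, k, seed)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        scores.append(fit_and_score(train_idx, test_idx))
    return scores, statistics.mean(scores), statistics.stdev(scores)


print([len(f) for f in k_fold_indices(103)])  # [21, 21, 21, 20, 20]
```

Dealing shuffled indices round-robin keeps fold sizes within one of each other even when n is not divisible by K.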

Testing Result

Testing Result

• Test on vocabulary size
• Optimal number of words: 500,000 to 800,000
  - number of branches = 14 or 15
  - number of levels = 5
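As a check on $\text{words} = B^L$ with $L = 5$, only these two branching factors land in the reported range:

```latex
\begin{aligned}
13^5 &= 371{,}293   \quad (\text{below } 500{,}000)\\
14^5 &= 537{,}824\\
15^5 &= 759{,}375\\
16^5 &= 1{,}048{,}576 \quad (\text{above } 800{,}000)
\end{aligned}
```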

Testing Result

• With the other parameters fixed, we tested:
  - the weight factor
  - the number of returned images
  - the number of images shown on the client side

Testing Result

• Optimal parameter setting:
  - number of images shown = 6
  - number of returned images set to 15, saving about 0.3 s

Testing Summary

• Optimal parameter setting:
  - number of words: 500,000 to 800,000
  - number of returned images: 15
  - number of images shown: 6
• The stability of the system was also tested:
  - the standard deviation across the 5-fold cross-validation ranged from 0.005 to 0.007

Evaluation of Deep Learning framework

Cross-validation for AlexNet (Top-5 Accuracy)

Figure: cross-validation top-5 accuracy curves (y-axis from 0.87 to 0.95; x-axis ticks from 1,000 to 196,000).

Cross Validation Example

94.63%, 94.02%, 93.80%, 94.02%, 93.90%, 93.59%, 94.11%, 93.44%, 94.54%, 93.80%

Evaluation of Deep Learning framework

Cross-validation for AlexNet

Final accuracy (AlexNet):

Top-1 accuracy: 93.33%
Top-5 accuracy: 96.73%

Evaluation of Deep Learning framework

Cross-validation for GoogLeNet (Top-5 Accuracy)

Evaluation of Deep Learning framework

Cross-validation for AlexNet

Cross-validation for GoogLeNet

Final accuracy (GoogLeNet):

Top-1 accuracy: 94.05%
Top-5 accuracy: 97.39%

Evaluation of Deep Learning framework

Final Comparison

Metric                     GoogLeNet   AlexNet   Visual Bag of Words
Accuracy (Top-5)           97.39%      96.73%    87.6%
Efficiency
  Preprocess               8.47 ms     7.5 ms    6 ms
  Classification           17.7 ms     6.94 ms   n/a
  SURF feature extraction  n/a         n/a       24 ms
  Total time*              129 ms      170 ms    281 ms

* Including some system-level operations.

Demonstration

Future development

There is still more we can do to improve the system:

• Enlarge the dataset (currently 167 classes and 16,000 images).
• Test different deep learning frameworks.
• Combine local hand-crafted features with global deep-learned features to achieve better accuracy.

We would like to thank:

• Mr. Sun Lin and Lenovo-Hong Kong.
• Professor Shingyu Leung, Dr. Ku Yin Bon and the Hong Kong University of Science and Technology.
• Professor Susanna Serna and the Institute for Pure and Applied Mathematics.
• The National Science Foundation for program funding (Grant DMS #0931852).
