MKL for Category Recognition

Post on 13-Jan-2016


MKL for Category Recognition

Kumar Srijan, Syed Ahsan Ishtiaque

Dataset

• 19 categories considered
• Currently:
  – Minimum of 58 images in each
  – Average of 101 images
• The images have been taken from Google Images (http://images.google.com) and supplemented by images from Flickr (http://flickr.com).


Code Walkthrough – Relevant Files

• preprocCal101.sh – Rescales images and renames them according to the code
• cal_preprocDatabases.m – Builds the image and ROI databases
• cal_preprocVocabularies.m – Prepares the visual word vocabularies
• cal_preprocFeatures.m – Computes features for all the images, projects them onto visual words and builds map files for each
• cal_preprocHistograms.m – Prepares histograms for the visual words
• cal_preprocKernels.m – Computes training and testing kernel matrices
• cal_classAll.m – Final classification

Code Walkthrough

Construct Visual Words – cal_preprocVocabularies

Calculating Local Descriptors


Vector quantization – bk_calcVocabulary
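The vocabulary construction and quantization steps can be sketched as follows (a minimal Python/NumPy illustration of plain k-means, not the toolkit's bk_calcVocabulary code; the descriptor data and function names are stand-ins):

```python
import numpy as np

def build_vocabulary(descriptors, vocab_size, iters=20, seed=0):
    """Cluster local descriptors into vocab_size visual words (plain k-means)."""
    rng = np.random.default_rng(seed)
    centers = descriptors[rng.choice(len(descriptors), vocab_size, replace=False)]
    for _ in range(iters):
        # assign each descriptor to its nearest center
        d = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        # move each center to the mean of its assigned descriptors
        for k in range(vocab_size):
            if (labels == k).any():
                centers[k] = descriptors[labels == k].mean(0)
    return centers

def quantize(descriptors, centers):
    """Project descriptors onto the vocabulary: index of the nearest visual word."""
    d = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    return d.argmin(1)

# toy data: 200 random 8-D "descriptors", a 10-word vocabulary
X = np.random.default_rng(1).normal(size=(200, 8))
vocab = build_vocabulary(X, 10)
words = quantize(X, vocab)
```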

Calculate Image Descriptors – bk_calcFeatures

Preparing Database (separate training and testing images, define regions of interest and add jitters to them) – cal_preprocDatabases

Compute the features for all the images, project them onto visual words, and produce map files for each – cal_preprocFeatures

Compute and quantize descriptors for training and test images – bk_calcFeatures

Prepare the visual word histograms – cal_preprocHistograms
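A sketch of the histogram step, including the spatial pyramid later controlled by pyrLevels (Python/NumPy illustration; the grid conventions and function name are assumptions, not taken from cal_preprocHistograms):

```python
import numpy as np

def pyramid_histogram(words, xs, ys, width, height, vocab_size, levels=2):
    """Concatenate visual-word histograms over a spatial pyramid:
    level l splits the image into 2^l x 2^l cells (levels=2 -> 1+4+16 cells)."""
    hists = []
    for l in range(levels + 1):
        n = 2 ** l
        for i in range(n):
            for j in range(n):
                in_cell = ((xs * n // width) == i) & ((ys * n // height) == j)
                hists.append(np.bincount(words[in_cell],
                                         minlength=vocab_size).astype(float))
    h = np.concatenate(hists)
    return h / max(h.sum(), 1.0)  # L1-normalize the stacked histogram

# toy data: 300 quantized features with pixel locations in a 640x480 image
rng = np.random.default_rng(0)
words = rng.integers(0, 50, size=300)
xs, ys = rng.integers(0, 640, 300), rng.integers(0, 480, 300)
h = pyramid_histogram(words, xs, ys, 640, 480, vocab_size=50, levels=2)
```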

Compute Training and Testing Kernel Matrices – cal_preprocKernels

Run on all categories – cal_classAll

Train SVM with MKL (one vs. rest classifiers) – bk_trainAppModel

Evaluate SVM on test data – bk_testAppModel

Documentation for modifications and adjustment of parameters for code execution

Changing the number of Training and testing images

• Default value is 15.
• Change drivers/cal_filenames.txt accordingly – this file contains the names of the images for each category to be processed as training or testing images.

Changing the number of Training images

• To change the number of final training images, which includes jittered images: in the drivers/cal_conf.m file, change conf.numPos to the desired value.
• To change the number of initial training images (without jitters), which are input to the code: in drivers/cal_preprocDatabases.m, change

    if ni <= 15 % Hard Coded

  to

    if ni <= conf.numPos % Changed; set conf.numPos to your desired value

  so that the block reads:

    if ni <= conf.numPos
      imdb.images(ii).set = imdb.sets.TRAIN ;
    else
      imdb.images(ii).set = imdb.sets.TEST ;
    end

Changing the number of test images

• In drivers/cal_setupTrainTest.m:

    for cl = fieldnames(roidb.classes)'
      selCla = findRois(testRoidb, 'class', char(cl)) ;
      keep(selCla(1 : min(15, length(selCla)))) = true ; % Hard Coded
    end

  change the hard-coded line to

    keep(selCla(1 : min(conf.numPos, length(selCla)))) = true ; % Changed
    % you can change conf.numPos to the desired value

Adding a new Feature

• In drivers/cal_conf.m:
  – Add the feature name to conf.featNames.
  – Specify the properties and parameters for that feature as conf.feat.<your_feature_name>.<parameter>.
• Add your extractFn, quantizeFn and clusterFn in the features directory (check the input and output format for each).

Parameters

• Parameters should include:
  – format – dense or sparse
    • In the dense format, one stores features on a grid, specifying the x and y pixel coordinates of each column/row of the grid. One then stores an "image" whose pixels correspond to grid elements and specify the corresponding visual words.
    • In the sparse format, one stores a list of visual words and their x, y locations in the image.
  – extractFn – pointer to the function called to extract the feature
  – clusterFn – pointer to the clustering (k-means) function
  – quantizeFn – pointer to the function used to project onto the k-means clusters
  – vocabSize – k-means vocabulary size (number of visual words)
  – numImagesPerClass – number of images per class used to sample features to train the vocabulary with k-means
  – numFeatsPerImage – number of features per image sampled to train the vocabulary with k-means
  – compress – "false" generally
  – pyrLevels – pyramid levels used when building histograms based on this feature
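The dense and sparse formats described above can be illustrated with two tiny containers (a Python sketch with hypothetical names, not the toolkit's MATLAB structures):

```python
import numpy as np

# Sparse format: a list of visual words with their (x, y) image locations.
sparse = np.array([(12, 40, 3), (100, 55, 17), (200, 220, 3)],
                  dtype=[("x", int), ("y", int), ("word", int)])

# Dense format: grid coordinates plus a word "image" over the grid
# (one visual word per grid cell, 8-pixel grid step assumed here).
grid_x = np.arange(0, 320, 8)   # x pixel coordinate of each grid column
grid_y = np.arange(0, 240, 8)   # y pixel coordinate of each grid row
word_image = np.random.default_rng(0).integers(
    0, 300, size=(len(grid_y), len(grid_x)))

# Look up the word at a pixel: index the grid (dense) vs. scan the list (sparse).
wx, wy = 100, 55
dense_word = word_image[wy // 8, wx // 8]
sparse_word = sparse["word"][(sparse["x"] == wx) & (sparse["y"] == wy)][0]
```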

Changing Jitters

• Jitters are basic modifications (zooming, flipping and rotating) of an image. In the code they are used to create more training data out of the basic training data, which helps to increase the accuracy.
• The jitters currently supported are rp5, rm5, fliplr, fliplr_rp5, fliplr_rm5, zm1 and zm2 – all of these are modifications of zoom, rotate and flip only.
• To change the jitters to be used, change conf.jitterNames accordingly in the drivers/cal_conf.m file.
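Reading the names above as left-right flips, small rotations (rp5/rm5 presumably ±5 degrees) and zooms, the flip and zoom jitters can be sketched as follows (a Python stand-in on a plain image array, not the toolkit's jitter code; rotation is omitted to keep the sketch dependency-free):

```python
import numpy as np

def jitter_fliplr(img):
    """Left-right flip (the fliplr jitter)."""
    return img[:, ::-1]

def jitter_zoom(img, factor=1.2):
    """Zoom in by cropping the central 1/factor region (a simple stand-in
    for the zm1/zm2 zoom jitters; no resampling back to the original size)."""
    h, w = img.shape[:2]
    ch, cw = int(h / factor), int(w / factor)
    top, left = (h - ch) // 2, (w - cw) // 2
    return img[top:top + ch, left:left + cw]

img = np.arange(100 * 80).reshape(100, 80)   # toy 100x80 "image"
flipped = jitter_fliplr(img)
zoomed = jitter_zoom(img)
```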

Changing Features

• The features currently supported are:
  – gb – sparse Geometric Blur words
  – gist
  – bow – sparse SIFT words, bag of words
  – phog180, phog360 – dense edge-based shape
  – phowColor, phowGray – dense SIFT words
• To change the features to be used, change conf.featNames accordingly in the drivers/cal_conf.m file.
• When using the bow feature, also run cal_preprocDiscrimScores after the cal_preprocFeatures step.

Changing the weight learning method

• The weight learning methods currently supported are:
  – Manik
  – equalMean – the weights are set to the inverse of the average of the kernel matrices. This is a simple heuristic whose only purpose is to "balance" the kernels when they are combined additively.
• To change the weight learning method, change conf.learnWeightMethod accordingly in the drivers/cal_conf.m file.
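Reading equalMean as one scalar weight per kernel (the reciprocal of that kernel's mean entry), the heuristic can be sketched as follows (a Python illustration of that reading, not the toolkit's code):

```python
import numpy as np

def equal_mean_weights(kernels):
    """One weight per kernel: the inverse of its mean entry, so that each
    kernel contributes on a comparable scale when combined additively."""
    return np.array([1.0 / K.mean() for K in kernels])

def combine(kernels, weights):
    """Weighted additive combination of the kernel matrices."""
    return sum(w * K for w, K in zip(weights, kernels))

rng = np.random.default_rng(0)
K1 = rng.random((5, 5)) * 10.0   # a "large-scale" kernel
K2 = rng.random((5, 5)) * 0.1    # a "small-scale" kernel
w = equal_mean_weights([K1, K2])
K = combine([K1, K2], w)         # both kernels now have mean entry 1
```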

Obtaining Results

• Calculate SVM score for the image for all the classes.

• The image is assigned the class which has the highest score.

• Use this information to create the confusion matrix.

• Use confusion matrix to calculate the final accuracy.
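Applied to the 10-class confusion matrix reported under Code Execution - I below, these steps reproduce the 61% figure:

```python
import numpy as np

# Rows: true class, columns: predicted class (Code Execution - I, 15 test
# images per class).
classes = ["BADGE", "BULB", "CAMERA", "CELL", "FROG", "HORSE",
           "KEYBOARD", "KINGFISHER", "LOCKET", "MOON"]
conf = np.array([
    [8, 0, 1, 0, 2, 0, 1, 0, 2, 1],
    [0, 10, 0, 0, 0, 1, 0, 1, 1, 2],
    [1, 1, 8, 2, 1, 0, 1, 0, 1, 0],
    [0, 2, 0, 7, 1, 1, 0, 2, 2, 0],
    [0, 1, 0, 0, 7, 3, 1, 2, 1, 0],
    [0, 0, 0, 0, 6, 9, 0, 0, 0, 0],
    [0, 1, 0, 0, 1, 0, 13, 0, 0, 0],
    [0, 0, 0, 0, 5, 2, 0, 7, 1, 0],
    [1, 2, 0, 0, 1, 0, 0, 0, 9, 2],
    [1, 0, 0, 0, 0, 0, 0, 1, 0, 13],
])
# Final accuracy: correctly classified images (diagonal) over all test images.
accuracy = np.trace(conf) / conf.sum()
print(round(100 * accuracy, 1))   # 60.7, reported as 61%
```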

Code Execution - I

• In the current execution, we have taken 10 classes: Badge, Bulb, Camera, Cell, Frog, Horse, Keyboard, Kingfisher, Locket, Moon.
• 15 train + 15 test images were used for the execution of the code.

Kernel Matrices

[Figure: kernel matrices echi2_phowGray_L0, echi2_phowGray_L1, echi2_phowGray_L2 and el2_gb]

Aggregate SVM Scores

[Figure: aggregate SVM scores, test images vs. categories; color scale from lowest to highest score]

Confusion matrix (rows: true class, columns: predicted class)

             BADGE  BULB  CAMERA  CELL  FROG  HORSE  KEYBOARD  KINGFISHER  LOCKET  MOON
BADGE            8     0       1     0     2      0         1           0       2     1
BULB             0    10       0     0     0      1         0           1       1     2
CAMERA           1     1       8     2     1      0         1           0       1     0
CELL             0     2       0     7     1      1         0           2       2     0
FROG             0     1       0     0     7      3         1           2       1     0
HORSE            0     0       0     0     6      9         0           0       0     0
KEYBOARD         0     1       0     0     1      0        13           0       0     0
KINGFISHER       0     0       0     0     5      2         0           7       1     0
LOCKET           1     2       0     0     1      0         0           0       9     2
MOON             1     0       0     0     0      0         0           1       0    13

Confusion Matrix

[Figure: confusion matrix heat map, category vs. category; color scale from lowest to highest score]

Analysis

• Overall accuracy is 61%.
• Moon and keyboard have very high classification rates – they have relatively low intraclass variance.

• Cell phone, frog and kingfisher have very low classification rates.

• There is appreciable confusion in horse vs. frog and kingfisher vs. frog. These are found in natural surroundings, possibly creating the confusion.

• Artificial objects don’t get confused with natural ones very frequently.

Code Execution - II

• In this execution, we have taken 19 classes: badge, bulb, camera, cell, frog, horse, keyboard, kingfisher, locket, moon, owl, photo, piggy, pliers, remote, shirt, shoe, spoon, sunflower.
• 15 train + 15 test images were used for the execution of the code.

Kernel Matrices

[Figure: kernel matrices echi2_phowGray_L0, echi2_phowGray_L1, echi2_phowGray_L2 and el2_gb]

Aggregate SVM Scores

[Figure: aggregate SVM scores, test images vs. categories; color scale from lowest to highest score]

Confusion Matrix

[Figure: confusion matrix heat map, category vs. category; color scale from lowest to highest score]

Analysis

• Overall accuracy is 50.5% (lower than in the 10-category classification).

• Moon, keyboard and shirt have very high classification rate.

• Cell phone, frog and kingfisher have very low classification rates.

• There is appreciable confusion in photo-frame vs. cell phone and kingfisher vs. frog.

Analysis

• The classification of bulb was good in the 10-category case, but was very bad in the 19-category case.

• Similar-looking objects (low interclass difference) like camera, cell phone, remote control and photo frame are more likely to get confused amongst themselves than with other groups.

Code Execution - III

• In this execution, we have taken 19 classes: badge, bulb, camera, cell, frog, horse, keyboard, kingfisher, locket, moon, owl, photo, piggy, pliers, remote, shirt, shoe, spoon, sunflower.
• 20 train + 15 test images were used for the execution of the code.

Kernel Matrices

[Figure: kernel matrices echi2_phowGray_L0, echi2_phowGray_L1, echi2_phowGray_L2 and el2_gb]

Aggregate SVM Scores

[Figure: aggregate SVM scores; color scale from lowest to highest score]

Confusion Matrix

[Figure: confusion matrix heat map; color scale from lowest to highest score]

Analysis

• Overall accuracy is 53.3% (slightly higher than when 15 training images were taken per category).

• Moon, keyboard and shirt have very high classification rate.

• Cell phone, frog and kingfisher have very low classification rates.

• There is appreciable confusion in photo-frame vs. cell phone and kingfisher vs. frog.

Analysis

• The number of correct classifications for photo frame increased from 2 to 10 (out of 15).
• The number of correct classifications for piggy bank decreased from 10 to 5 (out of 15).
• For objects with low intra-class variation (moon), the classification error has increased.
• For objects with high intra-class variation (photo frame), the classification error has decreased significantly.

• Increasing the number of training images did not significantly increase accuracy in the case of classes with low inter-class variability.

Code Execution - IV

• 19 classes considered – Badge, Bulb, Camera, Cell Phone, Frog, Horse, Keyboard, Kingfisher, Locket, Moon, Owl, Photo Frame, Piggy Bank, Pliers, Remote Control, Shirt, Shoe, Spoon and Sunflower.
• Features used are phog180 and phog360.
• No jitters are used.
• Kernel type is echi2.
• 25 train + 15 test images were used for the execution of the code.

Kernel Matrices

[Figure: kernel matrices]

Aggregate SVM Scores

[Figure: aggregate SVM scores; color scale from lowest to highest score]

Confusion Matrix

[Figure: confusion matrix heat map; color scale from lowest to highest score]

Analysis

• Overall accuracy was 45.6 percent.
• In spite of having more training images, the accuracy decreased.
• This shows that jitter helps in better training and, in turn, better accuracy.

Code Execution - V

• 10 classes considered – Badge, Bulb, Camera, Cell Phone, Frog, Horse, Keyboard, Kingfisher, Locket and Moon.
• Kernel type is echi2.
• 25 train and 25 test images are used for execution of the code.
• The code is run on the new test feature "lowesift".
• This feature is similar to the bow feature; instead of using Laplace-Harris for calculating interest points, it uses the SIFT detector itself.
• In our case, we used both David Lowe's and Andrea Vedaldi's implementations to test the feature.
• Since the feature is similar to the bow feature, the clusterFn and quantizeFn of bow were used.

Kernel Matrices

[Figure: kernel matrices]

Aggregate SVM Scores

[Figure: aggregate SVM scores; color scale from lowest to highest score]

Confusion Matrix

[Figure: confusion matrix heat map; color scale from lowest to highest score]

Analysis

• Overall accuracy is 47.2 percent.
• Keyboard is the most correctly classified class, because of its low intraclass variance.
• Cell phone, Badge and Bulb have the least accuracy.
• There is appreciable confusion between Badge and Locket, as these two classes are very similar.
• Kingfisher got confused with Horse and Locket; this did not happen in earlier executions.