Learning a structured model for visual category recognition

Learning A Structured Model For Visual Category Recognition. Ashish Gupta, University of Surrey, [email protected]. July 5, 2013.

description

Learning a Structured Model for Visual Category Recognition

Abstract: This thesis deals with the problem of estimating the structure in data that arises from semantic relations between data elements, and of leveraging this information to learn a visual model for category recognition. A visual model consists of dictionary learning, which computes a succinct representation of training data by partitioning feature space, and feature encoding, which learns a representation of each image as a combination of dictionary elements. Besides variations in lighting and pose, a key challenge in classifying a category is intra-category appearance variation. The key idea in this thesis is that feature data describing a category has latent structure due to visual content idiomatic to that category. However, popular algorithms in the literature disregard this structure when computing a visual model.

To incorporate this structure in the learning algorithms, this thesis analyses two facets of feature data to discover relevant structure. The first is structure amongst the sub-spaces of the feature descriptor. Several sub-space embedding techniques that use global or local information to compute a projection function are analysed. A novel entropy-based measure of structure in the embedded descriptors suggests that relevant structure has local extent. The second is structure amongst the partitions of feature space. Hard partitioning of feature space leads to ambiguity in feature encoding. To address this issue, novel fuzzy-logic-based dictionary learning and feature encoding algorithms are employed that model the local distributions of feature vectors and provide performance benefits.

To estimate structure amongst sub-spaces, co-clustering is applied to a training descriptor data matrix to compute groups of sub-spaces. A dictionary learnt on feature vectors embedded in these multiple sub-manifolds is demonstrated to model data better than a dictionary learnt on feature vectors embedded in a single sub-manifold computed using principal components. In a similar manner, co-clustering is applied to the encoded feature data matrix to compute groups of dictionary elements, referred to as 'topics'. A topic dictionary is demonstrated to perform better than a regular dictionary of comparable size. Both results suggest that the groups of sub-spaces and dictionary elements have semantic relevance.

All the methods developed here are viewed from the unifying perspective of matrix factorization, where a data matrix is decomposed into two factor matrices that are interpreted as a dictionary matrix and a coefficient matrix. Sparse coding methods, which are currently enjoying much success, can be viewed as matrix factorization with a regularization constraint on the vectors of the dictionary or coefficient matrices. ....

Transcript of Learning a structured model for visual category recognition

Page 1: Learning a structured model for visual category recognition

Introduction Sub-space Embedding Fuzzy Visual Model Structure Estimation Structured Sparse Model Summary

Learning A Structured Model For Visual Category Recognition

Ashish Gupta

University of Surrey

[email protected]

July 5, 2013

Page 2: Learning a structured model for visual category recognition

Introduction

Introduction : What is Category Recognition?

Feature vector Embedding : Information in Sub-Manifold.

Feature vector distribution: Fuzzy Visual Model.

Estimating semantic structure: Co-clustering.

Sparse Models: Semantically structured.

Summary & Future Work

Page 3: Learning a structured model for visual category recognition

Motivation

Visual Category?

Robot interacts with physical objects.

Object taxonomy based on physical properties.

Robot recognizes object using visual appearance.

Page 4: Learning a structured model for visual category recognition

Motivation

Visual Category Model

Appearance variation → scatter of semantically related descriptors in feature space.

Can this scatter distribution be estimated?

Can this structure be used to improve the learnt visual model?

Visual category model ≈ Visual object model + Estimated structure of visual category variation

Page 5: Learning a structured model for visual category recognition

Approach

Visual Classification Pipeline

Structure in sub-spaces → groups of sub-spaces → dictionary

Structure in dictionary → groups of prototypes → encoding

Page 6: Learning a structured model for visual category recognition

Approach

Feature Descriptor Matrix

Figure: Scene-15 D-SIFT data matrix of 500 feature descriptors, each of 128 dimensions (axes: feature vectors, dimensions).

Page 7: Learning a structured model for visual category recognition

Approach

Encoded Feature Matrix

Conceptual illustration of the encoded feature matrix: an occurrence histogram of visual words in images.
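
As a rough illustration of one row of such an encoded feature matrix, the sketch below hard-assigns each descriptor of an image to its nearest visual word and counts occurrences. The function name `encode_bow`, the toy dictionary and the normalisation step are assumptions made for this example, not details taken from the slides.

```python
import numpy as np

def encode_bow(descriptors, dictionary):
    """Hard-assign each descriptor to its nearest visual word and count occurrences."""
    # Squared Euclidean distance between every descriptor and every visual word.
    d2 = ((descriptors[:, None, :] - dictionary[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)
    hist = np.bincount(nearest, minlength=dictionary.shape[0]).astype(float)
    # Normalise (an assumption) so images with different descriptor counts compare.
    return hist / hist.sum()

# Toy data: 3 visual words in 2-D and 4 descriptors from a single image.
dictionary = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
descriptors = np.array([[0.1, 0.0], [0.9, 0.1], [1.1, -0.1], [0.0, 0.8]])
histogram = encode_bow(descriptors, dictionary)
```

Stacking one such histogram per image gives an images-by-words matrix of the kind the slide illustrates.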

Page 8: Learning a structured model for visual category recognition

Approach

Conceptual Interpretation

Structure estimation can be interpreted as estimation of semantically related rows or columns of the data matrix. These are projected to a lower dimensional space such that mutual separation between equivalent feature vectors is reduced.

Page 9: Learning a structured model for visual category recognition

Sub-space Embedding

Feature descriptor space is high dimensional.

Relevant information is embedded in a lower dimensional sub-manifold.

What is the appropriate lower dimensionality?

Measure efficacy of sub-space embedding method?

Measure information in embedded feature vectors.

Page 10: Learning a structured model for visual category recognition

Intrinsic Dimensionality

Intrinsic dimensionality p estimation

Correlation Dimension

Number of feature vectors in a hypersphere of radius r is proportional to $r^p$.

Maximum Likelihood Estimate

Expectation of number of feature vectors covered by a hypersphere of growing radius r.

Eigenvalue Estimate

Number of eigenvalues greater than a small threshold value ε.

Geodesic Minimum Spanning Tree

Based on length of GMST of k descriptors in a neighbourhood graph.
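
Of the four estimators listed, the eigenvalue estimate is the simplest to sketch. The version below counts the covariance eigenvalues above a small threshold ε. Normalising the eigenvalues by their sum before thresholding, the default `eps`, and the synthetic test data are assumptions made for illustration.

```python
import numpy as np

def eigenvalue_intrinsic_dim(Z, eps=1e-6):
    """Count covariance eigenvalues whose share of total variance exceeds eps."""
    Zc = Z - Z.mean(axis=0)
    eigvals = np.linalg.eigvalsh(np.cov(Zc, rowvar=False))
    # Relative threshold (an assumption): makes eps independent of data scale.
    return int(np.sum(eigvals / eigvals.sum() > eps))

# Synthetic check: 500 points on a 3-D linear sub-space embedded in 10 dimensions.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 3))
basis = rng.normal(size=(3, 10))
Z = latent @ basis
p = eigenvalue_intrinsic_dim(Z)
```

On real descriptors the eigenvalue spectrum decays gradually, so the estimate depends strongly on the threshold chosen.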

Page 11: Learning a structured model for visual category recognition

Intrinsic Dimensionality

Estimated Intrinsic Dimensionality

Page 12: Learning a structured model for visual category recognition

Intrinsic Dimensionality

Subspace Embedding Methods

Global Methods

Principal Components

Multi-Dimensional Scaling

Stochastic Proximity Embedding

Isomap

Diffusion Maps

Local Methods

Locally Linear Embedding

Locality Preserving Projection

Neighbourhood Preserving Projection

Landmark Isomap

t-Stochastic Neighbourhood Embedding

Page 13: Learning a structured model for visual category recognition

Entropic Measure

Entropy Measure Intuition

Figure: three example data sets shown as 3-D scatter plots (axes X, Y, Z): 'swiss' synthetic data, 'intersect' synthetic data, and 'VOC2006,car' data.

Figure: Distribution of pair-wise distances in data (axes: bin index, normalized frequency). Entropy H per data set: swiss, H = −25.3355; intersect, H = −19.3150; VOC2006,car, H = −33.0302.
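
The intuition can be sketched as an entropy computed over the histogram of pair-wise distances. This sketch uses the ordinary Shannon entropy of the normalised histogram, which is non-negative. The slide reports negative H values, so the thesis evidently uses a different definition or sign convention that is not recoverable from the transcript. The function name, the 100-bin default and the toy data are assumptions.

```python
import numpy as np

def pairwise_distance_entropy(Z, n_bins=100):
    """Shannon entropy (nats) of the histogram of pair-wise Euclidean distances."""
    diffs = Z[:, None, :] - Z[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    upper = np.triu_indices(len(Z), k=1)        # each unordered pair once
    counts, _ = np.histogram(dists[upper], bins=n_bins)
    freq = counts / counts.sum()                # normalised frequency per bin
    nonzero = freq[freq > 0]
    return float(-(nonzero * np.log(nonzero)).sum())

# Strongly structured data (two tight clusters) vs. unstructured uniform scatter.
rng = np.random.default_rng(0)
clustered = np.repeat(np.array([[0.0, 0.0], [5.0, 5.0]]), 5, axis=0)
scattered = rng.uniform(size=(50, 2))
h_clustered = pairwise_distance_entropy(clustered)
h_scattered = pairwise_distance_entropy(scattered)
```

Under this convention, data with strong structure concentrates its pair-wise distances in a few bins and yields a lower entropy than unstructured scatter.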

Page 14: Learning a structured model for visual category recognition

Empirical Results

Comparison of Embedded Entropy

Page 15: Learning a structured model for visual category recognition

Empirical Results

Computational Time Complexity

Page 16: Learning a structured model for visual category recognition

Empirical Results

Classification Performance

Page 17: Learning a structured model for visual category recognition

Empirical Results

Conclusion

The estimated intrinsic dimensionality was in the neighbourhood of 14, compared with the 128 dimensions of the descriptor.

The performance of LPP in comparison to other embedding methods accentuates the importance of modelling structure in local distributions.

Page 18: Learning a structured model for visual category recognition

Fuzzy Visual Model

Structure in distribution of descriptors in feature space?

Issues with K-means clustering in the Bag-of-Words model.

Visual model incorporating Fuzzy logic framework.

Page 19: Learning a structured model for visual category recognition

Visual Ambiguity

Descriptor assignment has issues of uncertainty and plausibility.

Kernel Codebook uses soft-assignment to resolve the ambiguity.
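
A minimal sketch of Gaussian-kernel soft-assignment follows. It uses a normalised variant in which every descriptor distributes unit mass over the visual words. That normalisation, the kernel width `sigma` and the function name are assumptions for illustration and may differ from the Kernel Codebook formulation referred to on the slide.

```python
import numpy as np

def kernel_codebook_encode(descriptors, dictionary, sigma=1.0):
    """Soft-assign descriptors to visual words with a Gaussian kernel."""
    d2 = ((descriptors[:, None, :] - dictionary[None, :, :]) ** 2).sum(axis=2)
    weights = np.exp(-d2 / (2.0 * sigma ** 2))
    # Each descriptor spreads a total weight of 1 over all words (assumed variant).
    weights /= weights.sum(axis=1, keepdims=True)
    return weights.mean(axis=0)

# A descriptor midway between two words contributes equally to both.
dictionary = np.array([[0.0, 0.0], [2.0, 0.0]])
ambiguous = np.array([[1.0, 0.0]])
encoding = kernel_codebook_encode(ambiguous, dictionary)
```

Hard assignment would force the ambiguous descriptor into a single word. The soft encoding records that it is equally plausible for both.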

Page 20: Learning a structured model for visual category recognition

Fuzzy Models

Visual Dictionary

Figure: partitions of the Motorcycle Data under three models (axes: time, normalized scale; acceleration, normalized scale). Panels: K-means Hard Partition; Fuzzy K-Means Partition; Gustafson-Kessel Fuzzy Partition.

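K-means: $L(Z; \mu_C) = \sum_{j=1}^{r} \sum_{i \in C_j} \| z_i - \mu_{C_j} \|^2$

Fuzzy K-means: $L(Z; D, A) = \sum_{i=1}^{r} \sum_{j=1}^{n} (\alpha_{ij})^m \| z_j - \mu_{C_i} \|^2$

Gustafson-Kessel: $L(Z; D, A, \Sigma_i) = \sum_{i=1}^{r} \sum_{j=1}^{n} (\alpha_{ij})^m \| z_j - d_i \|^2_{\Sigma_i}$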
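
To make the fuzzy objective concrete, the sketch below evaluates the weighted sum of squared distances and computes memberships with the standard fuzzy c-means update, $\alpha_{ij} = 1 / \sum_k (\| z_j - d_i \| / \| z_j - d_k \|)^{2/(m-1)}$. The slides do not state the update rule, so its use here is an assumption, as are the function names and the small `eps` guard.

```python
import numpy as np

def fuzzy_memberships(Z, D, m=2.0, eps=1e-12):
    """Return alpha[i, j], the membership of feature vector z_j in cluster i."""
    # Squared distance from each dictionary element (rows) to each vector (columns).
    d2 = ((D[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2) + eps
    inv = d2 ** (-1.0 / (m - 1.0))
    return inv / inv.sum(axis=0, keepdims=True)   # memberships of each z_j sum to 1

def fuzzy_objective(Z, D, alpha, m=2.0):
    """Sum over clusters i and vectors j of alpha_ij^m * ||z_j - d_i||^2."""
    d2 = ((D[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2)
    return float(((alpha ** m) * d2).sum())

D = np.array([[0.0, 0.0], [2.0, 0.0]])            # two dictionary elements
Z = np.array([[1.0, 0.0], [0.0, 0.0]])            # one ambiguous vector, one on d_1
alpha = fuzzy_memberships(Z, D)
loss = fuzzy_objective(Z, D, alpha)
```

The fuzzifier `m` controls how soft the partition is: values close to 1 approach the hard K-means assignment.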

Page 21: Learning a structured model for visual category recognition

Fuzzy Models

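$d^2_{\Sigma}(z, \mu_C) = (z - \mu_C)^T \Sigma \, (z - \mu_C)$

$\Sigma = \begin{pmatrix} (1/\sigma_1)^2 & 0 & \cdots & 0 \\ 0 & (1/\sigma_2)^2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & (1/\sigma_n)^2 \end{pmatrix}$

$d^2_{\Sigma_i}(z_j, \mu_{C_i}) = (z_j - \mu_{C_i})^T \Sigma_i \, (z_j - \mu_{C_i})$

$F_i = \dfrac{\sum_{j=1}^{n} (\alpha_{ij})^m (z_j - d_i)(z_j - d_i)^T}{\sum_{j=1}^{n} (\alpha_{ij})^m}$

$\Sigma_i = \left( \rho_i \det(F_i) \right)^{1/p} F_i^{-1}$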
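
The cluster-specific norm can be sketched as below, following the standard Gustafson-Kessel form in which the norm-inducing matrix is the scaled inverse of the fuzzy covariance $F_i$, with $p$ the descriptor dimensionality. The function names, the default volume $\rho_i = 1$ and the synthetic data are assumptions for illustration.

```python
import numpy as np

def gk_norm_matrix(Z, d_i, alpha_i, m=2.0, rho_i=1.0):
    """Fuzzy covariance F_i and norm matrix Sigma_i = (rho_i det F_i)^(1/p) F_i^-1."""
    w = alpha_i ** m                                # membership weights for cluster i
    diff = Z - d_i
    F = (w[:, None] * diff).T @ diff / w.sum()
    p = Z.shape[1]
    Sigma = (rho_i * np.linalg.det(F)) ** (1.0 / p) * np.linalg.inv(F)
    return F, Sigma

def gk_distance2(z, d_i, Sigma):
    """Squared distance (z - d_i)^T Sigma (z - d_i)."""
    diff = z - d_i
    return float(diff @ Sigma @ diff)

# Anisotropic synthetic cluster with arbitrary memberships.
rng = np.random.default_rng(0)
Z = rng.normal(size=(200, 3)) * np.array([3.0, 1.0, 0.5])
alpha_i = rng.uniform(0.1, 1.0, size=200)
d_i = (alpha_i[:, None] ** 2 * Z).sum(axis=0) / (alpha_i ** 2).sum()
F_i, Sigma_i = gk_norm_matrix(Z, d_i, alpha_i)
```

The scaling fixes the determinant of the norm matrix at $\rho_i$, so each cluster can adapt its shape to the local distribution while its volume stays constrained.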

Page 22: Learning a structured model for visual category recognition

Empirical Results

FKM Classification Performance

Figure: per-category classification accuracy (Acc), Bag-of-Words vs. Fuzzy K-means, on Scene15 (categories: MITcoast, MITmountain, industrial, livingroom, MITopencountry, PARoffice, MITtallbuilding, CALsuburb, store, bedroom, MITforest, MIThighway, MITstreet, MITinsidecity, kitchen) and VOC2006 (categories: sheep, horse, bicycle, motorbike, cow, bus, dog, cat, person, car).

Page 23: Learning a structured model for visual category recognition

Empirical Results

GK Classification Performance

Figure: per-category classification accuracy (Acc), Bag-of-Words vs. Gustafson-Kessel, on Scene15 and VOC2006 (x-axis: visual category).

Page 24: Learning a structured model for visual category recognition

Empirical Results

Dictionary Size

Figure: Comparison of BoW with FKM and GK for different sizes of dictionary (Caltech101; dictionary sizes 32, 64, 128, 256, 512; y-axis: Acc). Panels: Bag-of-Words vs. Fuzzy K-means; Bag-of-Words vs. Gustafson-Kessel.

Page 25: Learning a structured model for visual category recognition

Empirical Results

Aggregate Performance

Figure: aggregate accuracy (Acc) of Bag-of-Words, Fuzzy K-means and Gustafson-Kessel on (a) VOC datasets (VOC2006, VOC2010) and (b) Caltech datasets (Caltech101, Caltech256).

Visual Model   VOC-2006   VOC-2010   Caltech-101   Caltech-256
BoW            0.50825    0.52446    0.60111       0.67606
FKM            0.52635    0.53736    0.61928       0.68357
G-K            0.52885    0.54224    0.62413       0.68623

Page 26: Learning a structured model for visual category recognition

Empirical Results

Conclusion

A visual model learnt within the framework of fuzzy logic adapts to the local distribution of feature vectors.

Learning a better fuzzy membership function is an effective alternative to learning increasingly large dictionaries to adapt to the increasing complexity of visual categories.

Page 27: Learning a structured model for visual category recognition

Co-clustering for Structure Estimation

What is co-clustering?

Co-clustering for structure in descriptor data matrix.

Co-clustering for structure in encoded feature matrix.

Page 28: Learning a structured model for visual category recognition

Co-clustering Methods

Co-clustering

Co-clustering is the simultaneous, alternating clustering of the rows and columns of a data matrix.

At each step of the optimization routine, the groups of rows guide column clustering and vice versa.

$C_X : \{x_1, \ldots, x_m\} \mapsto \{\hat{x}_1, \ldots, \hat{x}_k\}$, $C_Y : \{y_1, \ldots, y_n\} \mapsto \{\hat{y}_1, \ldots, \hat{y}_l\}$

Page 29: Learning a structured model for visual category recognition

Co-clustering Methods

Co-clustering methods

Information-Theoretic Co-Clustering

The data matrix is considered a joint probability distribution. Minimizes the KL-divergence between the original data and the co-clustered matrices.

Sum-Squared Residue Co-Clustering

Alternating k-means clustering of rows and columns. Minimizes the squared Euclidean distance of rows and columns from the row and column means respectively.
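
A sum-squared residue objective for a given pair of row and column clusterings can be sketched as follows. The sketch uses the simplest residue, the deviation of every entry from the mean of its co-cluster block. Which residue definition the thesis uses is not stated on the slide, so this choice, the function name and the toy matrix are assumptions.

```python
import numpy as np

def sum_squared_residue(A, row_labels, col_labels, k, l):
    """Squared deviation of each entry from its co-cluster (block) mean."""
    row_labels = np.asarray(row_labels)
    col_labels = np.asarray(col_labels)
    R = np.eye(k)[row_labels]                      # m x k row-cluster indicators
    C = np.eye(l)[col_labels]                      # n x l column-cluster indicators
    block_sums = R.T @ A @ C
    # Assumes every row and column cluster is non-empty (no division by zero).
    block_sizes = np.outer(R.sum(axis=0), C.sum(axis=0))
    block_means = block_sums / block_sizes
    approx = block_means[row_labels][:, col_labels]
    return float(((A - approx) ** 2).sum())

# A perfectly block-structured 4 x 6 matrix with 2 row and 2 column clusters.
A = np.kron(np.array([[1.0, 2.0], [3.0, 4.0]]), np.ones((2, 3)))
good = sum_squared_residue(A, [0, 0, 1, 1], [0, 0, 0, 1, 1, 1], k=2, l=2)
bad = sum_squared_residue(A, [0, 1, 0, 1], [0, 0, 0, 1, 1, 1], k=2, l=2)
```

A co-clustering routine would alternate between reassigning rows and reassigning columns so as to reduce this residue.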

Page 30: Learning a structured model for visual category recognition

Co-clustering Methods

Information-Theoretic Co-clustering

$I(X;Y) - I(\hat{X};\hat{Y}) = d_{KL}\big(p(X,Y),\, q(X,Y)\big)$
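
The identity can be checked numerically for any fixed co-clustering. The sketch assumes the usual information-theoretic co-clustering approximation $q(x,y) = p(\hat{x},\hat{y})\, p(x|\hat{x})\, p(y|\hat{y})$, which the slide does not spell out. The function names and the random joint distribution are also assumptions.

```python
import numpy as np

def mutual_information(p):
    """I(X;Y) in nats for a joint distribution given as a 2-D array."""
    px = p.sum(axis=1, keepdims=True)
    py = p.sum(axis=0, keepdims=True)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / (px * py)[mask])))

def itcc_loss(p, row_labels, col_labels, k, l):
    """Return (loss in mutual information, KL(p || q)) for a co-clustering."""
    row_labels = np.asarray(row_labels)
    col_labels = np.asarray(col_labels)
    R = np.eye(k)[row_labels]
    C = np.eye(l)[col_labels]
    p_hat = R.T @ p @ C                            # joint over cluster pairs
    px, py = p.sum(axis=1), p.sum(axis=0)
    px_hat, py_hat = p_hat.sum(axis=1), p_hat.sum(axis=0)
    # q(x, y) = p(x_hat, y_hat) * p(x | x_hat) * p(y | y_hat)  (assumed form)
    q = (p_hat[row_labels][:, col_labels]
         * np.outer(px / px_hat[row_labels], py / py_hat[col_labels]))
    mask = p > 0
    kl = float(np.sum(p[mask] * np.log(p[mask] / q[mask])))
    return mutual_information(p) - mutual_information(p_hat), kl

rng = np.random.default_rng(0)
p = rng.random((6, 8)) + 0.1
p /= p.sum()                                       # treat the matrix as a joint distribution
mi_loss, kl = itcc_loss(p, [0, 0, 1, 1, 2, 2], [0, 0, 0, 1, 1, 1, 2, 2], k=3, l=3)
```

Both returned values agree up to floating-point error, so minimising the KL-divergence is the same as preserving as much mutual information as possible.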

Page 31: Learning a structured model for visual category recognition

Multiple Sub-spaces

Multiple Sub-spaces Intuition

$\sum_{i,j} d_E(z_{\bullet i}|_{S_l},\, z_{\bullet j}|_{S_q}) > \sum_{i,j} d_E(z_{\bullet i},\, z_{\bullet j}), \quad l \neq q$

Page 32: Learning a structured model for visual category recognition

Multiple Sub-spaces

Co-clustering descriptor data matrix

Figure: Scene-15 D-SIFT data matrix, 500 feature vectors of 128 dimensions (axes: feature vectors, dimensions), shown before and after Information-Theoretic Co-Clustering of the 500x128 matrix into 10 row and 10 column clusters.

Page 33: Learning a structured model for visual category recognition

Multiple Sub-spaces

Dictionary on single and multiple sub-spaces

Figure: two dictionary matrices (axes: dictionary [500], dimensions [10]). Panels: Universal PCA Dictionary : VOC-2006 : D-SIFT : 10 x 500 : PCA + Kmeans; Universal CC Dictionary : VOC-2006 : D-SIFT : 10 x 500 : SSRCC + Kmeans.

Page 34: Learning a structured model for visual category recognition

Multiple Sub-spaces

Classification performance

Figure: Comparison of classification performance of single and multiple sub-space dictionaries (F1 on VOC2006 and VOC2007). Panel legends: Dict: 10x1000, MSSD:(i): 5x1000, MSSD:(r): 5x1000; and Dict: 10x1000, MSSD:(i): 10x1000, MSSD:(r): 10x1000.

Page 35: Learning a structured model for visual category recognition

Multiple Sub-spaces

Dictionary projected to multiple sub-spaces

Figure: two dictionary matrices (axes: dictionary [500], dimensions [128]). Panels: Universal Dictionary : VOC-2006 : D-SIFT : 128x500 : Kmeans; Universal Submanifold Dictionary : VOC-2006 : D-SIFT : 128 (10) x 500 : SSRCC + Kmeans (dimensions [128], submanifolds [10]).

Page 36: Learning a structured model for visual category recognition

Multiple Sub-spaces

Classification performance

Figure: Comparison of classification performance of dictionary projected to multiple sub-spaces (VOC2006 and VOC2007; y-axes: F1 (5) and F1 (50)). Legend in both panels: Dict: 128x1000, SSSD:(i): 128x1000, SSSD:(r): 128x1000.

Page 37: Learning a structured model for visual category recognition

Topic Dictionary

Structure in Dictionary Intuition

Estimating groups of non-contiguous partitions of feature space that are semantically related.

Page 38: Learning a structured model for visual category recognition

Topic Dictionary

Topic Dictionary Concept

Page 39: Learning a structured model for visual category recognition

Topic Dictionary

Classification Performance

Comparison of classification performance of dictionaries using BoW and ITCC, for VOC2006 and Scene15 datasets.

Page 40: Learning a structured model for visual category recognition

Topic Dictionary

Dictionary sizes

Figure: Comparative classification performance for different dictionary sizes (F1 on VOC2006, VOC2007, VOC2010, Scene15, Caltech101). Panels: BoW: 100 vs. CC:i: 100; BoW: 500 vs. CC:i: 500; BoW: 1000 vs. CC:i: 1000.

Page 41: Learning a structured model for visual category recognition

Topic Dictionary

Conclusion

Groups of sub-spaces computed using co-clustering yielded dictionaries with better classification performance.

Groups of feature space partitions (dictionary elements) yielded improved classification results.

These estimated groups can be used in learning a semantically structured visual model.

Page 42: Learning a structured model for visual category recognition

Sparse Decomposition

Page 43: Learning a structured model for visual category recognition

Sparse Visual Model

A sparse model approximates a feature vector as a combination of a sub-set of an over-complete basis set.

Sparsity is induced by adding a regularization constraint on the coefficients to the loss function.

Degree of sparsity is determined empirically.

Each basis element is considered individually.

Possible structure amongst basis elements is disregarded.

Page 44: Learning a structured model for visual category recognition

Structured Sparse Model

SSPCA (structure in sub-spaces)

Co-clustered groups of sub-spaces are used to augment Sparse-PCA to compute a Structured Sparse-PCA dictionary.

Group Lasso (structure in dictionary)

Co-clustered groups of dictionary elements are used to augment Lasso to compute a group Lasso feature encoding.

Page 45: Learning a structured model for visual category recognition

Sparse Regularization

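Sparse regularization: $\min_{\alpha} \frac{1}{n} \sum_{i=1}^{n} L(z_i, D\alpha_i) + \lambda\, \Omega(\alpha)$

Lasso: $\min_{\alpha} \frac{1}{n} \sum_{i=1}^{n} \| z_i - D\alpha_i \|^2 + \lambda \| \alpha_i \|_1$

Group Sparsity: $\min_{\alpha} \frac{1}{n} \sum_{i=1}^{n} \| z_i - D\alpha_i \|^2 + \lambda \sum_{j=1}^{k} \| \alpha_i \|_{G_j}$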
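
For a single feature vector, the Lasso term can be minimised with iterative soft-thresholding (ISTA), and the group penalty can be evaluated once groups of dictionary elements are given. The solver choice, the reading of $\| \alpha \|_{G_j}$ as the Euclidean norm of the coefficients in group $G_j$, and all names below are assumptions for illustration. The slides do not say which optimiser was used.

```python
import numpy as np

def soft_threshold(v, t):
    """Shrink every entry of v towards zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(z, D, lam, n_iter=500):
    """Minimise ||z - D a||^2 + lam * ||a||_1 by iterative soft-thresholding."""
    a = np.zeros(D.shape[1])
    step_bound = 2.0 * np.linalg.norm(D, 2) ** 2   # Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = 2.0 * D.T @ (D @ a - z)
        a = soft_threshold(a - grad / step_bound, lam / step_bound)
    return a

def group_penalty(a, groups):
    """Sum over groups of the Euclidean norm of the coefficients in each group."""
    return float(sum(np.linalg.norm(a[g]) for g in groups))

# With an identity dictionary the Lasso solution is a soft-threshold of z at lam / 2.
z = np.array([3.0, -0.2, 1.0])
a = lasso_ista(z, np.eye(3), lam=1.0)
penalty = group_penalty(np.array([3.0, 4.0, 0.0, 0.0]), [[0, 1], [2, 3]])
```

The group penalty is zero for a group only when all of its coefficients are zero, which is what drives selection of whole groups of dictionary elements rather than individual ones.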

Page 46: Learning a structured model for visual category recognition

Structured Sub-space

Structured Sub-space Dictionary using ITCC

Figure: per-category mAP, Sparse Subspace vs. Structured Subspace, on VOC2006 (categories: sheep, horse, bicycle, motorbike, cow, bus, dog, cat, person, car) and VOC2007 (categories: sheep, horse, bicycle, aeroplane, cow, sofa, bus, dog, cat, person, train, diningtable, bottle, car, pottedplant, tvmonitor, chair, bird, boat, motorbike).

Page 47: Learning a structured model for visual category recognition

Structured Sub-space

Structured Sub-space Dictionary using SSRCC

Figure: per-category mAP, Sparse Subspace vs. Structured Subspace, on VOC2006 and VOC2007 (x-axis: Visual Category).

Page 48: Learning a structured model for visual category recognition

Structured Sub-space

Data Set   Sparse Subspace   Structured Sparse Subspace (ITCC)   Structured Sparse Subspace (SSRCC)
VOC2006    67.5941           70.8295                             68.5808
VOC2007    67.9971           68.0783                             68.3718

Sparse selection of semantically related sets of sub-spaces performs better than sparse individual selection of sub-spaces.

Page 49: Learning a structured model for visual category recognition

Structured Sparse Dictionary

Structured Sparse Encoding using ITCC

Figure: per-category mAP, Sparse Encoding vs. Structured Encoding, on Scene15 ITCC and VOC2006 ITCC (x-axis: Visual Category).

Page 50: Learning a structured model for visual category recognition

Structured Sparse Dictionary

Structured Sparse Encoding using SSRCC

Figure: per-category mAP, Sparse Encoding vs. Structured Encoding, on Scene15 SSRCC and VOC2006 SSRCC (x-axis: Visual Category).

Page 51: Learning a structured model for visual category recognition

Structured Sparse Dictionary

Data Set   Sparse Encoding   Structured Sparse Encoding (ITCC)   Structured Sparse Encoding (SSRCC)
VOC-2006   72.8386           73.3977                             72.7738
Scene-15   68.5737           79.8794                             72.1155

Sparse selection of semantically related sets of dictionary elements performs better than sparse individual selection of dictionary elements.

Page 52: Learning a structured model for visual category recognition

Summary

Semantically relevant structure learnt in feature space was used to compute better visual models.

Analysis of sub-space embedding emphasized modelling local distributions.

Incorporation of a fuzzy logic framework to learn dictionary kernels that adapt to local distributions yielded better visual models.

Co-clustering was successful in grouping semantically related sub-spaces and feature space partitions.

Estimated groups of sub-spaces and dictionary elements were used to compute structured sparse visual models, improving upon regular sparse models.

Page 53: Learning a structured model for visual category recognition

Future Work

Future Work

Visual models using Fisher Kernel coding, which uses a Gaussian kernel, have been very successful. Combining the approach in Fisher Kernels with the learnt fuzzy membership functions could potentially improve the visual model.

Fuzzy logic based learning algorithms that are more advanced than Gustafson-Kessel could be explored to learn better membership functions.

Co-clustering creates a block factorization of the data matrix. Partial membership of rows and columns to the co-clusters would be the natural next step.

Explore ways of using semantic structure to improve feature generation techniques, like hierarchical models, that aim to learn category-specific descriptors.

Page 54: Learning a structured model for visual category recognition

Future Work

End

Questions...

Page 55: Learning a structured model for visual category recognition

Appendices

BoW Partitioning

Figure: Bag-of-Words model and image '000017' in VOC-2006 dataset (panel title: Bag-of-Words Partition | VOC-2006 | #000017; axes x, y). The dictionary of size 25 is computed using K-means clustering. The feature vectors are projected to 2 dimensions using PCA.

Page 56: Learning a structured model for visual category recognition

Appendices

FKM Partitioning

Figure: Fuzzy K-means model and image '000017' in VOC-2006 dataset (panel title: Fuzzy K-means Fuzzy Partition | VOC-2006 | #000017; axes x, y).

Page 57: Learning a structured model for visual category recognition

Appendices

GK Partitioning

Figure: Gustafson-Kessel model and image '000017' in VOC-2006 dataset (panel title: Gustafson-Kessel Fuzzy Partition | VOC-2006 | #000017; axes x, y).
