Post on 06-Aug-2020
Unsupervised Image Segmentation Using Comparative Reasoning and Random Walks
Anuva Kulkarni, Carnegie Mellon University
Filipe Condessa, Carnegie Mellon University and IST, University of Lisbon
Jelena Kovacevic, Carnegie Mellon University
Outline
• Motivation
  – Training-free methods
  – Comparative Reasoning
  – Related work
• Approach
  – Winner Take All (WTA) Hash
  – Clustering based on Random Walks
• Some experimental results
Acknowledgements
• Example and test images taken from
  – Berkeley Segmentation Dataset (BSDS)
  – The Prague Texture Segmentation Data Generator and Benchmark
Motivation
• Goals:
  – Segment images where the number of classes is unknown
  – Eliminate training data (it may not be available)
  – Provide a fast pre-processing step for classification
• Segmentation is a similarity search
• Comparative Reasoning is rank correlation using the machine-learning concept of “hashing”
Hashing
• Used to speed up the search process
• A ‘hash function’ maps data values to keys or ‘hash codes’
• The hash table is a shortened representation of the data
[Figure: a hash function maps input data (e.g., bit strings 1110 0110, 01110001) to hash codes; the hash table pairs hash values 001, 010, 011, 100 with data entries Bird_type1, Bird_type2, Dog_type1, Fox_type1.]
Hashing
• Similar data points have the same (or nearby) hash keys or “hash codes”
• Properties of hash functions
  – Always returns a number for an object
  – Two equal objects will always have the same number
  – Two unequal objects may not always have different numbers
Images from https://upload.wikimedia.org/wikipedia/commons/3/32/House_sparrow04.jpg (Wikipedia) and www.weknowyourdreams.com
Hashing for Segmentation
• Each pixel is described by some feature vectors (e.g., color)
• Hashing is used to cluster the pixels into groups
[Figure: color features are computed for each pixel of the image and hashed; similar features are hashed into the same groups.]
Segmentation and Randomized Hashing
• Random hashing, i.e., using a hash code to indicate the region in which a feature vector lies after splitting the space with a set of randomly chosen splitting planes
C. J. Taylor and A. Cowley, “Fast segmentation via randomized hashing,” in BMVC, pp. 1–11, 2009.
[Excerpt from Taylor and Cowley, BMVC 2009. Figure 1 of that paper, reproduced on the slide, shows a 2D feature space split by random planes, with 4-bit hash codes such as 0000, 0001, 1111 labeling the resulting cells and the corresponding hypercube vertices.]
Figure 1: This figure depicts a simplified 2D version of the randomized hashing scheme. Figure (a) depicts a 2D feature space fractured into regions by a set of randomly chosen splitting planes. Each region is associated with a hash code indicating where it falls with respect to the splitting planes. The set of all hash codes can be associated with the vertices of a hypercube as shown in Figure (b), where the shading of the nodes indicates how many feature vectors are hashed to that code. The segmentation scheme proceeds by identifying local maxima in this hash code space.
this, the scheme employs a series of randomly chosen splitting planes. Figure 1 shows a simplified view of this procedure in two dimensions. Here the random splitting planes are used to hash the feature vectors into a set of disjoint cells based on their location.
The notion of randomized hashing has been employed before, most notably by Indyk and Motwani in the context of Locality Sensitive Hashing [8]. These authors used a similar approach to hash a set of vectors into a set of discrete bins in order to accelerate the search for nearest neighbors. Their approach leveraged the fact that this randomized hashing procedure tends to preserve locality, so points that are near to each other in the feature space are hashed to the same bin with high probability.
The proposed segmentation scheme leverages the same phenomenon for a different purpose - namely to cluster the feature vectors into groups. Returning to Figure 1 we note that the n splitting planes fracture the feature space into a set of 2^n disjoint convex cells, each of which corresponds to an n-bit hash code. More specifically, each vector v_j in the feature space is assigned an n-bit hash code where the ith bit in the code, b_ij, is derived from the ith splitting plane as follows: b_ij = (v_j · u_i) > s_i, where u_i denotes the normal associated with the ith splitting plane and s_i denotes the corresponding splitting value. Neighboring cells in the feature space differ by a single bit, so the Hamming distance between the codes provides some indication of the distance between vectors in the feature space. More generally we can construct a correspondence between the set of all possible hash codes and the vertices of an n-dimensional hypercube. The topology of the hypercube reflects the structure of the feature space since neighboring cells in feature space will correspond to neighboring vertices in the hypercube.
For each of the hash codes the clustering procedure records how many feature vectors are mapped to that code. We expect that clusters in feature space will induce population maxima in the code space. That is, if we consider the hypercube as a graph, we would expect to observe that some of the hash codes have a greater population than their neighbors.
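The bit computation b_ij = (v_j · u_i) > s_i vectorizes directly. A minimal sketch (array shapes, names, and the random data are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)

def randomized_hash(V, U, s):
    """V: (m, d) feature vectors; U: (n, d) plane normals u_i; s: (n,) splitting
    values s_i. Bit i of vector j's code is b_ij = (v_j . u_i) > s_i."""
    return (V @ U.T) > s              # (m, n) boolean array: one n-bit code per row

V = rng.normal(size=(5, 3))           # 5 feature vectors in 3-D
U = rng.normal(size=(4, 3))           # n = 4 random splitting planes
s = rng.normal(size=4)
codes = randomized_hash(V, U, s)
print(codes.shape)                    # (5, 4): one 4-bit code per vector
```

Counting how many rows share each code then gives the population maxima the excerpt describes.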
Winner Take All (WTA) Hash
• A way to convert feature vectors into compact binary hash codes
• The absolute values of the features do not matter; only the ordering of values matters
• Rank correlation is preserved – stability
• Distance between hashes approximates rank correlation
J. Yagnik, D. Strelow, D. A. Ross, and R.-S. Lin, “The power of comparative reasoning,” in ICCV, pp. 2431–2438, IEEE, 2011.
Calculating WTA Hash
• Consider 3 feature vectors. Step 1: Create random permutations.

  feature 1 = (13, 4, 2, 11, 5, 3)
  feature 2 = (12, 5, 3, 10, 4, 2)
  feature 3 = (1, 90, 44, 5, 15, 6)

  Permutation vector θ = (3, 1, 5, 2, 6, 4)

  Permuting with θ gives (2, 13, 5, 4, 3, 11), (3, 12, 4, 5, 2, 10) and (44, 1, 15, 90, 6, 5) respectively.
Calculating WTA Hash
• Step 2: Choose the first K entries. Let K = 3.

  Keeping the first K = 3 entries of each permuted vector gives (2, 13, 5), (3, 12, 4) and (44, 1, 15).
Calculating WTA Hash
• Step 3: Pick the index of the maximum entry. This is the hash code ‘h’ of that feature vector.

  feature 1: the maximum of (2, 13, 5) is at index 2, so h = 2
  feature 2: the maximum of (3, 12, 4) is at index 2, so h = 2
  feature 3: the maximum of (44, 1, 15) is at index 1, so h = 1
Calculating WTA Hash
• Notice that Feature 2 is just Feature 1 perturbed by one, but Feature 3 is very different. Features 1 and 2 receive the same hash code (h = 2) while Feature 3 receives h = 1, so Feature 1 and Feature 2 are recognized as similar.
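The three steps can be checked with a short script that reproduces the slides' worked example (indices are kept 1-based to match the slides' h values):

```python
def wta_hash(x, theta, K):
    """Permute x with theta (1-based indices), keep the first K entries,
    and return the 1-based index of the maximum entry."""
    permuted = [x[i - 1] for i in theta]    # Step 1: permute with theta
    window = permuted[:K]                   # Step 2: choose first K entries
    return window.index(max(window)) + 1    # Step 3: index of the max entry

theta = [3, 1, 5, 2, 6, 4]
f1 = [13, 4, 2, 11, 5, 3]
f2 = [12, 5, 3, 10, 4, 2]   # f1 perturbed by one
f3 = [1, 90, 44, 5, 15, 6]  # very different

print([wta_hash(f, theta, K=3) for f in (f1, f2, f3)])  # → [2, 2, 1]
```

The small perturbation leaves the ordering, and hence the code, unchanged, which is the stability property claimed above.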
Random Walks
• Understanding proximity in graphs
• Useful for propagation in graphs – creates probability maps
• Similar to an electrical network with voltages and resistances
• It is supervised: the user must specify seeds
[Figure: a small graph with edge weights 1 and 2 viewed as an electrical network; seed voltages +1V and -1V induce intermediate node voltages such as -0.16V, 0.05V and 0.16V.]
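The electrical analogy can be made concrete on a toy graph (the node layout and weights here are illustrative, not from the slides): seeds fix the boundary "voltages", and the unseeded potentials, which equal the random-walker probabilities, solve a graph-Laplacian linear system.

```python
import numpy as np

# Toy graph: 4 nodes on a path with edge weights w. Nodes 0 and 3 are seeds
# for classes A and B; solve for the probability that a walker starting at
# each interior node first reaches seed A.
w = np.array([1.0, 1.0, 1.0])               # weights of edges (0-1), (1-2), (2-3)

# Graph Laplacian: degree on the diagonal, -weight off-diagonal
L = np.diag([w[0], w[0] + w[1], w[1] + w[2], w[2]])
L[0, 1] = L[1, 0] = -w[0]
L[1, 2] = L[2, 1] = -w[1]
L[2, 3] = L[3, 2] = -w[2]

unseeded = [1, 2]
seeded = [0, 3]
x_seeds = np.array([1.0, 0.0])              # P(reach seed A) fixed at the seeds

L_u = L[np.ix_(unseeded, unseeded)]         # Laplacian block of unseeded nodes
B = L[np.ix_(unseeded, seeded)]             # coupling to the seeded nodes
p = np.linalg.solve(L_u, -B @ x_seeds)
print(p)                                    # → approx [0.667, 0.333]
```

The node nearer seed A gets the higher probability, exactly like the voltage profile in the resistor analogy.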
Our Approach
1. Similarity Search using WTA Hash
2. Transformation to a graph with nodes and edges
3. Probability map using Random Walks
   – Automatic seed selection
4. Clustering

[Pipeline: Input image → Random projections → WTA hash (Block I: Similarity Search) → Transform to graph with (Nodes, Edges) (Block II) → Automatic seed selection → Probabilities from the RW algorithm → Stop? If no, reselect seeds and repeat; if yes, produce the segmented output (Block III: RW Algorithm).]
Block I: Similarity Search
WTA hash
• Image dimensions: P × Q × d
• Vectorize the image from P × Q × d to PQ × d
• Project onto R randomly chosen hyperplanes (random projections onto R pairs of points)
  – Each point in the image then has R feature vectors
• Run WTA hash: we get one hash code for each point in the image
• With K = 3, the possible values of the hash codes are 00, 01, 11
WTA hash
• Run WTA hash N times
• Each run gives a vector of one hash code per point (PQ × 1); repeating N times yields a PQ × N matrix of hash codes
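Block I can be sketched in a few lines (the helper name `wta_hash_codes` and the toy data are illustrative; a fresh permutation θ is drawn for each of the N runs):

```python
import numpy as np

rng = np.random.default_rng(0)

def wta_hash_codes(F, K, N):
    """F: (PQ, R) matrix holding R features per image point.
    Returns a (PQ, N) matrix of WTA hash codes, one column per run."""
    PQ, R = F.shape
    H = np.empty((PQ, N), dtype=np.int64)
    for n in range(N):
        theta = rng.permutation(R)         # fresh random permutation per run
        window = F[:, theta[:K]]           # first K entries after permuting
        H[:, n] = window.argmax(axis=1)    # index of the max entry = hash code
    return H

F = rng.normal(size=(6, 8))                # toy "image": PQ = 6 points, R = 8
H = wta_hash_codes(F, K=3, N=5)
print(H.shape)                             # (6, 5)
```

Each hash code is an index in {0, …, K−1}, so the N codes per point are compact and cheap to compare by Hamming distance in Block II.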
Block II: Create Graph
Create Graph
• Run WTA hash N times → each point has N hash codes
• The image is transformed into a lattice graph
• Calculate the edge weight between nodes i and j:

  ω_{i,j} = exp(−β ν_{i,j}),   ν_{i,j} = d̄_H(i, j) / σ

  where:
  d̄_H(i, j) = average Hamming distance over all N hash codes of i and j
  σ = scaling factor
  β = weight parameter for the RW algorithm
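A minimal sketch of the edge-weight computation, assuming ν_{i,j} = d̄_H(i, j)/σ (the helper name and the default parameter values are illustrative):

```python
import numpy as np

def edge_weight(codes_i, codes_j, sigma=1.0, beta=1.0):
    """codes_i, codes_j: length-N sequences of WTA hash codes of nodes i and j.
    d_bar = average Hamming distance; weight = exp(-beta * d_bar / sigma)."""
    d_bar = np.mean(np.asarray(codes_i) != np.asarray(codes_j))
    return float(np.exp(-beta * d_bar / sigma))

print(edge_weight([2, 2, 1, 3], [2, 2, 1, 3]))  # identical codes → 1.0
print(edge_weight([2, 2, 1, 3], [1, 1, 2, 2]))  # all codes differ → exp(-1)
```

Similar points thus get strongly weighted edges, which keeps the random walker inside homogeneous regions.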
Block III: RW Algorithm
Seed Selection
• Needs initial seeds to be defined
• Unsupervised draws using Dirichlet processes, DP(G0, α)
  – G0 is the base distribution
  – α is the discovery parameter
• A larger α leads to the discovery of more classes
X1, …, X(n−1) are samples drawn from a Dirichlet process with parameter α; the Xi are the c-tuple probability vectors obtained from RW. The behaviour of the next sample Xn given the previous samples is given by the result on the next slide.

[Figure: draws from DP(G0, α) for α = 1, α = 10 and α = 100, showing more classes discovered as α grows.]
Seed Selection
• The probability that a new seed belongs to a new class is proportional to α
• Probability for the ith sample with class label yi
  – Result by Blackwell and MacQueen, 1973

  p(y_i = c | y_{−i}, α) = (n_c^{−i} + α/C_tot) / (n − 1 + α)

  where:
  C_tot = total number of classes
  y_i = class label c, c ∈ {1, 2, …, C_tot}
  y_{−i} = {y_j | j ≠ i}
  n_c^{−i} = number of samples in the cth class excluding the ith sample
Seed Selection
• Unsupervised, hence C_tot is infinite. Hence,

  lim_{C_tot → ∞} p(y_i = c | y_{−i}, α) = n_c^{−i} / (n − 1 + α),  ∀c with n_c^{−i} > 0   (class is non-empty)

• “Clustering effect”, or “the rich get richer”
• Probability that a new class is discovered:

  lim_{C_tot → ∞} p(y_i ≠ y_j for all j < i | y_{−i}, α) = α / (n − 1 + α),  ∀c with n_c^{−i} = 0   (class is empty or new)
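The two limiting probabilities are exactly the Chinese restaurant process sampling rule. A minimal sketch (the sequential setting, function name, and seed are assumptions for illustration):

```python
import random

def crp_assignments(n, alpha, seed=0):
    """Sequentially assign n samples to classes: after i samples, an existing
    class c is chosen with probability n_c / (i + alpha) and a brand-new class
    with probability alpha / (i + alpha)."""
    rng = random.Random(seed)
    counts, labels = [], []
    for i in range(n):
        r = rng.random() * (i + alpha)
        acc = 0.0
        for c, m in enumerate(counts):
            acc += m
            if r < acc:                  # fall into existing class c
                labels.append(c)
                counts[c] += 1
                break
        else:                            # r landed in the alpha-sized slice
            labels.append(len(counts))   # discover a new class
            counts.append(1)
    return labels

print(len(set(crp_assignments(100, alpha=0.5))))   # few classes discovered
print(len(set(crp_assignments(100, alpha=50.0))))  # many classes: larger alpha
```

The "rich get richer" effect is visible in the loop: well-populated classes claim a larger slice of the interval, while α controls how often a new class appears.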
Random Walks
• Use the RW algorithm to generate probability maps in each iteration
• Entropy is calculated from the probability maps
• Entropy-based stopping criterion
  – Cluster purity ↑, average image entropy ↓
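The entropy side of the stopping criterion can be sketched as follows (the function name and the base-2 logarithm are assumptions; the slides do not specify the log base):

```python
import numpy as np

def avg_image_entropy(P):
    """P: (num_pixels, num_classes) RW probability map, rows summing to 1.
    Returns the average per-pixel Shannon entropy in bits."""
    P = np.clip(P, 1e-12, 1.0)                 # avoid log(0)
    return float(np.mean(-np.sum(P * np.log2(P), axis=1)))

uncertain = np.full((4, 2), 0.5)               # walker undecided at every pixel
confident = np.tile([1.0, 0.0], (4, 1))        # every pixel fully decided
print(avg_image_entropy(uncertain))            # → 1.0
print(avg_image_entropy(confident))            # → ~0.0
```

As seeds improve across iterations, per-pixel probabilities sharpen and this average entropy drops, which is when the iteration stops.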
Experimental Results
Accuracy (%)
R    BSDSubset      TexBTF         TexColor       TexGeo
10   91.57 ± 4.47   97.84 ± 1.63   94.00 ± 2.81   95.60 ± 2.39
20   91.58 ± 4.31   98.25 ± 1.15   94.55 ± 2.04   95.82 ± 1.71
40   91.42 ± 4.57   98.36 ± 0.78   94.09 ± 2.57   95.14 ± 2.97
Table I: Comparison of segmentation performance on various datasets while varying the number of pairs of points R used for random projection. BSDSubset refers to 200 training images in the BSDS500 dataset. TexBTF contains 10 images which are mosaics of bidirectional texture function (BTF) textures. TexColor contains 20 mosaics of various color images. TexGeo contains 10 mosaics of satellite images. Smaller R implies smaller computation time and, as seen here, cluster purity is not affected adversely.
GCE
R    BSDSubset   TexBTF   TexColor   TexGeo
10   0.179       0.063    0.159      0.102
20   0.180       0.065    0.159      0.129
40   0.186       0.061    0.156      0.134
Table II: GCE scores for varying values of R over all test datasets. GCE scores for the BSDSubset of 200 images can be compared with those of other methods (Table III) and with the randomized hashing method in [6] tested on the same dataset.
H&E teratoma images [11], [12] are analyzed by histopathologists to assess the tissue composition of the teratoma. This visual analysis is extremely time consuming and resource intensive. The multiple types of tissues within the teratoma do not show regular relationships between each other, making automated analysis challenging. It is to be noted that the ground truth provided by the histopathology expert does not always include each and every tissue component, but only the ones deemed important enough. Examples of segmentation outputs of histology images are seen in Fig. 3. The low GCE scores indicate accurate segmentation, and our method has been able to identify similar tissue regions. The lattice connectivity for histology images was chosen to be radial with r = 2. Large images were downsampled by a factor of 4 to speed up computation. This segmentation result can be used as a preprocessing step for robust tissue classification on H&E images, as in [11], [12].
For natural and texture images, we use an 8-connected lattice to form the graph. Downsampling is not required for these images. Some example results are shown in Fig. 4. We also tested performance for varying R on the training set from the BSDS500 dataset [10] containing 200 images and 3 texture datasets provided by UTIA-PR [13]. These texture datasets contain images which are computer-generated mosaics filled with randomly selected textures taken from the UTIA-PR dataset. Table I gives the values for cluster purity averaged over the dataset. GCE scores for the test datasets are given in Table II. These results are encouraging since they imply that our segmentations and the ground truth show a high match.
Finally, we reproduce the table from [14] containing GCE scores for several state-of-the-art methods tested on the same 200 BSDS training images used by us. The performance of our method on this same dataset can be compared using GCE scores for [6], which also uses randomized hashing.
Figure 3: Top row: histology images with the respective sets of seeds used by RW marked on the images. Middle row: ground truth images provided by experts. Bottom row: segmented outputs using our method.
Figure 4: Results of our method demonstrated on some natural images from BSDS500 and a texture image from the UTIA-PR dataset (last). Using the available ground truth, the cluster purities for the images from left to right are 96.61%, 96.49%, 88.92% and 98.29% respectively.
V. CONCLUSION
The segmentation performance of our unsupervised method on various datasets is presented in Table I. It demonstrates that smaller R also led to high cluster purity, which meant savings in computation time. The GCE score was used to compare results on the BSDSubset with other methods, and it was found that our method performed better than state-of-the-art methods (see the BSDSubset column in Table II and Table III) and Taylor's randomized hashing method [6]. Additionally, we found that our method gives encouraging results on histology images of teratomas.
Method                 GCE
Human Segmentation     0.080
RAD [14]               0.205
seed [15]              0.209
Learned Affinity [16]  0.214
Mean Shift [17]        0.260
Normalized Cuts [18]   0.336
Table III: Comparison of segmentation performance for BSDSubset using Global Consistency Error scores for state-of-the-art methods. Reproduced from [14].
Value of Accuracy(%)
R BSDSubset TexBTF TexColor TexGeo
10 91.57 ± 4.47 97.84 ± 1.63 94.00 ± 2.81 95.60 ± 2.39
20 91.58 ± 4.31 98.25 ± 1.15 94.55 ± 2.04 95.82 ± 1.71
40 91.42 ± 4.57 98.36 ± 0.78 94.09 ± 2.57 95.14 ± 2.97
Table I: Comparison of segmentation performance on various datasets whilevarying the number of pairs of points R used for random projection. BSDSub-set refers to 200 training images in the BSDS500 dataset. TexBTF contains10 images which are a mosaic of bi-directional function textures. TexColorcontains 20 mosaics of various color images. TexGeo contains 10 mosaicsof satellite images. Smaller R implies smaller computation time and as seenhere, cluster purity is not affected adversely.
Value of GCE
R BSDSubset TexBTF TexColor TexGeo
10 0.179 0.063 0.159 0.102
20 0.180 0.065 0.159 0.129
40 0.186 0.061 0.156 0.134
Table II: GCE scores for varying values of R over all test datasets. GCEscores for the BSDSubset of 200 images can be compared with those of othermethods (Table III) and with the randomized hashing method in [6] tested onthe same dataset.
H&E teratoma images [11], [12] are analyzed byhistopathologists to assess the tissue composition of the ter-atoma. This visual analysis is extremely time consuming andresource intensive. The multiple types of tissues within theteratoma do not show regular relationships between each other,making automated analysis challenging. It is to be noted thatthe ground truth provided by the histopathology expert doesnot always include each and every tissue component, but onlythe ones deemed important enough. Examples of segmentationoutputs of histology images are as seen in Fig. 3. The lowGCE scores indicate accurate segmentation and our methodhas been able to identify similar tissue regions. The latticeconnectivity for histology images was chosen to be radial withr = 2. Large images were downsampled by a factor of 4 tospeed up computation. This segmentation result can be usedas a preprocessing step for robust tissue classification on H&Eimages, as in [11], [12].
For natural and texture images, we use an 8-connectedlattice to form the graph. Downsampling is not required forthese images. Some example results are shown in Fig. 4. Wealso tested performance for varying R on the training setfrom the BSDS500 dataset [10] containing 200 images and3 texture datasets provided by UTIA-PR [13]. These texturedatasets contain images which are computer-generated mosaicsfilled with randomly selected textures taken from the UTIA-PR dataset. Table I gives the values for cluster purity averagedover the dataset. GCE scores for the test datasets are given inTable II. These results are encouraging since they imply thatour segmentations and the ground truth show a high match.
Finally, we reproduce the table from [14] containing GCEscores for several state of the art methods tested on the same200 BSDS training images used by us. The performance ofour method on this same dataset can be compared using GCEscores for [6] which also uses randomized hashing.
Figure 3: Top row: Histology images with the respective sets of seeds usedby RW marked on the images. Middle row: ground truth images provided byexperts. Bottom row: segmented outputs using our method.
Figure 4: Results of our method demonstrated on some natural images fromBSDS500 and a texture image from the UTIA-PR dataset (last). Using theavailable ground truth, the cluster purity for the images from left to right are96.61% , 96.49%, 88.92% and 98.29% respectively.
V. CONCLUSION
The segmentation performance of our unsupervised methodon various datasets is presented in Table I. It demonstrates thatsmaller R also led to high cluster purity which meant savingsin computation time. GCE score was used to compare resultson the BSDSubset with other methods and it was found thatour method performed better than state of the art methods (seeBSDSubset column in Table II and Table III) and Taylor’srandomized hashing method[6]. Additionally, we found thatour method gives encouraging results on histology images ofteratomas.
Method GCE
Human Segmentation 0.080
RAD[14] 0.205
seed[15] 0.209
Learned Affinity[16] 0.214
Mean Shift[17] 0.260
Normalized Cuts[18] 0.336
Table III: Comparison of segmentation performance for BSDSubset usingGlobal Consistency Error scores for state of the art methods. Reproducedfrom [14].
Value of Accuracy(%)
R BSDSubset TexBTF TexColor TexGeo
10 91.57 ± 4.47 97.84 ± 1.63 94.00 ± 2.81 95.60 ± 2.39
20 91.58 ± 4.31 98.25 ± 1.15 94.55 ± 2.04 95.82 ± 1.71
40 91.42 ± 4.57 98.36 ± 0.78 94.09 ± 2.57 95.14 ± 2.97
Table I: Comparison of segmentation performance on various datasets whilevarying the number of pairs of points R used for random projection. BSDSub-set refers to 200 training images in the BSDS500 dataset. TexBTF contains10 images which are a mosaic of bi-directional function textures. TexColorcontains 20 mosaics of various color images. TexGeo contains 10 mosaicsof satellite images. Smaller R implies smaller computation time and as seenhere, cluster purity is not affected adversely.
Value of GCE
R BSDSubset TexBTF TexColor TexGeo
10 0.179 0.063 0.159 0.102
20 0.180 0.065 0.159 0.129
40 0.186 0.061 0.156 0.134
Table II: GCE scores for varying values of R over all test datasets. GCEscores for the BSDSubset of 200 images can be compared with those of othermethods (Table III) and with the randomized hashing method in [6] tested onthe same dataset.
H&E teratoma images [11], [12] are analyzed byhistopathologists to assess the tissue composition of the ter-atoma. This visual analysis is extremely time consuming andresource intensive. The multiple types of tissues within theteratoma do not show regular relationships between each other,making automated analysis challenging. It is to be noted thatthe ground truth provided by the histopathology expert doesnot always include each and every tissue component, but onlythe ones deemed important enough. Examples of segmentationoutputs of histology images are as seen in Fig. 3. The lowGCE scores indicate accurate segmentation and our methodhas been able to identify similar tissue regions. The latticeconnectivity for histology images was chosen to be radial withr = 2. Large images were downsampled by a factor of 4 tospeed up computation. This segmentation result can be usedas a preprocessing step for robust tissue classification on H&Eimages, as in [11], [12].
For natural and texture images, we use an 8-connectedlattice to form the graph. Downsampling is not required forthese images. Some example results are shown in Fig. 4. Wealso tested performance for varying R on the training setfrom the BSDS500 dataset [10] containing 200 images and3 texture datasets provided by UTIA-PR [13]. These texturedatasets contain images which are computer-generated mosaicsfilled with randomly selected textures taken from the UTIA-PR dataset. Table I gives the values for cluster purity averagedover the dataset. GCE scores for the test datasets are given inTable II. These results are encouraging since they imply thatour segmentations and the ground truth show a high match.
Finally, we reproduce the table from [14] containing GCEscores for several state of the art methods tested on the same200 BSDS training images used by us. The performance ofour method on this same dataset can be compared using GCEscores for [6] which also uses randomized hashing.
Figure 3: Top row: Histology images with the respective sets of seeds usedby RW marked on the images. Middle row: ground truth images provided byexperts. Bottom row: segmented outputs using our method.
Figure 4: Results of our method demonstrated on some natural images fromBSDS500 and a texture image from the UTIA-PR dataset (last). Using theavailable ground truth, the cluster purity for the images from left to right are96.61% , 96.49%, 88.92% and 98.29% respectively.
V. CONCLUSION
The segmentation performance of our unsupervised methodon various datasets is presented in Table I. It demonstrates thatsmaller R also led to high cluster purity which meant savingsin computation time. GCE score was used to compare resultson the BSDSubset with other methods and it was found thatour method performed better than state of the art methods (seeBSDSubset column in Table II and Table III) and Taylor’srandomized hashing method[6]. Additionally, we found thatour method gives encouraging results on histology images ofteratomas.
Method                 GCE
Human Segmentation     0.080
RAD [14]               0.205
Seed [15]              0.209
Learned Affinity [16]  0.214
Mean Shift [17]        0.260
Normalized Cuts [18]   0.336

Table III: Comparison of segmentation performance for the BSDSubset using Global Consistency Error scores for state-of-the-art methods. Reproduced from [14].
Experimental Results

• Automatically picked seeds
• Histology images
• Berkeley segmentation subset: avg. GCE of dataset = 0.186

27
Value of R  Accuracy (%)
            BSDSubset     TexBTF        TexColor      TexGeo
10          91.57 ± 4.47  97.84 ± 1.63  94.00 ± 2.81  95.60 ± 2.39
20          91.58 ± 4.31  98.25 ± 1.15  94.55 ± 2.04  95.82 ± 1.71
40          91.42 ± 4.57  98.36 ± 0.78  94.09 ± 2.57  95.14 ± 2.97

Table I: Comparison of segmentation performance (accuracy, %) on various datasets while varying the number of pairs of points R used for random projection. BSDSubset refers to the 200 training images in the BSDS500 dataset. TexBTF contains 10 images which are mosaics of bidirectional texture functions. TexColor contains 20 mosaics of various color images. TexGeo contains 10 mosaics of satellite images. Smaller R implies smaller computation time and, as seen here, cluster purity is not adversely affected.
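The cluster purity reported in Table I can be sketched as follows (an illustrative implementation, not the authors' code; `cluster_purity` is a hypothetical helper name). Each predicted cluster is credited with its dominant ground-truth class, and purity is the fraction of pixels so accounted for:

```python
import numpy as np

def cluster_purity(pred, gt):
    """Fraction of pixels whose predicted cluster's majority
    ground-truth label matches their own ground-truth label."""
    pred = np.asarray(pred).ravel()
    gt = np.asarray(gt).ravel()
    correct = 0
    for k in np.unique(pred):
        # count ground-truth labels inside predicted cluster k
        _, counts = np.unique(gt[pred == k], return_counts=True)
        correct += counts.max()  # credit the dominant label
    return correct / pred.size

# cluster {0,0} maps cleanly to gt class 0; cluster {1,1} splits 1/1
print(cluster_purity([0, 0, 1, 1], [0, 0, 0, 1]))  # 0.75
```

Purity is high when each predicted cluster is dominated by a single ground-truth class, which is why it pairs naturally with an unsupervised method where the number of classes is unknown.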
Value of R  GCE
            BSDSubset  TexBTF  TexColor  TexGeo
10          0.179      0.063   0.159     0.102
20          0.180      0.065   0.159     0.129
40          0.186      0.061   0.156     0.134

Table II: GCE scores for varying values of R over all test datasets. GCE scores for the BSDSubset of 200 images can be compared with those of other methods (Table III) and with the randomized hashing method in [6] tested on the same dataset.
H&E teratoma images [11], [12] are analyzed by histopathologists to assess the tissue composition of the teratoma. This visual analysis is extremely time consuming and resource intensive. The multiple types of tissues within the teratoma do not show regular relationships with each other, making automated analysis challenging. Note that the ground truth provided by the histopathology expert does not always include every tissue component, but only the ones deemed important enough. Examples of segmentation outputs on histology images are shown in Fig. 3. The low GCE scores indicate accurate segmentation, and our method is able to identify similar tissue regions. The lattice connectivity for histology images was chosen to be radial with r = 2. Large images were downsampled by a factor of 4 to speed up computation. This segmentation result can be used as a preprocessing step for robust tissue classification on H&E images, as in [11], [12].
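One plausible reading of the radial connectivity with r = 2 (a sketch under that assumption, not the authors' code; `radial_offsets` is a hypothetical helper name) is that each pixel is linked to every pixel within Euclidean distance r:

```python
def radial_offsets(r):
    """Neighbour offsets for a radial lattice of radius r: a pixel is
    connected to all other pixels within Euclidean distance r."""
    offs = []
    for dr in range(-r, r + 1):
        for dc in range(-r, r + 1):
            if (dr, dc) != (0, 0) and dr * dr + dc * dc <= r * r:
                offs.append((dr, dc))
    return offs

# r = 2 gives 12 neighbours per interior pixel
# (the 8-neighbourhood plus the 4 pixels two steps away axially)
```

Under this reading, r = 2 yields a denser graph than the 8-connected lattice used for natural images, which helps propagate label information across the irregular tissue boundaries described above.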
Experimental Results

• Comparison measure: Global Consistency Error (GCE)* (lower GCE indicates lower error)
• TexBTF: avg. GCE of dataset = 0.061
• TexGeo: avg. GCE of dataset = 0.134

No. of features  GCE score
                 BSDSubset  TexBTF  TexColor  TexGeo
10               0.179      0.063   0.159     0.102
20               0.180      0.065   0.159     0.129
40               0.186      0.061   0.156     0.134

*C. Fowlkes, D. Martin, and J. Malik, "Learning affinity functions for image segmentation: Combining patch-based and gradient-based approaches," vol. 2, pp. II-54, IEEE, 2003.

28
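The GCE comparison measure can be sketched as follows (an illustrative implementation of the standard Martin et al. definition, not the authors' code; `gce` is a hypothetical helper name). The local refinement error at a pixel measures how much its region in one segmentation fails to be contained in its region in the other; GCE takes the cheaper refinement direction, so a perfect mutual refinement scores 0:

```python
import numpy as np

def gce(s1, s2):
    """Global Consistency Error between two label maps."""
    s1 = np.asarray(s1).ravel()
    s2 = np.asarray(s2).ravel()
    n = s1.size
    e12 = np.empty(n)
    e21 = np.empty(n)
    for i in range(n):
        r1 = s1 == s1[i]  # region containing pixel i in segmentation 1
        r2 = s2 == s2[i]  # region containing pixel i in segmentation 2
        # fraction of r1 that spills outside r2, and vice versa
        e12[i] = np.count_nonzero(r1 & ~r2) / np.count_nonzero(r1)
        e21[i] = np.count_nonzero(r2 & ~r1) / np.count_nonzero(r2)
    return min(e12.sum(), e21.sum()) / n

# a segmentation that purely refines another has GCE = 0
print(gce([0, 0, 1, 1], [0, 0, 0, 0]))  # 0.0
```

Because refinement in either direction is free, GCE is forgiving of over- or under-segmentation but penalizes boundaries that genuinely disagree, which suits comparisons against human ground truth.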
Experimental Results

• Comparison measure: Global Consistency Error (GCE) (lower GCE indicates lower error)
• Comparison with other methods**: performed on BSDS Subset

No. of features  GCE score
                 BSDSubset  TexBTF  TexColor  TexGeo
10               0.179      0.063   0.159     0.102
20               0.180      0.065   0.159     0.129
40               0.186      0.061   0.156     0.134

Method  Human  RAD    Seed   Learned Affinity  Mean Shift  Normalized Cuts
GCE     0.080  0.205  0.209  0.214             0.260       0.336

**E. Vazquez, J. Van De Weijer, and R. Baldrich, "Image segmentation in the presence of shadows and highlights," pp. 1-14, Springer, 2008.

29
Conclusions
• Comparative reasoning and the Winner Take All hash enable fast similarity search
• Our method performs unsupervised segmentation using context (Random Walks-based clustering)
• There is no need to predefine the number of classes
• This can be used as a pre-processing step for classification of hyperspectral images, biomedical images etc.
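The Winner Take All hash named in the conclusions can be sketched as follows (a minimal illustration of the general WTA scheme, not the authors' implementation; `wta_hash` is a hypothetical helper name). For each random permutation of the feature dimensions, only the first k permuted coordinates are inspected and the index of the largest one is recorded, so the codes depend only on the rank ordering of the feature values:

```python
import numpy as np

def wta_hash(x, perms, k):
    """Winner Take All hash codes for feature vector x.

    For each permutation p, look at the first k permuted coordinates
    of x and record the argmax index; the code is invariant to any
    monotonic transform of x (this is the rank-correlation property).
    """
    x = np.asarray(x)
    return np.array([int(np.argmax(x[p[:k]])) for p in perms])

rng = np.random.default_rng(0)
d, n_perms, k = 8, 4, 3
perms = [rng.permutation(d) for _ in range(n_perms)]

a = wta_hash([0.1, 0.5, 0.3, 0.9, 0.2, 0.7, 0.4, 0.6], perms, k)
b = wta_hash([1, 5, 3, 9, 2, 7, 4, 6], perms, k)  # same ordering of values
assert (a == b).all()  # identical codes: only ranks matter
```

Matching hash codes between pixel feature vectors then serves as the fast similarity test feeding the Random Walks-based clustering.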
30
Thank you
31