
Pattern Recognition 78 (2018) 79–90

Contents lists available at ScienceDirect

Pattern Recognition

journal homepage: www.elsevier.com/locate/patcog

Supervised discrete discriminant hashing for image retrieval

Yan Cui a,d,∗, Jielin Jiang b,d, Zhihui Lai c,d, Zuojin Hu a, WaiKeung Wong d,∗

a School of Mathematics and Information Science, Nanjing Normal University of Special Education, Nanjing, 210038, China
b School of Computer and Software, Nanjing University of Information Science and Technology, Nanjing, 210044, China
c College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, 518060, China
d Institute of Textiles and Clothing, The Hong Kong Polytechnic University, Hong Kong, China
∗ Corresponding authors.

Article info

Article history:

Received 13 August 2017

Revised 18 November 2017

Accepted 7 January 2018

Available online 11 January 2018

Keywords:

Supervised hash learning

Discrete hash learning

Discrete hash codes

Discriminant information

Robust similarity metric

Abstract

Most existing hashing methods focus only on constructing the hash function, rather than learning discrete hash codes directly. A hash function learned in this way may therefore fail to produce ideal discrete hash codes. So that the learned hash function yields close approximations of the ideal discrete hash codes, in this paper we propose a novel supervised discrete discriminant hashing method, which learns the discrete hash codes and the hash function simultaneously. To make the learned discrete hash codes optimal for classification, the learning framework aims to learn a robust similarity metric that maximizes the similarity between discrete hash codes of the same class and minimizes the similarity between discrete hash codes of different classes simultaneously. The discriminant information of the training data can thus be incorporated into the learning framework. Meanwhile, the hash functions are constructed to fit the directly learned binary hash codes. Experimental results clearly demonstrate that the proposed method achieves leading performance compared with the state-of-the-art hashing methods.

© 2018 Elsevier Ltd. All rights reserved.

1. Introduction

In recent years, hashing techniques have become an important tool for dealing with a variety of problems on large-scale databases, such as object recognition [1], image retrieval [2–4], information retrieval [5–7], image matching [8] and related areas [9–12]. For these problems, hashing learning usually learns a series of hash functions that map similar data samples to similar binary hash codes, so that the structure of the original space is preserved in the hash space. Existing hashing methods can be categorized into two main categories: data-independent and data-dependent.

Data-independent hashing methods usually generate random permutations or projections to map the data samples into a new feature space without considering the training data, and the sign function is then applied to binarize the mapped features. Locality Sensitive Hashing (LSH) [13] is a well-known representative data-independent hashing method, which constructs hash functions by simply using random linear projections. In addition, LSH has been generalized to other variants by accommodating other distance and similarity measures such as the p-norm distance [14], cosine similarity [15] and kernel similarity [16,17]. In order to achieve high precision, long hash codes need to be used for LSH and its variants; however, long codes reduce recall and lead to a huge storage overhead.

Recently, data-dependent hashing methods have been developed to learn compact hash codes by taking full advantage of the information in the training data. In other words, data-dependent hashing methods usually learn a hash function from a training set, and the learned hash function is then applied to generate hash codes. Existing data-dependent hashing methods can be divided into unsupervised, semi-supervised, and supervised methods.

Unsupervised hashing methods try to design hash functions which preserve the similarity of the original feature space using unlabeled data. Some well-known unsupervised hashing methods include Spectral Hashing (SH) [18,19], spectral multimodal hashing [20], Iterative Quantization (ITQ) [21], Inductive Manifold Hashing (IMH) [22], Anchor Graph Hashing (AGH) [23], Isotropic Hashing (IsoH) [24], self-taught hashing [25], Discrete Graph Hashing (DGH) [26] and Collaborative Multiview Hashing (CMH) [27]. For unsupervised hashing, no label information can be exploited for constructing the hash functions. In order to achieve high-precision retrieval results, long hash codes need to be used, which may lead to considerable storage overhead and longer query time.

Semi-supervised hashing methods try to incorporate the pairwise label information of a few labeled data into the construction of hash functions. Some popular semi-supervised hashing methods include Semi-Supervised Hashing (SSH) [28,29], Semi-Supervised Discriminant Hashing (SSDH) [30], Binary Reconstructive Embedding (BRE) [31] and semi-supervised manifold-embedded hashing [32]. Semi-Supervised Hashing [28,29] preserves semantic similarity by utilizing the pairwise label information. Semi-Supervised Discriminant Hashing (SSDH) [30] is based on Fisher discriminant analysis and learns hash codes by maximizing the separability between labeled data of different classes while the unlabeled data are used for regularization. Binary Reconstructive Embedding (BRE) [31] aims to design hash functions by minimizing the reconstruction error between the input distances and the reconstructed Hamming distances. Semi-supervised manifold-embedded hashing [32] simultaneously optimizes the feature representation and classifier learning, which makes the learned binary codes optimal for classification. Existing semi-supervised hashing methods usually design independent hash-function learning criteria based on the labeled and the unlabeled training data. Thus, the objective functions designed for the unlabeled training data do not take any prior information from the labels into account.

Supervised hashing methods try to exploit the label information of the training data in hash function learning. Some well-known supervised hashing methods include Kernel-Based Supervised Hashing (KSH) [33], Supervised Discrete Hashing (SDH) [34], Minimal Loss Hashing (MLH) [35], supervised hashing via image representation learning [36], supervised deep hashing [37–39], quantization-based hashing [40] and robust discrete code modeling for supervised hashing [41]. Kernel-Based Supervised Hashing (KSH) [33] designs the hash function based on similar and dissimilar sample pairs. Supervised Discrete Hashing [34] learns the hash function directly from discrete hash codes, where the learned hash codes are expected to be optimal for classification. Minimal Loss Hashing (MLH) [35] aims to preserve the pairwise similarity of training samples in the hash space. Supervised hashing via representation learning [36] automatically learns a good image representation tailored to hashing as well as a set of hash functions. Supervised deep hashing [37–39] builds a deep convolutional network to learn discriminative feature representations. Quantization-based hashing [40] proposes a general framework that incorporates quantization-based methods into conventional similarity-preserving hashing and aims to reduce the quantization error of any similarity-preserving hashing method. Robust discrete code modeling for supervised hashing [41] devises an ℓ2,p-norm-based binary code modeling approach, which can adaptively induce sample-wise sparsity and perform automatic code selection as well as noisy sample identification. Many existing supervised hashing methods [42–44] focus mainly on the construction of the hash function. However, the learning of the discrete hash codes is very important for hash function learning.

Although the aim of hash learning is to obtain discrete hash codes, the discrete constraints lead to mixed-integer optimization problems, which are generally NP-hard. Therefore, most existing hashing methods relax the discrete constraints into a continuous alternative to simplify the optimization, and then the sign function or thresholds are used to turn the real values into approximate discrete hash codes. However, the approximate hash codes are usually suboptimal, and thus the effectiveness of the final hash codes is affected. Therefore, how to decrease the quantization error and achieve optimal discrete hash codes is critical for hash learning.

Recent studies focus on learning the hash codes and the hash function simultaneously. The aim of these studies is to make the learned hash function produce optimal discrete binary codes that better approximate the directly learned hash codes. Existing discrete hashing methods can be categorized into two main categories: those in which the discrete hash codes are achieved by preserving the original similarity, and those in which the discrete hash codes are achieved by regressing each hash code to its corresponding class label. Discrete Graph Hashing [26], Supervised Discrete Hashing [34], Cross-modality Sequential Discrete Hashing (CSDH) [45] and Discrete Collaborative Filtering (DCF) [46] are classified into the former category, while Supervised Discrete Hashing (SDH), Fast Supervised Discrete Hashing [47] and semi-supervised multi-view discrete hashing [48] are classified into the latter category. Discrete Graph Hashing directly learns the discrete binary codes using a graph-based unsupervised hashing model, so that the neighborhood structure of the original data is preserved in the hash space. Supervised Discrete Hashing presents a joint optimization framework in which the similarity matrix of the pairwise similarity information between the training data is leveraged and the binary constraints are preserved during the optimization; the hash function is then achieved by training binary classifiers. Cross-modality Sequential Discrete Hashing learns the unified hash codes using a sequentially discrete optimization strategy in which the hash functions are learned simultaneously. Discrete Collaborative Filtering (DCF) aims to minimize the quantization loss during discrete binary code learning; meanwhile, the discrete binary hash codes are required to be balanced and uncorrelated so that they are more informative and compact. Supervised Discrete Hashing utilizes the label information through a least-squares classification which regresses each hash code to its corresponding label, while the hash functions are learned from the directly achieved hash codes. Similarly, to leverage the label information, Fast Supervised Discrete Hashing regresses each label to its corresponding hash code. Semi-supervised multi-view discrete hashing minimizes a joint hashing learning model, in which the loss on multi-view features under a relaxation of the hash codes is minimized, statistically uncorrelated multi-view features for generating hash codes are explored, and a composite locality is preserved in the Hamming space.

Inspired by the idea of discrete hash learning, we propose a novel supervised discrete discriminant hashing framework in this paper. To make the learned discrete hash codes optimal for classification, the proposed framework aims to maximize the similarity of discrete hash codes from the same class and minimize the similarity of discrete hash codes from different classes simultaneously. To this end, we learn a robust similarity metric by leveraging the label information of the training data. Furthermore, a group of hash functions is simultaneously optimized to fit the directly learned binary hash codes within the learned hashing framework. To emphasize the main contributions of this paper, the advantages of the proposed supervised discrete discriminant hashing framework can be summarized as follows:

(1) To make the learned discrete hash codes optimal for classification, a novel robust similarity metric is developed in the proposed supervised discrete discriminant hashing approach. We leverage the label information of the original training data to learn a robust similarity metric such that the similarity of discrete hash codes from the same class is maximized and the similarity of discrete hash codes from different classes is minimized. Thus, the discriminant information of the training data is incorporated into the learning framework and the learned discrete hash codes are optimal for classification.

(2) To make the learned hash function achieve optimal approximate discrete hash codes, the hash functions are optimized based on the directly learned discrete hash codes. A hash-function learning regularization term is embedded in the proposed supervised discrete discriminant hashing framework, and thus the hash functions can be optimized using the directly learned hash codes.


The organization of this paper is as follows. We present the related works in Section 2. In Section 3, the novel supervised discrete discriminant hashing framework is proposed for large-scale data. Experiments are presented in Section 4. Conclusions are summarized in Section 5.

2. Related works

Recently, discrete hash learning has gained a lot of attention due to the fact that the discrete hash codes achieved by the learned hash functions can fit the directly learned discrete hash codes well. In this section, we briefly review some representative discrete hashing learning approaches.

2.1. Discrete graph hashing

Discrete Graph Hashing (DGH) [26] is a well-known unsupervised discrete hashing method, which presents a discrete optimization model to achieve nearly balanced and uncorrelated discrete hash codes. In the DGH model, anchor graphs are chosen to scale to massive data points. Specifically, for a training dataset $X = \{x_i \in \mathbb{R}^d\}_{i=1}^n$, a small set $U = \{u_j \in \mathbb{R}^d\}_{j=1}^m$ ($m \ll n$) is chosen to approximate the neighborhood structure underlying $X$ by using the following nonlinear data-to-anchor mapping

$$z(x) = \left[\delta_1 \exp\left(-\frac{D^2(x, u_1)}{t}\right), \ldots, \delta_m \exp\left(-\frac{D^2(x, u_m)}{t}\right)\right]^T / M \quad (1)$$

where $\delta_j \in \{1, 0\}$ and $\delta_j = 1$ if and only if anchor $u_j$ is one of the $s \ll m$ closest anchors of $x$ in $U$ under the distance function $D(\cdot)$, $M = \sum_{j=1}^m \delta_j \exp(-D^2(x, u_j)/t)$, and $t > 0$ is the bandwidth parameter. With the data-to-anchor mapping $z(x)$, a data-to-anchor affinity matrix $Z$ can be built as follows

$$Z = [z(x_1), z(x_2), \ldots, z(x_n)]^T \in \mathbb{R}^{n \times m} \quad (2)$$

From the data-to-anchor affinity matrix $Z$, the data-to-data affinity matrix $A$ can be approximated by the following matrix

$$A = Z \Lambda^{-1} Z^T \in \mathbb{R}^{n \times n} \quad (3)$$

where $\Lambda = \mathrm{diag}(Z^T \mathbf{1}) \in \mathbb{R}^{m \times m}$. With the data-to-data affinity matrix, the formulation of DGH can be built as follows

$$\max_B \; \mathrm{tr}(B^T A B), \quad \text{s.t. } B \in \{1, -1\}^{n \times r}, \; \mathbf{1}^T B = 0, \; B^T B = n I_r \quad (4)$$

In order to avoid amplification of the error caused by the relaxation as the code length $r$ increases, the discrete hash codes can be directly achieved by the following optimization framework

$$\max_B \; \mathrm{tr}(B^T A B) - \frac{\rho}{2}\, \mathrm{dist}^2(B, \Omega), \quad \text{s.t. } B \in \{1, -1\}^{n \times r} \quad (5)$$

where $\Omega = \{Y \in \mathbb{R}^{n \times r} \mid \mathbf{1}^T Y = 0, \; Y^T Y = n I_r\}$, $\mathrm{dist}(B, \Omega) = \min_{Y \in \Omega} \|B - Y\|_F$ measures the distance between $B$ and $\Omega$, and $\rho$ is a tuning parameter. Since $\mathrm{tr}(B^T B) = \mathrm{tr}(Y^T Y) = nr$, the DGH optimization framework can be further rewritten as follows

$$\max_{B, Y} \; \mathrm{tr}(B^T A B) + \rho\, \mathrm{tr}(B^T Y), \quad \text{s.t. } B \in \{1, -1\}^{n \times r}, \; Y \in \mathbb{R}^{n \times r}, \; \mathbf{1}^T Y = 0, \; Y^T Y = n I_r \quad (6)$$

The DGH framework can be optimized by a tractable alternating maximization algorithm; the details can be found in [26]. For a new query sample $y$, its hash code can be computed as $b(y) = \mathrm{sgn}(W z(y))$, where $W = B^T Z \Lambda^{-1} \in \mathbb{R}^{r \times m}$.
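To make the anchor-graph construction of Eqs. (1)–(3) concrete, the following is a minimal NumPy sketch (not the authors' code): it forms the truncated, normalized data-to-anchor mapping $z(x)$, stacks it into $Z$, and approximates $A = Z\Lambda^{-1}Z^T$. The anchor set, the number $s$ of retained anchors and the bandwidth $t$ are illustrative placeholders.

```python
import numpy as np

def anchor_graph_affinity(X, U, s=3, t=1.0):
    """Approximate data-to-data affinity via anchors (Eqs. (1)-(3) of DGH).

    X : (n, d) data points, U : (m, d) anchor points,
    s : number of closest anchors kept per point, t : bandwidth.
    """
    n, m = X.shape[0], U.shape[0]
    # squared distances D^2(x_i, u_j)
    D2 = ((X[:, None, :] - U[None, :, :]) ** 2).sum(axis=2)      # (n, m)
    K = np.exp(-D2 / t)                                          # RBF similarities
    # keep only the s closest anchors for each sample (delta_j in Eq. (1))
    Z = np.zeros((n, m))
    idx = np.argsort(D2, axis=1)[:, :s]
    rows = np.arange(n)[:, None]
    Z[rows, idx] = K[rows, idx]
    Z /= Z.sum(axis=1, keepdims=True)                            # divide by M
    # data-to-data affinity A = Z Lambda^{-1} Z^T (Eq. (3))
    Lam_inv = np.diag(1.0 / Z.sum(axis=0))                       # Lambda = diag(Z^T 1)
    A = Z @ Lam_inv @ Z.T
    return Z, A

# tiny usage example with random data and anchors drawn from the data
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))
U = X[rng.choice(100, size=10, replace=False)]
Z, A = anchor_graph_affinity(X, U)
print(Z.shape, A.shape)                                          # (100, 10) (100, 100)
```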

2.2. Supervised discrete hashing

Supervised Discrete Hashing (SDH) [34] is a popular discrete hashing learning approach, which utilizes the label information of the training data during hash code learning. The purpose of SDH is that the learned binary hash codes are ideal for classification. Thus, the following classification formulation is built

$$y = G(b) = W^T b = [w_1^T b, w_2^T b, \ldots, w_C^T b]^T \quad (7)$$

where $y \in \mathbb{R}^C$ is the class label vector, $C$ is the number of classes, the $k$th element of $y$ is 1 if $b$ comes from class $k$ and 0 otherwise, and $w_k \in \mathbb{R}^r$, $k = 1, 2, \ldots, C$, is the classification vector for class $k$. In order to search for the ideal assigned class of $x$, the following optimization problem is built

$$\min_{B, W, F} \sum_{i=1}^n L(y_i, W^T b_i) + \lambda \|W\|^2 \quad \text{s.t. } b_i = \mathrm{sgn}(F(x_i)), \; i = 1, 2, \ldots, n \quad (8)$$

where $L(\cdot)$ is the loss function and $\lambda$ is the regularization parameter. In order to achieve better-quality hash codes, the hash function $F(\cdot)$ is enhanced in the SDH framework. Thus, with the regularization strategy, problem (8) turns into

$$\min_{B, W, F} \sum_{i=1}^n L(y_i, W^T b_i) + \lambda \|W\|^2 + \alpha \sum_{i=1}^n \|b_i - F(x_i)\|^2 \quad \text{s.t. } b_i \in \{-1, 1\}^r \quad (9)$$

When the $\ell_2$ loss is chosen as the loss function $L(\cdot)$ for the classification model, the SDH formulation (9) can be further rewritten as

$$\min_{B, W, F} \sum_{i=1}^n \|y_i - W^T b_i\|^2 + \lambda \|W\|^2 + \alpha \sum_{i=1}^n \|b_i - F(x_i)\|^2 \quad \text{s.t. } b_i \in \{-1, 1\}^r \quad (10)$$

The objective function of SDH can be rewritten in matrix form as follows

$$\min_{B, W, F} \|Y - W^T B\|_F + \lambda \|W\|^2 + \alpha \|B - F(X)\| \quad \text{s.t. } B \in \{-1, 1\}^{r \times n} \quad (11)$$

where $Y \in \mathbb{R}^{C \times n}$ is the class label matrix with $y_{ik} = 1$ if $x_i$ is from class $k$ and 0 otherwise, $B = [b_1, b_2, \ldots, b_n] \in \mathbb{R}^{r \times n}$ contains the hash codes, and $X = [x_1, x_2, \ldots, x_n] \in \mathbb{R}^{d \times n}$ is the training data. The first term $\|Y - W^T B\|_F$ is a least-squares regression which regresses each hash code to its corresponding class label vector. The second term $\lambda \|W\|^2$ regularizes the projection matrix for the hash codes. The last term $\alpha \|B - F(X)\|$ models the fitting error of the binary codes $B$ by the continuous embedding $F(x) = P^T \phi(x)$; the second and last terms of Eq. (11) are for regularization. The optimal $F(X)$, $W$ and $B$ can be iteratively solved in three steps: the F-step, the G-step and the B-step. Specifically, the F-step solves $P$ by letting $F(X) = P^T \phi(X)$ and fixing $B$ in Eq. (11); the projection matrix is computed as $P = (\phi(X)\phi(X)^T)^{-1} \phi(X) B^T$. The G-step solves $W$ by fixing $B$ in Eq. (11); $W$ is the solution of a regularized least-squares problem and has the closed form $W = (B B^T + \lambda I)^{-1} B Y^T$. The B-step solves $B$; more details of the B-step can be found in [34].
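As an illustration of the closed-form F-step and G-step above, the sketch below performs one round of SDH-style updates for the ℓ2-loss objective in Eq. (11): $P$ regresses $B$ onto $\phi(X)$ and $W$ is the regularized least-squares solution. The B-step here is only a crude sign approximation of the linear terms, not the discrete cyclic coordinate descent used in [34]; all sizes and parameter values are illustrative assumptions.

```python
import numpy as np

def sdh_round(Phi, Y, B, lam=1.0, alpha=1e-2):
    """One illustrative round of SDH-style updates (Eq. (11)).

    Phi : (m, n) nonlinear features phi(x_i) as columns
    Y   : (C, n) one-hot class label matrix
    B   : (r, n) current binary codes in {-1, +1}
    """
    # F-step: P = (Phi Phi^T)^{-1} Phi B^T, so that F(X) = P^T Phi
    P = np.linalg.solve(Phi @ Phi.T, Phi @ B.T)                  # (m, r)
    # G-step: W = (B B^T + lam I)^{-1} B Y^T
    r = B.shape[0]
    W = np.linalg.solve(B @ B.T + lam * np.eye(r), B @ Y.T)      # (r, C)
    # simplified B-step: sign of the linear fitting targets only
    # (the exact SDH B-step instead solves the codes bit by bit)
    B_new = np.sign(W @ Y + alpha * (P.T @ Phi))
    B_new[B_new == 0] = 1
    return P, W, B_new

# toy usage with random features, labels and codes
rng = np.random.default_rng(0)
n, m, r, C = 200, 30, 16, 5
Phi = rng.normal(size=(m, n))
Y = np.eye(C)[rng.integers(0, C, n)].T        # (C, n) one-hot labels
B = np.sign(rng.standard_normal((r, n)))
P, W, B = sdh_round(Phi, Y, B)
```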

3. Supervised discrete discriminant hashing

Although existing supervised hashing learning approaches take advantage of the label information of the training data during hash function learning, many approaches focus only on hash function construction without considering discrete hash code learning. Thus such learned hash functions are suboptimal for the directly learned discrete hash codes. To handle this problem, we propose supervised discrete discriminant hashing, which learns the discrete hash codes and the hash function simultaneously. Furthermore, the discriminative information of the training data is incorporated during hash learning. In this section, we present the formulation and the optimization algorithm of the proposed method.

3.1. Formulation

To achieve ideal retrieval results, it is expected that the binary-bit representations of samples from the same class are as similar as possible. However, when the length of the binary codes is large, the similarity gap becomes greater once some bits differ. To handle this problem, we propose to learn a robust similarity metric which makes the similarity of same-class samples as large as possible and, meanwhile, makes the similarity of different-class samples as small as possible.

Let $X = \{x_1, x_2, \cdots, x_n\}$ be the training data set with each sample $x_i \in \mathbb{R}^d$ $(i = 1, 2, \cdots, n)$, and let $L = \{l_1, l_2, \cdots, l_n\}$ with $l_i \in \{L_1, L_2, \ldots, L_C\}$ be the corresponding label set. Our aim is to learn the corresponding binary codes $B = \{b_1, b_2, \cdots, b_n\}$ so as to preserve their class similarities well, that is, the binary-bit representations of same-class samples should be as similar as possible, where $b_i \in \mathbb{R}^r$ is the $r$-bit binary representation of $x_i$. To maximize the similarity of same-class samples, it is necessary to learn a robust similarity metric $A$ for measuring the similarity of same-class samples as follows

$$\max_{A, B} \sum_{i=1}^n \sum_{\substack{j=1 \\ (b_i, b_j) \in S}}^n b_i^T A b_j \quad \text{s.t. } b_k = F(x_k), \; k = 1, 2, \ldots, n \quad (12)$$

where $A$ is an $r \times r$ square matrix and $(b_i, b_j) \in S$ means $b_i$ and $b_j$ have the same class label. In order to make the achieved hash codes beneficial to classification, the similarity between different classes is expected to be as small as possible, that is, we minimize the similarity of different classes as follows

$$\min_{A, B} \sum_{i=1}^n \sum_{\substack{j=1 \\ (b_i, b_j) \notin S}}^n b_i^T A b_j \quad \text{s.t. } b_k = F(x_k), \; k = 1, 2, \ldots, n \quad (13)$$

where $(b_i, b_j) \notin S$ means $b_i$ and $b_j$ do not have the same class label. To achieve ideal binary hash codes for classification, the similarity of same-class samples is expected to be as large as possible while, at the same time, the similarity of different-class samples is expected to be as small as possible. Thus the following optimization problem is chosen

$$\max_{A, B} \sum_{i=1}^n \sum_{\substack{j=1 \\ (b_i, b_j) \in S}}^n b_i^T A b_j - \sum_{i=1}^n \sum_{\substack{j=1 \\ (b_i, b_j) \notin S}}^n b_i^T A b_j \quad \text{s.t. } b_k = F(x_k), \; k = 1, 2, \ldots, n \quad (14)$$

The aim of hash learning is to learn an ideal hash function which yields good approximate discrete hash codes, so we focus more on the learning of the hash function. Following the regularization idea in large-scale optimization, the learning of the hash function is incorporated into the learning formulation, i.e.

$$\max_{A, B, F} \sum_{i=1}^n \sum_{\substack{j=1 \\ (b_i, b_j) \in S}}^n b_i^T A b_j - \sum_{i=1}^n \sum_{\substack{j=1 \\ (b_i, b_j) \notin S}}^n b_i^T A b_j - \alpha \sum_{k=1}^n \|b_k - F(x_k)\|^2 \quad \text{s.t. } b_k \in \{-1, 1\}^r, \; k = 1, 2, \ldots, n \quad (15)$$

Since the similarity $b_i^T A b_j$ satisfies the properties of nonnegativity, symmetry and the triangle inequality, we know that $A$ is symmetric and positive semi-definite, that is, $A$ can be decomposed as

$$A = W W^T \quad (16)$$

Thus the following equations can be obtained

$$\sum_{i=1}^n \sum_{\substack{j=1 \\ (b_i, b_j) \in S}}^n b_i^T A b_j = \sum_{i=1}^n \sum_{\substack{j=1 \\ (b_i, b_j) \in S}}^n b_i^T W W^T b_j = \mathrm{tr}(W^T B S^S B^T W)$$
$$\sum_{i=1}^n \sum_{\substack{j=1 \\ (b_i, b_j) \notin S}}^n b_i^T A b_j = \sum_{i=1}^n \sum_{\substack{j=1 \\ (b_i, b_j) \notin S}}^n b_i^T W W^T b_j = \mathrm{tr}(W^T B S^D B^T W) \quad (17)$$

where $S^S$ is the relationship matrix of same-class samples, i.e. $S^S_{ij} = 1$ if $b_i$ and $b_j$ have the same class label and 0 otherwise, and $S^D$ is the relationship matrix of different-class samples, i.e. $S^D_{ij} = 1$ if $b_i$ and $b_j$ have different class labels and 0 otherwise. From Eq. (17), the proposed learning formulation can be further rewritten as

$$\max_{W, B, F} \; \mathrm{tr}(W^T B S^S B^T W) - \mathrm{tr}(W^T B S^D B^T W) - \alpha \|B - F(X)\|_F^2 \quad \text{s.t. } B \in \{-1, 1\}^{r \times n} \quad (18)$$

Furthermore, we can define a single relationship matrix $S$ for all training data, whose elements are

$$S_{ij} = \begin{cases} 1 & \text{if } x_i \text{ and } x_j \text{ have the same class label} \\ -1 & \text{if } x_i \text{ and } x_j \text{ have different class labels} \end{cases} \quad (19)$$

Thus the proposed learning formulation Eq. (18) turns into

$$\max_{W, B, F} \; \mathrm{tr}(W^T B S B^T W) - \alpha \|B - F(X)\|_F^2 \quad \text{s.t. } B \in \{-1, 1\}^{r \times n} \quad (20)$$
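A small sketch, under toy assumptions, of how the relationship matrix $S$ of Eq. (19) can be formed from integer class labels and how the objective of Eq. (20) can be evaluated; it is meant only to fix the notation, not as the authors' implementation.

```python
import numpy as np

def relationship_matrix(labels):
    """S_ij = 1 for same-class pairs and -1 for different-class pairs (Eq. (19))."""
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]
    return np.where(same, 1.0, -1.0)

def sddh_objective(W, B, S, FX, alpha):
    """Objective of Eq. (20): tr(W^T B S B^T W) - alpha * ||B - F(X)||_F^2."""
    return np.trace(W.T @ B @ S @ B.T @ W) - alpha * np.linalg.norm(B - FX) ** 2

# toy example: six samples drawn from three classes
labels = [0, 0, 1, 1, 2, 2]
S = relationship_matrix(labels)
print(S)
```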

As stated above, the proposed learning formulation learns the ideal hash function from the directly achieved binary hash codes; meanwhile, the discriminability of the training data is enhanced by maximizing the similarity of same-class samples and minimizing the similarity of different-class samples simultaneously. Thus the proposed learning formulation is called supervised discrete discriminant hashing (SDDH). In [30], the SSDH framework is built on neighbor pairs with the same class label and non-neighbor pairs with different class labels, which makes neighbor pairs with the same class label more similar and non-neighbor pairs with different class labels more different, but neighbor pairs with different class labels may lead to misclassification. Moreover, its hash function learning is independent of the SSDH framework. In contrast to SSDH, the proposed SDDH takes into account all same-class sample pairs and all different-class sample pairs: it maximizes the similarity of same-class samples and minimizes the similarity of different-class samples simultaneously, and moreover it learns the ideal hash function from the directly achieved binary hash codes. In the following section, we use an alternating optimization strategy to iteratively solve the proposed optimization problem in Eq. (20).

3.2. Optimization and algorithms

From Section 3.1, we know that the proposed SDDH formulation in Eq. (20) is a mixed binary integer program with three unknown variables $W$, $B$ and $F$. Similar to SDH, we use an alternating optimization strategy to iteratively solve the problem.


Similar to [34], we use the following nonlinear form for the hash function

$$F(x) = P^T \phi(x) \quad (21)$$

where $P \in \mathbb{R}^{m \times r}$ is the transformation matrix and $\phi(x)$ is a nonlinear mapping, which can be defined as

$$\phi(x) = \left[\exp\left(-\frac{D^2(x, u_1)}{t}\right), \ldots, \exp\left(-\frac{D^2(x, u_m)}{t}\right)\right]^T \in \mathbb{R}^m \quad (22)$$

where $t$ is the kernel bandwidth and $\{u_i\}_{i=1}^m$ are $m$ anchor points randomly selected from the training data. Thus, for all training data $X$ we have

$$F(X) = [F(x_1), F(x_2), \ldots, F(x_n)] = [P^T \phi(x_1), P^T \phi(x_2), \ldots, P^T \phi(x_n)] = P^T \Phi(X) \quad (23)$$

where $\Phi(X) = [\phi(x_1), \phi(x_2), \ldots, \phi(x_n)] \in \mathbb{R}^{m \times n}$. The transformation matrix $P$ maps $\Phi(X)$ into the low-dimensional space. Thus, the proposed SDDH formulation in Eq. (20) can be further rewritten as

$$\max_{P, W, B} \; \mathrm{tr}(W^T B S B^T W) - \alpha \|B - P^T \Phi(X)\|_F^2 \quad \text{s.t. } B \in \{-1, 1\}^{r \times n} \quad (24)$$
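The anchor-based embedding $\Phi(X)$ of Eqs. (22) and (23) can be sketched as follows; the anchors are assumed to be randomly drawn training samples, and the bandwidth $t$ and the matrix $P$ used in the example are illustrative placeholders rather than learned quantities.

```python
import numpy as np

def rbf_anchor_features(X, anchors, t=1.0):
    """phi(x) from Eq. (22), stacked column-wise into Phi(X) of shape (m, n)."""
    # squared Euclidean distances between anchors (m, d) and samples (n, d)
    D2 = ((anchors[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)   # (m, n)
    return np.exp(-D2 / t)

# F(X) = P^T Phi(X) as in Eq. (23), with an illustrative random projection P
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 16))                  # 50 samples, 16-dim features
anchors = X[rng.choice(50, 8, replace=False)]  # m = 8 anchors from the data
Phi = rbf_anchor_features(X, anchors)          # (8, 50)
P = rng.normal(size=(8, 4))                    # maps to r = 4 bits
FX = P.T @ Phi                                 # (4, 50) continuous embedding
print(Phi.shape, FX.shape)
```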

Next, we provide the details of the updates of $P$, $W$ and $B$ in three steps.

Step 1: Update $P$. From Eq. (24), $P$ relies only on $B$; thus, if $B$ is fixed in Eq. (24), $P$ can be obtained from the following quadratic problem

$$\max_P \; -\|B - P^T \Phi(X)\|_F^2 \;\Leftrightarrow\; \min_P \; \|P^T \Phi(X) - B\|_F^2 \quad (25)$$

Taking the derivative of Eq. (25) with respect to $P$ and setting it to 0, we easily obtain

$$P = (\Phi(X)\Phi(X)^T)^{-1} \Phi(X) B^T \quad (26)$$

Step 2: Update $W$. Similar to Step 1, $W$ depends only on $B$; thus, if $B$ is fixed in Eq. (24), $W$ can be obtained from the following optimization problem

$$\max_W \; \mathrm{tr}(W^T B S B^T W) \quad (27)$$

The above optimization problem can be solved as a conventional eigenvalue–eigenvector problem, that is, $W$ is composed of the eigenvectors corresponding to the $l$ largest eigenvalues of $B S B^T$.

Step 3: Update $B$. When $P$ and $W$ are fixed, the SDDH formulation in Eq. (24) can be rewritten as

$$\max_B \; \mathrm{tr}(W^T B S B^T W) - \alpha\, \mathrm{tr}\big((B - P^T \Phi(X))^T (B - P^T \Phi(X))\big) \quad \text{s.t. } B \in \{-1, 1\}^{r \times n} \quad (28)$$

Since $\mathrm{tr}(B^T B)$ and $\mathrm{tr}\big((P^T \Phi(X))^T (P^T \Phi(X))\big)$ are constant, the above optimization problem is equivalent to

$$\max_B \; \mathrm{tr}(W^T B S B^T W) + \beta\, \mathrm{tr}(B^T P^T \Phi(X)) \quad \text{s.t. } B \in \{-1, 1\}^{r \times n} \quad (29)$$

which is further equivalent to

$$\max_B \; \mathrm{tr}\big(B^T (W W^T B S + \beta P^T \Phi(X))\big) \quad \text{s.t. } B \in \{-1, 1\}^{r \times n} \quad (30)$$

Thus, $B^{t+1}$ can be updated with a closed-form solution as follows:

$$B = \mathrm{sgn}(W W^T B^t S + \beta P^T \Phi(X)) \quad (31)$$

The solution can be further simplified as

$$B = \mathrm{sgn}(A B^t S + \beta P^T \Phi(X)) \quad (32)$$

where $B^t$ is the binary hash code matrix learned in the $t$th iteration. Similar to [43], $B$ is updated in a single step for all bits, making it much faster than SDH, which adopts discrete cyclic coordinate descent to learn the hash code bit by bit. From the above optimization, the per-iteration computational complexity of Step 1 is $O(m^2 n)$, that of Step 2 is $O(r n^2)$, and that of Step 3 is $O(r n^2)$. The algorithm of the proposed SDDH is presented in Algorithm 1.

Algorithm 1 Supervised discrete discriminant hashing.
Input: Training data set $X = \{x_i, y_i\}_{i=1}^n$, the maximum iteration number $t$, the number of hashing bits $K$, the parameter $\alpha$.
Output: the hash codes $\{b_i\}_{i=1}^n \in \{-1, 1\}^{L \times n}$ and the hash function.
1. Randomly select $m$ samples $\{u_i\}_{i=1}^m$ from the training data as anchor points; $\phi(x)$ is obtained using the RBF kernel.
2. Initialize $b_i \in \{-1, 1\}^L$ randomly, $i = 1, 2, \ldots, n$; $P$ is initialized by Eq. (26); $W$ is initialized by Eq. (27).
Repeat
  Follow Step 3 to update $B$;
  Follow Step 2 to solve $W$;
  Follow Step 1 to solve $P$;
Until convergence

From the above optimization, we know that the proposed SDDH achieves discrete discriminant hash codes by updating the robust similarity metric $A = W W^T$ via Eq. (27). Thus the learned hash function makes the hash codes of same-class sample pairs highly similar, while those of different-class sample pairs may be highly different.
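The following is a compact sketch of Algorithm 1 under the updates of Eqs. (26), (27) and (32), assuming the anchor features $\Phi(X)$ of Eq. (22) and the relationship matrix $S$ of Eq. (19) have already been computed (for instance with the helpers sketched earlier). The number of iterations, the trade-off $\beta$, the number $l$ of retained eigenvectors and the small ridge term added for numerical stability are illustrative choices, not values from the paper.

```python
import numpy as np

def sddh_train(Phi, S, r, l=None, beta=1e-2, iters=10, seed=0):
    """Alternating optimization of Eq. (24): returns P, W and binary codes B.

    Phi : (m, n) anchor features Phi(X)
    S   : (n, n) relationship matrix from Eq. (19)
    r   : code length; l : number of leading eigenvectors kept in Step 2.
    """
    rng = np.random.default_rng(seed)
    m, n = Phi.shape
    l = l or max(1, r // 2)
    B = np.sign(rng.standard_normal((r, n)))              # random init of the codes
    PhiPhiT = Phi @ Phi.T + 1e-6 * np.eye(m)              # small ridge for stability
    for _ in range(iters):
        # Step 1 (Eq. (26)): P = (Phi Phi^T)^{-1} Phi B^T
        P = np.linalg.solve(PhiPhiT, Phi @ B.T)           # (m, r)
        # Step 2 (Eq. (27)): columns of W are leading eigenvectors of B S B^T
        M = B @ S @ B.T
        _, eigvecs = np.linalg.eigh((M + M.T) / 2)        # eigenvalues ascending
        W = eigvecs[:, -l:]                               # (r, l)
        # Step 3 (Eq. (32)): B = sgn(W W^T B S + beta P^T Phi(X))
        B = np.sign(W @ (W.T @ B) @ S + beta * (P.T @ Phi))
        B[B == 0] = 1                                     # avoid zero entries
    return P, W, B

def sddh_hash(P, Phi_query):
    """Hash a query: sgn(F(x)) with F(x) = P^T phi(x), phi as in Eq. (22)."""
    return np.sign(P.T @ Phi_query)
```

A query is then hashed with $\mathrm{sgn}(P^T \phi(y))$, mirroring the hash function $F(x) = P^T \phi(x)$ above.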

4. Experiments

In this section, we evaluate the effectiveness of the proposed SDDH algorithm mainly on three large-scale image datasets: the CIFAR-10, MNIST and NUS-WIDE databases. Moreover, to broaden the data types, we further apply the proposed SDDH to a face dataset: the AR face data. Experiments were performed on a workstation with an Intel Xeon processor (2.20 GHz), 128 GB RAM, and MATLAB 2012b.

In order to demonstrate the effectiveness of the proposed SDDH algorithm, we compare the proposed method with representative hashing algorithms, namely LSH, ITQ, KMH, ABQ, AGH, KSH, SDH, and FSDH. Their codes were downloaded from the homepages of the corresponding authors and the parameters of these methods were set to the values suggested in the corresponding papers. For AGH, SDH, and FSDH, 2000 randomly sampled anchor points were utilized.

LSH: The hash functions are constructed by simply using random linear projections.

ITQ: The orthogonal hashing transform matrix minimizing the quantization error of mapping the data to the vertices of the binary hypercube is searched for by an iterative method.

KMH: K-means hashing is an affinity-preserving quantization method for learning compact binary codes.

ABQ: Adaptive binary quantization is a method for fast nearest neighbor search.


Table 1

Results of the compared methods in precision (%), recall (%) and F-measure (%) of Hamming distance 2 on

CIFAR-10 with code length 16, 32, 64, 96, 128 and 256.

LSH ITQ KMH ABQ AGH KSH SDH FSDH SDDH

Precision 16 9.32 23.77 20.38 19.93 22.38 34.46 44.38 41.8 48.06

32 8.24 22.23 19.65 24.60 27.31 39.89 51.57 50.84 52.56

64 7.41 24.19 21.40 23.00 25.96 42.82 45.02 44.71 47.63

96 5.37 20.26 20.60 25.33 25.06 43.25 35.76 35.04 44.93

128 4.93 25.3 20.01 22.65 24.52 43.69 29.77 29.35 41.99

256 7.41 23.27 19.01 21.49 21.86 44.67 19.8 22.13 37.68

Recall 16 0.13 1.17 3.47 5.96 4.47 3.45 27.90 29.24 32.14

32 0.13 1.17 3.35 7.75 0.42 3.99 15.58 15.38 17.02

64 0.09 1.30 3.66 6.69 0.06 4.28 8.81 9.98 16.68

96 0.07 1.09 3.52 7.37 0.05 4.33 7.20 8.08 11.29

128 0.05 1.38 3.44 6.59 0.05 4.37 5.97 7.16 9.91

256 0.09 1.09 3.25 6.25 0.04 4.47 3.72 5.81 8.80

F-measure 16 0.26 2.23 5.93 9.18 7.45 6.27 34.26 34.41 34.43

32 0.26 2.22 5.72 11.79 0.83 7.25 23.93 23.62 25.07

64 0.18 2.47 6.25 10.36 0.12 7.78 14.74 16.32 24.13

96 0.14 2.07 6.02 11.41 0.10 7.87 11.99 13.13 18.05

128 0.10 2.62 5.87 10.20 0.10 7.95 9.95 11.51 16.04

256 0.18 2.08 5.55 9.68 0.080 8.13 6.26 9.2 14.27

Fig. 1. Compared MAP results on CIFAR-10 with code lengths 16, 32, 64, 96, 128 and 256. (The figure plots MAP against code length for LSH, ITQ, AGH, KSH, SDH, FSDH and SDDH.)


AGH: Anchors were selected to approximate the data neighbor-

hood structure and the hash functions were learned based on the

selected anchors.

KSH: The hash function was learned by utilizing the pairwise

supervision information to achieve high quality hashing.

SDH: A supervised hashing approach based on the assumption

that good hash codes were optimal for linear classification.

FSDH: A fast supervised hashing approach based on the as-

sumption that good hash codes were optimal for linear classifica-

tion, which is similar to SDH.

SDDH: The proposed supervised discrete discriminant hashing.

4.1. Experiments on CIFAR-10

The CIFAR-10 database is composed of 60,000 images which are manually labelled into 10 classes with 6000 samples per class. Each image in this database is represented by a GIST feature vector of dimension 512. In our experiment, 59,000 data points were randomly selected for training and the remaining data points were used for testing. For AGH, KSH and SDH, we randomly selected 2000 samples as anchor points, and the experimental results are reported in terms of precision, recall and F-measure of Hamming radius 2, and MAP. The definition of the F-measure is the same as in [47], namely $2 \times \text{precision} \times \text{recall} / (\text{precision} + \text{recall})$. The experimental results under different code lengths can be found in Table 1.
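For reference, the hash-lookup metrics used below (precision, recall and F-measure within Hamming radius 2) can be computed per query as in the following sketch; it assumes ±1 code matrices and relevance defined by shared class labels, mirroring the protocol described above rather than reproducing the authors' evaluation scripts. Averaging over all queries gives the reported percentages.

```python
import numpy as np

def lookup_metrics(query_code, query_label, db_codes, db_labels, radius=2):
    """Precision/recall/F-measure of a Hamming-ball lookup for one query.

    query_code : (r,) in {-1, +1}; db_codes : (n, r) in {-1, +1};
    a database item is relevant if it shares the query's class label.
    """
    r = query_code.shape[0]
    # Hamming distance from +/-1 codes: (r - <b_q, b_i>) / 2
    dists = (r - db_codes @ query_code) / 2
    retrieved = dists <= radius
    relevant = db_labels == query_label
    tp = np.sum(retrieved & relevant)
    precision = tp / max(retrieved.sum(), 1)
    recall = tp / max(relevant.sum(), 1)
    f = 2 * precision * recall / max(precision + recall, 1e-12)
    return precision, recall, f
```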

From Table 1, it is clearly seen that on this large dataset the best performance of the proposed SDDH is better than the best performance of LSH, ITQ, AGH, KSH, SDH and FSDH for each code length. Specifically, for the precision indicator, the best precision rate of the proposed algorithm is 52.56% on 32-bit code length, compared with 9.32% for LSH, 25.30% for ITQ, 21.40% for KMH, 24.60% for ABQ, 27.31% for AGH, 44.67% for KSH, 51.57% for SDH and 50.84% for FSDH. Regarding the recall indicator, the best recall rate of the proposed algorithm is 32.14% on 16-bit code length, compared with 0.26% for LSH, 1.38% for ITQ, 3.66% for KMH, 7.75% for ABQ, 4.47% for AGH, 4.47% for KSH, 27.9% for SDH and 29.24% for FSDH. For the F-measure indicator, the best rate of the proposed algorithm is 34.43% on 16-bit code length, compared with 0.26% for LSH, 2.62% for ITQ, 6.25% for KMH, 11.75% for ABQ, 7.45% for AGH, 8.13% for KSH, 34.26% for SDH and 34.41% for FSDH. It is clear that the data-independent hashing method LSH achieves very low precision and recall in the Hamming space, while the data-dependent hashing methods achieve promising results. The discrete hashing learning methods SDH, FSDH and the proposed SDDH outperform the traditional hashing learning methods.

Furthermore, we compared the Hamming ranking of these methods; the MAP scores can be found in Fig. 1. From the MAP scores, the best MAP performance of the proposed algorithm is 0.4758 on 128-bit code length, compared with 0.1287 for LSH, 0.1633 for ITQ, 0.1466 for AGH, 0.3041 for KSH, 0.4740 for SDH and 0.4712 for FSDH. From Fig. 1, it can be seen that the discrete hashing methods achieve similar MAP scores and outperform LSH, ITQ, KMH, ABQ, AGH and KSH. Thus, the proposed SDDH outperforms all other hashing methods in precision, recall, F-measure and MAP.

4.2. Experiments on MNIST

The MNIST dataset consists of 70,000 images of handwritten digits from 0 to 9, each of 784 dimensions. In our experiment, every class was randomly split into a training set and a test set at a ratio of 69:1, i.e. the training set contains 69,000 samples and the test set contains 1000 samples. The training data were used to learn a robust hash function, and the hash function was then applied to the test data to obtain the corresponding hash codes. The experimental results are reported in terms of precision, recall and F-measure of Hamming radius 2, and MAP. The results of precision and MAP can be found in Table 2.


Table 2

Results of the compared methods in precision (%) of Hamming distance 2 and MAP on MNIST with code length 16,

32, 64, 96, 128 and 256.

LSH ITQ KMH ABQ AGH KSH SDH FSDH SDDH

Precision 16 0.43 11.44 36.30 27.52 70.52 86.74 90.06 88.46 91.27

32 4.670 44.38 58.34 61.55 80.42 90.06 91.80 91.78 91.84

64 47.08 19.10 31.00 44.60 80.69 91.18 91.30 91.33 92.21

96 21.43 8.10 29.54 39.45 77.14 91.87 88.51 89.07 90.64

128 9.90 5.20 20.41 20.90 74.68 91.30 87.12 87.34 89.82

256 0.20 0.10 16.38 16.42 68.17 91.62 83.39 83.75 88.39

MAP 16 0.2074 0.4058 0.2652 0.4069 0.4722 0.8262 0.9007 0.8856 0.9026

32 0.2598 0.4433 0.2143 0.4931 0.4068 0.8637 0.9142 0.9127 0.9183

64 0.3246 0.4512 0.2333 0.4607 0.3394 0.8750 0.9217 0.9204 0.9308

96 0.3472 0.4635 0.1568 0.4914 0.3084 0.8760 0.9243 0.9238 0.9509

128 0.3786 0.4600 0.1515 0.4890 0.2870 0.8767 0.9267 0.9265 0.9330

256 0.4080 0.4687 0.1510 0.4697 0.2428 0.8781 0.9312 0.9286 0.9322

Fig. 2. Compared recall and F-measure results on MNIST with code lengths 16, 32, 64, 96, 128 and 256: (a) recall of Hamming radius 2; (b) F-measure of Hamming radius 2.


Table 3

Results of the compared methods in precision (%) of Hamming distance 2, Recall,

F-measure and MAP on NUS-WIDE with code length 16, 32, 64, 96, 128 and 256.

LSH ITQ KSH SDH FSDH SDDH

Precision 16 8.19 8.23 23.72 36.05 36.13 36.02

32 14.08 14.55 27.31 36.05 36.04 36.13

64 6.46 6.22 28.52 36.04 36.04 36.19

96 1.44 1.53 26.23 36.04 36.04 36.14

128 0.960 1.06 22.03 36.04 36.04 36.14

256 0.08 0.05 19.47 36.04 36.04 36.10

Recall 16 82.23 82.49 14.24 99.53 99.52 99.53

32 19.61 18.57 3.750 99.49 99.48 99.56

64 3.53 3.38 0.66 99.48 99.48 99.78

96 1.02 1.30 0.37 99.48 99.48 99.48

128 0.45 0.34 0.10 99.48 99.48 99.48

256 0.01 0.01 0.01 99.48 99.48 99.81

F-measure 16 14.90 14.97 17.80 52.93 52.91 52.9

32 16.39 16.32 6.59 52.92 52.91 53.02

64 4.57 4.38 1.290 52.91 52.91 53.12

96 1.19 1.41 0.73 52.91 52.91 53.02

128 0.61 0.51 0.20 52.91 52.91 53.02

256 0.02 0.02 0.02 52.91 52.91 53.02

MAP 16 0.5125 0.5073 0.5493 0.4918 0.4918 0.5680

32 0.5226 0.5183 0.5554 0.4918 0.4918 0.5680

64 0.5387 0.5394 0.5571 0.4918 0.4918 0.5684

96 0.5380 0.5315 0.5588 0.4918 0.4918 0.5680

128 0.5460 0.5457 0.5614 0.4918 0.4918 0.5680

256 0.5503 0.5429 0.5642 0.4918 0.4918 0.5680

From Table 2, it is clear that the proposed SDDH outperforms all other hashing methods in terms of precision and MAP. Specifically, the proposed SDDH performs best when the code length is 64, achieving a precision of 92.21%, compared with 47.08% for LSH, 19.10% for ITQ, 58.34% for KMH, 61.55% for ABQ, 80.69% for AGH, 91.18% for KSH, 91.30% for SDH and 91.33% for FSDH. When the code length is 64, the corresponding Hamming ranking MAP is 0.9308 for the proposed SDDH, compared with 0.9204 for FSDH, 0.9217 for SDH, 0.8750 for KSH, 0.3394 for AGH, 0.4914 for ABQ, 0.2652 for KMH, 0.4512 for ITQ and 0.3246 for LSH. Thus the Hamming ranking MAP of the proposed SDDH outperforms that of all other hashing methods as well. To further demonstrate the effectiveness of the proposed SDDH, we also compared the proposed SDDH with all other methods in terms of recall and F-measure. The results can be found in Fig. 2.

From Fig. 2, it is clear that the proposed SDDH achieves the best performance at each code length. Specifically, for the recall, the best recall rate of the proposed SDDH is 86.51% on 16-bit code length, compared with 24.78% for LSH, 64.26% for ITQ, 25.26% for AGH, 8.68% for KSH, 82.89% for SDH and 83.76% for FSDH. When the code length is 32, the recall of the proposed SDDH is very similar to those of SDH and FSDH, and obviously outperforms LSH, ITQ, KMH, ABQ, AGH and KSH. For the F-measure, the proposed SDDH significantly outperforms all other supervised hashing methods, and the F-measure curves are very similar to the recall curves for SDH, FSDH and SDDH. Therefore, the proposed SDDH outperforms all other hashing methods in hash lookup and Hamming ranking.


Fig. 3. Compared training and testing time with code lengths 16, 32, 64, 96, 128 and 256: panels (a)–(f) show the training and testing time for code lengths 16, 32, 64, 96, 128 and 256, respectively, for LSH, ITQ, KSH, SDH, FSDH and SDDH.


4.3. Experiments on NUS-WIDE

The NUS-WIDE database is composed of 270,000 images collected from Flickr. The 270,000 images are associated with the 1000 most frequent tags, and 81 ground-truth concept labels are provided by human annotators for all images. The content of the images is described by local SIFT feature descriptors; a 500-dimensional bag-of-words representation based on the SIFT descriptors was used in our experiments, and the 21 most frequent labels were used for testing as in [28]. For each label, 100 images were randomly selected to constitute the query set and the remaining images were used for the training set. For this large dataset, we define $x_j$ and $x_i$ as same-class samples if $x_j$ and $x_i$ share at least one class label. We also evaluated the proposed SDDH and compared it against LSH, ITQ, AGH, KSH, SDH, and FSDH in terms of precision, recall and F-measure of Hamming radius 2, and MAP. Experimental results on NUS-WIDE are shown in Table 3.

From Table 3, we can see that the proposed SDDH achieves the best performance for each code length in terms of precision, recall and F-measure of Hamming radius 2 and the MAP index.


Table 4

Results of the compared methods in precision (%) of Hamming distance 2, Recall, F-measure and

MAP on AR with code length 16, 32, 64, 96, 128 and 256.

LSH ITQ AGH KSH SDH FSDH SDDH

Precision 16 2.10 2.48 13.06 16.51 41.75 37.97 43.03

32 11.56 7.83 25.52 6.35 27.60 26.44 28.47

64 24.88 39.17 28.73 0.63 15.42 15.00 15.61

96 1.67 17.49 27.06 0.42 13.96 11.87 13.85

128 0.63 7.50 24.82 0.42 12.08 9.79 15.56

256 0 0.420 18.42 0 8.54 6.67 15.82

Recall 16 12.05 12.08 33.52 9.27 40.00 38.19 48.12

32 19.41 18.54 17.94 1.81 20.67 18.33 31.42

64 3.45 16.06 11.35 0.23 12.79 9.31 25.54

96 0.13 1.91 8.44 0.42 10.88 6.56 15.08

128 0.01 0.76 6.730 0.31 9.46 4.46 14.19

256 0 0.02 3.48 0 6.33 1.58 13.94

F-measure 16 3.58 4.12 18.8 11.87 40.86 38.08 45.43

32 14.49 11.01 21.07 2.82 23.64 21.65 29.87

64 6.06 22.78 16.27 0.34 13.98 11.49 19.38

96 0.24 3.44 12.87 0.42 12.23 8.45 14.44

128 0.02 1.38 10.59 0.36 10.61 6.13 14.84

256 / 0.04 5.85 / 7.27 2.55 14.82

MAP 16 0.0794 0.0901 0.1655 0.1394 0.472 0.4375 0.5258

32 0.1009 0.1512 0.2082 0.2881 0.5484 0.537 0.6055

64 0.1634 0.2111 0.2282 0.3571 0.6526 0.6423 0.6711

96 0.1952 0.2451 0.2283 0.385 0.6879 0.6519 0.6351

128 0.2167 0.2692 0.2278 0.4832 0.6484 0.6844 0.6954

256 0.2613 0.3055 0.221 0.4335 0.715 0.703 0.7338

Table 5

The training and testing time on AR dataset with code length 16, 32, 64, 96, 128 and 256.

Code No. Time LSH ITQ AGH KSH SDH FSDH SDDH

16 training 0.10 0.31 0.71 75.63 0.51 0.48 0.63

testing 1.4e −3 3.5e −3 3.2e −2 3.9e −2 3.1e −3 3.9e −3 3.0e −3

32 training 0.080 0.34 0.62 168.92 0.87 0.44 0.47

testing 0 3.1e −3 3.3e −2 3.2e −2 2.8e −3 3.2e −3 2.8e −3

64 training 0.080 0.51 0.69 358.57 2.210 0.44 0.50

testing 0 3.5e −3 3.5e −2 3.4e −2 2.9e −3 2.5e −3 2.6e −3

96 training 0.090 0.68 0.68 537.64 9.9 0.52 0.53

testing 1.0e −4 9.6e −3 3.3e −2 3.2e −2 3.0e −3 4.1e −2 3.6e −3

128 training 0.090 1.260 0.76 730.9 21.52 0.58 0.66

testing 1.0e −4 1.4e −2 4.0e −2 3.4e −2 7.0e −3 5.9e −3 6.5e −3

256 training 0.12 3.540 0.73 1235.69 78.10 0.72 0.88

testing 1.0e −4 2.0e −2 3.7e −2 3.0e −2 8.3e −3 7.8e −3 7.9e −3


Specifically, for the precision, the proposed SDDH achieves 36.19% when the code length is 64; among the other discrete hashing learning methods, SDH achieves 36.05% on 32-bit code length and FSDH achieves 36.13% on 16-bit code length, while among the non-discrete hashing methods, LSH achieves 14.08% on 32-bit code length, ITQ achieves 14.55% on 32-bit code length and KSH achieves 28.52% on 64-bit code length. For the recall of Hamming radius 2, the proposed SDDH achieves 99.78%, compared with 82.23% for LSH, 82.49% for ITQ, 14.24% for KSH, 99.53% for SDH and 99.52% for FSDH. For the F-measure, the proposed SDDH is superior to all non-discrete hashing learning methods and slightly outperforms the discrete hashing learning methods SDH and FSDH. For the Hamming ranking index, the MAP of the proposed SDDH is slightly superior to that of all other hashing learning methods. To better illustrate the effectiveness of the proposed SDDH, we further compared the training time and the testing time of these hashing learning methods; the details can be found in Fig. 3.

From Fig. 3, we can see that the training time of the proposed SDDH is significantly shorter than that of KSH and similar to those of LSH, ITQ and FSDH. As the code length increases, the training time of KSH and SDH grows quickly while that of the other hashing methods changes only slightly. However, as the code length increases, the testing time of SDH, FSDH and the proposed SDDH also increases quickly. Thus the timeliness of the proposed SDDH is poorer than that of the other hashing methods to some extent.

.4. Experiments on AR dataset

To expend the variety of data, we applied the proposed SDDH

n AR face data, and compares the proposed SDDH with other

ashing methods. The AR database consists of over 40 0 0 frontal

mages for 126 individuals. For each individual, 26 pictures were

aken in two separate sessions [49] . These images include more

acial variations, including illumination change, expressions, and

acial disguises. In our experiment, every class data set was ran-

omly split into a training set and test set at a ratio of 5:2. For

SH, SDH and FSDH, we randomly selected 500 samples as an-

hor points. The performance of the proposed SDDH was evalu-

ted in precision, recall, and F-measure of Hamming radius 2 and

AP, and compared against that of LSH, ITQ, AGH, KSH, SDH, and

SDH in as well. Experimental results on NUS-WIDE are shown in

able 4 .

From Table 4 , it is found that the proposed SDDH outperforms

all other hashing methods. When the code length is 16, the proposed SDDH achieves the best performance. Specifically, the precision rate is 43.03%, compared with 41.75% for SDH and 37.97% for FSDH, and significantly higher than those of LSH, ITQ, AGH and KSH. When the code length is 16, the recall rate of the proposed SDDH is 48.12%, compared with 33.52% for AGH, 40.00% for SDH and 38.19% for FSDH, and obviously outperforms LSH, ITQ and KSH. The F-measure


of the proposed SDDH is 45.43% and the MAP is 0.7338, which

outperform all other hashing methods. Furthermore, we compared

the training and testing time of the proposed SDDH with LSH, ITQ,

AGH, KSH, SDH and FSDH. From Table 5 , we know that the train-

ing time of the proposed SDDH is similar to FSDH, and the testing

time is similar to those of SDH and FSDH. The training time of KSH is much higher than that of the proposed SDDH and the other methods.

4.5. Discussion

We evaluate the effectiveness of the proposed SDDH on CIFAR-

10, MNIST, NUS-WIDE and AR face datasets. From the results

of the experiments, it is clear that the performance of SDDH out-

performs the data-independent hashing method LSH, the unsupervised hashing methods ITQ, KMH, ABQ and AGH, and the super-

vised hashing methods KSH, SDH and FSDH. For CIFAR-10 dataset,

the proposed SDDH achieves better performance in hash lookup

index (precision, recall and F-measure), while the performance of

the hamming ranking MAP index is similar to SDH and FSDH, but

obviously higher than LSH, ITQ, KMH, ABQ, AGH and KSH. For the

MNIST dataset, the proposed SDDH slightly outperforms KSH, SDH

and FSDH in the precision and MAP indices, while being obviously supe-

rior to LSH, ITQ, KMH, ABQ and AGH. For NUS-WIDE and AR face

datasets, the proposed SDDH outperforms all other methods as

well, though the advantage in efficiency is less pronounced. In SDH and FSDH, the learning frameworks use an l2 loss function for the classification model; this loss mainly measures the discrepancy between the learned hash bits and the class-label vectors, and may be more suitable for multi-class label databases. Compared with SDH and FSDH, the proposed SDDH enhances the discriminative power of the training data by simultaneously maximizing the similarity of same-class samples and minimizing the similarity of different-class samples, which is effective for both single-label and multi-label databases.

In contrast to the state-of-the-art hashing methods, the proposed

SDDH can incorporate the label information into the hash code

learning, update the directly learned hash codes in each iteration

and achieve the optimal hashing function from the directly learned

discrete hash codes.
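To make the notion of "discriminant" codes measurable, the short sketch below compares the average inner-product similarity of same-class code pairs against that of different-class code pairs; a larger gap corresponds to the behaviour discussed above. This is a diagnostic of learned codes only, with an assumed ±1 code convention, not the SDDH training objective itself.

```python
import numpy as np

def code_similarity_gap(codes, labels):
    """Average same-class minus different-class similarity of binary codes.

    codes:  (n, bits) array with entries in {-1, +1}; labels: (n,) integer class labels.
    Returns a scalar in [-2, 2]; larger values indicate more discriminant codes.
    """
    sim = codes @ codes.T / codes.shape[1]        # normalized inner products in [-1, 1]
    same = labels[:, None] == labels[None, :]
    np.fill_diagonal(same, False)                 # ignore self-similarity
    diff = ~same
    np.fill_diagonal(diff, False)
    return float(sim[same].mean() - sim[diff].mean())
```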

5. Conclusions

In this paper, we propose a supervised discrete discriminant hashing (SDDH) method. To utilize the label information of the training

data, a robust learned distance metric is used to make the binary

bits of the same class samples more similar, and the binary bits

of the different class samples more dissimilar, such that the dis-

criminant information of the training data can be incorporated into

the learning framework. Meanwhile, to let the learned hash functions achieve optimal approximate discrete hash codes, a hash-function learning regularization term is embedded in the proposed supervised discrete discriminant hashing framework, which makes the hash functions optimized on the basis of the directly learned discrete hash codes. Such learned hashing functions are also optimal for the

testing data. Therefore, the proposed supervised discrete discrim-

inant hashing incorporates the discrete binary bits learning and

hashing function learning into an integrated framework, which makes the SDDH framework more practical in real-world applications. Furthermore, the experimental results demonstrate the efficacy of the proposed SDDH for large-scale image retrieval.
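As an illustration of fitting hash functions to directly learned discrete codes, the sketch below uses a ridge-regression projection followed by a sign function, a common choice in discrete hashing methods. It is a minimal sketch under that assumption, with illustrative names, and not necessarily the exact solver used by SDDH.

```python
import numpy as np

def fit_linear_hash_functions(X, B, lam=1.0):
    """Fit linear hash functions to already-learned discrete codes.

    X: (n, d) training features; B: (n, bits) discrete codes in {-1, +1}.
    Returns W of shape (d, bits) such that sign(X @ W) approximates B,
    obtained by ridge regression (illustrative only).
    """
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ B)

def encode(X_new, W):
    """Hash unseen samples with the fitted functions, mapping the sign output to {-1, +1}."""
    return np.where(X_new @ W >= 0, 1, -1)
```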

Acknowledgments

The authors would like to thank the editor and the anony-

mous reviewers for their critical and constructive comments and

suggestions. This work is supported by the National Science

Foundation of China (Grant nos. 61601235, 61573248, 61773328, 61732011), the Natural Science Foundation of Jiangsu Province of China (Grant nos. BK20170768, BK20160972), the Natural Science Foundation of the Jiangsu Higher Education Institutions of China (Grant nos. 17KJB520019, 16KJB520031), the Startup Foundation for Introducing Talent of Nanjing University of Information Science and Technology (Grant no. 2243141601019), the Shenzhen Municipal Science and Technology Innovation Council (Grant no. JCYJ20170302153434048), the Natural Science Foundation of Guangdong Province (Grant no. 2017A030313367) and a Research Grant of the Hong Kong Polytechnic University (Project code: 4-ZZDR).

References

[1] A. Torralba, R. Fergus, W.T. Freeman, 80 million tiny images: a large data set for nonparametric object and scene recognition, IEEE Trans. Pattern Anal. Mach. Intell. 30 (11) (2008) 1958–1970.
[2] B. Kulis, K. Grauman, Kernelized locality-sensitive hashing for scalable image search, in: Proceedings of the 2009 IEEE 12th International Conference on Computer Vision, IEEE, 2009, pp. 2130–2137.
[3] F. Shen, Y. Yang, L. Liu, W. Liu, D. Tao, et al., Asymmetric binary coding for image search, IEEE Trans. Multim. 19 (2) (2017) 2020–2032.
[4] Z. Li, X. Liu, J. Wu, H. Su, Adaptive binary quantization for fast nearest neighbor search, Proc. Eur. Conf. Artif. Intell. (2016) 64–72.
[5] D. Greene, M. Parnas, F. Yao, Multi-index hashing for information retrieval, in: Proceedings of the 35th Annual Symposium on Foundations of Computer Science, IEEE, 1994, pp. 722–731.
[6] X. Liu, J. He, C. Deng, B. Lang, Collaborative hashing, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 2139–2146.
[7] X. Liu, X. Fan, C. Deng, Z. Li, H. Su, D. Tao, Multilinear hyperplane hashing, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 5119–5127.
[8] J. Cheng, C. Leng, J. Wu, H. Cui, H. Lu, Fast and accurate image matching with cascade hashing for 3D reconstruction, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 1–8.
[9] K. He, F. Wen, J. Sun, K-means hashing: an affinity-preserving quantization method for learning binary compact codes, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2013, pp. 2938–2945.
[10] F. Shen, X. Zhou, Y. Yang, J. Song, H.T. Shen, D. Tao, A fast optimization method for general binary code learning, IEEE Trans. Image Process. 25 (2016) 5610–5621.
[11] F. Shen, C. Shen, et al., Hashing on nonlinear manifolds, IEEE Trans. Image Process. 24 (6) (2015) 1839–1851.
[12] F. Godin, V. Slavkovikj, W. De Neve, Using topic models for Twitter hashtag recommendation, Proc. ACM (2013) 593–596.
[13] M. Datar, N. Immorlica, P. Indyk, et al., Locality-sensitive hashing scheme based on p-stable distributions, in: Proceedings of the Twentieth Annual Symposium on Computational Geometry, ACM, 2004, pp. 253–262.
[14] M. Charikar, Similarity estimation techniques from rounding algorithms, in: Proceedings of the Thirty-Fourth Annual ACM Symposium on Theory of Computing, ACM, 2002, pp. 380–388.
[15] B. Kulis, P. Jain, K. Grauman, Fast similarity search for learned metrics, IEEE Trans. Pattern Anal. Mach. Intell. 31 (12) (2009) 2143–2157.
[16] B. Kulis, K. Grauman, Kernelized locality-sensitive hashing for scalable image search, in: Proceedings of the 2009 IEEE 12th International Conference on Computer Vision, 2009, pp. 2130–2137.
[17] M. Raginsky, S. Lazebnik, Locality-sensitive binary codes from shift-invariant kernels, Adv. Neural Inf. Process. Syst. (2009) 1509–1517.
[18] Y. Weiss, A. Torralba, R. Fergus, Spectral hashing, Adv. Neural Inf. Process. Syst. (2009) 1753–1760.
[19] Y. Weiss, R. Fergus, A. Torralba, Multidimensional spectral hashing, in: Proceedings of the European Conference on Computer Vision, 2012, pp. 340–353.
[20] Y. Zhen, Y. Gao, D.Y. Yeung, Spectral multimodal hashing and its application to multimedia retrieval, IEEE Trans. Cybern. 46 (1) (2016) 27–38.
[21] Y. Gong, S. Lazebnik, A. Gordo, F. Perronnin, Iterative quantization: a procrustean approach to learning binary codes for large-scale image retrieval, IEEE Trans. Pattern Anal. Mach. Intell. 35 (12) (2013) 2916–2929.
[22] F. Shen, C. Shen, Q. Shi, et al., Inductive hashing on manifolds, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2013, pp. 1562–1569.
[23] W. Liu, J. Wang, S. Kumar, et al., Hashing with graphs, in: Proceedings of the 28th International Conference on Machine Learning (ICML-11), 2011, pp. 1–8.
[24] W. Kong, W.J. Li, Isotropic hashing, Adv. Neural Inf. Process. Syst. (2012) 1646–1654.
[25] D. Zhang, J. Wang, D. Cai, et al., Self-taught hashing for fast similarity search, in: Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM, 2010, pp. 18–25.
[26] W. Liu, C. Mu, S. Kumar, et al., Discrete graph hashing, Adv. Neural Inf. Process. Syst. (2014) 3419–3427.
[27] Z. Chen, J. Zhou, Collaborative multiview hashing, Pattern Recognit. (2017).
[28] J. Wang, S. Kumar, S.F. Chang, Semi-supervised hashing for scalable image retrieval, in: Proceedings of the 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, 2010, pp. 3424–3431.
[29] J. Wang, S. Kumar, S.F. Chang, Semi-supervised hashing for large-scale search, IEEE Trans. Pattern Anal. Mach. Intell. 34 (12) (2012) 2393–2406.
[30] S. Kim, S. Choi, Semi-supervised discriminant hashing, in: Proceedings of the 2011 IEEE 11th International Conference on Data Mining (ICDM), IEEE, 2011, pp. 1122–1127.
[31] B. Kulis, T. Darrell, Learning to hash with binary reconstructive embeddings, Adv. Neural Inf. Process. Syst. (2009) 1042–1050.
[32] T. Song, J. Cai, T. Zhang, Semi-supervised manifold-embedded hashing with joint feature representation and classifier learning, Pattern Recognit. 68 (2017) 99–110.
[33] W. Liu, J. Wang, R. Ji, et al., Supervised hashing with kernels, in: Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, 2012, pp. 2074–2081.
[34] F. Shen, C. Shen, W. Liu, et al., Supervised discrete hashing, in: Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 37–45.
[35] M. Norouzi, D.M. Blei, Minimal loss hashing for compact binary codes, in: Proceedings of the 28th International Conference on Machine Learning (ICML-11), 2011, pp. 353–360.
[36] R. Xia, Y. Pan, H. Lai, et al., Supervised hashing for image retrieval via image representation learning, in: Proceedings of the 2014 Association for the Advancement of Artificial Intelligence, 1, 2014, p. 2.
[37] J. Tang, Z. Li, X. Zhu, Supervised deep hashing for scalable face image retrieval, Pattern Recognit. (2017).
[38] E. Yang, C. Deng, W. Liu, X. Liu, D. Tao, X. Gao, Pairwise relationship guided deep hashing for cross-modal retrieval, in: Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI-17), 2017, pp. 1618–1625.
[39] L. Liu, F. Shen, Y. Shen, X. Liu, L. Shao, Deep sketch hashing: fast free-hand sketch-based image retrieval, 2017, arXiv:1703.05605.
[40] J. Song, L. Gao, L. Liu, Quantization-based hashing: a general framework for scalable image and video retrieval, Pattern Recognit. (2017).
[41] Y. Luo, Y. Yang, F. Shen, et al., Robust discrete code modeling for supervised hashing, Pattern Recognit. (2017).
[42] G. Lin, C. Shen, A. van den Hengel, Supervised hashing using graph cuts and boosted decision trees, IEEE Trans. Pattern Anal. Mach. Intell. 37 (11) (2015) 2317–2331.
[43] G. Lin, C. Shen, Q. Shi, et al., Fast supervised hashing with decision trees for high-dimensional data, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 1963–1970.
[44] Y. Mu, J. Shen, S. Yan, Weakly-supervised hashing in kernel space, in: Proceedings of the 2010 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, 2010, pp. 3344–3351.
[45] L. Liu, Z. Lin, L. Shao, et al., Sequential discrete hashing for scalable cross-modality similarity retrieval, IEEE Trans. Image Process. 26 (1) (2017) 107–118.
[46] H. Zhang, F. Shen, W. Liu, et al., Discrete collaborative filtering, in: Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM, 2016, pp. 325–334.
[47] J. Gui, T. Liu, Z. Sun, et al., Fast supervised discrete hashing, IEEE Trans. Pattern Anal. Mach. Intell. (2017).
[48] C. Zhang, W.S. Zheng, Semi-supervised multi-view discrete hashing for fast image search, IEEE Trans. Image Process. 26 (6) (2017) 2604–2617.
[49] A. Martinez, R. Benavente, The AR Face Database, Tech. Report #24, CVC, 1998.


Yan Cui received her B.S. and M.S. degrees from Liaocheng University, Liaocheng, China, in 2008 and 2011, respectively. She received her Ph.D. degree from the Nanjing University of Science and Technology, on the subject of pattern recognition and intelligence systems, in 2015. She visited the Department of Electrical and Computer Engineering, University of Miami, USA, from May 2013 to November 2013. She is currently working as a research assistant at the Institute of Textiles and Clothing, Hong Kong Polytechnic University. She is a lecturer in the College of Mathematics and Information Science, Nanjing Normal University of Special Education. Her current research interests include pattern recognition, machine learning and image retrieval.

Jielin Jiang received the B.S. degree from Huainan Normal University, Huainan, China, in 2007. He received the M.S. degree from Inner Mongolia University of Technology in 2010 and the Ph.D. degree from the Nanjing University of Science and Technology, on the subject of pattern recognition and intelligence systems, in 2015. From February 2013 to August 2013, he was an Exchange Student with the Department of Computing, the Hong Kong Polytechnic University, Hong Kong. Now, he is a lecturer in the School of Computer and Software, Jiangsu Engineering Center of Network Monitoring, Nanjing University of Information Science and Technology. His current research interests include image denoising and image classification.

Zhihui Lai received the B.S. degree in mathematics from South China Normal University, the M.S. degree from Jinan University, and the Ph.D. degree in pattern recognition and intelligence system from Nanjing University of Science and Technology (NUST), China, in 2002, 2007 and 2011, respectively. He has been a research associate and a Postdoctoral Fellow from 2010 to 2013 at The Hong Kong Polytechnic University. Currently, he is a Postdoctoral Fellow at the Bio-Computing Research Center, Shenzhen Graduate School, Harbin Institute of Technology (HIT). His research interests include face recognition, image processing and content-based image retrieval, pattern recognition, compressive sense, human vision modelization and applications in the fields of intelligent robot research.

Zuojin Hu received the Ph.D. degree in pattern recognition and intelligence system from Nanjing University of Science and Technology (NUST), Nanjing, China. He is a professor in the College of Mathematics and Information Science, Nanjing Normal University of Special Education, Nanjing 210038, China. His current interests are in the areas of pattern recognition, computer vision, face recognition, facial expression analysis and content-based image retrieval.

Waikeung Wong received his Ph.D. degree from The Hong Kong Polytechnic University. Currently, he is an associate professor in this university. He has published more than fifty scientific articles in refereed journals, including IEEE Transactions on Neural Networks and Learning Systems, Pattern Recognition, International Journal of Production Economics, European Journal of Operational Research, International Journal of Production Research, Computers in Industry, IEEE Transactions on Systems, Man, and Cybernetics, among others. His recent research interests include artificial intelligence, pattern recognition, feature extraction and optimization of manufacturing scheduling, planning and control.