Neurocomputing 275 (2018) 2702–2715
Contents lists available at ScienceDirect
Neurocomputing
journal homepage: www.elsevier.com/locate/neucom
Contour detection via stacking random forest learning
Chao Zhang a, Junchi Yan b,c,∗, Changsheng Li d, Rongfang Bie a
a College of Information Science and Technology, Beijing Normal University, China
b Shanghai Key Laboratory of Trustworthy Computing, East China Normal University, China
c IBM Research, China
d School of Computer Science and Engineering, University of Electronic Science and Technology of China, China
Article info
Article history:
Received 5 April 2017
Revised 5 November 2017
Accepted 21 November 2017
Available online 28 November 2017
Communicated by Xiang Bai
Keywords:
Contour detection
Image processing
Feature mapping
Abstract
Contour detection is an important and fundamental problem in computer vision which finds numerous
applications. Although significant progress has been made in the past decades, contour detection from
natural images remains a challenging task due to the difficulty of clearly distinguishing between edges
of objects and surrounding backgrounds. To address this problem, we first capture multi-scale features
from pixel-level to segment-level using local and global information. These features are mapped to a
space where discriminative information is captured by computing posterior divergence of Gaussian mix-
ture models and sufficient statistics based on deep Boltzmann machine. Then we introduce a stacking
random forest learning framework for contour detection. We evaluate the proposed algorithm against
leading methods in the literature on the Berkeley segmentation and Weizmann horse data sets. Exper-
imental results demonstrate that the proposed contour detection algorithm performs favorably against
state-of-the-art methods in terms of speed and accuracy.
© 2017 Elsevier B.V. All rights reserved.
1. Introduction
Object contour is of prime importance as it contains essential
visual information such as shape and identity that finds numerous
applications. Contour detection is a fundamental problem in com-
puter vision which is closely related to other tasks, e.g., segmenta-
tion, shape discrimination, and object recognition [1–4] .
In this work, we propose a learning algorithm for contour de-
tection based on stacking random forest learning by using multi-
level visual cues. We extract pixel-level features that integrate both
local and global visual information. In addition, segment-level fea-
tures are extracted to exploit structural information of contours.
All the features are mapped to a space for selecting discriminative
ones via a novel algorithm based on posterior divergence of Gaus-
sian mixture models and sufficient statistics feature mapping based
on deep Boltzmann machine. A stacking random forest learning
classifier is trained based on these features for contour detection.
We evaluate the proposed algorithm against state-of-the-art methods
on several databases including the Berkeley segmentation database [5] ,
the Weizmann horse database (WHD) [6] and the Weizmann segmen-
tation database (WSD) [7] . Experimental results bear out that feature selection from multi-scale visual cues via posterior divergence with a random forest classifier facilitates effective contour detection in natural images.

∗ Corresponding author at: Shanghai Key Laboratory of Trustworthy Computing, East China Normal University, North Zhongshan Road Campus: 3663 N. Zhongshan Rd., Shanghai 200062, China.
E-mail address: [email protected] (J. Yan).
https://doi.org/10.1016/j.neucom.2017.11.046
0925-2312/© 2017 Elsevier B.V. All rights reserved.
In a nutshell, the main contributions of this paper are: 1
• This paper presents the so-called sufficient image features in-
cluding multi-scale pixel level and segment level for contour
detection.
• This paper describes posterior divergence feature mapping us-
ing Gaussian mixture model and sufficient statistics feature
mapping based on deep Boltzmann machine.
• This paper introduces stacking random forest learning frame-
work for contour detection. Moreover we perform comprehen-
sive empirical studies on the performance of different feature
extractors, feature mapping methods and classifiers and the
stacking of their different combinations.
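As a rough illustration of the stacking idea behind the third contribution, the sketch below trains a second random forest on the original features augmented with the first forest's contour probability, using scikit-learn on synthetic data. The feature dimension, tree counts, toy labels, and two-stage depth are illustrative assumptions, not the configuration used in this paper.

```python
# Illustrative two-stage stacking of random forests on synthetic data:
# the second forest is trained on the original features augmented with
# the first forest's contour probability. All sizes here are toy choices.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 19))                  # e.g., 19 pixel-level features
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)   # toy contour labels

rf1 = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
p1 = rf1.predict_proba(X)[:, 1:2]               # stage-1 contour probability

X2 = np.hstack([X, p1])                         # stack probability onto features
rf2 = RandomForestClassifier(n_estimators=50, random_state=0).fit(X2, y)
p2 = rf2.predict_proba(X2)[:, 1]                # refined contour probability
```

In practice the second stage sees both the raw evidence and the first stage's belief, which is the essence of the stacked design evaluated later in the paper.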
The paper is organized as follows. Related work is introduced in Section 2 . Section 3 presents the image feature extraction method used in our model. Section 4 describes the proposed feature mapping methods. Section 5 introduces the stacking random forest

1 The preliminary version of this paper appeared in the conference papers [8–10] . We make several extensions including: (i) an updated introduction and related work review on recent developments in contour detection; (ii) a new stacking forest learning framework for contour detection; (iii) updated and more comprehensive experimental results reported with various ablation tests.
Fig. 1. Framework of the stacking random forest learning (SRFL) for contour detection.
learning for contour detection. Fig. 1 shows the framework of the stacking random forest learning for contour detection. Experiments on natural images are depicted in Section 6 and Section 7 concludes this paper.

2. Related work

There exists a large body of work on contour extraction based on different techniques including edge detection, perceptual grouping and saliency maps [1] . The classical Roberts and Sobel operators [1] identify edges by convolving a gray-scale image with local derivative filters, whereas another classical work, the Canny edge detector [11] , computes the image gradient and applies non-maxima suppression and hysteresis thresholding on the gradient magnitude. The compass operator [12] uses distributions of color pixels to determine the orientation of a diameter that maximizes the difference between two halves of a circular window. More recently, an algorithm for estimating the probability of a pixel being on a contour boundary ( Pb ) [13] was proposed which uses a combination of features based on brightness, color, and texture. A tree-structured boosted edge learning (BEL) method [14] selects and combines a large number of features across different scales extracted from image patches for edge and object boundary detection. The gPb detector [5,15] combines multiple local cues in a probabilistic framework based on spectral clustering with two main components: the mPb detector based on local image analysis at multiple scales, and the sPb detector based on normalized cut segmentation results. Papari and Petkov [16] use steerable filters to construct a model with an inhibition term to remove spurious edges in textured regions. Yao et al. [17] learn a cascade classifier that iteratively incorporates both appearance and structure information for class-specific object contour detection. Dollár and Zitnick [18,19] use structured forests to detect edges. Convolutional neural networks (CNN) have been widely used to extract visual features from images in recent years, and much promising progress has been made in solving typical computer vision problems [20,21] . Bertasius et al. [22] use a multi-scale deep network that consists of five convolutional layers and a bifurcated

Fig. 2. Image features: (a) input, (b) magnitude of gradient, (c) direction of gradient, (d) inhibition term, (e) brightness gradient, (f) color gradient, (g) compass operator.
fully-connected sub-network to detect contour. Shen et al. [23] in-
troduce a deep convolutional feature learned by positive-sharing
loss for contour detection. Xie and Tu [24] present a deep learn-
ing model that leverages fully convolutional neural networks and
deeply-supervised nets for detecting edges. RCF-MS [25] uses richer
convolutional features to detect contours and achieves state-of-the-
art performance. However, these deep neural network [26] based
methods in general have to learn their parameters on a relatively
large dataset and with enough training epochs to obtain represen-
tative filters in convolution layers.
3. Image features for contour detection
We first describe the image features for learning a contour de-
tector in the proposed algorithm shown in Fig. 1 . These features
have been used for representing edge information of gray scale
and color images. In this work, all the features are integrated for a
more effective image representation.
3.1. Pixel-level features
Pixel-level features provide raw and basic visual cues for de-
tection of object contours. To capture effective visual information
on the pixel level, we extract local and global features at multiple
scales. Local visual cues extracted from edges are first exploited in order to account for object contours at different scales. Global information extracted from visual saliency is then incorporated to provide cues of salient objects [27] in the scenes. These features are integrated to form effective pixel-level features to represent object contours.

Fig. 3. Recalls of object contours using the BSDS500 data set: (a) recall of each training image from the BSDS500 data set, (b) mean recall values with respect to different numbers of superpixels.

Fig. 4. With 200 superpixels: (a) input, (b) superpixel, (c) edges in source image, (d) all edges, (e) segment, (f) ground truth.

3.1.1. Multi-scale point features

Basic point (pixel) features have been widely used for representing edge information of grayscale and color images, such as image gradients, texture inhibition, brightness and color gradients, as well as compass operators. As these features capture different visual information, we extract and combine them for contour detection.

Image gradient. Each image I is convolved with a Gaussian kernel of width σ to compute its gradient ∇I. The magnitude |∇I| reveals the strength of an edge at each pixel and the direction or angle θ∇ contains the intensity discontinuity information. Both gradient magnitude and direction are used as features for learning a contour detector (see Fig. 2 (b) and (c)).

Texture inhibition. In order to remove small edges in highly textured regions, we use the inhibition term that suppresses the response in texture regions based on steerable filters [16] . It is computed as the convolution of
Fig. 5. A three-layer deep Boltzmann machine [40] .
Table 1
F-measures of contour detection on BSDS500 for different parts.

                               BSDS500
                               ODS   OIS   AP
MG+DG+IT+CG+BG+CO+Multi        0.70  0.72  0.75
MG+DG+IT+CG+BG+CO a            0.69  0.71  0.72
MG+DG+IT+CG+BG                 0.68  0.70  0.71
MG+DG+IT+CG                    0.67  0.69  0.70
MG+DG+IT                       0.65  0.67  0.68

a MG+DG+IT+CG+BG+CO is in fact the method presented in the conference version of this paper [8] , where it is abbreviated as CDRF for short.
the Gaussian gradient magnitude with the inhibition term:

t(x, y) = \{V_0 * |\nabla I|\}(x, y) + \mathrm{re}\{ e^{2i\theta_\nabla(x, y)} [(V_2 * |\nabla I|)(x, y)] \},   (1)

where re(·) returns the real part and the steering basis functions are defined by

V_0(\rho, \phi) = \frac{\rho^2}{2}, \qquad V_2(\rho, \phi) = \frac{\rho^2}{2} e^{2i\phi},   (2)

controlled by two parameters ρ and φ. It can be shown that the difference of the magnitude of the Gaussian gradient and the inhibition term is a difference of Gaussians (DoG) function [16] , and they are used as features for learning a contour detector (see Fig. 2 (d)).

Brightness and color gradient. The Pb and gPb methods [5,13] exploit brightness, color, texture and segmented regions to detect contours, and achieve state-of-the-art performance. However, the image segmentation process is time consuming. For efficiency we exploit the brightness and color gradient features. Similar to Martin et al. [13] , we use a circular disc of radius r at pixel (x, y), split it into two half discs by a diameter at angle θ, and represent them with histograms of brightness and color in the CIELAB space. We compute the χ²-distance between the two histograms of the half discs to obtain the oriented gradient G(x, y, θ, r), thereby encoding both the brightness and color gradient features (see Fig. 2 (e) and (f)).

Compass operator. The compass operator [12] detects edges without assuming that the regions on both sides have constant color, by exploiting the pixel distribution rather than the means. It determines the orientation of a diameter which maximizes the difference between the two half discs of a circular compass at each pixel (x, y). The distance between two colors is computed by

d_{ij} = 1 - \exp(-E_{ij}/\gamma),   (3)

where E_{ij} is the Euclidean distance between color i and color j, and γ is a constant. A distribution of color on either side of an edge is represented by a color signature, which is a set of point masses in the CIELAB color space (i.e., color pixels). The distance between the color signatures of equal mass of half discs S_1 and S_2 is computed by aggregating the earth mover's distance (EMD) [28] between the color signatures, which minimizes

\sum_{i \in S_1} \sum_{j \in S_2} d_{ij} f_{ij},   (4)

where f_{ij} indicates the flow between colors i and j subject to the constraints that move all the mass from S_1 to S_2 [28] . The resulting EMD can be represented as a function f(θ) (0° ≤ θ ≤ 180°), and we find the orientation of the diameter that maximizes the difference between the two half discs, i.e., \hat{\theta} = \arg\max_{\theta} f(\theta) (see Fig. 2 (g)).
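The half-disc oriented gradient described above (a χ²-distance between histograms of the two halves of a circular window) can be sketched as follows. The bin count, radius, and the use of a single brightness channel are simplifying assumptions for illustration.

```python
# Sketch of the oriented gradient G(x, y, theta, r): chi-square distance
# between brightness histograms of the two half discs of a circular
# window. The split normal is (cos theta, sin theta).
import numpy as np

def oriented_gradient(img, x, y, theta, r, bins=8):
    ys, xs = np.mgrid[-r:r + 1, -r:r + 1]
    disc = xs**2 + ys**2 <= r**2
    side = (xs * np.cos(theta) + ys * np.sin(theta)) > 0  # half-disc split
    patch = img[y - r:y + r + 1, x - r:x + r + 1]
    h1, _ = np.histogram(patch[disc & side], bins=bins, range=(0, 1))
    h2, _ = np.histogram(patch[disc & ~side], bins=bins, range=(0, 1))
    h1 = h1 / max(h1.sum(), 1)
    h2 = h2 / max(h2.sum(), 1)
    return 0.5 * np.sum((h1 - h2)**2 / (h1 + h2 + 1e-12))  # chi-square

img = np.zeros((21, 21)); img[:, 11:] = 1.0   # vertical step edge
g_on = oriented_gradient(img, 10, 10, 0.0, 5)  # diameter along the edge
```

Orientations whose diameter is aligned with the edge yield a large χ²-distance (the two halves are homogeneous and different), while other orientations yield values near zero.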
Multi-scale representation. We extract the above-mentioned features at every point of an image and integrate them to detect contours. In order to deal with the scale-space problem [29] , we obtain local features at different scales by changing the standard deviation of the image gradient and texture inhibition, the direction of the brightness and color gradient, and the standard deviation of the Gaussian derivative in the compass operator. These features provide rich descriptions of image details at different levels, thereby rendering a multi-scale representation. As edges can be extracted at different scales, several pixels on one edge have equally strong responses and they should all be considered to describe contours. Thus, we extract local features at three different scales.

These features have been found empirically very useful for representing edge information. Image features such as the image gradient, texture inhibition, and brightness and color gradients are also used by most state-of-the-art contour detectors such as Pb [13] and gPb [5,15] .
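As a concrete illustration of the multi-scale local cues, the sketch below computes Gaussian-smoothed gradient magnitude and direction at three scales with SciPy; the σ values are illustrative assumptions, not the scales used in the paper.

```python
# Sketch of multi-scale gradient features: Gaussian-derivative gradient
# magnitude |grad I| and direction theta at several smoothing scales.
import numpy as np
from scipy import ndimage

def multiscale_gradient_features(img, sigmas=(1.0, 2.0, 4.0)):
    feats = []
    for s in sigmas:
        gx = ndimage.gaussian_filter(img, s, order=(0, 1))  # d/dx
        gy = ndimage.gaussian_filter(img, s, order=(1, 0))  # d/dy
        feats.append(np.hypot(gx, gy))    # gradient magnitude
        feats.append(np.arctan2(gy, gx))  # gradient direction
    return np.stack(feats, axis=-1)       # H x W x (2 * number of scales)

img = np.zeros((32, 32)); img[:, 16:] = 1.0  # vertical step edge
F = multiscale_gradient_features(img)
```

Each pixel then carries a small vector of responses across scales, which is the form consumed by the later feature mapping and classifier stages.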
3.1.2. Multi-scale global features

It has been shown that object contours can be better extracted by incorporating global information (e.g., the gPb method [5,15] ) than by simply using local visual cues (e.g., the Pb algorithm [13] ). However, existing methods that exploit global information (e.g., gPb) are often time consuming. For efficiency and effectiveness, we incorporate global visual saliency [30] in our approach. Cheng et al. [30] present a simple and efficient saliency extraction algorithm based on region contrast which exploits histogram contrast and spatial information. Each image is first segmented into regions and the saliency value of each region is computed by measuring its color contrast to all other regions in the image: S(r_k) = \sum_{r_k \neq r_i} w(r_i) D_r(r_k, r_i), where w(r_i) is the weight of region r_i and D_r(·, ·) is the color distance between the two regions. The weighting term increases the effects of closer regions and decreases those of farther regions. With this method, the distinctness of each pixel is described in a saliency map S.

Given a pixel I(x, y), we consider the local contrast of the saliency values with respect to its four neighbors. We take the maximum value of the difference between the saliency values of its neighbors. With this saliency contrast (SC) feature, the difference of saliency values is maximized when the pixel is right on the contour, thereby facilitating boundary detection.
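The saliency contrast (SC) feature above can be sketched as follows: for each pixel, the maximum absolute difference between its saliency value and those of its four neighbors. Handling the image border by edge padding is an assumption of this sketch.

```python
# Sketch of the saliency contrast (SC) feature: max absolute difference
# of the saliency map S against the four-connected neighbors.
import numpy as np

def saliency_contrast(S):
    Sp = np.pad(S, 1, mode='edge')
    diffs = [np.abs(S - Sp[1:-1, :-2]),   # left neighbor
             np.abs(S - Sp[1:-1, 2:]),    # right neighbor
             np.abs(S - Sp[:-2, 1:-1]),   # up neighbor
             np.abs(S - Sp[2:, 1:-1])]    # down neighbor
    return np.max(diffs, axis=0)

S = np.zeros((8, 8)); S[:, 4:] = 1.0      # salient right half
SC = saliency_contrast(S)
```

On this toy map the SC feature peaks exactly on the two columns that border the saliency discontinuity and is zero elsewhere, which is the behavior the text relies on for boundary localization.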
3.2. Segment-level features

While the pixel-level features described in Section 3.1 can be utilized to determine contour points, structural cues [31] such as
Fig. 6. Precision-recall curves on BSDS500 for different feature combinations.
segments [32] contain important information more than pixel-
wise evidence. Toward this goal, we compute superpixels to extract
structural segments with the SLIC algorithm [33] , which has been
verified to perform well in terms of efficiency and effectiveness.
SLIC clusters pixels in the five-dimensional space including three color
channels in the CIELAB color space ( lab ) and two position values ( xy )
by introducing a new distance measure in the 5D space.
After getting the segmentation result from SLIC, we use the edges
from superpixels as our segment-level features. Point features de-
scribed in Section 3.1 are used to describe edge pixels on the line
fragment, and segment-level features are then extracted by com-
puting their mean value, variance and differences in this work.
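Since SLIC is essentially k-means clustering in the 5-D (lab, xy) space, the segment-level statistics can be sketched as below with a plain k-means stand-in. This omits SLIC's grid initialization and spatially limited search, uses RGB instead of CIELAB, and summarizes a toy per-pixel feature, all of which are simplifying assumptions.

```python
# Sketch: cluster pixels on 5-D (color, x, y) vectors with plain k-means,
# then compute per-segment mean/variance/min/max of a pixel feature.
import numpy as np

rng = np.random.default_rng(0)
H, W, K = 32, 32, 8
img = rng.random((H, W, 3))
ys, xs = np.mgrid[0:H, 0:W]
pts = np.column_stack([img.reshape(-1, 3),
                       xs.ravel() / W, ys.ravel() / H])  # 5-D vectors

centers = pts[rng.choice(len(pts), K, replace=False)]
for _ in range(10):                                      # plain k-means
    d = ((pts[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    labels = d.argmin(1)
    for k in range(K):
        if (labels == k).any():
            centers[k] = pts[labels == k].mean(0)

feature = img.mean(axis=2).ravel()                       # toy pixel feature
stats = {k: (feature[labels == k].mean(), feature[labels == k].var(),
             feature[labels == k].min(), feature[labels == k].max())
         for k in np.unique(labels)}
```

Each segment is thus reduced to a small fixed-length statistics vector, which is the form concatenated with the pixel-level features below.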
Similar to the scale-space problem for edge detection, the im-
age structure of a scene that can be exploited hinges on the num-
ber of superpixels. Fig. 3 (a) shows that the recalls of contours increase
as more superpixels are used, based on 300 training images from the
BSDS500 data set. Fig. 3 (b) shows the mean recalls of the same
data set with different number of superpixels.
In this paper, we vary the number of superpixels (from 200 to
2000) to extract segments at different scales. Fig. 4 shows one ex-
ample how segments are extracted when 200 superpixels are used.
From the superpixel results, edges can be extracted ( Fig. 4 (c) and
(d)) based on cluster value of each point ( Fig. 4 (b)) with respect
to its neighborhood. When the pixels within a neighborhood of a
point belong to more than two clusters, it indicates the existence
of an endpoint (e.g., the point on the T-junctions or Y-junctions of
Fig. 4 (b)). On the other hand, when the pixels within a neighbor-
hood of a point belong to exactly two clusters, it indicates the ex-
istence of a segment point. Thus, segments and endpoints can be
extracted as denoted by different colors in Fig. 4 (e) for contour
extraction. We determine whether a pixel belongs to a segment or not and then concatenate the segment-level features and the pixel-level features.

At each point, as described in Section 3.1 , 18 local (6 features at 3 scales) and 1 global pixel-level features are extracted. We compute the mean, variance, minimum and maximum values of the 19 features from all the points on a segment. In addition, we compute 4 local statistics (mean, variance, minimum and maximum values) in the neighborhood of the corresponding segment and obtain a 76-dimensional feature. Fig. 4 (f) shows the ground truth contour for comparison with the segments extracted by our method. One advantage of our approach is that edge thinning is not necessary; instead we directly operate on pixels to extract segments. By controlling the number of generated superpixels, segment-level features at different scales can be obtained, and the smallest segment is a pixel itself.

Fig. 7. Contour detection results with and without multi-feature extraction.

4. Feature mappings for contour detection

By using both generative and discriminative models, we propose a generative discriminative scheme to build a feature mapping that extracts more useful feature information (the second part of Fig. 1 ). We introduce our feature mapping based on generative discriminative information in Section 4.1 , which we call posterior divergence feature mapping based on the Gaussian mixture model (PD-GMM). Deep learning has a strong ability to learn feature representations [34–36] and can use multiple processing layers to learn representations of data. In Section 4.2 , we build a novel feature mapping to represent features, which we call sufficient statistics feature mapping based on the deep Boltzmann machine (SS-DBM).

4.1. Feature mapping via posterior divergence

To extract discriminative information from features, we propose a mapping method based on the log likelihood of a Gaussian mixture model (GMM) in which the parameters are estimated via their posterior divergence (PD) in an incremental expectation maximization (EM) formulation. The posterior divergence approach is a generative discriminative scheme that selects one or a few samples to update the model in every iteration of the EM step, which has been shown to be effective in several tasks [37] . We transform the vectors formed by point-level and segment-level features based on this mapping to obtain more discriminative information for contour detection. While our method bears some similarity to Li et al. [37] , the generative models and derivations for the feature maps are different.

Compared with the convolutional neural network (CNN) layers widely used in deep neural networks, the proposed feature mapping model involves fewer model parameters and thus requires less training data. Moreover, the model can be readily trained with the EM method, which is empirically found straightforward.

Let x ∈ R^D be the observed random variable. In the context of contour detection, x denotes the combination of multi-scale features. Let z = \{z_1, \ldots, z_K\} be the hidden variable, where z_k = 1 if
Table 2
F-measures of contour detection on BSDS500 for different feature mappings.

                               BSDS500
                               ODS   OIS   AP
PD-GMM-RF                      0.72  0.74  0.78
SS-DBM-RF                      0.71  0.72  0.76
MG+DG+IT+CG+BG+CO+Multi        0.70  0.72  0.75
Fig. 8. Precision-recall curves on BSDS500 for different feature mappings.
the k-th mixture center is selected to generate a sample and z_k = 0 otherwise. The joint distribution of the Gaussian mixture model can be expressed as

P(x, z \mid \theta) = \prod_{k=1}^{K} N(x; u_k, \Sigma_k)^{z_k} \prod_{k=1}^{K} a_k^{z_k},   (5)

where a = (a_1, \ldots, a_K)^{\top} is the mixture prior satisfying a_k = E_{P(z)}[z_k]; u_k and \Sigma_k are respectively the mean and covariance matrix of the k-th mixture center.

For any observed sample x^t, similar to Jordan et al. [38] , we assume that the posterior distribution of z takes the same form as its prior P(z) but with a different parameter g^t = (g^t_1, \ldots, g^t_K)^{\top}:

Q^t(z) = \prod_{k=1}^{K} (g^t_k)^{z_k}.   (6)

With the above joint distribution and approximate posterior distribution, the free energy function F of the sample x^t can be formulated with variational learning [38] :
F(Q^t, \theta) = E_{Q^t(z)} \Big[ \sum_{k=1}^{K} z_k \Big( \sum_{d=1}^{D} -\frac{(x^t_d - u_d)^2}{2\delta_d^2} - \log \big( (2\pi)^{D/2} \prod_{d=1}^{D} \delta_d \big) \Big) + \sum_{k=1}^{K} z_k \log \frac{g^t_k}{a_k} \Big].   (7)
Let \theta be the model estimated from a set of N − 1 training samples X = \{x^i\}_{i=1}^{N-1}, and \theta_{+t} be the model estimated from the set of N samples X \cup \{x^t\}. The log likelihood of the EM algorithm for the input sample x^t is

L_t = \sum_{i=1}^{N} \big[ -F(Q^i_{+t}, \theta_{+t}) \big] - \sum_{i \neq t}^{N} \big[ -F(Q^i, \theta) \big]
    = \sum_{i=1}^{N} \Big( -E_{Q^i_{+t}(z)} \Big[ \log \frac{Q^i_{+t}(z)}{P(x^t \mid z, \theta_{+t}) P(z \mid \theta_{+t})} \Big] \Big) - \sum_{i \neq t}^{N} \Big( -E_{Q^i(z)} \Big[ \log \frac{Q^i(z)}{P(x^t \mid z, \theta) P(z \mid \theta)} \Big] \Big).   (8)
After rearranging the random variables, we have

L_t = \underbrace{\Big[ \sum_{i=1}^{N} E_{Q^i_{+t}(z)} \log P(x^t \mid z, \theta_{+t}) - \sum_{i \neq t}^{N} E_{Q^i(z)} \log P(x^t \mid z, \theta) \Big]}_{x\text{-cross entropy}}
    + \underbrace{\Big[ \sum_{i=1}^{N} E_{Q^i_{+t}(z)} \log P(z \mid \theta_{+t}) - \sum_{i \neq t}^{N} E_{Q^i(z)} \log P(z \mid \theta) \Big]}_{z\text{-cross entropy}}
    - \underbrace{\Big[ \sum_{i=1}^{N} E_{Q^i_{+t}(z)} \log Q^i_{+t}(z) - \sum_{i \neq t}^{N} E_{Q^i(z)} \log Q^i(z) \Big]}_{z\text{-entropy}},   (9)
where the cross entropy terms measure the fitness of a sample to the random variables and the entropy term measures the uncertainty. Similar to Li et al. [37] , we assume that Q^i_{+t}(z) = Q^i(z), and thus have

L_t = \Big[ \underbrace{\sum_{i \neq t}^{N} E_{Q^i(z)} \log \frac{P(x^t \mid z, \theta_{+t})}{P(x^t \mid z, \theta)}}_{\Delta^{pd}_x} + \underbrace{E_{Q^t(z)} \log P(x^t \mid z, \theta_{+t})}_{\Delta^{fit}_x} \Big]
    + \Big[ \underbrace{\sum_{i \neq t}^{N} E_{Q^i(z)} \log \frac{P(z \mid \theta_{+t})}{P(z \mid \theta)}}_{\Delta^{pd}_z} + \underbrace{E_{Q^t(z)} \log P(z \mid \theta_{+t})}_{\Delta^{fit}_z} - \underbrace{E_{Q^t(z)} \log Q^t(z)}_{\Delta^{ent}_z} \Big],   (10)
where the posterior divergence \Delta^{pd} measures how much x affects the model, the fitness term \Delta^{fit} measures how well the sample fits the model, and the entropy term \Delta^{ent} measures how uncertain the fitting is. The feature mappings given by posterior divergence are derived as follows:

\Delta^{pd}_x = \sum_{i \neq t}^{N} \sum_{k=1}^{K} \sum_{d=1}^{D} g^i_k \Big( -\frac{(x^t_d - u_{d,+t})^2}{2\delta_{d,+t}^2} + \frac{(x^t_d - u_d)^2}{2\delta_d^2} + \log \frac{\delta_d}{\delta_{d,+t}} \Big) = \sum_{d=1}^{D} \Delta^{pd}_{x_d},   (11)

where \Delta^{pd}_x is further decomposed into D terms according to the dimension of x, and \Delta^{pd}_{x_d} measures how x_d affects the model. Similarly, we have:

\Delta^{fit}_x = \sum_{k,d=1}^{K,D} g^t_k \Big( -\frac{(x^t_d - u_{d,+t})^2}{2\delta_{d,+t}^2} - \log \big( \sqrt{2\pi}\, \delta_{d,+t} \big) \Big) = \sum_{d=1}^{D} \Delta^{fit}_{x_d},   (12)
Table 3
F-measures of contour detection on BSDS500 for different contour detection algorithms.

                          BSDS500
                          ODS   OIS   AP
Human                     0.80  0.80  −
SRFL                      0.73  0.76  0.79
RCF-MS [25]               0.81  0.82  −
HED [24]                  0.78  0.80  0.83
DeepContour [23]          0.76  0.78  0.80
DeepEdge [22]             0.75  0.77  0.81
SE+MS+SH [19]             0.75  0.77  0.80
SE-MS, T = 4 [18]         0.74  0.76  0.78
PD-GMM-RF                 0.72  0.74  0.78
gPb [5]                   0.71  0.74  0.65
BEL [14]                  0.66  0.67  0.68
Canny [11]                0.60  0.63  0.58
Compass operator [12]     0.49  0.53  0.36

Fig. 9. Precision-recall curves on BSDS500 for different contour detection algorithms.
where \Delta^{fit}_{x_d} measures how well x_d fits the model.

The feature mappings according to the hidden variable z can be derived as follows:

\Delta^{pd}_z = \sum_{i \neq t}^{N} \sum_{k=1}^{K} g^i_k \log \frac{a_{k,+t}}{a_k} = \sum_{k=1}^{K} \Delta^{pd}_{z_k},   (13)

where \Delta^{pd}_{z_k} = \sum_{i \neq t}^{N} g^i_k \log \frac{a_{k,+t}}{a_k}.

\Delta^{fit}_z = \sum_{k=1}^{K} g^t_k \log a_{k,+t} = \sum_{k=1}^{K} \Delta^{fit}_{z_k},   (14)

\Delta^{ent}_z = \sum_{k=1}^{K} g^t_k \log g^t_k = \sum_{k=1}^{K} \Delta^{ent}_{z_k}.   (15)

Therefore for the input x^t, we obtain a set of feature mappings:

\Phi^t = \mathrm{vec}\big( \{ \Delta^{pd}_{x_d}, \Delta^{fit}_{x_d}, \Delta^{pd}_{z_k}, \Delta^{fit}_{z_k}, \Delta^{ent}_{z_k} \}_{d,k} \big).   (16)
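The z-related components of the mapping (Eqs. (13)–(15)) can be sketched as follows for a fixed two-component 1-D GMM. Assuming equal component priors inside the responsibility computation and updating the priors with a single incremental step (rather than the full incremental EM of the method) are simplifications of this sketch.

```python
# Sketch of the PD-GMM z-terms: responsibilities g, priors a before and
# a_plus after absorbing a new sample x_t, then pd / fit / ent terms.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(99, 1))            # the N-1 old samples
x_t = np.array([[0.5]])                 # the new sample
mu = np.array([-1.0, 1.0]); var = np.array([1.0, 1.0])

def resp(x):                            # posterior g_k (equal priors assumed)
    ll = -((x - mu) ** 2) / (2 * var) - 0.5 * np.log(2 * np.pi * var)
    w = np.exp(ll - ll.max(axis=1, keepdims=True))
    return w / w.sum(axis=1, keepdims=True)

G = resp(X)                             # g^i_k for old samples
g_t = resp(x_t)[0]                      # g^t_k for the new sample
a = G.mean(axis=0)                      # priors from old samples
a_plus = np.vstack([G, resp(x_t)]).mean(axis=0)  # priors after adding x_t

pd_z = (G * np.log(a_plus / a)).sum(axis=0)      # Eq. (13), per component
fit_z = g_t * np.log(a_plus)                     # Eq. (14)
ent_z = g_t * np.log(g_t)                        # Eq. (15)
mapping = np.concatenate([pd_z, fit_z, ent_z])   # z-part of Eq. (16)
```

The resulting vector captures how the new sample shifts the mixture priors (pd), how well it fits them (fit), and how uncertain its assignment is (ent).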
To extract discriminative information from multi-scale features, we map them to the space via Eq. (16) instead of simply stacking features in a long vector. The reasons for using this mapping are twofold. First, this feature mapping includes a data normalization procedure which reduces the metric difference among different features. The normalization is carried out by (x^t_d - u_d)^2 / 2\delta_d^2, with which the derived feature mapping only responds to relative quantities with respect to the mean and variance. Second, this feature mapping exploits the hidden variable z, which encodes additional information, i.e., the cluster or mixture center, which is informative in image representation (e.g., bag-of-words).
4.2. Sufficient statistics feature mapping based on deep Boltzmann machine

The restricted Boltzmann machine (RBM) [39] is a generative stochastic neural network with a two-layer architecture: one visible layer and one hidden layer.
By combining multiple RBMs, the multi-layer deep Boltzmann machine (DBM) is constructed [40] , as shown in Fig. 5 . For example, the energy of the joint configuration (v, h) of a three-layer DBM is given by:

E(v, h \mid \theta) = -v^{\top} W^1 h^1 - h^{1\top} W^2 h^2 - h^{2\top} W^3 h^3,   (17)

where v = (v_1, v_2, \ldots, v_m) represents the observable units; h^1 = (h_1, h_2, \ldots, h_n), h^2 = (h_1, h_2, \ldots, h_l) and h^3 = (h_1, h_2, \ldots, h_k) are hidden units; h = [h^1, h^2, h^3]; W^1, W^2 and W^3 are real-valued weights between different layers; \theta = [W^1, W^2, W^3] are the model parameters. The joint distribution is:

P(v, h \mid \theta) = \frac{1}{Z(\theta)} \exp(-E(v, h \mid \theta)),   (18)

where

Z(\theta) = \sum_{v,h} \exp(-E(v, h \mid \theta)).

To obtain the sufficient statistics feature mapping [41] of the multi-layer DBM, the joint distribution P(v, h | θ) can be derived in the exponential family format:

P(v, h \mid \theta) = \frac{1}{Z(\theta)} \exp(-E(v, h \mid \theta)) = \exp(\alpha(\theta)^{\top} T(x, h) + A(\theta)),   (19)

where α(θ) is a vector-valued function; T(x, h) is the sufficient statistics; A(θ) is a scalar function. Substituting Eq. (17) into Eq. (19) and after simple derivation steps:

\alpha(\theta)^{\top} T(x, h) = v^{\top} W^1 h^1 + h^{1\top} W^2 h^2 + h^{2\top} W^3 h^3,   (20)

and A(\theta) = -\ln(Z(\theta)).

Owing to P(v, h) = P(v | h) P(h), P(h) can also be written in exponential family form. Let Q(h^t) be the approximate distribution of the real posterior P(h); the energy function of Q(h^t) is:

E_h(h^t \mid \theta_h) = -h^{1t\top} W^{1t}_h h^{2t} - h^{2t\top} W^{2t}_h h^{3t},   (21)

where \theta_h = [W^{1t}_h, W^{2t}_h] are the model parameters. Therefore Q(h^t) can be derived as follows:

Q(h^t \mid \theta_h) = \frac{1}{Z(\theta_h)} \exp(-E(h^t)).   (22)

Similar to Eq. (19) , Q(h^t | θ_h) can also be expressed in the exponential form:

Q(h^t \mid \theta_h) = \exp(\alpha(\theta_h)^{\top} T(h^t) + A(\theta_h)).   (23)

Combining Eqs. (21) –(23) , the sufficient statistics and scalar function of Q(h^t) are derived:

\alpha(\theta_h^t)^{\top} T(h^t) = h^{1t\top} W^{1t}_h h^{2t} + h^{2t\top} W^{2t}_h h^{3t}.   (24)
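The resulting mapping Φ(v) = E_Q(h)[φ(v, h)] (derived as Eq. (30) below) can be sketched with a factorized mean-field Q over three hidden layers. The random untrained weights, layer sizes, and fixed number of mean-field updates are illustrative assumptions, not a trained DBM.

```python
# Sketch of the SS-DBM mapping: mean-field estimates of the hidden
# layers, then expected pairwise products between adjacent layers.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
m, n, l, k = 6, 5, 4, 3                 # layer sizes (illustrative)
W1 = rng.normal(size=(m, n))
W2 = rng.normal(size=(n, l))
W3 = rng.normal(size=(l, k))
v = rng.integers(0, 2, size=m).astype(float)

# A few mean-field updates for the expected hidden activations.
h1, h2, h3 = np.full(n, 0.5), np.full(l, 0.5), np.full(k, 0.5)
for _ in range(20):
    h1 = sigmoid(v @ W1 + W2 @ h2)
    h2 = sigmoid(h1 @ W2 + W3 @ h3)
    h3 = sigmoid(h2 @ W3)

# Phi(v): under a factorized Q, E[h^a h^b] factorizes into outer products.
phi = np.concatenate([np.outer(v, h1).ravel(),
                      np.outer(h1, h2).ravel(),
                      np.outer(h2, h3).ravel()])
```

Because φ collects the products between every pair of adjacent layers, the mapping summarizes visible and all hidden units rather than a single layer, which is the point made in the text below.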
Fig. 10. Sample experimental results from the Berkeley segmentation data set. Rows (top to bottom): input image, ground truth, SRFL, PD-GMM-RF, gPb [5], BEL [14], Canny [11], compass operator [12], texture inhibition [16].
Having the above formulas, the free energy lower bound of log P(x^t; θ) [41,42] can be expressed as:

F^t(Q, \theta) = E_{Q(h^t)}\big[ \alpha(\theta)^{\top} T(x^t, h^t) + A(\theta) - \alpha(\theta_h^t)^{\top} T(h^t) + A(\theta_h^t) \big]
             = E_{Q(h^t)}\big[ v^{t\top} W^1 h^{1t} + h^{1t\top} W^2 h^{2t} + h^{2t\top} W^3 h^{3t} - h^{1t\top} W^{1t}_h h^{2t} - h^{2t\top} W^{2t}_h h^{3t} - \ln(Z(\theta)) + \ln(Z(\theta_h)) \big].   (25)

Define the reshape operator R, which flattens a matrix into a vector:

R(W) = [W_{11}, W_{12}, \ldots, W_{1n}, \ldots, W_{mn}]^{\top},   (26)
R(v h^{\top}) = [v_1 h_1, v_1 h_2, \ldots, v_1 h_n, \ldots, v_m h_n]^{\top}.   (27)

Reformatting Eq. (25) using the reshape operator R:

F^t(Q, \theta) = E_{Q(h^t)}\big[ W^1_R R(v^t h^{1t}) + W^2_R R(h^{1t} h^{2t}) + W^3_R R(h^{2t} h^{3t}) - W^{1t}_{R_h} R(h^{1t} h^{2t}) - W^{2t}_{R_h} R(h^{2t} h^{3t}) - \ln(Z(\theta)) + \ln(Z(\theta_h)) \big]
             = W^1_R E_{Q(h^t)}[v^t h^{1t}] + W^2_R E_{Q(h^t)}[h^{1t} h^{2t}] + W^3_R E_{Q(h^t)}[h^{2t} h^{3t}] - W^{1t}_{R_h} E_{Q(h^t)}[h^{1t} h^{2t}] - W^{2t}_{R_h} E_{Q(h^t)}[h^{2t} h^{3t}] - \ln Z(\theta) + \ln Z(\theta_h)
             = \eta^{\top} E_{Q(h^t)}[\phi(v^t, h^t)] = \eta^{\top} \Phi(v^t),   (28)

where the vector η only depends on the parameter θ,

\eta = \big( W^1_R, W^2_R, W^3_R, W^{1t}_{R_h}, W^{2t}_{R_h}, -\ln Z(\theta), \ln Z(\theta_h) \big)^{\top},   (29)

and the vector

\phi(v^t, h^t) = \big( v^t h^{1t}, h^{1t} h^{2t}, h^{2t} h^{3t}, h^{1t} h^{2t}, h^{2t} h^{3t}, 1, 1 \big)^{\top}

is a function over v^t and h^t. Therefore the feature mapping takes the following form:

\Phi(v^t) = E_{Q(h^t)}[\phi(v^t, h^t)].   (30)

Eq. (30) is regarded as the sufficient statistics feature mapping because it is constructed from T(v, h) and T(h).

The advantages of our proposed approach are threefold. First, it can well adapt to the data, with the ability inherited from probabilistic models. Second, it is able to exploit generative information such as the data distribution and hidden information for detection. Third, it comprehensively summarizes both visible units and
Table 4
F-measures of contour detectors on WHD and WSD.

(a) Weizmann horse data set
                          ODS   OIS   AP
SRFL                      0.65  0.66  0.64
gPb [5]                   0.56  0.58  0.47
BEL [14]                  0.52  0.54  0.50
Compass operator [12]     0.35  0.36  0.23

(b) Weizmann segmentation data set
                          ODS   OIS   AP
SRFL                      0.59  0.63  0.56
gPb [5]                   0.54  0.58  0.45
BEL [14]                  0.46  0.46  0.39
Compass operator [12]     0.23  0.25  0.09
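For reference, the F-measure reported in Tables 1–4 is the harmonic mean of precision and recall; ODS and OIS additionally choose the binarization threshold per dataset or per image, which this minimal sketch omits.

```python
# Sketch of the F-measure at a fixed threshold: harmonic mean of
# precision and recall, with the degenerate zero case handled.
def f_measure(precision, recall):
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

f = f_measure(0.7, 0.6)
```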
hidden units of the DBM, and thus fully exploits the DBM, whereas previous approaches use only a single layer of units.
We first extract features useful for contour detection, and then train a random forest on them to detect contours; we believe that better features lead to much better contour detection results. Accordingly, much of this paper focuses on features, including the basic features and the feature mappings. These features have empirically proven very useful for representing edge information, as have the two unsupervised feature mappings, PD-GMM and SS-DBM. Empirical studies are reported in Zhang et al. [9,10].
It is true that unsupervised learning features may not perform as well as supervised learning features on specific tasks. On the other hand, supervised learning may suffer from overfitting (when the training and testing data differ substantially) or underfitting (when labeled data is insufficient, especially since pixel-level labeling is very tedious). We strike a balance by adopting unsupervised learning in the low-level feature extraction and mapping stages, while applying supervised learning via the stacking random forest. The performance of this design is verified empirically in our experiments. Moreover, prior work has demonstrated the strength of the adopted features [37].
5. Stacking random forest learning for contour detection
A random forest [43] is an ensemble classifier consisting of numerous decision trees, where the class label is determined by the mode of the outputs of the individual trees. Random forest algorithms have been shown to handle large numbers of data points effectively and efficiently. The Gini index [44] can be used to split the training examples so that descendant nodes are "purer" than their parents.
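As an illustration, a minimal sketch of the Gini index and of the impurity decrease used to score a candidate split; the function names are ours, not from the paper.

```python
import numpy as np

def gini(labels):
    # Gini index of a set of class labels: 1 - sum_k p_k^2
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def gini_gain(parent, left, right):
    # impurity decrease achieved by splitting `parent` into `left`/`right`;
    # a good split makes the children "purer" than the parent node
    n = len(parent)
    return gini(parent) - (len(left) / n) * gini(left) - (len(right) / n) * gini(right)
```

For example, the labels [0, 0, 1, 1] have a Gini index of 0.5, and a split that separates the two classes perfectly yields a gain of 0.5, since both children are pure.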
Concurrently, a recent forest model [45] has shown promising performance compared with deep neural networks on tasks from different domains. Moreover, the forest model is more scalable and efficient than current models based on deep neural networks. Our approach follows a similar stacking design principle for contour detection, and the framework is shown in Fig. 1. First, we use the image feature extractors of Section 3 to obtain the basic features a. Then we apply the two feature mappings of Section 4 to obtain more informative features: the posterior divergence feature mapping based on a Gaussian mixture model (PD-GMM, Section 4.1) produces feature part b, and the sufficient statistics feature mapping based on a deep Boltzmann machine (SS-DBM, Section 4.2) produces feature part c. Next, we use four random forests to detect contours from feature parts b and c, and collect their outputs into two new feature vectors d and e. Here d and e hold the contour detection results, i.e., lists of per-pixel contour probabilities over the image, so each forms a vector. We then concatenate the four feature parts b, c, d, and e into a final feature f. Finally, we train a random forest on f to obtain the final contour detector.
This stacking random forest learning can be seen as a two-layer framework for contour detection. In the training stage, to ensure that each sub-forest obtains useful contour information, we train each sub-forest separately with a supervised learning strategy. Each sub-forest in the first layer can thus be seen as a base learner for the ensemble learning of the second layer. The output of each first-layer random forest is a contour detection result, i.e., a list of per-pixel contour probabilities over the image. In the second layer of our architecture, we train the final random forest for contour detection by leveraging both the first-layer results and the features coming directly from the feature mappings of the previous stage. Once all the random forests are trained, they can be used directly, with the tuned parameters, to detect contours.
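The two-layer scheme above can be sketched with scikit-learn, simplified here to one sub-forest per feature block; all names, sizes, and the use of `RandomForestClassifier` are our assumptions, not the paper's implementation.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def train_stacking(b, c, y, n_trees=50):
    """Minimal sketch of the two-layer stacking scheme.
    b, c: per-pixel feature blocks (n_pixels x dim); y: 0/1 contour labels."""
    # First layer: sub-forests trained separately (supervised) on each block.
    sub_forests = [
        RandomForestClassifier(n_estimators=n_trees, random_state=0).fit(X, y)
        for X in (b, c)
    ]
    # Their per-pixel contour probabilities become the new features d and e.
    d, e = (rf.predict_proba(X)[:, [1]] for rf, X in zip(sub_forests, (b, c)))
    # Second layer: the final forest is trained on the concatenation f = [b, c, d, e].
    f = np.hstack([b, c, d, e])
    final = RandomForestClassifier(n_estimators=n_trees, random_state=0).fit(f, y)
    return sub_forests, final
```

At test time, the same pipeline is applied: the sub-forests produce d and e for the new image, and the final forest predicts contour probabilities from f.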
C. Zhang et al. / Neurocomputing 275 (2018) 2702–2715 2711
Fig. 11. Precision-recall curves for different contour detection algorithms. (a) WHD data set. (b) WSD data set.
6. Experiments

We evaluate the proposed algorithm for contour detection on several data sets. The Berkeley segmentation data set [5] includes 500 images of 481 × 321 pixels with human-labeled segmentation results. For fair comparison, we use 300 images for training and the remaining ones for testing [5]. The pixels on the ground-truth contours of the training set are used as positive examples, whereas the other pixels are used as negative examples. All experiments are performed on a machine with a 3.10 GHz CPU and 8 GB memory.
We use the precision-recall curve with respect to the human-labeled ground truth as the evaluation measure, along with the F-measure, computed as 2 · Precision · Recall / (Precision + Recall). As the segmented regions and contours are determined by a choice of scale, we first use a training set to determine the optimal data set (ODS) scale and fix it for all test images [5]. We also evaluate the performance at the optimal image scale (OIS) for each image, as well as the average precision (AP) over the full recall range [5].
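For concreteness, the F-measure above is just the harmonic mean of precision and recall:

```python
def f_measure(precision, recall):
    # F-measure: harmonic mean of precision and recall, 2PR / (P + R)
    return 2 * precision * recall / (precision + recall)
```

For instance, a detector with precision 0.75 and recall 0.6 scores an F-measure of 2/3; the harmonic mean rewards balanced precision and recall rather than either alone.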
We first evaluate different combinations of the basic features of Section 3 for contour detection. The image features in Fig. 2 include the magnitude of gradient (MG), direction of gradient (DG), inhibition term (IT), brightness gradient (BG), color gradient (CG), compass operator (CO), and the multi-scale representation; we train a random forest classifier on these features to detect contours. Table 1 and Fig. 6 show the F-measures of contour detection using the different feature combinations of Section 3.1. The results show that contours are detected better as more features are added. Fig. 7 shows contour detection results using the basic features.
Then we compare the two feature mapping algorithms of Section 4 against the basic features of Section 3, as shown in Fig. 1. We first use a random forest to train a contour detector with the features from Section 3. Then we train a contour detector using a random forest with the posterior divergence feature mapping based on a Gaussian mixture model (PD-GMM-RF) 2 from Section 4.1, and another contour detector using a random forest with the sufficient statistics
2 PD-GMM-RF is in fact the method presented in the conference version of this paper [9], where it is referred to as MCDRF for short.
feature mapping based on a deep Boltzmann machine (SS-DBM-RF) 3 from Section 4.2. Table 2 and Fig. 8 show the F-measures of contour detection for the different feature mappings. The results demonstrate the effectiveness of our feature mapping algorithms.
We evaluate the proposed stacking random forest learning algorithm (SRFL) for contour detection against other methods, including the compass filter [12], Canny edge detector [11], gPb [5,15], BEL [14], SE-MS [18], and SE+MS+SH [19], as well as deep learning methods such as RCF-MS [25], HED [24], DeepContour [23], and DeepEdge [22], on the BSDS500 data set. Table 3 shows the precision-recall and F-measure results at different thresholds. The deep neural network methods [22–25] achieve the highest accuracy because they allow more learning capacity, at the cost of more labeled training data and computational overhead. In 2012, CDRF [8] demonstrated the effectiveness of detecting contours with random forests. In 2013, Dollár and Zitnick [18,19] began using a structured learning approach to detect edges, and SE [18,19] also performs better than our method.
Fig. 9 shows the precision-recall curves with respect to the human-labeled ground truth. The proposed SRFL algorithm achieves higher F-measures and average precision than the compass filter [12], Canny edge detector [11], gPb [5,15], and BEL [14]. While the gPb method performs well in terms of accuracy, its computational load is significant: on average, the proposed SRFL method is about 6 times faster than the gPb method (30 s versus 180 s in MATLAB, respectively). SRFL is slower than RCF, but much faster than gPb and similar methods.
Fig. 10 shows the contour detection results of all the evaluated algorithms. We note that both the gPb and SRFL methods extract object contours with few spurious edges; in addition, our method captures more contours and details than the gPb method. Fig. 10(a) shows that gPb misses some windows of the building, and its roof contours are not as clear or sharp as those generated by the proposed method. The results of the Canny detector, run without non-maximum suppression or hysteresis, are noisy (consistent with the results in Arbelaez et al. [5]), as the Gaussian kernel width is set based on ODS and OIS. The results of
3 SS-DBM-RF is in fact the method presented in the conference version of this paper [10], except that here we use multiple features rather than only the six basic image features; there it is referred to as SSDBM for short.
Fig. 12. Sample experimental results on the Weizmann horse database (columns a–d; rows: input image, ground truth, SRFL, gPb, BEL, Canny, compass operator, texture inhibition).
Fig. 13. Visual experimental results on the Weizmann segmentation database (columns a–d; rows: input image, ground truth, SRFL, gPb, BEL, Canny, compass operator, texture inhibition).
the compass filter are based on the code provided by Ruzon and
Tomasi [12] with default parameters.
We also carry out experiments on the Weizmann horse
database (WHD) [6] and the Weizmann segmentation database
(WSD) [7] . The Weizmann horse database contains 328 side-view
color images with manually segmented results, covering horses of different breeds, colors, and sizes in various scenes. The Weizmann
segmentation database contains 200 color images with manual
segmentation results by several subjects. The images in this data
set contain only one or two salient objects with relatively simple
backgrounds. The foreground objects differ significantly from the background in intensity, texture, or other low-level cues.
We note that all parameters of the proposed SRFL algorithm are fixed across the experiments on the three data sets. The proposed algorithm is evaluated against the gPb [5], compass operator [12], and BEL [14] methods. Although these databases were developed for segmentation evaluation, we extract contours using the same approach as for BSDS500 [5] to compute F-measures and precision-recall curves. The images in the Weizmann segmentation data set vary in size; some are small and some are very large. The gPb method requires a large amount of memory and considerable time to detect contours, whereas our method requires less of both and can handle large images.
Table 4 shows that the F-measure and average precision of the SRFL algorithm are higher than those of the gPb [5], compass operator [12], and BEL [14] methods. Fig. 11 shows that the precision-recall curves of SRFL are significantly better than those of the gPb, compass operator, and BEL methods.
In addition, Figs. 12 and 13 show that the contours extracted by the proposed SRFL algorithm are visually better than those extracted by the other methods, with sharper details and fewer spurious edges. The results on the Weizmann segmentation database are consistent with those on the Weizmann horse data set.
7. Conclusion
In this paper, we propose an efficient and effective algorithm for contour detection based on a stacking random forest learning framework over features mapped from multi-scale local and global image features. The parameters of the feature space are learned incrementally from the posterior divergence of the log-likelihood of a Gaussian mixture model and from sufficient statistics based on a deep Boltzmann machine. Using these two feature mappings exploits more information than feeding raw features directly to a random forest classifier. The proposed approach is evaluated qualitatively and quantitatively on three benchmark data sets against several state-of-the-art methods. Experimental results demonstrate that the proposed algorithm performs favorably against leading contour detection methods. Our future work includes more effective contour completion algorithms; we will also develop efficient algorithms for contour-based object recognition.
Acknowledgment
This research is sponsored by the National Natural Science Foundation of China (Nos. 61571049, 61601033, 61401029, 11401028, 61472044). The authors thank the anonymous reviewers for their valuable discussion and feedback.
References

[1] G. Papari, N. Petkov, Edge and line oriented contour detection: state of the art, Image Vis. Comput. 29 (2–3) (2011) 79–103.
[2] X. Bai, L.J. Latecki, W.-Y. Liu, Skeleton pruning by contour partitioning with discrete curve evolution, IEEE Trans. Pattern Anal. Mach. Intell. 29 (3) (2007).
[3] W. Shen, X. Wang, C. Yao, X. Bai, Shape recognition by combining contour and skeleton into a mid-level representation, in: Proceedings of the Chinese Conference on Pattern Recognition, Springer, 2014, pp. 391–400.
[4] M.A. Waris, A. Iosifidis, M. Gabbouj, CNN-based edge filtering for object proposals, Neurocomputing 266 (2017) 631–640.
[5] P. Arbelaez, M. Maire, C. Fowlkes, J. Malik, Contour detection and hierarchical image segmentation, IEEE Trans. Pattern Anal. Mach. Intell. 33 (5) (2011) 898–916.
[6] E. Borenstein, S. Ullman, Learning to segment, in: Computer Vision – ECCV, 2004, pp. 315–328.
[7] S. Alpert, M. Galun, A. Brandt, R. Basri, Image segmentation by probabilistic bottom-up aggregation and cue integration, IEEE Trans. Pattern Anal. Mach. Intell. 34 (2) (2012) 315–327.
[8] C. Zhang, X. Ruan, Y. Zhao, M.-H. Yang, Contour detection via random forest, in: Proceedings of the Twenty-First International Conference on Pattern Recognition (ICPR), IEEE, 2012, pp. 2772–2775.
[9] C. Zhang, X. Li, X. Ruan, Y. Zhao, M.-H. Yang, Discriminative generative contour detection, in: Proceedings of the British Machine Vision Conference, 2013.
[10] C. Zhang, X. Li, J. Yan, S. Qui, Y. Wang, C. Tian, Y. Zhao, Sufficient statistics feature mapping over deep Boltzmann machine for detection, in: Proceedings of the Twenty-Second International Conference on Pattern Recognition (ICPR), IEEE, 2014, pp. 827–832.
[11] J. Canny, A computational approach to edge detection, IEEE Trans. Pattern Anal. Mach. Intell. 8 (6) (1986) 679–698.
[12] M.A. Ruzon, C. Tomasi, Color edge detection with the compass operator, in: Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, 1999, pp. 2160–2166.
[13] D.R. Martin, C.C. Fowlkes, J. Malik, Learning to detect natural image boundaries using local brightness, color, and texture cues, IEEE Trans. Pattern Anal. Mach. Intell. 26 (1) (2004) 530–549.
[14] P. Dollar, Z. Tu, S. Belongie, Supervised learning of edges and object boundaries, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2006.
[15] M. Maire, P. Arbelaez, C.C. Fowlkes, J. Malik, Using contours to detect and localize junctions in natural images, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2008.
[16] G. Papari, N. Petkov, An improved model for surround suppression by steerable filters and multilevel inhibition with application to contour detection, Pattern Recognit. 44 (9) (2011) 1999–2007.
[17] C. Yao, W. Shen, X. Bai, W. Liu, Class-specific object contour detection by iteratively combining context information, in: Proceedings of the Eighth International Conference on Information, Communications and Signal Processing (ICICS), IEEE, 2011, pp. 1–5.
[18] P. Dollár, C.L. Zitnick, Structured forests for fast edge detection, in: Proceedings of the IEEE International Conference on Computer Vision, 2013, pp. 1841–1848.
[19] P. Dollár, C.L. Zitnick, Fast edge detection using structured forests, IEEE Trans. Pattern Anal. Mach. Intell. 37 (8) (2015) 1558–1570.
[20] A. Krizhevsky, I. Sutskever, G.E. Hinton, ImageNet classification with deep convolutional neural networks, in: Proceedings of Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.
[21] H. Xu, J. Yan, N. Persson, W. Lin, H. Zha, Fractal dimension invariant filtering and its CNN-based implementation, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 3825–3833.
[22] G. Bertasius, J. Shi, L. Torresani, DeepEdge: a multi-scale bifurcated deep network for top-down contour detection, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 4380–4389.
[23] W. Shen, X. Wang, Y. Wang, X. Bai, Z. Zhang, DeepContour: a deep convolutional feature learned by positive-sharing loss for contour detection, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 3982–3991.
[24] S. Xie, Z. Tu, Holistically-nested edge detection, in: Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 1395–1403.
[25] Y. Liu, M.-M. Cheng, X. Hu, K. Wang, X. Bai, Richer convolutional features for edge detection, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[26] X. Wang, X. Duan, X. Bai, Deep sketch feature for cross-domain image retrieval, Neurocomputing 207 (2016) 387–397.
[27] B. Yang, X. Zhang, L. Chen, H. Yang, Z. Gao, Edge guided salient object detection, Neurocomputing 221 (2017) 60–71.
[28] Y. Rubner, C. Tomasi, L. Guibas, A metric for distributions with applications to image databases, in: Proceedings of the Sixth International Conference on Computer Vision, 1998, pp. 59–66.
[29] T. Lindeberg, Scale-Space Theory in Computer Vision, Springer, 1993.
[30] M.-M. Cheng, G.-X. Zhang, N.J. Mitra, X. Huang, S.-M. Hu, Global contrast based salient region detection, IEEE Trans. Pattern Anal. Mach. Intell. (2011) 409–416.
[31] X. Bai, S. Bai, Z. Zhu, L.J. Latecki, 3D shape matching via two layer coding, IEEE Trans. Pattern Anal. Mach. Intell. 37 (12) (2015) 2361–2373.
[32] Y. Zhao, J. Zhao, J. Yang, Y. Liu, Y. Zhao, Y. Zheng, L. Xia, Y. Wang, Saliency driven vasculature segmentation with infinite perimeter active contour model, Neurocomputing 259 (2017) 201–209.
[33] R. Achanta, K. Smith, A. Lucchi, P. Fua, S. Süsstrunk, SLIC Superpixels, Technical Report No. 149300, EPFL, 2010.
[34] G.E. Hinton, S. Osindero, Y.-W. Teh, A fast learning algorithm for deep belief nets, Neural Comput. 18 (7) (2006) 1527–1554.
[35] N. Srivastava, R.R. Salakhutdinov, Multimodal learning with deep Boltzmann machines, in: Proceedings of Advances in Neural Information Processing Systems, 2012, pp. 2222–2230.
[36] Y. LeCun, Y. Bengio, G. Hinton, Deep learning, Nature 521 (7553) (2015) 436–444.
[37] X. Li, T.S. Lee, Y. Liu, Hybrid generative-discriminative classification using posterior divergence, in: Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2011, pp. 2713–2720.
[38] M.I. Jordan, Z. Ghahramani, T.S. Jaakkola, L.K. Saul, An introduction to variational methods for graphical models, Machine Learning 37 (1999) 183–233.
[39] P. Smolensky, Information processing in dynamical systems: foundations of harmony theory, in: Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol. 1: Foundations, MIT Press, 1986, pp. 194–281.
[40] R. Salakhutdinov, G. Hinton, Deep Boltzmann machines, in: Proceedings of the International Conference on Artificial Intelligence and Statistics, vol. 5, 2009, pp. 448–455.
[41] X. Li, B. Wang, Y. Liu, T.S. Lee, Learning discriminative sufficient statistics score space for classification, in: Proceedings of ECML/PKDD, vol. 8190, 2013, pp. 49–64.
[42] R. Neal, G. Hinton, A view of the EM algorithm that justifies incremental, sparse, and other variants, in: Learning in Graphical Models, Springer, Dordrecht, 1999, pp. 355–368.
[43] L. Breiman, Random forests, Machine Learn. 45 (1) (2001) 5–32.
[44] L.E. Raileanu, K. Stoffel, Theoretical comparison between the Gini index and information gain criteria, Ann. Math. Artif. Intell. 41 (1) (2004) 77–93.
[45] Z.-H. Zhou, J. Feng, Deep forest: towards an alternative to deep neural networks, in: Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI-17), 2017, pp. 3553–3559, doi: 10.24963/ijcai.2017/497.
Chao Zhang is currently pursuing the Ph.D. degree with the College of Information Science and Technology, Beijing Normal University, Beijing, China. He received the M.S. and B.E. degrees from the Department of Automation, Shanghai Jiao Tong University, Shanghai, China. His research interests are in computer vision and machine learning.

Junchi Yan is with the Shanghai Key Laboratory of Trustworthy Computing, School of Computer Science and Software Engineering, East China Normal University, and IBM Research – China. He obtained the Ph.D. from Shanghai Jiao Tong University. He has been entitled IBM Master Inventor, and is the recipient of the IBM Research Division Award, the China Computer Federation Doctoral Dissertation Award, and the ACM China Doctoral Dissertation Nomination Award. His research interests are computer vision and machine learning. He is a member of IEEE and ACM.

Changsheng Li is a full research professor at the University of Electronic Science and Technology of China (UESTC). Before that, he was an algorithm expert at iDST, Alibaba Group, and a Research Scientist at IBM Research – China. He received his B.E. degree from UESTC in 2008, and his Ph.D. degree in pattern recognition and intelligent systems from the Institute of Automation, Chinese Academy of Sciences, in 2013. He was also a research assistant at The Hong Kong Polytechnic University in 2009–2010. He joined IBM Research – China in 2013 and received the IBM Research Accomplishment Award in 2015. His research interests include machine learning and data mining. Dr. Li has more than 30 refereed publications in international journals and conferences, including CVPR, AAAI, IJCAI, CIKM, MM, ICMR, TNNLS, TIP, TC, PR, etc.

Rongfang Bie is currently a Professor at the College of Information Science and Technology of Beijing Normal University, where she received her M.S. degree in June 1993 and Ph.D. degree in June 1996. She was with the Computer Laboratory at the University of Cambridge as a visiting faculty member for one year starting March 2003. She is the author or co-author of more than 100 papers. Her current research interests include knowledge representation and acquisition for the Internet of Things, dynamic spectrum allocation, and big data analysis and applications.