Contour detection via stacking random forest learning

Chao Zhang a, Junchi Yan b,c,∗, Changsheng Li d, Rongfang Bie a

a College of Information Science and Technology, Beijing Normal University, China
b Shanghai Key Laboratory of Trustworthy Computing, East China Normal University, China
c IBM Research, China
d School of Computer Science and Engineering, University of Electronic Science and Technology of China, China

Neurocomputing 275 (2018) 2702–2715. Contents lists available at ScienceDirect. Journal homepage: www.elsevier.com/locate/neucom
Received 5 April 2017; Revised 5 November 2017; Accepted 21 November 2017; Available online 28 November 2017. Communicated by Xiang Bai.
https://doi.org/10.1016/j.neucom.2017.11.046
0925-2312/© 2017 Elsevier B.V. All rights reserved.

Keywords: Contour detection; Image processing; Feature mapping

Abstract

Contour detection is an important and fundamental problem in computer vision with numerous applications. Although significant progress has been made in the past decades, contour detection in natural images remains a challenging task because edges of objects are difficult to distinguish clearly from the surrounding background. To address this problem, we first capture multi-scale features, from the pixel level to the segment level, using local and global information. These features are mapped to a space where discriminative information is captured by computing the posterior divergence of Gaussian mixture models and sufficient statistics based on a deep Boltzmann machine. We then introduce a stacking random forest learning framework for contour detection. We evaluate the proposed algorithm against leading methods in the literature on the Berkeley segmentation and Weizmann horse data sets. Experimental results demonstrate that the proposed contour detection algorithm performs favorably against state-of-the-art methods in terms of speed and accuracy.

1. Introduction

Object contour is of prime importance, as it contains essential visual information, such as shape and identity, that finds numerous applications. Contour detection is a fundamental problem in computer vision that is closely related to other tasks, e.g., segmentation, shape discrimination, and object recognition [1–4].

In this work, we propose a learning algorithm for contour detection based on stacking random forest learning using multi-level visual cues. We extract pixel-level features that integrate both local and global visual information. In addition, segment-level features are extracted to exploit the structural information of contours. All the features are mapped to a space for selecting discriminative ones via a novel algorithm based on the posterior divergence of Gaussian mixture models and sufficient statistics feature mapping based on a deep Boltzmann machine. A stacking random forest learning classifier is trained on these features for contour detection. We evaluate the proposed algorithm against state-of-the-art methods on several databases, including the Berkeley segmentation data set [5], the Weizmann horse database (WHD) [6], and the Weizmann segmentation database (WSD) [7]. Experimental results bear out that feature selection from multi-scale visual cues via posterior divergence with a random forest classifier facilitates effective contour detection in natural images.

∗ Corresponding author at: Shanghai Key Laboratory of Trustworthy Computing, East China Normal University, North Zhongshan Road Campus: 3663 N. Zhongshan Rd., Shanghai 200062, China. E-mail address: [email protected] (J. Yan).

In a nutshell, the main contributions of this paper are:¹

• This paper presents the so-called sufficient image features, including multi-scale pixel-level and segment-level features, for contour detection.
• This paper describes posterior divergence feature mapping using a Gaussian mixture model and sufficient statistics feature mapping based on a deep Boltzmann machine.
• This paper introduces a stacking random forest learning framework for contour detection. Moreover, we perform comprehensive empirical studies on the performance of different feature extractors, feature mapping methods and classifiers, and the stacking of their different combinations.

The paper is organized as follows. Related work is introduced in Section 2. Section 3 presents the image feature extraction method used in our model. Section 4 describes the proposed feature mapping methods. Section 5 introduces the stacking random forest

¹ The preliminary version of this paper appeared in the conference papers [8–10]. We make several extensions, including: (i) an updated introduction and related-work review on recent developments in contour detection; (ii) a new stacking forest learning framework for contour detection; (iii) updated and more comprehensive experimental results with various ablation tests.


learning for contour detection. Fig. 1 shows the framework of the stacking random forest learning for contour detection. Experiments on natural images are presented in Section 6, and Section 7 concludes this paper.

Fig. 1. Framework of the stacking random forest learning (SRFL) for contour detection.
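The stacking idea in Fig. 1 can be illustrated with a generic two-level scheme. The paper's own SRFL details appear in Section 5 (outside this excerpt), so the level count, forest sizes, and the way each level is augmented below are illustrative assumptions, not the authors' implementation:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def stacked_forest_predict(X_train, y_train, X_test, n_levels=2, seed=0):
    """Generic stacking sketch: each level's random forest consumes the
    original per-pixel features augmented with the previous level's
    predicted contour probability."""
    Xtr, Xte = X_train, X_test
    p_te = None
    for level in range(n_levels):
        rf = RandomForestClassifier(n_estimators=50, random_state=seed + level)
        rf.fit(Xtr, y_train)
        p_tr = rf.predict_proba(Xtr)[:, 1:2]   # P(contour) on training pixels
        p_te = rf.predict_proba(Xte)[:, 1:2]   # P(contour) on test pixels
        Xtr = np.hstack([X_train, p_tr])       # augment features for next level
        Xte = np.hstack([X_test, p_te])
    return p_te.ravel()
```

The second-level forest can thus correct first-level mistakes by seeing both the raw features and the first-level contour probability.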

2. Related work

There exists a large body of work on contour extraction based on different techniques, including edge detection, perceptual grouping and saliency maps [1]. The classical Roberts and Sobel operators [1] identify edges by convolving a gray-scale image with local derivative filters, whereas another classical method, the Canny edge detector [11], computes the image gradient and applies non-maxima suppression and hysteresis thresholding to the gradient magnitude. The compass operator [12] uses distributions of color pixels to determine the orientation of a diameter that maximizes the difference between the two halves of a circular window. More recently, an algorithm for estimating the probability of a pixel lying on a contour boundary (Pb) [13] was proposed, which uses a combination of features based on brightness, color, and texture. The tree-structured boosted edge learning (BEL) method [14] selects and combines a large number of features across different scales, extracted from image patches, for edge and object boundary detection. The gPb detector [5,15] combines multiple local cues in a probabilistic framework based on spectral clustering, with two main components: the mPb detector based on local image analysis at multiple scales, and the sPb detector based on normalized-cut segmentation results. Papari and Petkov [16] use steerable filters to construct a model with an inhibition term that removes spurious edges in textured regions. Yao et al. [17] learn a cascade classifier that iteratively incorporates both appearance and structure information for class-specific object contour detection. Dollár and Zitnick [18,19] use structured forests to detect edges.

Fig. 2. Image features: (a) input, (b) magnitude of gradient, (c) direction of gradient, (d) inhibition term, (e) brightness gradient, (f) color gradient, (g) compass operator.

Convolutional neural networks (CNNs) have been widely used to extract visual features from images in recent years, and much promising progress has been made on typical computer vision problems [20,21]. Bertasius et al. [22] use a multi-scale deep network that consists of five convolutional layers and a bifurcated

fully-connected sub-network to detect contours. Shen et al. [23] introduce a deep convolutional feature learned with a positive-sharing loss for contour detection. Xie and Tu [24] present a deep learning model that leverages fully convolutional neural networks and deeply-supervised nets for edge detection. RCF-MS [25] uses richer convolutional features to detect contours and achieves state-of-the-art performance. However, these deep neural network based methods [26] in general have to learn their parameters on a relatively large dataset, with enough training epochs, to obtain representative filters in their convolution layers.

3. Image features for contour detection

We first describe the image features for learning a contour detector in the proposed algorithm shown in Fig. 1. These features have been used for representing edge information of grayscale and color images. In this work, all the features are integrated into a more effective image representation.

3.1. Pixel-level features

Pixel-level features provide raw and basic visual cues for the detection of object contours. To capture effective visual information at the pixel level, we extract local and global features at multiple scales. Local visual cues extracted from edges are first exploited in order to account for object contours at different scales. Global information extracted from visual saliency is then incorporated to provide cues of salient objects [27] in the scenes. These features are integrated to form effective pixel-level features to represent object contours.

Fig. 3. Recalls of object contours using the BSDS500 data set: (a) recall of each training image from the BSDS500 data set, (b) mean recall values with respect to different numbers of superpixels.

Fig. 4. With 200 superpixels: (a) input, (b) superpixel, (c) edges in source image, (d) all edges, (e) segment, (f) ground truth.

3.1.1. Multi-scale point features

Basic point (pixel) features have been widely used for representing edge information of grayscale and color images, such as image gradients, texture inhibition, brightness and color gradients, as well as compass operators. As these features capture different visual information, we extract and combine them for contour detection.

Image gradient. Each image I is convolved with a Gaussian kernel of width σ to compute its gradient ∇I. The magnitude |∇I| reveals the strength of an edge at each pixel, and the direction or angle θ∇ contains the intensity discontinuity information. Both gradient magnitude and direction are used as features for learning a contour detector (see Fig. 2(b) and (c)).

Texture inhibition. In order to remove small edges in highly textured regions, we use an inhibition term that suppresses the response in textured regions based on steerable filters [16]. It is computed as the convolution of

Fig. 5. A three-layer deep Boltzmann machine [40].

Table 1
F-measures of contour detection on BSDS500 for different parts.

                              ODS     OIS     AP
MG+DG+IT+CG+BG+CO+Multi       0.70    0.72    0.75
MG+DG+IT+CG+BG+CO ᵃ           0.69    0.71    0.72
MG+DG+IT+CG+BG                0.68    0.70    0.71
MG+DG+IT+CG                   0.67    0.69    0.70
MG+DG+IT                      0.65    0.67    0.68

ᵃ MG+DG+IT+CG+BG+CO is in fact the method presented in the conference version of this paper [8], where it is referred to as CDRF for short.
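For reference, ODS and OIS are F-measures computed at the optimal dataset-wide threshold and the optimal per-image threshold, respectively, while AP is the area under the precision-recall curve; a minimal sketch of the underlying score:

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall; the score behind the
    ODS/OIS columns of Tables 1-3."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)
```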


the Gaussian gradient magnitude with the inhibition term:

$$t(x,y)=\{V_0 * |\nabla I|\}(x,y)+\mathrm{re}\!\left\{e^{2i\theta_\nabla(x,y)}\left[(V_2 * |\nabla I|)(x,y)\right]\right\},\tag{1}$$

where re(·) returns the real part and the steering basis are defined by

$$V_0(\rho,\phi)=\frac{\rho^2}{2},\qquad V_2(\rho,\phi)=\frac{\rho^2}{2}\,e^{2i\phi},\tag{2}$$

controlled by the two parameters ρ and φ. It can be shown that the difference of the magnitude of the Gaussian gradient and the inhibition term is a difference of Gaussian (DoG) functions [16], and they are used as features for learning a contour detector (see Fig. 2(d)).
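As a concrete sketch, the multi-scale gradient features (magnitude and direction at several scales) can be computed with Gaussian derivative filters; the scale values and SciPy-based implementation below are illustrative assumptions, not the paper's exact settings:

```python
import numpy as np
from scipy import ndimage

def multiscale_gradient(image, sigmas=(1.0, 2.0, 4.0)):
    """Gradient magnitude |grad I| and direction theta of a grayscale
    image at several Gaussian scales, stacked as per-pixel features."""
    feats = []
    for sigma in sigmas:
        gx = ndimage.gaussian_filter(image, sigma, order=(0, 1))  # dI/dx
        gy = ndimage.gaussian_filter(image, sigma, order=(1, 0))  # dI/dy
        feats.append(np.hypot(gx, gy))     # magnitude
        feats.append(np.arctan2(gy, gx))   # direction
    return np.stack(feats, axis=-1)        # H x W x (2 * len(sigmas))
```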

Brightness and color gradient. The Pb and gPb methods [5,13] exploit brightness, color, texture and segmented regions to detect contours, and achieve state-of-the-art performance. However, the image segmentation process is time consuming. For efficiency, we exploit only the brightness and color gradient features. Similar to Martin et al. [13], we use a circular disc of radius r at pixel (x, y), split it into two half discs by a diameter at angle θ, and represent the two halves with histograms of brightness and color in the CIELAB space. We compute the χ²-distance between the two half-disc histograms to obtain the oriented gradient G(x, y, θ, r), thereby encoding both the brightness and color gradient features (see Fig. 2(e) and (f)).
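A minimal sketch of this half-disc χ² gradient on a single channel; the disc geometry, bin count, and value range are illustrative assumptions:

```python
import numpy as np

def chi2_distance(h1, h2, eps=1e-12):
    # chi-squared distance between two normalized histograms
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

def oriented_gradient(channel, x, y, r=5, theta=0.0, bins=16):
    """G(x, y, theta, r): chi^2 distance between the histograms of the
    two half discs obtained by cutting a radius-r disc at angle theta."""
    ys, xs = np.mgrid[-r:r + 1, -r:r + 1]
    disc = xs ** 2 + ys ** 2 <= r ** 2
    half = (xs * np.sin(theta) - ys * np.cos(theta)) > 0  # one side of the diameter
    patch = channel[y - r:y + r + 1, x - r:x + r + 1]
    h1, _ = np.histogram(patch[disc & half], bins=bins, range=(0, 1))
    h2, _ = np.histogram(patch[disc & ~half], bins=bins, range=(0, 1))
    h1 = h1 / max(h1.sum(), 1)
    h2 = h2 / max(h2.sum(), 1)
    return chi2_distance(h1, h2)
```

Sweeping θ and taking the maximum response at each pixel yields the oriented gradient feature.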

Compass operator. The compass operator [12] detects edges without assuming that the regions on both sides have constant color, by exploiting the pixel distribution rather than the means. It determines the orientation of a diameter that maximizes the difference between the two half discs of a circular compass at each pixel (x, y). The distance between two colors is computed by

$$d_{ij}=1-\exp(-E_{ij}/\gamma),\tag{3}$$

where $E_{ij}$ is the Euclidean distance between color i and color j, and γ is a constant. The distribution of color on either side of an edge is represented by a color signature, which is a set of point masses in the CIELAB color space (i.e., color pixels). The distance between the equal-mass color signatures of the two half discs $S_1$ and $S_2$ is computed by the earth mover's distance (EMD) [28], which minimizes

$$\sum_{i\in S_1}\sum_{j\in S_2} d_{ij}\,f_{ij},\tag{4}$$

where $f_{ij}$ indicates the flow between colors i and j, subject to the constraints that all the mass is moved from $S_1$ to $S_2$ [28]. The resulting EMD can be represented as a function f(θ) (0° ≤ θ ≤ 180°), and the orientation of the diameter that maximizes the difference between the two half discs is $\hat\theta=\arg\max_\theta f(\theta)$ (see Fig. 2(g)).
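Eq. (4) is a small transportation problem; a sketch that solves it directly as a linear program (the signature sizes and γ are illustrative, and SciPy's generic LP solver stands in for a dedicated EMD implementation):

```python
import numpy as np
from scipy.optimize import linprog

def emd_saturated(w1, c1, w2, c2, gamma=10.0):
    """EMD between two equal-mass color signatures (weights w, colors c)
    with the saturated ground distance d_ij = 1 - exp(-E_ij / gamma)."""
    n, m = len(w1), len(w2)
    E = np.linalg.norm(c1[:, None, :] - c2[None, :, :], axis=-1)  # E_ij
    d = 1.0 - np.exp(-E / gamma)                                  # d_ij, Eq. (3)
    # minimize sum_ij d_ij f_ij  s.t.  row sums = w1, column sums = w2, f >= 0
    A_eq = []
    for i in range(n):                       # mass leaving signature point i
        row = np.zeros((n, m)); row[i, :] = 1
        A_eq.append(row.ravel())
    for j in range(m):                       # mass arriving at signature point j
        col = np.zeros((n, m)); col[:, j] = 1
        A_eq.append(col.ravel())
    res = linprog(d.ravel(), A_eq=np.array(A_eq),
                  b_eq=np.concatenate([w1, w2]),
                  bounds=(0, None), method="highs")
    return res.fun
```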

Multi-scale representation. We extract the above-mentioned features at every point of an image and integrate them to detect contours. In order to deal with the scale-space problem [29], we obtain local features at different scales by changing the standard deviation of the image gradient and texture inhibition, the direction of the brightness and color gradient, and the standard deviation of the Gaussian derivative in the compass operator. These features provide rich descriptions of image details at different levels, thereby rendering a multi-scale representation. As edges can be extracted at different scales, several pixels on one edge have equally strong responses and should all be considered to describe contours. Thus, we extract local features at three different scales.

These features have been found empirically to be very useful for representing edge information. Image features such as the image gradient, texture inhibition, and brightness and color gradients are also used by most state-of-the-art contour detectors, such as Pb [13] and gPb [5,15].

3.1.2. Multi-scale global features

It has been shown that object contours can be better extracted by incorporating global information (e.g., the gPb method [5,15]) than by using only local visual cues (e.g., the Pb algorithm [13]). However, existing methods that exploit global information (e.g., gPb) are often time consuming. For efficiency and effectiveness, we incorporate global visual saliency [30] in our approach. Cheng et al. [30] present a simple and efficient saliency extraction algorithm based on region contrast, which exploits histogram contrast and spatial information. Each image is first segmented into regions, and the saliency value of a region is computed by measuring its color contrast to all other regions in the image: $S(r_k)=\sum_{r_k\neq r_i} w(r_i)\,D_r(r_k,r_i)$, where $w(r_i)$ is the weight of region $r_i$ and $D_r(\cdot,\cdot)$ is the color distance between the two regions. The weighting term increases the effect of closer regions and decreases that of farther regions. With this method, the distinctness of each pixel is described in a saliency map S.

Given a pixel I(x, y), we consider the local contrast of the saliency values with respect to its four neighbors, taking the maximum difference between its saliency value and those of its neighbors. With this saliency contrast (SC) feature, the difference of saliency values is maximized when the pixel lies right on the contour, thereby facilitating boundary detection.
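A minimal sketch of this SC feature, vectorized over the whole saliency map (edge padding at the borders is an assumption):

```python
import numpy as np

def saliency_contrast(S):
    """Max absolute saliency difference to the 4-connected neighbors."""
    P = np.pad(S, 1, mode='edge')          # replicate borders
    diffs = np.stack([
        np.abs(S - P[:-2, 1:-1]),          # up
        np.abs(S - P[2:, 1:-1]),           # down
        np.abs(S - P[1:-1, :-2]),          # left
        np.abs(S - P[1:-1, 2:]),           # right
    ])
    return diffs.max(axis=0)
```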

3.2. Segment-level features

While the pixel-level features described in Section 3.1 can be utilized to determine contour points, structural cues [31] such as

Fig. 6. Precision-recall curves on BSDS500 for different feature combinations.

segments [32] contain important information beyond pixel-wise evidence. Toward this goal, we compute superpixels to extract structural segments with the SLIC algorithm [33], which has been verified to perform well in terms of efficiency and effectiveness. SLIC clusters pixels in a five-dimensional space, including the three color channels of the CIELAB color space (lab) and two position values (xy), by introducing a new distance measure in this 5D space. After obtaining the segmentation result from SLIC, we use the edges of the superpixels as our segment-level features. The point features described in Section 3.1 are used to describe the edge pixels on each line fragment, and segment-level features are then extracted by computing their mean value, variance and differences.

Similar to the scale-space problem for edge detection, the image structure of a scene that can be exploited hinges on the number of superpixels. Fig. 3(a) shows that the recalls of contours increase as more superpixels are used, based on 300 training images from the BSDS500 data set. Fig. 3(b) shows the mean recalls of the same data set with different numbers of superpixels.

In this paper, we vary the number of superpixels (from 200 to 2000) to extract segments at different scales. Fig. 4 shows one example of how segments are extracted when 200 superpixels are used. From the superpixel results, edges can be extracted (Fig. 4(c) and (d)) based on the cluster value of each point (Fig. 4(b)) with respect to its neighborhood. When the pixels within the neighborhood of a point belong to more than two clusters, it indicates the existence of an endpoint (e.g., a point on the T-junctions or Y-junctions of Fig. 4(b)). On the other hand, when the pixels within the neighborhood of a point belong to exactly two clusters, it indicates the existence of a segment point. Thus, segments and endpoints can be extracted, as denoted by different colors in Fig. 4(e), for contour extraction. We determine whether a pixel belongs to a segment or not and then concatenate the segment-level features and the pixel-level features.

Fig. 7. Contour detection results with and without multi-feature extraction.
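The neighborhood rule above (exactly two clusters indicates a segment point, more than two an endpoint) can be sketched as follows; the 3×3 neighborhood is an assumption:

```python
import numpy as np

def classify_points(labels):
    """labels: superpixel label map. Returns a map with
    0 = interior, 1 = segment point (2 clusters in the 3x3 neighborhood),
    2 = endpoint (more than 2 clusters, e.g., T- or Y-junction)."""
    H, W = labels.shape
    out = np.zeros((H, W), dtype=int)
    for y in range(1, H - 1):
        for x in range(1, W - 1):
            n = len(np.unique(labels[y - 1:y + 2, x - 1:x + 2]))
            if n == 2:
                out[y, x] = 1
            elif n > 2:
                out[y, x] = 2
    return out
```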

At each point, as described in Section 3.1, 18 local (6 features at 3 scales) and 1 global pixel-level features are extracted. We compute the mean, variance, minimum and maximum values of these 19 features over all the points on a segment. In addition, we compute 4 local statistics (mean, variance, minimum and maximum values) in the neighborhood of the corresponding segment, and obtain a 76-dimensional feature. Fig. 4(f) shows the ground truth of the contour for comparison with the segments extracted by our method. One advantage of our approach is that edge thinning is not necessary; instead, we directly operate on pixels to extract segments. By controlling the number of generated superpixels, segment-level features at different scales can be obtained, and the smallest segment is a pixel itself.
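The per-segment statistics can be sketched as follows (the 19-column feature-matrix layout is an assumption):

```python
import numpy as np

def segment_statistics(point_feats):
    """point_feats: array of shape (n_points, 19), the pixel-level features
    of all points on one segment. Returns the 4 x 19 = 76-dimensional
    descriptor of per-feature mean, variance, minimum and maximum."""
    return np.concatenate([
        point_feats.mean(axis=0),
        point_feats.var(axis=0),
        point_feats.min(axis=0),
        point_feats.max(axis=0),
    ])
```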

4. Feature mappings for contour detection

Using both generative and discriminative models, we propose a generative-discriminative scheme to build feature mappings that extract more useful feature information (the second part of Fig. 1). In Section 4.1 we introduce a feature mapping based on generative-discriminative information, which we call posterior divergence feature mapping based on a Gaussian mixture model (PD-GMM). Deep learning has a strong ability to learn feature representations [34–36], using multiple processing layers to learn representations of data. In Section 4.2, we build a novel feature mapping to represent features, which we call sufficient statistics feature mapping based on a deep Boltzmann machine (SS-DBM).

4.1. Feature mapping via posterior divergence

To extract discriminative information from features, we propose a mapping method based on the log likelihood of a Gaussian mixture model (GMM) in which the parameters are estimated via their posterior divergence (PD) in an incremental expectation maximization (EM) formulation. The posterior divergence approach is a generative-discriminative scheme that uses one or a few samples to update the model in every iteration of the EM step, and has been shown to be effective in several tasks [37]. We transform the vectors formed by the point-level and segment-level features based on this mapping to obtain more discriminative information for contour detection. While our method bears some similarity to Li et al. [37], the generative models and the derivations of the feature maps are different.

Compared with the convolutional neural network (CNN) layers widely used in deep neural networks, the proposed feature mapping model involves fewer parameters and thus requires less training data. Moreover, the model can be readily trained with the EM method, which is empirically found straightforward.

Let $\mathbf{x}\in\mathbb{R}^D$ be the observed random variable. In the context of contour detection, x denotes the combination of multi-scale features. Let $\mathbf{z}=\{z_1,\ldots,z_K\}$ be the hidden variable, where $z_k=1$ if

Table 2
F-measures of contour detection on BSDS500 for different feature mappings.

                              ODS     OIS     AP
PD-GMM-RF                     0.72    0.74    0.78
SS-DBM-RF                     0.71    0.72    0.76
MG+DG+IT+CG+BG+CO+Multi       0.70    0.72    0.75


Fig. 8. Precision-recall curves on BSDS500 for different feature mappings.

the kth mixture center is selected to generate a sample, and $z_k=0$ otherwise. The joint distribution of the Gaussian mixture model can be expressed as

$$P(\mathbf{x},\mathbf{z}\mid\theta)=\prod_{k=1}^{K}N(\mathbf{x};u_k,\Sigma_k)^{z_k}\prod_{k=1}^{K}a_k^{z_k},\tag{5}$$

where $a=(a_1,\ldots,a_K)^{\top}$ are the mixture priors satisfying $a_k=E_{P(\mathbf{z})}[z_k]$; $u_k$ and $\Sigma_k$ are respectively the mean and covariance matrix of the kth mixture center.

For any observed sample $\mathbf{x}^t$, similar to Jordan et al. [38], we assume that the posterior distribution of z takes the same form as its prior P(z) but with a different parameter $g^t=(g^t_1,\ldots,g^t_K)^{\top}$:

$$Q^t(\mathbf{z})=\prod_{k=1}^{K}(g^t_k)^{z_k}.\tag{6}$$

With the above joint distribution and approximate posterior distribution, the free energy function F of the sample $\mathbf{x}^t$ can be formulated with variational learning [38]:

$$F(Q^t,\theta)=E_{Q^t(\mathbf{z})}\left[\sum_{k=1}^{K}z_k\left(\sum_{d=1}^{D}-\frac{(x^t_d-u_d)^2}{2\delta^2_d}-\log\sqrt{2\pi}^{\,D}\prod_{d=1}^{D}\delta^{D/2}_d\right)+\sum_{k=1}^{K}z_k\log\frac{g^t_k}{a_k}\right].\tag{7}$$

Let θ be the model estimated from a set of N − 1 training samples $X=\{\mathbf{x}^i\}_{i=1}^{N-1}$, and $\theta_{+t}$ be the model estimated from the set of N samples $X\cup\{\mathbf{x}^t\}$. The log likelihood of the EM algorithm for the input sample $\mathbf{x}^t$ is

$$L_t=\sum_{i=1}^{N}\left[-F(Q^i_{+t},\theta_{+t})\right]-\sum_{i\neq t}^{N}\left[-F(Q^i,\theta)\right]=\sum_{i=1}^{N}\left(-E_{Q^i_{+t}(\mathbf{z})}\left[\log\frac{Q^i_{+t}(\mathbf{z})}{P(\mathbf{x}^t\mid\mathbf{z},\theta_{+t})P(\mathbf{z}\mid\theta_{+t})}\right]\right)-\sum_{i\neq t}^{N}\left(-E_{Q^i(\mathbf{z})}\left[\log\frac{Q^i(\mathbf{z})}{P(\mathbf{x}^t\mid\mathbf{z},\theta)P(\mathbf{z}\mid\theta)}\right]\right).\tag{8}$$

After rearranging the random variables, we have

$$L_t=\underbrace{\left[\sum_{i=1}^{N}E_{Q^i_{+t}(\mathbf{z})}\log P(\mathbf{x}^t\mid\mathbf{z},\theta_{+t})-\sum_{i\neq t}^{N}E_{Q^i(\mathbf{z})}\log P(\mathbf{x}^t\mid\mathbf{z},\theta)\right]}_{x\text{-cross entropy}}+\underbrace{\left[\sum_{i=1}^{N}E_{Q^i_{+t}(\mathbf{z})}\log P(\mathbf{z}\mid\theta_{+t})-\sum_{i\neq t}^{N}E_{Q^i(\mathbf{z})}\log P(\mathbf{z}\mid\theta)\right]}_{z\text{-cross entropy}}-\underbrace{\left[\sum_{i=1}^{N}E_{Q^i_{+t}(\mathbf{z})}\log Q^i_{+t}(\mathbf{z})-\sum_{i\neq t}^{N}E_{Q^i(\mathbf{z})}\log Q^i(\mathbf{z})\right]}_{z\text{-entropy}},\tag{9}$$

where the cross-entropy terms measure the fitness of a sample to the random variables and the entropy term measures the uncertainty. Similar to Li et al. [37], we assume that $Q^i_{+t}(\mathbf{z})=Q^i(\mathbf{z})$, and thus have

$$L_t=\left[\underbrace{\sum_{i\neq t}^{N}E_{Q^i(\mathbf{z})}\log\frac{P(\mathbf{x}^t\mid\mathbf{z},\theta_{+t})}{P(\mathbf{x}^t\mid\mathbf{z},\theta)}}_{\Delta pd_x}+\underbrace{E_{Q^t(\mathbf{z})}\log P(\mathbf{x}^t\mid\mathbf{z},\theta_{+t})}_{\Delta fit_x}\right]+\left[\underbrace{\sum_{i\neq t}^{N}E_{Q^i(\mathbf{z})}\log\frac{P(\mathbf{z}\mid\theta_{+t})}{P(\mathbf{z}\mid\theta)}}_{\Delta pd_z}+\underbrace{E_{Q^t(\mathbf{z})}\log P(\mathbf{z}\mid\theta_{+t})}_{\Delta fit_z}-\underbrace{E_{Q^t(\mathbf{z})}\log Q^t(\mathbf{z})}_{\Delta ent_z}\right],\tag{10}$$

where the posterior divergence Δpd measures how much $\mathbf{x}^t$ affects the model, the fitness function Δfit measures how well the sample fits the model, and the entropy function Δent measures how uncertain the fitting is. The feature mappings given by the posterior divergence are derived as follows:

$$\Delta pd_x=\sum_{i\neq t}^{N}\sum_{k=1}^{K}\sum_{d=1}^{D}g^i_k\left(-\frac{(x^t_d-u_{d,+t})^2}{2\delta^2_{d,+t}}+\frac{(x^t_d-u_d)^2}{2\delta^2_d}+\delta^{D/2}_{d,+t}-\delta^{D/2}_d\right)=\sum_{d=1}^{D}\Delta pd_{x_d},\tag{11}$$

where $\Delta pd_x$ is further decomposed into D terms according to the dimensions of x, and $\Delta pd_{x_d}$ measures how $x_d$ affects the model. Similarly, we have:

$$\Delta fit_x=\sum_{k,d=1}^{K,D}g^t_k\left(-\frac{(x^t_d-u_{d,+t})^2}{2\delta^2_{d,+t}}+\delta^{D/2}_{d,+t}\log\sqrt{2\pi}\right)=\sum_{d=1}^{D}\Delta fit_{x_d},\tag{12}$$

Page 7: Contour detection via stacking random forest learningstatic.tongtianta.site/paper_pdf/9acd2b1e-e196-11e8-8224-00163e08bb86.pdfContour detection via stacking random forest learning

2708 C. Zhang et al. / Neurocomputing 275 (2018) 2702–2715

Table 3
F-measures of contour detection on BSDS500 for different contour detection algorithms.

BSDS500                  ODS    OIS    AP
Human                    0.80   0.80   –
SRFL                     0.73   0.76   0.79
RCF-MS [25]              0.81   0.82
HED [24]                 0.78   0.80   0.83
DeepContour [23]         0.76   0.78   0.80
DeepEdge [22]            0.75   0.77   0.81
SE+MS+SH [19]            0.75   0.77   0.80
SE-MS, T = 4 [18]        0.74   0.76   0.78
PD-GMM-RF                0.72   0.74   0.78
gPb [5]                  0.71   0.74   0.65
BEL [14]                 0.66   0.67   0.68
Canny [11]               0.60   0.63   0.58
Compass operator [12]    0.49   0.53   0.36

Fig. 9. Precision-recall curves on BSDS500 for different contour detection algorithms.


where $\Delta fit_{x_d}$ measures how well $x_d$ fits the model.

The feature mapping according to the hidden variable $z$ can be derived as follows:

$$
\Delta pd_z = \sum_{i \neq t}^{N} \sum_{k=1}^{K} g_k^i \log \frac{a_{k,+t}}{a_k} = \sum_{k=1}^{K} \Delta pd_{z_k}, \tag{13}
$$

where $\Delta pd_{z_k} = \sum_{i \neq t}^{N} g_k^i \log \frac{a_{k,+t}}{a_k}$.

$$
\Delta fit_z = \sum_{k=1}^{K} g_k^t \log a_{k,+t} = \sum_{k=1}^{K} \Delta fit_{z_k}, \tag{14}
$$

$$
\Delta ent_z = \sum_{k=1}^{K} g_k^t \log g_k^t = \sum_{k=1}^{K} \Delta ent_{z_k}. \tag{15}
$$

Therefore, for the input $x_t$, we obtain a set of feature mappings:

$$
\Phi_t = \mathrm{vec}\big(\{\Delta pd_{x_d}, \Delta fit_{x_d}, \Delta pd_{z_k}, \Delta fit_{z_k}, \Delta ent_{z_k}\}_{d,k}\big). \tag{16}
$$

To extract discriminative information from multi-scale features, we map the multi-scale features into this space via Eq. (16), instead of simply stacking the features into a long vector. The reasons for using this mapping are twofold. First, the feature mapping includes a data normalization procedure, which reduces the metric differences among different features. The normalization is carried out by $(x_d^t - u_d)^2 / 2\delta_d^2$, with which the derived feature mapping responds only to relative quantities with respect to the mean and variance. Second, the feature mapping exploits the hidden variable $z$, which encodes additional information, i.e., the cluster or mixture center, which is informative in image representation (e.g., bag-of-words).
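As an illustration, the fitness and entropy components of Eqs. (12), (14) and (15) can be sketched for a fitted diagonal Gaussian mixture in plain NumPy. This is a simplified sketch: the posterior-divergence terms $\Delta pd$, which require the updated parameters $\theta_{+t}$ obtained from one incremental EM step, are omitted, and all model values below are toy assumptions rather than the paper's trained model.

```python
import numpy as np

def gmm_feature_mapping(x, means, variances, weights):
    """Map a sample x to fit/entropy features of a diagonal GMM
    (a simplified sketch of Eqs. (12), (14), (15); the Delta-pd terms,
    which need theta_{+t} from an incremental EM step, are omitted)."""
    K, D = means.shape
    # log N(x | u_k, diag(var_k)) for each component k
    log_norm = -0.5 * (np.sum((x - means) ** 2 / variances, axis=1)
                       + np.sum(np.log(2 * np.pi * variances), axis=1))
    log_g = np.log(weights) + log_norm
    log_g -= np.logaddexp.reduce(log_g)      # normalize responsibilities
    g = np.exp(log_g)                        # g_k^t
    # per-dimension fitness: sum_k g_k * (-(x_d - u_{k,d})^2 / (2 var_{k,d}))
    fit_x = g @ (-(x - means) ** 2 / (2 * variances))   # shape (D,)
    fit_z = g * np.log(weights)              # Delta fit_{z_k}, cf. Eq. (14)
    ent_z = g * log_g                        # Delta ent_{z_k}, cf. Eq. (15)
    return np.concatenate([fit_x, fit_z, ent_z])        # vec(...), cf. Eq. (16)

# toy 2-component, 3-dimensional mixture (hypothetical parameters)
means = np.array([[0.0, 0.0, 0.0], [2.0, 2.0, 2.0]])
variances = np.ones((2, 3))
weights = np.array([0.5, 0.5])
phi = gmm_feature_mapping(np.array([0.1, -0.2, 0.3]), means, variances, weights)
```

The resulting vector has $D + 2K$ entries here; in the full mapping the $\Delta pd$ terms are concatenated as well.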

4.2. Sufficient statistics feature mapping based on deep Boltzmann machine

A restricted Boltzmann machine (RBM) [39] is a generative stochastic neural network with a two-layer architecture: one visible layer and one hidden layer.

By combining multiple RBMs, the multi-layer deep Boltzmann machine (DBM) is constructed [40], as shown in Fig. 5. For example, the energy of the joint configuration $(v, h)$ of a three-layer DBM is given by:

$$
E(v, h \mid \theta) = -v^{\top} W^{1} h^{1} - (h^{1})^{\top} W^{2} h^{2} - (h^{2})^{\top} W^{3} h^{3} \tag{17}
$$

where $v = (v_1, v_2, \dots, v_m)$ represents the observable units; $h^1 = (h_1, h_2, \dots, h_n)$, $h^2 = (h_1, h_2, \dots, h_l)$ and $h^3 = (h_1, h_2, \dots, h_k)$ are the hidden units; $h = [h^1, h^2, h^3]$; $W^1$, $W^2$ and $W^3$ are the real-valued weights between adjacent layers; and $\theta = [W^1, W^2, W^3]$ are the model parameters. The joint distribution is:

$$
P(v, h \mid \theta) = \frac{1}{Z(\theta)} \exp(-E(v, h \mid \theta)) \tag{18}
$$

where

$$
Z(\theta) = \sum_{v, h} \exp(-E(v, h \mid \theta))
$$
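To make Eqs. (17) and (18) concrete, a brute-force sketch for a tiny binary three-layer DBM is given below. Enumerating all states to compute $Z(\theta)$ exactly is feasible only at toy sizes; the layer sizes and random weights here are assumptions for illustration.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
m, n, l, k = 3, 2, 2, 2                      # toy layer sizes
W1 = rng.normal(0, 0.1, (m, n))
W2 = rng.normal(0, 0.1, (n, l))
W3 = rng.normal(0, 0.1, (l, k))

def energy(v, h1, h2, h3):
    """E(v,h|theta) = -v^T W1 h1 - h1^T W2 h2 - h2^T W3 h3, cf. Eq. (17)."""
    return -(v @ W1 @ h1 + h1 @ W2 @ h2 + h2 @ W3 @ h3)

def states(d):
    """All binary configurations of a layer with d units."""
    return [np.array(s) for s in itertools.product([0, 1], repeat=d)]

# brute-force partition function Z(theta) = sum_{v,h} exp(-E), cf. Eq. (18)
Z = sum(np.exp(-energy(v, h1, h2, h3))
        for v in states(m) for h1 in states(n)
        for h2 in states(l) for h3 in states(k))

def joint_prob(v, h1, h2, h3):
    """P(v,h|theta) = exp(-E(v,h|theta)) / Z(theta)."""
    return np.exp(-energy(v, h1, h2, h3)) / Z

# sanity check: the joint distribution sums to one over all configurations
total = sum(joint_prob(v, h1, h2, h3)
            for v in states(m) for h1 in states(n)
            for h2 in states(l) for h3 in states(k))
```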

To obtain the sufficient statistics feature mapping [41] of the multi-layer DBM, the joint distribution $P(v, h \mid \theta)$ can be rewritten in the exponential-family form:

$$
P(v, h \mid \theta) = \frac{1}{Z(\theta)} \exp(-E(v, h \mid \theta)) = \exp(\alpha(\theta)^{\top} T(v, h) + A(\theta)) \tag{19}
$$

where $\alpha(\theta)$ is a vector-valued function, $T(v, h)$ is the sufficient statistics, and $A(\theta)$ is a scalar function. Substituting Eq. (17) into Eq. (19), after a few simple derivation steps:

$$
\alpha(\theta)^{\top} T(v, h) = v^{\top} W^{1} h^{1} + (h^{1})^{\top} W^{2} h^{2} + (h^{2})^{\top} W^{3} h^{3} \tag{20}
$$

and $A(\theta) = -\ln(Z(\theta))$.

Owing to $P(v, h) = P(v \mid h) P(h)$, $P(h)$ can also be written in exponential-family form. Let $Q(h^t)$ be the approximate distribution of the true posterior $P(h)$; the energy function of $Q(h^t)$ is:

$$
E_h(h^t \mid \theta_h) = -(h^{1t})^{\top} W_h^{1t} h^{2t} - (h^{2t})^{\top} W_h^{2t} h^{3t} \tag{21}
$$

where $\theta_h = [W_h^{1t}, W_h^{2t}]$ are the model parameters. Therefore $Q(h^t)$ can be derived as follows:

$$
Q(h^t \mid \theta_h) = \frac{1}{Z(\theta_h)} \exp(-E_h(h^t)) \tag{22}
$$

Similar to Eq. (19), $Q(h^t \mid \theta_h)$ can also be expressed in the exponential form:

$$
Q(h^t \mid \theta_h) = \exp(\alpha(\theta_h)^{\top} T(h^t) + A(\theta_h)) \tag{23}
$$

Combining Eqs. (21)–(23), the sufficient statistics and the scalar function of $Q(h^t)$ are derived:

$$
\alpha(\theta_h)^{\top} T(h^t) = (h^{1t})^{\top} W_h^{1t} h^{2t} + (h^{2t})^{\top} W_h^{2t} h^{3t} \tag{24}
$$

Fig. 10. Sample experimental results from the Berkeley segmentation data set (columns (a)–(d); rows: input image, ground truth, SRFL, PD-GMM-RF, gPb [5], BEL [14], Canny [11], compass operator [12], texture inhibition [16]).



Having the above formulas, the free energy lower bound of $\log P(x^t; \theta)$ [41,42] can be expressed as:

$$
\begin{aligned}
F^t(Q, \theta) &= E_{Q(h^t)}\big[\alpha(\theta)^{\top} T(v^t, h^t) + A(\theta) - \alpha(\theta_h)^{\top} T(h^t) - A(\theta_h)\big] \\
&= E_{Q(h^t)}\big[(v^t)^{\top} W^{1} h^{1t} + (h^{1t})^{\top} W^{2} h^{2t} + (h^{2t})^{\top} W^{3} h^{3t} \\
&\quad\; - (h^{1t})^{\top} W_h^{1t} h^{2t} - (h^{2t})^{\top} W_h^{2t} h^{3t} - \ln(Z(\theta)) + \ln(Z(\theta_h))\big]
\end{aligned} \tag{25}
$$

Define the reshape (vectorization) operator $R$ and the product-stacking operator $\otimes$:

$$
R(W) = [W_{11}, W_{12}, \dots, W_{1n}, \dots, W_{mn}]^{\top} \tag{26}
$$

$$
v \otimes h = [v_1 h_1, v_1 h_2, \dots, v_1 h_n, \dots, v_m h_n]^{\top} \tag{27}
$$

Reformatting Eq. (25) with the operators $R$ and $\otimes$:

$$
\begin{aligned}
F^t(Q, \theta) &= E_{Q(h^t)}\big[(W_R^1)^{\top}(v^t \otimes h^{1t}) + (W_R^2)^{\top}(h^{1t} \otimes h^{2t}) + (W_R^3)^{\top}(h^{2t} \otimes h^{3t}) \\
&\quad\; - (W_{hR}^{1t})^{\top}(h^{1t} \otimes h^{2t}) - (W_{hR}^{2t})^{\top}(h^{2t} \otimes h^{3t}) - \ln(Z(\theta)) + \ln(Z(\theta_h))\big] \\
&= (W_R^1)^{\top} E_{Q(h^t)}[v^t \otimes h^{1t}] + (W_R^2)^{\top} E_{Q(h^t)}[h^{1t} \otimes h^{2t}] + (W_R^3)^{\top} E_{Q(h^t)}[h^{2t} \otimes h^{3t}] \\
&\quad\; - (W_{hR}^{1t})^{\top} E_{Q(h^t)}[h^{1t} \otimes h^{2t}] - (W_{hR}^{2t})^{\top} E_{Q(h^t)}[h^{2t} \otimes h^{3t}] - \ln Z(\theta) + \ln Z(\theta_h) \\
&= \eta^{\top} E_{Q(h^t)}[\phi(v^t, h^t)] = \eta^{\top} \Phi(v^t)
\end{aligned} \tag{28}
$$

where $W_R^i = R(W^i)$ and $W_{hR}^{it} = R(W_h^{it})$.

where the vector $\eta$ depends only on the parameters $\theta$:

$$
\eta = \big(W_R^1, W_R^2, W_R^3, W_{hR}^{1t}, W_{hR}^{2t}, -\ln Z(\theta), \ln Z(\theta_h)\big)^{\top} \tag{29}
$$

and the vector

$$
\phi(v^t, h^t) = \big(v^t \otimes h^{1t},\; h^{1t} \otimes h^{2t},\; h^{2t} \otimes h^{3t},\; h^{1t} \otimes h^{2t},\; h^{2t} \otimes h^{3t},\; 1,\; 1\big)^{\top}
$$

is a function of $v^t$ and $h^t$. Therefore the feature mapping takes the following form:

$$
\Phi(v^t) = E_{Q(h^t)}[\phi(v^t, h^t)] \tag{30}
$$

Eq. (30) is regarded as a sufficient statistics feature mapping because it is constructed from $T(v, h)$ and $T(h)$.
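A minimal sketch of the mapping in Eq. (30) is shown below, assuming a fully factorized (mean-field) $Q(h)$ with layer means $\mu^1, \mu^2, \mu^3$, under which the expectations of the pairwise products factorize, e.g. $E_Q[h^1 \otimes h^2] = \mu^1 \otimes \mu^2$. The layer sizes and mean values are toy assumptions for illustration.

```python
import numpy as np

def ss_dbm_feature(v, mu1, mu2, mu3):
    """Sufficient statistics feature mapping Phi(v) = E_Q[phi(v,h)],
    cf. Eq. (30), under a fully factorized Q(h) with means mu1, mu2, mu3.
    Each pairwise expectation is the vectorized outer product of means."""
    parts = [np.outer(v, mu1).ravel(),    # E_Q[v   (x) h^1]
             np.outer(mu1, mu2).ravel(),  # E_Q[h^1 (x) h^2]
             np.outer(mu2, mu3).ravel(),  # E_Q[h^2 (x) h^3]
             np.outer(mu1, mu2).ravel(),  # repeated for the Q-side terms
             np.outer(mu2, mu3).ravel(),
             np.array([1.0, 1.0])]        # constants paired with the log Z entries of eta
    return np.concatenate(parts)

# toy visible vector and mean-field hidden means (hypothetical values)
v = np.array([1.0, 0.0, 1.0])
mu1, mu2, mu3 = np.array([0.6, 0.4]), np.array([0.5, 0.5]), np.array([0.3, 0.7])
phi_v = ss_dbm_feature(v, mu1, mu2, mu3)
```

The resulting $\Phi(v)$ is the fixed-length feature vector fed to the random forest in the next section.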

The advantages of our proposed approach are threefold. First, it can adapt well to the data, an ability inherited from probabilistic models. Second, it is able to exploit generative information, such as the data distribution and hidden information, for detection. Third, it comprehensively summarizes both the visible units and

Table 4
F-measures of contour detectors on WHD and WSD.

(a) Weizmann horse data set
                         ODS    OIS    AP
SRFL                     0.65   0.66   0.64
gPb [5]                  0.56   0.58   0.47
BEL [14]                 0.52   0.54   0.50
Compass operator [12]    0.35   0.36   0.23

(b) Weizmann segmentation data set
                         ODS    OIS    AP
SRFL                     0.59   0.63   0.56
gPb [5]                  0.54   0.58   0.45
BEL [14]                 0.46   0.46   0.39
Compass operator [12]    0.23   0.25   0.09


hidden units of the DBM, and thus fully exploits the DBM, while previous approaches only use a certain layer of units.

We first extract useful features related to contour detection, and then use a random forest to train a model to detect contours; we believe that better features lead to much better contour detection results. Hence most of this paper focuses on features, including the basic features and the feature mappings. These features have been found empirically very useful for representing edge information, as have the two unsupervised feature mappings, PD-GMM and SS-DBM. Some empirical studies (which have been cited) appear in Zhang et al. [9,10].

It is true that unsupervised learning features may not perform as well as supervised learning features on specific tasks. On the other hand, supervised learning may suffer from overfitting (when the training and testing data differ substantially) or underfitting (when labeled data is insufficient, especially considering that pixel-level labeling is really tedious). We strike a balance by adopting unsupervised learning in the low-level feature extraction and mapping stages, while infusing supervised learning via the stacking random forest. The performance of this design is empirically verified by our experiments. Also, there is prior art showing the strength of our adopted features [37].

5. Stacking random forest learning for contour detection

A random forest [43] is an ensemble classifier consisting of numerous decision trees, where the class label is determined by the mode of the outputs of the individual trees. Random forest algorithms have been shown to deal with large amounts of data points effectively and efficiently. The Gini ratio [44] can be used to split the training examples so that the descendant nodes are "purer" than their parents.
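The Gini-based split criterion can be illustrated with a small sketch; the label arrays below are hypothetical contour (1) vs. non-contour (0) pixel labels, not data from the paper.

```python
import numpy as np

def gini(labels):
    """Gini impurity 1 - sum_c p_c^2 of a label array."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def gini_gain(parent, left, right):
    """Impurity decrease of splitting `parent` into `left` and `right`;
    a split is useful when the weighted child impurity is lower, i.e.
    the children are 'purer' than the parent."""
    n = len(parent)
    return (gini(parent)
            - (len(left) / n) * gini(left)
            - (len(right) / n) * gini(right))

# hypothetical pixel labels split by some feature threshold
parent = np.array([0, 0, 0, 0, 1, 1, 1, 1])
left, right = np.array([0, 0, 0, 1]), np.array([0, 1, 1, 1])
gain = gini_gain(parent, left, right)   # positive: the children are purer
```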

Concurrently, a recent forest model [45] has shown promising performance compared with deep neural networks on various tasks from different domains, while being more scalable and efficient than current deep neural network based models. Our approach follows a similar stacking design principle for contour detection; the framework is shown in Fig. 1. First we use the image feature extractors in Section 3 to obtain the basic features a. Then we map those features with the two feature mappings in Section 4 to get more informative features: we use the posterior divergence feature mapping based on the Gaussian mixture model (PD-GMM) in Section 4.1 to get feature part b, and the sufficient statistics feature mapping based on the deep Boltzmann machine (SS-DBM) in Section 4.2 to get feature part c. Next we use four random forests to detect contours based on feature parts b and c, and collect their results to form two vectors d and e as new features: d holds the global contour results from the four random forest contour detectors, a list of per-pixel contour probability values over the image, so d forms a vector; similarly, e forms another vector. We then concatenate the four feature parts b, c, d and e into a final feature f. Finally, we train a random forest on this feature to obtain the final contour detector.
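The stacking design above can be sketched with scikit-learn's RandomForestClassifier. This is a toy sketch, not the paper's implementation: the mapped feature parts b and c are synthetic stand-ins, and two first-layer sub-forests are used instead of the four in the paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n_pixels = 200
y = rng.integers(0, 2, n_pixels)                 # contour / non-contour labels
b = rng.normal(size=(n_pixels, 8)) + y[:, None]  # stand-in for PD-GMM features
c = rng.normal(size=(n_pixels, 6)) + y[:, None]  # stand-in for SS-DBM features

# first layer: separately supervised sub-forests on each feature part
layer1 = [RandomForestClassifier(n_estimators=20, random_state=i).fit(X, y)
          for i, X in enumerate([b, c])]
# d, e: per-pixel contour probabilities output by the first-layer detectors
d, e = (rf.predict_proba(X)[:, 1:] for rf, X in zip(layer1, [b, c]))

# second layer: final forest on the concatenated feature f = [b, c, d, e]
f = np.hstack([b, c, d, e])
final_rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(f, y)
contour_prob = final_rf.predict_proba(f)[:, 1]   # per-pixel contour probability
```

The design choice mirrors classic stacking: the first-layer forests act as base learners, and their probability outputs become extra features for the second-layer ensemble.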

This stacking random forest learning can be seen as a two-layer framework for contour detection. In the training stage, to ensure that each sub-forest gains useful contour information, we train each sub-forest separately with a supervised learning strategy. Each sub-forest in the first layer can thus be seen as a base learner for the second layer's ensemble learning. The output of each first-layer random forest is a contour detection result, i.e., a list of per-pixel contour probability values over the image. We then train the final random forest for contour detection in the second layer of our learning architecture by leveraging both the first-layer results and the features coming directly from the feature mappings of the previous stage. After all the random forests are trained, we can use them directly to detect contours with the tuned parameters.


Fig. 11. Precision-recall curves for different contour detection algorithms. (a) WHD data set. (b) WSD data set.


6. Experiments

We evaluate the proposed algorithm for contour detection on several data sets. The Berkeley segmentation data set [5] includes 500 images of 481 × 321 pixels with human-labeled segmentation results. For fair comparisons, we use 300 images for training and the remaining ones for testing [5]. The pixels on the ground-truth contours of the training set are used as positive examples, whereas other pixels are used as negative examples. All experiments are performed on a machine with a 3.10 GHz CPU and 8 GB memory.

We use the precision-recall curve with respect to human-labeled ground truth as the measurement, together with the F-measure computed as $\frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$. As the segmented regions and contours are determined by a choice of scale, we first use a training set to determine the optimal data set (ODS) scale and fix it for all the test images [5]. We also evaluate the performance at the optimal image scale (OIS) for each image, as well as the average precision (AP) over the full recall range [5].
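The F-measure above is the harmonic mean of precision and recall, e.g.:

```python
def f_measure(precision, recall):
    """F = 2*P*R / (P + R), the harmonic mean used for ODS/OIS scoring."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# when precision equals recall, F equals that common value
score = f_measure(0.73, 0.73)
```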

We first evaluate different combinations of the basic features of Section 3 for contour detection. The image features in Fig. 2 include the magnitude of gradient (MG), direction of gradient (DG), inhibition term (IT), brightness gradient (BG), color gradient (CG), compass operator (CO) and the multi-scale representation; we train a random forest classifier on these features to detect contours. Table 1 and Fig. 6 show the F-measures of contour detection using the different feature combinations of Section 3.1. The results show that contours can be better detected by adding more features. Fig. 7 shows one contour detection result using the basic features.

Then we compare the two feature mapping algorithms of Section 4 with the basic features of Section 3, as shown in Fig. 1. We first use a random forest to train a contour detector with the features of Section 3. We then train one contour detector using a random forest with the posterior divergence feature mapping based on the Gaussian mixture model (PD-GMM-RF)² of Section 4.1, and another using a random forest with the sufficient statistics feature mapping based on the deep Boltzmann machine (SS-DBM-RF)³ of Section 4.2. Table 2 and Fig. 8 show the F-measures of the contour detection results for the different feature mappings. The results show the effectiveness of our feature mapping algorithms.

² PD-GMM-RF is in fact the method presented in the conference version of this paper [9], where it is titled MCDRF for short.

We evaluate the proposed stacking random forest learning algorithm (SRFL) for contour detection against other methods, including the compass filter [12], Canny edge detector [11], gPb [5,15], BEL [14], SE-MS [18] and SE+MS+SH [19], as well as deep learning methods such as RCF-MS [25], HED [24], DeepContour [23] and DeepEdge [22], on the BSDS500 data set. Table 3 shows the F-measures at different thresholds. The deep neural network methods [22–25] achieve the highest accuracy because they allow more learning capacity, at the cost of more labeled training data and computational overhead. In 2012, CDRF [8] proved the effectiveness of detecting contours using random forests. In 2013, Dollár and Zitnick [18,19] began to use a structured learning approach to detect edges, and SE [18,19] also performs better than our method.

Fig. 9 shows the precision-recall curves with respect to human-labeled ground truth. The proposed SRFL algorithm achieves higher F-measures and average precision than the compass filter [12], Canny edge detector [11], gPb [5,15] and BEL [14]. While the gPb method performs well in terms of accuracy, its computational load is significant: on average, the proposed SRFL method is about 6 times faster than the gPb method (30 and 180 s in MATLAB, respectively). SRFL is slower than RCF, but much faster than gPb.

Fig. 10 shows the contour detection results of all the evaluated algorithms. We note that both the gPb and SRFL methods are able to extract object contours with few spurious edges. In addition, our method captures more contours and details than the gPb method: Fig. 10(a) shows that gPb misses some windows of the building, and its contours on the roof are not as clear or sharp as those generated by the proposed method. The results of the Canny detector, run without non-maximum suppression and hysteresis, are noisy (consistent with the results shown in Arbelaez et al. [5]), as the Gaussian kernel width is set based on ODS and OIS. The results of

³ SS-DBM-RF is in fact the method presented in the conference version of this paper [10], except that we use multiple features here rather than only the basic six image features; there it is titled SSDBM for short.


Fig. 12. Sample experimental results on the Weizmann horse database (columns (a)–(d); rows: input image, ground truth, SRFL, gPb, BEL, Canny, compass operator, texture inhibition).


Fig. 13. Visual experimental results on the Weizmann segmentation database (columns (a)–(d); rows: input image, ground truth, SRFL, gPb, BEL, Canny, compass operator, texture inhibition).


the compass filter are based on the code provided by Ruzon and

Tomasi [12] with default parameters.

We also carry out experiments on the Weizmann horse database (WHD) [6] and the Weizmann segmentation database (WSD) [7]. The Weizmann horse database contains 328 side-view color images with manually segmented results; it covers horses of different breeds, colors and sizes in various scenes. The Weizmann segmentation database contains 200 color images with manual segmentation results by several subjects. The images in this data set contain only one or two salient objects with relatively simple backgrounds, and the foreground objects differ significantly from the background in intensity, texture, or other low-level cues.

We note that all the parameters of the proposed SRFL algorithm are fixed across the experiments on the three data sets. The proposed algorithm is evaluated against the gPb [5], compass operator [12] and BEL [14] methods. Although these databases were developed for segmentation evaluation, we extract contours using the same approach as on BSDS500 [5] to compute F-measures and precision-recall curves. In the Weizmann segmentation data set, the images are of different sizes, some small and some very large. The gPb method requires a huge amount of memory and much time to detect contours; in comparison, our method requires less memory and less time, and can handle large images. Table 4 shows that the F-measures and average precision of the SRFL algorithm are greater than those of the gPb [5], compass operator [12] and BEL [14] methods. Fig. 11 shows that the precision-recall curves of SRFL are significantly better than those of the gPb, compass operator and BEL methods.

In addition, Figs. 12 and 13 show that the contours extracted by the proposed SRFL algorithm are visually better than those extracted by the other methods. As with the Weizmann horse data set, the contours extracted from the segmentation database show that the proposed SRFL algorithm outperforms the other methods, with sharper details and fewer spurious edges.

7. Conclusion

In this paper, we propose an efficient and effective algorithm for contour detection based on a stacking random forest learning framework applied to features mapped from multi-scale local and global image features. The model parameters of the feature space are learned incrementally from the posterior divergence of the log-likelihood of a Gaussian mixture model and from sufficient statistics based on a deep Boltzmann machine. We use the posterior divergence based on the Gaussian mixture model and the sufficient statistics based on the deep Boltzmann machine to exploit more information, rather than using raw features with a random forest classifier directly. The proposed approach is evaluated qualitatively and quantitatively on three benchmark data sets against several state-of-the-art methods. Experimental results demonstrate that the proposed algorithm performs favorably against leading methods for contour detection. Our future work includes more effective contour completion algorithms. In addition, we will develop efficient algorithms for object recognition based on contours.

Acknowledgment

This research is sponsored by the National Natural Science Foundation of China (Nos. 61571049, 61601033, 61401029, 11401028, 61472044). The authors are thankful to the anonymous reviewers for their valuable discussion and feedback.

References

[1] G. Papari, N. Petkov, Edge and line oriented contour detection: state of the art, Image Vis. Comput. 29 (2–3) (2011) 79–103.
[2] X. Bai, L.J. Latecki, W.-Y. Liu, Skeleton pruning by contour partitioning with discrete curve evolution, IEEE Trans. Pattern Anal. Mach. Intell. 29 (3) (2007).
[3] W. Shen, X. Wang, C. Yao, X. Bai, Shape recognition by combining contour and skeleton into a mid-level representation, in: Proceedings of the Chinese Conference on Pattern Recognition, Springer, 2014, pp. 391–400.
[4] M.A. Waris, A. Iosifidis, M. Gabbouj, CNN-based edge filtering for object proposals, Neurocomputing 266 (2017) 631–640.
[5] P. Arbelaez, M. Maire, C. Fowlkes, J. Malik, Contour detection and hierarchical image segmentation, IEEE Trans. Pattern Anal. Mach. Intell. 33 (5) (2011) 898–916.
[6] E. Borenstein, S. Ullman, Learning to segment, in: Computer Vision – ECCV, 2004, pp. 315–328.
[7] S. Alpert, M. Galun, A. Brandt, R. Basri, Image segmentation by probabilistic bottom-up aggregation and cue integration, IEEE Trans. Pattern Anal. Mach. Intell. 34 (2) (2012) 315–327.
[8] C. Zhang, X. Ruan, Y. Zhao, M.-H. Yang, Contour detection via random forest, in: Proceedings of the Twenty-First International Conference on Pattern Recognition (ICPR), IEEE, 2012, pp. 2772–2775.
[9] C. Zhang, X. Li, X. Ruan, Y. Zhao, M.-H. Yang, Discriminative generative contour detection, in: Proceedings of the British Machine Vision Conference, 2013.
[10] C. Zhang, X. Li, J. Yan, S. Qui, Y. Wang, C. Tian, Y. Zhao, Sufficient statistics feature mapping over deep Boltzmann machine for detection, in: Proceedings of the Twenty-Second International Conference on Pattern Recognition (ICPR), IEEE, 2014, pp. 827–832.
[11] J. Canny, A computational approach to edge detection, IEEE Trans. Pattern Anal. Mach. Intell. 8 (6) (1986) 679–698.
[12] M.A. Ruzon, C. Tomasi, Color edge detection with the compass operator, in: IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, 1999, pp. 2160–2166.
[13] D.R. Martin, C.C. Fowlkes, J. Malik, Learning to detect natural image boundaries using local brightness, color, and texture cues, IEEE Trans. Pattern Anal. Mach. Intell. 26 (1) (2004) 530–549.
[14] P. Dollar, Z. Tu, S. Belongie, Supervised learning of edges and object boundaries, in: Proceedings of Computer Vision and Pattern Recognition, 2006.
[15] M. Maire, P. Arbelaez, C.C. Fowlkes, J. Malik, Using contours to detect and localize junctions in natural images, in: Proceedings of Computer Vision and Pattern Recognition, 2008.
[16] G. Papari, N. Petkov, An improved model for surround suppression by steerable filters and multilevel inhibition with application to contour detection, Pattern Recognit. 44 (9) (2011) 1999–2007.
[17] C. Yao, W. Shen, X. Bai, W. Liu, Class-specific object contour detection by iteratively combining context information, in: Proceedings of the Eighth International Conference on Information, Communications and Signal Processing (ICICS), IEEE, 2011, pp. 1–5.
[18] P. Dollár, C.L. Zitnick, Structured forests for fast edge detection, in: Proceedings of the IEEE International Conference on Computer Vision, 2013, pp. 1841–1848.
[19] P. Dollár, C.L. Zitnick, Fast edge detection using structured forests, IEEE Trans. Pattern Anal. Mach. Intell. 37 (8) (2015) 1558–1570.
[20] A. Krizhevsky, I. Sutskever, G.E. Hinton, ImageNet classification with deep convolutional neural networks, in: Proceedings of Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.
[21] H. Xu, J. Yan, N. Persson, W. Lin, H. Zha, Fractal dimension invariant filtering and its CNN-based implementation, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 3825–3833.
[22] G. Bertasius, J. Shi, L. Torresani, DeepEdge: a multi-scale bifurcated deep network for top-down contour detection, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 4380–4389.
[23] W. Shen, X. Wang, Y. Wang, X. Bai, Z. Zhang, DeepContour: a deep convolutional feature learned by positive-sharing loss for contour detection, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2015, pp. 3982–3991.
[24] S. Xie, Z. Tu, Holistically-nested edge detection, in: Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 1395–1403.
[25] Y. Liu, M.-M. Cheng, X. Hu, K. Wang, X. Bai, Richer convolutional features for edge detection, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[26] X. Wang, X. Duan, X. Bai, Deep sketch feature for cross-domain image retrieval, Neurocomputing 207 (2016) 387–397.
[27] B. Yang, X. Zhang, L. Chen, H. Yang, Z. Gao, Edge guided salient object detection, Neurocomputing 221 (2017) 60–71.
[28] Y. Rubner, C. Tomasi, L. Guibas, A metric for distributions with applications to image databases, in: Sixth International Conference on Computer Vision, 1998, pp. 59–66.
[29] T. Lindeberg, Scale-Space Theory in Computer Vision, Springer, 1993.
[30] M.-M. Cheng, G.-X. Zhang, N.J. Mitra, X. Huang, S.-M. Hu, Global contrast based salient region detection, IEEE Trans. Pattern Anal. Mach. Intell. (2011) 409–416.
[31] X. Bai, S. Bai, Z. Zhu, L.J. Latecki, 3D shape matching via two layer coding, IEEE Trans. Pattern Anal. Mach. Intell. 37 (12) (2015) 2361–2373.
[32] Y. Zhao, J. Zhao, J. Yang, Y. Liu, Y. Zhao, Y. Zheng, L. Xia, Y. Wang, Saliency driven vasculature segmentation with infinite perimeter active contour model, Neurocomputing 259 (2017) 201–209.



[33] R. Achanta, K. Smith, A. Lucchi, P. Fua, S. Süsstrunk, SLIC Superpixels, Technical Report No. 149300, EPFL, 2010.
[34] G.E. Hinton, S. Osindero, Y.-W. Teh, A fast learning algorithm for deep belief nets, Neural Comput. 18 (7) (2006) 1527–1554.
[35] N. Srivastava, R.R. Salakhutdinov, Multimodal learning with deep Boltzmann machines, in: Proceedings of Advances in Neural Information Processing Systems, 2012, pp. 2222–2230.
[36] Y. LeCun, Y. Bengio, G. Hinton, Deep learning, Nature 521 (7553) (2015) 436–444.
[37] X. Li, T.S. Lee, Y. Liu, Hybrid generative-discriminative classification using posterior divergence, in: Proceedings of the 2011 IEEE Conference on Computer Vision and Pattern Recognition, 2011, pp. 2713–2720.
[38] M.I. Jordan, Z. Ghahramani, T.S. Jaakkola, L.K. Saul, An introduction to variational methods for graphical models, Mach. Learn. 37 (1999) 183–233.
[39] P. Smolensky, Information processing in dynamical systems: foundations of harmony theory, in: Parallel Distributed Processing: Explorations in the Microstructure of Cognition, vol. 1, MIT Press, 1986, pp. 194–281.
[40] R. Salakhutdinov, G. Hinton, Deep Boltzmann machines, in: Proceedings of the International Conference on Artificial Intelligence and Statistics, vol. 5, 2009, pp. 448–455.
[41] X. Li, B. Wang, Y. Liu, T.S. Lee, Learning discriminative sufficient statistics score space for classification, in: Proceedings of ECML/PKDD, vol. 8190, 2013, pp. 49–64.
[42] R. Neal, G. Hinton, A view of the EM algorithm that justifies incremental, sparse, and other variants, in: Learning in Graphical Models, Springer, Dordrecht, 1999, pp. 355–368.
[43] L. Breiman, Random forests, Mach. Learn. 45 (1) (2001) 5–32.
[44] L.E. Raileanu, K. Stoffel, Theoretical comparison between the Gini index and information gain criteria, Ann. Math. Artif. Intell. 41 (1) (2004) 77–93.
[45] Z.-H. Zhou, J. Feng, Deep forest: towards an alternative to deep neural networks, in: Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence (IJCAI-17), 2017, pp. 3553–3559, doi:10.24963/ijcai.2017/497.

Chao Zhang is currently pursuing the Ph.D. degree with the College of Information Science and Technology, Beijing Normal University, Beijing, China. He received the M.S. and B.E. degrees from the Department of Automation, Shanghai Jiao Tong University, Shanghai, China. His research interests are in computer vision and machine learning.

Junchi Yan is with the Shanghai Key Laboratory of Trustworthy Computing and the School of Computer Science and Software Engineering, East China Normal University, and with IBM Research – China. He obtained his Ph.D. from Shanghai Jiao Tong University. He has been entitled IBM Master Inventor, and is the recipient of the IBM Research Division Award, the China Computer Federation Doctoral Dissertation Award, and the ACM China Doctoral Dissertation Nomination Award. His research interests are computer vision and machine learning. He is a member of IEEE and ACM.

Changsheng Li is a full research professor at the University of Electronic Science and Technology of China (UESTC). Before that, he was an algorithm expert at iDST, Alibaba Group, and a Research Scientist at IBM Research – China. He received his B.E. degree from UESTC in 2008, and his Ph.D. degree in pattern recognition and intelligent systems from the Institute of Automation, Chinese Academy of Sciences in 2013. He also studied as a research assistant at The Hong Kong Polytechnic University in 2009–2010. He joined IBM Research – China in 2013 and received the IBM Research Accomplishment Award in 2015. His research interests include machine learning and data mining. Dr. Li has more than 30 refereed publications in international journals and conferences, including CVPR, AAAI, IJCAI, CIKM, MM, ICMR, TNNLS, TIP, TC, PR, etc.

Rongfang Bie is currently a Professor at the College of Information Science and Technology of Beijing Normal University, where she received her M.S. degree in June 1993 and her Ph.D. degree in June 1996. She was with the Computer Laboratory at the University of Cambridge as a visiting faculty member for one year from March 2003. She is the author or co-author of more than 100 papers. Her current research interests include knowledge representation and acquisition for the Internet of Things, dynamic spectrum allocation, and big data analysis and applications.