Pattern Recognition Letters 100 (2017) 14–21
Improved local binary pattern for real scene optical character
recognition
Chu-Sing Yang, Yung-Hsuan Yang∗
Department of Electrical Engineering, National Cheng Kung University, No.1, University Road, Tainan City 701, Taiwan, ROC
Article info
Article history:
Received 10 December 2016
Available online 8 August 2017
Keywords:
Local binary pattern
Optical character recognition
Edge descriptor
Integral image
Recognition speed
Abstract
A strong edge descriptor is important for a wide range of applications. Local binary pattern (LBP) techniques have been applied to numerous fields and are invariant with respect to luminance and rotation. However, the performance of LBP for optical character recognition is not as good as expected. In this study, we propose a robust edge descriptor called improved LBP (ILBP), which is designed for optical character recognition. ILBP overcomes the noise problems observed in the original LBP by searching over scale space, which is implemented using an integral image with a reduced number of features to maintain recognition speed. In experiments, we evaluated ILBP's performance on the ICDAR03, chars74K, IIIT5K, and Bib digital databases. The results show that ILBP is more robust to blur and noise than LBP.
© 2017 Elsevier B.V. All rights reserved.
1. Introduction
With the increase in the number of portable cameras, smartphones, and surveillance cameras, the amount of detail captured in images has grown considerably. As a result, the information in these image details can no longer be fully analyzed by humans. Thus, efficiently and automatically analyzing this information is an important challenge in image processing and pattern recognition. This information includes image objects, text information, and characters. Extracting and understanding text in images has numerous applications in daily life, such as helping vision-impaired people understand text information, automatic document generation, and even surveillance systems. Therefore, we have mainly conducted research on the detection and recognition of character information.
The general method for recognizing text in an image is to extract features from the image and classify the characters by feeding them into a machine learning algorithm such as a neural network (NN) [1], support vector machine (SVM) [2,3], or Adaboost [4]. Alternatively, simpler algorithms such as K-nearest neighbors (K-NN) [5] or mean shift [6] can be used. Recently, as the bag-of-words model [7] has gained considerable attention, many methods have been proposed to divide a character into smaller image "words" such as the end of a stroke, a curved stroke, or a cross stroke. These smaller words improve the recognition rate and are
∗ Corresponding author at: Computer and Communication Engineering, No. 3, Fengnian St., Sanmin Dist., Kaohsiung City 807, Taiwan (R.O.C.).
E-mail addresses: [email protected] (C.-S. Yang), [email protected] (Y.-H. Yang).
http://dx.doi.org/10.1016/j.patrec.2017.08.005
0167-8655/© 2017 Elsevier B.V. All rights reserved.
very successful. However, fundamentally, the performance of these methods depends on the robustness of the feature extraction algorithm.
A robust texture descriptor must address numerous issues such as glare, blur, low contrast, and reflective surfaces in natural scene optical character recognition (OCR). Hence, the design of a robust text descriptor has become a popular topic in recognition systems, even for local descriptors of the bag-of-words (BOW) model. There are several fully developed descriptor methods such as histograms of oriented gradients (HOG) [8], local binary pattern (LBP) [9], Haar-like descriptors [10], dense [11], BRIEF [12], SIFT [13,14], and SURF [15] descriptors. These methods have different advantages depending on the scene. For example, the HOG descriptor is more sensitive to edge changes, the Haar-like descriptor is more robust in blurred regions, and LBP and BRIEF overcome low contrast problems. However, when recognizing a character image in complex and varied real environments, there are still numerous problems to overcome.
The original LBP algorithm has many benefits with respect to texture recognition, such as luminance invariance, rotation invariance, easy implementation, and low computational cost. Moreover, LBP, both with and without rotation invariance, is very successful in the field of face recognition [16]. However, in our experience, LBP's performance for real scene OCR is worse than that of HOG. This is because there is a large amount of hardware noise, blurred edges, and other interfering factors in real-scene character images. Hence, in this study, we propose an improved LBP (ILBP) texture descriptor, which is a modified LBP without rotation invariance and with concepts from HOG added. There are four main modifications in our implementation: creation of an LBP by averaging
Fig. 1. Flow chart of ILBP: original image, LBP over scale space, bin-reduced LBP histogram into cells, first selection and maximum selection, grouping into blocks and normalization, and concatenation as the ILBP feature.
Fig. 2. Current pixel and its neighbor pixels for P = 8 and R = 1.
Table 1
Recognition rates of LBP^u2_{8,1}, LBP^riu2_{8,1}, and HOG_{8,2}.
Methods          IIIT5K    ICDAR03_ch2    Chars74k_15    Bib_Digits
LBP^u2_{8,1}     64.42%    65.83%         51.56%         58.09%
LBP^riu2_{8,1}   36.70%    34.70%         20.49%         23.02%
HOG_{8,2}        81.6%     75.9%          66.2%          71.7%
pixel intensity via an integral image, reduction of the number of histogram bins, creation of an LBP over scale space, and use of the blocks and cells concept from HOG. In the bin reduction step, we reduce the number of LBP histogram bins and add a magnitude map similar to that of HOG; thus, each pixel can express integral magnitude and direction information, which we call the edge type. The second step is to create a bin-reduced LBP over scale space, and the third step is to determine a meaningful scale in scale space. We use the concept from MLBP [17] and SIFT that uses average intensity instead of single pixel intensity comparisons in LBP and different kernels between the center and neighbor pixels to achieve scale invariance. Fig. 1 shows the flow chart of ILBP.
In Section 2, we discuss related works, which are state-of-the-art OCR methods, comprising the original LBP, its variants, integral images, and scale space. Our main method, including scale space LBP, bin reduction, and scale selection, is explained in Section 3. Section 4 describes our experiment and recognition rate results for four letter datasets: IIIT5k [18], ICDAR03 challenge 2 [19], chars74K task 15 [20], and the Bib digital dataset [21]. Section 5 concludes the study.
2. Related works
Since visual word and BOW models became popular in the field of image object recognition and matching, many such methods have been presented. The method in [22] addresses geometric blur in OCR applications, where the basic idea is to match characters by local descriptors, and Wang et al. [23] proposed OCR by invariant feature matching. Furthermore, Gao et al. [24] presented a model to relate local strokes in a character using a spatiality embedded dictionary (SED). The SED procedure associates a code-word, a feature based on a blocked HOG, with a particular location and then builds a dictionary to model the co-occurrence of several local strokes. Shi et al. [25] also presented a part-based tree structure to model each character's visual words (i.e., the end of a stroke, line, curve, or cross) and use a tree structure model to search for each part's score and add it to the global information in the recognition stage. Although visual word-based OCR performs well, it depends on the robustness of the texture or edge descriptor. Hence, the design of a robust edge descriptor is an important issue. In our survey of the literature, the most common descriptor is HOG because it can overcome blur with its cell histogram and low contrast by block normalization. Moreover, we note that LBP performs well on face and texture recognition. However, our experiments show that LBP cannot overcome noise in real scene OCR and loses spatial information as a result of its rotation invariance.
There are many advantages of the simple LBP algorithm with rotation invariance in the field of texture recognition. The original LBP emerged from a simple concept: a comparison between the current pixel and its neighbors within a constant radius and direction, as shown below (Fig. 2):
LBP^{u2}_{P,R} = \sum_{i=0}^{P-1} f(p_i − p_c) 2^i   (1)

f(x) = { 1, if x ≥ 0;  0, if x < 0 }   (2)
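To make Eqs. (1)-(2) concrete, here is a minimal NumPy sketch (not the paper's code; the neighbor ordering is an assumption, since the paper does not fix the bit order) that computes the 8-bit LBP code at every interior pixel:

```python
import numpy as np

def lbp_8_1(img):
    """Basic LBP with P = 8 neighbors at radius R = 1, per Eqs. (1)-(2).

    Each neighbor whose intensity is >= the center contributes a 1-bit
    at position i; border pixels are skipped, mirroring the paper's
    practice of cutting off border pixels.
    """
    img = np.asarray(img, dtype=np.int64)
    h, w = img.shape
    # Neighbor offsets (dy, dx), one per bit position i = 0..7 (assumed order).
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    center = img[1:h - 1, 1:w - 1]
    codes = np.zeros((h - 2, w - 2), dtype=np.int64)
    for i, (dy, dx) in enumerate(offsets):
        neighbor = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= ((neighbor - center) >= 0).astype(np.int64) << i
    return codes
```

A flat region yields code 255 (every comparison gives f(0) = 1) and an isolated bright pixel yields code 0, which is why the uniform-pattern filter (u2) is applied afterward.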
The authors also defined parameter U to be the number of transitions between 0 and 1 [26]. Moreover, this study proved that the current pixel is meaningless when U ≥ 3. So, in Eq. (1), u2 denotes that patterns with more than two transitions are removed. The rotation invariant LBP is denoted as LBP^{ri_u2}_{P,R}.
However, in real scene OCR, the performance of LBP^{ri_u2}_{P,R} is lower than that of LBP^{u2}_{P,R} because it loses edge information between cells after rotation invariance is implemented. Table 1 compares the real scene OCR results of the original LBP algorithm with a cell size of 8 and a block size of 2 (LBP^{u2}_{P,R}), the same LBP with rotation invariance (LBP^{ri_u2}_{P,R}), and the HOG algorithm with a cell size of 8, a block size of 2, and unsigned orientation bins in 9 directions (HOG_{8,2}). These results are case-insensitive because it is difficult to distinguish "x" and "X" in real scene OCR from a single letter image. The results show that HOG's performance is better than the original LBP's on each dataset. However, there are many extended LBP methods applied in numerous fields.
In our estimation, to design a descriptor for real scene OCR, the following issues must be addressed: noise, the number of features, the ability to describe edges, low contrast regions, reflective surfaces, and unbalanced edge pairs of a stroke. Noise is unavoidable; thus, there are many methods that use scale space to smooth the image for more stability. The ability to describe edges is the most important issue because the entire algorithm is based on a description of edges, or the difference between a light and a dark area, as in the HOG, Haar-like, dense, and LBP methods. The most uncontrollable problems are reflection, low contrast, and unbalanced edges. These problems cannot be controlled when we capture a real scene image with a low-cost device. Therefore, each descriptor can overcome only two or more issues. HOG performs well with respect to lower feature numbers, good edge description, and robustness to unbalanced stroke type because of its unsigned orientation. LBP can handle low contrast images, and Haar-like and dense features can handle blurred conditions. Many variants of these methods have been proposed to overcome various issues.
Fig. 3. Schematic of LBP_1 (left) and LBP_2 with first neighbor pixel (right).
To overcome the fluctuation of luminance, low contrast, and
noise, the local ternary pattern (LTP) [26] was proposed. It consid-
ers smaller intensity differences to create two state LBPs, a lower
and upper pattern, and then combines them into a feature de-
scriptor. A series of LBPs [27–29] was also proposed to overcome
the noise problem. The MLBP [17] method uses the average of a
3 × 3 area instead of a single pixel to determine small intensity
differences. Multi-block LBP (MBLBP) [30,31] has a similar concept,
which is to combine the average of a local block instead of sin-
gle pixel’s intensity. Ref. [32] presented a different approach to ad-
dressing noise by using a scale space LBP feature. In our previ-
ous study, we also proposed using average intensity and selected
a meaningful edge type over scale space. The method gives LBP
scale invariance, but it still cannot solve the information loss af-
ter rotation invariance. Thus, we concluded that rotation invariance
is not useful in real scene OCR. Hence, in this paper, we modified
LBP without rotation invariance to achieve more stability. The over-
complete LBP [33] considers the overlap of adjacent blocks with
different directions and modes to carry more information between
cells. However, the higher number of features in these variants of
LBP lead to higher training and recognition costs. Thus, there are
methods to reduce the number of LBP features. Center-symmetric
LBP (CS-LBP) [34] compares pairs of neighbors to lower the num-
ber of edge types and keep the same information as the original
LBP. Its extended method, XCS-LBP, [35] compares the current pixel
and two distance neighbors to increase stability and use fewer fea-
tures. Robust LBP (RLBP) [36] uses a different way to reduce the
number of bins by binary code equivalence. In general, RLBP has
30 bins, and CS-LBP has 14 bins. Lower numbers of bins will save
computation cost during recognition.
Numerous methods exist to enhance the ability of LBP edge
description. One of them is to increase the number of neighbors
and variant radius to obtain more texture information. Extended-
LBP (ELBP) [37,38] generates multiple LBP codes to describe the
current pixel state using more information. Complete-LBP (CLBP)
[39] presents a similar idea with sign and gray-value differences,
called intensity differences, to improve the current pixel's discriminative power. Discriminative robust LBP (DRLBP) [40] enhances the edge description ability by combining RLBP and the difference of LBP, which comprises the complete bins of RLBP. The authors of [40] also presented a new concept to combine edge type and magnitude information. Instead of per-pixel integer counts, the histogram of DRLBP accumulates the gradient magnitude ω_{x,y} = sqrt(I_x^2 + I_y^2), where I_x and I_y are the first-order derivatives in the x and y directions. DRLBP increases the edge information of a single pixel and makes LBP more like HOG. Hence, it is becoming common to let a single pixel carry more information. In our proposed method, we also apply this concept, but in a different way.
Although there are numerous ways to improve LBP's performance, not all of them are suitable for real-scene OCR or as a local edge descriptor; some methods only improve texture recognition. For example, in our experiments, LTP is not suitable for OCR.
In Chars74k_15, LTP with zero threshold obtains the same recog-
nition as the original LBP, but the recognition rate decreases when
the threshold is increased.
Thus, we conclude that a strong texture descriptor will have the
following four characteristics: 1. Robustness to noise. 2. Tolerance
to unbalanced luminance. 3. Fewer feature dimensions. 4. More in-
formation per pixel.
3. Main method
We propose ILBP to create an LBP method with the characteristics mentioned in Section 2 by searching a scale space to realize meaningful edge information, reducing feature numbers with two procedures, and obtaining more information per pixel. In brief, we use a magnitude map to carry more information; however, the magnitude calculation for the current pixel is different from that of HOG and DRLBP. In this section, we explain how to create the LBP over scale space using an integral image. We then explain how to reduce the number of bins and how to select over the scale space.
3.1. Scale space LBP
To overcome noise problems in LBP, we use an image blurred by integration to create a scale-space LBP. First, we define the integral image as II(p), where p is an image pixel, and the image smoothed by an m × m box filter as II_m(p). Because, in this paper, we only use eight neighbors and a variable radius with one other parameter, we call the LBP method that uses eight neighbors the original LBP. We next define LBP_s(x, y) as a comparison of the center pixel (x, y) with its neighbor pixels using the following smoothing kernel:

LBP_s(p) = \sum_{i=0}^{7} f( II_{2s−1}(p_{i,s}) − II_{2s+1}(p_c) ) 2^i   (3)

Note that s is always greater than or equal to one. We then redefine each neighbor as p_{i,r}. Index i is the position relative to the current pixel, and r is the distance from the center position. The value of r is an integer obtained by rounding the Euclidean distance. The current pixel's intensity is decided by taking the mean of a rectangular area of size 2s + 1, and its neighbors' intensities are also smoothed, with a smaller rectangle of size 2s − 1. When s = 1, the intensity of p_c is the mean of a 3 × 3 pixel rectangular area and the neighbor pixels are single pixel intensities, as shown in Fig. 3 (right). Fig. 3 (left) shows the case for s = 2, where p_c is the mean of a 5 × 5 pixel rectangular area and p_{0,2} is the average of a 3 × 3 pixel rectangular area at a radius of 2. In our experiments, variant radii and smoothing sizes between the center and its neighbors give a better recognition rate. Also, we verified the same using MLBP and ILBP on each dataset. Thus, parameter r starts from 1 and is proportional to parameter s.
We also calculate the approximate response when the scale space LBP is created, initializing the prediction equation as follows:

R(p_c) = | \sum_{i=0}^{7} ( G(p_i, σ_1) − G(p_c, σ_2) ) |   (4)

Here, R(p_c) usually uses G(p), a discrete version of the Laplacian of Gaussian (LoG), calculated as follows:

G(p) = (1 / (2πσ^2)) e^{−(p_x^2 + p_y^2) / (2σ^2)}   (5)

In Lowe's work [14], an approximate method is used to calculate extrema to detect the fitness of a blob by comparing scale and spatial space. In our application, we use the response to find the fitness of an edge type at the current pixel by searching scale space at the maximum selection stage. As mentioned before, a key point for a real scene edge descriptor is robustness to noise. Thus, there are three possible conditions between σ_1 and σ_2 when
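The box means II_{2s−1} and II_{2s+1} in Eq. (3) can each be evaluated in constant time per pixel with a summed-area table; a small sketch (our own illustration, using the usual zero-padded integral-image convention rather than any notation from the paper):

```python
import numpy as np

def integral_image(img):
    """Summed-area table with an extra zero row/column, so the sum over
    img[y0:y1, x0:x1] is ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0]."""
    h, w = img.shape
    ii = np.zeros((h + 1, w + 1), dtype=np.float64)
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return ii

def box_mean(ii, y, x, k):
    """Mean intensity of the k x k box centered at (y, x), with k odd
    (e.g. 2s - 1 for a neighbor, 2s + 1 for the center pixel).
    Assumes the box lies fully inside the image."""
    r = k // 2
    y0, x0, y1, x1 = y - r, x - r, y + r + 1, x + r + 1
    total = ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0]
    return total / (k * k)
```

With these two helpers, LBP_s(p) is just eight sign comparisons of box_mean at the neighbors (size 2s − 1) against box_mean at the center (size 2s + 1), at any scale s.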
Fig. 4. All edge types in the first bin reduction. The edge types fold along the dotted line.
Fig. 5. All edge types after the second reduction.
Fig. 6. Edge type and magnitude. Each number pair indicates the edge type and magnitude.
the current pixel value is noise. The first condition emphasizes the noise response when σ_1 > σ_2, because the peak of the center G(p_c, σ_2) will have a higher value than its neighbors. The second condition (σ_1 = σ_2) performs like the LoG operator. However, in the third condition (σ_1 < σ_2), we obtain a smoother kernel with lower peaks and wider Gaussian RMSs. Thus, in our application, we use σ_1 < σ_2 with linear growth.
To reduce the computational cost and fit LBP based on the integral image, we use a box filter instead of a Gaussian filter. By the central limit theorem, a box filter is an approximation of Gaussian blur, so we can rewrite Eq. (4) as follows when s is 1, σ_1 = 2s − 1, and σ_2 = 2s + 1:

R_s(p_c) = | \sum_{i=0}^{7} ( II_{2s−1}(p_{i,s}) − II_{2s+1}(p_c) ) |   (6)

We select a neighbor pixel using discrete distances and keep σ_1 < σ_2 to overcome the noise problem when s is increased. Therefore, Eq. (6) is a discrete version of LoG.
3.2. Reducing the number of bins in LBP
The number of bins in RLBP with a uniform pattern is 30. RLBP reduces the number of bins as calculated using Eq. (7). However, we use the HOG concept of unsigned orientation.

RLBP(x, y) = min{ LBP(x, y), 2^8 − 1 − LBP(x, y) }   (7)

The characters in a natural scene are randomly brighter or darker than the background. For example, an image may have a stroke that transitions from a lighter to a darker area. Thus, the performance of unsigned HOG is better than that of signed HOG on each dataset in our experiments. Considering this, we regard (00000111)_2 as (11111000)_2 so that we can reduce the LBP feature bins by considering the two situations as one edge type. For convenience, M denotes the number of ones in (LBP_s)_2 and O is a rotation left with repeated bits; thus, rLBP^{u2} can be written as follows.

rLBP_s = { 1, if O = 8 or O = 0;  255 − LBP_s, else if (M ≥ 4 and O ≥ 4) or O > 4;  LBP_s, otherwise }   (8)

where rLBP_s is the first reduction in the number of bins of LBP. Furthermore, rLBP^{u2}_s denotes the image after the uniform pattern process. In general, the range of LBP^{u2}_s is 1–59, which is combined with M = 0, 7 × 8 (M × O), where M = 8, and U > 3. In our proposed method, the number of bins after the first reduction is 31, which is a combination of (00000000)_2, 28 bins (as shown in Fig. 4), (11111111)_2, and u > 2. Fig. 4 also shows the folds of the edge types in our proposed method.
In the original LBP, each pixel presents a single piece of information because one pixel is only counted once in a cell; therefore, we propose an improvement to reduce the number of bins and present more information per pixel. First, we continue to reduce the bins by their directions, which is an idea borrowed from HOG. In HOG, the performance of an unsigned orientation is better than that of a signed orientation for person detection, and we have also obtained the same result in character recognition on the IIIT5k and Chars74K_15 datasets. Therefore, we reduce the number of LBP bins to 10, as shown in Fig. 5. The edge types in rrLBP, which denotes the second bin reduction, are calculated as follows.

v(x, y) = rLBP^{u2}_s(x, y) − 2   (9)

rrLBP^{u2}_s(x, y) = { 1, if M = 0 or M = 8;  0, else if U > 3;  2 × mod(v(x, y), 4) + ((M ∧ 1) ≪ 1) + 2, otherwise }   (10)

α(x, y) = { 1, if O = 0 or 8;  ⌊v(x, y)/2⌋ + 1, otherwise }   (11)

The physical meaning of the magnitude map is the length from an edge to the current pixel. Fig. 6 shows the edge types and corresponding magnitudes; the first pattern in Fig. 6 shows edge type 2 and magnitude 1, and the second pattern shows the same edge type with magnitude 2. The third and fourth patterns in Fig. 6 show the situations when M is even.
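The complement folding and the uniform test behind Eqs. (7)-(10) can be illustrated in a few lines (a simplified sketch of the folding idea only, not the paper's full rrLBP mapping with its M and O bookkeeping):

```python
def transitions(code):
    """Parameter U: number of 0/1 transitions in the circular 8-bit pattern."""
    return sum(((code >> i) & 1) != ((code >> ((i + 1) % 8)) & 1)
               for i in range(8))

def unsigned_fold(code):
    """Treat a pattern and its bitwise complement as one edge type,
    as in RLBP's rule of Eq. (7): min(LBP, 2^8 - 1 - LBP)."""
    return min(code, 255 - code)
```

For example, (00000111)_2 and (11111000)_2 fold to the same value, matching the identification made in the text, and patterns with U > 2 can then be discarded as non-uniform.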
3.3. Selection method over scale space
After rrLBP^{u2}_s is created over scale space, we use two approaches, called "first selection" and "maximum selection", to determine the meaningful scale. In the following, we explain the physical meaning of our proposed selection method.
We first observe that there are 59 edge types in the original LBP^{u2}_{8,1}, many of which are meaningless. This is caused by hardware noise and a wrong radius size for obtaining the best information at the current pixel.
In Fig. 7, the yellow box is at a smaller scale in LBP_s and the blue box is at a larger scale. The edge type in the yellow box is meaningless, but when we increase the scale, the edge type in the blue box is meaningful. Hence, we conclude that selecting the best scale will improve LBP performance. In general, we use the first non-zero value as the current pixel edge type, and we call this method first selection. In the first selection method, the pixel will
Fig. 7. Two scales with different LBP_s edge types.
Fig. 8. Two cases of first selection. A point close to the edge will have a higher magnitude (left), and a point far from the edge will have a smaller magnitude (right).
have larger magnitude if the current point is very close to the edge
and vice versa. Fig. 8 shows a schematic of first selection and illus-
trates the situation we describe. We assign the pixel as edge type
10 if the pixel is all zeros over all scales.
Thus, the equation of FLBP^{u2}_S(x, y) can be written as follows:

FLBP^{u2}_S(x, y) = { rrLBP^{u2}_i(x, y), where i is the first non-zero scale in rrLBP^{u2};  10, if rrLBP^{u2}_i(x, y) = 0 for all i }   (12)

where S in FLBP^{u2}_S(x, y) is the scale space from 1 to s of rrLBP^{u2}_s(x, y).
The second method is called maximum selection, and it selects the maximum response from Eq. (13) in scale space. As mentioned above, the response is a discrete version of LoG. Hence, we use the extreme response to reduce the effect of noise. Thus, the maximum selection method uses the absolute maximum response in scale space, which we record when creating LBP over scale space. We still assign edge type 10 to the current pixel if its codes are all zeros over all scales. Therefore, we can write MLBP^{u2}_S(x, y) as follows:

MLBP^{u2}_S(x, y) = { rrLBP^{u2}_i(x, y), where i = arg max_i (R_i(x, y)) and rrLBP^{u2}_i ≠ 0;  10, if rrLBP^{u2}_i(x, y) = 0 for all i }   (13)

Parameter s is the search scale space, from 1 to s, used to find the maximum response. ILBP combines the two selection methods; thus, the number of features in ILBP is 20, which is a combination of the 10 features in first selection and the 10 features in maximum selection. The final step is to create a histogram cell and combine it into blocks. In this step, we use the L2-norm when the block is created. Note that the histograms of first and maximum selection have already been created. In our study, the rectangular box filter performs better than or equal to the Gaussian filter in OCR. Hence, we will discuss the performance when we replace rectangular smoothing with a Gaussian filter and use the interpolated intensity value when comparing neighbor intensities in Section 4. A clipping step exists in HOG's block normalization procedure, and its purpose is to avoid peaks in the histogram and compress it into an acceptable size. Hence, we also tested the original clipping algorithm from HOG in FLBP and MLBP, but the results were not as effective as expected. In ILBP, the magnitude is the integer distance to the nearest edge, so the meaning of the clipping factor is different from that of HOG. First selection determines the meaningful edge type around the edges, and maximum selection ignores the noise to find the strongest edges at the current pixel. Hence, we use the clip factor after block normalization and do not renormalize the blocks. Furthermore, first and maximum selection can have different clip factors. In our experiment, the clip factors are 0.3 and 0.4 when the cell size is 8 for OCR.
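Per pixel, Eqs. (12)-(13) reduce the stack of per-scale edge types (and recorded responses) to one label each; a minimal sketch, assuming the per-scale codes and responses have already been computed:

```python
def first_selection(codes):
    """Eq. (12): the first non-zero edge type over scales; 10 if all zero."""
    for c in codes:
        if c != 0:
            return c
    return 10

def maximum_selection(codes, responses):
    """Eq. (13): the edge type at the scale with the largest absolute
    response among scales with a non-zero code; 10 if all codes are zero."""
    best = None
    for c, r in zip(codes, responses):
        if c != 0 and (best is None or abs(r) > abs(best[1])):
            best = (c, r)
    return 10 if best is None else best[0]
```

Concatenating the two resulting 10-bin cell histograms is what yields ILBP's 20 features per cell.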
In Fig. 9, the first row depicts the histogram of each method. Both the maximum and first selection approaches obtain edge types without including any meaningless edge types. In the second row, we show the histogram results for a blurred version of the discrete edge image (blurred by a Gaussian filter with σ = 2). First selection, the original LBP, and DRLBP obtain results that are similar to their results for the discrete image. Maximum selection is more affected by blur. We then add salt-and-pepper noise to the blurred image. The result of maximum selection for this image is similar to that of the discrete image. In contrast, first selection, HOG, the original LBP, and DRLBP are easily affected by noise. Therefore, we combine first selection and maximum selection to reduce the effect of both blur and noise, and the result shows that the combined method is more accurate on each dataset.
4. Experiment and results
In this section, we describe the experimental platform, then evaluate the relationship between scale and cell size, and finally present the results for OCR performance and the computation cost of ILBP.
We implemented the original LBP, LTP, RLBP, CS-LBP, DRLBP, multi-scale LDP [41], HOG, HOG-LBP [42], Gabor-wavelet [43], Daisy [44], Haar-like, local phase quantization (LPQ) [45], BRIEF, and ILBP methods on a Windows 10 OS laptop with an i7-4720HQ CPU and 16 GB of memory. For the software, we used MATLAB for the data presentation, and the core of each method was implemented using the OpenCV library and C++ code for performance. Each method's performance was carefully verified, and we cut off the pixels on the borders to avoid edge interference. For example, our HOG's performance is better than that of MATLAB and OpenCV (76.5% versus 75.6% with a block size of 1 and cell size of 8). Moreover, we used the same histogram procedure to build the cell histograms and the same block normalization function to guarantee that each method's performance was compared fairly. For the classifier, we used the K-NN method to classify each character. We could use multi-classifiers instead, but the recognition speed depends on the feature extraction method and the number of features. K-NN and multi-classifiers (NN, SVM, or Adaboost) have both advantages and disadvantages, but we chose K-NN as our classifier because it is an effective way to observe the robustness of each edge descriptor. More importantly, it is a standard baseline for testing the performance of each method. In general, CS-LBP has the lowest number of feature dimensions (4), HOG has 9 features per cell, ILBP has 20, RLBP has 30 features per cell, the original LBP without rotation invariance has 59 features, and DRLBP has 60 features per cell. However, ILBP's basic cost is affected by scale S, which determines how many LBPs we should build and scan over the scale space. On the same architecture, ILBP's computation will be S times that of LBP, but by using an advanced CUDA architecture, the computation time can be reduced to an acceptable level.
The most controversial parameter is S, the scale space's size. This parameter affects accuracy and the computational cost because we have to scan the entire scale space. In our experiments, an S that is too big or too small will affect maximum selection. Thus, the best S depends on what we want to describe and the characteristics of the dataset. As with HOG, cell size also affects the accuracy and feature dimensions. Larger cell sizes will degrade accuracy but reduce the number of features and increase recognition speed. Table 2 lists the case-insensitive recognition rates with variable S and cell sizes on the IIIT5K dataset. Each
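The K-NN baseline used above can be sketched as follows (a generic nearest-neighbor vote over extracted feature vectors; the Euclidean metric and the k = 1 default are assumptions, as the paper does not state them):

```python
import numpy as np

def knn_predict(train_feats, train_labels, query, k=1):
    """Label a query feature vector by majority vote among its k nearest
    training vectors under Euclidean distance."""
    d = np.linalg.norm(np.asarray(train_feats, dtype=float)
                       - np.asarray(query, dtype=float), axis=1)
    nearest = np.argsort(d)[:k]
    votes = [train_labels[i] for i in nearest]
    return max(set(votes), key=votes.count)
```

Because the classifier does no learning, differences in recognition rate between the rows of Tables 4 and 5 reflect the descriptors themselves, which is the stated reason for choosing K-NN as the baseline.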
Fig. 9. Discrete edge test image. The first row, from left to right, depicts the test image and the histograms of FLBP^u2_8, MLBP^u2_8, HOG, LBP^u2_{8,1}, and DRLBP. The second row shows each histogram result for a blurred version of the first image. The third row shows the histogram results after adding salt-and-pepper noise.
Table 2
ILBP recognition rate for variable scale and cell sizes on IIIT5K.
Cell size   Scale (IIIT5K)
            4        6        8        10       12       14       16
4           79.3%    80.4%    80.7%    80.6%    80.6%    80.6%    80.6%
6           80.4%    82.1%    81.8%    81.7%    81.7%    81.7%    81.7%
8           81.0%    82.1%    82.3%    82.2%    82.0%    82.0%    81.9%
10          79.4%    81.5%    81.9%    82.0%    80.6%    80.6%    81.6%
12          78.2%    80.2%    80.6%    80.6%    80.5%    80.6%    80.6%
14          76.1%    75.9%    78.4%    78.5%    78.5%    78.5%    78.5%
16          74.0%    76.2%    76.8%    77.0%    76.9%    76.9%    77.0%
Table 3
ILBP recognition rate for variable scale and cell sizes on Chars74k task 15.
Cell size   Scale (Chars74K_15)
            4        6        8        10       12       14       16
4           62.3%    66.1%    65.7%    64.8%    64.7%    64.6%    64.6%
6           66.9%    68.9%    69.2%    67.6%    67.7%    67.8%    66.7%
8           68.6%    72.6%    71.8%    71.0%    70.5%    70.8%    68.9%
10          70.0%    71.7%    70.3%    69.2%    68.9%    68.7%    68.7%
12          67.7%    68.7%    71.4%    70.8%    70.5%    70.5%    70.5%
14          68.2%    69.7%    69.7%    69.4%    69.0%    68.9%    68.9%
16          63.6%    66.8%    67.2%    66.9%    66.8%    66.7%    66.7%
Table 4
Recognition rates for each method with a cell size of 8 and block size of 1.

Method             IIIT5K   ICDAR03 Ch. 2   Chars74K Task 15   Bib digits
ILBP_8             79.9%    75.3%           63.9%              59.4%
LBP^{u2}_{8,1}     65.1%    64.6%           39.1%              57.1%
LTP                66.3%    63.4%           38.5%              55.4%
RLBP               67.5%    63.3%           43.4%              51.1%
CS-LBP             67.0%    62.8%           41.0%              54.37%
LDP                66.2%    61.1%           44.4%              64.1%
DRLBP              72.8%    70.6%           50.1%              59.0%
HOG                76.5%    72.8%           60.5%              58.3%
HOG-LBP            75.8%    72.1%           59.9%              59.8%
Table 5
Recognition rates for each method with a cell size of 8, block size of 2, and block overlap of 1 from ILBP to HOG-LBP.

Method             IIIT5K   ICDAR03 Ch. 2   Chars74K Task 15   Bib digits
ILBP_8             82.2%    80.9%           71.8%              70.5%
LBP^{u2}_{8,1}     67.9%    65.3%           42.2%              66.94%
LTP                66.4%    64.2%           41.8%              64.7%
RLBP               68.2%    64.4%           45.7%              58.4%
CS-LBP             67.5%    63.4%           41.5%              61.7%
LDP                75.2%    68.4%           60.2%              67.8%
DRLBP              68.2%    72.0%           53.1%              68.17%
HOG                81.6%    75.9%           66.2%              71.7%
HOG-LBP            81.2%    72.2%           65.5%              71.1%
Gabor wavelet      50.1%    68.2%           62.6%              64.7%
DAISY              48.8%    68.7%           68.3%              69.2%
Haar-like          36.0%    46.2%           52.6%              54.3%
LPQ                46.6%    64.4%           69.3%              62.6%
BRIEF              22.0%    50.6%           50.9%              58.9%
character image was resized to 60 × 48 pixels and the block size
was two. Note again that we believe it is challenging to distinguish
“c” and “C” from a single letter; thus, we set our experiment to be
case-insensitive. The results show that the recognition rate is better
when the scale is set to 8 or 10 at each cell size. A similar
result is shown in Table 3. In our experiments, the best ILBP
scale sizes for OCR are 8 and 10. In the overall experiments, we
set the cell size to 8 and the scale to 8 for OCR when the image
size is 60 × 48 pixels.
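The effect of cell size on feature dimensionality can be made concrete with a small Python sketch. It assumes one 59-bin uniform-pattern histogram per cell (the standard LBP^{u2}_8 convention, 58 uniform codes plus one catch-all bin) and HOG-style block arithmetic; neither detail is spelled out at this point in the paper:

```python
def ilbp_feature_dim(img_h=60, img_w=48, cell=8, block=2, overlap=1, bins=59):
    """Length of the concatenated histogram for a HOG-style cell/block layout.

    bins=59 assumes a uniform-pattern histogram per cell; partial cells
    that do not fit the image are dropped.
    """
    cells_y, cells_x = img_h // cell, img_w // cell
    step = max(block - overlap, 1)            # block stride, in cells
    blocks_y = (cells_y - block) // step + 1
    blocks_x = (cells_x - block) // step + 1
    return blocks_y * blocks_x * block * block * bins
```

Under these assumptions, the paper's settings (60 × 48 image, cell size 8, block size 2, overlap 1) give 7 × 6 cells, hence 6 × 5 blocks of four cells each, i.e. 7080 values, which illustrates why doubling the cell size shrinks the feature vector so quickly.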
In the OCR performance test, we used four datasets, IIIT5K [18],
Chars74K_15 [19], ICDAR03_ch2 [20], and Bib digits [21], to test
each method’s performance. The performance of the texture de-
scriptor can easily be observed without a block overlap. Further-
more, block overlap and normalization are methods to improve an
edge descriptor’s robustness. Hence, we first tested a cell size of 8
and a normalized image size of 60 × 48 pixels with no block overlap
and no case sensitivity.
As mentioned before, we tested the performance of a Gaussian
filter instead of rectangle smoothing. First, we defined the smoothed
image using a Gaussian filter as follows, where G(p, σ) is defined in Eq. (5).
L(p, σ) = G(p, σ) ∗ I(p)    (14)

Thus, the Gaussian version of ILBP_s can be rewritten as follows:

ILBP_s(p) = Σ_{i=0}^{7} f( L(p_{i,s}, (2(s − 1) + 1)/3) − L(p_c, (2s + 1)/3) ) · 2^i    (15)
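As a rough illustration only, Eqs. (14) and (15) can be sketched in Python with SciPy's Gaussian filter. The axis-aligned eight-neighbour ring and the wrap-around borders below are simplifications of the interpolated circular sampling, so this is a sketch of the idea, not the authors' implementation:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def ilbp_scale_gaussian(img, s):
    """8-bit ILBP code map at scale s, Gaussian variant of Eqs. (14)-(15).

    Neighbours are read from L(., (2(s-1)+1)/3), the centre from
    L(., (2s+1)/3); the sign of each difference is packed into one byte.
    """
    Ln = gaussian_filter(img.astype(float), sigma=(2 * (s - 1) + 1) / 3.0)
    Lc = gaussian_filter(img.astype(float), sigma=(2 * s + 1) / 3.0)
    code = np.zeros(img.shape, dtype=np.uint8)
    ring = [(-s, -s), (-s, 0), (-s, s), (0, s), (s, s), (s, 0), (s, -s), (0, -s)]
    for i, (dy, dx) in enumerate(ring):
        # smoothed value at the i-th neighbour of every pixel (wraps at borders)
        neigh = np.roll(np.roll(Ln, -dy, axis=0), -dx, axis=1)
        code |= ((neigh - Lc) >= 0).astype(np.uint8) << i
    return code
```

Swapping `gaussian_filter` for a box filter over the integral image gives the rectangle-smoothing variant that Table 6 compares against.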
Table 6
Recognition rate with different filters and interpolation on IIIT5K.

Interpolation    Gaussian filter    Rectangle box filter
Active               81.0%               82.2%
Not active           81.1%               81.8%
Table 7
Recognition rate with different filters and interpolation on Chars74k_15.

Interpolation    Gaussian filter    Rectangle box filter
Active               71.3%               71.3%
Not active           71.3%               81.3%
Fig. 10. Recognition rate of each letter and its misclassifications, shown as confusion color maps (scale 0%–100%). From left to right are the recognition results of ILBP and HOG on ICDAR03_ch2.
Tables 6 and 7 show the results of combining the blur method
and interpolation with a cell size of 8, scale size of 8, block size of
2, and block overlap of 1. The results show that ILBP with a rect-
angle box filter and without interpolation achieves an OCR recognition
rate greater than or equal to that of every other configuration.
We analyzed the recognition rate of each letter and show the
rates as confusion color maps in Fig. 10. The most errors occurred
for the digit “0,” the lower-case letter “o,” and the upper-case letter “O.”
The second most common errors involved the digit “1,” the upper-case
letter “I,” and the lower-case letter “l.” Thus, we can observe that the
error-prone characters are similar to those of HOG because ILBP is
an edge-based descriptor. However, by using first selection and max-
imum selection, ILBP is more robust to noise, blur, and
letters combined with dots. The left image of Fig. 10 shows the
HOG recognition rate for each letter. From this figure, we can ob-
serve that the HOG and ILBP error rates for each letter are very
similar. However, the final recognition rates of ILBP are similar to or
greater than those of HOG.
The computation cost of ILBP is S times that of the original LBP for
CPU-based computation, where S is the scale space size. In this
study, we set our test image size to 60 × 48 pixels, and the com-
putation times of ILBP, HOG, and the original LBP^{u2}_{8,1} were 0.031,
0.019, and 0.014 s, respectively (on average over IIIT5K). To observe
the computation time further, we used 2048 × 2048 pixel images with a
cell size of 8 and block overlap of 1. The average time for ILBP
is 1.414 s, while the computation times of HOG and LBP^{u2}_{8,1}
are 0.2642 s and 0.3143 s, respectively. However, when using the in-
tegral image, the computation time is constant when we
search for a pattern in the whole image at each scale. The com-
putation time was reduced to 0.812 s after we implemented the ILBP
code on a CUDA architecture.
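The constant cost per scale comes from the summed-area table: once the integral image is built in one pass, the mean of any axis-aligned box takes four lookups, independent of the box size. A minimal Python sketch of that mechanism (not the authors' code) follows:

```python
import numpy as np

def integral_image(img):
    """Summed-area table with a zero top row and left column for easy indexing."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.float64)
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return ii

def box_mean(ii, y, x, r):
    """Mean of the (2r+1) x (2r+1) window centred at (y, x).

    Four table lookups regardless of r; the caller must keep the window
    inside the image bounds.
    """
    y0, y1 = y - r, y + r + 1
    x0, x1 = x - r, x + r + 1
    total = ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0]
    return total / ((2 * r + 1) ** 2)
```

This is why rectangle smoothing at every scale s costs the same, whereas a Gaussian filter grows more expensive as its kernel widens.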
5. Conclusion and future work
In summary, we propose a robust edge descriptor based on a
scale space search to find the best scale for LBP. We maintain lumi-
nance invariance and incorporate more information than LBP. ILBP
is robust to blur and noise, and we have demonstrated its per-
formance on four datasets. Clearly, the basic HOG method is
successful as an edge descriptor. In addition, many methods
based on HOG describe local information, such as SIFT and OCR
based on bag-of-words. Thus, our future work will be to apply ILBP
with a local descriptor. It will be interesting to investigate the use of
SURF because both methods are based on an integral image.
However, ILBP still has some problems to overcome. The first
problem is how to create the scale space more efficiently. In the SIFT
algorithm, the scales are separated into octaves and grow with
scale rather than by a constant value of 2. Hence, scanning the scale
space efficiently will be our next topic. In addition, many meth-
ods use the bag-of-words model, so making ILBP invariant to
rotation is the second challenge. The original LBP achieves highly
accurate results in face detection, both for LBP^{riu2}_{8,1} and LBP^{u2}_{8,1}, be-
cause it detects luminance changes on the face. Hence, the third
challenge is to test ILBP’s performance on face detection. In this
study, we demonstrated the performance of the edge descriptor called ILBP,
a new edge descriptor for pattern recognition.
Acknowledgements

This research is supported by the Ministry of Science and Tech-
nology, Taiwan, R.O.C., grant no. MOST103-2221-E-006-145-MY3,
and through the National Academy of National Cheng Kung University,
Taiwan, R.O.C.
References
[1] T. Wang, D.J. Wu, A. Coates, A.Y. Ng, End-to-end text recognition with convolutional neural networks, in: ICPR, 2012, pp. 3304–3308.
[2] E. Osuna, R. Freund, F. Girosi, Training support vector machines: an application to face detection, in: IEEE Computer Science Conference, 1997, pp. 130–136.
[3] A. Vedaldi, A. Zisserman, Efficient additive kernels via explicit feature maps, IEEE Trans. Pattern Anal. Mach. Intell., 2011, pp. 480–492.
[4] Y. Freund, R.E. Schapire, A decision-theoretic generalization of on-line learning and an application to boosting, in: Computational Learning Theory: EuroCOLT, 1995, pp. 23–37.
[5] N.S. Altman, An introduction to kernel and nearest-neighbor nonparametric regression, Am. Stat. 46 (1992) 175–185.
[6] Y. Cheng, Mean shift, mode seeking, and clustering, IEEE Trans. Pattern Anal. Mach. Intell. 17 (1995) 790–799.
[7] J. Sivic, A. Zisserman, Efficient visual search of videos cast as text retrieval, IEEE Trans. Pattern Anal. Mach. Intell. 31 (2009) 591–605.
[8] N. Dalal, B. Triggs, Histograms of oriented gradients for human detection, CVPR 1 (2005) 886–893.
[9] D. Huang, C. Shan, M. Ardabilian, Y. Wang, L. Chen, Local binary patterns and its application to facial image analysis: a survey, IEEE Trans. Syst. Man Cybern. C (Appl. Rev.) 41 (2011) 765–781.
[10] P. Viola, M. Jones, Rapid object detection using a boosted cascade of simple features, CVPR 1 (2001) 511–518.
[11] J. Chen, S. Shan, C. He, G. Zhao, M. Pietikäinen, X. Chen, W. Gao, WLD: a robust local image descriptor, IEEE Trans. Pattern Anal. Mach. Intell. 32 (2010) 1705–1720.
[12] M. Calonder, V. Lepetit, C. Strecha, P. Fua, BRIEF: binary robust independent elementary features, in: Proceedings of the 11th European Conference on Computer Vision, Crete, Greece, Part IV, 2010, pp. 779–792.
[13] C. Geng, X. Jiang, Face recognition based on the multi-scale local image structures, Pattern Recognit. 44 (2011) 2565–2575.
[14] D. Lowe, Distinctive image features from scale-invariant keypoints, Int. J. Comput. Vis. 60 (2004) 91–110.
[15] H. Bay, A. Ess, T. Tuytelaars, L.J. Van Gool, Speeded-up robust features, Comput. Vis. Image Understand. 110 (2008) 346–359.
[16] T. Ahonen, A. Hadid, M. Pietikäinen, Face description with local binary patterns: application to face recognition, IEEE Trans. Pattern Anal. Mach. Intell. 28 (2006) 2037–2041.
[17] H. Jin, Q. Liu, H. Lu, X. Tong, Face detection using improved LBP under Bayesian framework, in: Proc. Int. Conf. Image Graph., 2004, pp. 306–309.
[18] A. Mishra, K. Alahari, C.V. Jawahar, Scene text recognition using higher order language priors, in: Proceedings of the 23rd BMVC, United Kingdom, 2012.
[19] S.M. Lucas, A. Panaretos, L. Sosa, A. Tang, S. Wong, R. Young, ICDAR 2003 robust reading competitions, in: Proceedings of the 7th ICDAR, 2003, pp. 682–687.
[20] T.E. de Campos, B.R. Babu, M. Varma, Character recognition in natural images, in: Proceedings of VISAPP, 2009, pp. 273–280.
[21] I. Ben-Ami, T. Basha, S. Avidan, Racing bib numbers recognition, in: Proceedings of the British Machine Vision Conference, 2012, pp. 19.1–19.10.
[22] Z. Zhang, Y. Xu, C.-L. Liu, Natural scene character recognition using robust PCA and sparse representation, in: 12th IAPR Workshop on Document Analysis Systems (DAS), 2016, pp. 340–345.
[23] Y. Wang, X. Ban, J. Chen, B. Hu, X. Yang, License plate recognition based on SIFT feature, Optik 126 (2015) 2895–2901.
[24] S. Gao, C. Wan, B. Xiao, C. Shi, W. Zhou, Z. Zhang, Scene text recognition by learning co-occurrence of strokes based on spatiality embedded dictionary, IET Comput. Vis., 2014, pp. 138–148.
[25] C.-Z. Shi, C.-H. Wang, B.-H. Xiao, S. Gao, J.-L. Hu, Scene text recognition using structure-guided character detection and linguistic knowledge, IEEE Trans. Circuits Syst. Video Technol. 24 (2014) 1235–1250.
[26] X. Tan, B. Triggs, Enhanced local texture feature sets for face recognition under difficult lighting conditions, IEEE Trans. Image Process. 19 (2010) 1635–1650.
[27] T. Ojala, M. Pietikäinen, T. Mäenpää, Multiresolution gray-scale and rotation invariant texture classification with local binary patterns, IEEE Trans. Pattern Anal. Mach. Intell. 24 (2002) 971–987.
[28] J. Ruiz-del-Solar, J. Quinteros, Illumination compensation and normalization in eigenspace-based face recognition: a comparative study of different pre-processing approaches, Pattern Recognit. Lett. 29 (2008) 1966–1979.
[29] G. Bai, Y. Zhu, Z. Ding, A hierarchical face recognition method based on local binary pattern, in: Proceedings of the Congress on Image and Signal Processing, 2008, pp. 610–614.
[30] L. Zhang, R. Chu, S. Xiang, S.Z. Li, Face detection based on multi-block LBP representation, in: Proceedings of the International Conference on Biometrics, 2007, pp. 11–18.
[31] S. Liao, S.Z. Li, Learning multi-scale block local binary patterns for face recognition, in: Proceedings of the International Conference on Biometrics, 2007, pp. 828–837.
[32] Z. Li, G. Liu, Y. Yang, J. You, Scale- and rotation-invariant local binary pattern using scale-adaptive texton and subuniform-based circular shift, IEEE Trans. Image Process. 21 (2012) 2130–2140.
[33] O. Barkan, J. Weill, L. Wolf, H. Aronowitz, Fast high dimensional vector multiplication face recognition, in: Proceedings of ICCV, 2013, pp. 1960–1967.
[34] M. Heikkilä, M. Pietikäinen, C. Schmid, Description of interest regions with center-symmetric local binary patterns, in: 5th Indian Conference on Computer Vision, Graphics and Image Processing, 4338, 2006, pp. 58–69.
[35] C. Silva, T. Bouwmans, C. Frelicot, An eXtended center-symmetric local binary pattern for background modeling and subtraction in videos, in: International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISAPP), 2015.
[36] D.T. Nguyen, Z. Zong, P. Ogunbona, W. Li, Object detection using non-redundant local binary patterns, in: Proceedings of the IEEE International Conference on Image Processing, 2010, pp. 4209–4216.
[37] D. Huang, Y. Wang, Y. Wang, A robust method for near infrared face recognition based on extended local binary pattern, in: Proc. Int. Symp. Vis. Comput., 2007, pp. 437–446.
[38] Y. Huang, Y. Wang, T. Tan, Combining statistics of geometrical and correlative features for 3D face recognition, in: Proc. Brit. Mach. Vis. Conf., 2006, pp. 879–888.
[39] Z. Guo, L. Zhang, D. Zhang, A completed modeling of local binary pattern operator for texture classification, IEEE Trans. Image Process. 19 (2010) 1657–1663.
[40] A. Satpathy, X. Jiang, H.-L. Eng, LBP-based edge-texture features for object recognition, IEEE Trans. Image Process. 23 (2014) 1953–1964.
[41] B. Zhang, Y. Gao, S. Zhao, J. Liu, Local derivative pattern versus local binary pattern: face recognition with high-order local pattern descriptor, IEEE Trans. Image Process. 19 (2010) 533–544.
[42] C. Juan, W. Quan, Z. Chenglong, Automatic head detection for passenger flow analysis in bus surveillance videos, in: Int. Congr. Image Signal Process., 2012, pp. 143–147.
[43] M. Haghighat, S. Zonouz, M. Abdel-Mottaleb, CloudID: trustworthy cloud-based and cross-enterprise biometric identification, Expert Syst. Appl. 42 (21) (2015) 7905–7916.
[44] E. Tola, V. Lepetit, P. Fua, DAISY: an efficient dense descriptor applied to wide-baseline stereo, IEEE Trans. Pattern Anal. Mach. Intell. 32 (2010) 815–830.
[45] V. Ojansivu, J. Heikkilä, Blur insensitive texture classification using local phase quantization, in: Proc. Image Signal Process. (ICISP 2008), 5099, 2008, pp. 236–243.