Pattern Recognition Letters 100 (2017) 14–21
Improved local binary pattern for real scene optical character
recognition
Chu-Sing Yang, Yung-Hsuan Yang∗
Department of Electrical Engineering, National Cheng Kung University, No.1, University Road, Tainan City 701, Taiwan, ROC
Article info
Article history:
Received 10 December 2016
Available online 8 August 2017
Keywords:
Local binary pattern
Optical character recognition
Edge descriptor
Integral image
Recognition speed
Abstract
A strong edge descriptor is important for a wide range of applications. Local binary pattern (LBP) techniques have been applied to numerous fields and are invariant with respect to luminance and rotation. However, the performance of LBP for optical character recognition is not as good as expected. In this study, we propose a robust edge descriptor called improved LBP (ILBP), which is designed for optical character recognition. ILBP overcomes the noise problems observed in the original LBP by searching over scale space, which is implemented using an integral image with a reduced number of features to maintain recognition speed. In experiments, we evaluated ILBP's performance on the ICDAR03, chars74K, IIIT5K, and Bib digital databases. The results show that ILBP is more robust to blur and noise than LBP.
© 2017 Elsevier B.V. All rights reserved.
1. Introduction
With the increase in the number of portable cameras, smartphones, and surveillance cameras, the amount of detail captured in images has grown considerably. As a result, the information in these image details can no longer be fully analyzed by humans. Thus, efficiently and automatically analyzing this information is an important challenge in image processing and pattern recognition. This information includes image objects, text information, and characters. Extracting and understanding text in images has numerous applications in daily life, such as helping vision-impaired people understand text information, automatic document generation, and even surveillance systems. Therefore, we have mainly conducted research on the detection and recognition of character information.
The general method for recognizing text in an image is to extract features from the image and classify the characters by feeding them into a machine learning algorithm such as a neural network (NN) [1], support vector machine (SVM) [2,3], or Adaboost [4]. Alternatively, simpler algorithms such as K-nearest neighbors (K-NN) [5] or mean shift [6] can be used. Recently, as the bag-of-words model [7] has gained considerable attention, many methods have been proposed to divide a character into smaller image "words" such as the end of a stroke, a curved stroke, or a cross stroke. These smaller words improve the recognition rate and are
∗ Corresponding author at: Computer and Communication Engineering, No. 3, Fengnian St., Sanmin Dist., Kaohsiung City 807, Taiwan (R.O.C.).
E-mail addresses: [email protected] (C.-S. Yang), [email protected] (Y.-H. Yang).
http://dx.doi.org/10.1016/j.patrec.2017.08.005
0167-8655/© 2017 Elsevier B.V. All rights reserved.
very successful. However, fundamentally, the performance of these methods depends on the robustness of the feature extraction algorithm.
A robust texture descriptor must address numerous issues such as glare, blur, low contrast, and reflective surfaces in natural scene optical character recognition (OCR). Hence, the design of a robust text descriptor has become a popular topic in recognition systems, even for local descriptors of the bag-of-words (BOW) model. There are several fully developed descriptor methods such as histograms of oriented gradients (HOG) [8], local binary pattern (LBP) [9], Haar-like descriptors [10], dense [11], BRIEF [12], SIFT [13,14], and SURF [15] descriptors. These methods have different advantages depending on the scene. For example, the HOG descriptor is more sensitive to edge changes, the Haar-like descriptor is more robust in blurred regions, and LBP and BRIEF overcome low contrast problems. However, when recognizing a character image in complex and varied real environments, there are still numerous problems to overcome.
The original LBP algorithm has many benefits with respect to texture recognition, such as luminance invariance, rotation invariance, easy implementation, and low computational cost. Moreover, LBP, both with and without rotation invariance, is very successful in the field of face recognition [16]. However, in our experience, LBP's performance for real scene OCR is worse than that of HOG. This is because there is a large amount of hardware noise, blurred edges, and other interfering factors in real-scene character images. Hence, in this study, we propose an improved LBP (ILBP) texture descriptor, which is a modified LBP without rotation invariance and with concepts from HOG added. There are four main modifications in our implementation: creation of an LBP by averaging
Fig. 1. Flow chart of ILBP: original image, LBP over scale space, bin-reduced LBP histogram into cells, first selection and maximum selection, grouping into blocks and normalization, and concatenation as the ILBP feature.
Fig. 2. Current pixel and its neighbor pixels for P = 8 and R = 1.
Table 1
Recognition rates of LBP^u2_{8,1}, LBP^riu2_{8,1}, and HOG_{8,2}.
Methods          IIIT5K    ICDAR03_ch2    Chars74k_15    Bib_Digits
LBP^u2_{8,1}     64.42%    65.83%         51.56%         58.09%
LBP^riu2_{8,1}   36.70%    34.70%         20.49%         23.02%
HOG_{8,2}        81.6%     75.9%          66.2%          71.7%
pixel intensity via an integral image, reduction of the number of histogram bins, creation of an LBP over scale space, and use of the blocks and cells concept from HOG. In the bin reduction step, we reduce the number of LBP histogram bins and add a magnitude map similar to that of HOG; thus, each pixel can express integral magnitude and direction information, which we call the edge type. The second step is to create a bin-reduced LBP over scale space, and the third step is to determine a meaningful scale in scale space. We use the concept from MLBP [17] and SIFT that uses average intensity instead of single pixel intensity comparisons in LBP and different kernels between the center and neighbor pixels to achieve scale invariance. Fig. 1 shows the flow chart of ILBP.
In Section 2, we discuss related works, which are state-of-the-art OCR methods, comprising the original LBP, its variants, integral images, and scale space. Our main method, including scale space LBP, bin reduction, and scale selection, is explained in Section 3. Section 4 describes our experiment and recognition rate results for four letter datasets: IIIT5k [18], ICDAR03 challenge 2 [19], chars74K task 15 [20], and the Bib digital dataset [21]. Section 5 concludes the study.
2. Related works
Since visual word and BOW models became popular in the field of image object recognition and matching, many such methods have been presented. The method in [22] addresses geometric blur in OCR applications, where the basic idea is to match characters by local descriptors, and Wang et al. [23] proposed OCR by invariant feature matching. Furthermore, Gao et al. [24] presented a model to relate local strokes in a character using a spatiality embedded dictionary (SED). The SED procedure associates a code-word, a feature based on a blocked HOG, with a particular location and then builds a dictionary to model the co-occurrence of several local strokes. Shi et al. [25] also presented a part-based tree structure to model each character's visual words (i.e., the end of a stroke, line, curve, or cross) and use a tree structure model to search for each part's score and add it to the global information in the recognition stage. Although visual word-based OCR performs well, it depends on the robustness of the texture or edge descriptor. Hence, the design of a robust edge descriptor is an important issue. In our survey of the literature, the most common descriptor is HOG because it can overcome blur with its cell histogram and low contrast by block normalization. Moreover, we note that LBP performs well on face and texture recognition. However, our experiments show that LBP cannot overcome noise in real scene OCR and loses spatial information as a result of its rotation invariance.
There are many advantages of the simple LBP algorithm with rotation invariance in the field of texture recognition. The original LBP emerged from a simple concept: a comparison between the current pixel and its neighbors within a constant radius and direction, as shown below (Fig. 2):
LBP^{u2}_{P,R} = \sum_{i=0}^{P-1} f(p_i − p_c) 2^i   (1)

f(x) = { 1, if x ≥ 0;  0, if x < 0 }   (2)
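To make Eqs. (1)-(2) concrete, here is a minimal NumPy sketch (not the paper's code; the neighbor ordering is an assumption, since the paper does not fix the bit order) that computes the 8-bit LBP code at every interior pixel:

```python
import numpy as np

def lbp_8_1(img):
    """Basic LBP with P = 8 neighbors at radius R = 1, per Eqs. (1)-(2).

    Each neighbor whose intensity is >= the center contributes a 1-bit
    at position i; border pixels are skipped, mirroring the paper's
    practice of cutting off border pixels.
    """
    img = np.asarray(img, dtype=np.int64)
    h, w = img.shape
    # Neighbor offsets (dy, dx), one per bit position i = 0..7 (assumed order).
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    center = img[1:h - 1, 1:w - 1]
    codes = np.zeros((h - 2, w - 2), dtype=np.int64)
    for i, (dy, dx) in enumerate(offsets):
        neighbor = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        codes |= ((neighbor - center) >= 0).astype(np.int64) << i
    return codes
```

A flat region yields code 255 (every comparison gives f(0) = 1) and an isolated bright pixel yields code 0, which is why the uniform-pattern filter (u2) is applied afterward.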
The authors also defined parameter U to be the number of transitions between 0 and 1 [26]. Moreover, this study proved that the current pixel is meaningless when U ≥ 3. So, in Eq. (1), u2 denotes that patterns with more than two transitions are removed. The rotation invariant LBP is denoted as LBP^{ri_u2}_{P,R}.
However, in real scene OCR, the performance of LBP^{ri_u2}_{P,R} is lower than that of LBP^{u2}_{P,R} because it loses edge information between cells after rotation invariance is implemented. Table 1 compares the real scene OCR results of the original LBP algorithm with a cell size of 8 and a block size of 2 (LBP^{u2}_{P,R}), the same LBP with rotation invariance (LBP^{ri_u2}_{P,R}), and the HOG algorithm with a cell size of 8, a block size of 2, and unsigned orientation bins in 9 directions (HOG_{8,2}). These results are case-insensitive because it is difficult to distinguish "x" and "X" in real scene OCR from a single letter image. The results show that HOG's performance is better than the original LBP's on each dataset. However, there are many extended LBP methods applied in numerous fields.
In our estimation, to design a descriptor for real scene OCR, the following issues must be addressed: noise, the number of features, the ability to describe edges, low contrast regions, reflective surfaces, and unbalanced edge pairs of a stroke. Noise is unavoidable; thus, there are many methods that use scale space to smooth the image for more stability. The ability to describe edges is the most important issue because the entire algorithm is based on a description of edges, or the difference between a light and a dark area, as in the HOG, Haar-like, dense, and LBP methods. The most uncontrollable problems are reflection, low contrast, and unbalanced edges. These problems cannot be controlled when we capture a real scene image with a low-cost device. Therefore, each descriptor can overcome only two or more issues. HOG performs well with respect to lower feature numbers, good edge description, and robustness to unbalanced stroke type because of its unsigned orientation. LBP can handle low contrast images, and Haar-like and dense features can handle blurred conditions. Many variants of these methods have been proposed to overcome various issues.
Fig. 3. Schematic of LBP_1 (left) and LBP_2 with first neighbor pixel (right).
To overcome the fluctuation of luminance, low contrast, and
noise, the local ternary pattern (LTP) [26] was proposed. It consid-
ers smaller intensity differences to create two state LBPs, a lower
and upper pattern, and then combines them into a feature de-
scriptor. A series of LBPs [27–29] was also proposed to overcome
the noise problem. The MLBP [17] method uses the average of a
3 × 3 area instead of a single pixel to determine small intensity
differences. Multi-block LBP (MBLBP) [30,31] has a similar concept,
which is to combine the average of a local block instead of sin-
gle pixel’s intensity. Ref. [32] presented a different approach to ad-
dressing noise by using a scale space LBP feature. In our previ-
ous study, we also proposed using average intensity and selected
a meaningful edge type over scale space. The method gives LBP
scale invariance, but it still cannot solve the information loss af-
ter rotation invariance. Thus, we concluded that rotation invariance
is not useful in real scene OCR. Hence, in this paper, we modified
LBP without rotation invariance to achieve more stability. The over-
complete LBP [33] considers the overlap of adjacent blocks with
different directions and modes to carry more information between
cells. However, the higher number of features in these variants of
LBP lead to higher training and recognition costs. Thus, there are
methods to reduce the number of LBP features. Center-symmetric
LBP (CS-LBP) [34] compares pairs of neighbors to lower the num-
ber of edge types and keep the same information as the original
LBP. Its extended method, XCS-LBP, [35] compares the current pixel
and two distance neighbors to increase stability and use fewer fea-
tures. Robust LBP (RLBP) [36] uses a different way to reduce the
number of bins by binary code equivalence. In general, RLBP has
30 bins, and CS-LBP has 14 bins. Lower numbers of bins will save
computation cost during recognition.
Numerous methods exist to enhance the ability of LBP edge
description. One of them is to increase the number of neighbors
and variant radius to obtain more texture information. Extended-
LBP (ELBP) [37,38] generates multiple LBP codes to describe the
current pixel state using more information. Complete-LBP (CLBP)
[39] presents a similar idea with sign and gray-value differences,
called intensity differences, to improve the current pixel's discriminative power. Discriminative robust LBP (DRLBP) [40] enhances the edge description ability by combining RLBP and the difference of LBP, which comprises the complete bins of RLBP. The authors of [40] also presented a new concept to combine edge type and magnitude information. Instead of per-pixel integer counts, the histogram of DRLBP accumulates the gradient magnitude ω_{x,y} = sqrt(I_x^2 + I_y^2), where I_x and I_y are the first-order derivatives in the x and y directions. DRLBP increases the edge information of a single pixel and makes LBP more like HOG. Hence, it is becoming common to let a single pixel carry more information. In our proposed method, we also apply this concept, but in a different way.
Although there are numerous ways to improve LBP's performance, not all of them are suitable for real-scene OCR or as a local edge descriptor; some methods only improve texture recognition. For example, in our experiments, LTP is not suitable for OCR.
In Chars74k_15, LTP with zero threshold obtains the same recog-
nition as the original LBP, but the recognition rate decreases when
the threshold is increased.
Thus, we conclude that a strong texture descriptor will have the
following four characteristics: 1. Robustness to noise. 2. Tolerance
to unbalanced luminance. 3. Fewer feature dimensions. 4. More in-
formation per pixel.
3. Main method
We propose ILBP to create an LBP method with the characteristics mentioned in Section 2 by searching a scale space to realize meaningful edge information, reducing feature numbers with two procedures, and obtaining more information per pixel. In brief, we use a magnitude map to carry more information; however, the magnitude calculation for the current pixel is different from that of HOG and DRLBP. In this section, we explain how to create the LBP over scale space using an integral image. We then explain how to reduce the number of bins and how to select over the scale space.
3.1. Scale space LBP
To overcome noise problems in LBP, we use an image blurred by integration to create a scale-space LBP. First, we define the integral image as II(p), where p is an image pixel, and the image smoothed by an m × m box filter as II_m(p). Because, in this paper, we only use eight neighbors and a variable radius with one other parameter, we call the LBP method that uses eight neighbors the original LBP. We next define LBP_s(x, y) as a comparison of the center pixel (x, y) with its neighbor pixels using the following smoothing kernel:

LBP_s(p) = \sum_{i=0}^{7} f( II_{2s−1}(p_{i,s}) − II_{2s+1}(p_c) ) 2^i   (3)

Note that s is always greater than or equal to one. We then redefine each neighbor as p_{i,r}. Index i is the position relative to the current pixel, and r is the distance from the center position. The value of r is an integer obtained by rounding the Euclidean distance. The current pixel's intensity is decided by taking the mean of a rectangular area of size 2s + 1, and its neighbors' intensities are also smoothed, with a smaller rectangle of size 2s − 1. When s = 1, the intensity of p_c is the mean of a 3 × 3 pixel rectangular area and the neighbor pixels are single pixel intensities, as shown in Fig. 3 (right). Fig. 3 (left) shows the case for s = 2, where p_c is the mean of a 5 × 5 pixel rectangular area and p_{0,2} is the average of a 3 × 3 pixel rectangular area at a radius of 2. In our experiments, variant radii and smoothing sizes between the center and its neighbors give a better recognition rate. Also, we verified the same using MLBP and ILBP on each dataset. Thus, parameter r starts from 1 and is proportional to parameter s.
We also calculate the approximate response when the scale space LBP is created, initializing the prediction equation as follows:

R(p_c) = | \sum_{i=0}^{7} ( G(p_i, σ_1) − G(p_c, σ_2) ) |   (4)

Here, R(p_c) usually uses G(p), a discrete version of the Laplacian of Gaussian (LoG), calculated as follows:

G(p) = (1 / (2πσ^2)) e^{−(p_x^2 + p_y^2) / (2σ^2)}   (5)

In Lowe's work [14], an approximate method is used to calculate extrema to detect the fitness of a blob by comparing scale and spatial space. In our application, we use the response to find the fitness of an edge type at the current pixel by searching scale space at the maximum selection stage. As mentioned before, a key point for a real scene edge descriptor is robustness to noise. Thus, there are three possible conditions between σ_1 and σ_2 when
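The box means II_{2s−1} and II_{2s+1} in Eq. (3) can each be evaluated in constant time per pixel with a summed-area table; a small sketch (our own illustration, using the usual zero-padded integral-image convention rather than any notation from the paper):

```python
import numpy as np

def integral_image(img):
    """Summed-area table with an extra zero row/column, so the sum over
    img[y0:y1, x0:x1] is ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0]."""
    h, w = img.shape
    ii = np.zeros((h + 1, w + 1), dtype=np.float64)
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return ii

def box_mean(ii, y, x, k):
    """Mean intensity of the k x k box centered at (y, x), with k odd
    (e.g. 2s - 1 for a neighbor, 2s + 1 for the center pixel).
    Assumes the box lies fully inside the image."""
    r = k // 2
    y0, x0, y1, x1 = y - r, x - r, y + r + 1, x + r + 1
    total = ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0]
    return total / (k * k)
```

With these two helpers, LBP_s(p) is just eight sign comparisons of box_mean at the neighbors (size 2s − 1) against box_mean at the center (size 2s + 1), at any scale s.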
Fig. 4. All edge types in the first bin reduction. The edge types fold along the dotted line.
Fig. 5. All edge types after the second reduction.
Fig. 6. Edge type and magnitude. Each number pair indicates the edge type and magnitude.
the current pixel value is noise. The first condition emphasizes the noise response when σ_1 > σ_2, because the peak of the center G(p_c, σ_2) will have a higher value than its neighbors. The second condition (σ_1 = σ_2) performs like the LoG operator. However, in the third condition (σ_1 < σ_2), we obtain a smoother kernel with lower peaks and wider Gaussian RMSs. Thus, in our application, we use σ_1 < σ_2 with linear growth.
To reduce the computational cost and fit LBP based on the integral image, we use a box filter instead of a Gaussian filter. By the central limit theorem, a box filter is an approximation of Gaussian blur, so we can rewrite Eq. (4) as follows when s is 1, σ_1 = 2s − 1, and σ_2 = 2s + 1:

R_s(p_c) = | \sum_{i=0}^{7} ( II_{2s−1}(p_{i,s}) − II_{2s+1}(p_c) ) |   (6)

We select a neighbor pixel using discrete distances and keep σ_1 < σ_2 to overcome the noise problem when s is increased. Therefore, Eq. (6) is a discrete version of LoG.
3.2. Reducing the number of bins in LBP
The number of bins in RLBP with a uniform pattern is 30. RLBP reduces the number of bins as calculated using Eq. (7). However, we use the HOG concept of unsigned orientation.

RLBP(x, y) = min{ LBP(x, y), 2^8 − 1 − LBP(x, y) }   (7)

The characters in a natural scene are randomly brighter or darker than the background. For example, an image may have a stroke that transitions from a lighter to a darker area. Thus, the performance of unsigned HOG is better than that of signed HOG on each dataset in our experiments. Considering this, we regard (00000111)_2 as (11111000)_2 so that we can reduce the LBP feature bins by considering the two situations as one edge type. For convenience, M denotes the number of ones in (LBP_s)_2 and O is a rotation left with repeated bits; thus, rLBP^{u2} can be written as follows.

rLBP_s = { 1, if O = 8 or O = 0;  255 − LBP_s, else if (M ≥ 4 and O ≥ 4) or O > 4;  LBP_s, otherwise }   (8)

where rLBP_s is the first reduction in the number of bins of LBP. Furthermore, rLBP^{u2}_s denotes the image after the uniform pattern process. In general, the range of LBP^{u2}_s is 1–59, which is combined with M = 0, 7 × 8 (M × O), where M = 8, and U > 3. In our proposed method, the number of bins after the first reduction is 31, which is a combination of (00000000)_2, 28 bins (as shown in Fig. 4), (11111111)_2, and u > 2. Fig. 4 also shows the folds of the edge types in our proposed method.
In the original LBP, each pixel presents a single piece of information because one pixel is only counted once in a cell; therefore, we propose an improvement to reduce the number of bins and present more information per pixel. First, we continue to reduce the bins by their directions, which is an idea borrowed from HOG. In HOG, the performance of an unsigned orientation is better than that of a signed orientation for person detection, and we have also obtained the same result in character recognition on the IIIT5k and Chars74K_15 datasets. Therefore, we reduce the number of LBP bins to 10, as shown in Fig. 5. The edge types in rrLBP, which denotes the second bin reduction, are calculated as follows.

v(x, y) = rLBP^{u2}_s(x, y) − 2   (9)

rrLBP^{u2}_s(x, y) = { 1, if M = 0 or M = 8;  0, else if U > 3;  2 × mod(v(x, y), 4) + ((M ∧ 1) ≪ 1) + 2, otherwise }   (10)

α(x, y) = { 1, if O = 0 or 8;  ⌊v(x, y)/2⌋ + 1, otherwise }   (11)

The physical meaning of the magnitude map is the length from an edge to the current pixel. Fig. 6 shows the edge types and corresponding magnitudes; the first pattern in Fig. 6 shows edge type 2 and magnitude 1, and the second pattern shows the same edge type with magnitude 2. The third and fourth patterns in Fig. 6 show the situations when M is even.
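The complement folding and the uniform test behind Eqs. (7)-(10) can be illustrated in a few lines (a simplified sketch of the folding idea only, not the paper's full rrLBP mapping with its M and O bookkeeping):

```python
def transitions(code):
    """Parameter U: number of 0/1 transitions in the circular 8-bit pattern."""
    return sum(((code >> i) & 1) != ((code >> ((i + 1) % 8)) & 1)
               for i in range(8))

def unsigned_fold(code):
    """Treat a pattern and its bitwise complement as one edge type,
    as in RLBP's rule of Eq. (7): min(LBP, 2^8 - 1 - LBP)."""
    return min(code, 255 - code)
```

For example, (00000111)_2 and (11111000)_2 fold to the same value, matching the identification made in the text, and patterns with U > 2 can then be discarded as non-uniform.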
3.3. Selection method over scale space
After rrLBP^{u2}_s is created over scale space, we use two approaches, called "first selection" and "maximum selection", to determine the meaningful scale. In the following, we explain the physical meaning of our proposed selection method.
We first observe that there are 59 edge types in the original LBP^{u2}_{8,1}, many of which are meaningless. This is caused by hardware noise and a wrong radius size for obtaining the best information at the current pixel.
In Fig. 7, the yellow box is at a smaller scale in LBP_s and the blue box is at a larger scale. The edge type in the yellow box is meaningless, but when we increase the scale, the edge type in the blue box is meaningful. Hence, we conclude that selecting the best scale will improve LBP performance. In general, we use the first non-zero value as the current pixel edge type, and we call this method first selection. In the first selection method, the pixel will
Fig. 7. Two scales with different LBP_s edge types.
Fig. 8. Two cases of first selection. A point close to the edge will have a higher magnitude (left), and a point far from the edge will have a smaller magnitude (right).
have larger magnitude if the current point is very close to the edge
and vice versa. Fig. 8 shows a schematic of first selection and illus-
trates the situation we describe. We assign the pixel as edge type
10 if the pixel is all zeros over all scales.
Thus, the equation of FLBP^{u2}_S(x, y) can be written as follows:

FLBP^{u2}_S(x, y) = { rrLBP^{u2}_i(x, y), where i is the first non-zero scale in rrLBP^{u2};  10, if rrLBP^{u2}_i(x, y) = 0 for all i }   (12)

where S in FLBP^{u2}_S(x, y) is the scale space from 1 to s of rrLBP^{u2}_s(x, y).
The second method is called maximum selection, and it selects the maximum response from Eq. (13) in scale space. As mentioned above, the response is a discrete version of LoG. Hence, we use the extreme response to reduce the effect of noise. Thus, the maximum selection method uses the absolute maximum response in scale space, which we record when creating LBP over scale space. We still assign edge type 10 to the current pixel if its codes are all zeros over all scales. Therefore, we can write MLBP^{u2}_S(x, y) as follows:

MLBP^{u2}_S(x, y) = { rrLBP^{u2}_i(x, y), where i = arg max_i (R_i(x, y)) and rrLBP^{u2}_i ≠ 0;  10, if rrLBP^{u2}_i(x, y) = 0 for all i }   (13)

Parameter s is the search scale space, from 1 to s, used to find the maximum response. ILBP combines the two selection methods; thus, the number of features in ILBP is 20, which is a combination of the 10 features in first selection and the 10 features in maximum selection. The final step is to create a histogram cell and combine it into blocks. In this step, we use the L2-norm when the block is created. Note that the histograms of first and maximum selection have already been created. In our study, the rectangular box filter performs better than or equal to the Gaussian filter in OCR. Hence, we will discuss the performance when we replace rectangular smoothing with a Gaussian filter and use the interpolated intensity value when comparing neighbor intensities in Section 4. A clipping step exists in HOG's block normalization procedure, and its purpose is to avoid peaks in the histogram and compress it into an acceptable size. Hence, we also tested the original clipping algorithm from HOG in FLBP and MLBP, but the results were not as effective as expected. In ILBP, the magnitude is the integer distance to the nearest edge, so the meaning of the clipping factor is different from that of HOG. First selection determines the meaningful edge type around the edges, and maximum selection ignores the noise to find the strongest edges at the current pixel. Hence, we use the clip factor after block normalization and do not renormalize the blocks. Furthermore, first and maximum selection can have different clip factors. In our experiment, the clip factors are 0.3 and 0.4 when the cell size is 8 for OCR.
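Per pixel, Eqs. (12)-(13) reduce the stack of per-scale edge types (and recorded responses) to one label each; a minimal sketch, assuming the per-scale codes and responses have already been computed:

```python
def first_selection(codes):
    """Eq. (12): the first non-zero edge type over scales; 10 if all zero."""
    for c in codes:
        if c != 0:
            return c
    return 10

def maximum_selection(codes, responses):
    """Eq. (13): the edge type at the scale with the largest absolute
    response among scales with a non-zero code; 10 if all codes are zero."""
    best = None
    for c, r in zip(codes, responses):
        if c != 0 and (best is None or abs(r) > abs(best[1])):
            best = (c, r)
    return 10 if best is None else best[0]
```

Concatenating the two resulting 10-bin cell histograms is what yields ILBP's 20 features per cell.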
In Fig. 9, the first row depicts the histogram of each method. Both the maximum and first selection approaches obtain edge types without including any meaningless edge types. In the second row, we show the histogram results for a blurred version of the discrete edge image (blurred by a Gaussian filter with σ = 2). First selection, the original LBP, and DRLBP obtain results that are similar to their results for the discrete image. Maximum selection is more affected by blur. We then add salt-and-pepper noise to the blurred image. The result of maximum selection for this image is similar to that of the discrete image. In contrast, first selection, HOG, the original LBP, and DRLBP are easily affected by noise. Therefore, we combine first selection and maximum selection to reduce the effect of both blur and noise, and the result shows that the combined method is more accurate on each dataset.
4. Experiment and results
In this section, we describe the experimental platform, then evaluate the relationship between scale and cell size, and finally present the results for OCR performance and the computation cost of ILBP.
We implemented the original LBP, LTP, RLBP, CS-LBP, DRLBP, multi-scale LDP [41], HOG, HOG-LBP [42], Gabor-wavelet [43], Daisy [44], Haar-like, local phase quantization (LPQ) [45], BRIEF, and ILBP methods on a Windows 10 OS laptop with an i7-4720HQ CPU and 16 GB of memory. For the software, we used MATLAB for the data presentation, and the core of each method was implemented using the OpenCV library and C++ code for performance. Each method's performance was carefully verified, and we cut off the pixels on the borders to avoid edge interference. For example, our HOG's performance is better than that of MATLAB and OpenCV (76.5% versus 75.6% with a block size of 1 and cell size of 8). Moreover, we used the same histogram procedure to build the cell histograms and the same block normalization function to guarantee that each method's performance was compared fairly. For the classifier, we used the K-NN method to classify each character. We could use multi-classifiers instead, but the recognition speed depends on the feature extraction method and the number of features. K-NN and multi-classifiers (NN, SVM, or Adaboost) have both advantages and disadvantages, but we chose K-NN as our classifier because it is an effective way to observe the robustness of each edge descriptor. More importantly, it is a standard baseline for testing the performance of each method. In general, CS-LBP has the lowest number of feature dimensions (4), HOG has 9 features per cell, ILBP has 20, RLBP has 30 features per cell, the original LBP without rotation invariance has 59 features, and DRLBP has 60 features per cell. However, ILBP's basic cost is affected by scale S, which determines how many LBPs we should build and scan over the scale space. On the same architecture, ILBP's computation will be S times that of LBP, but by using an advanced CUDA architecture, the computation time can be reduced to an acceptable level.
The most controversial parameter is S, the scale space's size. This parameter affects accuracy and the computational cost because we have to scan the entire scale space. In our experiments, an S that is too big or too small will affect maximum selection. Thus, the best S depends on what we want to describe and the characteristics of the dataset. As with HOG, cell size also affects the accuracy and feature dimensions. Larger cell sizes will degrade accuracy but reduce the number of features and increase recognition speed. Table 2 lists the case-insensitive recognition rates with variable S and cell sizes on the IIIT5K dataset. Each
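The K-NN baseline used above can be sketched as follows (a generic nearest-neighbor vote over extracted feature vectors; the Euclidean metric and the k = 1 default are assumptions, as the paper does not state them):

```python
import numpy as np

def knn_predict(train_feats, train_labels, query, k=1):
    """Label a query feature vector by majority vote among its k nearest
    training vectors under Euclidean distance."""
    d = np.linalg.norm(np.asarray(train_feats, dtype=float)
                       - np.asarray(query, dtype=float), axis=1)
    nearest = np.argsort(d)[:k]
    votes = [train_labels[i] for i in nearest]
    return max(set(votes), key=votes.count)
```

Because the classifier does no learning, differences in recognition rate between the rows of Tables 4 and 5 reflect the descriptors themselves, which is the stated reason for choosing K-NN as the baseline.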
Fig. 9. Discrete edge test image. The first row, from left to right, depicts the test image and the histograms of FLBP^u2_8, MLBP^u2_8, HOG, LBP^u2_{8,1}, and DRLBP. The second row shows each histogram result for a blurred version of the first image. The third row shows the histogram results after adding salt-and-pepper noise.
Table 2
ILBP recognition rate for variable scale and cell sizes on IIIT5K.
Cell size   Scale (IIIT5K)
            4        6        8        10       12       14       16
4           79.3%    80.4%    80.7%    80.6%    80.6%    80.6%    80.6%
6           80.4%    82.1%    81.8%    81.7%    81.7%    81.7%    81.7%
8           81.0%    82.1%    82.3%    82.2%    82.0%    82.0%    81.9%
10          79.4%    81.5%    81.9%    82.0%    80.6%    80.6%    81.6%
12          78.2%    80.2%    80.6%    80.6%    80.5%    80.6%    80.6%
14          76.1%    75.9%    78.4%    78.5%    78.5%    78.5%    78.5%
16          74.0%    76.2%    76.8%    77.0%    76.9%    76.9%    77.0%
Table 3
ILBP recognition rate for variable scale and cell sizes on Chars74k task 15.
Cell size   Scale (Chars74K_15)
            4        6        8        10       12       14       16
4           62.3%    66.1%    65.7%    64.8%    64.7%    64.6%    64.6%
6           66.9%    68.9%    69.2%    67.6%    67.7%    67.8%    66.7%
8           68.6%    72.6%    71.8%    71.0%    70.5%    70.8%    68.9%
10          70.0%    71.7%    70.3%    69.2%    68.9%    68.7%    68.7%
12          67.7%    68.7%    71.4%    70.8%    70.5%    70.5%    70.5%
14          68.2%    69.7%    69.7%    69.4%    69.0%    68.9%    68.9%
16          63.6%    66.8%    67.2%    66.9%    66.8%    66.7%    66.7%
Table 4
Recognition rates for each method with a cell size of 8 and block size of 1.

Method             IIIT5K   ICDAR03 Ch. 2   Chars74K Task 15   Bib digits
ILBP_8             79.9%    75.3%           63.9%              59.4%
LBP^{u2}_{8,1}     65.1%    64.6%           39.1%              57.1%
LTP                66.3%    63.4%           38.5%              55.4%
RLBP               67.5%    63.3%           43.4%              51.1%
CS-LBP             67.0%    62.8%           41.0%              54.37%
LDP                66.2%    61.1%           44.4%              64.1%
DRLBP              72.8%    70.6%           50.1%              59.0%
HOG                76.5%    72.8%           60.5%              58.3%
HOG-LBP            75.8%    72.1%           59.9%              59.8%
Table 5
Recognition rates for each method with a cell size of 8, block size of 2, and block overlap of 1 from ILBP to HOG-LBP.

Method             IIIT5K   ICDAR03 Ch. 2   Chars74K Task 15   Bib digits
ILBP_8             82.2%    80.9%           71.8%              70.5%
LBP^{u2}_{8,1}     67.9%    65.3%           42.2%              66.94%
LTP                66.4%    64.2%           41.8%              64.7%
RLBP               68.2%    64.4%           45.7%              58.4%
CS-LBP             67.5%    63.4%           41.5%              61.7%
LDP                75.2%    68.4%           60.2%              67.8%
DRLBP              68.2%    72.0%           53.1%              68.17%
HOG                81.6%    75.9%           66.2%              71.7%
HOG-LBP            81.2%    72.2%           65.5%              71.1%
Gabor wavelet      50.1%    68.2%           62.6%              64.7%
DAISY              48.8%    68.7%           68.3%              69.2%
Haar-like          36.0%    46.2%           52.6%              54.3%
LPQ                46.6%    64.4%           69.3%              62.6%
BRIEF              22.0%    50.6%           50.9%              58.9%
character image was resized to 60 × 48 pixels and the block size
was two. Note again that we believe it is challenging to distinguish
“c” and “C” from a single letter; thus, we set our experiment to be
case-insensitive. The results show that the recognition rate is better
when the scale is set to 8 or 10 at each cell size. A similar
result is shown in Table 3. In our experiments, the best ILBP
scale sizes for OCR are 8 and 10. In the overall experiments, we
set the cell size to 8 and the scale to 8 for OCR when the image
size is 60 × 48 pixels.
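The effect of cell size on feature dimensionality can be made concrete with a small Python sketch. It assumes one 59-bin uniform-pattern histogram per cell (the standard LBP^{u2}_8 convention, 58 uniform codes plus one catch-all bin) and HOG-style block arithmetic; neither detail is spelled out at this point in the paper:

```python
def ilbp_feature_dim(img_h=60, img_w=48, cell=8, block=2, overlap=1, bins=59):
    """Length of the concatenated histogram for a HOG-style cell/block layout.

    bins=59 assumes a uniform-pattern histogram per cell; partial cells
    that do not fit the image are dropped.
    """
    cells_y, cells_x = img_h // cell, img_w // cell
    step = max(block - overlap, 1)            # block stride, in cells
    blocks_y = (cells_y - block) // step + 1
    blocks_x = (cells_x - block) // step + 1
    return blocks_y * blocks_x * block * block * bins
```

Under these assumptions, the paper's settings (60 × 48 image, cell size 8, block size 2, overlap 1) give 7 × 6 cells, hence 6 × 5 blocks of four cells each, i.e. 7080 values, which illustrates why doubling the cell size shrinks the feature vector so quickly.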
In the OCR performance test, we used four datasets, IIIT5K [18],
Chars74K_15 [19], ICDAR03_ch2 [20], and Bib digits [21], to test
each method’s performance. The performance of the texture de-
scriptor can easily be observed without a block overlap. Further-
more, block overlap and normalization are methods to improve an
edge descriptor’s robustness. Hence, we first tested a cell size of 8
and a normalized image size of 60 × 48 pixels with no block overlap
and no case sensitivity.
As mentioned before, we tested the performance of a Gaussian
filter instead of rectangle smoothing. First, we defined the smoothed
image using a Gaussian filter as follows, where G(p, σ) is defined in Eq. (5).
L(p, σ) = G(p, σ) ∗ I(p)    (14)

Thus, the Gaussian version of ILBP_s can be rewritten as follows:

ILBP_s(p) = Σ_{i=0}^{7} f( L(p_{i,s}, (2(s − 1) + 1)/3) − L(p_c, (2s + 1)/3) ) · 2^i    (15)
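As a rough illustration only, Eqs. (14) and (15) can be sketched in Python with SciPy's Gaussian filter. The axis-aligned eight-neighbour ring and the wrap-around borders below are simplifications of the interpolated circular sampling, so this is a sketch of the idea, not the authors' implementation:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def ilbp_scale_gaussian(img, s):
    """8-bit ILBP code map at scale s, Gaussian variant of Eqs. (14)-(15).

    Neighbours are read from L(., (2(s-1)+1)/3), the centre from
    L(., (2s+1)/3); the sign of each difference is packed into one byte.
    """
    Ln = gaussian_filter(img.astype(float), sigma=(2 * (s - 1) + 1) / 3.0)
    Lc = gaussian_filter(img.astype(float), sigma=(2 * s + 1) / 3.0)
    code = np.zeros(img.shape, dtype=np.uint8)
    ring = [(-s, -s), (-s, 0), (-s, s), (0, s), (s, s), (s, 0), (s, -s), (0, -s)]
    for i, (dy, dx) in enumerate(ring):
        # smoothed value at the i-th neighbour of every pixel (wraps at borders)
        neigh = np.roll(np.roll(Ln, -dy, axis=0), -dx, axis=1)
        code |= ((neigh - Lc) >= 0).astype(np.uint8) << i
    return code
```

Swapping `gaussian_filter` for a box filter over the integral image gives the rectangle-smoothing variant that Table 6 compares against.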
Table 6
Recognition rate with different filters and interpolation on IIIT5K.

Interpolation    Gaussian filter    Rectangle box filter
Active               81.0%               82.2%
Not active           81.1%               81.8%
Table 7
Recognition rate with different filters and interpolation on Chars74k_15.

Interpolation    Gaussian filter    Rectangle box filter
Active               71.3%               71.3%
Not active           71.3%               81.3%
Fig. 10. Recognition rate of each letter and its misclassifications, shown as confusion color maps (scale 0%–100%). From left to right are the recognition results of ILBP and HOG on ICDAR03_ch2.
Tables 6 and 7 show the results of combining the blur method
and interpolation with a cell size of 8, scale size of 8, block size of
2, and block overlap of 1. The results show that ILBP with a rect-
angle box filter and without interpolation achieves an OCR recognition
rate greater than or equal to that of every other configuration.
We analyzed the recognition rate of each letter and show the
rates as confusion color maps in Fig. 10. The most errors occurred
for the digit “0,” the lower-case letter “o,” and the upper-case letter “O.”
The second most common errors involved the digit “1,” the upper-case
letter “I,” and the lower-case letter “l.” Thus, we can observe that the
error-prone characters are similar to those of HOG because ILBP is
an edge-based descriptor. However, by using first selection and max-
imum selection, ILBP is more robust to noise, blur, and
letters combined with dots. The left image of Fig. 10 shows the
HOG recognition rate for each letter. From this figure, we can ob-
serve that the HOG and ILBP error rates for each letter are very
similar. However, the final recognition rates of ILBP are similar to or
greater than those of HOG.
The computation cost of ILBP is S times that of the original LBP for
CPU-based computation, where S is the scale space size. In this
study, we set our test image size to 60 × 48 pixels, and the com-
putation times of ILBP, HOG, and the original LBP^{u2}_{8,1} were 0.031,
0.019, and 0.014 s, respectively (on average over IIIT5K). To observe
the computation time further, we used 2048 × 2048 pixel images with a
cell size of 8 and block overlap of 1. The average time for ILBP
is 1.414 s, while the computation times of HOG and LBP^{u2}_{8,1}
are 0.2642 s and 0.3143 s, respectively. However, when using the in-
tegral image, the computation time is constant when we
search for a pattern in the whole image at each scale. The com-
putation time was reduced to 0.812 s after we implemented the ILBP
code on a CUDA architecture.
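The constant cost per scale comes from the summed-area table: once the integral image is built in one pass, the mean of any axis-aligned box takes four lookups, independent of the box size. A minimal Python sketch of that mechanism (not the authors' code) follows:

```python
import numpy as np

def integral_image(img):
    """Summed-area table with a zero top row and left column for easy indexing."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1), dtype=np.float64)
    ii[1:, 1:] = np.cumsum(np.cumsum(img, axis=0), axis=1)
    return ii

def box_mean(ii, y, x, r):
    """Mean of the (2r+1) x (2r+1) window centred at (y, x).

    Four table lookups regardless of r; the caller must keep the window
    inside the image bounds.
    """
    y0, y1 = y - r, y + r + 1
    x0, x1 = x - r, x + r + 1
    total = ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0]
    return total / ((2 * r + 1) ** 2)
```

This is why rectangle smoothing at every scale s costs the same, whereas a Gaussian filter grows more expensive as its kernel widens.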
5. Conclusion and future work
In summary, we propose a robust edge descriptor based on a
scale space search to find the best scale for LBP. We maintain lumi-
nance invariance and incorporate more information than LBP. ILBP
is robust to blur and noise, and we have demonstrated its per-
formance on four datasets. Clearly, the basic HOG method is
successful as an edge descriptor. In addition, many methods
based on HOG describe local information, such as SIFT and OCR
based on bag-of-words. Thus, our future work will be to apply ILBP
with a local descriptor. It will be interesting to investigate the use of
SURF because both methods are based on an integral image.
However, ILBP still has some problems to overcome. The first
problem is how to create the scale space more efficiently. In the SIFT
algorithm, the scales are separated into octaves and grow with
scale rather than by a constant value of 2. Hence, scanning the scale
space efficiently will be our next topic. In addition, many meth-
ods use the bag-of-words model, so making ILBP invariant to
rotation is the second challenge. The original LBP achieves highly
accurate results in face detection, both for LBP^{riu2}_{8,1} and LBP^{u2}_{8,1}, be-
cause it detects luminance changes on the face. Hence, the third
challenge is to test ILBP’s performance on face detection. In this
study, we demonstrated the performance of the edge descriptor called ILBP,
a new edge descriptor for pattern recognition.
Acknowledgements

This research is supported by the Ministry of Science and Tech-
nology, Taiwan, R.O.C., grant no. MOST103-2221-E-006-145-MY3,
and through the National Academy of National Cheng Kung University,
Taiwan, R.O.C.
References
[1] T. Wang, D.J. Wu, A. Coates, A.Y. Ng, End-to-end text recognition with convolutional neural networks, in: ICPR, 2012, pp. 3304–3308.
[2] E. Osuna, R. Freund, F. Girosi, Training support vector machines: an application to face detection, in: IEEE Computer Science Conference, 1997, pp. 130–136.
[3] A. Vedaldi, A. Zisserman, Efficient additive kernels via explicit feature maps, IEEE Trans. Pattern Anal. Mach. Intell., 2011, pp. 480–492.
[4] Y. Freund, R.E. Schapire, A decision-theoretic generalization of on-line learning and an application to boosting, in: Computational Learning Theory: EuroCOLT, 1995, pp. 23–37.
[5] N.S. Altman, An introduction to kernel and nearest-neighbor nonparametric regression, Am. Stat. 46 (1992) 175–185.
[6] Y. Cheng, Mean shift, mode seeking, and clustering, IEEE Trans. Pattern Anal. Mach. Intell. 17 (1995) 790–799.
[7] J. Sivic, A. Zisserman, Efficient visual search of videos cast as text retrieval, IEEE Trans. Pattern Anal. Mach. Intell. 31 (2009) 591–605.
[8] N. Dalal, B. Triggs, Histograms of oriented gradients for human detection, CVPR 1 (2005) 886–893.
[9] D. Huang, C. Shan, M. Ardabilian, Y. Wang, L. Chen, Local binary patterns and its application to facial image analysis: a survey, IEEE Trans. Syst. Man Cybern. C (Appl. Rev.) 41 (2011) 765–781.
[10] P. Viola, M. Jones, Rapid object detection using a boosted cascade of simple features, CVPR 1 (2001) 511–518.
[11] J. Chen, S. Shan, C. He, G. Zhao, M. Pietikäinen, X. Chen, W. Gao, WLD: a robust local image descriptor, IEEE Trans. Pattern Anal. Mach. Intell. 32 (2010) 1705–1720.
[12] M. Calonder, V. Lepetit, C. Strecha, P. Fua, BRIEF: binary robust independent elementary features, in: Proceedings of the 11th European Conference on Computer Vision, Crete, Greece, Part IV, 2010, pp. 779–792.
[13] C. Geng, X. Jiang, Face recognition based on the multi-scale local image structures, Pattern Recognit. 44 (2011) 2565–2575.
[14] D. Lowe, Distinctive image features from scale-invariant keypoints, Int. J. Comput. Vis. 60 (2004) 91–110.
[15] H. Bay, A. Ess, T. Tuytelaars, L.J. Van Gool, Speeded-up robust features, Comput. Vis. Image Understand. 110 (2008) 346–359.
[16] T. Ahonen, A. Hadid, M. Pietikäinen, Face description with local binary patterns: application to face recognition, IEEE Trans. Pattern Anal. Mach. Intell. 28 (2006) 2037–2041.
[17] H. Jin, Q. Liu, H. Lu, X. Tong, Face detection using improved LBP under Bayesian framework, in: Proc. Int. Conf. Image Graph., 2004, pp. 306–309.
[18] A. Mishra, K. Alahari, C.V. Jawahar, Scene text recognition using higher order language priors, in: Proceedings of the 23rd BMVC, United Kingdom, 2012.
[19] S.M. Lucas, A. Panaretos, L. Sosa, A. Tang, S. Wong, R. Young, ICDAR 2003 robust reading competitions, in: Proceedings of the 7th ICDAR, 2003, pp. 682–687.
[20] T.E. de Campos, B.R. Babu, M. Varma, Character recognition in natural images, in: Proceedings of VISAPP, 2009, pp. 273–280.
[21] I. Ben-Ami, T. Basha, S. Avidan, Racing bib numbers recognition, in: Proceedings of the British Machine Vision Conference, 2012, pp. 19.1–19.10.
[22] Z. Zhang, Y. Xu, C.-L. Liu, Natural scene character recognition using robust PCA and sparse representation, in: 12th IAPR Workshop on Document Analysis Systems (DAS), 2016, pp. 340–345.
[23] Y. Wang, X. Ban, J. Chen, B. Hu, X. Yang, License plate recognition based on SIFT feature, Optik 126 (2015) 2895–2901.
[24] S. Gao, C. Wan, B. Xiao, C. Shi, W. Zhou, Z. Zhang, Scene text recognition by learning co-occurrence of strokes based on spatiality embedded dictionary, IET Comput. Vis., 2014, pp. 138–148.
[25] C.-Z. Shi, C.-H. Wang, B.-H. Xiao, S. Gao, J.-L. Hu, Scene text recognition using structure-guided character detection and linguistic knowledge, IEEE Trans. Circuits Syst. Video Technol. 24 (2014) 1235–1250.
[26] X. Tan, B. Triggs, Enhanced local texture feature sets for face recognition under difficult lighting conditions, IEEE Trans. Image Process. 19 (2010) 1635–1650.
[27] T. Ojala, M. Pietikäinen, T. Mäenpää, Multiresolution gray-scale and rotation invariant texture classification with local binary patterns, IEEE Trans. Pattern Anal. Mach. Intell. 24 (2002) 971–987.
[28] J. Ruiz-del-Solar, J. Quinteros, Illumination compensation and normalization in eigenspace-based face recognition: a comparative study of different pre-processing approaches, Pattern Recognit. Lett. 29 (2008) 1966–1979.
[29] G. Bai, Y. Zhu, Z. Ding, A hierarchical face recognition method based on local binary pattern, in: Proceedings of the Congress on Image and Signal Processing, 2008, pp. 610–614.
[30] L. Zhang, R. Chu, S. Xiang, S.Z. Li, Face detection based on multi-block LBP representation, in: Proceedings of the International Conference on Biometrics, 2007, pp. 11–18.
[31] S. Liao, S.Z. Li, Learning multi-scale block local binary patterns for face recognition, in: Proceedings of the International Conference on Biometrics, 2007, pp. 828–837.
[32] Z. Li, G. Liu, Y. Yang, J. You, Scale- and rotation-invariant local binary pattern using scale-adaptive texton and subuniform-based circular shift, IEEE Trans. Image Process. 21 (2012) 2130–2140.
[33] O. Barkan, J. Weill, L. Wolf, H. Aronowitz, Fast high dimensional vector multiplication face recognition, in: Proceedings of ICCV, 2013, pp. 1960–1967.
[34] M. Heikkilä, M. Pietikäinen, C. Schmid, Description of interest regions with center-symmetric local binary patterns, in: 5th Indian Conference on Computer Vision, Graphics and Image Processing, 4338, 2006, pp. 58–69.
[35] C. Silva, T. Bouwmans, C. Frelicot, An eXtended center-symmetric local binary pattern for background modeling and subtraction in videos, in: International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISAPP), 2015.
[36] D.T. Nguyen, Z. Zong, P. Ogunbona, W. Li, Object detection using non-redundant local binary patterns, in: Proceedings of the IEEE International Conference on Image Processing, 2010, pp. 4209–4216.
[37] D. Huang, Y. Wang, Y. Wang, A robust method for near infrared face recognition based on extended local binary pattern, in: Proc. Int. Symp. Vis. Comput., 2007, pp. 437–446.
[38] Y. Huang, Y. Wang, T. Tan, Combining statistics of geometrical and correlative features for 3D face recognition, in: Proc. Brit. Mach. Vis. Conf., 2006, pp. 879–888.
[39] Z. Guo, L. Zhang, D. Zhang, A completed modeling of local binary pattern operator for texture classification, IEEE Trans. Image Process. 19 (2010) 1657–1663.
[40] A. Satpathy, X. Jiang, H.-L. Eng, LBP-based edge-texture features for object recognition, IEEE Trans. Image Process. 23 (2014) 1953–1964.
[41] B. Zhang, Y. Gao, S. Zhao, J. Liu, Local derivative pattern versus local binary pattern: face recognition with high-order local pattern descriptor, IEEE Trans. Image Process. 19 (2010) 533–544.
[42] C. Juan, W. Quan, Z. Chenglong, Automatic head detection for passenger flow analysis in bus surveillance videos, in: Int. Congr. Image Signal Process., 2012, pp. 143–147.
[43] M. Haghighat, S. Zonouz, M. Abdel-Mottaleb, CloudID: trustworthy cloud-based and cross-enterprise biometric identification, Expert Syst. Appl. 42 (21) (2015) 7905–7916.
[44] E. Tola, V. Lepetit, P. Fua, DAISY: an efficient dense descriptor applied to wide-baseline stereo, IEEE Trans. Pattern Anal. Mach. Intell. 32 (2010) 815–830.
[45] V. Ojansivu, J. Heikkilä, Blur insensitive texture classification using local phase quantization, in: Proc. Image Signal Process. (ICISP 2008), 5099, 2008, pp. 236–243.