2012 IEEE International Conference on Multimedia & Expo Workshops (ICMEW), Melbourne
A Computationally Efficient Algorithm for Building Statistical Color Models
Mingzhi Dong, Liang Yin, Weihong Deng, Jun Guo and Weiran Xu
School of Information and Communication Engineering
Beijing University of Posts and Telecommunications, P.R. China, 100876
Email: [email protected], [email protected], [email protected], [email protected], [email protected]
Abstract—Though widely used in surveillance systems for human or fire detection, statistical color models suffer from long training times during parametric estimation. To solve this low-dimension, huge-number density estimation problem, we propose a computationally efficient algorithm, weighted EM, which learns the parameters of a finite mixture distribution from the histogram of the training data. By representing the data with a small number of parameters, we significantly reduce long-term storage costs; at the same time, estimating the parameters from a histogram of relatively small size ensures computational efficiency. The algorithm can be readily applied to any mixture model that can be estimated by EM, and its online learning form is also given in this paper. In a skin detection experiment, the algorithm is tested on a database of nearly half a billion training samples, and the results show that it performs density estimation accurately while enjoying significantly better computational and storage efficiency.
Index Terms—Statistical Color Models; Skin Detection; Low-dimension Huge-number Parametric Estimation; Weighted EM
I. INTRODUCTION
The statistical color model [1] is a frequently used preprocessing step for image and video analysis and proves effective in many kinds of surveillance applications, such as background modeling [2], face detection [3] and fire detection [4] in surveillance video. The key step in generating a statistical color model for different scenes is density estimation, which has two main characteristics: (1) a very large amount of training data: with a single high-resolution picture consisting of millions of pixels, the number of training pixels for a given statistical color model may reach the order of billions; (2) a low-dimensional feature space: the feature space of nearly all color representations, such as RGB and HSI, has no more than 3 dimensions.
To solve this density estimation problem, previous research has mainly chosen either nonparametric or parametric methods [5]. Most nonparametric methods, such as histograms and kernel density estimators, require the histogram or part of the training data set to be stored and thus need expensive long-term storage space. In contrast, parametric estimation methods, such as mixture models, can represent the entire training data with only a small number of parameters. The set of parameters can also be updated online to support real-time adaptive applications [6]. However, the parameter estimation process, such as the Expectation Maximization (EM) algorithm, requires excessively long training times [1], which prevents its wider application. Thus, there is a pressing need for low-dimension, huge-number density estimation with high computational and storage efficiency.
In this paper, we propose a computationally efficient parametric density estimation algorithm, weighted EM, to solve this problem. Instead of learning the parameters directly from the data, our algorithm estimates the parameters of a finite mixture distribution from the histogram of the data. We show that, by regarding the number of data points falling into each bin of the histogram as a weight, our weighted EM algorithm is equivalent to conventional EM whenever the histogram represents the data values losslessly; and when the histogram shrinks, the performance does not decrease much until the size drops below a threshold. Compared with conventional EM, whose processing time scales with the total number of data points, the computational time of weighted EM drops sharply, scaling instead with the number of nonempty bins in the histogram. At the same time, by representing the data with the parameters of a mixture model, long-term storage costs are significantly reduced, and online adaptive learning can also be adopted to improve the performance of the system. Thus our algorithm remains accurate while enjoying much better computational and storage efficiency when dealing with large amounts of training data.
Weighted EM is tested in a skin detection experiment with the Compaq skin database [1]. The ROC performance of weighted EM is better than the histogram and EM estimates in [1]. Compared with the histogram (32^3 bins), the memory requirement drops sharply from 262 Kbytes to 1280 bytes (16-mixture GMM). Also, in contrast to traditional EM, the estimation time is reduced from nearly 24 hours in [1] (using 10 Alpha workstations in parallel) to a matter of minutes. Our algorithm is thus shown to enjoy high computational and storage efficiency. The adaptive experiment for skin detection also exhibits good results.
The remainder of the paper is organized as follows. Conventional EM for mixture models is reviewed in Section 2. Section 3 then proposes the weighted EM algorithm and presents the exact batch learning and online learning forms for the Gaussian Mixture Model. Section 4 tests the performance of weighted EM in skin detection. Finally, Section 5 draws the conclusion.
2012 IEEE International Conference on Multimedia and Expo Workshops
978-0-7695-4729-9/12 $26.00 © 2012 IEEE
DOI 10.1109/ICMEW.2012.76
II. CONVENTIONAL EM FOR MIXTURE MODELS
A. Finite Mixture Models
A finite mixture density [7] with $K$ components for a $d$-dimensional random variable $\mathbf{x}$ is given by:

$$p(\mathbf{x}|\theta) = \sum_{k=1}^{K} \pi_k\, p_k(\mathbf{x}|\theta_k), \quad \text{with} \quad \sum_{k=1}^{K} \pi_k = 1, \quad 0 \le \pi_k \le 1 \qquad (1)$$

where $\theta = \{\pi_1, \ldots, \pi_K, \theta_1, \ldots, \theta_K\}$ are the parameters of the finite mixture model, $K$ is the total number of components, and the $k$-th component of the mixture is denoted by $p_k(\mathbf{x}|\theta_k)$. The mixing weights $\pi_k$ are non-negative and sum to one.
B. Expectation Maximization
The objective of conventional EM is to maximize the log-likelihood of the observations, which decomposes as

$$\ln p(X|\theta) = \mathcal{L}(q, \theta) + \mathrm{KL}(q\|p), \quad \text{where}$$

$$\mathcal{L}(q, \theta) = \sum_{Z} q(Z) \ln\left\{\frac{p(X, Z|\theta)}{q(Z)}\right\}$$

$$\mathrm{KL}(q\|p) = -\sum_{Z} q(Z) \ln\left\{\frac{p(Z|X, \theta)}{q(Z)}\right\}$$
The EM process can then be written as:
• E-step: Holding $\theta^{old}$ fixed, minimize the KL divergence, which gives

$$q(Z) = p(Z|X, \theta^{old})$$

• M-step: Holding $q(Z)$ fixed, maximize $\mathcal{L}(q, \theta) = Q(\theta, \theta^{old}) + \text{const}$ with respect to $\theta$, where

$$Q(\theta, \theta^{old}) = \mathbb{E}_{Z}[\ln p(X, Z|\theta)]$$

with the expectation taken under $q(Z) = p(Z|X, \theta^{old})$.
C. EM-based Parameter Estimation for Mixture Models
Under the assumption of an independent, identically distributed data set, $q(Z)$ in the E-step factorizes as

$$q(Z) = \prod_{n=1}^{N} p(z_n|\mathbf{x}_n, \theta^{old}) \propto \prod_{n=1}^{N} p(z_n, \mathbf{x}_n|\theta^{old})$$

so $Q(\theta, \theta^{old})$ in the M-step can be calculated as

$$Q(\theta, \theta^{old}) = \sum_{Z} p(Z|X, \theta^{old}) \ln p(X, Z|\theta)$$
Defining the responsibilities

$$\gamma(z_{nk}) = \mathbb{E}[z_{nk}]$$

the equation can be written as

$$Q = \sum_{n=1}^{N} \sum_{k=1}^{K} \gamma(z_{nk})\,(\ln \pi_k + \ln p(\mathbf{x}_n|\theta_k)) \qquad (2)$$

A Lagrange multiplier can be used to maximize $Q$ with respect to $\pi$ while holding $q(Z)$ fixed, which yields

$$\pi_k = \frac{1}{N} \sum_{n=1}^{N} \gamma(z_{nk})$$

The other parameters can also be obtained from equation (2) by taking partial derivatives.
III. WEIGHTED EM
A. Histogram
A histogram is a discrete representation of the distribution of data values. In digital images, a color histogram records the number of pixels whose colors fall in each of a fixed list of color ranges. Uniform quantization is by far the most common choice for histogram-based applications: the range of values in every dimension is divided into uniformly spaced bins. We can then count the number of data points falling into bin $b$ to obtain $h_b$, and calculate the probability of bin $b$ as

$$p(b) = \frac{h_b}{N}, \quad \text{where} \quad N = \sum_{b=1}^{B} h_b \qquad (3)$$

In this paper, uniform quantization in the RGB color space is used. An $H^3$ histogram divides every dimension of the RGB space into $H$ uniform intervals, resulting in $H^3$ bins in total.
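The uniform quantization described above can be sketched with a minimal numpy routine (function names are illustrative, not the paper's code) that bins 8-bit RGB pixels into an H^3 histogram and returns the bin counts h_b together with bin-center values usable as the representative x_b in weighted EM:

```python
import numpy as np

def rgb_histogram(pixels, H=32):
    """Uniformly quantize 8-bit RGB pixels into an H x H x H histogram.

    pixels: (N, 3) array of RGB values in [0, 255].
    Returns (counts, centers): the bin counts h_b and the RGB value at the
    center of each bin (usable as the representative x_b in weighted EM).
    """
    bin_width = 256.0 / H
    idx = (pixels / bin_width).astype(int)           # per-channel bin index in [0, H)
    flat = idx[:, 0] * H * H + idx[:, 1] * H + idx[:, 2]
    counts = np.bincount(flat, minlength=H ** 3)     # h_b for every bin b
    grid = (np.arange(H) + 0.5) * bin_width          # center value of each interval
    r, g, b = np.meshgrid(grid, grid, grid, indexing="ij")
    centers = np.stack([r.ravel(), g.ravel(), b.ravel()], axis=1)
    return counts, centers
```

Only the counts and the fixed grid of centers need to be kept, which is what makes the subsequent estimation independent of the number of pixels.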
B. Objective Function of Weighted EM
The objective function of conventional EM for a finite mixture model is:

$$\log p(X; \theta) = \sum_{n=1}^{N} \log \sum_{k=1}^{K} \pi_k\, p_k(\mathbf{x}_n; \theta_k) \qquad (4)$$

If we already know the lossless histogram, for example the $256^3$ histogram for JPEG pictures, and there are $h_b$ points with the same value $\mathbf{x}_b$, then equation (4) can be written as

$$\log p(X; \theta) = \sum_{b=1}^{B} h_b \log \sum_{k=1}^{K} \pi_k\, p_k(\mathbf{x}_b; \theta_k) \qquad (5)$$

Equation (5) is the objective of weighted EM. So, by regarding the number of data points falling into each bin of the histogram, $h_b$, as a weight, the objective of weighted EM is equivalent to that of conventional EM whenever the histogram represents the data values losslessly. And if the representation is not lossless, weighted EM can be seen as a parameter estimation method with an approximate objective function.
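To illustrate the equivalence between objectives (4) and (5), here is a minimal numpy sketch under our own assumptions (1-D Gaussian components, illustrative function names) that evaluates both objectives and agrees whenever the histogram is lossless:

```python
import numpy as np

def gauss(x, mu, var):
    """1-D Gaussian density, vectorized over x."""
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def loglik_data(x, pi, mu, var):
    """Equation (4): log-likelihood summed over all N samples."""
    return np.sum(np.log(sum(p * gauss(x, m, v) for p, m, v in zip(pi, mu, var))))

def loglik_hist(xb, hb, pi, mu, var):
    """Equation (5): log-likelihood summed over bins, weighted by counts h_b."""
    return np.sum(hb * np.log(sum(p * gauss(xb, m, v) for p, m, v in zip(pi, mu, var))))
```

With samples {0, 0, 1} and a lossless histogram {x_b = (0, 1), h_b = (2, 1)}, the two functions return the same value; the histogram version touches each distinct value once instead of each sample once.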
C. Weighted EM for Mixture Models
The process of weighted EM also comprises an E-step, which remains the same as in conventional EM, and an M-step.
In the M-step, according to the objective function mentioned above, $Q(\theta, \theta^{old})$ can be written as

$$Q = \sum_{Z} q(Z) \ln \prod_{b=1}^{B} \prod_{k=1}^{K} \left[\pi_k\, p(\mathbf{x}_b|\theta_k)\right]^{z_{bk} h_b}$$

$$= \sum_{Z} q(Z) \sum_{b=1}^{B} \sum_{k=1}^{K} z_{bk}\, h_b\,(\ln \pi_k + \ln p(\mathbf{x}_b|\theta_k))$$

$$= \sum_{b=1}^{B} \sum_{k=1}^{K} \Big[\sum_{Z} q(Z)\, z_{bk}\Big] h_b\,(\ln \pi_k + \ln p(\mathbf{x}_b|\theta_k))$$

Replacing $\sum_{Z} q(Z)\, z_{bk}$ with $\gamma(z_{bk}) = \mathbb{E}[z_{bk}]$, the equation can be written as

$$Q = \sum_{b=1}^{B} \sum_{k=1}^{K} \gamma(z_{bk})\, h_b\,(\ln \pi_k + \ln p(\mathbf{x}_b|\theta_k))$$

and by defining

$$\alpha(z_{bk}) = \gamma(z_{bk})\, h_b = \mathbb{E}[z_{bk}]\, h_b$$

$Q$ can be written as

$$Q = \sum_{b=1}^{B} \sum_{k=1}^{K} \alpha(z_{bk})\,(\ln \pi_k + \ln p(\mathbf{x}_b|\theta_k)) \qquad (6)$$

The Lagrange multiplier method can be adopted to solve for $\pi_k$:

$$\pi_k = \frac{\sum_{b} \alpha(z_{bk})}{\sum_{k} \sum_{b} \alpha(z_{bk})}$$

The other parameters of the mixture model can be obtained by taking the partial derivatives of equation (6).
In conventional EM, the $\gamma$ are called responsibilities: $\gamma(z_{nk})$ represents how much responsibility component $k$ takes for observation $\mathbf{x}_n$. In weighted EM, $\alpha$ also has a clear physical meaning: $\alpha(z_{bk})$ can be understood as a weighted responsibility, which not only contains the responsibilities of the different mixture components but also carries the weight $h_b$ of observation $\mathbf{x}_b$. When $h_b$ is large, the observation $\mathbf{x}_b$ has more influence on the parameter estimation process.
According to the discussion above, the iteration process of weighted EM for mixture models can be summarized as:
1) Initialize the parameters $\theta^{old}$.
2) Evaluate the weighted responsibilities:

$$\alpha(z_{bk}) = \frac{h_b\, \pi_k\, p(\mathbf{x}_b|\theta_k^{old})}{\sum_{k} \pi_k\, p(\mathbf{x}_b|\theta_k^{old})}$$

3) Evaluate the new parameters: holding the current $\alpha(z_{bk})$ fixed, maximize $Q$ in equation (6) with respect to $\theta$ to obtain $\theta^{new}$.
4) Check the termination conditions:
• Satisfied: output the parameter estimation results.
• Not satisfied: set $\theta^{old} = \theta^{new}$ and go to step 2.
D. Weighted EM for Gaussian Mixture Models
1) Weighted EM for GMM: The Gaussian Mixture Model (GMM) is the most widely used mixture model, and previous works in skin detection [1][8] typically choose a GMM as the density of skin color. In the case of a GMM, $p(\mathbf{x}_b|\theta_k) = \mathcal{N}(\mathbf{x}_b|\mu_k, \Sigma_k)$, so the following results are obtained as a special case of the previous subsection.
1) Evaluate the weighted responsibilities:

$$\alpha(z_{bk}) = \frac{h_b\, \pi_k\, \mathcal{N}(\mathbf{x}_b|\mu_k, \Sigma_k)}{\sum_{k=1}^{K} \pi_k\, \mathcal{N}(\mathbf{x}_b|\mu_k, \Sigma_k)}$$

2) Evaluate the new parameters:

$$\mu_k^{new} = \frac{\sum_{b=1}^{B} \alpha(z_{bk})\, \mathbf{x}_b}{\sum_{b=1}^{B} \alpha(z_{bk})}$$

$$\Sigma_k^{new} = \frac{\sum_{b=1}^{B} \alpha(z_{bk})\,(\mathbf{x}_b - \mu_k^{new})(\mathbf{x}_b - \mu_k^{new})^T}{\sum_{b=1}^{B} \alpha(z_{bk})}$$

$$\pi_k^{new} = \frac{\sum_{b} \alpha(z_{bk})}{\sum_{k} \sum_{b} \alpha(z_{bk})}$$

3) Log-likelihood calculation:

$$\ln p(X|\pi, \mu, \Sigma, H) = \sum_{b=1}^{B} h_b \ln\Big\{\sum_{k=1}^{K} \pi_k\, \mathcal{N}(\mathbf{x}_b|\mu_k, \Sigma_k)\Big\}$$
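The batch updates above can be sketched as a minimal numpy implementation. This is a sketch under our own assumptions (illustrative names, random or user-supplied mean initialization, a small ridge added to covariances for numerical stability), not the authors' Matlab code. Each iteration costs O(BKd^2) in the number of nonempty bins B rather than the number of samples N:

```python
import numpy as np

def weighted_em_gmm(xb, hb, K, n_iter=100, mu0=None, seed=0):
    """Weighted EM for a GMM estimated from histogram bins.

    xb: (B, d) representative values x_b of the B nonempty bins
    hb: (B,) counts h_b of data points in each bin (the weights)
    mu0: optional (K, d) initial means; random bins are used otherwise
    """
    rng = np.random.default_rng(seed)
    B, d = xb.shape
    N = hb.sum()
    if mu0 is not None:
        mu = mu0.astype(float).copy()
    else:
        mu = xb[rng.choice(B, K, replace=False)].astype(float)
    # start every component from the global weighted covariance (plus a small ridge)
    sigma = np.stack([np.cov(xb.T, aweights=hb) + 1e-6 * np.eye(d)] * K)
    pi = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E-step: weighted responsibilities alpha(z_bk) = h_b * gamma(z_bk),
        # computed in log space for numerical stability
        logp = np.empty((B, K))
        for k in range(K):
            diff = xb - mu[k]
            _, logdet = np.linalg.slogdet(sigma[k])
            maha = np.einsum("bi,ij,bj->b", diff, np.linalg.inv(sigma[k]), diff)
            logp[:, k] = np.log(pi[k]) - 0.5 * (d * np.log(2 * np.pi) + logdet + maha)
        logp -= logp.max(axis=1, keepdims=True)
        gamma = np.exp(logp)
        gamma /= gamma.sum(axis=1, keepdims=True)
        alpha = gamma * hb[:, None]                 # weight responsibilities by bin counts
        # M-step: closed-form updates of Section III-D
        Nk = alpha.sum(axis=0)
        pi = Nk / N
        mu = (alpha.T @ xb) / Nk[:, None]
        for k in range(K):
            diff = xb - mu[k]
            sigma[k] = (alpha[:, k, None] * diff).T @ diff / Nk[k] + 1e-6 * np.eye(d)
    return pi, mu, sigma
```

Since B is bounded by the histogram size (at most H^3 bins) regardless of how many pixels were counted, the per-iteration cost no longer grows with the size of the training set.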
2) Weighted EM for Online Learning GMM: It is also convenient to turn the above results into an online learning form, so as to fulfill the requirement of adaptive learning in some applications and obtain much better performance.
The online learning form of $\alpha^{N+1}$ can be written as

$$\alpha_i^{N+1} = h_{N+1}\, \frac{\pi_i^N\, \mathcal{N}(\mathbf{x}_{N+1}|\mu_i^N, \Sigma_i^N)}{\sum_{k=1}^{K} \pi_k^N\, \mathcal{N}(\mathbf{x}_{N+1}|\mu_k^N, \Sigma_k^N)} \qquad (7)$$

The parameter updating equations according to $\alpha^{N+1}$ are:
[Figure 1 plot: ROC curves of Conventional EM, Histogram and Weighted EM; x-axis: probability of false detection, y-axis: probability of correct detection]
Fig. 1. Comparison of ROC curves
[Figure 2 plots: (a) comparison of running time in hours and (b) comparison of storage space in Kbytes, for Histogram, Conventional EM and Weighted EM]
Fig. 2. Comparison of storage and computational efficiency
1) Update $\mu$: $\mu_i^{N+1} = \mu_i^N + r\, \alpha_i^{N+1}(\mathbf{x}_{N+1} - \mu_i^N)$
2) Update $\Sigma$: $\Sigma_i^{N+1} = \Sigma_i^N + r\, \alpha_i^{N+1}\big[(\mathbf{x}_{N+1} - \mu_i)(\mathbf{x}_{N+1} - \mu_i)^T - \Sigma_i^N\big]$
3) Update $\pi$: $\pi_i^{N+1} = \pi_i^N + r\, \alpha_i^{N+1}\big[\gamma(z_{(N+1)i}) - \pi_i^N\big]$
where $r$ is a learning rate fixed by hand.
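One online update step can be sketched as follows; this is a minimal numpy sketch with illustrative names, and the final renormalization of the mixing weights is our own addition (the paper's update rule does not state it) to keep them summing to one:

```python
import numpy as np

def gauss_pdf(x, m, S):
    """Multivariate Gaussian density at a single point x."""
    d = len(x)
    diff = x - m
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(S))
    return float(np.exp(-0.5 * diff @ np.linalg.inv(S) @ diff)) / norm

def online_update(x_new, h_new, pi, mu, sigma, r=0.01):
    """One online weighted-EM step for a GMM (updates pi, mu, sigma in place).

    x_new: (d,) value of the new observation (bin center); h_new: its weight.
    r: the hand-fixed learning rate of the paper.
    """
    K = len(pi)
    # gamma: responsibilities; alpha: weighted responsibilities of Eq. (7)
    p = np.array([pi[k] * gauss_pdf(x_new, mu[k], sigma[k]) for k in range(K)])
    gamma = p / p.sum()
    alpha = h_new * gamma
    for k in range(K):
        mu[k] = mu[k] + r * alpha[k] * (x_new - mu[k])
        diff = x_new - mu[k]
        sigma[k] = sigma[k] + r * alpha[k] * (np.outer(diff, diff) - sigma[k])
        pi[k] = pi[k] + r * alpha[k] * (gamma[k] - pi[k])
    pi /= pi.sum()   # our own addition: renormalize the mixing weights
    return pi, mu, sigma
```

Each new observation costs O(Kd^2), so the model can track a changing scene without revisiting old data.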
IV. SKIN DETECTION EXPERIMENTS
A. Batch Learning Experiment
To test the performance of weighted EM, the algorithm is implemented in Matlab on a machine with 4 GB of memory and an Intel i5 670 processor, and used to estimate GMMs for skin and non-skin pixels with the data given by [1]. The data set contains 13,640 manually labeled photos with nearly 1 billion labeled pixels. Because 9 of the pictures could not be read, 13,631 pictures are used. Of the 8,962 non-skin pictures, 5,000 are randomly chosen as training samples and all the others are used for testing. Of the 4,669 skin pictures, 3,000 are selected for training, with all the others used for testing. In total, there are 456,977,650 training pixels for non-skin distribution modeling and 54,919,465 for skin.
During the experiment, the histograms of the skin and non-skin training pixels are obtained separately, and then two GMMs are estimated by weighted EM. Different from [1], which adopts a diagonal covariance for each component of the Gaussian mixture
[Fig. 3. Estimation time and ROC performance of weighted EM under varying histogram size and mixture component number: (a) estimation time, (b) area under ROC curve]
so as to shorten the required estimation time, here we make no isotropic or diagonal approximation of the covariance matrix.
After the estimation process, $p(b|\text{skin})$ and $p(b|\text{non-skin})$ can be calculated from the estimated parameters. A skin-pixel classifier is then constructed according to the standard likelihood ratio approach [1]: a particular pixel is classified as skin if the bin $b$ it belongs to satisfies

$$\frac{p(b|\text{skin})}{p(b|\text{non-skin})} \ge \Theta \qquad (8)$$

where $0 \le \Theta \le 1$ is the threshold of the classifier.
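The likelihood-ratio rule can be sketched over all bins at once (illustrative names; the epsilon guard for empty non-skin bins is our own assumption):

```python
import numpy as np

def skin_mask(p_skin, p_nonskin, theta):
    """Likelihood-ratio skin classifier in the style of equation (8).

    p_skin, p_nonskin: per-bin class-conditional probabilities p(b|skin)
    and p(b|non-skin), evaluated from the two estimated GMMs.
    A bin is labelled skin when p(b|skin) / p(b|non-skin) >= theta.
    """
    eps = 1e-12                                   # guard against empty non-skin bins
    return p_skin / np.maximum(p_nonskin, eps) >= theta
```

Sweeping theta over its range and recording the detection and false-detection rates of the resulting masks produces the ROC curves of Figure 1.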
Histogram models and the conventional EM estimation with diagonal approximation given in [1] are also implemented for comparison. Figure 1 shows the performance of the different models quantified by ROC curves. The area under the ROC curve is 0.9382 for the best histogram (32^3 bins), 0.9437 for the best weighted EM and 0.9321 for the best conventional EM. From the results, we can see that weighted EM performs slightly better than the histogram, owing to the smoothness of the parametric model, and outperforms conventional EM mainly because no diagonal approximation is made, which yields more accurate estimates. Also, since weighted EM performs each iteration much faster, different initial parameters can now be tried and the set of parameters with the best log-likelihood on the training samples selected.
It is also important to compare the computational and storage costs of the different models. In [1], the authors spent nearly
Fig. 5. An example: performance of online weighted EM (upper pictures) and histogram
[Figure 4 plots: (a) ROC curves of Histogram and Online-EM; (b) accuracy of Histogram and Online-EM as a function of log classification threshold]
Fig. 4. Performance of adaptive learning
24 hours training both the skin and non-skin mixture models using 10 Alpha workstations in parallel, while the histogram models could be finished in a matter of minutes. For weighted EM, the total estimation time consists of t_hist, the time for constructing the histogram of the training data set, and t_wem, the time for parameter learning. Estimating 16 components from a 256^3 histogram needs only 73.86 s; compared with the hours required by conventional EM, weighted EM is much faster and needs only a little more time than the histogram.
Figure 3 displays the time consumption t_wem and the ROC performance for different component numbers and histogram sizes. If we choose fewer mixture components or estimate from a relatively small histogram, the estimation time can be reduced to less than 1 s, which enables nearly real-time training of a specialized skin model in certain applications. And the performance of weighted EM does not decrease much until the histogram shrinks to 8^3 bins.
Parametric models display great strength with respect to storage costs, as shown in Figure 2(b). After learning the model, the histogram needs to store H^d numbers, where H indicates the histogram size and d the dimension. When H equals 32 and d equals 3, the system requires 262 Kbytes of storage (assuming 8 bytes per bin). A GMM, in contrast, can represent the result with k + k·d(d+3)/2 parameters for a full covariance matrix and k + 2kd for a diagonal one. So when k equals 16 and d equals 3, the system needs only 896 bytes for diagonal covariance matrices and 1280 bytes with no approximation.
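At 8 bytes per stored number, the parameter counts above reproduce the quoted figures; a short sketch (the function name is illustrative):

```python
def gmm_param_count(k, d, diagonal=False):
    """Number of stored values for a k-component GMM in d dimensions:
    k mixing weights plus, per component, a d-dimensional mean and either
    a full symmetric covariance (d(d+1)/2 values) or a diagonal one (d values)."""
    if diagonal:
        return k + 2 * k * d               # k + k*(d + d)
    return k + k * d * (d + 3) // 2        # k + k*(d + d(d+1)/2)
```

For k = 16 and d = 3 this gives 160 values (1280 bytes) for full covariances and 112 values (896 bytes) for diagonal ones, versus 32^3 = 32,768 bins (262,144 bytes) for the histogram.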
B. Online Learning Experiment
The performance of system can be promoted when using
adaptive algorithm properly and online weighted EM is tested
by an adaptive human face detection experiment. 100 pictures
with face pixels occupying more than 30% of the whole
picture are picked in compaq data set [1]. The histogram of
each picture is obtained and then threshold are set to choose
bins classified as skin by the GMM classifier mentioned in
previous experiment. Then we use the selected bins in each
picture to modify the previous parameters with on-line learning
weighted EM and get different adaptive GMM model for
different picture. Finally the performance of adaptive model
is compared with fixed Histogram model.
Figure 4(a) exhibits the ROC curves of the adaptive and fixed models. In Figure 4(b), with higher accuracy over a much wider threshold range, the adaptive model clearly performs much better than the fixed one; its peak accuracy is also higher. As shown in Figure 5, during adaptive learning the results remain acceptable over a wide threshold range, from 0.1 to 5, while the fixed model performs well only on a much narrower one. So we can see that online weighted EM achieves quite good performance through its adaptive learning form.
V. CONCLUSION
In this paper, we have proposed a computationally efficient parametric density estimation algorithm for statistical color models: weighted EM. With no sacrifice in performance, the algorithm enjoys high computational and storage efficiency, and experiments verify that it performs well. Our technique can be readily applied to other histogram or clustering quantization methods and to any mixture model that can be estimated by an EM procedure. It can also easily be changed into an online form, or into a maximum a posteriori (MAP) form when a prior distribution is considered.
REFERENCES
[1] M. Jones and J. Rehg, “Statistical color models with application to skin detection,” International Journal of Computer Vision, vol. 46, no. 1, pp. 81–96, 2002.
[2] A. Elgammal, R. Duraiswami, D. Harwood, and L. Davis, “Background and foreground modeling using nonparametric kernel density estimation for visual surveillance,” Proceedings of the IEEE, vol. 90, no. 7, pp. 1151–1163, 2002.
[3] R. Hsu, M. Abdel-Mottaleb, and A. Jain, “Face detection in color images,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 24, no. 5, pp. 696–706, 2002.
[4] T. Celik, H. Demirel, H. Ozkaramanli, and M. Uyguroglu, “Fire detection in video sequences using statistical color model,” in Acoustics, Speech and Signal Processing, 2006. ICASSP 2006 Proceedings. 2006 IEEE International Conference on. IEEE.
[5] S. Phung, A. Bouzerdoum Sr, and D. Chai Sr, “Skin segmentation using color pixel classification: analysis and comparison,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 27, no. 1, pp. 148–154, 2005.
[6] J. Lee, Y. Kuo, P. Chung, E. Chen et al., “Naked image detection based on adaptive and extensible skin color model,” Pattern Recognition, vol. 40, no. 8, pp. 2261–2270, 2007.
[7] C. Bishop, Pattern Recognition and Machine Learning. Springer New York, 2006.
[8] W. Chen, Y. Shi, and G. Xuan, “Identifying computer graphics using HSV color model and statistical moments of characteristic functions,” in Multimedia and Expo, 2007 IEEE International Conference on. IEEE.