2012 IEEE International Conference on Multimedia & Expo Workshops (ICMEW), Melbourne
A Computationally Efficient Algorithm for Building Statistical Color Models
Mingzhi Dong, Liang Yin, Weihong Deng, Jun Guo and Weiran Xu
School of Information and Communication Engineering
Beijing University of Posts and Telecommunications, P.R. China, 100876
Email: [email protected], [email protected], [email protected], [email protected], [email protected]
Abstract—Though widely used in surveillance systems for human or fire detection, statistical color models suffer from long training times during parametric estimation. To solve this low-dimension, huge-number density estimation problem, we propose a computationally efficient algorithm, weighted EM, which learns the parameters of a finite mixture distribution from the histogram of the training data. By representing the data with a small number of parameters, we significantly reduce long-term storage costs; at the same time, estimating the parameters from a histogram of relatively small size ensures computational efficiency. The algorithm can be readily applied to any mixture model that can be estimated by EM, and its online learning form is also given in this paper. In a skin detection experiment, the algorithm is tested on a database of nearly half a billion training samples, and the results show that it performs density estimation accurately while enjoying significantly better computational and storage efficiency.
Index Terms—Statistical Color Models; Skin Detection; Low-dimension Huge-number Parametric Estimation; Weighted EM
I. INTRODUCTION
The statistical color model [1] is a frequently used preprocessing step for image and video analysis and proves effective in many kinds of surveillance applications, such as background modeling [2], face detection [3] and fire detection [4] in surveillance video. The key step in generating a statistical color model for different scenes is density estimation, which has two main characteristics: (1) a very large amount of training data: with a single high-resolution picture consisting of millions of pixels, the number of training pixels for a given statistical color model may reach the order of billions; (2) a low-dimensional feature space: the feature space of nearly all color representations, such as RGB and HSI, has no more than 3 dimensions.
To solve this density estimation problem, previous research has mainly chosen either nonparametric or parametric methods [5]. Most nonparametric methods, such as histograms and kernel density estimators, require the histogram or part of the training data set to be stored and thus need expensive long-term storage space. In contrast, parametric estimation methods, such as mixture models, can represent the entire training data with only a small number of parameters. The set of parameters can also be updated online to support real-time adaptive applications [6]. However, the parameter estimation process, such as the Expectation Maximization (EM) algorithm, requires excessively long training times [1], which prevents its wider application. Thus, there is a pressing need for low-dimension, huge-number density estimation with high computational and storage efficiency.
In this paper, we propose a computationally efficient parametric density estimation algorithm, weighted EM, to solve this problem. Instead of learning the parameters directly from the data, our algorithm estimates the parameters of a finite mixture distribution from the histogram of the data. We show that, by regarding the number of data points falling into each bin of the histogram as a weight, our weighted EM algorithm is equivalent to conventional EM whenever the histogram represents the data values losslessly; and when the histogram shrinks, the performance does not decrease much until the size drops below a threshold. Compared with conventional EM, whose processing time scales with the total number of data points, the computational time of weighted EM drops sharply, scaling instead with the number of nonempty bins in the histogram. At the same time, by representing the data with the parameters of a mixture model, long-term storage costs are significantly reduced, and online adaptive learning can also be adopted to improve the performance of the system. Thus our algorithm remains accurate while enjoying much better computational and storage efficiency when dealing with large amounts of training data.
Weighted EM is tested in a skin detection experiment with the Compaq skin database [1]. The ROC performance of weighted EM is better than the histogram and EM estimates in [1]. Compared with the histogram (32^3 bins), the memory requirement drops sharply from 262 Kbytes to 1280 bytes (16-mixture GMM). Also, in contrast to traditional EM, the estimation time is reduced from nearly 24 hours in [1] (using 10 Alpha workstations in parallel) to a matter of minutes. Our algorithm is thus shown to enjoy high computational and storage efficiency. The adaptive experiment for skin detection also exhibits good results.
The remainder of the paper is organized as follows. Conventional EM for mixture models is reviewed in Section 2. Section 3 then proposes the weighted EM algorithm and presents the exact batch learning and online learning forms for the Gaussian Mixture Model. Section 4 tests the performance of weighted EM in skin detection. Finally, Section 5 draws the conclusion.
2012 IEEE International Conference on Multimedia and Expo Workshops
978-0-7695-4729-9/12 $26.00 © 2012 IEEE
DOI 10.1109/ICMEW.2012.76
II. CONVENTIONAL EM FOR MIXTURE MODELS
A. Finite Mixture Models
A finite mixture density [7] with $K$ components for a $d$-dimensional random variable $\mathbf{x}$ is given by:

$$p(\mathbf{x}|\theta) = \sum_{k=1}^{K} \pi_k\, p_k(\mathbf{x}|\theta_k), \quad \text{with} \quad \sum_{k=1}^{K} \pi_k = 1, \quad 0 \le \pi_k \le 1 \qquad (1)$$

where $\theta = \{\pi_1, \ldots, \pi_K, \theta_1, \ldots, \theta_K\}$ are the parameters of the finite mixture model, $K$ is the total number of components, and the $k$-th component of the mixture is denoted by $p_k(\mathbf{x}|\theta_k)$. The mixing weights $\pi_k$ are non-negative and sum to one.
B. Expectation Maximization
The objective of conventional EM is to maximize the log-likelihood of the observations, which decomposes as

$$\ln p(X|\theta) = \mathcal{L}(q, \theta) + \mathrm{KL}(q\|p), \quad \text{where}$$

$$\mathcal{L}(q, \theta) = \sum_{Z} q(Z) \ln\left\{\frac{p(X, Z|\theta)}{q(Z)}\right\}$$

$$\mathrm{KL}(q\|p) = -\sum_{Z} q(Z) \ln\left\{\frac{p(Z|X, \theta)}{q(Z)}\right\}$$
The EM process can then be written as:
• E-step: Holding $\theta^{old}$ fixed, minimize the KL divergence, which gives

$$q(Z) = p(Z|X, \theta^{old})$$

• M-step: Holding $q(Z)$ fixed, maximize $\mathcal{L}(q, \theta) = Q(\theta, \theta^{old}) + \text{const}$ with respect to $\theta$, where

$$Q(\theta, \theta^{old}) = \mathbb{E}_{Z}[\ln p(X, Z|\theta)]$$

with the expectation taken under $q(Z) = p(Z|X, \theta^{old})$.
C. EM-based Parameter Estimation for Mixture Models
Under the assumption of an independent, identically distributed data set, $q(Z)$ in the E-step factorizes as

$$q(Z) = \prod_{n=1}^{N} p(z_n|\mathbf{x}_n, \theta^{old}) \propto \prod_{n=1}^{N} p(z_n, \mathbf{x}_n|\theta^{old})$$

so $Q(\theta, \theta^{old})$ in the M-step can be calculated as

$$Q(\theta, \theta^{old}) = \sum_{Z} p(Z|X, \theta^{old}) \ln p(X, Z|\theta)$$
Defining the responsibilities

$$\gamma(z_{nk}) = \mathbb{E}[z_{nk}]$$

the equation can be written as

$$Q = \sum_{n=1}^{N} \sum_{k=1}^{K} \gamma(z_{nk})\,(\ln \pi_k + \ln p(\mathbf{x}_n|\theta_k)) \qquad (2)$$

A Lagrange multiplier can be used to maximize $Q$ with respect to $\pi$ while holding $q(Z)$ fixed, which yields

$$\pi_k = \frac{1}{N} \sum_{n=1}^{N} \gamma(z_{nk})$$

The other parameters can also be obtained from equation (2) by taking partial derivatives.
III. WEIGHTED EM
A. Histogram
A histogram is a discrete representation of the distribution of data values. In digital images, a color histogram records the number of pixels whose colors fall in each of a fixed list of color ranges. Uniform quantization is by far the most common choice for histogram-based applications: the range of values in every dimension is divided into uniformly spaced bins. We can then count the number of data points falling into bin $b$ to obtain $h_b$, and calculate the probability of bin $b$ as

$$p(b) = \frac{h_b}{N}, \quad \text{where} \quad N = \sum_{b=1}^{B} h_b \qquad (3)$$

In this paper, uniform quantization in the RGB color space is used. An $H^3$ histogram divides every dimension of the RGB space into $H$ uniform intervals, resulting in $H^3$ bins in total.
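The uniform quantization described above can be sketched with a minimal numpy routine (function names are illustrative, not the paper's code) that bins 8-bit RGB pixels into an H^3 histogram and returns the bin counts h_b together with bin-center values usable as the representative x_b in weighted EM:

```python
import numpy as np

def rgb_histogram(pixels, H=32):
    """Uniformly quantize 8-bit RGB pixels into an H x H x H histogram.

    pixels: (N, 3) array of RGB values in [0, 255].
    Returns (counts, centers): the bin counts h_b and the RGB value at the
    center of each bin (usable as the representative x_b in weighted EM).
    """
    bin_width = 256.0 / H
    idx = (pixels / bin_width).astype(int)           # per-channel bin index in [0, H)
    flat = idx[:, 0] * H * H + idx[:, 1] * H + idx[:, 2]
    counts = np.bincount(flat, minlength=H ** 3)     # h_b for every bin b
    grid = (np.arange(H) + 0.5) * bin_width          # center value of each interval
    r, g, b = np.meshgrid(grid, grid, grid, indexing="ij")
    centers = np.stack([r.ravel(), g.ravel(), b.ravel()], axis=1)
    return counts, centers
```

Only the counts and the fixed grid of centers need to be kept, which is what makes the subsequent estimation independent of the number of pixels.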
B. Objective Function of Weighted EM
The objective function of conventional EM for a finite mixture model is:

$$\log p(X; \theta) = \sum_{n=1}^{N} \log \sum_{k=1}^{K} \pi_k\, p_k(\mathbf{x}_n; \theta_k) \qquad (4)$$

If we already know the lossless histogram, for example the $256^3$ histogram for JPEG pictures, and there are $h_b$ points with the same value $\mathbf{x}_b$, then equation (4) can be written as

$$\log p(X; \theta) = \sum_{b=1}^{B} h_b \log \sum_{k=1}^{K} \pi_k\, p_k(\mathbf{x}_b; \theta_k) \qquad (5)$$

Equation (5) is the objective of weighted EM. So, by regarding the number of data points falling into each bin of the histogram, $h_b$, as a weight, the objective of weighted EM is equivalent to that of conventional EM whenever the histogram represents the data values losslessly. And if the representation is not lossless, weighted EM can be seen as a parameter estimation method with an approximate objective function.
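To illustrate the equivalence between objectives (4) and (5), here is a minimal numpy sketch under our own assumptions (1-D Gaussian components, illustrative function names) that evaluates both objectives and agrees whenever the histogram is lossless:

```python
import numpy as np

def gauss(x, mu, var):
    """1-D Gaussian density, vectorized over x."""
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def loglik_data(x, pi, mu, var):
    """Equation (4): log-likelihood summed over all N samples."""
    return np.sum(np.log(sum(p * gauss(x, m, v) for p, m, v in zip(pi, mu, var))))

def loglik_hist(xb, hb, pi, mu, var):
    """Equation (5): log-likelihood summed over bins, weighted by counts h_b."""
    return np.sum(hb * np.log(sum(p * gauss(xb, m, v) for p, m, v in zip(pi, mu, var))))
```

With samples {0, 0, 1} and a lossless histogram {x_b = (0, 1), h_b = (2, 1)}, the two functions return the same value; the histogram version touches each distinct value once instead of each sample once.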
C. Weighted EM for Mixture Models
The process of weighted EM also comprises an E-step, which remains the same as in conventional EM, and an M-step.
In the M-step, according to the objective function mentioned above, $Q(\theta, \theta^{old})$ can be written as

$$Q = \sum_{Z} q(Z) \ln \prod_{b=1}^{B} \prod_{k=1}^{K} \left[\pi_k\, p(\mathbf{x}_b|\theta_k)\right]^{z_{bk} h_b}$$

$$= \sum_{Z} q(Z) \sum_{b=1}^{B} \sum_{k=1}^{K} z_{bk}\, h_b\,(\ln \pi_k + \ln p(\mathbf{x}_b|\theta_k))$$

$$= \sum_{b=1}^{B} \sum_{k=1}^{K} \Big[\sum_{Z} q(Z)\, z_{bk}\Big] h_b\,(\ln \pi_k + \ln p(\mathbf{x}_b|\theta_k))$$

Replacing $\sum_{Z} q(Z)\, z_{bk}$ with $\gamma(z_{bk}) = \mathbb{E}[z_{bk}]$, the equation can be written as

$$Q = \sum_{b=1}^{B} \sum_{k=1}^{K} \gamma(z_{bk})\, h_b\,(\ln \pi_k + \ln p(\mathbf{x}_b|\theta_k))$$

and by defining

$$\alpha(z_{bk}) = \gamma(z_{bk})\, h_b = \mathbb{E}[z_{bk}]\, h_b$$

$Q$ can be written as

$$Q = \sum_{b=1}^{B} \sum_{k=1}^{K} \alpha(z_{bk})\,(\ln \pi_k + \ln p(\mathbf{x}_b|\theta_k)) \qquad (6)$$

The Lagrange multiplier method can be adopted to solve for $\pi_k$:

$$\pi_k = \frac{\sum_{b} \alpha(z_{bk})}{\sum_{k} \sum_{b} \alpha(z_{bk})}$$

The other parameters of the mixture model can be obtained by taking the partial derivatives of equation (6).
In conventional EM, the $\gamma$ are called responsibilities: $\gamma(z_{nk})$ represents how much responsibility component $k$ takes for observation $\mathbf{x}_n$. In weighted EM, $\alpha$ also has a clear physical meaning: $\alpha(z_{bk})$ can be understood as a weighted responsibility, which not only contains the responsibilities of the different mixture components but also carries the weight $h_b$ of observation $\mathbf{x}_b$. When $h_b$ is large, the observation $\mathbf{x}_b$ has more influence on the parameter estimation process.
According to the discussion above, the iteration process of weighted EM for mixture models can be summarized as:
1) Initialize the parameters $\theta^{old}$.
2) Evaluate the weighted responsibilities:

$$\alpha(z_{bk}) = \frac{h_b\, \pi_k\, p(\mathbf{x}_b|\theta_k^{old})}{\sum_{k} \pi_k\, p(\mathbf{x}_b|\theta_k^{old})}$$

3) Evaluate the new parameters: holding the current $\alpha(z_{bk})$ fixed, maximize $Q$ in equation (6) with respect to $\theta$ to obtain $\theta^{new}$.
4) Check the termination conditions:
• Satisfied: output the parameter estimation results.
• Not satisfied: set $\theta^{old} = \theta^{new}$ and go to step 2.
D. Weighted EM for Gaussian Mixture Models
1) Weighted EM for GMM: The Gaussian Mixture Model (GMM) is the most widely used mixture model, and previous works in skin detection [1][8] typically choose a GMM as the density of skin color. In the case of a GMM, $p(\mathbf{x}_b|\theta_k) = \mathcal{N}(\mathbf{x}_b|\mu_k, \Sigma_k)$, so the following results are obtained as a special case of the previous subsection.
1) Evaluate the weighted responsibilities:

$$\alpha(z_{bk}) = \frac{h_b\, \pi_k\, \mathcal{N}(\mathbf{x}_b|\mu_k, \Sigma_k)}{\sum_{k=1}^{K} \pi_k\, \mathcal{N}(\mathbf{x}_b|\mu_k, \Sigma_k)}$$

2) Evaluate the new parameters:

$$\mu_k^{new} = \frac{\sum_{b=1}^{B} \alpha(z_{bk})\, \mathbf{x}_b}{\sum_{b=1}^{B} \alpha(z_{bk})}$$

$$\Sigma_k^{new} = \frac{\sum_{b=1}^{B} \alpha(z_{bk})\,(\mathbf{x}_b - \mu_k^{new})(\mathbf{x}_b - \mu_k^{new})^T}{\sum_{b=1}^{B} \alpha(z_{bk})}$$

$$\pi_k^{new} = \frac{\sum_{b} \alpha(z_{bk})}{\sum_{k} \sum_{b} \alpha(z_{bk})}$$

3) Log-likelihood calculation:

$$\ln p(X|\pi, \mu, \Sigma, H) = \sum_{b=1}^{B} h_b \ln\Big\{\sum_{k=1}^{K} \pi_k\, \mathcal{N}(\mathbf{x}_b|\mu_k, \Sigma_k)\Big\}$$
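The batch updates above can be sketched as a minimal numpy implementation. This is a sketch under our own assumptions (illustrative names, random or user-supplied mean initialization, a small ridge added to covariances for numerical stability), not the authors' Matlab code. Each iteration costs O(BKd^2) in the number of nonempty bins B rather than the number of samples N:

```python
import numpy as np

def weighted_em_gmm(xb, hb, K, n_iter=100, mu0=None, seed=0):
    """Weighted EM for a GMM estimated from histogram bins.

    xb: (B, d) representative values x_b of the B nonempty bins
    hb: (B,) counts h_b of data points in each bin (the weights)
    mu0: optional (K, d) initial means; random bins are used otherwise
    """
    rng = np.random.default_rng(seed)
    B, d = xb.shape
    N = hb.sum()
    if mu0 is not None:
        mu = mu0.astype(float).copy()
    else:
        mu = xb[rng.choice(B, K, replace=False)].astype(float)
    # start every component from the global weighted covariance (plus a small ridge)
    sigma = np.stack([np.cov(xb.T, aweights=hb) + 1e-6 * np.eye(d)] * K)
    pi = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E-step: weighted responsibilities alpha(z_bk) = h_b * gamma(z_bk),
        # computed in log space for numerical stability
        logp = np.empty((B, K))
        for k in range(K):
            diff = xb - mu[k]
            _, logdet = np.linalg.slogdet(sigma[k])
            maha = np.einsum("bi,ij,bj->b", diff, np.linalg.inv(sigma[k]), diff)
            logp[:, k] = np.log(pi[k]) - 0.5 * (d * np.log(2 * np.pi) + logdet + maha)
        logp -= logp.max(axis=1, keepdims=True)
        gamma = np.exp(logp)
        gamma /= gamma.sum(axis=1, keepdims=True)
        alpha = gamma * hb[:, None]                 # weight responsibilities by bin counts
        # M-step: closed-form updates of Section III-D
        Nk = alpha.sum(axis=0)
        pi = Nk / N
        mu = (alpha.T @ xb) / Nk[:, None]
        for k in range(K):
            diff = xb - mu[k]
            sigma[k] = (alpha[:, k, None] * diff).T @ diff / Nk[k] + 1e-6 * np.eye(d)
    return pi, mu, sigma
```

Since B is bounded by the histogram size (at most H^3 bins) regardless of how many pixels were counted, the per-iteration cost no longer grows with the size of the training set.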
2) Weighted EM for Online Learning GMM: It is also convenient to turn the above results into an online learning form, so as to fulfill the requirement of adaptive learning in some applications and obtain much better performance.
The online learning form of $\alpha^{N+1}$ can be written as

$$\alpha_i^{N+1} = h_{N+1}\, \frac{\pi_i^N\, \mathcal{N}(\mathbf{x}_{N+1}|\mu_i^N, \Sigma_i^N)}{\sum_{k=1}^{K} \pi_k^N\, \mathcal{N}(\mathbf{x}_{N+1}|\mu_k^N, \Sigma_k^N)} \qquad (7)$$

The parameter updating equations according to $\alpha^{N+1}$ are:
[Figure 1 plot: ROC curves of Conventional EM, Histogram and Weighted EM; x-axis: probability of false detection, y-axis: probability of correct detection]
Fig. 1. Comparison of ROC curves
[Figure 2 plots: (a) comparison of running time in hours and (b) comparison of storage space in Kbytes, for Histogram, Conventional EM and Weighted EM]
Fig. 2. Comparison of storage and computational efficiency
1) Update $\mu$: $\mu_i^{N+1} = \mu_i^N + r\, \alpha_i^{N+1}(\mathbf{x}_{N+1} - \mu_i^N)$
2) Update $\Sigma$: $\Sigma_i^{N+1} = \Sigma_i^N + r\, \alpha_i^{N+1}\big[(\mathbf{x}_{N+1} - \mu_i)(\mathbf{x}_{N+1} - \mu_i)^T - \Sigma_i^N\big]$
3) Update $\pi$: $\pi_i^{N+1} = \pi_i^N + r\, \alpha_i^{N+1}\big[\gamma(z_{(N+1)i}) - \pi_i^N\big]$
where $r$ is a learning rate fixed by hand.
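One online update step can be sketched as follows; this is a minimal numpy sketch with illustrative names, and the final renormalization of the mixing weights is our own addition (the paper's update rule does not state it) to keep them summing to one:

```python
import numpy as np

def gauss_pdf(x, m, S):
    """Multivariate Gaussian density at a single point x."""
    d = len(x)
    diff = x - m
    norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(S))
    return float(np.exp(-0.5 * diff @ np.linalg.inv(S) @ diff)) / norm

def online_update(x_new, h_new, pi, mu, sigma, r=0.01):
    """One online weighted-EM step for a GMM (updates pi, mu, sigma in place).

    x_new: (d,) value of the new observation (bin center); h_new: its weight.
    r: the hand-fixed learning rate of the paper.
    """
    K = len(pi)
    # gamma: responsibilities; alpha: weighted responsibilities of Eq. (7)
    p = np.array([pi[k] * gauss_pdf(x_new, mu[k], sigma[k]) for k in range(K)])
    gamma = p / p.sum()
    alpha = h_new * gamma
    for k in range(K):
        mu[k] = mu[k] + r * alpha[k] * (x_new - mu[k])
        diff = x_new - mu[k]
        sigma[k] = sigma[k] + r * alpha[k] * (np.outer(diff, diff) - sigma[k])
        pi[k] = pi[k] + r * alpha[k] * (gamma[k] - pi[k])
    pi /= pi.sum()   # our own addition: renormalize the mixing weights
    return pi, mu, sigma
```

Each new observation costs O(Kd^2), so the model can track a changing scene without revisiting old data.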
IV. SKIN DETECTION EXPERIMENTS
A. Batch Learning Experiment
To test the performance of weighted EM, the algorithm is implemented in Matlab on a machine with 4 GB of memory and an Intel i5 670 processor, and used to estimate GMMs for skin and non-skin pixels with the data given by [1]. The data set contains 13,640 manually labeled photos with nearly 1 billion labeled pixels. Because 9 of the pictures could not be read, 13,631 pictures are used. Of the 8,962 non-skin pictures, 5,000 are randomly chosen as training samples and all the others are used for testing. Of the 4,669 skin pictures, 3,000 are selected for training, with all the others used for testing. In total, there are 456,977,650 training pixels for non-skin distribution modeling and 54,919,465 for skin.
During the experiment, the histograms of the skin and non-skin training pixels are obtained separately, and then two GMMs are estimated by weighted EM. Different from [1], which adopts a diagonal covariance for each component of the Gaussian mixture
[Fig. 3. Estimation time and ROC performance of weighted EM under varying histogram size and mixture component number: (a) estimation time, (b) area under ROC curve]
so as to shorten the required estimation time, here we make no isotropic or diagonal approximation of the covariance matrix.
After the estimation process, $p(b|\text{skin})$ and $p(b|\text{non-skin})$ can be calculated from the estimated parameters. A skin-pixel classifier is then constructed according to the standard likelihood ratio approach [1]: a particular pixel is classified as skin if the bin $b$ it belongs to satisfies

$$\frac{p(b|\text{skin})}{p(b|\text{non-skin})} \ge \Theta \qquad (8)$$

where $0 \le \Theta \le 1$ is the threshold of the classifier.
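The likelihood-ratio rule can be sketched over all bins at once (illustrative names; the epsilon guard for empty non-skin bins is our own assumption):

```python
import numpy as np

def skin_mask(p_skin, p_nonskin, theta):
    """Likelihood-ratio skin classifier in the style of equation (8).

    p_skin, p_nonskin: per-bin class-conditional probabilities p(b|skin)
    and p(b|non-skin), evaluated from the two estimated GMMs.
    A bin is labelled skin when p(b|skin) / p(b|non-skin) >= theta.
    """
    eps = 1e-12                                   # guard against empty non-skin bins
    return p_skin / np.maximum(p_nonskin, eps) >= theta
```

Sweeping theta over its range and recording the detection and false-detection rates of the resulting masks produces the ROC curves of Figure 1.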
Histogram models and the conventional EM estimation with diagonal approximation given in [1] are also implemented for comparison. Figure 1 shows the performance of the different models quantified by ROC curves. The area under the ROC curve is 0.9382 for the best histogram (32^3 bins), 0.9437 for the best weighted EM and 0.9321 for the best conventional EM. From the results, we can see that weighted EM performs slightly better than the histogram, owing to the smoothness of the parametric model, and outperforms conventional EM mainly because no diagonal approximation is made, which yields more accurate estimates. Also, since weighted EM performs each iteration much faster, different initial parameters can now be tried and the set of parameters with the best log-likelihood on the training samples selected.
It is also important to compare the computational and storage costs of the different models. In [1], the authors spent nearly
Fig. 5. An example: performance of online weighted EM (upper pictures) and histogram
[Figure 4 plots: (a) ROC curves of Histogram and Online-EM; (b) accuracy of Histogram and Online-EM as a function of log classification threshold]
Fig. 4. Performance of adaptive learning
24 hours training both the skin and non-skin mixture models using 10 Alpha workstations in parallel, while the histogram models could be finished in a matter of minutes. For weighted EM, the total estimation time consists of t_hist, the time for constructing the histogram of the training data set, and t_wem, the time for parameter learning. Estimating 16 components from a 256^3 histogram needs only 73.86 s; compared with the hours required by conventional EM, weighted EM is much faster and needs only a little more time than the histogram.
Figure 3 displays the time consumption t_wem and the ROC performance for different component numbers and histogram sizes. If we choose fewer mixture components or estimate from a relatively small histogram, the estimation time can be reduced to less than 1 s, which enables nearly real-time training of a specialized skin model in certain applications. And the performance of weighted EM does not decrease much until the histogram shrinks to 8^3 bins.
Parametric models display great strength with respect to storage costs, as shown in Figure 2(b). After learning the model, the histogram needs to store H^d numbers, where H indicates the histogram size and d the dimension. When H equals 32 and d equals 3, the system requires 262 Kbytes of storage (assuming 8 bytes per bin). A GMM, in contrast, can represent the result with k + k·d(d+3)/2 parameters for a full covariance matrix and k + 2kd for a diagonal one. So when k equals 16 and d equals 3, the system needs only 896 bytes for diagonal covariance matrices and 1280 bytes with no approximation.
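At 8 bytes per stored number, the parameter counts above reproduce the quoted figures; a short sketch (the function name is illustrative):

```python
def gmm_param_count(k, d, diagonal=False):
    """Number of stored values for a k-component GMM in d dimensions:
    k mixing weights plus, per component, a d-dimensional mean and either
    a full symmetric covariance (d(d+1)/2 values) or a diagonal one (d values)."""
    if diagonal:
        return k + 2 * k * d               # k + k*(d + d)
    return k + k * d * (d + 3) // 2        # k + k*(d + d(d+1)/2)
```

For k = 16 and d = 3 this gives 160 values (1280 bytes) for full covariances and 112 values (896 bytes) for diagonal ones, versus 32^3 = 32,768 bins (262,144 bytes) for the histogram.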
B. Online Learning Experiment
The performance of system can be promoted when using
adaptive algorithm properly and online weighted EM is tested
by an adaptive human face detection experiment. 100 pictures
with face pixels occupying more than 30% of the whole
picture are picked in compaq data set [1]. The histogram of
each picture is obtained and then threshold are set to choose
bins classified as skin by the GMM classifier mentioned in
previous experiment. Then we use the selected bins in each
picture to modify the previous parameters with on-line learning
weighted EM and get different adaptive GMM model for
different picture. Finally the performance of adaptive model
is compared with fixed Histogram model.
Figure 4(a) exhibits the ROC curves of the adaptive and fixed models. In Figure 4(b), with higher accuracy over a much wider threshold range, the adaptive model clearly performs much better than the fixed one; its peak accuracy is also higher. As shown in Figure 5, during adaptive learning the results remain acceptable over a wide threshold range, from 0.1 to 5, while the fixed model performs well only on a much narrower one. So we can see that online weighted EM achieves quite good performance through its adaptive learning form.
V. CONCLUSION
In this paper, we have proposed a computationally efficient parametric density estimation algorithm for statistical color models: weighted EM. With no sacrifice in performance, the algorithm enjoys high computational and storage efficiency, and experiments verify that it performs well. Our technique can be readily applied to other histogram or clustering quantization methods and to any mixture model that can be estimated by an EM procedure. It can also easily be changed into an online form, or into a maximum a posteriori (MAP) form when a prior distribution is considered.
REFERENCES
[1] M. Jones and J. Rehg, “Statistical color models with application to skin detection,” International Journal of Computer Vision, vol. 46, no. 1, pp. 81–96, 2002.
[2] A. Elgammal, R. Duraiswami, D. Harwood, and L. Davis, “Background and foreground modeling using nonparametric kernel density estimation for visual surveillance,” Proceedings of the IEEE, vol. 90, no. 7, pp. 1151–1163, 2002.
[3] R. Hsu, M. Abdel-Mottaleb, and A. Jain, “Face detection in color images,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 24, no. 5, pp. 696–706, 2002.
[4] T. Celik, H. Demirel, H. Ozkaramanli, and M. Uyguroglu, “Fire detection in video sequences using statistical color model,” in Acoustics, Speech and Signal Processing, 2006. ICASSP 2006 Proceedings. 2006 IEEE International Conference on. IEEE.
[5] S. Phung, A. Bouzerdoum Sr, and D. Chai Sr, “Skin segmentation using color pixel classification: analysis and comparison,” Pattern Analysis and Machine Intelligence, IEEE Transactions on, vol. 27, no. 1, pp. 148–154, 2005.
[6] J. Lee, Y. Kuo, P. Chung, E. Chen et al., “Naked image detection based on adaptive and extensible skin color model,” Pattern Recognition, vol. 40, no. 8, pp. 2261–2270, 2007.
[7] C. Bishop, Pattern Recognition and Machine Learning. Springer New York, 2006.
[8] W. Chen, Y. Shi, and G. Xuan, “Identifying computer graphics using HSV color model and statistical moments of characteristic functions,” in Multimedia and Expo, 2007 IEEE International Conference on. IEEE.