A Compound Poisson Model for Learning Discrete Bayesian Networks
Acta Mathematica Scientia 2013,33B(6):1767–1784
http://actams.wipm.ac.cn
Abdelaziz GHRIBI
Laboratory of Physic-Mathematics, University of Sfax, B.P. 1171, Sfax, Tunisia
E-mail: Ghribi [email protected]
Afif MASMOUDI
Laboratory of Probability and Statistics, University of Sfax, B.P. 1171, Sfax, Tunisia
E-mail: [email protected]
Abstract We introduce the concept of Bayesian networks under a compound Poisson
model, which provides a graphical modeling framework that encodes the joint probability
distribution for a set of random variables within a directed acyclic graph. We propose
an approach that offers a new mixed implicit estimator. We show that the
implicit approach applied to the compound Poisson model is very attractive for its ability to
understand data and does not require any prior information. A comparative study between
the learned estimates given by the implicit and by standard Bayesian approaches is established.
Under some conditions and based on mean squared error calculations, we show that the
mixed implicit estimator is better than the standard Bayesian and the maximum likelihood
estimators. We illustrate our approach by considering a simulation study in the context of
mobile communication networks.
Key words Bayesian network; compound Poisson distribution; multinomial distribution;
implicit approach; mobile communication networks
2010 MR Subject Classification 46N30; 62F12; 62F15
1 Introduction
Bayesian networks [1–4] are graphical representations of assumed dependencies among a set
of variables. Statistically, these networks encode the joint probability distribution for a set of
random variables in a directed acyclic graph (DAG), which is a set of variables and a set of
directed links. Each node in the graph represents a random variable, while the edges between
the nodes represent probabilistic dependencies among the corresponding random variables. In
a DAG, if there is an arc from variable A to variable B, A is said to be a parent of the child B.
These models are very attractive for their ability to learn causal relationships and hence they
offer a powerful framework to understand and analyze data [5–8].
∗Received May 12, 2012; revised November 3, 2012.
For example, in the context of mobile phone networks, a Bayesian network could represent the
relationships between the covered level (X), the network configuration (Y) and the interference (Z). Since
the covered level and the network configuration are the causes of the "interference level", one can
use the network structure given by (X) −→ (Z) ←− (Y). To compute the joint probability of
(X, Y, Z), we can use the Markov property
$$\mathbb{P}(X,Y,Z)=\mathbb{P}(X)\,\mathbb{P}(Y)\,\mathbb{P}(Z/X,Y).$$
This property is used to reduce the number of parameters required to characterize the
joint probability distribution. In this example, one might consider the diagnostic
support for the belief about the state of the covered level given the observation of the presence of
interference.
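As a hedged illustration of this factorization and the resulting diagnostic inference, the sketch below encodes the X −→ Z ←− Y network; the conditional probability tables are invented for the example, not taken from the paper.

```python
# Hypothetical CPTs for the X -> Z <- Y network (all numbers are illustrative).
p_x = {1: 0.2, 2: 0.8}                      # covered level: 1 = lack, 2 = presence
p_y = {1: 0.6, 2: 0.4}                      # configuration: 1 = GSM, 2 = GPRS
p_z = {(1, 1): {1: 0.9, 2: 0.1}, (1, 2): {1: 0.7, 2: 0.3},
       (2, 1): {1: 0.3, 2: 0.7}, (2, 2): {1: 0.1, 2: 0.9}}  # interference given (x, y)

def joint(x, y, z):
    # Markov property: P(X, Y, Z) = P(X) P(Y) P(Z / X, Y)
    return p_x[x] * p_y[y] * p_z[(x, y)][z]

# Diagnostic support: P(X = lack / Z = presence) by Bayes' rule
num = sum(joint(1, y, 1) for y in p_y)
den = sum(joint(x, y, 1) for x in p_x for y in p_y)
post = num / den
```

With these invented tables, observing interference raises the belief in a coverage lack from the prior 0.2 to num/den.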
Usually, in statistical inference, there are two different approaches: the classical
(frequentist) approach and the Bayesian approach. In the context of Bayesian inference, both
parameters and data are treated as random variables, whereas the frequentist approach regards
parameters as non-random.
In the Bayesian approach, the posterior distribution is obtained by multiplying the
likelihood function by a known prior distribution and then dividing by a norming constant.
Consequently, this prior information is used together with the data to derive the posterior
distribution [9–11].
In this work, we restrict ourselves to discrete Bayesian networks with a compound Poisson conditional
model. Recall that a compound Poisson distributed random variable is defined by $S_N=\sum_{i=1}^{N}X_i$,
where $(X_i)_{i\in\mathbb{N}}$ is a sequence of independent identically distributed random variables and $N$
follows a Poisson distribution. The compound Poisson distribution appears in many fields; for
example, it is used to model the distribution of the total rainfall in a day, where each day
contains a Poisson-distributed number of events [12]. The compound Poisson distribution is
also used in actuarial science and insurance for modeling the distribution of the total claim amount.
We address the problem of learning parameters of a Bayesian network with a compound
Poisson conditional model. We propose a theoretical treatment of using a compound Poisson
model for understanding Bayesian networks. We use the implicit inference approach to estimate
the parameters in the compound Poisson model for the network. We study the performance of
the proposed implicit estimator by considering a simulation study.
The network learning is formulated as an implicit estimation problem [13]. We previously
described the implicit approach that does not require any priors in the framework of Bayesian
network [14]. In [15], Ben Hassen et al. proposed a new algorithm to learn the parameters of
implicit networks when the data set is incomplete. In the same context, Bouchaala et
al. [16] proposed a new implicit score function for the structure learning of Bayesian networks.
In this paper, we briefly recall the principles of implicit inference and give an application
of the method in the case of a compound Poisson Bayesian network. After that, we describe our
approach based on a compound Poisson model. We show how we can learn the probabilities of
a Bayesian network when the data set is complete. We apply our approach to an important
existing problem in mobile communication networks, which is the prediction of the probability
of the signal level quality.
The outline of this paper is as follows: in Section 2, we present the basic concepts of the
implicit approach, propose a mixed implicit estimator for the compound Poisson distribution,
and study its asymptotic behavior. In Section 3, we develop a unified Bayesian network for
the compound Poisson conditional model and finally, in Section 4, we give an illustrative example
and present the conclusion.
2 Inference with the Implicit Method
2.1 General View of the Implicit Method
The implicit estimation method was proposed by Hassairi et al. [13] as an alternative to
the Bayesian approach. Assuming some prior information, Bayesian theory [9, 10] considers
an unknown parameter θ as a random variable and determines its posterior distribution given
data. In the same way, the implicit distribution is considered as a posterior distribution of a
parameter θ given the data. In fact, consider a family of probability distributions $\{\mathbb{P}(x/\theta),\ \theta\in\Theta\}$ parameterized by an unknown parameter θ in a set Θ, where x represents the observed data.
The implicit distribution $\mathbb{P}(\theta/x)$ is calculated by multiplying the likelihood function $\mathbb{P}(x/\theta)$ by
a counting measure σ if Θ is a countable set, or by the Lebesgue measure σ if Θ is an open set
(σ depends only on the topological structure of Θ), and then dividing by the norming constant
$c(x)=\int_{\Theta}\mathbb{P}(x/\theta)\,\sigma(d\theta)$. Therefore, the implicit distribution is given by the formula
$$\mathbb{P}(\theta/x)(d\theta)=(c(x))^{-1}\,\mathbb{P}(x/\theta)\,\sigma(d\theta)$$
and plays the role of a posterior distribution of θ given x in
the Bayesian method. This corresponds to a particular improper prior which depends only on
the topology of Θ (without any statistical assumption).
In particular, if the set of parameters Θ is bounded, then σ is proportional to the uniform
distribution on Θ. If Θ is unbounded, then σ is an improper prior.
Provided its existence (which holds for most statistical models), the implicit distribution
can be used for the estimation of the parameter θ following a Bayesian methodology. The
implicit estimator $\hat\theta$ of θ is a Bayes estimator with respect to the squared error loss function,
given by the posterior mean, that is
$$\hat\theta=E(\theta/x)=\int_{\Theta}\theta\,\mathbb{P}(\theta/x)(d\theta).$$
An important problem in Bayesian estimation is how to define the prior distribution. If prior
information about the parameter θ is available, it should be incorporated in the prior density.
If we have no prior information, we apply non-informative Bayesian estimation or implicit
estimation. In the multinomial case, Ben Hassen et al. [14] applied the implicit approach. For the
estimation of the parameters, they considered $D=(N_1,\cdots,N_r)$ as a random variable following a
multinomial distribution with unknown parameters $N=\sum_{i=1}^{r}N_i$ and $\theta=(\theta_1,\cdots,\theta_r)$. They first
estimated N by the implicit method and then used the estimator $\hat N$ to estimate θ.
2.2 Implicit Method in Compound Poisson Model
Let $D=(N_1,N_2,\cdots,N_r)$ be a random vector such that $N=\sum_{i=1}^{r}N_i$ follows a Poisson
distribution $\mathcal{P}_\lambda$ with parameter λ > 0. Suppose that the distribution of D given (N = n)
is a multinomial distribution with parameters n and $\theta=(\theta_1,\theta_2,\cdots,\theta_r)$, where $0<\theta_i<1$
and $\sum_{i=1}^{r}\theta_i=1$.
If $\mathbb{P}_{\lambda,\theta}$ denotes the probability distribution of D, then by applying the Bayes formula, one has
$$
\mathbb{P}_{\lambda,\theta}(n_1,n_2,\cdots,n_r)
=\sum_{n=0}^{\infty}\mathbb{P}_{\lambda,\theta}(n_1,n_2,\cdots,n_r/N=n)\,\mathbb{P}(N=n)
$$
$$
=\sum_{n=0}^{\infty}\Big[n!\prod_{i=1}^{r}\frac{\theta_i^{n_i}}{n_i!}\Big]\frac{e^{-\lambda}\lambda^{n}}{n!}\,\delta_n\Big(\sum_{i=1}^{r}n_i\Big)
$$
$$
=\Big(\sum_{i=1}^{r}n_i\Big)!\,\Big[\prod_{i=1}^{r}\frac{\theta_i^{n_i}}{n_i!}\Big]\frac{e^{-\lambda}\lambda^{\sum_{i=1}^{r}n_i}}{\big(\sum_{i=1}^{r}n_i\big)!}
=\prod_{i=1}^{r}\frac{e^{-\lambda\theta_i}(\lambda\theta_i)^{n_i}}{n_i!},
$$
where $\delta_n$ denotes the Dirac measure at n.
Therefore, the distribution $\mathbb{P}_{\lambda,\theta}$ of D is the product of independent Poisson distributions
with parameters $\lambda\theta_1,\lambda\theta_2,\cdots,\lambda\theta_r$:
$$
\mathbb{P}_{\lambda,\theta}(n_1,n_2,\cdots,n_r)=\prod_{i=1}^{r}\Big[e^{-\lambda\theta_i}\frac{(\lambda\theta_i)^{n_i}}{n_i!}\Big].
$$
With the convention that $D=(0,0,\cdots,0)$ if $N=0$, the distribution of $D=(N_1,N_2,\cdots,N_r)$ is
called the compound Poisson distribution.
If we denote
$$
\Lambda=\Big\{\theta=(\theta_1,\theta_2,\cdots,\theta_r)\in\mathbb{R}_{+}^{r}:\ \sum_{i=1}^{r}\theta_i=1\Big\},
$$
the family
$$
\mathcal{F}=\big\{\mathbb{P}_{\lambda,\theta}:\ (\lambda,\theta)\in\mathbb{R}_{+}^{*}\times\Lambda\big\}
$$
is called the compound Poisson model. The following proposition gives the implicit estimators of
$\theta_k$ and λ.
Proposition 2.1 Let $D=(N_1,N_2,\cdots,N_r)$ be a random vector following the compound
Poisson distribution with parameters $\lambda\theta_1,\lambda\theta_2,\cdots,\lambda\theta_r$. Then
i) the implicit estimator $\hat\theta_k$ of $\theta_k$ is
$$
\hat\theta_k=E(\theta_k/D)=\frac{N_k+1}{r+\sum_{i=1}^{r}N_i},\qquad k=1,2,\cdots,r;\tag{2.1}
$$
ii) the implicit estimator $\hat\lambda$ of λ is
$$
\hat\lambda=E(\lambda/D)=1+\sum_{i=1}^{r}N_i.\tag{2.2}
$$
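As a minimal sketch (our own, not the authors' R implementation), formulas (2.1) and (2.2) can be evaluated directly from the observed counts:

```python
# Implicit estimators of Proposition 2.1, computed from the counts D = (N1,...,Nr).
def implicit_estimates(counts):
    r, n = len(counts), sum(counts)
    theta_hat = [(nk + 1) / (r + n) for nk in counts]  # eq. (2.1): add-one smoothing
    lam_hat = 1 + n                                    # eq. (2.2)
    return theta_hat, lam_hat
```

For D = (3, 5) this gives θ̂ = (0.4, 0.6) and λ̂ = 9; note that the θ̂_k always sum to one, and every θ̂_k is strictly positive even when some count N_k is zero.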
In what follows, we need the following technical lemma.
Lemma 2.1 Suppose that N follows a Poisson distribution $\mathcal{P}_\lambda$ with parameter $\lambda\in\mathbb{N}\setminus\{0\}$, and let us denote $\psi(\lambda)=E\big(\frac{1}{N+r}/\lambda\big)$. Then
1) $\psi(\lambda)=-e^{-\lambda}(-\lambda)^{-r}\displaystyle\int_{-\lambda}^{0}e^{-x}x^{r-1}\,dx=(r-1)!\Big\{\dfrac{e^{-\lambda}-1}{(-\lambda)^{r}}-\sum_{k=1}^{r-1}\dfrac{(-\lambda)^{k-r}}{k!}\Big\}$;
2) $\lim\limits_{\lambda\to+\infty}\lambda\psi(\lambda)=1$;
3) $\lim\limits_{\lambda\to+\infty}\dfrac{\hat\lambda}{\lambda}=1$, in probability;
4) $\lim\limits_{\lambda\to+\infty}\dfrac{\sqrt{\lambda}}{N+r}=0$, in probability.
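The closed form in assertion 1) can be checked numerically against the defining series; the sketch below is our own and assumes moderate values of λ (the alternating sum loses precision for large λ).

```python
import math

def psi_series(lam, r, terms=400):
    """psi(lam) = E(1/(N+r)) for N ~ Poisson(lam), by direct summation."""
    # Poisson weights computed in log space to avoid overflow of lam**k / k!
    return sum(math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1)) / (k + r)
               for k in range(terms))

def psi_closed(lam, r):
    """Closed form from assertion 1) of Lemma 2.1."""
    s = sum((-lam) ** (k - r) / math.factorial(k) for k in range(1, r))
    return math.factorial(r - 1) * ((math.exp(-lam) - 1) / (-lam) ** r - s)
```

Both expressions agree to high precision for moderate λ; the product λψ(λ) can also be tabulated to inspect its limit.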
The following proposition studies the asymptotic behavior of the implicit estimator $\hat\theta_i$ as
λ increases to +∞. We show that this estimator is consistent and asymptotically normal.
Proposition 2.2 The following assertions hold.
1) The implicit estimator $\hat\theta_i$ of $\theta_i$ is asymptotically unbiased, in the sense that
$\lim\limits_{\lambda\to+\infty}E(\hat\theta_i)-\theta_i=0$.
2) $\hat\theta_i$ converges to $\theta_i$ in quadratic mean, as $\lambda\to+\infty$.
3) For every sequence $(\lambda_n)_n$ such that $\lambda_n\to+\infty$ as $n\to+\infty$, there exists a subsequence
$(\lambda_{n_k})_k$ such that $\big(\sqrt{\lambda_{n_k}}(\hat\theta_i-\theta_i)\big)_{k\in\mathbb{N}}$ converges in distribution to the centered normal
distribution $\mathcal{N}(0,\theta_i(1-\theta_i))$ with variance $\theta_i(1-\theta_i)$, as $k\to+\infty$.
Proposition 2.3 $\big(\sqrt{\lambda}(\hat\theta_i-\theta_i)\big)_{\lambda\in\mathbb{N}}$ converges in distribution to $\mathcal{N}(0,\theta_i(1-\theta_i))$, as
$\lambda\to+\infty$.
The proofs of Lemma 2.1, Proposition 2.2 and Proposition 2.3 are given in Appendix.
2.3 A New Mixed Implicit Estimator
In this subsection, we propose a new mixed implicit estimator (MIE) for the compound Poisson
model, which is a natural extension of the implicit estimator in the multinomial model. Asymptotic
properties of this estimator are established. We present the results of a simulation study
conducted to evaluate the performance of our approach in comparison with the Bayesian and
maximum likelihood methods.
Now, let us consider our new estimator $\tilde\theta_i$ of $\theta_i$ defined by
$$
\tilde\theta_i=E\Big(\frac{N_i'+1}{N'+r}\,/\,\theta_1,\cdots,\theta_r,\lambda\Big),\tag{2.3}
$$
where $N_1',N_2',\cdots,N_r'$ are r independent random variables such that $N_i'$ follows a Poisson
distribution $\mathcal{P}(\lambda\theta_i)$ and $N'=\sum_{i=1}^{r}N_i'$.
We call $\tilde\theta_i$ the mixed implicit estimator (MIE) of $\theta_i$. The following proposition
gives an explicit formula for $\tilde\theta_i$.
Proposition 2.4
$$
\tilde\theta_i=\theta_i+(1-r\theta_i)\psi(\lambda),\tag{2.4}
$$
where ψ is the function defined in Lemma 2.1.
Proof
$$
\tilde\theta_i=E\Big(\frac{N_i'+1}{N'+r}\,/\,\theta_1,\cdots,\theta_r,\lambda\Big)
=E\Big(E\Big(\frac{N_i'+1}{N'+r}\,/\,\theta_1,\cdots,\theta_r,\lambda,N'\Big)\,/\,\theta_1,\cdots,\theta_r,\lambda\Big)
$$
$$
=E\Big(\frac{N'\theta_i+1}{N'+r}\,/\,\theta_1,\cdots,\theta_r,\lambda\Big)
=\theta_i+(1-r\theta_i)\psi(\lambda).\qquad\Box
$$
Observe that $\tilde\theta_i$ is a convex combination of the implicit estimator $\hat\theta_i$ and the uniform
estimator $\frac{1}{r}$:
$$
\tilde\theta_i=(1-r\psi(\lambda))\,\hat\theta_i+(r\psi(\lambda))\,\frac{1}{r}.\tag{2.5}
$$
The weight $(1-r\psi(\lambda))$ can be interpreted as a confidence degree relative to the size of the
dataset. If the sample size is large enough, then $r\psi(\lambda)$ is close to zero; in this case, more
confidence is given to the estimator $\hat\theta_i$.
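In practice one plugs the implicit estimates (2.1) and (2.2) into (2.5); the sketch below is our own illustration, with ψ taken from the closed form stated in Lemma 2.1 (adequate for moderate λ).

```python
import math

def psi(lam, r):
    # Closed form of psi(lambda) from Lemma 2.1 (alternating sum; moderate lam only)
    s = sum((-lam) ** (k - r) / math.factorial(k) for k in range(1, r))
    return math.factorial(r - 1) * ((math.exp(-lam) - 1) / (-lam) ** r - s)

def mixed_implicit(counts):
    """MIE of eq. (2.5): shrink the implicit estimator toward the uniform 1/r."""
    r, n = len(counts), sum(counts)
    lam_hat = 1 + n                                   # eq. (2.2)
    theta_hat = [(c + 1) / (r + n) for c in counts]   # eq. (2.1)
    w = r * psi(lam_hat, r)                           # weight given to the uniform 1/r
    return [(1 - w) * t + w / r for t in theta_hat]
```

Since the two weights sum to one, the mixed estimates still sum to one; as the sample grows, the weight w tends to zero and the MIE approaches the implicit estimator.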
The distribution of the mixed implicit estimator $\tilde\theta_i$ of $\theta_i$ depends on λ, which reflects
the size of the sample, so it is natural to consider the asymptotic behavior of this estimator
as λ increases to +∞. We prove that $\tilde\theta_i$ is consistent and asymptotically normal.
Theorem 2.1 The following assertions hold.
1) $\tilde\theta_i$ converges to $\theta_i$ in quadratic mean, as $\lambda\to+\infty$;
2) $\big(\sqrt{\lambda}(\tilde\theta_i-\theta_i)\big)_{\lambda\in\mathbb{N}}$ converges in distribution to the centered normal distribution $\mathcal{N}(0,\theta_i(1-\theta_i))$ with variance $\theta_i(1-\theta_i)$, as $\lambda\to+\infty$.
Proof 1) Since $\hat\theta_i$ converges to $\theta_i$ and $\hat\lambda=1+\sum_{i=1}^{r}N_i$ converges to +∞, in probability,
as $\lambda\to+\infty$, then by using Proposition 2.3 and assertion 2) of Lemma 2.1, we obtain the
desired result.
2) According to Proposition 2.3, one has
$$
\sqrt{\lambda}(\tilde\theta_i-\theta_i)=\sqrt{\lambda}(\hat\theta_i-\theta_i)+\frac{\sqrt{\lambda}}{\hat\lambda}\,(1-r\hat\theta_i)\,\hat\lambda\psi(\hat\lambda).
$$
Assertions 2) and 3) of Lemma 2.1 imply that
$$
\lim_{\lambda\to+\infty}\frac{\sqrt{\lambda}}{\hat\lambda}\,(1-r\hat\theta_i)\,\hat\lambda\psi(\hat\lambda)=0.
$$
Therefore, by using Proposition 2.2, $\sqrt{\lambda}(\tilde\theta_i-\theta_i)$ converges in distribution to the normal distribution $\mathcal{N}(0,\theta_i(1-\theta_i))$. $\Box$
Recall that, in the multinomial case, if we assume that the prior distribution for θ is a Dirichlet
distribution with parameters $(\alpha_1,\cdots,\alpha_r)$, then the Bayesian estimator of $\theta_k$ is given by
$$
\hat\theta_k^{B}=\frac{N_k+\alpha_k}{\sum_{i=1}^{r}(N_i+\alpha_i)},
$$
where $\mathrm{prior}_k=\dfrac{\alpha_k}{\sum_{i=1}^{r}\alpha_i}$ denotes the prior probability of occurrence of the state k.
If $\mathrm{prior}_k$ is near the true value of $\theta_k$, this corresponds to a favorable prior. If not, we have
an unfavorable prior.
We illustrate our estimation procedure by using simulated data (with r = 2). We
generated n = 1000 observations from the compound Poisson distribution with parameter λθ,
such that θ ∈ (0, 1) and λ > 0. We compute the Bayesian estimator $\hat\theta^{Bu}$ of θ with slightly
unfavorable priors and hyperparameter $\alpha=\alpha_1+\alpha_2=100$. By using the Monte Carlo method, we
compute our mixed implicit estimator $\hat\theta^{I}$.
The results of our simulations are summarized in Table 1, in which we give the parameter's
mean estimates, confidence intervals (with 95% confidence level), and the mean squared error
between the estimator and the true value of the parameter $\big(\mathrm{MSE}=\frac{1}{n}\sum_{i=1}^{n}(\hat\theta_i-\theta)^2\big)$.
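A comparison of this kind can be reproduced in outline as follows; the sampler, the prior split α = (45, 55) and all other settings are our own illustrative assumptions, not the authors' exact R code.

```python
import math, random

def sample_counts(lam, theta, rng):
    """Draw D = (N1,...,Nr): N ~ Poisson(lam), then D / N ~ Multinomial(N, theta)."""
    L, n, p = math.exp(-lam), 0, rng.random()
    while p > L:                       # Knuth's Poisson sampler (fine for moderate lam)
        n += 1
        p *= rng.random()
    counts = [0] * len(theta)
    for _ in range(n):                 # distribute the n events over the r categories
        u, acc, idx = rng.random(), 0.0, len(theta) - 1
        for i, t in enumerate(theta):
            acc += t
            if u < acc:
                idx = i
                break
        counts[idx] += 1
    return counts

# Monte Carlo comparison of the implicit estimator (2.1) with a Dirichlet-prior
# Bayesian estimator under a slightly unfavorable prior (all settings illustrative).
rng = random.Random(42)
theta, lam, reps = (0.3, 0.7), 100.0, 2000
alpha = (45.0, 55.0)                   # alpha_1 + alpha_2 = 100, prior_1 = 0.45
se_impl = se_bayes = 0.0
for _ in range(reps):
    c = sample_counts(lam, theta, rng)
    n, r = sum(c), len(c)
    se_impl += ((c[0] + 1) / (r + n) - theta[0]) ** 2
    se_bayes += ((c[0] + alpha[0]) / (n + sum(alpha)) - theta[0]) ** 2
mse_impl, mse_bayes = se_impl / reps, se_bayes / reps
```

With an unfavorable prior, the prior bias dominates the Bayesian estimator's error and the implicit estimator attains the smaller MSE, in line with Table 1.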
Table 1 True parameters and mean estimates obtained by the Bayesian ($\hat\theta^{Bu}$), mixed
implicit ($\hat\theta^{I}$) and maximum likelihood ($\hat\theta^{Mv}$) estimators, with 95% confidence intervals and MSE.

| True θ | Prior | $\hat\theta^{Bu}$ [95% CI] | MSE($\hat\theta^{Bu}$) | $\hat\theta^{I}$ [95% CI] | MSE($\hat\theta^{I}$) | $\hat\theta^{Mv}$ [95% CI] | MSE($\hat\theta^{Mv}$) |
|---|---|---|---|---|---|---|---|
| 0.05 | 0.15 | 0.0593 [0.0589; 0.0597] | 0.00012 | 0.0520 [0.0516; 0.0524] | 5.0865e−05 | 0.0502 [0.0498; 0.0506] | 4.7101e−05 |
| 0.1 | 0.20 | 0.1095 [0.1089; 0.1100] | 0.00016 | 0.1020 [0.1014; 0.1026] | 9.4310e−05 | 0.1004 [0.0998; 0.1010] | 9.0671e−05 |
| 0.2 | 0.30 | 0.2089 [0.2082; 0.2096] | 0.00020 | 0.2010 [0.2002; 0.2017] | 0.00015 | 0.1998 [0.1990; 0.2005] | 0.00015 |
| 0.3 | 0.40 | 0.3099 [0.3090; 0.3107] | 0.00027 | 0.3016 [0.3007; 0.3025] | 0.00021 | 0.3008 [0.2999; 0.30185] | 0.00021 |
| 0.4 | 0.50 | 0.4094 [0.4085; 0.4103] | 0.00029 | 0.4008 [0.3998; 0.4018] | 0.00025 | 0.4004 [0.3994; 0.4014] | 0.00025 |
| 0.5 | 0.60 | 0.5092 [0.5083; 0.5101] | 0.00028 | 0.5001 [0.4992; 0.5011] | 0.00023 | 0.5001 [0.4992; 0.5011] | 0.00023 |
| 0.6 | 0.70 | 0.6088 [0.6080; 0.6097] | 0.00026 | 0.5993 [0.5984; 0.6003] | 0.000229 | 0.5997 [0.5988; 0.6007] | 0.00023 |
| 0.7 | 0.80 | 0.7091 [0.7082; 0.7099] | 0.00026 | 0.6992 [0.6982; 0.7001] | 0.00022 | 0.7000 [0.6990; 0.7009] | 0.00022 |
| 0.8 | 0.90 | 0.8083 [0.8076; 0.8090] | 0.00020 | 0.7979 [0.7971; 0.7987] | 0.00016 | 0.7991 [0.7984; 0.7999] | 0.00015 |
| 0.9 | 0.80 | 0.9087 [0.9082; 0.9093] | 0.00015 | 0.8980 [0.8974; 0.8986] | 9.6908e−05 | 0.8996 [0.8990; 0.9002] | 9.3953e−05 |
We compare the Bayesian, mixed implicit and maximum likelihood estimators in terms of
mean squared error for different true values of θ and λ > 0. Figure 1 shows that the MSE for the
mixed implicit approach is clearly lower than that calculated with the Bayesian method when
the sample size is greater than 30, while the MSEs for the maximum likelihood and mixed implicit
methods are similar (see Figure 1). For a small sample size (n = 50) and for each θ ∈ (0, 1), we
simulate 1000 samples. In Figure 2, we report the mean MSE as a function of the parameter θ.
Observe that for θ ∈ (0.2, 0.8) (Figure 2), the MSE calculated for the mixed implicit estimator
is lower than that computed by maximum likelihood estimation. However, when comparing
the mean MSE of the mixed implicit estimator with that calculated by the Bayesian method,
we notice that the latter gives better results when θ ∈ (0, 0.6). This is due to the fact that we
use the formula prior = θ ∗ (1 − 0.15), which gives favorable priors when θ ≤ 0.6. However, if
θ ∈ (0.6, 1), the same formula gives slightly unfavorable priors, which explains the observed
efficiency of our method in Figure 2 when θ ∈ (0.6, 1). Consequently, for small data sets and
with some constraints on θ, the mixed implicit approach is more efficient than the Bayesian and
maximum likelihood approaches.
We also performed simulations in order to validate the mixed implicit approach by comparison
with the standard implicit one. We simulated a data set from a compound Poisson distribution
with parameter λ = 100 and $\theta=(\theta_1,\cdots,\theta_r)$. The true parameter values of $\theta=(\theta_1,\cdots,\theta_r)$, the
mixed implicit estimates $\hat\theta^{I}$, the standard implicit estimates $\hat\theta$ and the mean squared errors are presented in Table
2. According to this table, we notice that our mixed implicit estimates are close to the standard
implicit ones. When the data set size is small and r is large (λ = 100 and r ≥ 5), the
mean squared error calculated for the mixed implicit method is less than that calculated for the
standard implicit one. In this case, the mixed implicit method is better than the standard implicit one.
Fig.1 This figure shows the variation of the mean MSE as a function of the sample size,
for several true values of θ ∈ {0.05, 0.2, 0.5, 0.7}.
Fig.2 This figure shows the variation of the mean MSE as a function of the true parameters
for 1000 samples with a common size n = 50.
Table 2 True parameters θ and estimates obtained by the standard implicit ($\hat\theta$) and
mixed implicit ($\hat\theta^{I}$) estimators, for different values of r, with their MSE.

| r | True θ | $\hat\theta$ | MSE($\hat\theta$) | $\hat\theta^{I}$ | MSE($\hat\theta^{I}$) |
|---|---|---|---|---|---|
| 2 | (0.5, 0.5) | (0.507772, 0.492228) | 6.040431e−05 | (0.5076565, 0.4923435) | 5.862214e−05 |
| 2 | (0.1, 0.9) | (0.1027190, 0.8972810) | 7.393142e−06 | (0.1039768, 0.8960232) | 1.581471e−05 |
| 2 | (0.4, 0.6) | (0.3704309, 0.6295691) | 0.0008743307 | (0.3690476, 0.6309524) | 0.0009580499 |
| 3 | (0.33, 0.33, 0.34) | (0.2857892, 0.3499760, 0.3642348) | 0.001135640 | (0.2857892, 0.3499760, 0.3642348) | 0.0009803208 |
| 3 | (0.4, 0.4, 0.2) | (0.3535354, 0.4646465, 0.1818182) | 0.001918165 | (0.3537241, 0.4587912, 0.1874847) | 0.002222902 |
| 3 | (0.2, 0.1, 0.7) | (0.1216820, 0.1628068, 0.7155112) | 0.003439666 | (0.1140351, 0.1578947, 0.7280702) | 0.003843234 |
| 5 | (0.19607843, 0.23529412, 0.20915033, 0.30718954, 0.05228758) | (0.25490196, 0.24509804, 0.16666667, 0.29411765, 0.03921569) | 0.001140587 | (0.25315293, 0.24290210, 0.16734831, 0.28915842, 0.04743824) | 0.001082285 |
| 5 | (0.06172840, 0.03703704, 0.37860082, 0.15226337, 0.37037037) | (0.03418803, 0.05128205, 0.37606838, 0.13675214, 0.40170940) | 0.0004381077 | (0.04126630, 0.05663675, 0.36893439, 0.13921350, 0.39394905) | 0.0003245078 |
| 7 | (0.18750000, 0.03348214, 0.03125000, 0.21875000, 0.17633929, 0.20758929, 0.14508929) | (0.19744059, 0.03656307, 0.03016453, 0.21389397, 0.17367459, 0.22486289, 0.12340037) | 0.0001298506 | (0.19714628, 0.03756213, 0.03089937, 0.21375930, 0.17301490, 0.22421578, 0.12340223) | 0.0001275068 |
| 10 | (0.06095552, 0.01812191, 0.17627677, 0.08237232, 0.14003295, 0.12685338, 0.04118616, 0.13509061, 0.09884679, 0.12026359) | (0.07142857, 0.02678571, 0.20535714, 0.08928571, 0.17857143, 0.13392857, 0.04464286, 0.13392857, 0.06250000, 0.05357143) | 0.0008395715 | (0.07428159, 0.03278920, 0.19450458, 0.09160087, 0.17303133, 0.12881146, 0.04827587, 0.13099518, 0.06721921, 0.05849072) | 0.0006786088 |
3 Implicit Inference with Bayesian Networks
A Bayesian network (BN) is a set of variables X = {X1, · · · , Xn} with a network structure
G that encodes a set of conditional independence assertions about variables in X , and a set of
local probability distributions associated with each variable. Together, these components define
the joint probability distribution for X . The network structure G is a directed acyclic graph G
and it is suitable for looking for relationships among all variables. The nodes in G correspond
to the variables in X1, · · · , Xn. Each Xi denotes both the variable and its corresponding node,
and Pa(Xi) the parents of node Xi in G as well as the variables corresponding to those parents.
The lack of possible arcs in G encode conditional independencies [2, 17]. In particular, given
structure G, the joint probability distribution for X is given by the product of all specified
conditional probabilities
P(X1, · · · , Xn) =
n∏
i=1
P(Xi/Pa(Xi)).
The local probability distributions P are the distributions corresponding to the terms in
the product of conditional distributions. When building BN without prior knowledge, the
probabilities will depend only on the structure of the parameters set.
Let $d=\{x^{1},x^{2},\cdots,x^{t}\}$ be a dataset with t observations, and consider a Bayesian network with
parameters $(G,\ \Theta=\{\theta_{ijk},\ i=1,\cdots,n,\ j=1,\cdots,q_i,\ k=1,\cdots,r_i\},\ \lambda=(\lambda_{ij}))$, where $r_i$ is
the number of states of the variable $X_i$, $q_i$ is the number of parent configurations, $x^{l}=(x_1^{l},x_2^{l},\cdots,x_n^{l})$, $\lambda_{ij}>0$ and $\theta_{ijk}=\mathbb{P}(X_i=k/Pa(X_i)=j)$.
We denote by $N_{ijk}(d)$ the number of cases in d such that $X_i$ is in its k-th state and its
parents are in their j-th configuration, and $N_{ij}(d)=\sum_{1\le k\le r_i}N_{ijk}(d)$.
If the node i has no parents, we denote θi1k = P(Xi = k) and Ni1k(d) the number of cases
in d such that Xi is in its k-th state.
In this section, we suppose that $N_{ij}(d)$ follows a Poisson distribution with parameter $\lambda_{ij}$.
So, we introduce a compound Poisson Bayesian network, that is, a Bayesian network associated
with a compound Poisson model (i.e., the random vector $(N_{ij1}(d),N_{ij2}(d),\cdots,N_{ijr_i}(d))$ follows
a compound Poisson distribution with parameters $\lambda_{ij}\theta_{ij1},\lambda_{ij}\theta_{ij2},\cdots,\lambda_{ij}\theta_{ijr_i}$).
The following proposition gives the implicit distribution of (Θ, λ) given d.
Proposition 3.1 If the dataset d comes from a compound Poisson distribution, then
1) the posterior distribution of (Θ, λ) given the structure G and the dataset d is defined
by
$$
\mathbb{P}(\Theta,\lambda/d,G)=\prod_{i=1}^{n}\prod_{j=1}^{q_i}\frac{e^{-\lambda_{ij}}\lambda_{ij}^{N_{ij}(d)}}{N_{ij}(d)!}\ \prod_{i=1}^{n}\prod_{j=1}^{q_i}(N_{ij}(d)+r_i-1)!\prod_{k=1}^{r_i}\frac{(\theta_{ijk})^{N_{ijk}(d)}}{N_{ijk}(d)!};
$$
2) the implicit estimators of $\theta_{ijk}$ and $\lambda_{ij}$ are
$$
\hat\theta_{ijk}=\frac{N_{ijk}(d)+1}{N_{ij}(d)+r_i}\quad\text{with}\quad\hat\lambda_{ij}=N_{ij}(d)+1.
$$
Proof Since the map
$$
d\mapsto\big(N_{ijk}(d):\ i\in\{1,2,\cdots,n\},\ j\in\{1,2,\cdots,q_i\},\ k\in\{1,2,\cdots,r_i\}\big)
$$
is injective, then
$$
\mathbb{P}(d/G,\Theta,\lambda)=\mathbb{P}\big(N_{ijk}=N_{ijk}(d)\ \ \forall\, i\in\{1,2,\cdots,n\},\ j\in\{1,2,\cdots,q_i\},\ k\in\{1,2,\cdots,r_i\}\big)
$$
$$
=\prod_{i=1}^{n}\prod_{j=1}^{q_i}\prod_{k=1}^{r_i}\mathbb{P}\big(N_{ijk}=N_{ijk}(d)/G,\Theta,\lambda\big)
=\prod_{i=1}^{n}\prod_{j=1}^{q_i}\prod_{k=1}^{r_i}\frac{e^{-\lambda_{ij}\theta_{ijk}}(\lambda_{ij}\theta_{ijk})^{N_{ijk}(d)}}{N_{ijk}(d)!}
$$
$$
=\prod_{i=1}^{n}\prod_{j=1}^{q_i}\frac{e^{-\lambda_{ij}}\lambda_{ij}^{N_{ij}(d)}}{N_{ij}(d)!}\ \prod_{i=1}^{n}\prod_{j=1}^{q_i}N_{ij}(d)!\prod_{k=1}^{r_i}\frac{(\theta_{ijk})^{N_{ijk}(d)}}{N_{ijk}(d)!}.
$$
Thus,
$$
c(d)=\int \mathbb{P}(d/G,\Theta,\lambda)\,d\Theta\, d\lambda=\prod_{i=1}^{n}\prod_{j=1}^{q_i}\frac{N_{ij}(d)!}{(N_{ij}(d)+r_i-1)!}.\tag{3.1}
$$
Hence, the implicit distribution of (Θ, λ/d, G) is given by
$$
\mathbb{P}(\Theta,\lambda/d,G)=\prod_{i=1}^{n}\prod_{j=1}^{q_i}\frac{e^{-\lambda_{ij}}\lambda_{ij}^{N_{ij}(d)}}{N_{ij}(d)!}\ \prod_{i=1}^{n}\prod_{j=1}^{q_i}(N_{ij}(d)+r_i-1)!\prod_{k=1}^{r_i}\frac{(\theta_{ijk})^{N_{ijk}(d)}}{N_{ijk}(d)!}.
$$
This indicates that the distributions of (λ/d, G) and (Θ/d, G) are given by
$$
\mathbb{P}(\lambda/d,G)=\prod_{i=1}^{n}\prod_{j=1}^{q_i}\frac{e^{-\lambda_{ij}}\lambda_{ij}^{N_{ij}(d)}}{N_{ij}(d)!}
$$
and
$$
\mathbb{P}(\Theta/d,G)=\prod_{i=1}^{n}\prod_{j=1}^{q_i}(N_{ij}(d)+r_i-1)!\prod_{k=1}^{r_i}\frac{(\theta_{ijk})^{N_{ijk}(d)}}{N_{ijk}(d)!}.
$$
Now, it is possible to find the implicit estimator of the parameter $\theta_{ijk}$, which is given by
$$
\hat\theta_{ijk}=\frac{N_{ijk}(d)+1}{N_{ij}(d)+r_i},
$$
and $\hat\lambda_{ij}=N_{ij}(d)+1$, which finishes the proof. $\Box$
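A hedged sketch of this learning step: collect the counts $N_{ijk}(d)$ from a dataset of complete cases and apply the estimators of Proposition 3.1. The data layout (dicts mapping node names to observed states) is our own choice for illustration, not the paper's.

```python
from collections import defaultdict

def sufficient_stats(data, parents):
    """Counts N_ijk(d) and N_ij(d); j is the tuple of parent states."""
    nijk = defaultdict(int)
    for case in data:                          # case: dict node -> observed state
        for i in parents:
            j = tuple(case[p] for p in parents[i])   # parent configuration of node i
            nijk[(i, j, case[i])] += 1
    nij = defaultdict(int)
    for (i, j, _k), c in nijk.items():
        nij[(i, j)] += c
    return nijk, nij

def implicit_params(nijk, nij, arity):
    """Implicit estimators of Proposition 3.1: theta_ijk and lambda_ij."""
    theta = {(i, j, k): (nijk.get((i, j, k), 0) + 1) / (nij[(i, j)] + arity[i])
             for (i, j) in nij for k in range(1, arity[i] + 1)}
    lam = {(i, j): c + 1 for (i, j), c in nij.items()}
    return theta, lam
```

For each observed parent configuration j, the estimates θ̂_{ijk} sum to one over k, and λ̂_{ij} = N_ij(d) + 1; unseen states still receive positive probability thanks to the add-one numerator.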
In the framework of the Bayesian network, the mixed implicit estimator of $\theta_{ijk}$ is given by
$$
\tilde\theta_{ijk}=\hat\theta_{ijk}+(1-r_i\hat\theta_{ijk})\,\psi(\hat\lambda_{ij}).
$$
If d is a dataset, an estimation of the probability of $X=(X_1,X_2,\cdots,X_n)$, given d, at the state
$x=(x_1,x_2,\cdots,x_n)$ is
$$
\mathbb{P}(X=x/d,G)=\prod_{i=1}^{n}\prod_{j=1}^{q_i}\prod_{k=1}^{r_i}\hat\theta_{ijk}^{\ \alpha_{ijk}(x)},
$$
where $\alpha_{ijk}(x)=1$ if $x_i=k$ and $Pa(X_i)=j$, and $\alpha_{ijk}(x)=0$ if not.
In fact, if $k_i(x)$ denotes the state of node i and $j_i(x)$ the state of its parents in the
configuration x, then by applying the Markov property, one has
$$
\mathbb{P}(X=x/d,G)=\mathbb{P}(X_1=x_1,X_2=x_2,\cdots,X_n=x_n)
=\prod_{i=1}^{n}\mathbb{P}(X_i=x_i/Pa(X_i)=j_i(x))
$$
$$
=\prod_{i=1}^{n}\hat\theta_{i\,j_i(x)\,k_i(x)}
=\prod_{i=1}^{n}\prod_{j=1}^{q_i}\prod_{k=1}^{r_i}\hat\theta_{ijk}^{\ \alpha_{ijk}(x)}.
$$
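Given learned parameters stored as a dict theta[(i, j, k)] (a layout we assume for illustration), the factorization above reads:

```python
def config_prob(theta, parents, x):
    """P(X = x / d, G) = prod_i theta[i, j_i(x), k_i(x)] (Markov factorization)."""
    p = 1.0
    for i in parents:
        j = tuple(x[q] for q in parents[i])   # parent configuration j_i(x)
        p *= theta[(i, j, x[i])]              # factor theta_{i j_i(x) k_i(x)}
    return p
```

Summing config_prob over all configurations x returns one whenever each conditional table sums to one over its states.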
The mixed implicit approach has the advantages of Bayesian methods without priors. In
fact, the choice of prior information in the Bayesian approach is itself a problem because of the
lack and the cost of information. Our implicit method avoids the problem of priors and always
allows estimators to be derived and implemented.
Note also that if the sample size is small and the number of parents increases, the number
of cases $N_{ij}$ in the dataset d in which the parents of node i are in their j-th state can be
equal to zero. Another advantage of our mixed implicit approach is that the
estimator can always be calculated, which is not the case for the maximum likelihood
approach, especially for small sample sizes. However, when the sample size is large enough, the
two methods have the same performance.
3.1 Simulation Study
In this subsection, we describe a small simulation study which highlights some of the
important features of the implicit approach described above.
Our simulation study deals with diagnosis in mobile communication networks. We
estimate the probability of the signal quality in GSM/GPRS networks. The task of determining
this probability needs some explanatory indicators, supposed to be Bernoulli variables, which are:
X1 = Coverage (1: Lack, 2: Presence),
X2 = Network configuration (1: GSM, 2: GPRS),
X3 = Mobility (1: Fixed, 2: Mobile),
X4 = Interference level (1: Presence, 2: Lack),
X5 = Network state (1: Saturated, 2: Unsaturated),
X6 = Signal quality (1: Low, 2: High).
These are the most common indicators that may cause a high number of dropped calls
in a GSM/GPRS network. Each cause can be split into several subcauses, which could also be
considered in the model. The cause of the "signal quality" is the network state. The network
state itself is caused by the interference level and the mobility of the connector. However, the
covered level and the network configuration are the causes of the "interference level". We then
obtain the causal DAG structure G given by Figure 3.
Fig.3 Network structure of the simplified problem of the diagnosis in mobile
communication networks.
To infer the probabilities of the parameters $\theta_{ijk}$ in the graph G, we implemented the
compound Poisson Bayesian network approach introduced above in the R language, chose true
parameters $\theta_{ijk}$, and generated a dataset of 1000 observations. The results of this
simulation study are summarized in Table 3, which reports the posterior
Bayesian estimates $\hat\theta_{ijk}^{Bu}$, the mixed implicit estimates $\hat\theta_{ijk}^{I}$, the maximum likelihood estimates
$\hat\theta_{ijk}^{Mv}$, and the mean squared error of each of them.
Table 3 True and estimated parameters obtained by the Bayesian, mixed implicit and
maximum likelihood estimators for the network structure given by Figure 3.

| True parameter | $\hat\theta_{ijk}^{I}$ | Prior | $\hat\theta_{ijk}^{Bu}$ | $\hat\theta_{ijk}^{Mv}$ |
|---|---|---|---|---|
| θ112 = 0.8 | 0.7858068 | 0.9 | 0.7972727 | 0.7870000 |
| θ212 = 0.4 | 0.3963663 | 0.3 | 0.3872727 | 0.3960000 |
| θ312 = 0.8 | 0.7919362 | 0.7 | 0.7845455 | 0.7930000 |
| θ412 = 0.1 | 0.0961275 | 0.2 | 0.1341991 | 0.0839694 |
| θ422 = 0.3 | 0.3132736 | 0.2 | 0.2472527 | 0.3048780 |
| θ432 = 0.6 | 0.5744628 | 0.5 | 0.5619546 | 0.5750529 |
| θ442 = 0.8 | 0.8301925 | 0.9 | 0.8502415 | 0.8343949 |
| θ512 = 0.2 | 0.1998536 | 0.1 | 0.1413613 | 0.1868131 |
| θ522 = 0.3 | 0.2755226 | 0.2 | 0.2361111 | 0.2672414 |
| θ532 = 0.5 | 0.4757777 | 0.6 | 0.5034169 | 0.4749263 |
| θ542 = 0.8 | 0.7880795 | 0.7 | 0.7743682 | 0.7907489 |
| θ612 = 0.4 | 0.3944095 | 0.3 | 0.3759398 | 0.3935185 |
| θ622 = 0.8 | 0.8093594 | 0.9 | 0.8248503 | 0.8116197 |
| MSE | 0.001447109 | | 0.004454807 | 0.001656995 |
According to Table 3, the MSE computed with the mixed implicit approach is lower than
those calculated with the Bayesian and maximum likelihood methods. This result supports the
choice of the compound Poisson model relative to the multinomial one, especially
when $\theta_{ijk}\in(0.2,0.8)$ (see Figure 2) and/or when the number of variables in the Bayesian structure
increases.
Globally, the concordance between the three approaches is very good. However, when
we compare the mixed implicit method to the Bayesian method, we see a better precision of the
implicit method compared to the Bayesian method with slightly unfavorable priors.
In Figure 4, we represent the diagram of joint probabilities related to each configuration.
In most cases, when comparing the implicit joint probabilities with Bayesian ones, we notice
that the former are closer to the true joint probabilities than the latter ones.
Fig.4 This gives true joint probabilities as well as joint probabilities estimates using
Bayesian and mixed implicit methods for 64 configurations. Each configuration number
(on the abscissa axis) refers to the binary representation of this number minus one.
3.2 Conclusion
In this paper, we have introduced the compound Poisson model in the Bayesian network
framework and presented this concept as an alternative to multinomial
Bayesian networks. We apply implicit inference to this model for learning the parameters
of the network. The implicit framework is efficiently implemented in the R language to infer
probabilities and to test the performance of our approach. A comparative study between
the learned estimates given by the three approaches (mixed implicit, standard Bayesian and
maximum likelihood), based on true values, is established. This study shows a better performance
and precision of the compound Poisson model with the mixed implicit approach, mainly when
the sample size is small.
Another interesting issue is to address the problem of learning the structure within the
framework of implicit inference in the compound Poisson model.
3.3 Appendix
Proof of Proposition 2.1
The norming constant c(D) is given by
$$
c(D)=\int\prod_{i=1}^{r}\Big[e^{-\lambda\theta_i}\frac{(\lambda\theta_i)^{N_i}}{N_i!}\Big]d\theta\, d\lambda
=\Big(\sum_{i=1}^{r}N_i\Big)!\int_{\Lambda}\prod_{i=1}^{r}\frac{\theta_i^{N_i}}{N_i!}\,d\theta\int_{\mathbb{R}_{+}}\frac{e^{-\lambda}\lambda^{\sum_{i=1}^{r}N_i}}{\big(\sum_{i=1}^{r}N_i\big)!}\,d\lambda
=\frac{\big(\sum_{i=1}^{r}N_i\big)!}{\big(\sum_{i=1}^{r}N_i+r-1\big)!}.
$$
Hence, the implicit distribution of (θ, λ) is defined by
$$
\mathbb{P}(\theta,\lambda/D)=\Big(\sum_{i=1}^{r}N_i+r-1\Big)!\prod_{i=1}^{r}\frac{\theta_i^{N_i}}{N_i!}\ \frac{e^{-\lambda}\lambda^{\sum_{i=1}^{r}N_i}}{\big(\sum_{i=1}^{r}N_i\big)!}.
$$
Note that the conditional distribution $\mathbb{P}(\theta,\lambda/D)$ of (θ, λ) given D is the product of the
Dirichlet distribution $\mathbb{P}(\theta/D)=\mathrm{Dir}(N_1+1,\cdots,N_r+1)$ and the Gamma distribution $\mathbb{P}(\lambda/D)=\gamma\big(\sum_{i=1}^{r}N_i+1,\,1\big)$, which proves the proposition.
Where
Dir(α1, α2, · · · , αr)(x1, x2, · · · , xr) = Γ
( r∑
i=1
αi
) r∏
i=1
xαi−1i
Γ(αi)δ1
( r∑
i=1
xi
)
denotes the Dirichlet density with parameters α1, α2, · · · , αr and
$$
\gamma(a,b)(x)=\frac{b^{a}}{\Gamma(a)}\,e^{-bx}x^{a-1}\,\mathbf{1}_{]0,+\infty[}(x)
$$
denotes the Gamma density with parameters a and b. □
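As a numerical sanity check on the value of c(D) (an illustration added here, not part of the original derivation), the double integral can be approximated by a midpoint rule for r = 2, where the simplex reduces to θ ∈ (0, 1) with θ₁ = θ and θ₂ = 1 − θ; the counts N1 = 3, N2 = 5 are illustrative.

```python
import math

# Sanity check of c(D) = (sum_i N_i)! / (sum_i N_i + r - 1)! for r = 2.
N1, N2 = 3, 5          # illustrative counts
N, r = N1 + N2, 2

def integrand(theta, lam):
    """prod_i exp(-lam*theta_i) * (lam*theta_i)^{N_i} / N_i! at (theta, lam)."""
    out = 1.0
    for Ni, ti in ((N1, theta), (N2, 1.0 - theta)):
        out *= math.exp(-lam * ti) * (lam * ti) ** Ni / math.factorial(Ni)
    return out

# Midpoint rule over theta in (0, 1) and lam in (0, L); the Poisson factor
# makes the lam-tail beyond L = 40 negligible for these counts.
M, K, L = 200, 1200, 40.0
dt, dl = 1.0 / M, L / K
approx = sum(integrand((i + 0.5) * dt, (j + 0.5) * dl) * dt * dl
             for i in range(M) for j in range(K))

exact = math.factorial(N) / math.factorial(N + r - 1)   # = 1/9 for these counts
print(approx, exact)
```

The two printed values agree to several decimals, consistent with the closed form above.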
Proof of Lemma 2.1
1) It is easy to see that
$$
\begin{aligned}
-e^{-\lambda}(-\lambda)^{-r}\int_{-\lambda}^{0}e^{-x}x^{r-1}\,dx
&=-e^{-\lambda}(-\lambda)^{-r}\int_{-\lambda}^{0}\sum_{k=0}^{+\infty}\frac{(-1)^{k}x^{k+r-1}}{k!}\,dx\\
&=-e^{-\lambda}(-\lambda)^{-r}\sum_{k=0}^{+\infty}\int_{-\lambda}^{0}\frac{(-1)^{k}x^{k+r-1}}{k!}\,dx\\
&=-e^{-\lambda}(-\lambda)^{-r}\sum_{k=0}^{+\infty}\frac{(-1)^{k+1}(-\lambda)^{k+r}}{k!(k+r)}\\
&=\sum_{k=0}^{+\infty}\frac{e^{-\lambda}\lambda^{k}}{k!(k+r)}
=E\Bigl(\frac{1}{N+r}\Big/\lambda\Bigr).
\end{aligned}
$$
By way of induction on r, one has
$$
\psi(\lambda)=-e^{-\lambda}(-\lambda)^{-r}\int_{-\lambda}^{0}e^{-x}x^{r-1}\,dx
=(r-1)!\left\{\frac{e^{-\lambda}-1}{(-\lambda)^{r}}-\sum_{k=1}^{r-1}\frac{(-\lambda)^{k-r}}{k!}\right\}.
$$
2) Using the formula
$$
\psi(\lambda)=(r-1)!\left\{\frac{e^{-\lambda}-1}{(-\lambda)^{r}}-\sum_{k=1}^{r-1}\frac{(-\lambda)^{k-r}}{k!}\right\},
$$
we obtain the desired result.
3) Using the Chebyshev inequality, we obtain, for all ε > 0,
$$
P\Bigl(\Bigl|\frac{N}{\lambda}-1\Bigr|>\varepsilon\Bigr)\le\frac{\mathrm{Var}(N)}{\varepsilon^{2}\lambda^{2}}=\frac{1}{\varepsilon^{2}\lambda}.
$$
This shows that $\frac{N}{\lambda}$ converges in probability to 1, as λ → +∞.
4) By using the Markov inequality, for all ε > 0 and λ ∈ ℕ, one has
$$
P\Bigl(\frac{\sqrt{\lambda}}{N+r}\ge\varepsilon\Bigr)\le\frac{1}{\varepsilon}\,E\Bigl(\frac{\sqrt{\lambda}}{N+r}\Bigr)=\frac{\sqrt{\lambda}\,\psi(\lambda)}{\varepsilon}.
$$
Since $\lim\limits_{\lambda\to+\infty}\lambda\psi(\lambda)=(r-1)!$, $\frac{\sqrt{\lambda}}{N+r}$ converges in probability to 0, as λ → +∞. □
Proof of Proposition 2.2
1) $E(\hat{\theta}_i)=E\bigl(E\bigl(\tfrac{N_i+1}{N+r}/N\bigr)\bigr)=E\bigl(\tfrac{N\theta_i+1}{N+r}\bigr)=\theta_i+(1-r\theta_i)E\bigl(\tfrac{1}{N+r}\bigr)=\theta_i+(1-r\theta_i)\psi(\lambda)$.
Since $\lim\limits_{\lambda\to+\infty}\psi(\lambda)=0$, then $\lim\limits_{\lambda\to+\infty}E(\hat{\theta}_i)-\theta_i=0$.
2) Note that
$$
\begin{aligned}
E((\hat{\theta}_i-\theta_i)^{2})&=E\bigl(E\bigl((\hat{\theta}_i-\theta_i)^{2}/N\bigr)\bigr) &(3.2)\\
&=E\Bigl(\frac{N\theta_i(1-\theta_i)}{(N+r)^{2}}+\Bigl(\frac{1-r\theta_i}{N+r}\Bigr)^{2}\Bigr) &(3.3)\\
&\le[\theta_i(1-\theta_i)+(1-r\theta_i)^{2}]\,\psi(\lambda). &(3.4)
\end{aligned}
$$
Since $\lim\limits_{\lambda\to+\infty}\psi(\lambda)=0$, then $\lim\limits_{\lambda\to+\infty}E((\hat{\theta}_i-\theta_i)^{2})=0$.
3) By using assertion 4) of Lemma 2.1, the sequence $\frac{\sqrt{\lambda_n}}{N+r}$ converges, in probability, to 0 as n → +∞. Then, there exists a subsequence $(\lambda_{n_k})_k$ such that $\frac{\sqrt{\lambda_{n_k}}}{N+r}$ converges almost surely to 0 and $\frac{\lambda_{n_k}}{N}$ converges almost surely to 1, as k → +∞. Let $Z_\lambda=\sqrt{\lambda}\,(\hat{\theta}_i-\theta_i)$ and $L_{Z_\lambda}(t)=E(e^{tZ_\lambda})=E(e^{t\sqrt{\lambda}(\hat{\theta}_i-\theta_i)})$ its Laplace transform. In order to prove that $Z_{\lambda_{n_k}}$ converges in distribution to $\mathcal{N}(0,\theta_i(1-\theta_i))$ as k → +∞, one can show that
$$
\lim_{k\to+\infty}L_{Z_{\lambda_{n_k}}}(t)=e^{\frac{t^{2}\theta_i(1-\theta_i)}{2}},\quad\forall t\in\mathbb{R}.
$$
Observe that
$$
\begin{aligned}
L_{Z_{\lambda_{n_k}}}(t)&=E\bigl(e^{t\sqrt{\lambda_{n_k}}(\hat{\theta}_i-\theta_i)}\bigr)=E\Bigl(E\Bigl(e^{t\sqrt{\lambda_{n_k}}\bigl(\frac{N_i+1}{N+r}-\theta_i\bigr)}/N\Bigr)\Bigr)\\
&=e^{-t\sqrt{\lambda_{n_k}}\theta_i}\,E\Bigl(e^{\frac{t\sqrt{\lambda_{n_k}}}{N+r}}\,E\bigl(e^{\frac{t\sqrt{\lambda_{n_k}}N_i}{N+r}}/N\bigr)\Bigr)\\
&=e^{-t\sqrt{\lambda_{n_k}}\theta_i}\,E\Bigl(e^{\frac{t\sqrt{\lambda_{n_k}}}{N+r}}\bigl((1-\theta_i)+\theta_i e^{\frac{t\sqrt{\lambda_{n_k}}}{N+r}}\bigr)^{N}\Bigr).
\end{aligned}
$$
Using the fact that the sequence $\frac{t\sqrt{\lambda_{n_k}}}{N+r}$ converges almost surely to 0 and the Taylor expansions of the functions $\log(1+x)$ and $e^{x}$, up to the second order around 0, we obtain
$$
\begin{aligned}
N\log\Bigl((1-\theta_i)+\theta_i e^{\frac{t\sqrt{\lambda_{n_k}}}{N+r}}\Bigr)
&=N\log\Bigl((1-\theta_i)+\theta_i\Bigl(1+\frac{t\sqrt{\lambda_{n_k}}}{N+r}+\frac{t^{2}\lambda_{n_k}}{2(N+r)^{2}}\Bigr)\Bigr)+o\Bigl(\Bigl(\frac{t\sqrt{\lambda_{n_k}}}{N+r}\Bigr)^{2}\Bigr)\\
&=N\log\Bigl(1+\theta_i\Bigl(\frac{t\sqrt{\lambda_{n_k}}}{N+r}+\frac{t^{2}\lambda_{n_k}}{2(N+r)^{2}}\Bigr)\Bigr)+o\Bigl(\Bigl(\frac{t\sqrt{\lambda_{n_k}}}{N+r}\Bigr)^{2}\Bigr)\\
&=N\theta_i\Bigl(\frac{t\sqrt{\lambda_{n_k}}}{N+r}+\frac{t^{2}\lambda_{n_k}(1-\theta_i)}{2(N+r)^{2}}\Bigr)+o\Bigl(\Bigl(\frac{t\sqrt{\lambda_{n_k}}}{N+r}\Bigr)^{2}\Bigr).
\end{aligned}
$$
We deduce that
$$
\begin{aligned}
L_{Z_{\lambda_{n_k}}}(t)&=E\Bigl[\exp\Bigl(N\theta_i\Bigl(\frac{t\sqrt{\lambda_{n_k}}}{N+r}+\frac{t^{2}\lambda_{n_k}(1-\theta_i)}{2(N+r)^{2}}\Bigr)+\frac{t\sqrt{\lambda_{n_k}}}{N+r}-t\sqrt{\lambda_{n_k}}\,\theta_i+o\Bigl(\Bigl(\frac{t\sqrt{\lambda_{n_k}}}{N+r}\Bigr)^{2}\Bigr)\Bigr)\Bigr]\\
&=E\Bigl[\exp\Bigl((1-r\theta_i)\frac{t\sqrt{\lambda_{n_k}}}{N+r}+\frac{t^{2}\theta_i(1-\theta_i)}{2}\,\frac{N\lambda_{n_k}}{(N+r)^{2}}+o\Bigl(\Bigl(\frac{t\sqrt{\lambda_{n_k}}}{N+r}\Bigr)^{2}\Bigr)\Bigr)\Bigr].
\end{aligned}
$$
Since
$$
\frac{N\lambda_{n_k}}{(N+r)^{2}}\longrightarrow 1\quad\text{and}\quad(1-r\theta_i)\frac{t\sqrt{\lambda_{n_k}}}{N+r}\longrightarrow 0
$$
almost surely, as k → +∞,
$$
(1-r\theta_i)\frac{t\sqrt{\lambda_{n_k}}}{N+r}+\frac{t^{2}\theta_i(1-\theta_i)}{2}\,\frac{N\lambda_{n_k}}{(N+r)^{2}}\longrightarrow\frac{t^{2}\theta_i(1-\theta_i)}{2},\quad\text{almost surely}.
$$
Hence, $L_{Z_{\lambda_{n_k}}}(t)\longrightarrow e^{\frac{t^{2}\theta_i(1-\theta_i)}{2}}$ as k → +∞. Consequently,
$$
\sqrt{\lambda_{n_k}}\,(\hat{\theta}_i-\theta_i)\longrightarrow\mathcal{N}(0,\theta_i(1-\theta_i)),\quad\text{in distribution}.\qquad\Box
$$
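The exact identity $E(\hat{\theta}_i)=\theta_i+(1-r\theta_i)\psi(\lambda)$ used in assertion 1) can be verified by direct summation over the Poisson distribution of N. The sketch below (illustrative parameter values, added here and not part of the paper) computes both sides of the identity.

```python
import math

def psi(lam, r, terms=200):
    """psi(lambda) = E(1/(N + r)) for N ~ Poisson(lambda)."""
    total, p = 0.0, math.exp(-lam)      # p = e^{-lam} * lam^n / n!
    for n in range(terms):
        total += p / (n + r)
        p *= lam / (n + 1)
    return total

def mean_estimator(theta_i, lam, r, terms=200):
    """E(hat(theta)_i) = E((N*theta_i + 1)/(N + r)), using the fact that
    E(N_i + 1 | N) = N*theta_i + 1 for N_i ~ Binomial(N, theta_i)."""
    total, p = 0.0, math.exp(-lam)
    for n in range(terms):
        total += p * (n * theta_i + 1) / (n + r)
        p *= lam / (n + 1)
    return total

theta_i, lam, r = 0.3, 6.0, 4           # illustrative values
lhs = mean_estimator(theta_i, lam, r)
rhs = theta_i + (1 - r * theta_i) * psi(lam, r)
print(lhs, rhs)    # the two quantities coincide up to floating-point error
```

This confirms numerically that the bias of $\hat{\theta}_i$ is exactly $(1-r\theta_i)\psi(\lambda)$, which vanishes as λ → +∞.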
Proof of Proposition 2.3 Recall that the set V of random variables equipped with the distance d defined by
$$
d(X,Y)=E\Bigl(\frac{|X-Y|}{1+|X-Y|}\Bigr)
$$
is a complete metric space.
Now, we will prove that $(Z_\lambda)_{\lambda\in\mathbb{N}}$ is bounded in V. Observe that
$$
d(Z_\lambda,0)=E\Bigl(\frac{|Z_\lambda|}{1+|Z_\lambda|}\Bigr)\le E(|Z_\lambda|)\le\bigl(E(Z_\lambda^{2})\bigr)^{\frac12}.
$$
By using inequality (3.4), we obtain
$$
\bigl(E(Z_\lambda^{2})\bigr)^{\frac12}=\sqrt{\lambda\,E\bigl((\hat{\theta}_i-\theta_i)^{2}\bigr)}\le\sqrt{[\theta_i(1-\theta_i)+(1-r\theta_i)^{2}]\,\lambda\psi(\lambda)}\le\sqrt{(2+r^{2})\,\lambda\psi(\lambda)}.
$$
Then, $d(Z_\lambda,0)\le\sqrt{(2+r^{2})\,\lambda\psi(\lambda)}$.
The result comes from the fact that the function λψ(λ) is continuous on [0,+∞[ and that $\lim\limits_{\lambda\to+\infty}\lambda\psi(\lambda)=(r-1)!$.
Let $(\lambda_n)_{n\in\mathbb{N}}$ be a sequence tending to +∞ as n → +∞ such that $(Z_{\lambda_n})_n$ converges in probability to a random variable W. According to assertion 3) of Proposition 2.2, we can extract a subsequence $(\lambda_{n_k})_k$ such that $Z_{\lambda_{n_k}}$ converges in distribution, as k → +∞, to $\mathcal{N}(0,\theta_i(1-\theta_i))$. Recall that convergence in probability implies convergence in distribution; then, by the uniqueness of the limit, we deduce that W follows the normal distribution $\mathcal{N}(0,\theta_i(1-\theta_i))$. This implies that $Z_{\lambda_n}$ converges in probability to $W\sim\mathcal{N}(0,\theta_i(1-\theta_i))$. Hence, $(Z_\lambda)_{\lambda\in\mathbb{N}}$ is a bounded sequence in V which admits a unique adherent point $W\sim\mathcal{N}(0,\theta_i(1-\theta_i))$. Then, it converges in probability, and consequently in distribution. □