A Compound Poisson Model for Learning Discrete Bayesian Networks
Acta Mathematica Scientia 2013,33B(6):1767–1784
http://actams.wipm.ac.cn
Abdelaziz GHRIBI
Laboratory of Physic-Mathematics, University of Sfax, B.P. 1171, Sfax, Tunisia
E-mail: Ghribi [email protected]
Afif MASMOUDI
Laboratory of Probability and Statistics, University of Sfax, B.P. 1171, Sfax, Tunisia
E-mail: [email protected]
Abstract We introduce the concept of Bayesian networks under a compound Poisson
model, which provides a graphical modeling framework that encodes the joint probability
distribution for a set of random variables within a directed acyclic graph. We propose
an approach that offers a new mixed implicit estimator. We show that the
implicit approach applied to the compound Poisson model is very attractive for its ability to
understand data and does not require any prior information. A comparative study between
the learned estimates given by the implicit and by standard Bayesian approaches is established.
Under some conditions and based on mean squared error calculations, we show that the
mixed implicit estimator is better than the standard Bayesian and the maximum likelihood
estimators. We illustrate our approach by considering a simulation study in the context of
mobile communication networks.
Key words Bayesian network; compound Poisson distribution; multinomial distribution;
implicit approach; mobile communication networks
2010 MR Subject Classification 46N30; 62F12; 62F15
1 Introduction
Bayesian networks [1–4] are graphical representations of assumed dependencies among a set
of variables. Statistically, these networks encode the joint probability distribution for a set of
random variables in a directed acyclic graph (DAG), which is a set of variables and a set of
directed links. Each node in the graph represents a random variable, while the edges between
the nodes represent probabilistic dependencies among the corresponding random variables. In
a DAG, if there is an arc from variable A to variable B, A is said to be a parent of the child B.
These models are very attractive for their ability to learn causal relationships and hence they
offer a powerful framework to understand and analyze data [5–8].
∗Received May 12, 2012; revised November 3, 2012.
For example, in the context of mobile phone networks, a Bayesian network could represent the
relationships between the covered level (X), the network configuration (Y) and the interference (Z). Since
the covered level and the network configuration are the causes of the "interference level", one can
use the network structure given by (X) −→ (Z) ←− (Y). To compute the joint probability of
(X, Y, Z), we can use the Markov property
$$\mathbb{P}(X,Y,Z)=\mathbb{P}(X)\,\mathbb{P}(Y)\,\mathbb{P}(Z/X,Y).$$
This property is used to reduce the number of parameters required to characterize the
joint probability distribution. In this example, one might consider the diagnostic
support for the belief about the state of the covered level given the observation of the presence of
interference.
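As a hedged illustration of this factorization and the resulting diagnostic inference, the sketch below encodes the X −→ Z ←− Y network; the conditional probability tables are invented for the example, not taken from the paper.

```python
# Hypothetical CPTs for the X -> Z <- Y network (all numbers are illustrative).
p_x = {1: 0.2, 2: 0.8}                      # covered level: 1 = lack, 2 = presence
p_y = {1: 0.6, 2: 0.4}                      # configuration: 1 = GSM, 2 = GPRS
p_z = {(1, 1): {1: 0.9, 2: 0.1}, (1, 2): {1: 0.7, 2: 0.3},
       (2, 1): {1: 0.3, 2: 0.7}, (2, 2): {1: 0.1, 2: 0.9}}  # interference given (x, y)

def joint(x, y, z):
    # Markov property: P(X, Y, Z) = P(X) P(Y) P(Z / X, Y)
    return p_x[x] * p_y[y] * p_z[(x, y)][z]

# Diagnostic support: P(X = lack / Z = presence) by Bayes' rule
num = sum(joint(1, y, 1) for y in p_y)
den = sum(joint(x, y, 1) for x in p_x for y in p_y)
post = num / den
```

With these invented tables, observing interference raises the belief in a coverage lack from the prior 0.2 to num/den.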
Usually, in statistical inference, there are two different approaches: the classical
(frequentist) approach and the Bayesian approach. In the context of Bayesian inference, both
parameters and data are treated as random variables, whereas the frequentist approach regards
parameters as non-random.
In the Bayesian approach, the posterior distribution is obtained by multiplying the
likelihood function by a known prior distribution and then dividing by a norming constant.
Consequently, this prior information is used together with the data to derive the posterior
distribution [9–11].
In this work, we restrict ourselves to discrete Bayesian networks with a compound Poisson conditional
model. Recall that a compound Poisson distributed random variable is defined by $S_N=\sum_{i=1}^{N}X_i$,
where $(X_i)_{i\in\mathbb{N}}$ is a sequence of independent identically distributed random variables and $N$
follows a Poisson distribution. The compound Poisson distribution appears in many fields; for
example, it is used to model the distribution of the total rainfall in a day, where each day
contains a Poisson-distributed number of events [12]. The compound Poisson distribution is
also used in actuarial science and insurance for modeling the distribution of the total claim amount.
We address the problem of learning parameters of a Bayesian network with a compound
Poisson conditional model. We propose a theoretical treatment of using a compound Poisson
model for understanding Bayesian networks. We use the implicit inference approach to estimate
the parameters in the compound Poisson model for the network. We study the performance of
the proposed implicit estimator by considering a simulation study.
The network learning is formulated as an implicit estimation problem [13]. We previously
described the implicit approach that does not require any priors in the framework of Bayesian
network [14]. In [15], Ben Hassen et al. proposed a new algorithm to learn the parameters of
implicit networks when the data set is incomplete. In the same context, Bouchaala et
al. [16] proposed a new implicit score function for the structure learning of Bayesian networks.
In this paper, we briefly recall the principles of implicit inference and give an application
of the method in the case of a compound Poisson Bayesian network. After that, we describe our
approach based on a compound Poisson model. We show how we can learn the probabilities of
a Bayesian network when the data set is complete. We apply our approach to an important
existing problem in mobile communication networks, which is the prediction of the probability
of the signal level quality.
The outline of this paper is as follows: in Section 2, we present the basic concepts of the
implicit approach, propose a mixed implicit estimator for the compound Poisson distribution,
and study its asymptotic behavior. In Section 3, we develop a unified Bayesian network for
the compound Poisson conditional model and finally, in Section 4, we give an illustrative example
and present the conclusion.
2 Inference with the Implicit Method
2.1 General View of the Implicit Method
The implicit estimation method was proposed by Hassairi et al. [13] as an alternative to
the Bayesian approach. Assuming some prior information, Bayesian theory [9, 10] considers
an unknown parameter θ as a random variable and determines its posterior distribution given
data. In the same way, the implicit distribution is considered as a posterior distribution of a
parameter θ given the data. In fact, consider a family of probability distributions $\{\mathbb{P}(x/\theta),\ \theta\in\Theta\}$ parameterized by an unknown parameter θ in a set Θ, where x represents the observed data.
The implicit distribution $\mathbb{P}(\theta/x)$ is calculated by multiplying the likelihood function $\mathbb{P}(x/\theta)$ by
a counting measure σ if Θ is a countable set, or by the Lebesgue measure σ if Θ is an open set
(σ depends only on the topological structure of Θ), and then dividing by the norming constant
$c(x)=\int_{\Theta}\mathbb{P}(x/\theta)\,\sigma(d\theta)$. Therefore, the implicit distribution is given by the formula
$$\mathbb{P}(\theta/x)(d\theta)=(c(x))^{-1}\,\mathbb{P}(x/\theta)\,\sigma(d\theta)$$
and plays the role of a posterior distribution of θ given x in
the Bayesian method. This corresponds to a particular improper prior which depends only on
the topology of Θ (without any statistical assumption).
In particular, if the set of parameters Θ is bounded, then σ is proportional to the uniform
distribution on Θ. If Θ is unbounded, then σ is an improper prior.
Provided its existence (which holds for most statistical models), the implicit distribution
can be used for the estimation of the parameter θ following a Bayesian methodology. The
implicit estimator $\hat\theta$ of θ is a Bayes estimator with respect to the squared error loss function,
given by the posterior mean, that is
$$\hat\theta=E(\theta/x)=\int_{\Theta}\theta\,\mathbb{P}(\theta/x)(d\theta).$$
An important problem in Bayesian estimation is how to define the prior distribution. If prior
information about the parameter θ is available, it should be incorporated in the prior density.
If we have no prior information, we apply non-informative Bayesian estimation or implicit
estimation. In the multinomial case, Ben Hassen et al. [14] applied the implicit approach. For the
estimation of the parameters, they considered $D=(N_1,\cdots,N_r)$ as a random variable following a
multinomial distribution with unknown parameters $N=\sum_{i=1}^{r}N_i$ and $\theta=(\theta_1,\cdots,\theta_r)$. They first
estimated N by the implicit method and then used the estimator $\hat N$ to estimate θ.
2.2 Implicit Method in Compound Poisson Model
Let $D=(N_1,N_2,\cdots,N_r)$ be a random vector such that $N=\sum_{i=1}^{r}N_i$ follows a Poisson
distribution $\mathcal{P}_\lambda$ with parameter λ > 0. Suppose that the distribution of D given (N = n)
is a multinomial distribution with parameters n and $\theta=(\theta_1,\theta_2,\cdots,\theta_r)$, where $0<\theta_i<1$
and $\sum_{i=1}^{r}\theta_i=1$.
If $\mathbb{P}_{\lambda,\theta}$ denotes the probability distribution of D, then by applying the Bayes formula, one has
$$
\mathbb{P}_{\lambda,\theta}(n_1,n_2,\cdots,n_r)
=\sum_{n=0}^{\infty}\mathbb{P}_{\lambda,\theta}(n_1,n_2,\cdots,n_r/N=n)\,\mathbb{P}(N=n)
$$
$$
=\sum_{n=0}^{\infty}\Big[n!\prod_{i=1}^{r}\frac{\theta_i^{n_i}}{n_i!}\Big]\frac{e^{-\lambda}\lambda^{n}}{n!}\,\delta_n\Big(\sum_{i=1}^{r}n_i\Big)
$$
$$
=\Big(\sum_{i=1}^{r}n_i\Big)!\,\Big[\prod_{i=1}^{r}\frac{\theta_i^{n_i}}{n_i!}\Big]\frac{e^{-\lambda}\lambda^{\sum_{i=1}^{r}n_i}}{\big(\sum_{i=1}^{r}n_i\big)!}
=\prod_{i=1}^{r}\frac{e^{-\lambda\theta_i}(\lambda\theta_i)^{n_i}}{n_i!},
$$
where $\delta_n$ denotes the Dirac measure at n.
Therefore, the distribution $\mathbb{P}_{\lambda,\theta}$ of D is the product of independent Poisson distributions
with parameters $\lambda\theta_1,\lambda\theta_2,\cdots,\lambda\theta_r$:
$$
\mathbb{P}_{\lambda,\theta}(n_1,n_2,\cdots,n_r)=\prod_{i=1}^{r}\Big[e^{-\lambda\theta_i}\frac{(\lambda\theta_i)^{n_i}}{n_i!}\Big].
$$
With the convention that $D=(0,0,\cdots,0)$ if $N=0$, the distribution of $D=(N_1,N_2,\cdots,N_r)$ is
called the compound Poisson distribution.
If we denote
$$
\Lambda=\Big\{\theta=(\theta_1,\theta_2,\cdots,\theta_r)\in\mathbb{R}_{+}^{r}:\ \sum_{i=1}^{r}\theta_i=1\Big\},
$$
the family
$$
\mathcal{F}=\big\{\mathbb{P}_{\lambda,\theta}:\ (\lambda,\theta)\in\mathbb{R}_{+}^{*}\times\Lambda\big\}
$$
is called the compound Poisson model. The following proposition gives the implicit estimators of
$\theta_k$ and λ.
Proposition 2.1 Let $D=(N_1,N_2,\cdots,N_r)$ be a random vector following the compound
Poisson distribution with parameters $\lambda\theta_1,\lambda\theta_2,\cdots,\lambda\theta_r$. Then
i) the implicit estimator $\hat\theta_k$ of $\theta_k$ is
$$
\hat\theta_k=E(\theta_k/D)=\frac{N_k+1}{r+\sum_{i=1}^{r}N_i},\qquad k=1,2,\cdots,r;\tag{2.1}
$$
ii) the implicit estimator $\hat\lambda$ of λ is
$$
\hat\lambda=E(\lambda/D)=1+\sum_{i=1}^{r}N_i.\tag{2.2}
$$
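As a minimal sketch (our own, not the authors' R implementation), formulas (2.1) and (2.2) can be evaluated directly from the observed counts:

```python
# Implicit estimators of Proposition 2.1, computed from the counts D = (N1,...,Nr).
def implicit_estimates(counts):
    r, n = len(counts), sum(counts)
    theta_hat = [(nk + 1) / (r + n) for nk in counts]  # eq. (2.1): add-one smoothing
    lam_hat = 1 + n                                    # eq. (2.2)
    return theta_hat, lam_hat
```

For D = (3, 5) this gives θ̂ = (0.4, 0.6) and λ̂ = 9; note that the θ̂_k always sum to one, and every θ̂_k is strictly positive even when some count N_k is zero.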
In what follows, we need the following technical lemma.
Lemma 2.1 Suppose that N follows a Poisson distribution $\mathcal{P}_\lambda$ with parameter $\lambda\in\mathbb{N}\setminus\{0\}$, and let us denote $\psi(\lambda)=E\big(\frac{1}{N+r}/\lambda\big)$. Then
1) $\psi(\lambda)=-e^{-\lambda}(-\lambda)^{-r}\displaystyle\int_{-\lambda}^{0}e^{-x}x^{r-1}\,dx=(r-1)!\Big\{\dfrac{e^{-\lambda}-1}{(-\lambda)^{r}}-\sum_{k=1}^{r-1}\dfrac{(-\lambda)^{k-r}}{k!}\Big\}$;
2) $\lim\limits_{\lambda\to+\infty}\lambda\psi(\lambda)=1$;
3) $\lim\limits_{\lambda\to+\infty}\dfrac{\hat\lambda}{\lambda}=1$, in probability;
4) $\lim\limits_{\lambda\to+\infty}\dfrac{\sqrt{\lambda}}{N+r}=0$, in probability.
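The closed form in assertion 1) can be checked numerically against the defining series; the sketch below is our own and assumes moderate values of λ (the alternating sum loses precision for large λ).

```python
import math

def psi_series(lam, r, terms=400):
    """psi(lam) = E(1/(N+r)) for N ~ Poisson(lam), by direct summation."""
    # Poisson weights computed in log space to avoid overflow of lam**k / k!
    return sum(math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1)) / (k + r)
               for k in range(terms))

def psi_closed(lam, r):
    """Closed form from assertion 1) of Lemma 2.1."""
    s = sum((-lam) ** (k - r) / math.factorial(k) for k in range(1, r))
    return math.factorial(r - 1) * ((math.exp(-lam) - 1) / (-lam) ** r - s)
```

Both expressions agree to high precision for moderate λ; the product λψ(λ) can also be tabulated to inspect its limit.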
The following proposition studies the asymptotic behavior of the implicit estimator $\hat\theta_i$ as
λ increases to +∞. We show that this estimator is consistent and asymptotically normal.
Proposition 2.2 The following assertions hold.
1) The implicit estimator $\hat\theta_i$ of $\theta_i$ is asymptotically unbiased, in the sense that
$\lim\limits_{\lambda\to+\infty}E(\hat\theta_i)-\theta_i=0$.
2) $\hat\theta_i$ converges to $\theta_i$ in quadratic mean, as $\lambda\to+\infty$.
3) For every sequence $(\lambda_n)_n$ such that $\lambda_n\to+\infty$ as $n\to+\infty$, there exists a subsequence
$(\lambda_{n_k})_k$ such that $\big(\sqrt{\lambda_{n_k}}(\hat\theta_i-\theta_i)\big)_{k\in\mathbb{N}}$ converges in distribution to the centered normal
distribution $\mathcal{N}(0,\theta_i(1-\theta_i))$ with variance $\theta_i(1-\theta_i)$, as $k\to+\infty$.
Proposition 2.3 $\big(\sqrt{\lambda}(\hat\theta_i-\theta_i)\big)_{\lambda\in\mathbb{N}}$ converges in distribution to $\mathcal{N}(0,\theta_i(1-\theta_i))$, as
$\lambda\to+\infty$.
The proofs of Lemma 2.1, Proposition 2.2 and Proposition 2.3 are given in Appendix.
2.3 A New Mixed Implicit Estimator
In this subsection, we propose a new mixed implicit estimator (MIE) for the compound Poisson
model, which is a natural extension of the implicit estimator in the multinomial model. Asymptotic
properties of this estimator are established. We present the results of a simulation study
conducted to evaluate the performance of our approach in comparison with the Bayesian and
maximum likelihood methods.
Now, let us consider our new estimator $\tilde\theta_i$ of $\theta_i$ defined by
$$
\tilde\theta_i=E\Big(\frac{N_i'+1}{N'+r}\,/\,\theta_1,\cdots,\theta_r,\lambda\Big),\tag{2.3}
$$
where $N_1',N_2',\cdots,N_r'$ are r independent random variables such that $N_i'$ follows a Poisson
distribution $\mathcal{P}(\lambda\theta_i)$ and $N'=\sum_{i=1}^{r}N_i'$.
We call $\tilde\theta_i$ the mixed implicit estimator (MIE) of $\theta_i$. The following proposition
gives an explicit formula for $\tilde\theta_i$.
Proposition 2.4
$$
\tilde\theta_i=\theta_i+(1-r\theta_i)\psi(\lambda),\tag{2.4}
$$
where ψ is the function defined in Lemma 2.1.
Proof
$$
\tilde\theta_i=E\Big(\frac{N_i'+1}{N'+r}\,/\,\theta_1,\cdots,\theta_r,\lambda\Big)
=E\Big(E\Big(\frac{N_i'+1}{N'+r}\,/\,\theta_1,\cdots,\theta_r,\lambda,N'\Big)\,/\,\theta_1,\cdots,\theta_r,\lambda\Big)
$$
$$
=E\Big(\frac{N'\theta_i+1}{N'+r}\,/\,\theta_1,\cdots,\theta_r,\lambda\Big)
=\theta_i+(1-r\theta_i)\psi(\lambda).\qquad\Box
$$
Observe that $\tilde\theta_i$ is a convex combination of the implicit estimator $\hat\theta_i$ and the uniform
estimator $\frac{1}{r}$:
$$
\tilde\theta_i=(1-r\psi(\lambda))\,\hat\theta_i+(r\psi(\lambda))\,\frac{1}{r}.\tag{2.5}
$$
The weight $(1-r\psi(\lambda))$ can be interpreted as a confidence degree relative to the size of the
dataset. If the sample size is large enough, then $r\psi(\lambda)$ is close to zero; in this case, more
confidence is given to the estimator $\hat\theta_i$.
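In practice one plugs the implicit estimates (2.1) and (2.2) into (2.5); the sketch below is our own illustration, with ψ taken from the closed form stated in Lemma 2.1 (adequate for moderate λ).

```python
import math

def psi(lam, r):
    # Closed form of psi(lambda) from Lemma 2.1 (alternating sum; moderate lam only)
    s = sum((-lam) ** (k - r) / math.factorial(k) for k in range(1, r))
    return math.factorial(r - 1) * ((math.exp(-lam) - 1) / (-lam) ** r - s)

def mixed_implicit(counts):
    """MIE of eq. (2.5): shrink the implicit estimator toward the uniform 1/r."""
    r, n = len(counts), sum(counts)
    lam_hat = 1 + n                                   # eq. (2.2)
    theta_hat = [(c + 1) / (r + n) for c in counts]   # eq. (2.1)
    w = r * psi(lam_hat, r)                           # weight given to the uniform 1/r
    return [(1 - w) * t + w / r for t in theta_hat]
```

Since the two weights sum to one, the mixed estimates still sum to one; as the sample grows, the weight w tends to zero and the MIE approaches the implicit estimator.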
The distribution of the mixed implicit estimator $\tilde\theta_i$ of $\theta_i$ depends on λ, which reflects
the size of the sample, so it is natural to consider the asymptotic behavior of this estimator
as λ increases to +∞. We prove that $\tilde\theta_i$ is consistent and asymptotically normal.
Theorem 2.1 The following assertions hold.
1) $\tilde\theta_i$ converges to $\theta_i$ in quadratic mean, as $\lambda\to+\infty$;
2) $\big(\sqrt{\lambda}(\tilde\theta_i-\theta_i)\big)_{\lambda\in\mathbb{N}}$ converges in distribution to the centered normal distribution $\mathcal{N}(0,\theta_i(1-\theta_i))$ with variance $\theta_i(1-\theta_i)$, as $\lambda\to+\infty$.
Proof 1) Since $\hat\theta_i$ converges to $\theta_i$ and $\hat\lambda=1+\sum_{i=1}^{r}N_i$ converges to +∞, in probability,
as $\lambda\to+\infty$, then by using Proposition 2.3 and assertion 2) of Lemma 2.1, we obtain the
desired result.
2) According to Proposition 2.3, one has
$$
\sqrt{\lambda}(\tilde\theta_i-\theta_i)=\sqrt{\lambda}(\hat\theta_i-\theta_i)+\frac{\sqrt{\lambda}}{\hat\lambda}\,(1-r\hat\theta_i)\,\hat\lambda\psi(\hat\lambda).
$$
Assertions 2) and 3) of Lemma 2.1 imply that
$$
\lim_{\lambda\to+\infty}\frac{\sqrt{\lambda}}{\hat\lambda}\,(1-r\hat\theta_i)\,\hat\lambda\psi(\hat\lambda)=0.
$$
Therefore, by using Proposition 2.2, $\sqrt{\lambda}(\tilde\theta_i-\theta_i)$ converges in distribution to the normal distribution $\mathcal{N}(0,\theta_i(1-\theta_i))$. $\Box$
Recall that, in the multinomial case, if we assume that the prior distribution for θ is a Dirichlet
distribution with parameters $(\alpha_1,\cdots,\alpha_r)$, then the Bayesian estimator of $\theta_k$ is given by
$$
\hat\theta_k^{B}=\frac{N_k+\alpha_k}{\sum_{i=1}^{r}(N_i+\alpha_i)},
$$
where $\mathrm{prior}_k=\dfrac{\alpha_k}{\sum_{i=1}^{r}\alpha_i}$ denotes the prior probability of occurrence of the state k.
If $\mathrm{prior}_k$ is near the true value of $\theta_k$, this corresponds to a favorable prior. If not, we have
an unfavorable prior.
We illustrate our estimation procedure by using simulated data (with r = 2). We
generated n = 1000 observations from the compound Poisson distribution with parameter λθ,
such that θ ∈ (0, 1) and λ > 0. We compute the Bayesian estimator $\hat\theta^{Bu}$ of θ with slightly
unfavorable priors and hyperparameter $\alpha=\alpha_1+\alpha_2=100$. By using the Monte Carlo method, we
compute our mixed implicit estimator $\hat\theta^{I}$.
The results of our simulations are summarized in Table 1, in which we give the parameter's
mean estimates, confidence intervals (with 95% confidence level), and the mean squared error
between the estimator and the true value of the parameter $\big(\mathrm{MSE}=\frac{1}{n}\sum_{i=1}^{n}(\hat\theta_i-\theta)^2\big)$.
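A comparison of this kind can be reproduced in outline as follows; the sampler, the prior split α = (45, 55) and all other settings are our own illustrative assumptions, not the authors' exact R code.

```python
import math, random

def sample_counts(lam, theta, rng):
    """Draw D = (N1,...,Nr): N ~ Poisson(lam), then D / N ~ Multinomial(N, theta)."""
    L, n, p = math.exp(-lam), 0, rng.random()
    while p > L:                       # Knuth's Poisson sampler (fine for moderate lam)
        n += 1
        p *= rng.random()
    counts = [0] * len(theta)
    for _ in range(n):                 # distribute the n events over the r categories
        u, acc, idx = rng.random(), 0.0, len(theta) - 1
        for i, t in enumerate(theta):
            acc += t
            if u < acc:
                idx = i
                break
        counts[idx] += 1
    return counts

# Monte Carlo comparison of the implicit estimator (2.1) with a Dirichlet-prior
# Bayesian estimator under a slightly unfavorable prior (all settings illustrative).
rng = random.Random(42)
theta, lam, reps = (0.3, 0.7), 100.0, 2000
alpha = (45.0, 55.0)                   # alpha_1 + alpha_2 = 100, prior_1 = 0.45
se_impl = se_bayes = 0.0
for _ in range(reps):
    c = sample_counts(lam, theta, rng)
    n, r = sum(c), len(c)
    se_impl += ((c[0] + 1) / (r + n) - theta[0]) ** 2
    se_bayes += ((c[0] + alpha[0]) / (n + sum(alpha)) - theta[0]) ** 2
mse_impl, mse_bayes = se_impl / reps, se_bayes / reps
```

With an unfavorable prior, the prior bias dominates the Bayesian estimator's error and the implicit estimator attains the smaller MSE, in line with Table 1.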
Table 1 True parameters and mean estimates obtained by the Bayesian ($\hat\theta^{Bu}$), mixed
implicit ($\hat\theta^{I}$) and maximum likelihood ($\hat\theta^{Mv}$) estimators, with 95% confidence intervals and MSE.

| True θ | Prior | $\hat\theta^{Bu}$ [95% CI] | MSE($\hat\theta^{Bu}$) | $\hat\theta^{I}$ [95% CI] | MSE($\hat\theta^{I}$) | $\hat\theta^{Mv}$ [95% CI] | MSE($\hat\theta^{Mv}$) |
|---|---|---|---|---|---|---|---|
| 0.05 | 0.15 | 0.0593 [0.0589; 0.0597] | 0.00012 | 0.0520 [0.0516; 0.0524] | 5.0865e−05 | 0.0502 [0.0498; 0.0506] | 4.7101e−05 |
| 0.1 | 0.20 | 0.1095 [0.1089; 0.1100] | 0.00016 | 0.1020 [0.1014; 0.1026] | 9.4310e−05 | 0.1004 [0.0998; 0.1010] | 9.0671e−05 |
| 0.2 | 0.30 | 0.2089 [0.2082; 0.2096] | 0.00020 | 0.2010 [0.2002; 0.2017] | 0.00015 | 0.1998 [0.1990; 0.2005] | 0.00015 |
| 0.3 | 0.40 | 0.3099 [0.3090; 0.3107] | 0.00027 | 0.3016 [0.3007; 0.3025] | 0.00021 | 0.3008 [0.2999; 0.30185] | 0.00021 |
| 0.4 | 0.50 | 0.4094 [0.4085; 0.4103] | 0.00029 | 0.4008 [0.3998; 0.4018] | 0.00025 | 0.4004 [0.3994; 0.4014] | 0.00025 |
| 0.5 | 0.60 | 0.5092 [0.5083; 0.5101] | 0.00028 | 0.5001 [0.4992; 0.5011] | 0.00023 | 0.5001 [0.4992; 0.5011] | 0.00023 |
| 0.6 | 0.70 | 0.6088 [0.6080; 0.6097] | 0.00026 | 0.5993 [0.5984; 0.6003] | 0.000229 | 0.5997 [0.5988; 0.6007] | 0.00023 |
| 0.7 | 0.80 | 0.7091 [0.7082; 0.7099] | 0.00026 | 0.6992 [0.6982; 0.7001] | 0.00022 | 0.7000 [0.6990; 0.7009] | 0.00022 |
| 0.8 | 0.90 | 0.8083 [0.8076; 0.8090] | 0.00020 | 0.7979 [0.7971; 0.7987] | 0.00016 | 0.7991 [0.7984; 0.7999] | 0.00015 |
| 0.9 | 0.80 | 0.9087 [0.9082; 0.9093] | 0.00015 | 0.8980 [0.8974; 0.8986] | 9.6908e−05 | 0.8996 [0.8990; 0.9002] | 9.3953e−05 |
We compare the Bayesian, mixed implicit and maximum likelihood estimators in terms of
mean squared error for different true values of θ and λ > 0. Figure 1 shows that the MSE for the
mixed implicit approach is clearly lower than that calculated with the Bayesian method when
the sample size is greater than 30, while the MSEs for the maximum likelihood and mixed implicit
methods are similar (see Figure 1). For a small sample size (n = 50) and for each θ ∈ (0, 1), we
simulate 1000 samples. In Figure 2, we report the mean MSE as a function of the parameter θ.
Observe that for θ ∈ (0.2, 0.8) (Figure 2), the MSE calculated for the mixed implicit estimator
is lower than that computed by maximum likelihood estimation. However, when comparing
the mean MSE of the mixed implicit estimator with that calculated by the Bayesian method,
we notice that the latter gives better results when θ ∈ (0, 0.6). This is due to the fact that we
use the formula prior = θ ∗ (1 − 0.15), which gives favorable priors when θ ≤ 0.6. However, if
θ ∈ (0.6, 1), the same formula gives slightly unfavorable priors, which explains the observed
efficiency of our method in Figure 2 when θ ∈ (0.6, 1). Consequently, for small data sets and
with some constraints on θ, the mixed implicit approach is more efficient than the Bayesian and
maximum likelihood approaches.
We also performed simulations in order to validate the mixed implicit approach by comparison
with the standard implicit one. We simulated a data set from a compound Poisson distribution
with parameter λ = 100 and $\theta=(\theta_1,\cdots,\theta_r)$. The true parameter values of $\theta=(\theta_1,\cdots,\theta_r)$, the
mixed implicit estimates $\hat\theta^{I}$, the standard implicit estimates $\hat\theta$ and the mean squared errors are presented in Table
2. According to this table, we notice that our mixed implicit estimates are close to the standard
implicit ones. When the data set size is small and r is large (λ = 100 and r ≥ 5), the
mean squared error calculated for the mixed implicit method is less than that calculated for the
standard implicit one. In this case, the mixed implicit method is better than the standard implicit one.
Fig.1 This figure shows the variation of the mean MSE as a function of the sample size,
for several true values of θ ∈ {0.05, 0.2, 0.5, 0.7}.
Fig.2 This figure shows the variation of the mean MSE as a function of the true parameters
for 1000 samples with a common size n = 50.
Table 2 True parameters θ and estimates obtained by the standard implicit ($\hat\theta$) and
mixed implicit ($\hat\theta^{I}$) estimators, for different values of r, with their MSE.

| r | True θ | $\hat\theta$ | MSE($\hat\theta$) | $\hat\theta^{I}$ | MSE($\hat\theta^{I}$) |
|---|---|---|---|---|---|
| 2 | (0.5, 0.5) | (0.507772, 0.492228) | 6.040431e−05 | (0.5076565, 0.4923435) | 5.862214e−05 |
| 2 | (0.1, 0.9) | (0.1027190, 0.8972810) | 7.393142e−06 | (0.1039768, 0.8960232) | 1.581471e−05 |
| 2 | (0.4, 0.6) | (0.3704309, 0.6295691) | 0.0008743307 | (0.3690476, 0.6309524) | 0.0009580499 |
| 3 | (0.33, 0.33, 0.34) | (0.2857892, 0.3499760, 0.3642348) | 0.001135640 | (0.2857892, 0.3499760, 0.3642348) | 0.0009803208 |
| 3 | (0.4, 0.4, 0.2) | (0.3535354, 0.4646465, 0.1818182) | 0.001918165 | (0.3537241, 0.4587912, 0.1874847) | 0.002222902 |
| 3 | (0.2, 0.1, 0.7) | (0.1216820, 0.1628068, 0.7155112) | 0.003439666 | (0.1140351, 0.1578947, 0.7280702) | 0.003843234 |
| 5 | (0.19607843, 0.23529412, 0.20915033, 0.30718954, 0.05228758) | (0.25490196, 0.24509804, 0.16666667, 0.29411765, 0.03921569) | 0.001140587 | (0.25315293, 0.24290210, 0.16734831, 0.28915842, 0.04743824) | 0.001082285 |
| 5 | (0.06172840, 0.03703704, 0.37860082, 0.15226337, 0.37037037) | (0.03418803, 0.05128205, 0.37606838, 0.13675214, 0.40170940) | 0.0004381077 | (0.04126630, 0.05663675, 0.36893439, 0.13921350, 0.39394905) | 0.0003245078 |
| 7 | (0.18750000, 0.03348214, 0.03125000, 0.21875000, 0.17633929, 0.20758929, 0.14508929) | (0.19744059, 0.03656307, 0.03016453, 0.21389397, 0.17367459, 0.22486289, 0.12340037) | 0.0001298506 | (0.19714628, 0.03756213, 0.03089937, 0.21375930, 0.17301490, 0.22421578, 0.12340223) | 0.0001275068 |
| 10 | (0.06095552, 0.01812191, 0.17627677, 0.08237232, 0.14003295, 0.12685338, 0.04118616, 0.13509061, 0.09884679, 0.12026359) | (0.07142857, 0.02678571, 0.20535714, 0.08928571, 0.17857143, 0.13392857, 0.04464286, 0.13392857, 0.06250000, 0.05357143) | 0.0008395715 | (0.07428159, 0.03278920, 0.19450458, 0.09160087, 0.17303133, 0.12881146, 0.04827587, 0.13099518, 0.06721921, 0.05849072) | 0.0006786088 |
3 Implicit Inference with Bayesian Networks
A Bayesian network (BN) is a set of variables X = {X1, · · · , Xn} with a network structure
G that encodes a set of conditional independence assertions about variables in X , and a set of
local probability distributions associated with each variable. Together, these components define
the joint probability distribution for X . The network structure G is a directed acyclic graph G
and it is suitable for looking for relationships among all variables. The nodes in G correspond
to the variables in X1, · · · , Xn. Each Xi denotes both the variable and its corresponding node,
and Pa(Xi) the parents of node Xi in G as well as the variables corresponding to those parents.
The lack of possible arcs in G encode conditional independencies [2, 17]. In particular, given
structure G, the joint probability distribution for X is given by the product of all specified
conditional probabilities
P(X1, · · · , Xn) =
n∏
i=1
P(Xi/Pa(Xi)).
The local probability distributions P are the distributions corresponding to the terms in
the product of conditional distributions. When building BN without prior knowledge, the
probabilities will depend only on the structure of the parameters set.
Let $d=\{x^{1},x^{2},\cdots,x^{t}\}$ be a dataset with t observations, and consider a Bayesian network with
parameters $(G,\ \Theta=\{\theta_{ijk},\ i=1,\cdots,n,\ j=1,\cdots,q_i,\ k=1,\cdots,r_i\},\ \lambda=(\lambda_{ij}))$, where $r_i$ is
the number of states of the variable $X_i$, $q_i$ is the number of parent configurations, $x^{l}=(x_1^{l},x_2^{l},\cdots,x_n^{l})$, $\lambda_{ij}>0$ and $\theta_{ijk}=\mathbb{P}(X_i=k/Pa(X_i)=j)$.
We denote by $N_{ijk}(d)$ the number of cases in d such that $X_i$ is in its k-th state and its
parents are in their j-th configuration, and $N_{ij}(d)=\sum_{1\le k\le r_i}N_{ijk}(d)$.
If the node i has no parents, we denote θi1k = P(Xi = k) and Ni1k(d) the number of cases
in d such that Xi is in its k-th state.
In this section, we suppose that $N_{ij}(d)$ follows a Poisson distribution with parameter $\lambda_{ij}$.
So, we introduce a compound Poisson Bayesian network, that is, a Bayesian network associated
with a compound Poisson model (i.e., the random vector $(N_{ij1}(d),N_{ij2}(d),\cdots,N_{ijr_i}(d))$ follows
a compound Poisson distribution with parameters $\lambda_{ij}\theta_{ij1},\lambda_{ij}\theta_{ij2},\cdots,\lambda_{ij}\theta_{ijr_i}$).
The following proposition gives the implicit distribution of (Θ, λ) given d.
Proposition 3.1 If the dataset d comes from a compound Poisson distribution, then
1) the posterior distribution of (Θ, λ) given the structure G and the dataset d is defined
by
$$
\mathbb{P}(\Theta,\lambda/d,G)=\prod_{i=1}^{n}\prod_{j=1}^{q_i}\frac{e^{-\lambda_{ij}}\lambda_{ij}^{N_{ij}(d)}}{N_{ij}(d)!}\ \prod_{i=1}^{n}\prod_{j=1}^{q_i}(N_{ij}(d)+r_i-1)!\prod_{k=1}^{r_i}\frac{(\theta_{ijk})^{N_{ijk}(d)}}{N_{ijk}(d)!};
$$
2) the implicit estimators of $\theta_{ijk}$ and $\lambda_{ij}$ are
$$
\hat\theta_{ijk}=\frac{N_{ijk}(d)+1}{N_{ij}(d)+r_i}\quad\text{with}\quad\hat\lambda_{ij}=N_{ij}(d)+1.
$$
Proof Since the map
$$
d\mapsto\big(N_{ijk}(d):\ i\in\{1,2,\cdots,n\},\ j\in\{1,2,\cdots,q_i\},\ k\in\{1,2,\cdots,r_i\}\big)
$$
is injective, then
$$
\mathbb{P}(d/G,\Theta,\lambda)=\mathbb{P}\big(N_{ijk}=N_{ijk}(d)\ \ \forall\, i\in\{1,2,\cdots,n\},\ j\in\{1,2,\cdots,q_i\},\ k\in\{1,2,\cdots,r_i\}\big)
$$
$$
=\prod_{i=1}^{n}\prod_{j=1}^{q_i}\prod_{k=1}^{r_i}\mathbb{P}\big(N_{ijk}=N_{ijk}(d)/G,\Theta,\lambda\big)
=\prod_{i=1}^{n}\prod_{j=1}^{q_i}\prod_{k=1}^{r_i}\frac{e^{-\lambda_{ij}\theta_{ijk}}(\lambda_{ij}\theta_{ijk})^{N_{ijk}(d)}}{N_{ijk}(d)!}
$$
$$
=\prod_{i=1}^{n}\prod_{j=1}^{q_i}\frac{e^{-\lambda_{ij}}\lambda_{ij}^{N_{ij}(d)}}{N_{ij}(d)!}\ \prod_{i=1}^{n}\prod_{j=1}^{q_i}N_{ij}(d)!\prod_{k=1}^{r_i}\frac{(\theta_{ijk})^{N_{ijk}(d)}}{N_{ijk}(d)!}.
$$
Thus,
$$
c(d)=\int \mathbb{P}(d/G,\Theta,\lambda)\,d\Theta\, d\lambda=\prod_{i=1}^{n}\prod_{j=1}^{q_i}\frac{N_{ij}(d)!}{(N_{ij}(d)+r_i-1)!}.\tag{3.1}
$$
Hence, the implicit distribution of (Θ, λ/d, G) is given by
$$
\mathbb{P}(\Theta,\lambda/d,G)=\prod_{i=1}^{n}\prod_{j=1}^{q_i}\frac{e^{-\lambda_{ij}}\lambda_{ij}^{N_{ij}(d)}}{N_{ij}(d)!}\ \prod_{i=1}^{n}\prod_{j=1}^{q_i}(N_{ij}(d)+r_i-1)!\prod_{k=1}^{r_i}\frac{(\theta_{ijk})^{N_{ijk}(d)}}{N_{ijk}(d)!}.
$$
This indicates that the distributions of (λ/d, G) and (Θ/d, G) are given by
$$
\mathbb{P}(\lambda/d,G)=\prod_{i=1}^{n}\prod_{j=1}^{q_i}\frac{e^{-\lambda_{ij}}\lambda_{ij}^{N_{ij}(d)}}{N_{ij}(d)!}
$$
and
$$
\mathbb{P}(\Theta/d,G)=\prod_{i=1}^{n}\prod_{j=1}^{q_i}(N_{ij}(d)+r_i-1)!\prod_{k=1}^{r_i}\frac{(\theta_{ijk})^{N_{ijk}(d)}}{N_{ijk}(d)!}.
$$
Now, it is possible to find the implicit estimator of the parameter $\theta_{ijk}$, which is given by
$$
\hat\theta_{ijk}=\frac{N_{ijk}(d)+1}{N_{ij}(d)+r_i},
$$
and $\hat\lambda_{ij}=N_{ij}(d)+1$, which finishes the proof. $\Box$
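A hedged sketch of this learning step: collect the counts $N_{ijk}(d)$ from a dataset of complete cases and apply the estimators of Proposition 3.1. The data layout (dicts mapping node names to observed states) is our own choice for illustration, not the paper's.

```python
from collections import defaultdict

def sufficient_stats(data, parents):
    """Counts N_ijk(d) and N_ij(d); j is the tuple of parent states."""
    nijk = defaultdict(int)
    for case in data:                          # case: dict node -> observed state
        for i in parents:
            j = tuple(case[p] for p in parents[i])   # parent configuration of node i
            nijk[(i, j, case[i])] += 1
    nij = defaultdict(int)
    for (i, j, _k), c in nijk.items():
        nij[(i, j)] += c
    return nijk, nij

def implicit_params(nijk, nij, arity):
    """Implicit estimators of Proposition 3.1: theta_ijk and lambda_ij."""
    theta = {(i, j, k): (nijk.get((i, j, k), 0) + 1) / (nij[(i, j)] + arity[i])
             for (i, j) in nij for k in range(1, arity[i] + 1)}
    lam = {(i, j): c + 1 for (i, j), c in nij.items()}
    return theta, lam
```

For each observed parent configuration j, the estimates θ̂_{ijk} sum to one over k, and λ̂_{ij} = N_ij(d) + 1; unseen states still receive positive probability thanks to the add-one numerator.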
In the framework of the Bayesian network, the mixed implicit estimator of $\theta_{ijk}$ is given by
$$
\tilde\theta_{ijk}=\hat\theta_{ijk}+(1-r_i\hat\theta_{ijk})\,\psi(\hat\lambda_{ij}).
$$
If d is a dataset, an estimation of the probability of $X=(X_1,X_2,\cdots,X_n)$, given d, at the state
$x=(x_1,x_2,\cdots,x_n)$ is
$$
\mathbb{P}(X=x/d,G)=\prod_{i=1}^{n}\prod_{j=1}^{q_i}\prod_{k=1}^{r_i}\hat\theta_{ijk}^{\ \alpha_{ijk}(x)},
$$
where $\alpha_{ijk}(x)=1$ if $x_i=k$ and $Pa(X_i)=j$, and $\alpha_{ijk}(x)=0$ if not.
In fact, if $k_i(x)$ denotes the state of node i and $j_i(x)$ the state of its parents in the
configuration x, then by applying the Markov property, one has
$$
\mathbb{P}(X=x/d,G)=\mathbb{P}(X_1=x_1,X_2=x_2,\cdots,X_n=x_n)
=\prod_{i=1}^{n}\mathbb{P}(X_i=x_i/Pa(X_i)=j_i(x))
$$
$$
=\prod_{i=1}^{n}\hat\theta_{i\,j_i(x)\,k_i(x)}
=\prod_{i=1}^{n}\prod_{j=1}^{q_i}\prod_{k=1}^{r_i}\hat\theta_{ijk}^{\ \alpha_{ijk}(x)}.
$$
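Given learned parameters stored as a dict theta[(i, j, k)] (a layout we assume for illustration), the factorization above reads:

```python
def config_prob(theta, parents, x):
    """P(X = x / d, G) = prod_i theta[i, j_i(x), k_i(x)] (Markov factorization)."""
    p = 1.0
    for i in parents:
        j = tuple(x[q] for q in parents[i])   # parent configuration j_i(x)
        p *= theta[(i, j, x[i])]              # factor theta_{i j_i(x) k_i(x)}
    return p
```

Summing config_prob over all configurations x returns one whenever each conditional table sums to one over its states.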
The mixed implicit approach has the advantages of Bayesian methods without priors. In
fact, the choice of prior information in the Bayesian approach is itself a problem because of the
lack and the cost of information. Our implicit method avoids the problem of priors and always
allows estimators to be derived and implemented.
Note also that if the sample size is small and the number of parents increases, the number
of cases $N_{ij}$ in the dataset d in which the parents of node i are in their j-th state can be
equal to zero. Another advantage of our mixed implicit approach is that the
estimator can always be calculated, which is not the case for the maximum likelihood
approach, especially for small sample sizes. However, when the sample size is large enough, the
two methods have the same performance.
3.1 Simulation Study
In this subsection, we describe a small simulation study which highlights some of the
important features of the implicit approach described above.
Our simulation study deals with diagnosis in mobile communication networks. We
estimate the probability of the signal quality in GSM/GPRS networks. The task of determining
this probability needs some explanatory indicators, supposed to be Bernoulli variables, which are:
X1 = Coverage (1: Lack, 2: Presence),
X2 = Network configuration (1: GSM, 2: GPRS),
X3 = Mobility (1: Fixed, 2: Mobile),
X4 = Interference level (1: Presence, 2: Lack),
X5 = Network state (1: Saturated, 2: Unsaturated),
X6 = Signal quality (1: Low, 2: High).
These are the most common indicators that may cause a high number of dropped calls
in a GSM/GPRS network. Each cause can be split into several subcauses, which could also be
considered in the model. The cause of the "signal quality" is the network state. The network
state itself is caused by the interference level and the mobility of the connector. However, the
covered level and the network configuration are the causes of the "interference level". We then
obtain the causal DAG structure G given by Figure 3.
Fig.3 Network structure of the simplified problem of the diagnosis in mobile
communication networks.
To infer the probabilities of the parameters $\theta_{ijk}$ in the graph G, we implemented the
compound Poisson Bayesian network approach introduced above in the R language, chose true
parameters $\theta_{ijk}$, and generated a dataset of 1000 observations. The results of this
simulation study are summarized in Table 3, which reports the posterior
Bayesian estimates $\hat\theta_{ijk}^{Bu}$, the mixed implicit estimates $\hat\theta_{ijk}^{I}$, the maximum likelihood estimates
$\hat\theta_{ijk}^{Mv}$, and the mean squared error of each of them.
Table 3 True and estimated parameters obtained by the Bayesian, mixed implicit and
maximum likelihood estimators for the network structure given by Figure 3.

| True parameter | $\hat\theta_{ijk}^{I}$ | Prior | $\hat\theta_{ijk}^{Bu}$ | $\hat\theta_{ijk}^{Mv}$ |
|---|---|---|---|---|
| θ112 = 0.8 | 0.7858068 | 0.9 | 0.7972727 | 0.7870000 |
| θ212 = 0.4 | 0.3963663 | 0.3 | 0.3872727 | 0.3960000 |
| θ312 = 0.8 | 0.7919362 | 0.7 | 0.7845455 | 0.7930000 |
| θ412 = 0.1 | 0.0961275 | 0.2 | 0.1341991 | 0.0839694 |
| θ422 = 0.3 | 0.3132736 | 0.2 | 0.2472527 | 0.3048780 |
| θ432 = 0.6 | 0.5744628 | 0.5 | 0.5619546 | 0.5750529 |
| θ442 = 0.8 | 0.8301925 | 0.9 | 0.8502415 | 0.8343949 |
| θ512 = 0.2 | 0.1998536 | 0.1 | 0.1413613 | 0.1868131 |
| θ522 = 0.3 | 0.2755226 | 0.2 | 0.2361111 | 0.2672414 |
| θ532 = 0.5 | 0.4757777 | 0.6 | 0.5034169 | 0.4749263 |
| θ542 = 0.8 | 0.7880795 | 0.7 | 0.7743682 | 0.7907489 |
| θ612 = 0.4 | 0.3944095 | 0.3 | 0.3759398 | 0.3935185 |
| θ622 = 0.8 | 0.8093594 | 0.9 | 0.8248503 | 0.8116197 |
| MSE | 0.001447109 | | 0.004454807 | 0.001656995 |
According to Table 3, the MSE computed with the mixed implicit approach is lower than
those calculated with the Bayesian and maximum likelihood methods. This result supports the
choice of the compound Poisson model relative to the multinomial one, especially
when $\theta_{ijk}\in(0.2,0.8)$ (see Figure 2) and/or when the number of variables in the Bayesian structure
increases.
Globally, the concordance between the three approaches is very good. However, when
we compare the mixed implicit method to the Bayesian method, we see a better precision of the
implicit method compared to the Bayesian method with slightly unfavorable priors.
In Figure 4, we represent the diagram of joint probabilities related to each configuration.
In most cases, when comparing the implicit joint probabilities with Bayesian ones, we notice
that the former are closer to the true joint probabilities than the latter ones.
Fig.4 This gives true joint probabilities as well as joint probabilities estimates using
Bayesian and mixed implicit methods for 64 configurations. Each configuration number
(on the abscissa axis) refers to the binary representation of this number minus one.
3.2 Conclusion
In this paper, we have introduced the compound Poisson model in the Bayesian network
framework and presented this concept as an alternative to multinomial
Bayesian networks. We apply implicit inference to this model for learning the parameters
of the network. The implicit framework is efficiently implemented in the R language to infer
probabilities and to test the performance of our approach. A comparative study between
the learned estimates given by the three approaches (mixed implicit, standard Bayesian and
maximum likelihood), based on true values, is established. This study shows a better performance
and precision of the compound Poisson model with the mixed implicit approach, mainly when
the sample size is small.
Another interesting issue is to address the problem of learning the structure within the
framework of implicit inference in the compound Poisson model.
3.3 Appendix
Proof of Proposition 2.1
The norming constant c(D) is given by
$$
c(D)=\int\prod_{i=1}^{r}\Big[e^{-\lambda\theta_i}\frac{(\lambda\theta_i)^{N_i}}{N_i!}\Big]d\theta\, d\lambda
=\Big(\sum_{i=1}^{r}N_i\Big)!\int_{\Lambda}\prod_{i=1}^{r}\frac{\theta_i^{N_i}}{N_i!}\,d\theta\int_{\mathbb{R}_{+}}\frac{e^{-\lambda}\lambda^{\sum_{i=1}^{r}N_i}}{\big(\sum_{i=1}^{r}N_i\big)!}\,d\lambda
=\frac{\big(\sum_{i=1}^{r}N_i\big)!}{\big(\sum_{i=1}^{r}N_i+r-1\big)!}.
$$
Hence, the implicit distribution of (θ, λ) is defined by
$$
\mathbb{P}(\theta,\lambda/D)=\Big(\sum_{i=1}^{r}N_i+r-1\Big)!\prod_{i=1}^{r}\frac{\theta_i^{N_i}}{N_i!}\ \frac{e^{-\lambda}\lambda^{\sum_{i=1}^{r}N_i}}{\big(\sum_{i=1}^{r}N_i\big)!}.
$$
Note that the conditional distribution $\mathbb{P}(\theta,\lambda/D)$ of (θ, λ) given D is the product of the
Dirichlet distribution $\mathbb{P}(\theta/D)=\mathrm{Dir}(N_1+1,\cdots,N_r+1)$ and the Gamma distribution $\mathbb{P}(\lambda/D)=\gamma\big(\sum_{i=1}^{r}N_i+1,\,1\big)$, which proves the proposition.
Where
Dir(α1, α2, · · · , αr)(x1, x2, · · · , xr) = Γ
( r∑
i=1
αi
) r∏
i=1
xαi−1i
Γ(αi)δ1
( r∑
i=1
xi
)
denotes the Dirichlet density with parameters α1, α2, · · · , αr and
$$
\gamma(a,b)(x)=\frac{b^{a}}{\Gamma(a)}\,e^{-bx}x^{a-1}\,\mathbf{1}_{]0,+\infty[}(x)
$$
denotes the Gamma density with parameters a and b. □
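As a numerical sanity check on the value of c(D) (an illustration added here, not part of the original derivation), the double integral can be approximated by a midpoint rule for r = 2, where the simplex reduces to θ ∈ (0, 1) with θ₁ = θ and θ₂ = 1 − θ; the counts N1 = 3, N2 = 5 are illustrative.

```python
import math

# Sanity check of c(D) = (sum_i N_i)! / (sum_i N_i + r - 1)! for r = 2.
N1, N2 = 3, 5          # illustrative counts
N, r = N1 + N2, 2

def integrand(theta, lam):
    """prod_i exp(-lam*theta_i) * (lam*theta_i)^{N_i} / N_i! at (theta, lam)."""
    out = 1.0
    for Ni, ti in ((N1, theta), (N2, 1.0 - theta)):
        out *= math.exp(-lam * ti) * (lam * ti) ** Ni / math.factorial(Ni)
    return out

# Midpoint rule over theta in (0, 1) and lam in (0, L); the Poisson factor
# makes the lam-tail beyond L = 40 negligible for these counts.
M, K, L = 200, 1200, 40.0
dt, dl = 1.0 / M, L / K
approx = sum(integrand((i + 0.5) * dt, (j + 0.5) * dl) * dt * dl
             for i in range(M) for j in range(K))

exact = math.factorial(N) / math.factorial(N + r - 1)   # = 1/9 for these counts
print(approx, exact)
```

The two printed values agree to several decimals, consistent with the closed form above.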
Proof of Lemma 2.1
1) It is easy to see that
$$
\begin{aligned}
-e^{-\lambda}(-\lambda)^{-r}\int_{-\lambda}^{0}e^{-x}x^{r-1}\,dx
&=-e^{-\lambda}(-\lambda)^{-r}\int_{-\lambda}^{0}\sum_{k=0}^{+\infty}\frac{(-1)^{k}x^{k+r-1}}{k!}\,dx\\
&=-e^{-\lambda}(-\lambda)^{-r}\sum_{k=0}^{+\infty}\int_{-\lambda}^{0}\frac{(-1)^{k}x^{k+r-1}}{k!}\,dx\\
&=-e^{-\lambda}(-\lambda)^{-r}\sum_{k=0}^{+\infty}\frac{(-1)^{k+1}(-\lambda)^{k+r}}{k!(k+r)}\\
&=\sum_{k=0}^{+\infty}\frac{e^{-\lambda}\lambda^{k}}{k!(k+r)}
=E\Bigl(\frac{1}{N+r}\Big/\lambda\Bigr).
\end{aligned}
$$
By way of induction on r, one has
$$
\psi(\lambda)=-e^{-\lambda}(-\lambda)^{-r}\int_{-\lambda}^{0}e^{-x}x^{r-1}\,dx
=(r-1)!\left\{\frac{e^{-\lambda}-1}{(-\lambda)^{r}}-\sum_{k=1}^{r-1}\frac{(-\lambda)^{k-r}}{k!}\right\}.
$$
2) Using the formula
$$
\psi(\lambda)=(r-1)!\left\{\frac{e^{-\lambda}-1}{(-\lambda)^{r}}-\sum_{k=1}^{r-1}\frac{(-\lambda)^{k-r}}{k!}\right\},
$$
we obtain the desired result.
3) Using the Chebyshev inequality, we obtain, for all ε > 0,
$$
P\Bigl(\Bigl|\frac{N}{\lambda}-1\Bigr|>\varepsilon\Bigr)\le\frac{\mathrm{Var}(N)}{\varepsilon^{2}\lambda^{2}}=\frac{1}{\varepsilon^{2}\lambda}.
$$
This shows that $\frac{N}{\lambda}$ converges in probability to 1, as λ → +∞.
4) By using the Markov inequality, for all ε > 0 and λ ∈ ℕ, one has
$$
P\Bigl(\frac{\sqrt{\lambda}}{N+r}\ge\varepsilon\Bigr)\le\frac{1}{\varepsilon}\,E\Bigl(\frac{\sqrt{\lambda}}{N+r}\Bigr)=\frac{\sqrt{\lambda}\,\psi(\lambda)}{\varepsilon}.
$$
Since $\lim\limits_{\lambda\to+\infty}\lambda\psi(\lambda)=(r-1)!$, $\frac{\sqrt{\lambda}}{N+r}$ converges in probability to 0, as λ → +∞. □
Proof of Proposition 2.2
1) $E(\hat{\theta}_i)=E\bigl(E\bigl(\tfrac{N_i+1}{N+r}/N\bigr)\bigr)=E\bigl(\tfrac{N\theta_i+1}{N+r}\bigr)=\theta_i+(1-r\theta_i)E\bigl(\tfrac{1}{N+r}\bigr)=\theta_i+(1-r\theta_i)\psi(\lambda)$.
Since $\lim\limits_{\lambda\to+\infty}\psi(\lambda)=0$, then $\lim\limits_{\lambda\to+\infty}E(\hat{\theta}_i)-\theta_i=0$.
2) Note that
$$
\begin{aligned}
E((\hat{\theta}_i-\theta_i)^{2})&=E\bigl(E\bigl((\hat{\theta}_i-\theta_i)^{2}/N\bigr)\bigr) &(3.2)\\
&=E\Bigl(\frac{N\theta_i(1-\theta_i)}{(N+r)^{2}}+\Bigl(\frac{1-r\theta_i}{N+r}\Bigr)^{2}\Bigr) &(3.3)\\
&\le[\theta_i(1-\theta_i)+(1-r\theta_i)^{2}]\,\psi(\lambda). &(3.4)
\end{aligned}
$$
Since $\lim\limits_{\lambda\to+\infty}\psi(\lambda)=0$, then $\lim\limits_{\lambda\to+\infty}E((\hat{\theta}_i-\theta_i)^{2})=0$.
3) By using assertion 4) of Lemma 2.1, the sequence $\frac{\sqrt{\lambda_n}}{N+r}$ converges, in probability, to 0 as n → +∞. Then, there exists a subsequence $(\lambda_{n_k})_k$ such that $\frac{\sqrt{\lambda_{n_k}}}{N+r}$ converges almost surely to 0 and $\frac{\lambda_{n_k}}{N}$ converges almost surely to 1, as k → +∞. Let $Z_\lambda=\sqrt{\lambda}\,(\hat{\theta}_i-\theta_i)$ and $L_{Z_\lambda}(t)=E(e^{tZ_\lambda})=E(e^{t\sqrt{\lambda}(\hat{\theta}_i-\theta_i)})$ its Laplace transform. In order to prove that $Z_{\lambda_{n_k}}$ converges in distribution to $\mathcal{N}(0,\theta_i(1-\theta_i))$ as k → +∞, one can show that
$$
\lim_{k\to+\infty}L_{Z_{\lambda_{n_k}}}(t)=e^{\frac{t^{2}\theta_i(1-\theta_i)}{2}},\quad\forall t\in\mathbb{R}.
$$
Observe that
$$
\begin{aligned}
L_{Z_{\lambda_{n_k}}}(t)&=E\bigl(e^{t\sqrt{\lambda_{n_k}}(\hat{\theta}_i-\theta_i)}\bigr)=E\Bigl(E\Bigl(e^{t\sqrt{\lambda_{n_k}}\bigl(\frac{N_i+1}{N+r}-\theta_i\bigr)}/N\Bigr)\Bigr)\\
&=e^{-t\sqrt{\lambda_{n_k}}\theta_i}\,E\Bigl(e^{\frac{t\sqrt{\lambda_{n_k}}}{N+r}}\,E\bigl(e^{\frac{t\sqrt{\lambda_{n_k}}N_i}{N+r}}/N\bigr)\Bigr)\\
&=e^{-t\sqrt{\lambda_{n_k}}\theta_i}\,E\Bigl(e^{\frac{t\sqrt{\lambda_{n_k}}}{N+r}}\bigl((1-\theta_i)+\theta_i e^{\frac{t\sqrt{\lambda_{n_k}}}{N+r}}\bigr)^{N}\Bigr).
\end{aligned}
$$
Using the fact that the sequence $\frac{t\sqrt{\lambda_{n_k}}}{N+r}$ converges almost surely to 0 and the Taylor expansions of the functions $\log(1+x)$ and $e^{x}$, up to the second order around 0, we obtain
$$
\begin{aligned}
N\log\Bigl((1-\theta_i)+\theta_i e^{\frac{t\sqrt{\lambda_{n_k}}}{N+r}}\Bigr)
&=N\log\Bigl((1-\theta_i)+\theta_i\Bigl(1+\frac{t\sqrt{\lambda_{n_k}}}{N+r}+\frac{t^{2}\lambda_{n_k}}{2(N+r)^{2}}\Bigr)\Bigr)+o\Bigl(\Bigl(\frac{t\sqrt{\lambda_{n_k}}}{N+r}\Bigr)^{2}\Bigr)\\
&=N\log\Bigl(1+\theta_i\Bigl(\frac{t\sqrt{\lambda_{n_k}}}{N+r}+\frac{t^{2}\lambda_{n_k}}{2(N+r)^{2}}\Bigr)\Bigr)+o\Bigl(\Bigl(\frac{t\sqrt{\lambda_{n_k}}}{N+r}\Bigr)^{2}\Bigr)\\
&=N\theta_i\Bigl(\frac{t\sqrt{\lambda_{n_k}}}{N+r}+\frac{t^{2}\lambda_{n_k}(1-\theta_i)}{2(N+r)^{2}}\Bigr)+o\Bigl(\Bigl(\frac{t\sqrt{\lambda_{n_k}}}{N+r}\Bigr)^{2}\Bigr).
\end{aligned}
$$
We deduce that
$$
\begin{aligned}
L_{Z_{\lambda_{n_k}}}(t)&=E\Bigl[\exp\Bigl(N\theta_i\Bigl(\frac{t\sqrt{\lambda_{n_k}}}{N+r}+\frac{t^{2}\lambda_{n_k}(1-\theta_i)}{2(N+r)^{2}}\Bigr)+\frac{t\sqrt{\lambda_{n_k}}}{N+r}-t\sqrt{\lambda_{n_k}}\,\theta_i+o\Bigl(\Bigl(\frac{t\sqrt{\lambda_{n_k}}}{N+r}\Bigr)^{2}\Bigr)\Bigr)\Bigr]\\
&=E\Bigl[\exp\Bigl((1-r\theta_i)\frac{t\sqrt{\lambda_{n_k}}}{N+r}+\frac{t^{2}\theta_i(1-\theta_i)}{2}\,\frac{N\lambda_{n_k}}{(N+r)^{2}}+o\Bigl(\Bigl(\frac{t\sqrt{\lambda_{n_k}}}{N+r}\Bigr)^{2}\Bigr)\Bigr)\Bigr].
\end{aligned}
$$
Since
$$
\frac{N\lambda_{n_k}}{(N+r)^{2}}\longrightarrow 1\quad\text{and}\quad(1-r\theta_i)\frac{t\sqrt{\lambda_{n_k}}}{N+r}\longrightarrow 0
$$
almost surely, as k → +∞,
$$
(1-r\theta_i)\frac{t\sqrt{\lambda_{n_k}}}{N+r}+\frac{t^{2}\theta_i(1-\theta_i)}{2}\,\frac{N\lambda_{n_k}}{(N+r)^{2}}\longrightarrow\frac{t^{2}\theta_i(1-\theta_i)}{2},\quad\text{almost surely}.
$$
Hence, $L_{Z_{\lambda_{n_k}}}(t)\longrightarrow e^{\frac{t^{2}\theta_i(1-\theta_i)}{2}}$ as k → +∞. Consequently,
$$
\sqrt{\lambda_{n_k}}\,(\hat{\theta}_i-\theta_i)\longrightarrow\mathcal{N}(0,\theta_i(1-\theta_i)),\quad\text{in distribution}.\qquad\Box
$$
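The exact identity $E(\hat{\theta}_i)=\theta_i+(1-r\theta_i)\psi(\lambda)$ used in assertion 1) can be verified by direct summation over the Poisson distribution of N. The sketch below (illustrative parameter values, added here and not part of the paper) computes both sides of the identity.

```python
import math

def psi(lam, r, terms=200):
    """psi(lambda) = E(1/(N + r)) for N ~ Poisson(lambda)."""
    total, p = 0.0, math.exp(-lam)      # p = e^{-lam} * lam^n / n!
    for n in range(terms):
        total += p / (n + r)
        p *= lam / (n + 1)
    return total

def mean_estimator(theta_i, lam, r, terms=200):
    """E(hat(theta)_i) = E((N*theta_i + 1)/(N + r)), using the fact that
    E(N_i + 1 | N) = N*theta_i + 1 for N_i ~ Binomial(N, theta_i)."""
    total, p = 0.0, math.exp(-lam)
    for n in range(terms):
        total += p * (n * theta_i + 1) / (n + r)
        p *= lam / (n + 1)
    return total

theta_i, lam, r = 0.3, 6.0, 4           # illustrative values
lhs = mean_estimator(theta_i, lam, r)
rhs = theta_i + (1 - r * theta_i) * psi(lam, r)
print(lhs, rhs)    # the two quantities coincide up to floating-point error
```

This confirms numerically that the bias of $\hat{\theta}_i$ is exactly $(1-r\theta_i)\psi(\lambda)$, which vanishes as λ → +∞.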
Proof of Proposition 2.3 Recall that the set V of random variables equipped with the distance d defined by
$$
d(X,Y)=E\Bigl(\frac{|X-Y|}{1+|X-Y|}\Bigr)
$$
is a complete metric space.
Now, we will prove that $(Z_\lambda)_{\lambda\in\mathbb{N}}$ is bounded in V. Observe that
$$
d(Z_\lambda,0)=E\Bigl(\frac{|Z_\lambda|}{1+|Z_\lambda|}\Bigr)\le E(|Z_\lambda|)\le\bigl(E(Z_\lambda^{2})\bigr)^{\frac12}.
$$
By using inequality (3.4), we obtain
$$
\bigl(E(Z_\lambda^{2})\bigr)^{\frac12}=\sqrt{\lambda\,E\bigl((\hat{\theta}_i-\theta_i)^{2}\bigr)}\le\sqrt{[\theta_i(1-\theta_i)+(1-r\theta_i)^{2}]\,\lambda\psi(\lambda)}\le\sqrt{(2+r^{2})\,\lambda\psi(\lambda)}.
$$
Then, $d(Z_\lambda,0)\le\sqrt{(2+r^{2})\,\lambda\psi(\lambda)}$.
The result comes from the fact that the function λψ(λ) is continuous on [0,+∞[ and that $\lim\limits_{\lambda\to+\infty}\lambda\psi(\lambda)=(r-1)!$.
Let $(\lambda_n)_{n\in\mathbb{N}}$ be a sequence tending to +∞ as n → +∞ such that $(Z_{\lambda_n})_n$ converges in probability to a random variable W. According to assertion 3) of Proposition 2.2, we can extract a subsequence $(\lambda_{n_k})_k$ such that $Z_{\lambda_{n_k}}$ converges in distribution, as k → +∞, to $\mathcal{N}(0,\theta_i(1-\theta_i))$. Recall that convergence in probability implies convergence in distribution; then, by the uniqueness of the limit, we deduce that W follows the normal distribution $\mathcal{N}(0,\theta_i(1-\theta_i))$. This implies that $Z_{\lambda_n}$ converges in probability to $W\sim\mathcal{N}(0,\theta_i(1-\theta_i))$. Hence, $(Z_\lambda)_{\lambda\in\mathbb{N}}$ is a bounded sequence in V which admits a unique adherent point $W\sim\mathcal{N}(0,\theta_i(1-\theta_i))$. Then, it converges in probability, and consequently in distribution. □