APPROXIMATE BAYES MODEL SELECTION PROCEDURES

FOR MARKOV RANDOM FIELDS

Lynne Seymour

Department of Statistics

The University of Georgia

Chuanshu Ji

Department of Statistics

The University of North Carolina

ABSTRACT

For applications in texture synthesis, we derive two approximate Bayes criteria for selecting a

model from a collection of Markov random fields. The first criterion is based on a penalized

maximum likelihood. The second criterion, a Markov chain Monte Carlo approximation to the

first, has distinct computational advantages. Some simulation results are also presented.

KEY WORDS AND PHRASES: BIC, model selection, Gibbs random fields, Markov random

fields, texture synthesis, Markov chain Monte Carlo.

AMS CLASSIFICATION: primary - 62F15, 62E25; secondary - 62A15, 60J10.

The first author acknowledges partial support from ONR grant N00014-93-1-0043. The second author acknowledges partial support from ONR grant N00014-89-J-1760 and NSF grant DMS-9310322.

1 Introduction

Markov random fields (MRFs) - equivalently, Gibbs random fields (GRFs) - were originally introduced as models in statistical mechanics. Hassner and Sklansky (1980) first proposed

the MRF as a statistical model for digital images. Since Geman and Geman (1984), MRFs have

been used often in imaging problems, such as image restoration (Geman and Geman, 1984; Besag,

1986), boundary detection (Geman, Geman, and Graffigne, 1987; Geman, Geman, Graffigne, and

Dong, 1990), and texture segmentation and synthesis (Cross and Jain, 1983; Geman and

Graffigne, 1986; Derin and Elliott, 1987; Acuna, 1992; Hu and Fahmy, 1992). More general

surveys may be found in Geman (1991) and Karr (1991). The overview by Rosenfeld (1993) also


contains relevant references.

This work is motivated by texture synthesis: For an image consisting of a single texture

with no degradation, we seek to choose a model from a collection of MRFs which produce typical

samples resembling real textures. Since the likelihood for MRFs may be written as an exponential family, we take an approach similar to that of Schwarz (1978) to derive approximate Bayes

criteria (BIC) for choosing a "best" model.

In Section 2 we give a brief introduction to GRFs, a Bayesian formulation of the problem, and our main results. Section 3 is devoted to the likelihood and the Markov chain Monte Carlo likelihood, which are basic elements in our solution to the model selection problem. In Section 4, we present a small simulation study illustrating our results. All technical proofs are collected in Section 5.

2 Set-up and Main Results

2.1 Markov Random Fields and the Related Model Selection Problem

For simplicity, we consider GRFs induced by translation-invariant pair-potentials of

finite range. The extension to other pair-potentials is straightforward but involves heavy notation. For general definitions of GRFs, see Ruelle (1978) or Georgii (1988).

With each pixel site $i \in \mathbb{Z}^2$, associate a random variable $X_i$ taking values in a finite set $S$ with the discrete topology. Let $\Omega = S^{\mathbb{Z}^2}$ be the configuration space and each $z \in \Omega$ be a realization of a random field $X$. Here $z = \{z_i, i \in \mathbb{Z}^2\}$ and $X = \{X_i, i \in \mathbb{Z}^2\}$, where $z_i$ represents the observed gray-level (or some other local attribute) at the pixel $i$. For every $\Lambda \subset \mathbb{Z}^2$, the subconfiguration space is $\Omega_\Lambda = S^\Lambda$, so write $z_\Lambda = \{z_i, i \in \Lambda\} \in \Omega_\Lambda$ and $X_\Lambda = \{X_i, i \in \Lambda\}$. For every $i \in \mathbb{Z}^2$, also write $z^i = \{z_j, j \neq i\}$ and $X^i = \{X_j, j \neq i\}$.

A collection $U = \{h\,U_1(z_0),\ \beta_j\,U_2(z_0, z_j):\ z_0, z_j \in S;\ j \in \mathbb{Z}^2\}$, with $0$ representing the origin, is called a (pair-)potential of range $R > 0$, where

(i) $U_1: S \to \mathbb{R}$ and $U_2: S \times S \to \mathbb{R}$ are two known functions, and $U_2$ is symmetric;

(ii) $h \in \mathbb{R}$ (the external field coefficient) is an unknown parameter; and

(iii) $\beta_j \in \mathbb{R}$, $j \in \mathbb{Z}^2$ (the coupling coefficients) are also unknown parameters satisfying $\beta_j = \beta_{-j}$ for all $j$ (symmetry) and $\beta_j = 0$ for all $j$ with $|j| > R$, where $|\cdot|$ is a norm on $\mathbb{Z}^2$. In particular, $\beta_0 = 0$.

We let $\theta$ be a generic vector parameter with components consisting of the external field and coupling coefficients.

A Gibbs measure (GRF) induced by a potential $U$ is a probability measure $P$ on $\Omega$ such that for every $z \in \Omega$ and any finite $\Lambda \subset \mathbb{Z}^2$,


$$P\big(X_\Lambda = z_\Lambda \mid X_j = z_j,\ j \notin \Lambda\big) = \frac{\exp\{H_\Lambda(z)\}}{Z_\Lambda}, \qquad (2.1)$$

with the energy associated with $z$ on $\Lambda$

$$H_\Lambda(z) = \sum_{i \in \Lambda} h\,U_1(z_i) + \frac{1}{2}\sum_{\substack{i,j \in \Lambda \\ j \neq i}} \beta_{j-i}\,U_2(z_i, z_j) + \sum_{\substack{i \in \Lambda \\ j \notin \Lambda}} \beta_{j-i}\,U_2(z_i, z_j),$$

and the partition function (normalizing factor)

$$Z_\Lambda = \sum_{z_\Lambda \in \Omega_\Lambda} \exp\{H_\Lambda(z)\}.$$

When $\Lambda = \{i\}$, the (single-site) conditional probabilities $\{P(X_i = z_i \mid X^i = z^i),\ z \in \Omega\}$ are called the local characteristics at $i$. Indeed, the left-hand side of (2.1) is determined by all the local characteristics at $i \in \Lambda$ (Geman, 1991).

Under our assumptions on $U$, the set $\mathcal{G}(U)$ of GRFs induced by $U$ is always non-empty, but need not be a singleton (phase transition). In general, $\mathcal{G}(U)$ is a convex, compact Choquet simplex.

Example 2.1 Let $U_1(z_i) = z_i$, $U_2(z_i, z_j) = z_i z_j$, and $S = \{-1, 1\}$. This corresponds to the general Ising models. In particular, if $\beta_j = \beta > 0$ for $|j| = 1$ and $\beta_j = 0$ otherwise, then we have the well-known two-dimensional Ising model.

Example 2.2 Let $U_1(z_i) \equiv 0$, and $U_2(z_i, z_j) = 1/[1 + \alpha\,(z_i - z_j)^2]$, where $\alpha$ is a positive constant. This potential is used in Geman and Graffigne (1986) for the texture models.
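To make the local characteristics of such pair potentials concrete, the following minimal Python sketch evaluates the single-site conditional distribution $P(X_i = s \mid X^i = z^i)$ for the Ising-type potential of Example 2.1 on a periodically extended lattice. The function name, the dictionary encoding of the coupling coefficients, and the nearest-neighbor example values are illustrative assumptions, not notation or code from the paper.

```python
import numpy as np

def local_characteristic(z, i, h, beta, S=(-1, 1)):
    """Single-site conditional distribution P(X_i = s | rest of the field)
    for the pair potential U1(s) = s, U2(s, t) = s * t of Example 2.1,
    with the field extended periodically ("tiled")."""
    n_rows, n_cols = z.shape
    energies = np.empty(len(S))
    for k, s in enumerate(S):
        e = h * s                                   # h * U1(s)
        for (dr, dc), b in beta.items():            # sum_j beta_j * U2(s, z_{i+j})
            e += b * s * z[(i[0] + dr) % n_rows, (i[1] + dc) % n_cols]
        energies[k] = e
    w = np.exp(energies - energies.max())           # numerically stabilized
    return dict(zip(S, w / w.sum()))

# Nearest-neighbor Ising example: beta_j = 0.4 for |j| = 1, no external field.
beta = {(0, 1): 0.4, (0, -1): 0.4, (1, 0): 0.4, (-1, 0): 0.4}
z = np.random.choice([-1, 1], size=(8, 8))
print(local_characteristic(z, (3, 3), h=0.0, beta=beta))
```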

A neighborhood system $\mathcal{N}$ is a collection $\{\mathcal{N}_i: i \in \mathbb{Z}^2\}$, where $\mathcal{N}_i \subset \mathbb{Z}^2$ is the set of neighbors of $i \in \mathbb{Z}^2$ satisfying $i \notin \mathcal{N}_i$ and $i \in \mathcal{N}_j \Leftrightarrow j \in \mathcal{N}_i$ for all $i, j \in \mathbb{Z}^2$. Define the boundary of a finite region $\Lambda \subset \mathbb{Z}^2$ by $\partial\Lambda = \big(\bigcup_{i \in \Lambda} \mathcal{N}_i\big) \setminus \Lambda$. Note that every $P \in \mathcal{G}(U)$ is an MRF with respect to a neighborhood system $\mathcal{N}$ in the sense that for every $z \in \Omega$ and any finite $\Lambda \subset \mathbb{Z}^2$,

$$P\big(X_\Lambda = z_\Lambda \mid X_j = z_j,\ j \notin \Lambda\big) = P\big(X_\Lambda = z_\Lambda \mid X_j = z_j,\ j \in \partial\Lambda\big), \qquad (2.3)$$

with $\mathcal{N}_i = \{j \in \mathbb{Z}^2: \beta_{j-i} \neq 0\}$ for every $i$.

Specification of a potential consists of specifying both the dimension of the parameter $\theta$ and the neighborhood system $\mathcal{N}$ (which are related). Let $\Theta = \mathbb{R}^K$ be the parameter space of interest, which is decomposed as a disjoint union of several subspaces:


$$\Theta = \Theta_0 \cup \Theta_1 \cup \cdots \cup \Theta_M, \qquad (2.4)$$

where each $\Theta_m$ corresponds to a candidate model (i.e., a potential) parametrized by an element $\theta \in \mathbb{R}^{k_m}$. We assume that every closure $\bar\Theta_m$ is a $k_m$-dimensional linear subspace of $\mathbb{R}^K$, $m = 0, 1, \ldots, M$. In particular, $\Theta_0$ corresponds to the fully specified model with no unknown parameter. Denote the set of all candidate models by $\mathcal{M} = \{0, 1, \ldots, M\}$, and let $\mathcal{N}_m$ be the neighborhood system for the model $m \in \mathcal{M}$. Notice that two models $m_1 \neq m_2$ are distinct if their parameter spaces have different dimensions (i.e., $k_{m_1} \neq k_{m_2}$); or they are associated with different neighborhood systems (i.e., $\mathcal{N}_{m_1} \neq \mathcal{N}_{m_2}$); or both.

The following examples will be used throughout this paper, including in the simulations. For these, let $U_1(z_i) \equiv 0$, $U_2(z_i, z_j) = z_i z_j$, and $S = \{-1, 1\}$.

"Y

{3 {31 "Y {3 "Y

{3 i {3 {32 i {32 "Y {3 i {3 "Y

{3 {31 "Y (3 "Y

Figure 1 (m1) Figure 2 (m2) "Y

Figure 3 (rna)

Example 2.3 The model depicted in Fig. 1, denoted $m_1$, is the two-dimensional Ising model. Each site $i$ has four nearest neighbors. The same coupling coefficient $\beta$ is imposed for each pair $(i, j)$, $j \in \mathcal{N}_i$.

Example 2.4 For the model in Fig. 2, denoted $m_2$, every site $i$ again has four nearest neighbors. However, different parameters $\beta_1$ and $\beta_2$ are used for the "vertical pair" and "horizontal pair" interactions, respectively.

Example 2.5 For the model in Fig. 3, denoted $m_3$, the neighborhood system has an expanded graph which includes the twelve neighbors of each site in two layers. The two parameters $\beta$ and $\gamma$ are associated with the inner layer and the outer layer, respectively.

In these examples, $\mathcal{N}_{m_1} = \mathcal{N}_{m_2} \neq \mathcal{N}_{m_3}$, and $k_{m_1} = 1$ while $k_{m_2} = k_{m_3} = 2$. Several synthetic textures generated from $m_1$, $m_2$, and $m_3$ by the Gibbs sampler are shown in Fig. 4. The coupling coefficients are assigned different values to produce different imaginary patterns: "sands" (4a), "clouds" (4b), "wood grain" (4d), and "wallpapers" (4c, 4e, 4f). Of course, samples from such simple models are far from resembling real textures.


(Insert Fig. 4 here.)

In general, starting from $\theta \in \Theta$, a different model can be obtained either by equating some components $\beta_j$ in $\theta$ (e.g., letting $\beta_1 = \beta_2 \equiv \beta$ in $m_2$ to obtain $m_1$), or by letting some $\beta_j = 0$ (e.g., letting $\gamma = 0$ in $m_3$ to obtain $m_1$). In this way an exponential family may be reduced to its minimal form.

Taking the Bayesian approach of Schwarz (1978), we represent prior knowledge of the candidate models by a prior distribution $\pi$ on $\Theta$ which is a mixture of mutually singular probability measures on the subspaces $\Theta_m$, $m \in \mathcal{M}$:

$$\pi = \sum_{m=0}^{M} \alpha_m \pi_m, \qquad (2.5)$$

where $\alpha_m > 0$, $m \in \mathcal{M}$, are constants such that $\sum_m \alpha_m = 1$, and $\pi_m$ is a probability measure supported on the closure $\bar\Theta_m$ with a positive smooth density $\mu_m$, $m \in \mathcal{M}$.

Let $\Lambda_n \subset \mathbb{Z}^2$ be the $n \times n$ symmetric square centered at the origin. Let the data $z_{\Lambda_n} \equiv z(n)$ be a single realization of $X_{\Lambda_n} \equiv X(n)$, where $X$ has a distribution $P \in \mathcal{G}(U)$. We write $P_\theta$ for $P$ to indicate the parametrization. Extend $z(n)$ to a periodic configuration $\tilde z \in \Omega$ by periodization (or "tiling"). Note that $\tilde z$ depends on $z(n)$. Then the likelihood is defined as

$$L(z(n), \theta) = P_\theta\big(X(n) = z(n) \mid X_{\partial\Lambda_n} = \tilde z_{\partial\Lambda_n}\big). \qquad (2.6)$$

Any $\theta$ which maximizes the likelihood is called a maximum likelihood estimate (MLE) based on $z(n)$, and is denoted by $\hat\theta$.

The posterior distribution on $\Theta$ given $z(n)$ is then given by

$$\Pi_{z(n)}(A) = \frac{\int_A L(z(n), \theta)\, d\pi(\theta)}{\int_{\Theta} L(z(n), \theta)\, d\pi(\theta)}, \qquad \forall \text{ measurable } A \subset \Theta. \qquad (2.7)$$

The decision to select a model $m$ based on $z(n)$ is denoted by $\rho(z(n)) = m$, where $\rho: \Omega_{\Lambda_n} \to \mathcal{M}$ is the decision function. Imposing 0-1 loss, the posterior Bayes risk of $\rho(\cdot)$ given $z(n)$ is therefore

$$r\big(\rho \mid z(n)\big) = 1 - \Pi_{z(n)}\big(\Theta_{\rho(z(n))}\big). \qquad (2.8)$$

The Bayesian solution to the model selection problem is to choose a model $\hat m$ with $\Pi_{z(n)}(\Theta_{\hat m}) \geq \Pi_{z(n)}(\Theta_m)$ for all $m \in \mathcal{M}$.


2.2 Main Results

Our main results are that the two selection criteria of procedures (∗1) and (∗2) defined in this section are BIC (i.e., they give approximations to the Bayesian solution). The proofs are left to Section 5.

The likelihood given in (2.6) is a full $K$-dimensional (standard) exponential family (cf. Barndorff-Nielsen, 1978, or Brown, 1986). Any exponential family which is not minimal can always be reduced to a minimal exponential family through sufficiency, re-parametrization, and proper choice of the reference measure; hence for each $m \in \mathcal{M}$ let $L_m(z(n), \theta)$, $\theta \in \Theta_m$, be the likelihood written as a $k_m$-dimensional minimal exponential family, and let $\hat\theta_m$ be a corresponding MLE restricted to $\Theta_m$, when the MLE exists. The same notation $\theta$ is used for the new parameters after re-parametrization unless further distinction is needed.

More specifically, write

$$L(z(n), \theta) = \exp\big\{|\Lambda_n|\,[\theta^T Y - b(\theta)]\big\}, \qquad (2.9)$$

where $\theta^T$ is the transpose of the (column) vector $\theta$; $|\Lambda_n|$ is the cardinality of $\Lambda_n$; the sufficient statistic $Y$ is a $K$-dimensional vector whose components are

$$\frac{1}{|\Lambda_n|}\sum_{i \in \Lambda_n} U_1(z_i) \qquad \text{and} \qquad \frac{1}{|\Lambda_n|}\Bigg[\frac{1}{2}\sum_{\substack{i \in \Lambda_n \\ i+j \in \Lambda_n}} U_2(z_i, z_{i+j}) + \sum_{\substack{i \in \Lambda_n \\ i+j \in \partial\Lambda_n}} U_2(z_i, z_{i+j})\Bigg],$$

corresponding to $h$ and $\beta_j$ in $\theta$, respectively, for each $j \in \mathcal{N}_0$; and the cumulant generating function $b(\theta)$ (depending on $\tilde z_{\partial\Lambda_n}$) is given by

$$b(\theta) = \frac{1}{|\Lambda_n|}\log \sum_{z_{\Lambda_n}} \exp\big\{|\Lambda_n|\,\theta^T Y\big\}.$$

Analogously, for each $m \in \mathcal{M}$ we can also write

$$L_m(z(n), \theta) = \exp\big\{|\Lambda_n|\,[\theta^T Y_m - b_m(\theta)]\big\}, \qquad (2.10)$$

where the sufficient statistic $Y_m$ can be expressed as

$$Y_m = \frac{1}{|\Lambda_n|}\sum_{i \in \Lambda_n} Z_{m,i}, \qquad (2.11)$$

with each $Z_{m,i}$ dependent only upon $z_i$ and $z_{\mathcal{N}_i}$; and $b_m(\theta)$ is the corresponding cumulant generating


function. For example, suppose $i$ is in the interior of $\Lambda_n$. Then for model $m_2$,

$$Z_{m_2,i} = \Big(\tfrac{1}{2}\textstyle\sum_{V} z_i z_j,\ \tfrac{1}{2}\textstyle\sum_{H} z_i z_j\Big)^T,$$

where $\sum_H$ is a sum over the two horizontal neighbors of $i$ and $\sum_V$ is a sum over the two vertical neighbors of $i$. Slight modifications are needed for $i$ on the boundary of $\Lambda_n$. $Z_{m_1,i}$ and $Z_{m_3,i}$ may be written likewise.
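As a concrete illustration, a small Python sketch of how the per-site averaged sufficient statistics $Y_m$ for $m_1$, $m_2$, and $m_3$ could be computed from a periodized $\pm 1$ image is given below. The particular choice of the eight outer-layer neighbors for $m_3$ (the four diagonal sites plus the four sites two steps away along the axes) and the bond-counting convention (each bond counted once per direction) are our own assumptions for the sketch, not specifications taken from the paper.

```python
import numpy as np

def sufficient_statistics(z):
    """Per-site averaged sufficient statistics for models m1, m2, m3 computed
    from a periodized (tiled) n x n field z with values in {-1, +1}.

    Returns a dict keyed by model name; each entry is a vector of bond sums
    divided by the number of sites |Lambda_n|, in the spirit of (2.11).
    """
    n_sites = z.size
    # each np.roll pairs every site with one neighbor, so every bond of the
    # periodic lattice is counted exactly once per direction
    horiz = np.sum(z * np.roll(z, 1, axis=1)) / n_sites   # horizontal bonds
    vert  = np.sum(z * np.roll(z, 1, axis=0)) / n_sites   # vertical bonds
    diag1 = np.sum(z * np.roll(z, (1, 1), axis=(0, 1))) / n_sites
    diag2 = np.sum(z * np.roll(z, (1, -1), axis=(0, 1))) / n_sites
    hor2  = np.sum(z * np.roll(z, 2, axis=1)) / n_sites   # two-step horizontal
    ver2  = np.sum(z * np.roll(z, 2, axis=0)) / n_sites   # two-step vertical
    return {
        "m1": np.array([horiz + vert]),                    # single beta
        "m2": np.array([vert, horiz]),                     # (beta_1, beta_2)
        "m3": np.array([horiz + vert,                      # inner layer (beta)
                        diag1 + diag2 + hor2 + ver2]),     # assumed outer layer (gamma)
    }
```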

For each $m \in \mathcal{M}$, define the criterion

$$C_m^{(1)} = \log L_m(z(n), \hat\theta_m) - \frac{k_m}{2}\log|\Lambda_n|. \qquad (2.12)$$

Then the first selection procedure (∗1) is to choose the model with the largest $C_m^{(1)}$.

The following three assumptions are needed to show that (∗1) is an approximate Bayes procedure.

(A1) Identifiability condition. Write $\{p_i(z; \theta),\ z \in \Omega,\ i \in \mathbb{Z}^2\}$ for the local characteristics with parameter $\theta$. Then $\theta$ is identifiable if $p_0(z; \theta) \neq p_0(z; \theta')$ for some $z \in \Omega$ whenever $\theta \neq \theta'$. Identifiability can also be imposed via conditions on potentials (Georgii, 1988; Gidas, 1993).

(A2) Uniqueness of the GRF. The potential $U$ satisfies

$$\sum_{j \neq 0} |\beta_j|\,\delta(U_2) < 2, \qquad \text{where} \qquad \delta(U_2) = \max_{s,t \in S} U_2(s,t) - \min_{s,t \in S} U_2(s,t).$$

This is the uniqueness condition of Simon (1979) for the GRF in our set-up. This inequality, although slightly stronger than the uniqueness condition of Dobrushin (1968), is sharp (Georgii, 1988, p. 158) and easy to check.

(A3) Positivity of the covariance matrix of $Y_m$. For each $m \in \mathcal{M}$ and every $\theta \in \Theta_m$, there exists a positive definite matrix $B_m(\theta)$ such that

$$\liminf_{n \to \infty}\, |\Lambda_n|\,\big[E_\theta(Y_m Y_m^T) - E_\theta(Y_m)\,E_\theta(Y_m^T)\big] = B_m(\theta),$$

where $E_\theta(\cdot)$ is the expectation with respect to $P_\theta$. Under (A2), the "lim inf" in the above expression can be replaced by "lim," and $B_m(\theta)$ is always non-negative definite (but may be degenerate). A sufficient condition for the limit to be positive definite is that the additive terms $Z_{m,i}$ in (2.11) have positive correlations under $P_\theta$.

7"ieorem 1 UndedAl) - (A3), (.1> is an approximate Bayes procedure Pe-a.s. as n-oo.

It is difficult to use (∗1) in practice since MLEs are intractable. We give the following Markov chain Monte Carlo (MCMC) alternative, which relies on the likelihood approximation developed by Geyer and Thompson (1992).

Fix a sample $z(n)$ and let $P_\psi$ be a GRF with known parameter $\psi$. Simulate an ergodic Markov chain $\{X^{(l)}(n)\}_{l=1}^{\infty}$ under a probability measure $\mathbb{P}$ such that its equilibrium distribution is the conditional probability measure $P_\psi(\cdot \mid \tilde z_{\partial\Lambda_n})$ on $\Omega_{\Lambda_n}$ given $\tilde z_{\partial\Lambda_n}$. For each $m \in \mathcal{M}$, define

$$\ell_m(L, \theta) = |\Lambda_n|\,\theta^T Y_m - \log r_m(L, \theta), \qquad (2.13)$$

where the sufficient statistic $Y_m$ is based on $z(n)$, and

$$r_m(L, \theta) = \frac{1}{L}\sum_{l=1}^{L} \exp\big\{|\Lambda_n|\,(\theta - \psi)^T Y_{m,l}\big\}, \qquad (2.14)$$

where for each $l = 1, \ldots, L$, the sufficient statistic $Y_{m,l}$ depends on both $\tilde z_{\partial\Lambda_n}$ and $X^{(l)}(n)$. We call any $\theta$ which maximizes $\ell_m(L, \theta)$ a Monte Carlo MLE (MCMLE) restricted to $\Theta_m$ and based on $z(n)$ and $X^{(1)}(n), \ldots, X^{(L)}(n)$. We denote such a $\theta$ by $\hat\theta_m(L)$.
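The following Python sketch shows one way the Monte Carlo log-likelihood of (2.13)-(2.14) and the resulting MCMLE could be computed, assuming the sufficient statistic $Y_m$ of the data and the statistics $Y_{m,l}$ of the $L$ MCMC images (drawn under the reference parameter $\psi$) have already been evaluated; the helper names and the use of a quasi-Newton optimizer are our own choices, not taken from the paper.

```python
import numpy as np
from scipy.optimize import minimize

def mc_log_likelihood(theta, psi, lam_size, Y_data, Y_samples):
    """Monte Carlo log-likelihood ell_m(L, theta) of (2.13)-(2.14).

    theta, psi : parameter vectors (psi is the known reference parameter).
    lam_size   : |Lambda_n|, the number of sites in the observation window.
    Y_data     : sufficient statistic Y_m computed from the observed image.
    Y_samples  : (L, k) array of Y_{m,l} computed from the L MCMC images
                 simulated under psi with the tiled boundary held fixed.
    """
    theta = np.asarray(theta, dtype=float)
    exponents = lam_size * (Y_samples @ (theta - psi))
    # log r_m(L, theta), evaluated stably via log-sum-exp
    log_r = np.logaddexp.reduce(exponents) - np.log(len(Y_samples))
    return lam_size * (theta @ Y_data) - log_r

def mcmle(psi, lam_size, Y_data, Y_samples):
    """Maximize the Monte Carlo log-likelihood, starting the search at psi."""
    psi = np.asarray(psi, dtype=float)
    negative = lambda th: -mc_log_likelihood(th, psi, lam_size, Y_data, Y_samples)
    result = minimize(negative, x0=psi, method="BFGS")
    return result.x, -result.fun          # (theta_hat_m(L), maximized ell_m)
```

Starting the search at $\psi$ mirrors the point made in Section 3.2 that $\psi$ needs to be close to the target parameter for the approximation to behave well with a reasonable number of MCMC samples.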

Now for each $m \in \mathcal{M}$, define the criterion

$$C_m^{(2)} = \ell_m\big(L, \hat\theta_m(L)\big) - \frac{k_m}{2}\log|\Lambda_n|. \qquad (2.15)$$

Then the second selection procedure (∗2) is to choose the model with the largest $C_m^{(2)}$.
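Given the maximized Monte Carlo log-likelihoods, procedure (∗2) reduces to a penalized comparison; a minimal sketch, with hypothetical variable names, follows.

```python
import numpy as np

def select_model(candidates, lam_size):
    """Procedure (*2): choose the model with the largest penalized criterion
    C_m^(2) = ell_m(L, theta_hat_m(L)) - (k_m / 2) * log |Lambda_n|  (cf. (2.15)).

    candidates : dict mapping a model label to a pair
                 (maximized MC log-likelihood, number of free parameters k_m).
    lam_size   : |Lambda_n|.
    """
    criteria = {m: ell_hat - 0.5 * k_m * np.log(lam_size)
                for m, (ell_hat, k_m) in candidates.items()}
    chosen = max(criteria, key=criteria.get)
    return chosen, criteria

# Hypothetical usage once the three MCMLEs have been computed:
# chosen, crit = select_model({"m1": (ell1, 1), "m2": (ell2, 2), "m3": (ell3, 2)},
#                             lam_size=100 * 100)
```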

Let $\mathbb{Q}_\theta$ be the joint measure of $P_\theta$ and $\mathbb{P}$.

Theorem 2. Under (A1)-(A3) there exists a sequence $L_n$ such that (∗2) with $L = L_n$ is an approximate Bayes procedure $\mathbb{Q}_\theta$-a.s. as $n \to \infty$.

Some important issues related to the use of (∗2) will be discussed in Section 3.2.


3 Parameter Estimation

3.1 The MLE

Fix an arbitrary model $m \in \mathcal{M}$ and a parameter $\theta \in \Theta_m$ throughout this section and in the corresponding proofs of Lemmas 1 and 2 in Section 5. We suppress the dependence on $m$ to save notation. Let $X(n)$ be a sample generated from $P_\theta$, and let $\nabla b(\varphi)$ and $\nabla^2 b(\varphi)$ be the gradient vector and Hessian matrix of $b(\varphi)$ with respect to $\varphi$, respectively. Then the likelihood equation may be written as $Y = \nabla b(\varphi) = E_\varphi(Y \mid \tilde z_{\partial\Lambda_n})$, where the conditional expectation of $Y$ is over all possible configurations $z_{\Lambda_n}$ with $\tilde z_{\partial\Lambda_n}$ fixed. It follows from (2.11) and the multidimensional ergodic theorem (Theorem (14.A8) of Georgii, 1988) that

$$\lim_{n \to \infty} \big|\,Y - \nabla b(\theta)\,\big| = 0, \qquad P_\theta\text{-a.s.} \qquad (3.1)$$

Lemma 1. Under (A1)-(A3), $c \leq v^T\,\nabla^2 b(\varphi)\,v \leq C$ for some constants $c, C > 0$, uniformly for all unit vectors $v \in \mathbb{R}^k$, all $\tilde z_{\partial\Lambda_n}$, all $\varphi$ in a small neighborhood of $\theta$, and all large $n$.

From Lemma 1, there exists a small open neighborhood, say $O$, of $\theta$ on which $\nabla b(\cdot)$ is a homeomorphism, so $\nabla b(O)$ is an open region for large $n$. It then follows from (3.1) that the likelihood equation has a solution $P_\theta$-a.s. The log-likelihood is globally concave, and locally strictly concave; hence $\hat\theta$ is the unique MLE. Comets (1992) proved exponential (strong) consistency of $\hat\theta$ under (A1) only, assuming it exists.

In spite of these existence, uniqueness, and consistency results, the MLE is impractical

because the partition function is intractable. We propose the following alternative.

3.2 The MCMLE

Geyer and Thompson (1992) developed a method of Monte Carlo approximation to a likelihood function that can be neither calculated nor otherwise well approximated. We extend their result to our context.

Lemma 2. Let $\eta_\theta$ be a small compact neighborhood of the true parameter $\theta$ such that Lemma 1 holds for each $\varphi \in \eta_\theta$. Then

$$\ell(L, \varphi) \to \log L(z(n), \varphi) + |\Lambda_n|\, b(\psi); \qquad (3.2)$$

$$\nabla \ell(L, \varphi) \to \nabla \log L(z(n), \varphi); \qquad (3.3)$$

and

$$\nabla^2 \ell(L, \varphi) \to \nabla^2 \log L(z(n), \varphi) \qquad (3.4)$$


are uniformly convergent for $\varphi \in \eta_\theta$, $\mathbb{P}$-a.s. as $L \to \infty$.

The MCMC likelihood $\ell(L, \varphi)$ is globally concave since its negative Hessian can be expressed as a covariance matrix. In addition, $\log L(z(n), \varphi)$ is strictly concave on $\eta_\theta$ by Lemma 1. Lemma 2 then gives the strict concavity of $\ell(L, \varphi)$ on $\eta_\theta$, $\mathbb{P}$-a.s. as $L \to \infty$. Therefore (analogous to the argument for the existence and uniqueness of the MLE) the Monte Carlo likelihood equation $\nabla \ell(L, \varphi) = 0$ has a solution $\hat\theta(L) \in \eta_\theta$ which is the unique MCMLE.

There are some important issues to consider when using the MCMLE as an estimate of the true parameter. We have found in our simulation studies that, for the magnitude of $n$ involved, $L$ must also be large in order to provide a good estimate of $\theta$. This has serious computational consequences, since generating a sequence of MCMC images is cumbersome. Also, although $\psi$ is arbitrary in theory, its choice does affect the number of MCMC samples needed to get a good approximation to the likelihood. Geyer and Thompson (1992) indicated that $\psi$ must be chosen close to the MLE in order to get a good estimate for $\theta$ with a reasonable number of MCMC samples. In fact, for $n = 100$, a value of $\psi$ which is off by ten percent from $\theta$ required more than 500 MCMC samples to get an MCMLE (Seymour, 1993). Geyer and Thompson (1992) proposed an iterative method to get $\psi$ close enough to $\theta$ for a good estimate, but we could not generate enough MCMC samples to make this method work. For convenience, we took $\psi$ to be the maximum pseudo-likelihood estimate (MPLE) (Besag, 1974, 1986; Geman and Geman, 1984) of $\theta$.
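For completeness, here is a minimal sketch of how the MPLE used for $\psi$ could be computed for model $m_2$ with $S = \{-1, 1\}$: the log pseudo-likelihood is the sum over sites of the logarithm of the single-site conditional probabilities. The specific functions and optimizer below are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np
from scipy.optimize import minimize

def neighbor_sums(z):
    """Vertical and horizontal neighbor sums on a periodized field."""
    vert = np.roll(z, 1, axis=0) + np.roll(z, -1, axis=0)
    horiz = np.roll(z, 1, axis=1) + np.roll(z, -1, axis=1)
    return vert, horiz

def neg_log_pseudolikelihood(beta, z):
    """Negative log pseudo-likelihood for model m2 with S = {-1, +1}:
    P(z_i = s | rest) = exp(s * a_i) / (2 cosh a_i), where
    a_i = beta_1 * (vertical neighbor sum) + beta_2 * (horizontal neighbor sum)."""
    vert, horiz = neighbor_sums(z)
    a = beta[0] * vert + beta[1] * horiz
    return -np.sum(z * a - np.log(2.0 * np.cosh(a)))

def mple(z, beta0=(0.0, 0.0)):
    """Maximum pseudo-likelihood estimate of (beta_1, beta_2) for m2."""
    result = minimize(neg_log_pseudolikelihood, x0=np.asarray(beta0),
                      args=(z,), method="BFGS")
    return result.x
```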

4 Implementation of the Monte Carlo BIC

Although there is an extensive literature on various Markov chain simulation algorithms, we use the simple Gibbs sampler (Geman and Geman, 1984) both for simulating textures and for computing the MCMLE. We systematically update each site in an $n \times n$ random field, and we call a complete pass over the entire random field one iteration, as in Besag, York, and Mollié (1991), Smith and Roberts (1993), and Besag and Green (1993). Such an implementation yields a Markov chain with stationary transition probabilities.
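A minimal Python sketch of such a systematic-scan Gibbs sampler for model $m_2$ (with $S = \{-1, 1\}$ and periodic boundary handling) is given below; the raster update order, function names, and default parameter values are illustrative choices, not a transcription of the authors' code.

```python
import numpy as np

def gibbs_iteration(z, beta1, beta2, rng):
    """One iteration: a systematic raster scan over every site of the periodic
    n x n field z (values in {-1, +1}), updating each site from its local
    characteristic under model m2 (beta1 = vertical, beta2 = horizontal)."""
    n_rows, n_cols = z.shape
    for r in range(n_rows):
        for c in range(n_cols):
            a = (beta1 * (z[(r - 1) % n_rows, c] + z[(r + 1) % n_rows, c])
                 + beta2 * (z[r, (c - 1) % n_cols] + z[r, (c + 1) % n_cols]))
            p_plus = 1.0 / (1.0 + np.exp(-2.0 * a))   # P(X_i = +1 | neighbors)
            z[r, c] = 1 if rng.random() < p_plus else -1
    return z

def synthesize_texture(n=100, beta1=1.0, beta2=0.1, iterations=200, seed=0):
    """Generate a synthetic texture (in the spirit of Fig. 4) by repeated sweeps."""
    rng = np.random.default_rng(seed)
    z = rng.choice([-1, 1], size=(n, n))
    for _ in range(iterations):
        gibbs_iteration(z, beta1, beta2, rng)
    return z
```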

In all of our simulation studies, we have used the three models $m_1$, $m_2$, and $m_3$ introduced in Section 2.1. For convenience, we have omitted a "largest" model from the candidates. In addition, we chose $m_2$ as the true model for its more complex directional features; we rejected $m_1$ because of its simplicity, and we rejected $m_3$ because the MCMLE seems to have difficulty with it. See Seymour (1993) for more extensive studies including $m_1$ and $m_3$.

Table 1 provides the results of using the MCMLE in simulation. For model $m_2$, condition (A2) is given by $|\beta_1| + |\beta_2| < \tfrac{1}{2}$. The results presented are for $n = 100$, and for the sake of comparison, the MCMLE has been computed for $L = 5$, 100, and 500, as indicated in the columns MC($\cdot$). An entry of NC means that the estimate was not computable. Note the difficulties encountered by the MCMLE when (A2) does not hold.

(Insert Table 1 near here.)

Table 2 displays the results of using (∗2) when (A2) is satisfied by the realization. For $m_1$ and $m_3$, $\beta_1$ in the table corresponds to $\beta$, while $\beta_2$ corresponds to $\gamma$. The symbol ∗∗ indicates the chosen model. The performance of (∗2) is clearly acceptable, and the choice of model is unambiguous in the sense that no other criterion value is close to the maximum. The results in Table 3, using (∗2) for model selection when the true model does not satisfy (A2), are less satisfactory. In all cases, the procedure had difficulty evaluating the MCMLE under $m_3$.

(Insert Tables 2 and 3 near here.)

5 Proofs

Proof of Lemma 1. Refer to (2.11). For $i \in \Lambda_n$, write $\bar Z_i = Z_i - E_\theta(Z_i \mid \tilde z_{\partial\Lambda_n})$, and note that $E_\theta(\bar Z_i) = 0$. It follows from a standard calculation that

$$\nabla^2 b(\varphi) = |\Lambda_n|\Big[E_\varphi\big(YY^T \mid \tilde z_{\partial\Lambda_n}\big) - E_\varphi\big(Y \mid \tilde z_{\partial\Lambda_n}\big)\,E_\varphi\big(Y^T \mid \tilde z_{\partial\Lambda_n}\big)\Big] = \frac{1}{|\Lambda_n|}\, E_\varphi\Bigg[\Big(\sum_{i \in \Lambda_n} \bar Z_i\Big)\Big(\sum_{i \in \Lambda_n} \bar Z_i\Big)^T \,\Bigg|\, \tilde z_{\partial\Lambda_n}\Bigg].$$

Under (A2), the family of continuous functions in $\varphi$ given by $\{\nabla^2 b(\varphi):\ \tilde z_{\partial\Lambda_n} \in \Omega_{\partial\Lambda_n},\ n \in \mathbb{Z}^+\}$ converges to $B(\varphi)$ (given in (A3)) as $n \to \infty$, uniformly for $\varphi$ in a neighborhood of $\theta$. Hence (A3) implies that $v^T\,\nabla^2 b(\varphi)\,v \geq c$ for some $c > 0$. To show

$$v^T\,\nabla^2 b(\varphi)\,v \leq C \qquad (5.1)$$

for some $C > 0$, uniformly for all $\varphi$ in a neighborhood of $\theta$ and all large $n$, notice that the parameters satisfying (A2) form an open set, say a neighborhood of $\theta$. Following Proposition 8.8,


Example 8.28 (1), and Corollary 8.32 in Georgii (1988), $X$ is a uniform mixing GRF under $P_\varphi$. In particular,

$$\big\|\operatorname{Cov}_\varphi\big(\bar Z_i, \bar Z_j \mid \tilde z_{\partial\Lambda_n}\big)\big\| \leq c_1 \exp\big\{-C_1\, d(\bar{\mathcal{N}}_i, \bar{\mathcal{N}}_j)\big\} \qquad (5.2)$$

for some $c_1, C_1 > 0$ and all $i, j \in \Lambda_n$ with $|i - j|$ sufficiently large, where $\bar{\mathcal{N}}_i = \{i\} \cup \mathcal{N}_i$, $i \in \mathbb{Z}^2$, and $d(\cdot, \cdot)$ is the Euclidean distance between two sets. The bound (5.2), combined with the preceding display, yields (5.1). □

Proof of Lemma 2. Let $r(\varphi) = E_\psi\big[\exp\{|\Lambda_n|\,(\varphi - \psi)^T Y\} \mid \tilde z_{\partial\Lambda_n}\big]$ and note that $r(L, \varphi) \to r(\varphi)$ $\mathbb{P}$-a.s. as $L \to \infty$. It is a matter of simple manipulations to establish that each convergence in (3.2), (3.3), and (3.4) holds $\mathbb{P}$-a.s. as $L \to \infty$ for fixed $\varphi \in \mathbb{R}^k$.

Let

$$w_{j,L}(\varphi) = \frac{\exp\big\{|\Lambda_n|\,(\varphi - \psi)^T Y\big(X^{(j)}(n)\big)\big\}}{\sum_{l=1}^{L} \exp\big\{|\Lambda_n|\,(\varphi - \psi)^T Y\big(X^{(l)}(n)\big)\big\}}$$

for $j = 1, \ldots, L$. For every uniformly bounded function $f: \Omega_{\Lambda_n} \to \mathbb{R}$, consider the family

$$\mathcal{F} = \bigg\{\sum_{l=1}^{L} f\big(X^{(l)}(n)\big)\, w_{l,L}(\cdot),\ L \in \mathbb{N}\bigg\}.$$

Note that $\mathcal{F}$ is uniformly bounded. The gradient of $w_{j,L}$ with respect to $\varphi$ is

$$\nabla w_{j,L}(\varphi) = |\Lambda_n|\, w_{j,L}(\varphi)\Big[Y\big(X^{(j)}(n)\big) - \sum_{l=1}^{L} w_{l,L}(\varphi)\, Y\big(X^{(l)}(n)\big)\Big],$$

so that

$$\nabla \sum_{l=1}^{L} f\big(X^{(l)}(n)\big)\, w_{l,L}(\varphi) = |\Lambda_n| \sum_{l=1}^{L} f\big(X^{(l)}(n)\big)\, w_{l,L}(\varphi)\Big[Y\big(X^{(l)}(n)\big) - \sum_{l'=1}^{L} w_{l',L}(\varphi)\, Y\big(X^{(l')}(n)\big)\Big].$$

This latter gradient is uniformly bounded in norm since $Y(\cdot)$ is uniformly bounded in norm (since, in fact, the potentials which make up $Y(\cdot)$ are so). Thus $\mathcal{F}$ is equicontinuous, and the Ascoli-Arzelà Theorem implies that $\mathcal{F}$ has a subsequence which is uniformly convergent on $\eta_\theta$.

Let $\mathcal{F}_1 = \{\ell(L, \cdot),\ L \in \mathbb{N}\}$, $\mathcal{F}_2 = \{\nabla \ell(L, \cdot),\ L \in \mathbb{N}\}$, and $\mathcal{F}_3 = \{\nabla^2 \ell(L, \cdot),\ L \in \mathbb{N}\}$. Note that $\mathcal{F}_2$ and $\mathcal{F}_3$ are families of the same type as $\mathcal{F}$ above. Then both $\mathcal{F}_2$ and $\mathcal{F}_3$ have uniformly convergent subsequences on $\eta_\theta$. In addition, $\mathcal{F}_1$ is uniformly bounded, and it is equicontinuous since $\mathcal{F}_2$ is uniformly bounded. Again, by the Ascoli-Arzelà Theorem, $\mathcal{F}_1$ also has a uniformly convergent subsequence on $\eta_\theta$.


Each convergence in (3.2), (3.3), and (3.4) holds pointwise (though not yet uniformly) for $\varphi$ in a countable dense subset of $\eta_\theta$, $\mathbb{P}$-a.s. as $L \to \infty$. Since each of the limits in (3.2), (3.3), and (3.4) is a continuous function on $\eta_\theta$, it follows that these convergences hold uniformly on $\eta_\theta$, $\mathbb{P}$-a.s. as $L \to \infty$. □

Proof of Theorem 1. We show that the model chosen by (∗1) is (asymptotically) equal to the model which maximizes the posterior probability $\Pi_{z(n)}(\Theta_m)$. Write the collection of candidate models as $\mathcal{M} = \mathcal{M}_1(m^*) \cup \{m^*\} \cup \mathcal{M}_2(m^*)$, where $m^* \in \mathcal{M}$ is the true model, $\theta \in \Theta_{m^*}$ is the true parameter, $\mathcal{M}_1(m^*) = \{m \in \mathcal{M}: \theta \notin \bar\Theta_m\}$, and $\mathcal{M}_2(m^*) = \{m \in \mathcal{M}: \Theta_{m^*} \subset \Theta_m\}$. Here $\mathcal{M}_1(m^*)$ corresponds to an under-parametrized choice of model or an incorrect specification of the neighborhood system (different neighborhoods will correspond to different subspaces which may have the same dimension), while $\mathcal{M}_2(m^*)$ corresponds to an over-parametrized choice. Note particularly that $\Theta_{m^*}$ is a proper subset of the spaces corresponding to models in $\mathcal{M}_2(m^*)$.

This proof proceeds in three steps. In Step 1, for a model $m \in \mathcal{M}_2(m^*)$, where the MLE exists and converges $P_\theta$-a.s. to the true parameter $\theta$, we show that $\log \Pi_{z(n)}(\Theta_m)$ agrees with $C_m^{(1)}$ up to terms not depending on $m$ as $n \to \infty$. In Step 2, we show that the Bayesian procedure will not choose any model $m \in \mathcal{M}_1(m^*)$ as $n \to \infty$. Finally, in Step 3, we show that (∗1) will not choose any model $m \in \mathcal{M}_1(m^*)$ as $n \to \infty$.

Step 1: We seek to choose the model $m \in \mathcal{M}_2(m^*)$ which maximizes $\Pi_{z(n)}(\Theta_m)$; equivalently, we may maximize the logarithm of (2.7):

$$\log \Pi_{z(n)}(\Theta_m) = \log \int_{\Theta_m} L(z(n), \varphi)\, d\pi(\varphi) - \log \int_{\Theta} L(z(n), \varphi)\, d\pi(\varphi). \qquad (5.3)$$

The second term is not a function of $m$, and therefore does not affect the maximization problem. Concentrating on the first term in (5.3), write the likelihood as a function of $m$ via reduction of the exponential family to its minimal form. Thus, since $\pi_{m'}(\Theta_m) = 0$ for $m' \neq m$, it suffices to show that the asymptotic expansion

$$\log \int_{\Theta_m} L_m(z(n), \varphi)\, d\pi(\varphi) = \log L_m(z(n), \hat\theta_m) - \frac{k_m}{2}\log|\Lambda_n| + O(1) \qquad (5.4)$$

holds uniformly on $\{z(n): \|\hat\theta_m - \theta\| < \delta\}$ for some $\delta > 0$ and for $\varphi \in \Theta_m$. This will imply that asymptotically, (∗1) and the Bayesian procedure choose the same model in $\mathcal{M}_2(m^*)$.

Choose $\delta > 0$ so that Lemma 1 holds for all $\varphi$ with $\|\varphi - \theta\| < 3\delta$. On the left-hand side of (5.4), write

$$\int_{\Theta_m} L_m(z(n), \varphi)\, d\pi(\varphi) = \alpha_m \int_{\|\varphi - \hat\theta_m\| \leq \delta} L_m(z(n), \varphi)\, d\pi_m(\varphi) + \alpha_m \int_{\|\varphi - \hat\theta_m\| > \delta} L_m(z(n), \varphi)\, d\pi_m(\varphi). \qquad (5.5)$$

Using the minimal exponential family version of the likelihood, we may write

$$\frac{L_m(z(n), \varphi)}{L_m(z(n), \hat\theta_m)} = \exp\Big\{|\Lambda_n|\big[\big(\varphi^T Y_m - b_m(\varphi)\big) - \big(\hat\theta_m^T Y_m - b_m(\hat\theta_m)\big)\big]\Big\}.$$

Now, $\varphi^T Y_m - b_m(\varphi)$ is concave in $\varphi$ for all $\varphi \in \Theta_m$; by Lemma 1, it is strictly concave in a small neighborhood of $\theta$. By Taylor expansion about the MLE, we have

$$\varphi^T Y_m - b_m(\varphi) = \hat\theta_m^T Y_m - b_m(\hat\theta_m) - \tfrac{1}{2}(\varphi - \hat\theta_m)^T\,\nabla^2 b_m(\varphi')\,(\varphi - \hat\theta_m)$$

for some $\varphi' \in \Theta_m$ satisfying $\|\varphi' - \hat\theta_m\| \leq \|\varphi - \hat\theta_m\|$. Then by Lemma 1 and concavity, for $\|\varphi - \hat\theta_m\| \geq \delta$ we have

$$\frac{L_m(z(n), \varphi)}{L_m(z(n), \hat\theta_m)} \leq \exp\{-\epsilon\,|\Lambda_n|\}, \qquad (5.6)$$

where $\epsilon = c\delta^2/2 > 0$. Hence the last integral term in (5.5) satisfies

$$\int_{\|\varphi - \hat\theta_m\| > \delta} \frac{L_m(z(n), \varphi)}{L_m(z(n), \hat\theta_m)}\, \alpha_m\, d\pi_m(\varphi) \leq \alpha_m \exp\{-\epsilon\,|\Lambda_n|\}. \qquad (5.7)$$

We must now investigate the logarithm of the first integral term in (5.5). Recall that $\pi_m$ has a density $\mu_m > 0$. Hence,

$$\int_{\|\varphi - \hat\theta_m\| \leq \delta} \frac{L_m(z(n), \varphi)}{L_m(z(n), \hat\theta_m)}\, d\pi_m(\varphi) = \int_{\|\varphi - \hat\theta_m\| \leq \delta} \exp\Big\{-\tfrac{|\Lambda_n|}{2}(\varphi - \hat\theta_m)^T\,\nabla^2 b_m(\varphi')\,(\varphi - \hat\theta_m)\Big\}\, \mu_m(\varphi)\, d\varphi.$$


Integrate by substitution using $u = \sqrt{|\Lambda_n|}\,(\varphi - \hat\theta_m)$, and we get

$$\int_{\|\varphi - \hat\theta_m\| \leq \delta} \frac{L_m(z(n), \varphi)}{L_m(z(n), \hat\theta_m)}\, d\pi_m(\varphi) = |\Lambda_n|^{-k_m/2} \int_{\|u\| \leq \delta\sqrt{|\Lambda_n|}} \exp\Big\{-\tfrac{1}{2}\, u^T\,\nabla^2 b_m(\varphi')\, u\Big\}\, \mu_m\big(\hat\theta_m + u/\sqrt{|\Lambda_n|}\big)\, du \leq c'\, |\Lambda_n|^{-k_m/2}$$

for some $c' > 0$. Therefore, as $n \to \infty$,

$$\log \int_{\|\varphi - \hat\theta_m\| \leq \delta} L_m(z(n), \varphi)\, d\pi_m(\varphi) \leq \log L_m(z(n), \hat\theta_m) - \frac{k_m}{2}\log|\Lambda_n| + O(1). \qquad (5.8)$$

By the same token,

$$\log \int_{\|\varphi - \hat\theta_m\| \leq \delta} L_m(z(n), \varphi)\, d\pi_m(\varphi) \geq \log L_m(z(n), \hat\theta_m) - \frac{k_m}{2}\log|\Lambda_n| + O(1). \qquad (5.9)$$

Thus (5.4) follows from (5.6), (5.7), (5.8), and (5.9).

Step 2: Let $M$ be the "largest" model (i.e., the model which can be reduced to any of the other candidate models), so that $\Theta_m \subseteq \Theta_M$ for all $m \in \mathcal{M}$. Note here that $\bar\Theta_M = \mathbb{R}^K$, and that we may write $\hat\theta_M = \hat\theta$ since $\hat\theta_M$ is a "global" MLE over the set $\mathcal{M}$.

We show that as $n \to \infty$, the Bayesian procedure will not choose any model $m \in \mathcal{M}_1(m^*)$. Since $\theta \notin \bar\Theta_m$, we have $\|\varphi - \theta\| \geq 3\delta$ for all $\varphi \in \Theta_m$ and some $\delta > 0$. Note that with large probability and for large $n$, $\|\hat\theta - \theta\| < \delta$, so that $\|\varphi - \hat\theta\| > \delta$ for all $\varphi \in \Theta_m$.

$$\int_{\Theta_m} L(z(n), \varphi)\, d\pi(\varphi) \leq \alpha_m \sup_{\varphi \in \Theta_m} L(z(n), \varphi). \qquad (5.10)$$

From the derivation of (5.6), it follows that

$$\sup_{\varphi \in \Theta_m} L(z(n), \varphi) \leq L(z(n), \hat\theta)\, \exp\{-\epsilon\,|\Lambda_n|\} \qquad (5.11)$$

for some $\epsilon > 0$. Therefore

$$\log \int_{\Theta_m} L(z(n), \varphi)\, d\pi(\varphi) \leq \log L(z(n), \hat\theta) - \epsilon\,|\Lambda_n| + O(1).$$

By (5.4),

$$\log \int_{\Theta_M} L(z(n), \varphi)\, d\pi(\varphi) = |\Lambda_n|\big[\hat\theta^T Y - b(\hat\theta)\big] - \frac{k_M}{2}\log|\Lambda_n| + O(1) = \log L(z(n), \hat\theta) - \frac{k_M}{2}\log|\Lambda_n| + O(1)$$

for large $n$. Hence, since $\epsilon\,|\Lambda_n|$ eventually dominates $\frac{k_M}{2}\log|\Lambda_n|$, the Bayes solution will not choose a model in $\mathcal{M}_1(m^*)$.

Step 3: We show that as $n \to \infty$, (∗1) will not choose any model $m \in \mathcal{M}_1(m^*)$. It follows from (5.6) that

$$C_m^{(1)} \leq \log L(z(n), \hat\theta) - \epsilon\,|\Lambda_n| - \frac{k_m}{2}\log|\Lambda_n| < C_M^{(1)}$$

for large $n$. Hence, (∗1) will not choose a model $m \in \mathcal{M}_1(m^*)$ as $n \to \infty$. □

Proof of Theorem 2. Again, let $M$ denote the largest model, so that $\hat\theta_M(L) = \hat\theta(L)$ is a "global" MCMLE over the set $\mathcal{M}$. Recall that $\mathbb{Q}_\theta$ is the joint measure of $P_\theta$ and $\mathbb{P}$.

Decompose $\mathcal{M}$ as in the proof of Theorem 1, and suppose that $m \in \mathcal{M}_1(m^*)$. In the same spirit used in (5.10), we may write

$$C_m^{(2)} = \ell_m\big(L, \hat\theta_m(L)\big) - \frac{k_m}{2}\log|\Lambda_n| \leq \sup_{\varphi \in \Theta_m} \ell(L, \varphi) - \frac{k_m}{2}\log|\Lambda_n|.$$

Note that $\hat\theta(L) \to \theta$ $\mathbb{Q}_\theta$-a.s. as $n \to \infty$ and $L \to \infty$.

For $m \in \mathcal{M}_1(m^*)$, $\theta \notin \bar\Theta_m$ and thus $\theta$ is some positive distance from $\Theta_m$. Similarly there is some positive distance between the MLE $\hat\theta$ and $\Theta_m$, as well as between $\hat\theta(L)$ and $\Theta_m$, because of their existence and uniqueness properties. Then (5.10), (5.11), and Lemma 2 imply that there exists some $\delta > 0$ such that

$$\sup_{\varphi \in \Theta_m} \ell(L, \varphi) \leq \ell\big(L, \hat\theta(L)\big) - \delta\,|\Lambda_n|.$$

Therefore,

$$C_m^{(2)} \leq C_M^{(2)} - \delta\,|\Lambda_n| + \frac{k_M - k_m}{2}\log|\Lambda_n| < C_M^{(2)}$$


for large $n$. Hence (∗2) will not choose a model $m \in \mathcal{M}_1(m^*)$ as $n \to \infty$ and $L \to \infty$.

Now, suppose $m \in \{m^*\} \cup \mathcal{M}_2(m^*)$. Then there exists a unique MCMLE $\hat\theta_m(L) \in \eta_\theta \cap \Theta_m$. For every large $n$, Theorem 1 holds $P_\theta$-a.s. Fix such a large $n$, and let $\hat m$ denote the model chosen by (∗1). By Lemma 2, $\ell_m\big(L, \hat\theta_m(L)\big) \to \log L_m(z(n), \hat\theta_m) + |\Lambda_n|\, b_m(\psi)$ $\mathbb{P}$-a.s. as $L = L_n \to \infty$. Hence, $C_m^{(2)}$ is close to $C_m^{(1)}$ plus some constant for each $m \in \{m^*\} \cup \mathcal{M}_2(m^*)$, $\mathbb{P}$-a.s. for large $L = L_n$; $C_{\hat m}^{(2)}$ is thus sifted out as the largest among $\{C_m^{(2)}: m \in \{m^*\} \cup \mathcal{M}_2(m^*)\}$. Therefore (∗2) chooses the same model as (∗1) $\mathbb{P}$-a.s. as $L = L_n \to \infty$; and in addition, (∗2) chooses the same model as the Bayes procedure $\mathbb{Q}_\theta$-a.s. as $n \to \infty$ and $L_n \to \infty$. □

References

C. O. Acuna (1992). Texture modeling using Gibbs distributions. CVGIP: Graphical Models and Image Processing 54 210-222.

O. Barndorff-Nielsen (1978). Information and Exponential Families in Statistical Theory. John Wiley & Sons, New York.

J. Besag (1974). Spatial interaction and the statistical analysis of lattice systems. J. Roy. Statist. Soc. Ser. B 36 192-236.

J. Besag (1986). On the statistical analysis of dirty pictures (with discussion). J. Roy. Statist. Soc. Ser. B 48 259-302.

J. Besag and P. J. Green (1993). Spatial statistics and Bayesian computation. J. Roy. Statist. Soc. Ser. B 55 24-38.

J. Besag, J. York, and A. Mollié (1991). Bayesian image restoration, with two applications in spatial statistics (with discussion). Ann. Inst. Statist. Math. 43 1-59.

L. D. Brown (1986). Fundamentals of Statistical Exponential Families with Applications in Statistical Decision Theory. IMS Lecture Notes - Monograph Series 9.

F. Comets (1992). On consistency of a class of estimators for exponential families of Markov random fields on the lattice. Ann. Statist. 20 455-468.

G. Cross and A. Jain (1983). Markov random field texture models. IEEE Trans. Patt. Anal. Machine Intell. 5 25-39.

H. Derin and H. Elliott (1987). Modeling and segmentation of noisy and textured images. IEEE Trans. Patt. Anal. Machine Intell. 9 39-55.

R. L. Dobrushin (1968). The problem of uniqueness of a Gibbs random field and the problem of phase transition. Functional Anal. and its Appl. 2 302-312.

R. S. Ellis (1985). Entropy, Large Deviations, and Statistical Mechanics. Springer-Verlag, New York.

D. Geman (1991). Random fields and inverse problems in imaging. Lecture Notes in Math. 1427 113-193. Springer-Verlag, New York.

D. Geman, S. Geman, C. Graffigne, and P. Dong (1990). Boundary detection by constrained optimization. IEEE Trans. Patt. Anal. Machine Intell. 12 609-628.

S. Geman and D. Geman (1984). Stochastic relaxation, Gibbs distributions, and Bayesian restoration of images. IEEE Trans. Patt. Anal. Machine Intell. 6 721-741.

S. Geman, D. Geman, and C. Graffigne (1987). Locating texture and object boundaries. In: P. A. Devijver and J. Kittler, Eds, Patt. Recog. Theory and Appl. Springer-Verlag, New York.

S. Geman and C. Graffigne (1986). Markov random field image models and their applications to computer vision. Proc. International Congress of Mathematicians, 1496-1517. Berkeley, CA.

H.-O. Georgii (1988). Gibbs Measures and Phase Transitions. Walter de Gruyter, Berlin-New York.

C. J. Geyer and E. A. Thompson (1992). Constrained Monte Carlo maximum likelihood for dependent data. J. Roy. Statist. Soc. Ser. B 54 657-699.

B. Gidas (1993). Parameter estimation for Gibbs distributions from fully observed data. In: R. Chellappa and A. Jain, Eds, Markov Random Fields: Theory and Applications. Academic Press, New York, 471-498.

M. Hassner and J. Sklansky (1980). The use of Markov random fields as models of texture. Computer Graphics and Image Processing 12 357-370.

R. Hu and M. M. Fahmy (1992). Texture segmentation based on a hierarchical Markov random field model. Signal Processing 26 285-305.

A. Karr (1991). Statistical models and methods in image analysis: A survey. In: I. V. Basawa and N. U. Prabhu, Eds, Inference for Stochastic Processes. Marcel Dekker.

A. Rosenfeld (1993). Image modeling during the 1980's: A brief overview. In: R. Chellappa and A. Jain, Eds, Markov Random Fields: Theory and Applications. Academic Press, New York, 1-10.

D. Ruelle (1978). Thermodynamic Formalism. Addison-Wesley, Reading, Massachusetts.

G. Schwarz (1978). Estimating the dimension of a model. Ann. Statist. 6 461-464.

L. Seymour (1993). Parameter estimation and model selection in image analysis using Gibbs-Markov random fields. Ph.D. Dissertation, Department of Statistics, The University of North Carolina.

B. Simon (1979). A remark on Dobrushin's uniqueness theorem. Comm. in Math. Phys. 68 183-185.

A. F. M. Smith and G. O. Roberts (1993). Bayesian computation via the Gibbs sampler and related Markov chain Monte Carlo methods. J. Roy. Statist. Soc. Ser. B 55 3-23.


parameter    true value   MPLE       MC(5)       MC(100)    MC(500)
$\beta_1$    .01          .007899    -.338189    .005851    .005171
$\beta_2$    .1           .099767    -.571546    .099162    .099127
$\beta_1$    .1           .096094    -.506890    .096130    .096169
$\beta_2$    .01          .011600    -1.10292    .009396    .009783
$\beta_1$    1.0          .996524    NC          1.00293    .997085
$\beta_2$    .1           .105420    NC          .091354    .095746
$\beta_1$    .1           .109033    NC          NC         .086748
$\beta_2$    1.0          .956369    NC          NC         1.01163

Table 1: MCMLE for $m_2$

True parameters: $\beta_1 = .01$, $\beta_2 = .1$

MCMC sample size   model   $\beta_1$   $\beta_2$   $C_m^{(2)}$
100                1       .05197      -           52
                   **2     .00483      .09988      90
                   3       .05300      .00037      47
500                1       .05181      -           52
                   **2     .00493      .09955      90
                   3       .05224      -.00041     47

True parameters: $\beta_1 = .1$, $\beta_2 = .01$

MCMC sample size   model   $\beta_1$   $\beta_2$   $C_m^{(2)}$
100                1       .05243      -           52
                   **2     .09620      .00976      83
                   3       .05236      .00187      46
500                1       .05222      -           52
                   **2     .09568      .00967      83
                   3       .05236      .00082      46

Table 2: Model selection under (A2)


True parameters: $\beta_1 = .1$, $\beta_2 = 1$

MCMC sample size   model   $\beta_1$   $\beta_2$   $C_m^{(2)}$
100                **1     .49334      -           6342
                   2       NC          NC          NC
                   3       NC          NC          NC
500                1       .49338      -           6343
                   **2     .09718      .98382      7869
                   3       NC          NC          NC

True parameters: $\beta_1 = 1$, $\beta_2 = .1$

MCMC sample size   model   $\beta_1$   $\beta_2$   $C_m^{(2)}$
100                1       .49715      -           6473
                   **2     1.0017      .09577      8237
                   3       NC          NC          NC
500                1       .49723      -           6474
                   **2     .95175      .06875      8239
                   3       NC          NC          NC

Table 3: Model selection not under (A2)


Textures for Figure 4 follow. Captions for Figure 4, respectively:

a) $m_1$, $\beta = 0.1$
b) $m_1$, $\beta = 1.0$
c) $m_1$, $\beta = -1.0$
d) $m_2$, $\beta_1 = 1.0$, $\beta_2 = 0.1$
e) $m_2$, $\beta_1 = 1.0$, $\beta_2 = -1.0$
f) $m_3$, $\beta = 1.0$, $\gamma = -1.0$
