Large sample property of the bayes factor in a spline semiparametric regression model

12
Mathematical Theory and Modeling www.iiste.org ISSN 2224-5804 (Paper) ISSN 2225-0522 (Online) Vol.3, No.11, 2013 91 Large Sample Property of The Bayes Factor in a Spline Semiparametric Regression Model Ameera Jaber Mohaisen and Ammar Muslim Abdulhussain Mathematics Department College of Education for Pure Science AL-Basrah University-Iraq E-mail: [email protected] Abstract In this paper, we consider semiparametric regression model where the mean function of this model has two part, the first is the parametric part is assumed to be linear function of p-dimensional covariates and nonparametric ( second part ) is assumed to be a smooth penalized spline. By using a convenient connection between penalized splines and mixed models, we can representation semiparametric regression model as mixed model. In this model, we investigate the large sample property of the Bayes factor for testing the polynomial component of spline model against the fully spline semiparametric alternative model. Under some conditions on the prior and design matrix, we identify the analytic form of the Bayes factor and show that the Bayes factor is consistent. Keywords: Mixed Models, Semiparametric Regression Model, Penalized Spline, Bayesian Model, , Marginal Distribution, Prior Distribution, Posterior Distribution, Bayes Factor, Consistent. 1. Introduction In many applications in different fields, we need to use one of a collection of models for correlated data structures, for example, multivariate observations, clustered data, repeated measurements, longitudinal data and spatially correlated data. Often random effects are used to describe the correlation structure in clustered data, repeated measurements and longitudinal data. Models with both fixed and random effects are called mixed models. The general form of a linear mixed model for the i th subject (i = 1,…, n) is given as follows (see [10,13,15]), = +∑ + , ~0, , ~(0, ) (1) where the vector has length , and are, respectively, a × design matrix and a × design matrix of fixed and random effects. β is a p-vector of fixed effects and u ! are the q -vectors of random effects. The variance matrix is a × matrix and is a × matrix. We assume that the random effects { ; % = 1,…,( ; ) = 1,…,*} and the set of error terms { ,…, , } are independent. In matrix notation (see [13,15]), = + + (2) Here = ( ,…., , ) . has length = ∑ , , = ( . ,…, , . ) . is a × design matrix of fixed effects, Z is a × block diagonal design matrix of random effects, q= ∑ q ! / ! , u = (u 0 ,…,u / 0 ) 0 is a q-vector of random effects, = 1%23( ,…, , ) is a × matrix and = 1%23( ,…, ) is a × block diagonal matrix. In this paper, we consider semiparametric regression model (see [1,6,8,9,10,13,15]), for which the mean function has two part, the parametric ( first part ) is assumed to be linear function of p-dimensional covariates and nonparametric ( second part) is assumed to be a smooth penalized spline. By using a convenient connection between penalized splines and mixed models, we can representation semiparametric regression model as mixed model. In this model we investigate large sample properties of the Bayes factor for testing the pure polynomial component of spline null model whose mean function consists of only the polynomial component against the fully spline semiparametric alternative model whose mean function comprises both the pure polynomial and the component spline basis functions. The asymptotic properties of the Bayes factor in nonparametric or semiparametric models have been studied mainly in nonparametric density estimation problems related to goodness of fit testing. These theoretical results include (see [4,7,12,14]). Compared to the previous approaches to density estimations problems for goodness of fit testing and model selection, little work has been done on nonparametric regression problems. Choi and et al 2009 studied the semiparametric additive regression models as the encompassing model with algebraic smoothing and obtained the closed form of the Bayes factor and studied the asymptotic behavior of the Bayes factor based on the closed form ( see [3] ). In this paper, we obtain the closed form and studied the asymptotic behavior of the Bayes factor in spline semiparametric regression model and we proved that the Bayes factor converges to infinity under the pure polynomial model and the Bayes factor converges to zero almost surely under the spline semiparametric regression alternative and we show that the Bayes factor is consistent.

description

International peer-reviewed academic journals call for papers, http://www.iiste.org

Transcript of Large sample property of the bayes factor in a spline semiparametric regression model

Page 1: Large sample property of the bayes factor in a spline semiparametric regression model

Mathematical Theory and Modeling www.iiste.org

ISSN 2224-5804 (Paper) ISSN 2225-0522 (Online)

Vol.3, No.11, 2013

91

Large Sample Property of The Bayes Factor in a Spline

Semiparametric Regression Model

Ameera Jaber Mohaisen and Ammar Muslim Abdulhussain

Mathematics Department College of Education for Pure Science AL-Basrah University-Iraq

E-mail: [email protected]

Abstract

In this paper, we consider semiparametric regression model where the mean function of this model has two part,

the first is the parametric part is assumed to be linear function of p-dimensional covariates and nonparametric

( second part ) is assumed to be a smooth penalized spline. By using a convenient connection between penalized

splines and mixed models, we can representation semiparametric regression model as mixed model. In this

model, we investigate the large sample property of the Bayes factor for testing the polynomial component of

spline model against the fully spline semiparametric alternative model. Under some conditions on the prior and

design matrix, we identify the analytic form of the Bayes factor and show that the Bayes factor is consistent.

Keywords: Mixed Models, Semiparametric Regression Model, Penalized Spline, Bayesian Model, , Marginal

Distribution, Prior Distribution, Posterior Distribution, Bayes Factor, Consistent.

1. Introduction

In many applications in different fields, we need to use one of a collection of models for correlated data

structures, for example, multivariate observations, clustered data, repeated measurements, longitudinal data and

spatially correlated data. Often random effects are used to describe the correlation structure in clustered data,

repeated measurements and longitudinal data. Models with both fixed and random effects are called mixed

models. The general form of a linear mixed model for the ith

subject (i = 1,…, n) is given as follows (see

[10,13,15]), �� = ��� + ∑ ���� � + �� , ��~��0, �� , ��~�(0, ��) (1)

where the vector �� has length �� , �� and � are, respectively, a �� × � design matrix and a �� × �� design

matrix of fixed and random effects. β is a p-vector of fixed effects and u ! are the q -vectors of random effects.

The variance matrix � is a �� × �� matrix and �� is a �� × �� matrix.

We assume that the random effects { �� ; % = 1, … , ( ; ) = 1, … , *} and the set of error terms {��, … , �,} are

independent. In matrix notation (see [13,15]), � = �� + � + � (2)

Here � = (��, … . , �,). has length � = ∑ ��,� � , � = (��. , … , �,.). is a � × � design matrix of fixed effects, Z

is a � × � block diagonal design matrix of random effects, q = ∑ q!/! � , u = (u�0, … , u/0)0 is a q-vector of

random effects, � = 1%23(��, … , �,) is a � × � matrix and � = 1%23(��, … , ��) is a � × � block diagonal

matrix. In this paper, we consider semiparametric regression model (see [1,6,8,9,10,13,15]), for which the mean

function has two part, the parametric ( first part ) is assumed to be linear function of p-dimensional covariates

and nonparametric ( second part) is assumed to be a smooth penalized spline. By using a convenient connection

between penalized splines and mixed models, we can representation semiparametric regression model as mixed

model. In this model we investigate large sample properties of the Bayes factor for testing the pure polynomial

component of spline null model whose mean function consists of only the polynomial component against the

fully spline semiparametric alternative model whose mean function comprises both the pure polynomial and the

component spline basis functions. The asymptotic properties of the Bayes factor in nonparametric or

semiparametric models have been studied mainly in nonparametric density estimation problems related to

goodness of fit testing. These theoretical results include (see [4,7,12,14]). Compared to the previous approaches

to density estimations problems for goodness of fit testing and model selection, little work has been done on

nonparametric regression problems. Choi and et al 2009 studied the semiparametric additive regression models

as the encompassing model with algebraic smoothing and obtained the closed form of the Bayes factor and

studied the asymptotic behavior of the Bayes factor based on the closed form ( see [3] ).

In this paper, we obtain the closed form and studied the asymptotic behavior of the Bayes factor in spline

semiparametric regression model and we proved that the Bayes factor converges to infinity under the pure

polynomial model and the Bayes factor converges to zero almost surely under the spline semiparametric

regression alternative and we show that the Bayes factor is consistent.

Page 2: Large sample property of the bayes factor in a spline semiparametric regression model

Mathematical Theory and Modeling www.iiste.org

ISSN 2224-5804 (Paper) ISSN 2225-0522 (Online)

Vol.3, No.11, 2013

92

2. Semiparametric and Penalized Spline

Consider the model: 4� = ∑ �5� + �(567�,�6 8 ) + �� , % = 1,2, … , ( (3)

Where 4� , … , 4, response variables and the unobserved errors are �� , … , �, are known to be i.i.d. normal with

mean 0 and covariance :;<= with :;< known.

The mean function of the regression model in (3) has two parts. The parametric ( first part ) is assumed to be

linear function of p-dimensional covariates 5� and nonparametric (second part) �(567�,�) is function defined on

some index set > ⊂ ��.

The model (3) can be expressed as a smooth penalized spline with q degree, then it's become as: 4� = ∑ �5� + ∑ �67567�,�@ �6 8 + ∑ {�A(567�,� − CADA � )7@ + �� (4)

Where C�, … , CD are inner knots 2 < C� < , , , < CD < F.

By using a convenient connection between penalized splines and mixed models. Model (4) is rewritten as

follows (see [13,15])

� = �� + � + � (5)

where

� = G4�⋮4,I , � = JKKKKL �8⋮�6�67�⋮�67@MN

NNNO , � = G��⋮�DI , = P(567�,� − C�)7@ ⋯ (567�,� − CD)7@⋮ ⋱ ⋮(567�,, − C�)7@ ⋯ (567�,, − CD)7@

S

� = JKKKL1 5�� … 56�1 5�< … 56<

567�,� … 567�,�@567�,< … 567�,<@

⋮ ⋮ ⋱ ⋮1 5�, … 56, ⋮ ⋱ ⋮567�,, … 567�,,@ MN

NNO

We assume ( < C + � + � + 1, ( + T = C + � + � + 1, (T is integer number and greater than 1 ), and we

assume that the design matrix is orthogonal in the following way: U,. U, = (=, ∀( ≥ 1 (6)

where:

U, = ( � , �) and � be the ( − (� + � + 1) × ( − (� + � + 1) matrix, and let = � + X

Based on the above setup for regression model (5) and the assumption of the orthogonal design matrix (6), we

consider a Bayesian model selection problem in a spline semiparametric regression problem. Specifically, we

would like to choose between a Bayesian spline semiparametric model and its pure polynomial counterpart by

the criterion of the Bayes factor for two hypotheses, Y8: � = �� + � versus Y�: � = �� + � + � (7) As for the prior of �, 2(1 � under Y�, we assume � and u are independent and � ~ �(0, ∑8) , ∑8 = :[<=67@7� (8) � ~ �(0, ∑�) , ∑� = :\<=D (9)

3. Bayes factor and marginal distribution

The response 4� in (4) follows the normal distribution with mean ]� and variance :;< , where ]� = ∑ �5� +6 8)=1��)5�+1,%)+C=1^{�C(5�+1,%−^C)+�, for, %=1,2,…,(. Thus, given covariates and ](=(]1, …, ]()>, the

n-dimensional response vector �, = (4� , … , 4,). follows the n-dimensional normal distribution with mean ],

and covariance matrix :;<=, where =, is the ( × ( identity matrix. Also from the prior distributions specified in

(8) and (9), we can deduce the joint distribution of ], is the multivariate normal distribution (U_�) with mean

zero and ( × ( covariance matrix �∑8�. + ∑�. .

In summary, we have the following:- �,|], ~ U_�,(],, :;<=,) , ], ~ U_�,(0,, �∑8�. + ∑�.) (10)

Result 1:

Suppose the distribution of ], and �, are given by (10). Then the posterior distribution of P(],|�,)is the

multivariate normal with the following mean and variance. a(],|�, , :;<) = ∑<(∑< + :;<=, )b��,

Page 3: Large sample property of the bayes factor in a spline semiparametric regression model

Mathematical Theory and Modeling www.iiste.org

ISSN 2224-5804 (Paper) ISSN 2225-0522 (Online)

Vol.3, No.11, 2013

93

c2*(],|�, , :;<) = (∑<b� + (:;<=, )b�)b� = ∑<(∑< + :;<=, )b�:;<=,

Where ∑< = �∑8�. + ∑�. . Furthermore, marginally �, follows

�,~ U_�(0, :;<=, + ∑<) (11)

Note the (%, ))th element of ∑<, ∑<(%, )) = ∑ :[<67@7�A � 5�A5A + ∑ :\<d�AdADA � , %, ) = 1, … , ( (12)

where d�A = (567�,� − CA)7@

Applications of the previous result yield that, the marginal distribution of �, under Y8 and Y� are respectively �,(0, _8) and �,(0, _�), where _8 = :;<=, + �∑8�. _� = :;<=, + �∑8�. + ∑�. Hence, the Bayes factor for testing problem (7) is given by:

e8� = �(�|Y8)�(�|Y�) = 1f2gdet (_8) exp { − 12 �,._8b��,}1f2gdet (_�) exp { − 12 �,._�b��,}

= fmno (pq)fmno (pr) exp{− �< �,.( _8b�−_�b�)�,} (13)

4. Asymptotic behavior of Bayes factor

The Bayes factor is said to be consistent if ( see[3,14]) s%�,→u e8� = ∞ , 2. T. w8u under the Y8, (14) s%�,→u e8� = 0 , 2. T. w�u under the Y�, (15)

We investigate the consistency of the Bayes factor, by establishing (14) and (15), under the spline

semiparametric regression model described in the section 2. By truncating the regression up to the first ( terms.

We get 4� = ∑ �5� + ∑ �67567�,�@ �6 8 + ∑ {�A(567�,� − CA,b(67@7�)A � )7@ } + �� Now define _�,, as the covariance matrix of the marginal distribution of �, under Y� as:- _�,, = :;<=, + �∑8�. + �∑�,��.

= :;<=, + U,x,U,. (16)

Where ∑�,� is diagonal matrix with diagonal elements :\< from size ( − (� + � + 1) × ( − (� + � + 1) and

x, = y∑8 0��0�� ∑�,�z

where 0��, 0�� are the matrices for zeros element with size (� + � + 1) × ( − (� + � + 1) and ( − (� + � +1) × (� + � + 1), respectively.

Then, we can rewrite (16) by using x, as: _�,, = U, { |}~, =, + x, � U,. (17)

Let _8 be the covariance matrix of the marginal distribution of �, under Y8, then _8 can be represented as: _8 = :;<=, + �∑8�. + �0���.

= :;<=, + U,�,U,. where: �, = y ∑8 0��0�� 0��z

where 0�� is the matrix for zeros element with size ( − (� + � + 1) × ( − (� + � + 1), then we can rewrite _8

as the following: _8 = U, { |}~, =, + �, � U,. (18)

Then we can use the two matrices above (_8 , _�,,) for matrix inversions and calculating determinants as shown

in following lemma.

Lemma1: Let _�,, and _8 be defined as in (17) and (18) respectively, and suppose U, satisfied (6). Then, the

following hold.

i) U,U,. = U,. U, = (=,

ii) det(_8) = (67@7�(|}~, + :[<)67@7� :;<(,b(67@7�)) iii) det�_�,,� = (, (|}~, + :[<)67@7�(|}~, + :\<),b(67@7�) iv) _8b� = �,~ U, { |}~, =, + �, �b� U,.

Page 4: Large sample property of the bayes factor in a spline semiparametric regression model

Mathematical Theory and Modeling www.iiste.org

ISSN 2224-5804 (Paper) ISSN 2225-0522 (Online)

Vol.3, No.11, 2013

94

v) _�,,b� = �,~ U, { |}~, =, + x, �b� U,. Proof:-

i) By multiplying U, and U,. to equation (6) from right and left side, we have:- U,U,.U,U,. = U,(=,U,.

= U,(U,.

= (U,U,.

Multiplying (U,U,.)b� to the above equation from left side, we get:- U,U,.U,U,.(U,U,.)b� = (U,U,.(U,U,.)b� U,U,. = (=,

ii) _8 = U, { |}~, =, + �, � U,., and by using (6), we get

= �� , ��JKKKKKKL(|}~, + :[<) 0 ⋯ 0 0

0 (|}~, + :[<) ⋯ 0 0⋮00

⋮00⋱ ⋮ ⋮⋯ |}~, 0⋯ 0 |}~, MN

NNNNNO �� , ��.

= JKKKKKL((|}~, + :[<) 0 ⋯ 0 0

0 ((|}~, + :[<) ⋯ 0 0⋮00⋮00

⋱ ⋮ ⋮⋯ :;< 0⋯ 0 :;<MNNNNNO

Then

det(_8) = (67@7� (|}~, + :[<)67@7�:;<(,b(67@7�))

iii) _�,, = U, { |}~, =, + x, � U,.

= �� , ��JKKKKKKL(|}~, + :[<) 0 ⋯ 0 0

0 (|}~, + :[<) ⋯ 0 0⋮00

⋮00⋱ ⋮ ⋮⋯ |}~, + :\< 0⋯ 0 |}~, + :\<MN

NNNNNO �� , ��.

= JKKKKKKL((|}~, + :[<) 0 ⋯ 0 0

0 ((|}~, + :[<) ⋯ 0 0⋮00

⋮00⋱ ⋮ ⋮⋯ ((|}~, + :\<) 0⋯ 0 ((|}~, + :\<)MN

NNNNNO

Then

det�_�,,� = (, (:;<( + :[<)67@7�(:;<( + :\<),b(67@7�) iv) _8b� = {U, { |}~, =, + �, � U,. }b�

= U,.b� { |}~, =, + �, �b� U,b�

= ���q, (=, U,.b� { |}~, =, + �, �b� U,b�(=, ���q

,

= �,~ U,U,. U,.b� { |}~, =, + �, �b� U,b�U,U,. by lemma 1.i

= �,~ U, { |}~, =, + �, �b� U,.

Page 5: Large sample property of the bayes factor in a spline semiparametric regression model

Mathematical Theory and Modeling www.iiste.org

ISSN 2224-5804 (Paper) ISSN 2225-0522 (Online)

Vol.3, No.11, 2013

95

v) _�b� = {U, { |}~, =, + x, � U,. }b�

= U,.b� { |}~, =, + x, �b� U,b�

= ���q, (=, U,.b� { |}~, =, + x, �b� U,b�(=, ���q

,

= �,~ U,U,. U,.b� { |}~, =, + x, �b� U,b�U,U,.

= �,~ U, { |}~, =, + x, �b� U,.

Now let e�8� = fmno (pq,�)fmno (pr) exp{− �< �,.( _8b�−_�,,b�)�,} (19)

Which an approximation of e8� with _� replaced by _�,, in (13)

Thus,

s�3 e�8� = 12 s�3 det (_�,,)det (_8) − 12 �,.( _8b�−_�,,b�)�,

2 log e�8� = log det (_�,,)det (_8) − �,.( _8b�−_�,,b�)�,

By results of lemma1 we have:-

2 log e�8� = log (, (:;<( + :[<)67@7�(:;<( + :\<),b(67@7�) (67@7� (:;<( + :[<)67@7�:;<

(,b(67@7�)) − 1(< ��,.U, �� :;<( =, + �, �b� − � :;<( =, + x, �b�� U,.�,�

= log ,��(����q) (�}~� 7|�~)��(����q) |}~(��(����q)) − �,~ {�,.U, y0(67@7�)×(67@7�) 0��0�� �� z U,.�,}

= log( |}~7,|�~ |}~ ),b(67@7�) − �,.�,�,}

where �, = 1(< U, y0(67@7�)×(67@7�) 0��0�� �� z U,.

and �� = x%23 ( ,~|�~(|}~7,|�~)|}~ )

To establish the consistency of the Bayes factor we focus on e�8� , first and the remaining terms will be

considered later. Without loss of generality we assume :;< = 1 in the remainder of this paper.

Now let �,,� = ∑ ,|�~(�7,|�~),� � = ,~|�~(�7,|�~) (20) �,,< = ∑ log(1 + (:\<),� � = ( log(1 + (:\<) (21) �,,� = ∑ ( ,|�~�7,|�~)<,� � = (�( |�~�7,|�~)< (22)

Lemma2: let �, = (4� , … , 4,). where 4� ′T are independent normal random variable with mean ∑ �5� +6 8)=1���+)5�+1,%) and variance :�2=1, then there exist a positive constant �1 such that

���,q { ∑ log(1 + (:\<) − ,b(67@7�)� � �,.�,�,� > ��, with probability tending to 1.

This implies that, under Y8.

log e�8� ,→u���� ∞, in probability. Proof

Let �, = ( ��, �<, … , �,) are independent standard normal random variables, Note that �, �, � + �� and �,.�,�, = (�, + ��).�, (�, + ��)

= �,.�,�, + (��).�,(��) + �,.�,�� + (��).�,�,.

By the orthogonality of the design matrix

Page 6: Large sample property of the bayes factor in a spline semiparametric regression model

Mathematical Theory and Modeling www.iiste.org

ISSN 2224-5804 (Paper) ISSN 2225-0522 (Online)

Vol.3, No.11, 2013

96

(��).�, = (��). �,~ U, y0(67@7�)×(67@7�) 0��0�� �� z U,. = 0.

Then �,.�,�, = �,.�,�,.

Using the expectation formula of the quadratic form given in (12), we obtain

0 < a(�,.�,�,) = �*(�,) = � (:\<(1 + (:\<),

� �= �,,�

Note that �,< = 1(  U, y0(67@7�)×(67@7�) 0��0�� �� z U,.U, y0(67@7�)×(67@7�) 0��0�� �� z U,. = �,¡ U, y0(67@7�)×(67@7�) 0��0�� ��< z U,.

and �*(�,<) = �,~ ∑ ( ,~|�~�7,|�~)<,� � = ∑ ( ,|�~�7,|�~)<,� � = �,,�.

Similarly, using the variance formula of the quadratic form of the multivariate normal variables, we have c2*(�,.�,�,) = 2�*(�,<) = 2 ∑ ( ,|�~�7,|�~)<,� � = 2�,,�.

Let ¢, = ��,~b��,q<��,q . Then,

w* { ���,q { ∑ log(1 + (:\<)< − ,b(67@7�)� � �,.�,�,� ≤ ¢, }

= w* � { �,.�,�,}�,,� ≥ �,,<�,,� − ¢, � = w* � { �,.�,�,}�,,� − a(�,.�,�,)�,,� ≥ �,,<�,,� − ¢, − a(�,.�,�,)�,,� � = w* � 1�,,� {(�,.�,�,) − a(�,.�,�,)} ≥ �,,< − a(�,.�,�,)�,,� − ¢, � ≤ ���,q~ ¤¥��¦�§¨�¦��

(©�,~�©�,q~©�,q )~ = ª��,¡(��,~b��,q)~ ,→u���� 0,

where the last statement holds from the Chebyshev inequality.

Then �,,< − �,,� ≥ ¢��,,� for some ¢� > 0 and let ¢, ≥ ¢�/2.

Hence, we conclude that there exists a positive constant �� = ¢�/2 such that ���,q { ∑ log(1 + (:\<)< − ,b(67@7�)� � �,.�,�,� > ��, with probability tending to 1.

Consequently, it follows that 2 log e�8� = ∑ log(1 + (:\<)< − ,b(67@7�)� � �,.�,�, ,→u���� ∞ in probability under Y8 log e�8� ,→u���� ∞

Lemma3

Suppose we have the penalized spline smoothing, then 1�,,� �*� _�,,b� − _�b� � ,→u���� 0

Proof:

Since _� − _�,, is the covariance matrix of ]∗(5) = ∑ dA,7XA ,b(67@)7� �A = X� it is a positive definite matrix.

Thus _�,,b� − _�b� is also a positive definite matrix. Thus, �*� _�,,b� � ≥ �*( _�b�), also from the matrix analogue

of the Cauchy-Schwarz inequality, we have

�*�X. _�,,b� � ≤ ­�*(X. X)�*�_�,,b<� = ®X® ¯_�,,b�¯ where

¯_�,,b�¯ = { �*�_�,,b<� }�< = � 1(< { � °:;<( + :[<±b<67@7�� �

+ � °:;<( + :\<±b<,b(67@7�)� �

}� �<

Note that

Page 7: Large sample property of the bayes factor in a spline semiparametric regression model

Mathematical Theory and Modeling www.iiste.org

ISSN 2224-5804 (Paper) ISSN 2225-0522 (Online)

Vol.3, No.11, 2013

97

∑ ²|}~, + :[<³b<67@7�� � = 67@7�´�}~� 7|µ~¶~ = 67@7�

·�}~�� �µ~� ¸~ = ,~(67@7�)(|}~7, |µ~)~ ≤ 67@7�|µ¹ = º(1)

And

� °:;<( + :\<±b<,b(67@7�)� �

= ( − (� + � + 1)´:;<( + :\<¶< ≤ (

´:;<( + :\<¶< = (�(:;< + (:\<)< ≤ (�((:\<)< = º(() Thus, it follows that ¯_�,,b�¯ = »º ²�,³¼� <⁄ = º(()b� <⁄

Since sup¿∈�8,��®X® < 1 from the distances between truncated power functions, it follows that ®X®< = ∑ XA< (5�),� � ≤ ( , where XA< (5�) = (567�,� − CA)7@ is element in X

Thus, ���,q �*� _�,,b� − _�b� � = ���,q �*� _�,,b�( _� − _�,,) _�b� �

= ���,q ∑ :\<(A.,7XA ,b(67@7�) _�,,b�(A. _�b�).)

≤ |�~��,q ¯_�,,b�¯< ∑ ®A®<,7XA ,b(67@7�) ≤ ,|�~, ��,q ∑ 1,7XA ,b(67@7�) ≤ D |�~ ��,q = D|�~�~��~(q����~ )

= D(�7,|�~),~ Thus 1�,,� �*� _�,,b� − _�b� � ≤ ^(1 + (:\<)(< ,→u���� 0

Theorem(1)

Consider the testing problem in (7), suppose that the true model is Y8 ( pure polynomial model ) and let w8,

denote the true distribution of the whole data, with p.d.f w8(4|�8) = ∅{( 4 − �.�8)/ :;}, where ∅(∙) is the

standard normal density, then, the Bayes factor is consistent under the null hypothesis Y8: lim,→u e8� = ∞ in w8, probability.

Proof log e8� = 12 log det (_�)det (_8) − 12 �,.( _8b�−_�b�)�, = �< log mno (pq)mno (pq,�) + �< log mno (pq,�)mno (pr) − �< �,.( _8b�−_�,,b�)�, − �< �,.(_�,,b�−_�b�)�, = loge�8� + �< log mno (pq)mno (pq,�) − �< �,.(_�,,b�−_�b�)�,

= loge�8� + Å� + Å< where Å� = �< log mno (pq)mno (pq,�) , Å< = − �< �,.(_�,,b�−_�b�)�, Since _� − _�,, is a positive definite matrix, 1Æ� (_�) ≥ 1Æ� (_�,,), thus Å� ≥ 0 a( �,.�_�,,b�−_�b�)�, � = :;< �*� _�,,b� − _�b�� + (��).� _�,,b� − _�b�� ��, since _�,,b� − _�b�, _�,,b� and _�b� are all nonnegative definite matrices, it follows that 0 ≤ (��).(_�,,b� − _�b�) �� ≤ (��). _�,,b� ��. By orthogonality of the design matrix (6), we have �.U, = U,.� = ( � =67@7� , 0,b(67@7�)�. Thus similarly to the proof of lemma (3), we get:

(��). _�,,b� �� = �,~ (��).U, { |}~, =, + x, �b� U,. ��

= �,~ �.( � =67@7� , 0,b(67@7�)� { |}~, =, + x, �b� ( � =67@7� , 0,b(67@7�)� �

= �. � =67@7� , 0,b(67@7�)� { |}~, =, + x, �b� � =67@7� , 0,b(67@7�)� �

= ∑ ��<67@7�� � { |}~, + :[< �b� = º(1). Let Ç > 0, by the Markov inequality when r = 1 and lemma (3), we have:

�* { 1�,,� �,.�_�,,b�−_�b�)�, > Ç È ≤ 1�,,� { a( �,.�_�,,b�−_�b�)�, �Ç }

Page 8: Large sample property of the bayes factor in a spline semiparametric regression model

Mathematical Theory and Modeling www.iiste.org

ISSN 2224-5804 (Paper) ISSN 2225-0522 (Online)

Vol.3, No.11, 2013

98

≤ ���,q É��pq,��qbpq�q �Ê + ���,q (Ë[)§ pq,��q Ë[Ê ,→u���� 0.

Note that _�,,b� − _�b� is nonnegative definite and thus, �,.(_�,,b�−_�b�)�, is nonnegative random variable.

Therefore, Å< = ���,,��. The above results with lemma (2) imply there exists a constant �� such that 1�,,� log e8� ≥ 1�,,� log e�8� + �(1)

> �� + �(1). That is, e8� ≥ exp( �,,�{ �� + �(1)}) ,→u���� ∞ in w8, probability.

This completes the proof.

Now we consider asymptotic behavior of the Bayes factor under Y� and assuming that the true model for �, is

in Y�.

Lemma(4):

Let �, = (4� , … , 4,) where 4� ′T are independent normal random variables with mean ∑ �5� +6 8)=1���+)5�+1,%)+C=1(−(�+�+1)d%C�C and variance :�2=1 and let Ì(= a �( > �( a �( . Then, lim,→u �Í� �,.�,�, = 1, in w�u probability.

Proof:

From the moment formula of the quadratic form of normal random variables, the expectation and variance of

quadratic form �,.�,�, are given by: a(�,.�,�, ) = �* ( �, ) + a( �, ). �, a( �, ) = ∑ ,|�~(�7,|�~),b(67@7�)� � + a( �, ). �, a( �, ) = �,,� + Ì, , and c2*(�,.�,�, ) = 2�* � �,< � + 4a( �, ). �,< a( �, ). Note Ì, = a( �, ). �, a( �, )

= � �. , �,b(67@7�). �U,. �,~ U, y0(67@7�)×(67@7�) 0��0�� �� z U,. U,� �. , �,b(67@7�). �.

= � �. , �,b(67@7�). � (. �,~ y0(67@7�)×(67@7�) 0��0�� �� z ( � �. , �,b(67@7�). �.

= � �. , �,b(67@7�). � y0(67@7�)×(67@7�) 0��0�� �� z � �. , �,b(67@7�). �.

= �Ab(67@7�). �� �,b(67@7�) = ∑ ,~|�~\Ï~�7,|�~,b(67@7�)� � ,

since T��� �� < ∞ , we have Ì, ≤ T��� ��< ( ∑ ,|�~�7,|�~,� � = º�( �,,� �.

On the other hand, consider a fixed positive integer � ≥ 1 such that �Ð< > 0 and a sufficiently large ( with ,|�~�7,|�~ ≥ �<. For such ( and �.

Ì, = � (<:\<��<1 + (:\<,b(67@7�)

� �≥ (<:\<1 + (:\< �Ð<

= ( ,|�~�7,|�~ �Ð< ≥ ( \Ñ~< .

Further, similarly to the previous calculation, we have

a(�,).�,<a(�,) = ��. , �,b(67@7�). �U,. 1(� U, y0(67@7�)×(67@7�) 0��0�� ��< z U,.U,��. , �,b(67@7�). �. = 1( �,b(67@7�). ��,,b(67@7�)< �,b(67@7�) = �, ∑ ,¹|�¹\Ï~(�7,|�~)~,b(67@7�)� � .

As in case of a( �, ). �, a( �, )

Page 9: Large sample property of the bayes factor in a spline semiparametric regression model

Mathematical Theory and Modeling www.iiste.org

ISSN 2224-5804 (Paper) ISSN 2225-0522 (Online)

Vol.3, No.11, 2013

99

a( �, ).�,< a( �, ) ≤ ∑ ,¡|�¹\Ï~��7,|�~�~,� � ≤ ( T��� ��< ∑ ,~|�¹��7,|�~�~,� � = º( (�,,�).

Consider a fixed positive integer � ≥ 1 such that �Ð< > 0 and a sufficiently large n with ,|�~�7,|�~ ≥ �< . Again, for

such n and N, a( �, ). �,< a( �, ) = ∑ ,¡|�¹\Ï~��7,|�~�~ ,b(67@7�)� � ≥ ( �Ð< ² ,|�~�7,|�~³< ≥ \Ñ~  (.

By combining value of �,,� and the previous result, we have c2*(�,.�,�, ) ≤ º�( �,,� �.

Furthermore, note that a( �, ). �, a( �, ) ≥ a( �, ). �,< a( �, ). Let Ç > 0, by the Chebshev inequality. �* » Ò �Í� �,.�,�, − Ó�¦�§¨�¦��Í� Ò > Ç ¼ ≤ ¤¥��¦�§¨�¦��Í�~Ê~ = º( (b< ) ,→u���� 0.

Since lim,→u Ó�¦�§¨�¦��Í� = lim,→u ²��,qÍ� + Í�Í� ³ = 1 lim,→u �Í� �,.�,�, = 1 , in probability

This completes the proof.

Lemma(5):

Suppose we have the penalized spline smoothing, then

lim,→u1Ì, log 1Æ�(_�)1Æ�(_�,,) = 0

Proof:

From a property of the determinant of the positive definite matrix, we have 1Æ�(_�) ≤ ∏ �:;< + ∑ :[<67@7�A � 5�A< + ∑ :\<d�A<,7XA � �,� � = (, ∏ ´ |}~, + ∑ |µ~¿ÏÕ~,67@7�A � + ∑ |�~ÖÏÕ~,,7XA � ¶,� � ,

where 5�A , d�A is the k th element of 5� , d� respectively.

From lemma (1), we have: det�_�,,� = (, (|}~, + :\<),b(67@7�) (|}~, + :[<)67@7�.

Thus

log 1Æ�(_�)1Æ�(_�,,) = log(1Æ�(_�)) − log(1Æ��_�,,�)

≤ ( log ( + ∑ log ´|}~, + ∑ |µ~¿ÏÕ~,67@7�A � + ∑ |�~ÖÏÕ~,,A � ¶,� � − { ( log ( + (� + � + 1) log ²|}~, +:�2+(−�+�+1:�2(+:�2=º1−º1=º(1).

Therefore, it follows that �Í� log �×É(pq)�×É(pq,�) ,→u���� 0

Theorem(2):

Consider the testing problem in (7), suppose that the true model is in Y� ( the penalized spline semiparametric

model ) and let w�, denote the true distribution of the whole data, with p.d.f. ��(4|��, ��) = ∅{(4 −(�.�� + ��))/:;}, where ∅(∙) is the standard normal density. Then the Bayes factor is consistent under the

alternative hypothesis Y�: lim,→u e8� = 0 in w�, probability.

Proof: log e8� = 12 log det (_�)det (_8) − 12 �,.( _8b�−_�b�)�, = �< log mno (pq)mno (pq,�) + �< log mno (pq,�)mno (pr) − �< �,.( _8b�−_�,,b�)�, − �< �,.(_�,,b�−_�b�)�, = loge�8� − �< �,.( _8b�−_�,,b�)�, − �< �,.(_�,,b�−_�b�)�,

= loge�8� + Å� + Å< ,

Page 10: Large sample property of the bayes factor in a spline semiparametric regression model

Mathematical Theory and Modeling www.iiste.org

ISSN 2224-5804 (Paper) ISSN 2225-0522 (Online)

Vol.3, No.11, 2013

100

where

loge�8� = 12 log det (_�,,)det (_8) − 12 �,.( _8b�−_�,,b�)�,

Å� = �< log mno (pq)mno (pq,�) , Å< = − �< �,.(_�,,b�−_�b�)�, . First, by lemma (1),

mno (pq,�)mno (pr) = ,� (�}~� 7|µ~)����q(�}~� 7|�~)��(����q)

,����q(�}~� 7|µ~)����q |}~(��(����q)) = ,��(����q)´�}~� 7|�~¶��(����q)|}~(��(����q)) = ²|}~7,|�~|}~ ³,b(67@7�)

=²1 + ,|�~|}~ ³,b(67@7�) = º(() ⟹ log ²1 + ,|�~|}~ ³,b(67@7�) = º(().

And from lemma (4) Ì, ≤ º((<). Then �Í� log mno (pq,�)mno (pr) ≤ º((b�) ,→u���� 0.

Therefore, 1Ì, loge�8� = 12 ° 1Ì log det�_�,,�det(_8) − 1Ì �,.( _8b�−_�,,b�)�,±

Since _8b� − _�,,b� = �,, then �Í� �,.( _8b�−_�,,b�)�, = �Í� �,.�,�,, from lemma (4), lim,→u �Í� �,.�,�, = 1. Then 1Ì, loge�8� ,→u���� = 12 ° 1Ì log det�_�,,�det(_8) − 1Ì �,.( _8b�−_�,,b�)�,± ,→u���� = 12 (0 − 1) = −12

∴ �Í� loge�8� ,→u���� b�< , in the w�, probability.

And from lemma (5), �Í� Å� ,→u���� 0, finally, note that Å< is nonpositive random variable since _�,,b� − _�b� is a nonnegative definite

matrix.

Therefore, by combining the above results, there exist �< > 0 such that lim,→u sup �Í� log e8� ≤ bÚ~< , with probability tending to 1.

Hence, e8� ,→u���� 0, in w�, probability.

This completes the proof.

5.Conclusion

1- The Bayes factor for testing problem Y8: � = �� + � versus Y�: � = �� + � + � is given by: e8� = fmno (pq)fmno (pr) exp{− �< �,.( _8b�−_�b�)�,}. 2- If �, = (4� , … , 4,). where 4� ′T are independent normal random variable with mean ∑ �5� +6 8 ∑ �67567�,�@ � and variance :;< = 1, then there exist a positive constant �� such that ���,q { ∑ log(1 + (:\<) − ,b(67@7�)� � �,.�,�,� > ��, with probability tending to 1.

This implies that, under Y8. log e�8� ,→u���� ∞, in probability.

3- If the true model is Y8 ( the pure polynomial model ), then, the Bayes factor is consistent under the null

hypothesis Y8. This implies that, lim,→u e8� = ∞ in w8, probability.

4- If �, = (4� , … , 4,) where 4� ′T are independent normal random variables with mean ∑ �5� +6 8)=1���+)5�+1,%)+C=1(−(�+�+1)d%C�C and variance :�2=1 and if Ì(=a�(>�(a(�(). Then, lim,→u �Í� �,.�,�, = 1, in w�u probability.

5- If the true model is in Y� ( the penalized spline semiparametric model ) ,then the Bayes factor is

consistent under the alternative hypothesis Y�. This implies that, lim,→u e8� = 0 in w�, probability.

6.References

[1] Angers, J.-F. and Delampady, M. (2001). Bayesian nonparametric regression using wavelets. Sankhy2Û Ser. B

63, 287–308.

[2] Berger, J. O. (1985). Statistical Decision Theory and Bayesian Analysis. Springer, New York.

Page 11: Large sample property of the bayes factor in a spline semiparametric regression model

Mathematical Theory and Modeling www.iiste.org

ISSN 2224-5804 (Paper) ISSN 2225-0522 (Online)

Vol.3, No.11, 2013

101

[3] Choi, T., Lee, J. and Roy, A. (2008). A note on the Bayes factor in a semiparametric regression model,

Journal of Multivariate Analysis. 1316-1327.

[4] G.Gosal, J. Lember, A. Van der Vaart, Nonparametric Bayesian model selection and averaging, Electron.J.

Stat. 2 (2008) 63-89.

[5] Ghosh, J. K., Delampady, M. and Samanta, T. (2006). Introduction to Bayesian Analysis: Theory and

Methods. Springer, New York.

[6] Ghosh, J. K. and Ramamoorthi, R. V. (2003). Bayesian Nonparametric. Springer, New York.

[7] I. Verdinelli, L. Wasserman, Bayesian goodness –of-fit testing using infinite- dimentional exponential

families, Ann. Stat.26 (4) (1998)1215-1241.

[8] L. H. Zhao, Bayesian aspects of some nonparametric problems., Ann.Statist. 28 (2000) 532-552.

[9] P. J. Lenk, Bayesian inference for semiparametric regression using a Fourier representation, J. R, Stat. Soc.

Ser. B Stat. Methodol. 61 (4) 1999) 863-879.

[10] Natio, k. (2002) " Semiparametric regression with multiplicative adjustment '' Communications in statistics,

Theory and methods 31, 2289-2309.

[11] Pelenis, J (2012) Bayesian Semiparametric Regression. Institute for Advanced Studies, Vienna 1-39.

[12] R.McVinish, J. Rousseau,K. Mengerson, Bayesian goodness –of-fit testing with mixtures of triangular

distributions, Scand. J, statist. (2008), in press (doi:10.1111/j.1467-9469.2008.00620.x).

[13] Ruppert, D, Wand, M. P. and Carroll, R. J.(2003) Semiparametric Regression. Cambridge University Press.

[14] S.C. Dass, lee, A note on the consistency of Bayes factors for testing point null versus non-parametric

alternative, J.Statist. 26 (4) (1998) 1215-1241.

[15] Tarmaratram, K. (2011) Robust Estimation and Model Selection in Semiparametric Regression Models,

proefschrift voorgedragen tot het behalen van de grad van Doctor.

Page 12: Large sample property of the bayes factor in a spline semiparametric regression model

This academic article was published by The International Institute for Science,

Technology and Education (IISTE). The IISTE is a pioneer in the Open Access

Publishing service based in the U.S. and Europe. The aim of the institute is

Accelerating Global Knowledge Sharing.

More information about the publisher can be found in the IISTE’s homepage:

http://www.iiste.org

CALL FOR JOURNAL PAPERS

The IISTE is currently hosting more than 30 peer-reviewed academic journals and

collaborating with academic institutions around the world. There’s no deadline for

submission. Prospective authors of IISTE journals can find the submission

instruction on the following page: http://www.iiste.org/journals/ The IISTE

editorial team promises to the review and publish all the qualified submissions in a

fast manner. All the journals articles are available online to the readers all over the

world without financial, legal, or technical barriers other than those inseparable from

gaining access to the internet itself. Printed version of the journals is also available

upon request of readers and authors.

MORE RESOURCES

Book publication information: http://www.iiste.org/book/

Recent conferences: http://www.iiste.org/conference/

IISTE Knowledge Sharing Partners

EBSCO, Index Copernicus, Ulrich's Periodicals Directory, JournalTOCS, PKP Open

Archives Harvester, Bielefeld Academic Search Engine, Elektronische

Zeitschriftenbibliothek EZB, Open J-Gate, OCLC WorldCat, Universe Digtial

Library , NewJour, Google Scholar