Multivariate area level models for small areaMultivariate area level models for small area...

21
Multivariate area level models for small area estimation. a Domingo Morales González [email protected] Universidad Miguel Hernández de Elche a In collaboration with Roberto Benavent Multivariate area level models for small area estimation – p. 1/

Transcript of Multivariate area level models for small areaMultivariate area level models for small area...

Page 1: Multivariate area level models for small areaMultivariate area level models for small area estimation. a Domingo Morales González d.morales@umh.es Universidad Miguel Hernández de

Multivariate area level models for small areaestimation. a

Domingo Morales Gonzá[email protected]

Universidad Miguel Hernández de Elche

aIn collaboration with Roberto Benavent

Multivariate area level models for small area estimation – p. 1/21

Page 2: Multivariate area level models for small areaMultivariate area level models for small area estimation. a Domingo Morales González d.morales@umh.es Universidad Miguel Hernández de

Index.

1. Small Area Estimation.

2. Multivariate Fay-Herriot models.

3. Application to Spanish Living Condition Survey data.

Multivariate area level models for small area estimation – p. 2/21

Page 3: Multivariate area level models for small areaMultivariate area level models for small area estimation. a Domingo Morales González d.morales@umh.es Universidad Miguel Hernández de

Small Area Estimation

Official surveys are designed to obtain reliable estimates in planneddomains.

For example, The Spanish Living Condition Survey (SLCS) hassufficiently large sample sizes in the autonomous communities (planneddomains).

Then, thedirect estimatorshave acceptably small mean squared errors inthe autonomous communities .

However, the SLCS sample sizes are too small within provinces(unplanned domainsor small areas) and therefore the direct estimates arenot reliable in these domains.

Small Area Estimationis a branch of Statistics that gives procedures toimprove the direct estimates in unplanned domains.

We introduce small area estimators based onMultivariate area-level mixed models.

Multivariate area level models for small area estimation – p. 3/21

Page 4: Multivariate area level models for small areaMultivariate area level models for small area estimation. a Domingo Morales González d.morales@umh.es Universidad Miguel Hernández de

Multivariate Fay-Herriot models

LetU be a finite population partitioned intoD domainsU1, . . . , UD.

Let µd = (µd1, . . . µdR)′ be a vector of characteristics of interest in the

domaind.

Let yd = (yd1, . . . ydR)′ be a vector of direct estimators ofµd.

Themultivariate Fay-Herriot modelis defined in two stages.

Thesampling modelis

yd = µd + ed, d = 1, . . . , D, (1)

the vectorsed ∼ N (0, Ved) are independent,theR ×R covariance matricesVed are known.

Thelinking modelassumes that theµdr ’s are linearly related toxdr = (xdr1, . . . , xdrpr

) with pr explanatory variables.

xd = diag(xd1, . . . , xdR)R×p with p =∑R

r=1 pr.

β = (β′1, . . . , β

′r)

′p×1.

Multivariate area level models for small area estimation – p. 4/21

Page 5: Multivariate area level models for small areaMultivariate area level models for small area estimation. a Domingo Morales González d.morales@umh.es Universidad Miguel Hernández de

Multivariate Fay-Herriot models

González-Manteiga et al. (2008b) considered the linking model

µd = xdβ + 1Rvd, vdind∼ N1(0, σ

2v), d = 1, . . . , D, (2)

where1n is then× 1 vector with all elements equal to 1.

We introduce amultivariate Fay-Herriot modelby assuming (1) andsubstituting the condition (2) by the more realistic linking model

µd = xdβ + ud, udind∼ NR(0, Vud), d = 1, . . . , D, (3)

the vectorsud’s are independent of the vectorsed’s.TheR×R covariance matricesVud depend onm unknown

parameters,θ1, . . . , θm, with 1 ≤ m ≤ R(R−1)2 +R.

Multivariate area level models for small area estimation – p. 5/21

Page 6: Multivariate area level models for small areaMultivariate area level models for small area estimation. a Domingo Morales González d.morales@umh.es Universidad Miguel Hernández de

Multivariate Fay-Herriot models

We consider four particularizations of model (3).

Model 0is the product of independent marginal models that assumes (1),(3) and takes

Ved = diag1≤r≤R

(σ2edr), Vud

= diag1≤r≤R

(σ2ur), d = 1, . . . , D, (4)

the sampling error variancesσ2edr ’s are known,

m = R andθr = σ2ur, r = 1, . . . , R.

The components ofed andud are independent under Model 0.

Model 1assumes (1) and (3), with a known but not necessarily diagonalmatrixVe, and independent components ofud, i.e.

Vud= diag

1≤r≤R

(σ2ur), d = 1, . . . , D, (5)

m = R andθr = σ2ur, r = 1, . . . , R.

Model 0 is Model 1 withVe diagonal.Multivariate area level models for small area estimation – p. 6/21

Page 7: Multivariate area level models for small areaMultivariate area level models for small area estimation. a Domingo Morales González d.morales@umh.es Universidad Miguel Hernández de

Multivariate Fay-Herriot models

Model 2assumes (1), (3) with AR(1)-correlatedud, i.e.

Vud = σ2uΩd(ρ), Ωd(ρ) =

1

1− ρ2

1 ρ · · · ρR−1

ρ 1 · · · ρR−2

......

...

ρR−1 ρR−2 · · · 1

,

(6)

m = 2, θ1 = σ2u, θ2 = ρ.

Model 3assumes (1), (3) with HAR(1)-correlatedud, i.e.

udr = ρudr−1+adr, ud0 ∼ N(0, σ2

0

), adr

ind∼ N(0, σ2

r

), r = 1, . . . , R,

(7)

σ20 = 1,

ud0, adr, r = 1, . . . , R, are independent,

m = R+ 1 andθ1 = σ21 , . . . , θR = σ2

R, θR+1 = ρ.

Multivariate area level models for small area estimation – p. 7/21

Page 8: Multivariate area level models for small areaMultivariate area level models for small area estimation. a Domingo Morales González d.morales@umh.es Universidad Miguel Hernández de

Application to real data

We are interested in estimating small area poverty proportions and gapsby using data from the 2006 Spanish Living Condition Survey (SLCS).

We calculate EBLUPs based on multivariate Fay-Herriot models.

The target domains are the 52 Spanish provinces crossed by sex(D = 104).

The target indicators are the poverty proportion (α = 0) and gap (α = 1),

Yαd =1

Nd

Nd∑

j=1

yαdj , yαdj =

(z − Edj

z

I(Edj < z),

z is the poverty line andEdj is the equivalised net income of individualj within domaind,j = 1, . . . , Nd, d = 1, . . . , D.

s is the global sample andsd is the sample of domaind The sample sizes aren andnd respectively, so that

s = ∪Dd=1sd andn =

∑D

d=1 nd.Multivariate area level models for small area estimation – p. 8/21

Page 9: Multivariate area level models for small areaMultivariate area level models for small area estimation. a Domingo Morales González d.morales@umh.es Universidad Miguel Hernández de

Application to real data

Thedirect estimatorof the domaintotalYdr =∑Nd

j=1 ydrj is

Y dirdr =

j∈sd

wdj ydrj ,

wherewdj ’s are the official calibrated sampling weights.

The estimated domain size isNdird =

∑j∈sd

wdj .

A direct estimatorof the domainmeanYdr is ydr = Y dirdr /Ndir

d .

Theydr ’s are the responses in the area-level model.

The design-based covariances of these estimators are approximated by

covπ(Ydirdr1

, Y dirdr2

) =∑

j∈sd

wdj(wdj − 1)(ydr1j − ydr1)(ydr2j − ydr2),

σπ,d,r1,r2 = covπ(ydr1 , ydr2) = covπ(Ydirdr1

, Y dirdr2

)/N2d .

We take theσπ,d,r1,r2 ’s as the known elements of the matrixVed in themultivariate Fay-Herriot models. Multivariate area level models for small area estimation – p. 9/21

Page 10: Multivariate area level models for small areaMultivariate area level models for small area estimation. a Domingo Morales González d.morales@umh.es Universidad Miguel Hernández de

Application to real data

The availableauxiliary variablesare the domain proportions of people inthe categories of the following classification variables:

Age (age1: ≤ 15, age2: 16− 24, age3: 25− 49, age4: 50− 64,age5: ≥ 65),Education (edu0: less than primary,edu1: primary,edu2: secondary,edu3: university),Citizenship (cit1: Spanish,cit2: not Spanish),Labor situation (lab0: ≤ 15, lab1: employed,lab2: unemployed,lab3: inactive).

As the proportions of people in the categories of a given variable sum upto one, we take the reference categories out of the auxiliarydata file.

The reference categories areage5, edu3, cit2 andlab3.

We present two applications.

Thefirst applicationjointly estimates 2006 poverty proportions and gapsfor provinces crossed by sex.

Thesecond applicationjointly estimates 2005 and 2006 povertyproportions for provinces crossed by sex.

Multivariate area level models for small area estimation – p. 10/21

Page 11: Multivariate area level models for small areaMultivariate area level models for small area estimation. a Domingo Morales González d.morales@umh.es Universidad Miguel Hernández de

Application 1

For jointly estimating 2006 poverty proportions (α = 0) and gaps(α = 1), we fit Model 3 to a subset of auxiliary variables.

Variables constant age1 age2 edu1 cit1 lab2β1 -0.70357 0.95490 1.45541 0.74745 0.30873 1.50050

p-value 0.00000 0.00066 0.00165 0.00000 0.00137 0.00006Table 1. Regression parameters andp-values for Model 3,α = 0, 2006.

Variables constant edu0 edu1 edu2 cit1 lab1β2 -0.37458 0.97049 0.34255 0.16551 0.152031 -0.06384

p-value 0.00001 0.00000 0.00001 0.11197 0.00104 0.02502Table 2. Regression parameters andp-values for Model 3,α = 1, 2006.

By observing the signs of the regression parameters we conclude thatprovinces having larger proportions of population in categoriesage1,age2, edu1, cit1 andlab2 have greater poverty proportion.

On the other side, provinces having larger proportions of population incategoriesedu0, edu1, edu2, andcit1 and smaller proportions ofpopulation in the categorylab1 have greater poverty gaps.

Multivariate area level models for small area estimation – p. 11/21

Page 12: Multivariate area level models for small areaMultivariate area level models for small area estimation. a Domingo Morales González d.morales@umh.es Universidad Miguel Hernández de

Application 1

The estimates of the variance component parameters areσ2u1 = 0.00138,

σ2u2 = 0.00037 andρ = 0.01859.

We testH0 : σ2u1 = σ2

u2. The test statistics is

T12 =σ2u1 − σ2

u2√ν11 + ν22 − 2ν12

= 3.34588,

νij , i, j = 1, 2, 3 are the elements of the inverse of the REML

Fisher information matrix of Model 3 evaluated atθ = (σ21 , σ

22 , ρ).

As T12 ∼asym

N(0, 1) underH0, thep-value is 0.00082.

We conclude that random effects variances are different andweprefer Model 3 instead of Model 2.

We also testH0 : ρ = 0. The test statistics isTρ = ρ√ν33

= 1.96464.

As Tρ ∼asym

N(0, 1) underH0, thep-value is 0.049456.

We conclude that both components (poverty proportion and gap) arepositively correlated and we prefer Model 3 instead of Model1.

Multivariate area level models for small area estimation – p. 12/21

Page 13: Multivariate area level models for small areaMultivariate area level models for small area estimation. a Domingo Morales González d.morales@umh.es Universidad Miguel Hernández de

Application 1

pd<10

10<pd<20

20<pd<30

pd>30

Poverty Proportion − Men − 2006

pd<10

10<pd<20

20<pd<30

pd>30

Poverty Proportion − Women − 2006

gd<3

3<gd<6

6<gd<10

gd>10

Poverty Gap − Men − 2006

gd<3

3<gd<6

6<gd<10

gd>10

Poverty Gap − Women − 2006

Figure 1. Poverty proportions (top) and gaps (bottom) for men (left) andwomen (right) in Spanish provinces during 2006.

Multivariate area level models for small area estimation – p. 13/21

Page 14: Multivariate area level models for small areaMultivariate area level models for small area estimation. a Domingo Morales González d.morales@umh.es Universidad Miguel Hernández de

Application 1

Root mse − poverty proportion − men 2006

24 73 113 127 170 189 223 280 376 466 528 689 1367

0.0

00

.02

0.0

40

.06

0.0

80

.10

DIREBLUP 3

Root mse − poverty gap − men 2006

24 73 113 127 170 189 223 280 376 466 528 689 1367

0.0

00

.02

0.0

40

.06

0.0

8 DIREBLUP 3

Figure 2. Root-MSEs of direct and EBLUP (under Model 3) estimators ofpoverty proportions (left) and gaps (right) in Spanish provinces during 2006.

Multivariate area level models for small area estimation – p. 14/21

Page 15: Multivariate area level models for small areaMultivariate area level models for small area estimation. a Domingo Morales González d.morales@umh.es Universidad Miguel Hernández de

Application 2

For jointly estimating 2005 and 2006 poverty proportions (α = 0), we fitModel 3 to a subset of auxiliary variables.

Variables constant age1 age2 edu1 cit1 lab2

β -0.65428 0.69780 2.38240 0.71074 0.25924 0.71268

p-value 0.00010 0.06540 0.00049 0.00000 0.08960 0.15129

Table 3. Regression parameters andp-values for Model 3,α = 0, 2005.

Variables constant age1 age2 edu1 cit1 lab2β -0.75278 0.88497 1.89752 0.79734 0.31471 2.04460

p-value 0.00000 0.00609 0.00047 0.00000 0.00414 0.00000Table 4. Regression parameters andp-values for Model 3,α = 0, 2006.

By observing the signs of the regression parameters we conclude thatprovinces having larger proportions of population in categoriesage1,age2, edu1, cit1 andlab2 have greater poverty proportion in 2005 and2006.

Multivariate area level models for small area estimation – p. 15/21

Page 16: Multivariate area level models for small areaMultivariate area level models for small area estimation. a Domingo Morales González d.morales@umh.es Universidad Miguel Hernández de

Application 2

The estimates of the variance component parameters areσ2u1 = 0.00256,

σ2u2 = 0.00193 andρ = 0.02105.

We testH0 : σ2u1 = σ2

u2. The test statistics is

T12 =σ2u1 − σ2

u2√ν11 + ν22 − 2ν12

= 1.0756,

whereνij , i, j = 1, 2, 3 are the elements of the inverse of the REML

Fisher information matrix of Model 3 evaluated atθ = (σ21 , σ

22 , ρ).

As T12 ∼asym

N(0, 1) underH0, thep-value is 0.28208.

We cannot conclude that random effects variances are different andwe prefer Model 2 instead of Model 3.

Therefore, we fit Model 2 to the subset of auxiliary variables.

Multivariate area level models for small area estimation – p. 16/21

Page 17: Multivariate area level models for small areaMultivariate area level models for small area estimation. a Domingo Morales González d.morales@umh.es Universidad Miguel Hernández de

Application 2

Variables constant age1 age2 edu1 cit1 lab2β2005 -0.53822 0.67365 1.74785 0.60288 0.23672 0.99025p-value 0.00040 0.03876 0.00209 0.00000 0.08998 0.02351

β2006 -0.74083 0.90128 1.69006 0.68294 0.37468 1.78575p-value 0.00000 0.00595 0.00127 0.00000 0.00163 0.00007

Table 5. Regression parameters andp-values for Model 2 andα = 0.

We testH0 : ρ = 0 under model 2. The test statistics is

Tρ =ρ

√ν22

= 16.72633,

νij , i, j = 1, 2 are the elements of the inverse of the REML Fisher

information matrix of Model 2 evaluated atθ = (σ2, ρ).

As Tρ ∼asym

N(0, 1) underH0, thep-value is 0.00.

We conclude that both components (2005 and 2006 poverty proportions)are positively correlated and we prefer Model 2 instead of Model 1.

Multivariate area level models for small area estimation – p. 17/21

Page 18: Multivariate area level models for small areaMultivariate area level models for small area estimation. a Domingo Morales González d.morales@umh.es Universidad Miguel Hernández de

Application 2

pd<10

10<pd<20

20<pd<30

pd>30

Poverty Proportion − Men − 2005

pd<10

10<pd<20

20<pd<30

pd>30

Poverty Proportion − Women − 2005

pd<10

10<pd<20

20<pd<30

pd>30

Poverty proportion − Men − 2006

pd<10

10<pd<20

20<pd<30

pd>30

Poverty proportion − Women − 2006

Figure 3. Poverty proportions in 2005 (top) and 2006 (bottom) for men (left)and women (right) in Spanish provinces during 2006.

Multivariate area level models for small area estimation – p. 18/21

Page 19: Multivariate area level models for small areaMultivariate area level models for small area estimation. a Domingo Morales González d.morales@umh.es Universidad Miguel Hernández de

Application 2

Root mse − poverty proportion − men 2006

24 73 113 127 170 189 223 280 376 466 528 689 1367

0.0

00

.02

0.0

40

.06

0.0

80

.10

DIREBLUP 2

Root mse − poverty proportion − men 2005

24 73 113 127 170 189 223 280 376 466 528 689 1367

0.0

00

.02

0.0

40

.06

0.0

80

.10

DIREBLUP 2

Figure 4. Root-MSEs of direct and EBLUP (under Model 2) estimators ofpoverty proportions for 2006 (left) and 2005 (right) in Spanish provinces.

Multivariate area level models for small area estimation – p. 19/21

Page 20: Multivariate area level models for small areaMultivariate area level models for small area estimation. a Domingo Morales González d.morales@umh.es Universidad Miguel Hernández de

References

Benavent R., Morales D. (2016). Multivariate Fay-Herriot models forsmall area estimation. Computational Statistics and Data Analysis, 94,372-390.

Esteban, M.D., Morales, D., Pérez, A., Santamaría, L., 2011. TwoArea-Level Time Models for Estimating Small Area Poverty Indicators.Journal of the Indian Society of Agricultural Statistics, 66, 11, 75-89.

Esteban, M.D., Morales, D., Pérez, A., Santamaría, L., 2012. Small areaestimation of poverty proportions under area-level time models.Computational Statistics and Data Analysis, 56, 2840-2855.

Fay, R.E., Herriot, R.A., 1979. Estimates of income for small places: Anapplication of James-Stein procedures to census data. Journal of theAmerican Statistical Association 74, 269-277.

González-Manteiga, W., Lombardía, M.J., Molina, I., Morales, D.,Santamaría, L., 2008b. Analytic and Bootstrap Approximations ofPrediction Errors under a Multivariate Fay-Herriot Model.Computational Statistics and Data Analysis, 52 (12), 5242-5252.

Särndal C.E., Swensson B., Wretman J., 1992. Model assistedsurveysampling. Springer-Verlag.

Multivariate area level models for small area estimation – p. 20/21

Page 21: Multivariate area level models for small areaMultivariate area level models for small area estimation. a Domingo Morales González d.morales@umh.es Universidad Miguel Hernández de

Thank you

Thank you

for your attention

Multivariate area level models for small area estimation – p. 21/21