The obsession with weight in the modelling world And it’s ancillary affects on Analysis.

  • date post

    15-Jan-2016

Transcript of The obsession with weight in the modelling world And it’s ancillary affects on Analysis.

Page 1: The obsession with weight in the modelling world And it’s ancillary affects on Analysis.

The obsession with weight in the modelling world

And it’s ancillary affects on Analysis

Page 2: The obsession with weight in the modelling world And it’s ancillary affects on Analysis.

The basic

• The basic idea of sampling
• The reason behind complicating a good idea
• The implication when modelling data

Page 3: The obsession with weight in the modelling world And it’s ancillary affects on Analysis.

How Sampling Works

Imagine a well-known picture. Since a picture is made up of points of colour (pixels), we will sample the points of colour at different rates.

[Figure: the same picture sampled at 1% random (systematic), 3% random, 5% random, 10% random and 2.5% stratified]

Now let's assume that we had some idea about the picture we wanted to see, and we decide to stratify the sample. In this case we sample different areas of the picture at different rates: the background, the dress, the face, the hands, etc.
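The picture analogy can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the region names, region sizes and sampling rates below are made up, and `stratified_sample` is a hypothetical helper, not something taken from the slides.

```python
import random

def stratified_sample(pixels_by_region, rates, seed=0):
    """Sample each region at its own rate and attach a design weight N_h / n_h."""
    rng = random.Random(seed)
    sample = []
    for region, pixels in pixels_by_region.items():
        # Number of pixels drawn in this region (at least one).
        n_h = max(1, round(len(pixels) * rates[region]))
        # Each sampled pixel stands for this many pixels of its region.
        weight = len(pixels) / n_h
        for pixel in rng.sample(pixels, n_h):
            sample.append((region, pixel, weight))
    return sample

# Hypothetical picture: pixel ids grouped by region (sizes are illustrative).
picture = {
    "background": list(range(0, 8000)),
    "dress": list(range(8000, 9500)),
    "face": list(range(9500, 10000)),
}
# Illustrative rates: the detailed areas are sampled more heavily.
rates = {"background": 0.01, "dress": 0.03, "face": 0.10}

sample = stratified_sample(picture, rates)
total_weight = sum(weight for _, _, weight in sample)
```

With these made-up numbers the face supplies far more of the sample than its share of the picture, yet the weights still add up to the full pixel count, which is the point the later slides make about design weights.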

Page 4: The obsession with weight in the modelling world And it’s ancillary affects on Analysis.

How Sampling Works

[Figure: the picture sampled at 1%, 3%, 5% and 10% random, and at 2.5% stratified]

Page 5: The obsession with weight in the modelling world And it’s ancillary affects on Analysis.

How does this affect modelling or analysis?

• The sample is no longer simply random.
• We purposefully biased the sample to gain efficiencies and to meet other goals.
• This bias is corrected when we apply the design weights.

Page 6: The obsession with weight in the modelling world And it’s ancillary affects on Analysis.

Framework

If you were to analyse each stratum separately:

• Each part can actually be treated as a survey in its own right, each with a simpler design.
• The sampling frame or design allows you to keep all these parts together in a cohesive way for analysis.
• Still, there would be some difficulty associated with the correction for non-response and the final calibration (post-stratification).

Page 7: The obsession with weight in the modelling world And it’s ancillary affects on Analysis.

How to interpret sampling

The way we sample is reflected and corrected by how we weight the data in the end.

• If you looked only at the parts we sampled:
  – You wouldn't get an accurate picture.
  – All the parts would be there but not in the right proportions.
• The design weights compensate for the known distortions. The final weights include estimated distortions.
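A toy example may make the "right proportions" point concrete. The sketch below assumes a hypothetical population of 900 units in a large stratum and 100 in a small one, with 30 units sampled from each; the names and numbers are illustrative only.

```python
def shares(sample, weighted=True):
    """Share of each group in the sample, counting units or summing weights."""
    totals = {}
    for group, weight in sample:
        totals[group] = totals.get(group, 0.0) + (weight if weighted else 1.0)
    grand_total = sum(totals.values())
    return {group: total / grand_total for group, total in totals.items()}

# 30 units from each stratum, each carrying its design weight N_h / n_h.
sample = [("large", 900 / 30)] * 30 + [("small", 100 / 30)] * 30

unweighted = shares(sample, weighted=False)  # both strata look equally common
weighted = shares(sample, weighted=True)     # population proportions are restored
```

Unweighted, the small stratum appears to be half of the population; with the design weights applied it returns to its true 10%.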

Page 8: The obsession with weight in the modelling world And it’s ancillary affects on Analysis.

On what would you base the fundamental multivariate relationships in your model or analysis?

Page 9: The obsession with weight in the modelling world And it’s ancillary affects on Analysis.

Steps to calculate the weights – Basic overview

• At the survey design stage, some factors are used to determine the sample size required
• Probability of selection calculated
• First series of adjustments for non-response
• Post-stratification

Page 10: The obsession with weight in the modelling world And it’s ancillary affects on Analysis.

Factors to determine the sample size

• Characteristics to be estimated (small proportions)
• Required precision of the estimates (targeted CV)
• Variability of the data
• Expected non-response rate
• Size of the population
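As a rough illustration of how these factors interact, the sketch below sizes a sample for estimating a proportion under simple random sampling. This is a textbook approximation chosen for illustration, not the procedure actually used for the survey; the function name and the example inputs are assumptions.

```python
import math

def required_sample_size(p, target_cv, population_size, expected_response_rate):
    """Sample size needed to estimate a proportion p with a targeted CV.

    Assumes simple random sampling, where
    CV^2 = (1 - n/N) * (1 - p) / (n * p).
    """
    # Size needed for an infinite population.
    n0 = (1 - p) / (p * target_cv ** 2)
    # Finite population correction.
    n = n0 / (1 + n0 / population_size)
    # Inflate to allow for the expected non-response.
    return math.ceil(n / expected_response_rate)

# Illustrative inputs: a 5% characteristic, a 10% targeted CV,
# a population of 100,000 and an 80% expected response rate.
size_small_p = required_sample_size(0.05, 0.10, 100_000, 0.8)
size_large_p = required_sample_size(0.50, 0.10, 100_000, 0.8)
```

The comparison between the two calls shows why small proportions drive the sample size: the rarer the characteristic, the more units are needed to reach the same CV.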

Page 11: The obsession with weight in the modelling world And it’s ancillary affects on Analysis.

Original design weight

• Once the sample is selected in each stratum, calculate the original weight:
  – $N_h / n_h$, where $h$ is the stratum
• Since the sample is selected from LFS, get original weight from LFS.
  – Adjustments for the number of available children.
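The stratum formula $N_h / n_h$ can be sketched as follows. The stratum labels and counts are hypothetical, and the sketch ignores the LFS-specific adjustment for the number of available children.

```python
def design_weights(population_counts, sample_counts):
    """Original design weight N_h / n_h for each stratum h."""
    return {h: population_counts[h] / sample_counts[h] for h in sample_counts}

# Hypothetical strata: population size N_h and sample size n_h.
population_counts = {"stratum_a": 10_000, "stratum_b": 500}
sample_counts = {"stratum_a": 100, "stratum_b": 50}

weights = design_weights(population_counts, sample_counts)
```

Here each sampled unit in the first stratum represents 100 units of the population, while each unit in the heavily sampled second stratum represents only 10.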

Page 12: The obsession with weight in the modelling world And it’s ancillary affects on Analysis.

Non-response adjustment

• Adjustments must be made to take into account the total non-response
• Characteristics of respondents vs non-respondents are analyzed:
  – Province, income, level of education of parents, depression scale of PMK, urban/rural, etc.
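One common way to turn such an analysis into an adjustment is the weighting-class method: within each class, the weight of the non-respondents is redistributed over the respondents. The slides do not say which method was used, so the sketch below is an assumption, with hypothetical class labels and weights.

```python
from collections import defaultdict

def nonresponse_adjustment(units):
    """Adjustment factor per class: weight of all sampled units / weight of respondents."""
    sampled = defaultdict(float)
    responded = defaultdict(float)
    for unit in units:
        sampled[unit["cls"]] += unit["weight"]
        if unit["responded"]:
            responded[unit["cls"]] += unit["weight"]
    # Only classes with at least one respondent get a factor.
    return {cls: sampled[cls] / responded[cls] for cls in responded}

# Hypothetical sample: classes could be built from province, income, urban/rural, etc.
units = (
    [{"cls": "urban", "weight": 10.0, "responded": True}] * 3
    + [{"cls": "urban", "weight": 10.0, "responded": False}]
    + [{"cls": "rural", "weight": 20.0, "responded": True}]
    + [{"cls": "rural", "weight": 20.0, "responded": False}]
)

factors = nonresponse_adjustment(units)
```

Classes with lower response rates get larger factors, so their respondents carry more weight in the final estimates.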

Page 13: The obsession with weight in the modelling world And it’s ancillary affects on Analysis.

Post-stratification

• Adjustment factor calculated in order to post-stratify the sample to known population counts, by:
  – Province, age, gender
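A minimal sketch of a post-stratification factor, assuming the usual definition (known population count divided by the sum of the current weights in the same cell). The cell labels and counts are hypothetical.

```python
def poststratification_factors(known_counts, weighted_respondents):
    """Factor per cell: known population count / sum of current weights in that cell."""
    totals = {}
    for cell, weight in weighted_respondents:
        totals[cell] = totals.get(cell, 0.0) + weight
    return {cell: known_counts[cell] / total for cell, total in totals.items()}

# Hypothetical cells defined by province, age group and gender.
known_counts = {("PEI", "0-5", "F"): 1000, ("PEI", "0-5", "M"): 1050}
weighted_respondents = (
    [(("PEI", "0-5", "F"), 40.0)] * 20    # weights sum to 800
    + [(("PEI", "0-5", "M"), 42.0)] * 25  # weights sum to 1050
)

factors = poststratification_factors(known_counts, weighted_respondents)
```

After multiplying by these factors, the weights in each cell add up exactly to the known population count for that cell.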

Page 14: The obsession with weight in the modelling world And it’s ancillary affects on Analysis.

Final weight

$W_f = W_i \times \mathrm{Adj}_1 \times \mathrm{Adj}_2$

Where
– $W_f$: Final weight
– $W_i$: Initial weight
– $\mathrm{Adj}_1$: Non-response adjustment
– $\mathrm{Adj}_2$: Post-stratification
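The formula is a plain product, as the short sketch below shows. The numeric values are made up for illustration.

```python
def final_weight(initial_weight, nonresponse_adj, poststrat_adj):
    """W_f = W_i x Adj_1 x Adj_2."""
    return initial_weight * nonresponse_adj * poststrat_adj

# Hypothetical respondent: design weight 100, non-response factor 1.25,
# post-stratification factor 0.8.
w_f = final_weight(100.0, 1.25, 0.8)
```

In this example the non-response adjustment raises the weight and the post-stratification factor pulls it back down, leaving the respondent representing about 100 children.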

Page 15: The obsession with weight in the modelling world And it’s ancillary affects on Analysis.

Link between analysis and the sample design (weight)

[Diagram of factors: Child's Ability, Intelligence, Social environment, School, Teachers, Materials, Curriculum, Grade level, Subject, Province]

• Province is a stratum.
• The proportion of kids in the sample being taught the PEI curriculum is much larger than what's found in the population.

Page 16: The obsession with weight in the modelling world And it’s ancillary affects on Analysis.

Link between analysis and the sample design

There are very few things in a child's life that are not related to where they live.
• In the city versus in a small village
• In a small province versus a large one
• What social/educational programs are offered
• What social support and services are offered
• Regional cultural differences
• To name a few…

Page 17: The obsession with weight in the modelling world And it’s ancillary affects on Analysis.

Weights for cycle 4

– Cross-sectional weights
– Longitudinal weights, including the converted respondents.
– Longitudinal weights, children introduced in C1 and respondent to all cycles. NEW
– Not to mention the bootstrap weights, which are used for an entirely different purpose.

Page 18: The obsession with weight in the modelling world And it’s ancillary affects on Analysis.

Cross-sectional Weights

• Available for all cycles, up to Cycle 4.
• When are they used?
• Cycle 4 cross-sectional weights:
  – to represent the population aged 0-17 in 2000-01.
  – …
• Cycle 1 weights:
  – to represent the population aged 0-11 in 1994-95.

Page 19: The obsession with weight in the modelling world And it’s ancillary affects on Analysis.

Cross-sectional Weights - Cycle 4 - Warning

• In Cycle 4, children with a cross-sectional weight come from 4 different cohorts (introduced in 1994, 1996, 1998 and 2000).
• By 2000, the 1994 cohort has been around for 6 years:
  – cross-sectional representativity decreases over time because of sample erosion and population change (immigration).

Page 20: The obsession with weight in the modelling world And it’s ancillary affects on Analysis.

Cross-sectional Weights - Cycle 5

• For Cycle 5 (2002-2003), there are no children aged 6 and 7.
• In addition, the 1994 cohort's cross-sectional representativity has declined even further (erosion and immigration).
• As a result, cross-sectional weights will be calculated only for children aged 0-5.

Page 21: The obsession with weight in the modelling world And it’s ancillary affects on Analysis.

Cross-sectional weights in a nutshell

Cross-sectional weights must be used when the analysis concerns a specific year, when you want a snapshot of the situation at a specific point in time.

Page 22: The obsession with weight in the modelling world And it’s ancillary affects on Analysis.

Longitudinal Weights

• Longitudinal weights represent the population of children at the time they were brought in to the survey.
  – Children introduced in Cycle 1: longitudinal weights represent the population of children aged 0-11 in 1994-95.

Page 23: The obsession with weight in the modelling world And it’s ancillary affects on Analysis.

Longitudinal Weights (continued)

  – Children introduced in Cycle 2: longitudinal weights represent the population of children aged 0-1 in 1996-97.
  – Children introduced in Cycle 3: longitudinal weights represent the population of children aged 0-1 in 1998-99.
  – Children introduced in Cycle 4: longitudinal weights represent the population of children aged 0-1 in 2000-01.

Page 24: The obsession with weight in the modelling world And it’s ancillary affects on Analysis.

When are longitudinal weights used?

When you want to track a cohort of children introduced in a particular cycle and see how they've developed over time.

Page 25: The obsession with weight in the modelling world And it’s ancillary affects on Analysis.

Longitudinal Weights - Cycle 4

• Something new in Cycle 4: 2 sets of longitudinal weights:
  – Set 1: Weights for children who responded in their first cycle and in Cycle 4 (possible non-response in Cycle 2 or 3)
  – Set 2: Weights for those introduced in Cycle 1 who responded in every cycle. NEW.

Page 26: The obsession with weight in the modelling world And it’s ancillary affects on Analysis.

Longitudinal Weights - Cycle 4

• Difference between the 2 sets of longitudinal weights
  – To avoid total non-response in Cycle 2 or 3, the set of weights for those who responded throughout can be used.
  – If you're only interested in the changes between Cycle 1 and Cycle 4 directly, the longitudinal weights including converted respondents can be used.
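The guidance on which weight to use can be summarised as a small decision helper. This is only a sketch: the function and the returned variable names are hypothetical labels, not the names found on the NLSCY files.

```python
def choose_weight(analysis, needs_every_cycle=False):
    """Pick a weight set for a Cycle 4 analysis (variable names are hypothetical)."""
    if analysis == "cross-sectional":
        # Snapshot of the population at a specific point in time.
        return "cross_sectional_weight"
    if analysis == "longitudinal":
        if needs_every_cycle:
            # Set 2: children introduced in Cycle 1 who responded in every cycle.
            return "longitudinal_weight_all_cycles"
        # Set 1: responded in their first cycle and in Cycle 4
        # (includes converted respondents).
        return "longitudinal_weight_with_converted"
    raise ValueError(f"unknown analysis type: {analysis!r}")

snapshot = choose_weight("cross-sectional")
first_vs_last = choose_weight("longitudinal")
full_trajectory = choose_weight("longitudinal", needs_every_cycle=True)
```

Comparing Cycle 1 with Cycle 4 directly maps to the first longitudinal set, while a model that needs data from every cycle maps to the second.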

Page 27: The obsession with weight in the modelling world And it’s ancillary affects on Analysis.

Examples

The following are real examples taken from the NLSCY data.

Page 28: The obsession with weight in the modelling world And it’s ancillary affects on Analysis.

Weighting - Examples

Average weights in Cycle 4.

[Figure: Prince Edward Island; labels read "5-year-old", "7" and "1-year-olds"]

Page 29: The obsession with weight in the modelling world And it’s ancillary affects on Analysis.

Weighting - Examples

Average weights in Cycle 4 (continued)

[Figure: Ontario; labels read "15-year-old", "712" and "15-year-olds"]

Page 30: The obsession with weight in the modelling world And it’s ancillary affects on Analysis.

Example: Proportion of children aged 0-17, by province, Cycle 4, UNWEIGHTED

24% of Canada's children live in the Maritime provinces… whereas in reality...

Province   Sample size   Percentage
Nfld             1,826         6.0%
PEI              1,025         3.4%
NS               2,259         7.5%
NB               2,037         6.7%
Que              5,337        17.6%
Ont              7,468        24.6%
Man              2,356         7.8%
Sask             2,353         7.8%
Alberta          2,986         9.9%
BC               2,659         8.8%
Total           30,306

Page 31: The obsession with weight in the modelling world And it’s ancillary affects on Analysis.

Example: Proportion of children aged 0-17, by province, Cycle 4, WEIGHTED

Whereas in reality… 7.3% of children live in the Maritime provinces.

Province   Population   Percentage
Nfld          116,080         1.6%
PEI            33,311         0.5%
NS            208,160         2.9%
NB            165,078         2.3%
Que         1,590,325        22.5%
Ont         2,747,236        38.8%
Man           289,265         4.1%
Sask          265,221         3.8%
Alberta       763,858        10.8%
BC            892,908        12.6%
Total       7,071,442
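The two Cycle 4 tables can be checked with a short script. The counts are copied from the tables; the variable names and the `share` helper are illustrative. Following the slides, the four easternmost provinces (Nfld, PEI, NS, NB) are grouped together.

```python
# Sample sizes (unweighted) and weighted population counts, Cycle 4, ages 0-17.
sample_size = {
    "Nfld": 1826, "PEI": 1025, "NS": 2259, "NB": 2037, "Que": 5337,
    "Ont": 7468, "Man": 2356, "Sask": 2353, "Alberta": 2986, "BC": 2659,
}
population = {
    "Nfld": 116080, "PEI": 33311, "NS": 208160, "NB": 165078, "Que": 1590325,
    "Ont": 2747236, "Man": 289265, "Sask": 265221, "Alberta": 763858, "BC": 892908,
}
EAST = ("Nfld", "PEI", "NS", "NB")

def share(counts, provinces):
    """Fraction of the total accounted for by the given provinces."""
    return sum(counts[p] for p in provinces) / sum(counts.values())

unweighted_share = share(sample_size, EAST)  # about 0.24
weighted_share = share(population, EAST)     # about 0.07

# Average final weight per province: weighted count / sample size.
average_weight = {p: population[p] / sample_size[p] for p in sample_size}
```

Computed from the counts, the weighted share comes out at roughly 7.4%, which agrees with the slide's 7.3% up to rounding of the individual percentages. The average weights also show why: a sampled child in PEI represents about 33 children, against roughly 370 in Ontario.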

Page 32: The obsession with weight in the modelling world And it’s ancillary affects on Analysis.

Number of children aged 0-15 by year of age, Quebec, Cycle 3, unweighted

The conclusion is obvious… Huge increase in births in 1993 and 1997!!!!!

Age   Birth Year   Sample size   Percentage
0     1998                 306         4.9%
1     1997               1,055        16.8%
2     1996                 326         5.2%
3     1995                 449         7.1%
4     1994                 405         6.4%
5     1993               1,627        25.8%
6     1992                 313         5.0%
7     1991                 201         3.2%
8     1990                 265         4.2%
9     1989                 170         2.7%
10    1988                 221         3.5%
11    1987                 154         2.4%
12    1986                 224         3.6%
13    1985                 167         2.7%
14    1984                 241         3.8%
15    1983                 171         2.7%
Total                    6,295

Page 33: The obsession with weight in the modelling world And it’s ancillary affects on Analysis.

Number of children aged 0-15 by year of age, Quebec, Cycle 3, WEIGHTED

So much for the pseudo baby boom...

Age   Birth Year   Population   Percentage
0     1998             73,254         5.2%
1     1997             78,769         5.5%
2     1996             84,713         6.0%
3     1995             88,662         6.2%
4     1994             87,895         6.2%
5     1993             91,466         6.4%
6     1992             95,101         6.7%
7     1991             78,882         5.6%
8     1990            116,752         8.2%
9     1989             73,451         5.2%
10    1988            107,819         7.6%
11    1987             75,130         5.3%
12    1986             98,202         6.9%
13    1985             79,400         5.6%
14    1984            100,385         7.1%
15    1983             91,205         6.4%
Total               1,421,086
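The same comparison can be run on the Quebec Cycle 3 tables. The counts below are copied from the two tables; the variable names are illustrative.

```python
# Quebec, Cycle 3: sample size and weighted population by age (0-15).
sample_size = [306, 1055, 326, 449, 405, 1627, 313, 201,
               265, 170, 221, 154, 224, 167, 241, 171]
population = [73254, 78769, 84713, 88662, 87895, 91466, 95101, 78882,
              116752, 73451, 107819, 75130, 98202, 79400, 100385, 91205]

total_sample = sum(sample_size)
total_population = sum(population)

# Share of each age in the sample and in the weighted population.
unweighted_pct = [n / total_sample for n in sample_size]
weighted_pct = [n / total_population for n in population]

# Average weight by age: how many children each sampled child represents.
average_weight = [pop / n for pop, n in zip(population, sample_size)]
```

The heavily sampled ages carry small weights (about 56 for 5-year-olds and 75 for 1-year-olds), while a thinly sampled age such as 9 carries a weight of about 432. Once the weights are applied, the apparent spikes at ages 1 and 5 disappear.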

Page 34: The obsession with weight in the modelling world And it’s ancillary affects on Analysis.

Conclusion

To be obsessed with weights is a good thing… where statistical analysis is concerned.