The obsession with weight in the modelling world and its ancillary effects on analysis
Date posted: 15-Jan-2016
The basics
• The basic idea of sampling
• The reason behind complicating a good idea
• The implication when modelling data
How sampling works
Imagine a well-known picture. Since a picture is made up of points of colour (pixels), we will sample the points of colour at different rates.
[Figure: the picture sampled at 1% random (systematic), 3% random, 5% random and 10% random]
Now let’s assume that we had some idea about the picture we wanted to see, and we decide to stratify the sample. In this case we sample different areas of the picture at different rates: the background, the dress, the face, the hands, etc.
[Figure: the 2.5% stratified sample shown beside the 1%, 3%, 5% and 10% random samples]
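The stratified sampling of pixels described above can be sketched in a few lines. This is only an illustration: the regions, their sizes and the sampling rates are made up, and the function name `stratified_sample` is hypothetical. Each sampled pixel carries the weight N_h / n_h of its region, so the weights add back up to the full picture.

```python
import random


def stratified_sample(pixels_by_region, rates, seed=0):
    """Sample each region of the picture at its own rate.

    Returns a list of (region, pixel, weight) tuples, where the weight
    is the number of pixels in the region divided by the number sampled.
    """
    rng = random.Random(seed)
    sample = []
    for region, pixels in pixels_by_region.items():
        n = max(1, round(rates[region] * len(pixels)))
        weight = len(pixels) / n
        for pixel in rng.sample(pixels, n):
            sample.append((region, pixel, weight))
    return sample


# Hypothetical picture of 10,000 pixels split into three regions.
pixels_by_region = {
    "background": list(range(0, 8000)),
    "dress": list(range(8000, 9500)),
    "face": list(range(9500, 10000)),
}
# Assumed rates: sample the detailed areas more heavily than the background.
rates = {"background": 0.01, "dress": 0.05, "face": 0.20}

sample = stratified_sample(pixels_by_region, rates)
overall_rate = len(sample) / 10000
total_weight = sum(weight for _, _, weight in sample)
print(f"sampled {len(sample)} pixels ({overall_rate:.2%} overall)")
print(f"weights add up to {total_weight:.0f} pixels")
```

With these made-up rates the overall sampling fraction is close to the 2.5% of the stratified picture, yet the face is sampled twenty times more heavily than the background.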
How does this affect modelling or analysis?
• The sample is no longer simply random.
• We purposefully biased the sample to gain efficiencies and to meet other goals.
• This bias is corrected when we apply the design weights.
Framework
• If you were to analyse each stratum separately, each part could be treated as a survey with a simpler design.
• The sampling frame or design allows you to keep all these parts together in a cohesive way for analysis.
• Still, there would be some difficulty associated with the correction for non-response and the final calibration (post-stratification).
The way we sample is
reflected and corrected by how
we weight the data in the end.
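A small sketch of what "corrected by how we weight the data" means in practice. The data are invented for illustration: a small stratum is deliberately oversampled, so the unweighted mean is pulled towards it, while the weighted mean recovers the population value.

```python
def unweighted_mean(records):
    """Plain average of the sampled values, ignoring the design."""
    return sum(value for value, _ in records) / len(records)


def weighted_mean(records):
    """Average of the sampled values, each counted by its design weight."""
    total_weight = sum(weight for _, weight in records)
    return sum(value * weight for value, weight in records) / total_weight


# Hypothetical population of 1,000 units in two strata:
#   stratum A: 100 units, all with value 1.0, 10 sampled (weight 100/10 = 10)
#   stratum B: 900 units, all with value 0.0, 10 sampled (weight 900/10 = 90)
# The true population mean is therefore 100 / 1000 = 0.1.
records = [(1.0, 10.0)] * 10 + [(0.0, 90.0)] * 10

print(f"unweighted mean: {unweighted_mean(records):.2f}")
print(f"weighted mean:   {weighted_mean(records):.2f}")
```

The unweighted figure describes the sample; only the weighted figure describes the population.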
How to interpret sampling
• If you looked only at the parts we sampled:
  – You wouldn’t get an accurate picture.
  – All the parts would be there, but not in the right proportions.
• The design weights compensate for the known distortions. The final weights include estimated distortions.
• On what would you base the fundamental multivariate relationships in your model or analysis?
Steps to calculate the weights: basic overview
• At the survey design stage, several factors are used to determine the required sample size.
• The probability of selection is calculated.
• A first series of adjustments is made for non-response.
• Post-stratification.
Factors to determine the sample size
• Characteristics to be estimated (small proportions)
• Required precision of the estimates (targeted CV)
• Variability of the data
• Expected non-response rate
• Size of the population
Original design weight
• Once the sample is selected in each stratum, calculate the original weight:
  – $N_h / n_h$, where $h$ is the stratum
• Since the sample is selected from the LFS, get the original weight from the LFS.
  – Adjustments for the number of available children.
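As a minimal sketch of the original weight, take N_h as the number of units in the population of stratum h and n_h as the number sampled from it. The stratum names and counts below are hypothetical, and the adjustment for the number of available children is not modelled.

```python
def design_weights(population_counts, sample_counts):
    """Return the original design weight N_h / n_h for every stratum h."""
    weights = {}
    for stratum, n_h in sample_counts.items():
        if n_h <= 0:
            raise ValueError(f"stratum {stratum!r} has no sampled units")
        weights[stratum] = population_counts[stratum] / n_h
    return weights


# Hypothetical strata: a small province sampled heavily, a large one lightly.
population_counts = {"small_province": 30_000, "large_province": 2_700_000}
sample_counts = {"small_province": 1_000, "large_province": 7_500}

weights = design_weights(population_counts, sample_counts)
for stratum, weight in weights.items():
    print(f"{stratum}: each sampled child represents about {weight:.0f} children")
```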
Non-response adjustment
• Adjustments must be made to take into account the total non-response.
• Characteristics of respondents vs non-respondents are analyzed:
  – Province, income, level of education of parents, depression scale of PMK, urban/rural, etc.
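The slides do not say how the adjustment itself is computed. One common approach, shown here purely as an assumption, is a weighting-class adjustment: group the sample into classes built from characteristics like those listed above, and inflate the respondents' weights so that they also carry the weight of the non-respondents in their class. The classes and figures are invented.

```python
from collections import defaultdict


def nonresponse_factors(units):
    """Weighting-class adjustment (an assumed method, not taken from the slides).

    units: iterable of (class_label, weight, responded) tuples.
    Returns, for each class, the sum of weights of all selected units
    divided by the sum of weights of the responding units.
    """
    selected = defaultdict(float)
    responding = defaultdict(float)
    for cls, weight, responded in units:
        selected[cls] += weight
        if responded:
            responding[cls] += weight
    factors = {}
    for cls, total in selected.items():
        if responding[cls] == 0:
            raise ValueError(f"no respondents in class {cls!r}; merge it with another class")
        factors[cls] = total / responding[cls]
    return factors


# Hypothetical sample: half of the urban units and a quarter of the rural units did not respond.
units = (
    [("urban", 50.0, True)] * 2 + [("urban", 50.0, False)] * 2
    + [("rural", 20.0, True)] * 3 + [("rural", 20.0, False)] * 1
)

factors = nonresponse_factors(units)
for cls, factor in factors.items():
    print(f"{cls}: respondents' weights are multiplied by {factor:.3f}")
```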
Post-stratification
• Adjustment factor calculated in order to post-stratify the sample to known population counts, by:
  – Province, age, gender
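A sketch of the post-stratification step, assuming the usual ratio form: within each province-age-gender cell, the factor is the known population count divided by the sum of the current weights. The cells, weights and counts below are hypothetical.

```python
def poststratify(weights_by_cell, known_counts):
    """Scale the weights in each cell so that they add up to the known count."""
    adjusted = {}
    for cell, weights in weights_by_cell.items():
        factor = known_counts[cell] / sum(weights)
        adjusted[cell] = [weight * factor for weight in weights]
    return adjusted


# Hypothetical cells (province, age group, gender) with non-response-adjusted weights.
weights_by_cell = {
    ("PEI", "0-5", "girls"): [40.0, 35.0, 45.0],
    ("PEI", "0-5", "boys"): [50.0, 50.0],
}
# Hypothetical known population counts for the same cells.
known_counts = {
    ("PEI", "0-5", "girls"): 150,
    ("PEI", "0-5", "boys"): 90,
}

adjusted = poststratify(weights_by_cell, known_counts)
for cell, weights in adjusted.items():
    print(cell, [round(weight, 2) for weight in weights], "sum:", round(sum(weights), 2))
```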
Final weight
$W_f = W_i \times \mathrm{Adj}_1 \times \mathrm{Adj}_2$
Where
  – $W_f$: Final weight
  – $W_i$: Initial weight
  – $\mathrm{Adj}_1$: Non-response adjustment
  – $\mathrm{Adj}_2$: Post-stratification
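The product above is simple enough to check by hand. The sketch below uses invented values for the initial weight and the two adjustment factors.

```python
def final_weight(initial_weight, nonresponse_adj, poststrat_adj):
    """W_f = W_i x Adj_1 x Adj_2."""
    return initial_weight * nonresponse_adj * poststrat_adj


# Hypothetical child: initial weight 30, non-response factor 1.25,
# post-stratification factor 0.96.
w_f = final_weight(30.0, 1.25, 0.96)
print(f"final weight: {w_f:.1f}")
```

A child selected with an initial weight of 30 ends up representing about 36 children once both adjustments are applied.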
Link between analysis and the sample design (weight)
[Diagram: Child’s ability, with related factors: intelligence, social environment, school, teachers, materials, curriculum, grade level, subject, province]
Province is a stratum.
The proportion of kids in the sample being taught the PEI curriculum is much larger than what’s found in the population.
Link between analysis and the sample design
There are very few things in a child’s life that are not related to where they live:
• In the city versus in a small village
• In a small province versus a large one
• What social/educational programs are offered
• What social support and services are offered
• Regional cultural differences
• To name a few…
Weights for Cycle 4
– Cross-sectional weights
– Longitudinal weights, including the converted respondents
– Longitudinal weights for children introduced in Cycle 1 who responded in all cycles (NEW)
– Not to mention the bootstrap weights, which are used for an entirely different purpose.
Cross-sectional Weights
• Available for all cycles, up to Cycle 4.
• When are they used?
• Cycle 4 cross-sectional weights:
  – to represent the population aged 0-17 in 2000-01.
  – …
• Cycle 1 weights:
  – to represent the population aged 0-11 in 1994-95.
Cross-sectional Weights - Cycle 4 - Warning
• In Cycle 4, children with a cross-sectional weight come from 4 different cohorts (introduced in 1994, 1996, 1998 and 2000).
• By 2000, the 1994 cohort has been around for 6 years:
  – cross-sectional representativity decreases over time because of sample erosion and population change (immigration).
Cross-sectional Weights - Cycle 5
• For Cycle 5 (2002-2003), there are no children aged 6 and 7.
• In addition, the 1994 cohort’s cross-sectional representativity has declined even further (erosion and immigration).
• As a result, cross-sectional weights will be calculated only for children aged 0-5.
Cross-sectional weights in a nutshell
Cross-sectional weights must be used when the analysis concerns a specific year, when you want a snapshot of the situation at a specific point in time.
Longitudinal Weights
• Longitudinal weights represent the population of children at the time they were brought into the survey.
  – Children introduced in Cycle 1: longitudinal weights represent the population of children aged 0-11 in 1994-95.
  – Children introduced in Cycle 2: longitudinal weights represent the population of children aged 0-1 in 1996-97.
  – Children introduced in Cycle 3: longitudinal weights represent the population of children aged 0-1 in 1998-99.
  – Children introduced in Cycle 4: longitudinal weights represent the population of children aged 0-1 in 2000-01.
When are longitudinal weights used?
• When you want to track a cohort of children introduced in a particular cycle and see how they’ve developed over time.
Longitudinal Weights - Cycle 4
• Something new in Cycle 4: 2 sets of longitudinal weights:
  – Set 1: Weights for children who responded in their first cycle and in Cycle 4 (possible non-response in Cycle 2 or 3)
  – Set 2: Weights for those introduced in Cycle 1 who responded in every cycle (NEW)
Longitudinal Weights - Cycle 4
• Difference between the 2 sets of longitudinal weights:
  – To avoid total non-response in Cycle 2 or 3, the set of weights for those who responded throughout can be used.
  – If you’re only interested in the changes between Cycle 1 and Cycle 4 directly, the longitudinal weights including converted respondents can be used.
Examples
The following are real examples taken from the NLSCY data.
Weighting - Examples
Average weights in Cycle 4
Prince Edward Island
• 1-year-olds
• 5-year-olds: 7
• 15-year-olds
Ontario
• 15-year-olds: 712
Example: Proportion of children aged 0-17, by province, Cycle 4, UNWEIGHTED
24% of Canada’s children live in the Atlantic provinces … whereas in reality...
Province   Sample size   Percentage
Nfld             1,826         6.0%
PEI              1,025         3.4%
NS               2,259         7.5%
NB               2,037         6.7%
Que              5,337        17.6%
Ont              7,468        24.6%
Man              2,356         7.8%
Sask             2,353         7.8%
Alberta          2,986         9.9%
BC               2,659         8.8%
Total           30,306
Example: Proportion of children aged 0-17, by province, Cycle 4, WEIGHTED
Whereas in reality… 7.3% of children live in the Atlantic provinces.
Province    Population   Percentage
Nfld           116,080         1.6%
PEI             33,311         0.5%
NS             208,160         2.9%
NB             165,078         2.3%
Que          1,590,325        22.5%
Ont          2,747,236        38.8%
Man            289,265         4.1%
Sask           265,221         3.8%
Alberta        763,858        10.8%
BC             892,908        12.6%
Total        7,071,442
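The two provincial tables can be compared directly. The sketch below uses only the counts shown in the tables; the variable names are ours. It recomputes each province's share with and without weights, and the average weight implied by the tables (weighted count divided by sample size). That average is taken over all ages 0-17, so it is not the same quantity as the age-specific averages quoted earlier.

```python
provinces = ["Nfld", "PEI", "NS", "NB", "Que", "Ont", "Man", "Sask", "Alberta", "BC"]
# Sample sizes from the unweighted table.
sample_size = [1826, 1025, 2259, 2037, 5337, 7468, 2356, 2353, 2986, 2659]
# Weighted counts from the weighted table.
weighted_count = [116080, 33311, 208160, 165078, 1590325,
                  2747236, 289265, 265221, 763858, 892908]


def shares(counts):
    """Return each count as a percentage of the total."""
    total = sum(counts)
    return [100 * count / total for count in counts]


unweighted = dict(zip(provinces, shares(sample_size)))
weighted = dict(zip(provinces, shares(weighted_count)))

atlantic = ["Nfld", "PEI", "NS", "NB"]
atlantic_unweighted = sum(unweighted[p] for p in atlantic)
atlantic_weighted = sum(weighted[p] for p in atlantic)

# Average weight implied by the two tables, all ages combined.
average_weight = {
    p: w / n for p, w, n in zip(provinces, weighted_count, sample_size)
}

print(f"Atlantic share, unweighted: {atlantic_unweighted:.1f}%")
print(f"Atlantic share, weighted:   {atlantic_weighted:.1f}%")
for p in ("PEI", "Ont"):
    print(f"{p}: one sampled child represents about {average_weight[p]:.0f} children")
```

Computed from the counts, the weighted Atlantic share comes out near 7.4%; the 7.3% quoted above is the sum of the four rounded percentages in the table.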
Number of children aged 0-15 by year of age, Quebec, Cycle 3, UNWEIGHTED
The conclusion is obvious… Huge increase in births in 1993 and 1997!!!!!
Age   Birth year   Sample size   Percentage
0     1998                 306         4.9%
1     1997               1,055        16.8%
2     1996                 326         5.2%
3     1995                 449         7.1%
4     1994                 405         6.4%
5     1993               1,627        25.8%
6     1992                 313         5.0%
7     1991                 201         3.2%
8     1990                 265         4.2%
9     1989                 170         2.7%
10    1988                 221         3.5%
11    1987                 154         2.4%
12    1986                 224         3.6%
13    1985                 167         2.7%
14    1984                 241         3.8%
15    1983                 171         2.7%
Total                    6,295
Number of children aged 0-15 by year of age, Quebec, Cycle 3, WEIGHTED
So much for the pseudo baby boom...
Age   Birth year   Population   Percentage
0     1998             73,254         5.2%
1     1997             78,769         5.5%
2     1996             84,713         6.0%
3     1995             88,662         6.2%
4     1994             87,895         6.2%
5     1993             91,466         6.4%
6     1992             95,101         6.7%
7     1991             78,882         5.6%
8     1990            116,752         8.2%
9     1989             73,451         5.2%
10    1988            107,819         7.6%
11    1987             75,130         5.3%
12    1986             98,202         6.9%
13    1985             79,400         5.6%
14    1984            100,385         7.1%
15    1983             91,205         6.4%
Total               1,421,086
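Dividing the weighted population by the sample size at each age shows where the "baby boom" came from. The sketch below uses only the figures from the two Quebec tables; the variable names are ours.

```python
ages = list(range(16))
# Sample sizes by age from the unweighted Quebec table (Cycle 3).
sample_size = [306, 1055, 326, 449, 405, 1627, 313, 201,
               265, 170, 221, 154, 224, 167, 241, 171]
# Weighted population by age from the weighted Quebec table (Cycle 3).
population = [73254, 78769, 84713, 88662, 87895, 91466, 95101, 78882,
              116752, 73451, 107819, 75130, 98202, 79400, 100385, 91205]

# Average weight implied by the tables: children represented per sampled child.
implied_weight = {
    age: pop / n for age, pop, n in zip(ages, population, sample_size)
}

# The two ages with the smallest weights are the most heavily sampled ones.
most_oversampled = sorted(implied_weight, key=implied_weight.get)[:2]

for age in ages:
    print(f"age {age:2d}: sample {sample_size[age]:5d}, "
          f"implied average weight {implied_weight[age]:6.1f}")
print("most heavily sampled ages:", most_oversampled)
```

The 5-year-olds (born 1993) and 1-year-olds (born 1997) have by far the smallest implied weights: they were sampled much more heavily than the other ages, which is what the unweighted counts were showing.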
Conclusion
To be obsessed with weights is a good thing… where statistical analysis is concerned.