Missing income data in the millennium cohort study: Evidence from the first two sweeps
Missing income data in the millennium cohort study:
Evidence from the first two sweeps
Authors: Denise Hawkes and Ian Plewis
Discussant: Nicholas [email protected]
Introduction and overview
Data – Millennium Cohort Study
Research questions – What are the factors associated with non-response? More specifically:
- Are there within-household and individual correlations for missing income data?
- Is the sex of the interviewer an important explanatory variable?
- How is missing data in sweep one related to missing data in sweep two?
- Is attrition at sweep two related to the level of household income or the failure to provide data in sweep one?
Method:
- Descriptive analysis
- Binary and multinomial logit models with non-response as the dependent variable
- Binary logit with attrition between sweep one and sweep two as the dependent variable
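A minimal sketch of the binary logit idea, for readers who want to see how a coefficient turns into an odds ratio. The counts below are hypothetical and are not MCS data; the function name `fit_logit` and the single self-employment predictor are assumptions made for illustration only. With one binary predictor, the exponentiated slope from the fitted model equals the odds ratio read directly off the 2x2 table.

```python
import math

def fit_logit(x, y, iterations=25):
    """Fit P(y=1) = 1 / (1 + exp(-(b0 + b1*x))) by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            # Gradient of the log-likelihood
            g0 += yi - p
            g1 += (yi - p) * xi
            # Negative Hessian (information matrix)
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system by hand
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical counts (NOT from the study):
# self-employed: 30 non-responders out of 100; others: 50 out of 900.
x = [1] * 100 + [0] * 900
y = [1] * 30 + [0] * 70 + [1] * 50 + [0] * 850

b0, b1 = fit_logit(x, y)
odds_ratio_fit = math.exp(b1)
odds_ratio_table = (30 / 70) / (50 / 850)
print(odds_ratio_fit, odds_ratio_table)
```

In practice a statistics package would be used and the models would include many covariates; the hand-written Newton-Raphson loop is only there to keep the sketch self-contained.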
Data
Millennium Cohort Study
- First sweep – 18,819 babies born in the UK from 1st September 2000 (from 18,552 families). Interviewed when the baby was 9 months old.
- Second sweep – 14,898 families from the original sample and 692 new families. Interviewed when the children were around 3 years old.
- Information from the main respondent (usually the mother) and the partner of the respondent (usually the father)
Incomplete information on income through:
- Unit non-response (response rate 72% in first sweep)
- Partner non-response (88% of families with partners responded)
- Item non-response for income (6% of main respondents and partners did not provide income data)
- Attrition between sweeps (79% of eligible families responded in sweep two)
Income information:
- Collected from those currently doing paid work, those who have a paid job but are on leave, and those who have worked in the past but have no current job
- For employees – total take-home pay and gross pay
- For the self-employed – ‘amount you personally took out of the business after all taxes and costs’
Patterns of income response
Original sample (paper has information on new families and proxies)
                      Sweep one           Sweep two
                      Main     Partner    Main     Partner
Income response       45.9%    64.7%      50.6%    62.9%
Don’t know            1.8%     2.1%
Refusal               0.9%     2.1%
Total non-response    2.7%     4.3%       4.4%     8.7%
Not applicable        51.5%    31.0%      45.1%    28.4%
Sample                18,552              14,898
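As a quick consistency check on the table, the sketch below confirms that each column sums to roughly 100% and re-expresses total non-response as a share of those who were actually asked for income (that is, excluding "not applicable"). The percentages are copied from the slide. Reading the 6% item non-response figure on the Data slide as this conditional share is an assumption, not something the slides state.

```python
# Percentages from the "Patterns of income response" table.
columns = {
    "sweep one, main":    {"response": 45.9, "non_response": 2.7, "not_applicable": 51.5},
    "sweep one, partner": {"response": 64.7, "non_response": 4.3, "not_applicable": 31.0},
    "sweep two, main":    {"response": 50.6, "non_response": 4.4, "not_applicable": 45.1},
    "sweep two, partner": {"response": 62.9, "non_response": 8.7, "not_applicable": 28.4},
}

column_totals = {}
conditional_rates = {}
for name, c in columns.items():
    # Each column should add up to about 100% (allowing for rounding).
    column_totals[name] = c["response"] + c["non_response"] + c["not_applicable"]
    # Non-response among those for whom the income question applied.
    asked = c["response"] + c["non_response"]
    conditional_rates[name] = 100.0 * c["non_response"] / asked

for name in columns:
    print(f"{name}: total {column_totals[name]:.1f}%, "
          f"non-response among those asked {conditional_rates[name]:.1f}%")
```

On this reading, non-response among those asked is noticeably higher in sweep two than in sweep one, and highest for partners in sweep two.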
Modelling non-response – Main respondent
Columns: Sweep one, then Sweep two Spec. (I), Spec. (II), Spec. (III)
Self employed 6.4 6.8 6.6 6.7
Has a partner 0.58 0.57 0.56
Social class (reference: managerial and professional)
  Intermediate 1.6
  Small employers and self employment 1.8
  Lower supervisors and technical
  Semi routine and routine
Ethnicity (reference: white)
  Mixed
  Indian 2.4 2.3 2.3
  Pakistani and Bangladeshi
  Black or Black British 1.6
  Other ethnic group 2.3
Country (reference: England)
  Wales
  Scotland
  Northern Ireland 1.7 1.5
Respondent did not respond in sweep one - - 3.0 3.0
Respondent same in sweep one and two - - - 5.3
Sample size 8,190 5,800 5,800 5,800
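To give a feel for the size of these figures, the sketch below converts an odds ratio into a predicted probability. It assumes the table entries are odds ratios, which the slides do not state explicitly, and it uses a hypothetical 5% baseline non-response probability that is not taken from the study. The helper name `apply_odds_ratio` is also illustrative.

```python
def apply_odds_ratio(baseline_probability, odds_ratio):
    """Return the probability implied by scaling the baseline odds by odds_ratio."""
    baseline_odds = baseline_probability / (1.0 - baseline_probability)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1.0 + new_odds)

BASELINE = 0.05  # hypothetical reference-group non-response probability

# 6.4 is the sweep-one entry for "Self employed";
# 0.58 is the first entry listed for "Has a partner".
p_self_employed = apply_odds_ratio(BASELINE, 6.4)
p_has_partner = apply_odds_ratio(BASELINE, 0.58)

print(f"self employed: {p_self_employed:.3f}")
print(f"has a partner: {p_has_partner:.3f}")
```

The point of the conversion is that an odds ratio of 6.4 does not mean 6.4 times the probability: the implied probability depends on the baseline chosen.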
Modelling non-response – Partner (I)
Columns: Sweep one, then Sweep two Spec. (I), Spec. (II), Spec. (III)
Self employed 1.7 3.6 3.6 3.6
Social class (reference: managerial and professional)
  Intermediate
  Small employers and self employment 3.0
  Lower supervisors and technical 0.68
  Semi routine and routine 0.66
NVQ levels (reference: none)
  NVQ Level 1
  NVQ Level 2 0.63
  NVQ Level 3 0.59
  NVQ Level 4 0.47
  NVQ Level 5 0.34
  Other/overseas qual only
Ethnicity (reference: white)
  Mixed 2.3 2.4 2.5
  Indian 1.8 2.5 2.3 2.3
  Pakistani and Bangladeshi 2.2 2.4 2.2 2.2
  Black or Black British
  Other ethnic group 2.0
Owner occupier 0.76 0.76 0.77
Modelling non-response – Partner (II)
Columns: Sweep one, then Sweep two Spec. (I), Spec. (II), Spec. (III)
Country (reference: England)
  Wales
  Scotland
  Northern Ireland 1.9 1.5 1.6 1.6
Respondent did not respond in sweep one - - 4.6 4.5
Respondent same in sweep one and two - - - 0.39
Sample size 10,754 7,893 7,893 7,893
Other modelling – Multinomial logit and attrition
Multinomial logit – Response vs. don’t know vs. refuse
Main respondent:
- Self employed only significantly more likely to be ‘don’t know’, not ‘refusal’
- Same with social class variables
- Black or Black British, as well as Northern Ireland, more likely to refuse
Partner respondent:
- Self employed significantly more likely both to refuse and to answer ‘don’t know’
- NVQ levels and ethnicity both associated with refusal
Attrition at sweep two
- Higher income in sweep one associated with lower odds of attrition between sweep one and sweep two
- Main income and partner income non-response in sweep one associated with higher odds of attrition between sweep one and sweep two
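The three-way outcome used in the multinomial logit (response, don’t know, refusal) can be sketched as follows. The coefficients are hypothetical and chosen only to mimic the qualitative pattern reported for main respondents, where self employment raises ‘don’t know’ but barely affects refusal; they are not estimates from the paper. "Response" is taken as the reference outcome, which is an assumption.

```python
import math

# Hypothetical coefficients on the log scale, relative to "response".
COEFS = {
    "dont_know": {"intercept": -3.5, "self_employed": 1.8},
    "refusal":   {"intercept": -4.0, "self_employed": 0.2},
}

def outcome_probabilities(self_employed):
    """Multinomial logit probabilities with 'response' as the reference outcome."""
    scores = {"response": 0.0}  # reference category has a score of zero
    for outcome, b in COEFS.items():
        scores[outcome] = b["intercept"] + b["self_employed"] * self_employed
    denominator = sum(math.exp(s) for s in scores.values())
    return {outcome: math.exp(s) / denominator for outcome, s in scores.items()}

employee = outcome_probabilities(0)
self_emp = outcome_probabilities(1)

# Relative risk ratio for "don't know" versus "response":
# it equals exp(coefficient) whatever the other outcomes' coefficients are.
rrr_dont_know = (self_emp["dont_know"] / self_emp["response"]) / (
    employee["dont_know"] / employee["response"]
)
print(employee, self_emp, rrr_dont_know)
```

Separating the two kinds of non-response in this way is what allows a covariate to matter for ‘don’t know’ but not for refusal, which a single binary non-response outcome would hide.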
Summary
Household and individual correlations for missing income data
- Self employment, some ethnic groups (though not consistent), Northern Ireland
- The sex of the interviewer is not an important explanatory variable in explaining income non-response
- Some variables associated only with ‘don’t know’ or only with ‘refusal’
Missing data in sweep one associated with higher odds of missing data in sweep two
- Especially amongst partner respondents
Higher household income in sweep one associated with lower attrition in sweep two
Missing data in sweep one associated with higher attrition in sweep two
Suggested further work and information
Models for non-response
- More diagnostic information (e.g. tests of group significance)
- Information on the child?
Interviewer bias
- Multilevel model?
- Interactions or other information on the interviewer
Implications for survey design
- Difference between don’t know and refusal