Evaluating a Propensity Score Adjustment for Combining Probability and Non-Probability Samples in a...

22
Evaluating a Propensity Score Adjustment for Combining Probability and Non-Probability Samples in a National Survey FedCASIC March 5, 2015 Kurt R. Peters, PhD Heather Driscoll, MS Pedro Saavedra, PhD

Transcript of Evaluating a Propensity Score Adjustment for Combining Probability and Non-Probability Samples in a...

Evaluating a Propensity Score Adjustment for Combining Probability and Non-Probability Samples in a National Survey

FedCASIC

March 5, 2015

Kurt R. Peters, PhDHeather Driscoll, MSPedro Saavedra, PhD

2

Outline

2012 Canadian Nature Survey– Research questions– Survey design

Weighting Methodology

Results (Comparison of weighted estimates)

Conclusions

3

Research Questions

National population survey of Canadian adults

2012 CANADIAN NATURE SURVEY

Connection to & awareness of nature

Nature-based activities, participation, and expenditures

Human/wildlife conflict

4

Survey Design

Complex sample design with hybrid probability and non-probability samples

Multi-mode administration (Paper + Web)

For probability sample (nationally):– 76,363 addresses sampled from ABS frame– 15,207 completes– 20% response rate (lower bound)

For non-probability samples (nationally):– 8,897 completes

2012 CANADIAN NATURE SURVEY

5

Survey Design2012 CANADIAN NATURE SURVEY

P

W

P

W

P

W

C

P

P

P P

P

P

P

W

W W

C

P

W

P

W

P Probability (ABS)

W Non-Probability (Web Panel)

C Non-Probability (Community)

SAMPLE TYPES

PROVINCEABS RESPONSES

WEB PANEL RESPONSES

AB 1,511 818ON 1,011 4,584QC 1,029 2,986TOTAL 3,551 8,388

6

Survey Design

Address-Based Sample of Canadian Adults– Drawn from Canada Post address file– Stratification:• Province/Territory (all except Nunavut)• Urban/Rural address (Canada Post frame variable)

– Mode of Administration:• Paper, with Web option

– Within-HH selection by Last Birthday Method– Targeted 1,000 completes in each province and territory

2012 CANADIAN NATURE SURVEY

P

7

Survey Design

Web Panel Sample– Canadian adults recruited via social media and websites– Recruited to match key demographic distributions (ethnicity, age, education, income)– In each P/T, fielded until target number of completes was reached

2012 CANADIAN NATURE SURVEY

W

8

Weighting Methodology

Focus of current research is evaluation of weighting to combine the probability (ABS) and non-probability (Web panel) datasets for analysis

P W

18 - 25 26 - 35 36 - 45 46 - 55 56 - 65 66 - 75 76 - 1000

5

10

15

20

25

Age (Unweighted)

Population ABS Panel

Perc

ent

Male Female0

102030405060

Sex (Unweighted)

Population ABS Panel

Perc

ent

9

Weighting Methodology

An ABS analytic weight was developed for ABS respondents– Standard probability-based selection weight adjusted for non-response and post-

stratified to Census totals:• Province x Age x Sex• Province x Urban/Rural• Aboriginal/Non-Aboriginal

10

Weighting Methodology

The following approach was explored for combining the ABS and Panel respondents into a single weighted dataset:1. Estimate probability of observation in Panel (vs. Population)

2. Score all (Panel and ABS) cases to assign a probability of observation under Panel design

3. Assign probability of observation under ABS design to Panel cases

4. Combine ABS and Panel observation probabilities to compute combined weight

11

Weighting Methodology

Estimate probability of observation in Panel (vs. Population) using weighted logistic regression– Outcome = Observation in Panel (vs. Population)• P(Observation) = P(Selection) * P(Response)

– Weights:• For ABS cases, weight = ABS analytic weight (NR-adjusted and post-stratified to population)• For Panel cases, weight = 1

12

Weighting Methodology

Estimate probability of observation in Panel (vs. Population) using weighted logistic regression– Predictors: Effect Comparison Odds Ratio

Province AB vs QC 0.7ON vs QC 1.1

Age 18 - 25 vs 76 - 100 7.926 - 35 vs 76 - 100 8.736 - 45 vs 76 - 100 7.746 - 55 vs 76 - 100 6.456 - 65 vs 76 - 100 7.366 - 75 vs 76 - 100 4.3

Sex Female vs Male 1.2Urbanicity Urban vs. Rural 1.2Nature-related Profession No vs. Yes 0.9Aboriginal No vs. Yes 1.1Immigrant No vs. Yes 1.3Education (Highest) Elementary vs. Other 0.4

Some HS vs. Other 1.4HS vs. Other 2.42-yr College vs. Other 1.8Bachelor’s vs. Other 1.2Master’s vs. Other 1.1Doctorate vs. Other 1.2

HH Income 0.9

ns

nsns

nsnsns

Base Model Full Model

13

Weighting Methodology

Score all (Panel and ABS) cases to assign a probability of observation in Panel• Mean estimated probability of observation under Panel design:

Female Male-0.00010.00000.00010.00020.00030.00040.00050.00060.00070.0008

Sex

18-25 26-35 36-45 46-55 56-65 66-75 76-100-0.0001

0.00000.00010.00020.00030.00040.00050.00060.00070.0008

Age

Urban Rural-0.00010.00000.00010.00020.00030.00040.00050.00060.00070.0008

Urban/Rural

<$25K $25K-$50K

$50K-$75K

$75K-$100K

>$100K-0.00010.00000.00010.00020.00030.00040.00050.00060.00070.0008

Income

Elementary

Some HS HS

2-YR Coll

Bachelor's

Master's

Doctorate

Other0

0.00010.00020.00030.00040.00050.00060.00070.0008

Education

No Yes-0.00009999999999999981.76182853028894E-19

0.00010.00020.00030.00040.00050.00060.00070.0008

Nature-Related Professionns

14

Weighting Methodology

Assign probability of observation under ABS design to Panel cases– Probability of observation under ABS design computed as inverse of post-stratified

ABS analytic weight– Within post-stratification classes, same ABS probability was assigned to Panel

respondents• This assumes that ABS and Panel cases within these classes have the same probability of

observation under ABS design

– Result is that all cases in combined sample have a (true or estimated) probability of observation under both the ABS and Panel designs

P(Observation)ABS Panel

Sample Source

ABS Inverse of post-stratified, NR-adjusted ABS sampling weight

Matched by post-stratification class

Panel Estimated Panel probability Estimated Panel probability

15

Weighting Methodology

Combine ABS and Panel probabilities to compute combined weight

16

Results

Demographics

18 - 25 26 - 35 36 - 45 46 - 55 56 - 65 66 - 75 76 - 1000

5

10

15

20

25

Age

Population ABS (Unweighted)Panel (Unweighted) Combined (Weighted)

Perc

ent

Male Female0

10

20

30

40

50

60

Sex

Population ABS (Unweighted)Panel (Unweighted) Combined (Weighted)

Perc

ent

17

Results

Demographics

Education > HS

HH Income > $50,000

0% 10% 20% 30% 40% 50% 60% 70% 80%

ABS (ABS Weight) Combined (Combined Weight)Panel (1/p(Panel) Panel (Unweighted)

18

Results

Key Survey Outcomes

MAD of Panel from ABS population estimates is

10% lower after weighting, and ~40% lower with

combined weighted sample

Nature-related profession

Chose where to live in part to have access to nature

Chose to spend more time outdoors in the last year to experience nature

Aware of the concept of species at risk

Aware of the concept of biodiversity

Aware of the concept of ecosystem services

Participated in some form of nature-based recreation

Participated in fishing

Spent >$40 in donations and membership dues to nature organizations

Experienced a threat from wild animals

Experienced damage to personal property caused by wild animals

0% 10% 20% 30% 40% 50% 60% 70% 80% 90% 100%

ABS (ABS Weight) Combined (Combined Weight)Panel (1/p(Panel) Panel (Unweighted)

19

Conclusions

Unweighted panel data differed from benchmarks– Demographics: More female, younger, lower income, less educated, more urban– Outcomes: • Accurate (±2 points):

– Nature-related profession– Aware of the concept of species at risk – Experienced a threat from wild animals– Experienced damage to personal property caused by wild animals

• Overestimates (>2 points over):

– Chose where to live in part to have access to nature – Participated in fishing

• Underestimates (>2 points under):

– Chose to spend more time outdoors in the last year to experience nature – Aware of the concept of biodiversity– Aware of the concept of ecosystem services– Participated in some form of nature-based recreation– Spent >$40 in donations and membership dues to nature organizations

20

Conclusions

Propensity score model was used to estimate probability of being observed in the panel compared to general population– Model explained only some of the variance () – room for improvement– Nevertheless, estimated probability of observation• Brought panel demographics in line with population • Reduced bias in panel estimates for key survey outcomes• Made possible the combination of probability (ABS) and non-probability (Panel) data into a

single, weighted dataset

21

Conclusions

Next steps…– Building a more comprehensive model of P(Observation) under panel design– Can statistical matching (“data fusion”) exploit differences between ABS and Panel

respondents to increase efficiency of data collection?• For example, using ABS to estimate prevalence and Panel to collect detailed per-person data

(such as expenditures, travel days, etc.)• May lower administration cost and respondent burden

– Does reduction in bias via panel weight come at the price of increased variance? How accurate are estimates of sampling error from modeled probabilities of selection?