A Theoretical Framework for Adaptive Collection Designs

A Theoretical Framework for Adaptive Collection Designs. Jean-François Beaumont, Statistics Canada, and David Haziza, Université de Montréal. International Total Survey Error Workshop, Québec, June 19-22, 2011.

  • A Theoretical Framework for Adaptive Collection Designs

    Jean-François Beaumont, Statistics Canada
    David Haziza, Université de Montréal
    International Total Survey Error Workshop
    Québec, June 19-22, 2011

  • Overview

    • Selected literature review
    • Framework
    • Definition of the problem
    • Choice of quality indicator and cost function
    • Mathematical formulation of the problem
    • Solution and discussion
    • Conclusion

  • Literature review: Groves & Heeringa (2006, JRSS, Series A)

    • Responsive designs: use paradata to guide changes in the features of data collection in order to achieve higher-quality estimates per unit cost
    • Paradata: data about the data collection process
    • Examples of features: mode of data collection, use of incentives, ...
    • Need to define quality and determine quality indicators
    • Two main concepts: phase and phase capacity

  • Literature review: Groves & Heeringa (2006, JRSS, Series A)

    • Phase: period of data collection during which the same set of methods is used
    • Phase 1: gather information about design features
    • Phases 2+: alter features (e.g., subsampling of nonrespondents, larger incentives, ...)
    • A phase is continued until its phase capacity is reached
      • Judged by the stability of an indicator as the phase matures

  • Literature review: Schouten, Cobben & Bethlehem (2009, SM)

    • Goal: determine an indicator of nonresponse bias as an alternative to response rates
    • Proposed a quality indicator, called the R-indicator: $R(\rho) = 1 - 2\,S(\rho)$
    • The population standard deviation $S(\rho)$ of the response probabilities must be estimated
    • The response probabilities, $\rho_i$, must be estimated using some model
    • An issue: the indicator depends on the proper choice of model (choice of auxiliary variables)
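
The R-indicator of Schouten, Cobben & Bethlehem (2009), $R(\rho) = 1 - 2\,S(\rho)$, can be computed directly once response probabilities have been estimated. The following is a minimal sketch, assuming equal design weights and a sample standard deviation with divisor n - 1; the function name `r_indicator` and the example probabilities are illustrative and do not come from the slides.

```python
import math


def r_indicator(response_probs):
    """Return 1 - 2 * S(rho) for a list of estimated response probabilities.

    Assumption: all units carry the same design weight, so the standard
    deviation is the plain sample standard deviation (divisor n - 1).
    """
    n = len(response_probs)
    if n < 2:
        raise ValueError("need at least two units")
    mean = sum(response_probs) / n
    variance = sum((p - mean) ** 2 for p in response_probs) / (n - 1)
    return 1.0 - 2.0 * math.sqrt(variance)


# Identical probabilities give a fully "representative" response (R = 1).
balanced = r_indicator([0.5, 0.5, 0.5, 0.5])

# Unequal probabilities lower the indicator.
unbalanced = r_indicator([0.2, 0.9, 0.4, 0.9])
```

Because the result depends only on the estimated probabilities, a different response model (different auxiliary variables) generally gives a different value, which is the issue raised on the slide.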

  • Literature review: Schouten, Cobben & Bethlehem (2009, SM)

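    • Another issue: the indicator does not depend on the variables of interest, but nonresponse bias does
    • Maximal bias of $\bar{y}_r$, where $\bar{y}_r$ is the unadjusted estimator of the population mean: $|B(\bar{y}_r)| \le \dfrac{(1 - R(\rho))\,S(y)}{2\bar{\rho}}$, with $S(y)$ the population standard deviation of $y$ and $\bar{\rho}$ the mean response probability
    • Two limitations of maximal bias (and the R-indicator):
      • the unadjusted estimator is rarely used in practice
      • it depends on proper specification of $\rho_i$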
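
The maximal-bias bound of Schouten, Cobben & Bethlehem (2009), $(1 - R(\rho))\,S(y) / (2\bar{\rho})$, is a one-line computation once its three ingredients are available. The sketch below is illustrative only; the function name `maximal_bias` and the input values are assumptions, not figures from the presentation.

```python
def maximal_bias(r_indicator_value, sd_y, mean_response_prob):
    """Upper bound on the absolute bias of the unadjusted respondent mean.

    r_indicator_value  : R-indicator, R = 1 - 2 * S(rho)
    sd_y               : population standard deviation of the study variable
    mean_response_prob : average response probability (must be in (0, 1])
    """
    if not 0.0 < mean_response_prob <= 1.0:
        raise ValueError("mean response probability must be in (0, 1]")
    return (1.0 - r_indicator_value) * sd_y / (2.0 * mean_response_prob)


# Hypothetical inputs: R = 0.8, S(y) = 10, mean response probability 0.5.
bound = maximal_bias(0.8, 10.0, 0.5)

# A perfectly balanced response (R = 1) gives a bound of zero.
no_bias_bound = maximal_bias(1.0, 10.0, 0.5)
```

The bound scales with $S(y)$, so it differs across study variables even though the R-indicator itself does not.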

  • Literature review: Peytchev, Riley, Rosen, Murphy & Lindblad (2010, SRM)

    • Goal: reduce nonresponse bias through case prioritization
    • Suggest targeting individuals with lower estimated response probabilities
      • For instance, give them larger incentives or give interviewer incentives
    • Their approach is basically equivalent to trying to increase the R-indicator (or achieving a more balanced sample)
    • Recommend using auxiliary variables that are associated with the variables of interest

  • Literature review: Laflamme & Karaganis (2010, ECQ)

    • Development and implementation of responsive designs for CATI surveys at Statistics Canada
    • Planning phase: before data collection starts (determination of strategies, analyses of previous data, ...)
    • Initial collection phase: evaluate different indicators to determine when the next phase should start
    • Two responsive design (RD) phases

  • Literature review: Laflamme & Karaganis (2010, ECQ)

    • RD phase 1: prioritize cases (based on paradata or other information) with the objective of improving response rates
      • Increases the number of respondents (desirable)
    • RD phase 2: prioritize cases with the objective of reducing the variability of response rates between domains of interest (increasing the R-indicator)
      • Likely reduces the variability of weight adjustments (desirable)

  • Literature review: Schouten, Calinescu & Luiten (2011, Stat. Netherlands)

    • First paper to propose a theoretical framework for adaptive survey designs
    • Suggest:
      • maximizing quality for a given cost; or
      • minimizing cost for a given quality
    • Requires a quality indicator (e.g., overall response rate, R-indicator, maximal bias, ...)
    • Which one to use?

  • Definition of the problem

    • Adaptive collection design: any procedure of call prioritization or resource allocation that is dynamic as data collection progresses
    • Uses paradata (or other information) to adapt itself to what is observed during data collection
    • Focus on call prioritization
    • Our objective: maximize quality for a given cost
    • Context: CATI surveys

  • Choice of quality indicator

    • Focus of the literature: find collection designs that reduce the nonresponse bias (or maximize the R-indicator) of an unadjusted estimator
    • We think the focus should not be on nonresponse bias. Why?
      • Any bias that can be removed at the collection stage can also be removed at the estimation stage
    • We suggest reducing the nonresponse variance of an estimator adjusted for nonresponse

  • Quality indicator

    • Suppose we want to estimate the total: [equation not recovered]
    • Assuming that nonresponse is uniform within cells, an asymptotically unbiased estimator is: [equation not recovered]
    • Quality indicator: the nonresponse variance [equation not recovered]
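
The estimator on this slide is not preserved in the transcript. A standard estimator that is asymptotically unbiased when nonresponse is uniform within cells is the weighting-class estimator, which inflates each respondent's design weight by the inverse of the weighted response rate in its cell. The sketch below assumes that form; the function name `adjusted_total`, the tuple layout and the example data are hypothetical and may differ from the authors' notation.

```python
from collections import defaultdict


def adjusted_total(units):
    """Weighting-class estimate of a population total.

    units: iterable of (cell, design_weight, responded, y) tuples,
           where y is only used for respondents.
    """
    sample_weight = defaultdict(float)          # all sampled units, per cell
    respondent_weight = defaultdict(float)      # respondents only, per cell
    respondent_weighted_y = defaultdict(float)  # sum of weight * y, respondents

    for cell, weight, responded, y in units:
        sample_weight[cell] += weight
        if responded:
            respondent_weight[cell] += weight
            respondent_weighted_y[cell] += weight * y

    total = 0.0
    for cell, w_sample in sample_weight.items():
        if respondent_weight[cell] == 0.0:
            raise ValueError(f"cell {cell!r} has no respondents")
        # Inverse of the estimated response rate within the cell.
        adjustment = w_sample / respondent_weight[cell]
        total += adjustment * respondent_weighted_y[cell]
    return total


# Hypothetical sample: cell "A" has a 50% response rate, cell "B" 100%.
sample = [
    ("A", 10.0, True, 3.0), ("A", 10.0, True, 5.0),
    ("A", 10.0, False, None), ("A", 10.0, False, None),
    ("B", 5.0, True, 1.0), ("B", 5.0, True, 2.0),
]
estimate = adjusted_total(sample)
```

Cells with low response rates receive large adjustments, which is what drives the nonresponse variance that the slide proposes as the quality indicator.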

  • Overall cost

    Overall cost: [equation not recovered]

  • Expected overall cost

    Expected overall cost: [equation not recovered]

  • Mathematical formulation

    • Objective: find the target response probabilities that minimize the nonresponse variance [equation not recovered], subject to a fixed expected overall cost
    • Solution: [equation not recovered]
    • Note: equivalent to maximizing the R-indicator only in a very special scenario
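
The authors' variance, cost function and solution are not preserved in the transcript, so the following is only a simplified stand-in for this kind of constrained problem, not their result. It assumes that the nonresponse variance behaves like $\sum_g a_g / \rho_g$ across cells $g$ and that the expected cost is linear, $\sum_g c_g \rho_g = C$. Under those assumptions a Lagrange-multiplier argument gives $\rho_g = C\sqrt{a_g / c_g} \,/\, \sum_h \sqrt{a_h c_h}$. All names and numbers below are hypothetical.

```python
import math


def optimal_targets(variance_coefs, unit_costs, budget):
    """Minimise sum(a_g / rho_g) subject to sum(c_g * rho_g) == budget.

    Toy model only: closed form from the Lagrangian, valid when every
    resulting target is at most 1.
    """
    if len(variance_coefs) != len(unit_costs):
        raise ValueError("need one variance coefficient and one cost per cell")
    scale = sum(math.sqrt(a * c) for a, c in zip(variance_coefs, unit_costs))
    targets = [budget * math.sqrt(a / c) / scale
               for a, c in zip(variance_coefs, unit_costs)]
    if any(t > 1.0 for t in targets):
        raise ValueError("a target exceeds 1; the closed form does not apply")
    return targets


def toy_variance(variance_coefs, targets):
    """Objective value sum(a_g / rho_g) for given target probabilities."""
    return sum(a / t for a, t in zip(variance_coefs, targets))


a = [4.0, 1.0]   # hypothetical variance coefficients per cell
c = [1.0, 1.0]   # hypothetical cost coefficients per cell
targets = optimal_targets(a, c, budget=1.2)

# Equal targets spend the same budget but give a larger toy variance.
equal_targets = [0.6, 0.6]
```

In this toy model the optimal targets are equal across cells, which is the allocation that maximizes the R-indicator, only when $a_g / c_g$ is the same in every cell. That is consistent with the slide's note that the two objectives coincide only in a special scenario.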

  • Implementation

    • Find the effort (number of attempts) necessary to achieve the target response probability
    • Procedure: select cases to be interviewed with probability proportional to the effort
    • Issues:
      1) Avoid small estimated response probabilities, which would lead to an unduly large effort
      2) Might want to ensure that a certain time has elapsed between two consecutive calls
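
The two implementation steps can be sketched as follows. The slides do not give the formula linking effort to the target response probability, so this sketch assumes a constant, independent per-attempt response probability $p$, under which the probability of responding within $m$ attempts is $1 - (1 - p)^m$ and the required effort is $\log(1 - \rho^*) / \log(1 - p)$. The function names, the floor value `min_per_attempt_prob` and the example cases are all hypothetical.

```python
import math
import random


def attempts_needed(target_prob, per_attempt_prob, min_per_attempt_prob=0.05):
    """Effort (possibly fractional) needed to reach target_prob.

    Assumes independent attempts with a constant response probability.
    Small estimated probabilities are floored (issue 1 on the slide) so
    that the effort does not become unduly large.
    """
    if not 0.0 <= target_prob < 1.0:
        raise ValueError("target probability must be in [0, 1)")
    p = max(per_attempt_prob, min_per_attempt_prob)
    if not 0.0 < p < 1.0:
        raise ValueError("per-attempt probability must be in (0, 1)")
    return math.log(1.0 - target_prob) / math.log(1.0 - p)


def select_case(efforts, rng):
    """Pick one case id with probability proportional to its effort."""
    case_ids = list(efforts)
    weights = [max(efforts[case_id], 0.0) for case_id in case_ids]
    return rng.choices(case_ids, weights=weights, k=1)[0]


# Hypothetical cases: (target probability, estimated per-attempt probability).
cases = {"case_1": (0.75, 0.5), "case_2": (0.75, 0.2), "case_3": (0.5, 0.5)}
efforts = {cid: attempts_needed(t, p) for cid, (t, p) in cases.items()}

rng = random.Random(2011)           # fixed seed for reproducibility
next_call = select_case(efforts, rng)
```

Issue 2 (a minimum time between consecutive calls) is not handled here; one option would be to drop recently called cases from `efforts` before calling `select_case`.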

  • Graph of variance vs cost

    [Figure: minimum nonresponse variance plotted against expected overall cost]

    Cost   Variance
    1      20
    2      10
    3      6.67
    4      5
    5      4
    6      3.33
    7      2.86
    8      2.5
    9      2.22
    10     2
    20     1

  • Revised solution

    • The solution of the optimization problem is found before data collection starts
    • It may be a good idea to revise the solution periodically (e.g., daily)
    • Some parameters might need to be modified
    • Update the remaining budget and the expected overall cost
    • The revised optimization problem is similar to the initial one

  • Revised solution

    Solution (same as before): [equation not recovered]

    Revised target response probability: [equation not recovered]

    Effort: [equation not recovered]

    Could be negative
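
The revised formulas are not preserved in the transcript. One possible reading of the "could be negative" remark is illustrated below, again assuming a constant, independent per-attempt response probability: the remaining effort is the total number of attempts implied by the (revised) target minus the attempts already made, and it becomes negative once a case has received more attempts than the target requires. The function name and inputs are hypothetical, and this is not the authors' formula.

```python
import math


def remaining_effort(target_prob, per_attempt_prob, attempts_made):
    """Additional attempts implied by the target; may be negative.

    Assumes P(response within m attempts) = 1 - (1 - p)^m.
    """
    if not 0.0 <= target_prob < 1.0:
        raise ValueError("target probability must be in [0, 1)")
    if not 0.0 < per_attempt_prob < 1.0:
        raise ValueError("per-attempt probability must be in (0, 1)")
    total_needed = math.log(1.0 - target_prob) / math.log(1.0 - per_attempt_prob)
    return total_needed - attempts_made


# Hypothetical case: target 0.75 with p = 0.5 implies two attempts in total.
after_one_call = remaining_effort(0.75, 0.5, attempts_made=1)
after_three_calls = remaining_effort(0.75, 0.5, attempts_made=3)

# A negative value can be floored at zero before using it as a
# selection weight, so that the case is no longer prioritized.
selection_weight = max(after_three_calls, 0.0)
```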

  • Conclusion

    Next steps:
    • Simulation study
    • Adapt the theory for practical applications
    • Test in a real production environment
    • Which quality indicator? Nonresponse variance? Others?
    • Reduction of nonresponse bias: subsampling of nonrespondents
      • Our approach could be used within the subsample

  • Thanks

    For more information, please contact Jean-François Beaumont or David Haziza.
