
    A REVIEW OF CONCEPTUAL APPROACHES AND EMPIRICAL EVIDENCE ON PROBABILITY AND NONPROBABILITY SAMPLE SURVEY RESEARCH

    CARINA CORNESSE*

    ANNELIES G. BLOM, DAVID DUTWIN, JON A. KROSNICK, EDITH D. DE LEEUW, STÉPHANE LEGLEYE, JOSH PASEK, DARREN PENNAY, BENJAMIN PHILLIPS, JOSEPH W. SAKSHAUG, BELLA STRUMINSKAYA, ALEXANDER WENZ

    There is an ongoing debate in the survey research literature about whether and when probability and nonprobability sample surveys produce accurate estimates of a larger population. Statistical theory provides a justification for confidence in probability sampling as a function of the survey design, whereas inferences based on nonprobability sampling are entirely dependent on models for validity. This article reviews the current debate about probability and nonprobability sample surveys. We describe the conditions under which nonprobability sample surveys may provide accurate results in theory and discuss empirical evidence on which types of samples produce the highest accuracy in practice. From these theoretical and empirical considerations, we derive best-practice recommendations and outline paths for future research.

    KEYWORDS: Accuracy; Nonprobability sampling; Probability sampling; Survey data quality; Survey inference; Weighting adjustments.

    CARINA CORNESSE is a postdoctoral researcher; she is with the Collaborative Research Center SFB 884 "Political Economy of Reforms" at the University of Mannheim, 68131 Mannheim, Germany. ANNELIES G. BLOM is Professor for Data Science at the Department of Political Science, School of Social Sciences, and Principal Investigator of the German Internet Panel (GIP) at the Collaborative Research Center (SFB 884) "Political Economy of Reforms" at the University of Mannheim, 68131 Mannheim, Germany. DAVID DUTWIN is Senior Vice President; he is with NORC at the University of Chicago, 55 East Monroe Street, 30th Floor, Chicago, IL 60603, USA. JON A. KROSNICK is Frederic O. Glover Professor in Humanities and Social Sciences, Professor of Communication, and Professor of Political Science at Stanford University, 432 McClatchy Hall, 450 Serra Mall, Stanford, CA 94305, USA. EDITH D. DE LEEUW is Professor Emerita of Survey Methodology at the Department of Methodology and Statistics at Utrecht University, Sjoerd Groenmangebouw, Padualaan 14, 3584 CH Utrecht, Netherlands. STÉPHANE LEGLEYE is Head of the Team "Drug Use in the General Population" at CESP, Inserm, faculté de médecine UVSQ, faculté de médecine, Université Paris-Saclay, Université Paris-Sud, and Head of the "Household Living Conditions" Unit at Institut National de la Statistique et des Études Économiques, 88, avenue Verdier, CS 70058, 92541 Montrouge cedex, France. JOSH PASEK is Associate Professor, Department of Communication and Media, and Faculty Associate, Center for Political Studies, Institute for Social Research at the University of Michigan, 105 S. State Street, 5413 North Quad, Ann Arbor, MI 48109. DARREN PENNAY is Founder, Board Member, and Executive Director Research, Methods & Strategy at the Social Research Centre, Level 9, 277 William St, Melbourne Victoria Australia 3000, Campus Visitor at the Australian National University Centre for Social Research & Methods, and Adjunct Professor with the Institute for Social Science Research (ISSR), University of Queensland. BENJAMIN PHILLIPS is Senior Research Director, Survey Methodology at the Social Research Centre, Level 9, 277 William St, Melbourne Victoria Australia 3000 and Campus Visitor at the Centre for Social Research & Methods, Australian National University. JOSEPH W. SAKSHAUG is Distinguished Researcher, Head of the Data Collection and Data Integration Unit, and Acting Head of the Statistical Methods Research Department at the Institute for Employment Research (IAB), 104 Regensburgerstr., 90478 Nuremberg, Germany, University Professor of Statistics in the Department of Statistics at the Ludwig Maximilian University of Munich, 80539 Munich, Germany, and Honorary Full Professor in the School of Social Sciences at the University of Mannheim, 68131 Mannheim, Germany. BELLA STRUMINSKAYA is assistant professor with the Department of Methodology and Statistics at the University of Utrecht, Sjoerd Groenmangebouw, Padualaan 14, 3584 CH Utrecht, Netherlands. ALEXANDER WENZ is a postdoctoral researcher with the Collaborative Research Center SFB 884 "Political Economy of Reforms" at the University of Mannheim, 68131 Mannheim, Germany, and the Institute for Social and Economic Research at the University of Essex. The authors would like to thank the Collaborative Research Center (SFB) 884 "Political Economy of Reforms" (projects A8 and Z1), funded by the German Research Foundation (DFG), for organizing the Workshop on Probability-based and Nonprobability Survey Research in July 2018 at the University of Mannheim. In addition, the authors would like to thank Melvin John, Sabrina Seidl, and Nourhan Elsayed for their assistance with manuscript preparation. The authors are grateful to Andrew Mercer for valuable input and feedback on earlier versions of this manuscript. The order of authors is alphabetical after the lead author.

    *Address correspondence to Carina Cornesse, Collaborative Research Center SFB 884 "Political Economy of Reforms," University of Mannheim, 68131 Mannheim, Germany, E-mail: [email protected]

    doi: 10.1093/jssam/smz041. Advance access publication 9 January 2020. © The Author(s) 2020. Published by Oxford University Press on behalf of the American Association for Public Opinion Research. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact [email protected]

    Journal of Survey Statistics and Methodology (2020) 8, 4–36

    1. INTRODUCTION

    In recent years, several cases of mispredicted election outcomes have made the news across the world. Prominent examples include the 2015 Israeli parliamentary election, where the majority of polls predicted that Benjamin Netanyahu would lose the election,1 the 2016 Brexit referendum, where the majority of polls predicted that Britain would vote to remain in the European Union,2 and the 2016 US presidential election, where the majority of polls predicted that Hillary Clinton would defeat Donald Trump.3 When investigating potential reasons for these and other polling failures, researchers have pointed toward the fact that election polls based on probability samples usually reached more accurate predictions than election polls based on nonprobability samples (Sohlberg, Gilljam, and Martinsson 2017; Sturgis, Kuha, Baker, Callegaro, and Fisher 2018).

    The finding that election polls based on probability samples usually reach better predictions than election polls based on nonprobability samples is not new. As early as the 1920s and 1930s, the scientific community debated which sampling design was better: probability sampling, as initially introduced by Arthur L. Bowley in 1906, or nonprobability sampling, as initially introduced by Anders N. Kiaer in 1895 (Lessler and Kalsbeek 1992; Bethlehem 2009). After several dramatic cases of mispredicted election outcomes in the United States (Literary Digest in 1936, see Crossley 1937; Gallup, Crossley, and Roper in the polling debacle of 1948, see Converse 1987), nonprobability sampling was identified as a principal cause of prediction inaccuracy and was replaced by probability sampling in most high-quality social research.

    With the rise of the internet in the late 20th century, however, nonprobability sampling rose to popularity again as a fast and cheap method for recruiting online panels (Göritz, Reinhold, and Batinic 2000). Nonprobability online panels nevertheless face a number of challenges, such as noncoverage of people without internet access and selection bias due to the reliance on convenience samples of volunteers who might participate in multiple online panels (Bethlehem 2017). Despite these challenges, a vast number of opinion polls today are conducted using nonprobability online panels (Callegaro, Villar, Yeager, and Krosnick 2014a; Callegaro, Baker, Bethlehem, Göritz, and Krosnick 2014b). In addition, nonpanel-based recruitment of online respondents, for example through river sampling (American Association for Public Opinion Research 2013), has been on the rise. As a result, the majority of survey data collected online around the world today rely on nonprobability samples (Callegaro et al. 2014a, 2014b).

    Regardless of whether they are designed to predict election outcomes or to measure public opinion, and regardless of whether they are conducted online or offline, when they are used for research and polling purposes, probability and nonprobability sample surveys often share a common goal: to efficiently estimate the characteristics of a large population based on measurements of a small subset of the population. Therefore, both probability and nonprobability sample surveys require that (i) the sampled units are exchangeable with nonsampled units that share the same measured characteristics, (ii) no parts of the population are systematically excluded entirely from the sample, and (iii) the composition of the sampled units with respect to observed characteristics either matches or can be adjusted to match the composition of the larger population (Mercer, Kreuter, Keeter, and Stuart 2017).

    1. https://edition.cnn.com/2015/03/18/middleeast/israel-election-polls/index.html, accessed on November 30, 2019.

    2. https://www.theguardian.com/politics/2016/jun/24/how-eu-referendum-pollsters-wrong-opinion-predict-close, accessed on September 30, 2019.

    3. https://www.forbes.com/sites/startswithabang/2016/11/09/the-science-of-error-how-polling-botched-the-2016-election/#babf86437959, accessed on September 30, 2019.

    Despite their shared objective of providing accurate insights into a population of interest, probability and nonprobability sample surveys differ in a critical aspect. The key difference between probability and nonprobability sample surveys lies in the type and strength of the justification for why each approach should achieve accuracy. In the case of probability sample surveys, the justification is probability sampling theory, which is based on a set of established mathematical principles (Fisher 1925; Neyman 1934; Kish 1965). This sound theoretical basis makes it possible to compute the accuracy of estimates (e.g., in the form of confidence intervals or margins of error) and gives a universal validity to the estimation method. Furthermore, because providers of probability sample surveys routinely describe the details of the data-generating process, researchers are able to make adjustments that account for potential coverage, sampling, and nonresponse biases (e.g., Harter, Battaglia, Buskirk, Dillman, and English 2016; Blumberg and Luke 2017).
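    To make this concrete, the sketch below computes the kind of design-based accuracy statement that probability sampling theory supports: a 95 percent margin of error for an estimated proportion under simple random sampling. The sample size and estimate are illustrative and not taken from any study discussed here.

```python
# Margin of error for a proportion under simple random sampling (illustrative values)
import math

n = 1000          # respondents in a hypothetical simple random sample
p_hat = 0.52      # estimated proportion from that sample

margin_of_error = 1.96 * math.sqrt(p_hat * (1 - p_hat) / n)   # 95% normal-approximation margin
print(f"95% margin of error: +/- {margin_of_error:.3f}")       # roughly +/- 0.031
```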

    For nonprobability sample surveys, the justification for expecting accurate measurements rests on untested modeling assumptions that are based on a researcher's beliefs about the characteristics that make the sample different from the rest of the population and how those characteristics relate to the research topic. These assumptions can take different forms, such as quasi-randomization or superpopulation modeling (Deville 1991; Elliott and Valliant 2017). However, there is no general statistical theory of nonprobability sampling that justifies when and why accurate inferences can be expected: the validity is topic and survey dependent. Furthermore, online nonprobability sample providers often consider their data collection procedures to be proprietary, thus making it difficult or impossible to know what factors to include in any model aimed at correcting for selection bias in key estimates (Mercer et al. 2017).

    This article is intended to move the debate about probability and nonprobability sample surveys forward: We first describe the assumptions that must be made in order to expect nonprobability samples to yield accurate results (section 2). We then summarize the empirical evidence on the accuracy of probability and nonprobability sample surveys to date (section 3). Finally, we conclude our review with practical recommendations and paths for future research (section 4).


    2. CONCEPTUALIZATION OF NONPROBABILITY SAMPLING APPROACHES

    The Total Survey Error (TSE; Groves and Lyberg 2010) framework that forms the bedrock of quality assessments for probability samples does not cleanly translate to the world of nonprobability sample data collection. Nonprobability samples do not involve a series of well-controlled and well-understood departures from a perfect sampling frame. Instead, most such samples rely on a collection of convenience samples that are aggregated and/or adjusted, with the goal of reducing the final difference between sample and population. Nonprobability samples cannot be evaluated by quantifying and summing the errors that occur at each stage of the sampling process. Instead, nonprobability samples can only be evaluated by assessing how closely the final modeled sample compares to the population in terms of various characteristics.

    Although little has been proposed by way of formal statistical theory justifying the use of nonprobability samples for population inferences, the methods adopted for engaging with these samples suggest that a few combinations of assumptions could justify such an approach. In general, justifications can stem from four basic types of claims: (i) that any sample examining a particular question will yield the same inferences, (ii) that the specific design of the sample, as related to the questions at hand, will produce conclusions that mirror the population of interest, (iii) that a series of analytical steps will account for any differences between the sample and the population, and (iv) that the particular combination of sample and/or analytic approaches will produce accurate population estimates. Hence, the suggestion that any particular method, when used for a specific research question, is appropriate depends on underlying claims about the question of interest, the sample, and any adjustment procedures used.

    2.1 Design Ignorability Due to the Question of Interest

    In some cases, the method of sampling may be unrelated to the phenomena of interest. For instance, researchers may be trying to understand some process that occurs for all individuals, as in most physiological and some psychological studies. Under these circumstances, it may be reasonable to presume that any given group of individuals would behave like any other group of individuals unless there are known confounding factors. Researchers may claim that nonprobability sampling is irrelevant when answering particular questions for a few different reasons. They may believe that the process they are investigating is universal and, thus, that all people would behave similarly. On a more limited scope, they might contend that the particular phenomenon they are studying is appropriately distributed in any broad population sample and that the specific composition of that sample is unlikely to influence their conclusions.

    The suggestion that some particular inference is unrelated to the sample being drawn could derive either from theoretical expectations of orthogonality or prior empirical evidence of orthogonality. Of particular note, there are some theoretical and empirical reasons to believe that certain classes of inference may be more or less susceptible to sample imbalance. Some scholars have argued that trends over time in attitudes and behaviors should be less strongly dependent on sample composition than estimates of the distributions of those attitudes and behaviors (Page and Shapiro 1992). Similarly, a few empirical studies have found that relations between variables were more similar across probability and nonprobability samples than other types of estimates (Berrens, Bohara, Jenkins-Smith, Silva, and Weimer 2003; Pasek 2016). The claim that some kinds of inferences may be made equivalently well regardless of sampling strategy may sometimes be correct. The challenge, however, is determining when this might be the case.

    2.2 Fit for Purpose Designs

    Researchers do not need to establish that the question they are studying is impervious to sampling strategies to make the case that a nonprobability sample is appropriate for their inferences. Instead, they can assert that the design employed mitigates whatever biases might have emerged in the sampling process. The classic example of this type of argument stems from the use of quota samples. Quota samples are typically designed to ensure that the set of respondents matches the population on certain key demographic parameters. The idea underlying this approach is that the demographic parameters that form the basis for the quotas capture the sources of bias for researchers' inferences. To the extent that this is true, inferences made from quota samples will be accurate because all potential confounds are neutralized by the design. That is, the remaining error induced from the sampling process would be orthogonal to the questions of interest.
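    As an illustration of the quota logic described above, the sketch below accepts volunteers from a simulated opt-in stream only while their demographic cell is still below its target count. The cells, targets, and respondent stream are hypothetical.

```python
# Quota enforcement during recruitment: accept a volunteer only if their cell is not yet full
import numpy as np

rng = np.random.default_rng(4)
targets = {("f", "18-39"): 30, ("f", "40+"): 25, ("m", "18-39"): 25, ("m", "40+"): 20}
counts = {cell: 0 for cell in targets}
accepted = []

for _ in range(2000):                       # simulated stream of opt-in volunteers
    cell = (rng.choice(["f", "m"]), rng.choice(["18-39", "40+"]))
    if counts[cell] < targets[cell]:        # accept only while the quota cell is open
        counts[cell] += 1
        accepted.append(cell)
    if len(accepted) == sum(targets.values()):
        break

print(counts)                               # cell counts should now meet the quota targets
```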

    Notably, demographic quotas are not the only way to select respondents such that they reflect the population across key confounds. Scholars have proposed techniques ranging from matching individuals in nonprobability samples with individuals from probability samples as a means to recruit respondents to surveys (Rivers 2007) to blending together samples drawn from sources that have known opposing biases (Comer 2019). For any of these processes, if a researcher can be confident that the sample selection strategy eliminates all potential confounds for their particular question of interest, then the use of that sampling strategy is not only justifiable but will yield accurate inferences.

    The challenge with these sorts of approaches is that the accuracy of critical assumptions can only really be established empirically. It is also unclear what to make of evidence that a particular conclusion is robust to a particular sampling decision. It may be the case that the nature of the question and/or type of inference renders that conclusion accurate for any similar question on a similarly derived sample, or it might be that the particularities of a single analysis happen to have yielded identical conclusions by mere chance. There is no obvious way to establish which of these is the case, though claims of empirical robustness are strengthened by a clear theoretical rationale.

    2.3 Global Adjustment Approaches

    In the next two sections, we describe modeling approaches that have been used to improve the accuracy of nonprobability sample data. Researchers have long known that even probability samples are sometimes inaccurate, either by chance or due to variations in the likelihood that certain subgroups of the population will participate in a survey. For this reason, many statistical adjustment procedures typically used to adjust for systematic biases in probability samples have been adopted to adjust for selection biases in nonprobability samples. These approaches can be divided into two types: global adjustments and outcome-specific adjustments. Global adjustments refer to approaches that use a model to create a single adjustment that can be applied in any subsequent analysis, regardless of the outcome of interest. Outcome-specific adjustments tailor the adjustment model to a specific outcome of interest.

    Regarding global adjustments, one commonly used approach is calibration weighting (Deville and Särndal 1992; Roshwalb, Lewis, and Petrin 2016; Santoso, Stein, and Stevenson 2016). Calibration weighting involves weighting the respondent pool such that the weighted sample totals of a certain characteristic correspond to known population totals of that same characteristic. The known population totals might come from census data, official statistics, or other data sources assumed to be of high quality. The procedure produces a global adjustment weight that can be applied to the analysis of any outcome variable. Using such weights amounts to making the assumption that once the known sources of deviation are accounted for in the adjustment procedure, the remaining errors will be unrelated to the likelihood that a particular unit in the population participated in the survey. This strategy presumes that the sampled units within specified population subgroups will be roughly equivalent to the nonsampled units within those subgroups with respect to any inferences that will be made with the data.
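    A minimal sketch of calibration weighting in its raking (iterative proportional fitting) form is shown below. It assumes a small pandas DataFrame with two categorical variables and hypothetical population margins, and it illustrates only the general idea rather than the procedure of any particular study cited here.

```python
# Raking (iterative proportional fitting): adjust weights until the weighted sample
# margins of each calibration variable match the known population margins.
import pandas as pd

def rake(sample: pd.DataFrame, margins: dict, n_iter: int = 50) -> pd.Series:
    weights = pd.Series(1.0, index=sample.index)
    for _ in range(n_iter):
        for var, target in margins.items():
            current = weights.groupby(sample[var]).sum()          # weighted margin right now
            adjust = {cat: target[cat] / current[cat] for cat in target}
            weights = weights * sample[var].map(adjust)           # scale weights toward the target
    return weights

# Hypothetical respondents and population margins (proportions)
sample = pd.DataFrame({
    "gender":    ["f", "f", "m", "m", "m", "f"],
    "age_group": ["18-39", "40-64", "18-39", "65+", "40-64", "65+"],
})
margins = {
    "gender":    {"f": 0.51, "m": 0.49},
    "age_group": {"18-39": 0.35, "40-64": 0.40, "65+": 0.25},
}
weights = rake(sample, margins)
print(weights / weights.sum())   # weighted margins now (approximately) match both sets of targets
```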

    Although calibration weighting only requires access to population-level benchmark data, alternative global adjustment procedures make use of unit-level reference data to improve the accuracy of nonprobability sample estimates. One approach, known as sample matching, attempts to compose a balanced nonprobability sample by selecting units from a very large frame, such as a list of registered members of an opt-in panel, based on an array of auxiliary characteristics (often demographic) that closely match the characteristics of units from a reference probability sample (Rivers 2007; Vavreck and Rivers 2008; Bethlehem 2016). The matching procedure, which may be performed before any units are invited to the nonprobability survey, relies on a distance metric (e.g., Euclidean distance) to identify the closest match between pairs of units based on the set of common auxiliary characteristics.
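    The sketch below illustrates this kind of matching under simplifying assumptions: a simulated opt-in frame and a reference probability sample share three standardized auxiliary variables, and each reference unit is matched to the closest not-yet-used frame unit by Euclidean distance. Variable meanings and data are hypothetical.

```python
# One-to-one sample matching on shared auxiliary variables using Euclidean distance
import numpy as np

rng = np.random.default_rng(0)
frame = rng.normal(size=(1000, 3))      # large opt-in frame (standardized auxiliary variables)
reference = rng.normal(size=(100, 3))   # reference probability sample, same variables

available = np.ones(len(frame), dtype=bool)
matched_idx = []
for ref_unit in reference:
    dist = np.linalg.norm(frame - ref_unit, axis=1)   # distance to every frame unit
    dist[~available] = np.inf                         # each frame unit can be used only once
    best = int(np.argmin(dist))
    matched_idx.append(best)
    available[best] = False

matched_sample = frame[matched_idx]     # these frame units would be invited to the survey
print(matched_sample.shape)             # (100, 3): same size as the reference sample
```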

    Another approach that uses unit-level reference data is propensity score weighting. This approach is performed after the survey data have been collected from units in a nonprobability sample. The basic procedure is to vertically concatenate the nonprobability sample survey data with a reference dataset, typically a large probability sample survey. Then a model (e.g., logit or probit) is fitted using variables measured in both datasets to predict the probability that a particular unit belongs to the nonprobability sample (Rosenbaum and Rubin 1983, 1984; Lee 2006; Dever, Rafferty, and Valliant 2008; Valliant and Dever 2011). A weight is then constructed based on the inverse of this estimated inclusion probability and used in any subsequent analysis of the nonprobability survey data. Like calibration weighting, propensity score weighting only works if two conditions are satisfied: (i) the weighting variables and the propensity of response in the sample are correlated; and (ii) the weighting variables are correlated with the outcome variables of interest. A related approach is to use the concatenated dataset to fit a prediction model using variables measured in both datasets, which is then used to impute the values of variables for units in the nonprobability sample that were only observed for units in the reference probability sample (Raghunathan 2015).
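    A minimal sketch of the propensity score weighting step is given below, using simulated data and a logistic regression from scikit-learn; the covariates and their distributions are invented for illustration.

```python
# Propensity score weighting: stack both samples, model membership in the
# nonprobability sample, and weight its units by the inverse of that probability.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X_nonprob = rng.normal(loc=0.3, size=(500, 3))   # covariates observed in the nonprobability sample
X_ref = rng.normal(loc=0.0, size=(800, 3))       # same covariates in the reference probability sample

X = np.vstack([X_nonprob, X_ref])
z = np.concatenate([np.ones(len(X_nonprob)), np.zeros(len(X_ref))])   # 1 = nonprobability sample

model = LogisticRegression().fit(X, z)
p_nonprob = model.predict_proba(X_nonprob)[:, 1]  # estimated pseudo-inclusion probabilities

weights = 1.0 / p_nonprob                         # inverse-propensity weights
weights *= len(weights) / weights.sum()           # rescale so weights sum to the sample size
print(weights[:5])
```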

    None of the previously described adjustment approaches requires the reference data beyond the weighting, matching, or imputation steps; the reference data are discarded during the analysis of the nonprobability survey data. Alternative approaches combine both data sources and analyze them jointly. One such approach is pseudo design-based estimation (Elliott 2009; Elliott and Valliant 2017), where pseudo-inclusion probabilities are estimated for the nonprobability sample units based on a set of variables common to both nonprobability and probability samples. Different techniques may be used to estimate the inclusion probabilities. For example, one could concatenate the nonprobability and probability datasets and predict the probability of participating in the nonprobability survey, similar to the aforementioned propensity score weighting procedure. Alternatively, one could employ sample matching with both surveys and donate a probability sample unit's inclusion probability to the closest recipient match in the nonprobability sample. Once the pseudo-inclusion probabilities have been assigned to all nonprobability sample units, these units can be treated as if they were selected using the same underlying sampling mechanism as the probability sample units. The datasets may then be combined and analyzed jointly by using the actual and pseudo weights. For variance estimation, Elliott and Valliant (2017) recommend the use of design-based resampling approaches, such as the bootstrap or jackknife, to account for variability in both the pseudo weights and the target quantity of interest. For nonprobability samples that have an underlying cluster structure (e.g., different types of persons recruited from different web sites), cluster resampling approaches should be used.
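    The sketch below illustrates bootstrap variance estimation for a weighted mean in this spirit, using hypothetical pseudo weights. A fuller implementation would also re-estimate the pseudo weights within each bootstrap replicate, as the resampling recommendation above implies.

```python
# Bootstrap standard error of a weighted mean computed with pseudo design weights
import numpy as np

rng = np.random.default_rng(2)
y = rng.normal(size=500)               # analysis variable in the nonprobability sample
weights = rng.uniform(0.5, 2.0, 500)   # hypothetical pseudo weights from an earlier adjustment step

def weighted_mean(y, w):
    return np.sum(w * y) / np.sum(w)

boot_estimates = []
for _ in range(1000):
    idx = rng.integers(0, len(y), len(y))                 # resample units with replacement
    boot_estimates.append(weighted_mean(y[idx], weights[idx]))

print("point estimate:", weighted_mean(y, weights))
print("bootstrap SE:  ", np.std(boot_estimates, ddof=1))
```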


    Another approach to combining and analyzing probability and nonprobability samples jointly is blended calibration (DiSogra, Cobb, Chan, and Dennis 2011; Fahimi, Barlas, Thomas, and Buttermore 2015). Blended calibration is a form of calibration weighting that combines a weighted probability sample with an unweighted nonprobability sample and calibrates the combined sample to benchmark values measured on units from the weighted probability sample survey. The combined sample is then analyzed using the calibrated weights.

    In summary, each of the previously described approaches produces a single global adjustment that can be applied to any analysis regardless of the outcome variable of interest. These methods entail the assumption that the selection mechanism of the nonprobability sample is ignorable conditional on the variables used in the adjustment method. For example, selection bias is assumed to be negligible or nonexistent within subgroups used in the calibration and propensity scoring procedures. It is further assumed that the adjustment variables in the reference data source (census or probability sample survey) are measured without error, are highly correlated with the target analysis variables, and are correlated with the probability to participate in the nonprobability sample survey. A very large and diverse reference data source with an extensive set of common variables is therefore needed to maximize the validity of these strong assumptions. One should always keep in mind that global adjustment procedures may improve the situation for some estimates but not others, and there is no guarantee that biases in the nonprobability data will be removed completely. In practice, one never knows with certainty that ignorability assumptions hold, although in some cases it may be possible to place bounds on the potential magnitude of bias through sensitivity analysis (Little, West, Boonstra, and Hu 2019).

    2.4 Outcome-Specific Adjustment Approaches

    Turning now to outcome-specific adjustment approaches: these utilize adjustment models that are tailored to a specific outcome variable. That is, they adjust for the selection mechanism into a nonprobability sample with respect to a given outcome variable, Y, which is of interest to the researcher. Such approaches attempt to control for variables that govern the selection process and are correlated with the target outcome variable. One example of such a framework is the notion that probability and nonprobability samples both constitute draws from a hypothetical infinite "superpopulation" (Deville 1991). The goal of the analyst is then to model the data-generating process of the underlying superpopulation by accounting for all relevant variables in the analysis model. In practice, this means that a researcher may fit a prediction model for some analysis variable Y based on the sample at hand, which is then used to predict the Y's for the nonsampled units. The sampled and nonsampled units are then combined to estimate the quantity of interest (e.g., mean, total, regression coefficient) for Y in the total population (Elliott and Valliant 2017). The key assumptions of this approach are that the analysis variable, Y, is explained through a common model for the sampled and nonsampled units and that all parameters governing the superpopulation model are controlled for in the analysis model. The approach also requires the availability of auxiliary data on the population to make predictions for the nonsampled units. Variance estimation for the predictions can be implemented using a variety of frequentist methods, including jackknife and bootstrap replication estimators, as described in Valliant, Dorfman, and Royall (2000).
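    Under simplifying assumptions (simulated auxiliary data for the whole population, selection that depends only on an observed covariate, and a linear model), the sketch below illustrates the prediction step: fit the model on the sampled units, predict Y for the nonsampled units, and combine observed and predicted values into an estimate of the population mean.

```python
# Model-based (superpopulation) prediction estimator of a population mean
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)
X_pop = rng.normal(size=(10_000, 2))                              # auxiliary data for the population
y_pop = 2.0 + X_pop @ np.array([1.0, -0.5]) + rng.normal(size=10_000)

prob_select = 1 / (1 + np.exp(-(X_pop[:, 0] - 2.0)))              # selection depends on an observed covariate
sampled = rng.random(10_000) < prob_select                        # a selective (nonprobability-like) sample

model = LinearRegression().fit(X_pop[sampled], y_pop[sampled])    # common model for sampled and nonsampled units
y_pred = model.predict(X_pop[~sampled])                           # predictions for the nonsampled units

est_mean = (y_pop[sampled].sum() + y_pred.sum()) / len(y_pop)
print("naive sample mean:   ", y_pop[sampled].mean())             # biased by the selective sampling
print("prediction estimate: ", est_mean)                          # close to the true population mean
print("true population mean:", y_pop.mean())
```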

    Another model-based approach, which can be applied in a superpopulation framework, is model-assisted calibration. This approach involves constructing calibrated weights by using a model to predict the values for a given analysis variable (Wu and Sitter 2001). The calibrated weights are generated based on constraints placed on the population size and the population total of the predicted values. Various model selection approaches (e.g., LASSO) have been proposed for parsimonious modeling of the analysis variable (Chen, Valliant, and Elliott 2018). A key assumption of the method is that the model is correctly specified and capable of making reliable predictions across different samples of the population. It is also assumed that all relevant superpopulation parameters are included in the model if the method is implemented in a superpopulation framework.

    Multilevel regression and poststratification is another approach used to estimate a specific outcome of interest from a nonprobability sample (Wang, Rothschild, Goel, and Gelman 2015; Downes, Gurrin, English, Pirkis, and Currier 2018). The basic idea is to fit a multilevel regression model predicting an outcome given a set of covariates. The use of a multilevel model makes it possible to incorporate a large number of covariates or high-order interactions into the prediction model (Ghitza and Gelman 2013). The model is then used to estimate the mean value for a large number of poststratification cells defined by the cross-classification of all variables used in the regression model. It is necessary that the relative size of each cell in the population is known or can be reliably estimated from external data sources such as a census or population registry. Population quantities are estimated by aggregating these predicted cell means, with each cell weighted proportionally to its share of the population. The multilevel regression model allows cell-level estimates to be generated even when few units exist within the sample cells. The method can also be implemented in a Bayesian hierarchical modeling framework (Park, Gelman, and Bafumi 2004). Like all of the previously mentioned model-based approaches, the model is assumed to control for all variables that affect the probability of inclusion in the nonprobability sample. The method also requires good model fit, and large cell sizes are preferable to generate robust cell-level estimates. For additional hierarchical and Bayesian modeling approaches that have been proposed to estimate outcomes from nonprobability samples, we refer to Ganesh, Pineau, Chakraborty, and Dennis (2017), Pfeffermann (2017), and Pfeffermann, Eltinge, and Brown (2015).
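    The poststratification step of this approach is illustrated below with hypothetical census cell shares and stand-in predicted cell means. In an actual application the cell predictions would come from a fitted multilevel (partial-pooling) model rather than the fixed numbers used here.

```python
# Poststratification: combine model-predicted cell means using population cell shares
import pandas as pd

cells = pd.DataFrame({
    "age_group": ["18-39", "18-39", "40-64", "40-64", "65+", "65+"],
    "gender":    ["f",     "m",     "f",     "m",     "f",   "m"],
    "pop_share": [0.17,    0.18,    0.20,    0.20,    0.13,  0.12],   # hypothetical census shares
})
cells["pred_mean"] = [0.62, 0.55, 0.51, 0.47, 0.43, 0.40]             # stand-in model predictions per cell

mrp_estimate = (cells["pred_mean"] * cells["pop_share"]).sum()        # population-level estimate
print("poststratified estimate:", round(mrp_estimate, 3))
```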

    Collectively, these approaches to using nonprobability sample data to reach population-level conclusions depend on combinations of assumptions about ignorable errors and the availability of information about the sources of nonrandomness in respondent selection that can be used to adjust for any errors that are not ignorable. Although these dependencies are not fundamentally different from the assumptions underlying the use of probability samples, they are more difficult to rely on as we know little about the factors that lead individuals to become members of nonprobability samples.

    3. THE ACCURACY OF PROBABILITY AND NONPROBABILITY SAMPLE SURVEYS

    Several studies have empirically assessed the accuracy of probability and nonprobability sample surveys by comparing survey outcomes to external population benchmarks (see table 1). The vast majority of these studies concluded that probability sample surveys have a significantly higher accuracy than nonprobability sample surveys. Only a few studies have found that probability sample surveys do not generally have a significantly higher accuracy than nonprobability sample surveys.

    Table 1 provides a list of the studies that are included in our overview. A key inclusion requirement is that the studies contain comparisons of probability and nonprobability sample surveys with external population benchmarks. We exclude studies published in languages other than English, studies that contain sample accuracy assessments of probability and nonprobability sample surveys only as a minor byproduct, and studies that compare probability sample surveys with nonsurvey convenience samples, such as Amazon MTurk (e.g., Coppock and McClellan 2019).

    3.1 Initial Accuracy Comparisons for Probability and Nonprobability Sample Surveys

    As table 1 shows, a number of studies have demonstrated that probability sample surveys have a higher accuracy than nonprobability sample surveys. The higher accuracy of probability sample surveys has been demonstrated across various topics, such as voting behavior (Malhotra and Krosnick 2007; Chang and Krosnick 2009; Sturgis et al. 2018), health behavior (Yeager, Krosnick, Chang, Javitz, and Levendusky 2011), consumption behavior (Szolnoki and Hoffmann 2013), sexual behavior and attitudes (Erens, Burkill, Couper, Conrad, and Clifton 2014; Legleye, Charrance, Razafindratsima, Bajos, and Bohet 2018), and socio-demographics (Malhotra and Krosnick 2007; Chang and Krosnick 2009; Yeager et al. 2011; Szolnoki and Hoffmann 2013; Erens et al. 2014; Dutwin and Buskirk 2017; MacInnis, Krosnick, Ho, and Cho 2018). In addition, the higher accuracy of probability sample surveys has been found across a number of countries, such as Australia (Pennay, Neiger, Lavrakas, and Borg 2018), France (Legleye et al. 2018), Germany (Szolnoki and Hoffmann 2013; Blom, Ackermann-Piek, Helmschrott, Cornesse, Bruch, and Sakshaug 2018), the Netherlands (Scherpenzeel and Bethlehem 2011; Brüggen, van den Brakel, and Krosnick 2016), Sweden (Sohlberg et al. 2017), the United Kingdom (Sturgis et al. 2018), and the United States (Malhotra and Krosnick 2007; Chang and Krosnick 2009; Yeager et al. 2011; Dutwin and Buskirk 2017; MacInnis et al. 2018). Furthermore, the higher accuracy of probability sample surveys has been shown over time, from the first study demonstrating higher accuracy of probability sample surveys in 2007 (Malhotra and Krosnick 2007) to the most recent ones in 2018 (Blom et al. 2018; Legleye et al. 2018; MacInnis et al. 2018; Sturgis et al. 2018). All of these studies from different times and countries that focused on different topics reached the conclusion that probability sample surveys led to more accurate estimates than nonprobability samples.

    Table 1. Studies on the Accuracy of Probability (PS) and Nonprobability (NPS) Sample Surveys

    Study | Country | Benchmark | PS modes studied | PS more accurate than NPS?(a) | Did (re)weighting sufficiently reduce NPS bias?(b)
    Blom et al. (2018) | Germany | Census data, election data | F2F, web | Yes (univariate) | No (raking)
    MacInnis et al. (2018) | USA | High-quality PS | Phone, web | Yes (univariate) | No (poststratification)
    Dassonneville et al. (2018) | Belgium | Census data, election outcome | F2F | Yes (univariate); No (multivariate) | No (unspecified approach) – univariate; N/A – multivariate
    Legleye et al. (2018) | France | Census data | Phone | Yes (univariate) | N/A
    Pennay et al. (2018) | Australia | High-quality PS | Phone | Yes (univariate) | No (poststratification)
    Sturgis et al. (2018) | UK | Election outcome | F2F | Yes (univariate) | No (raking, propensity weighting, matching)
    Dutwin and Buskirk (2017) | USA | High-quality PS | F2F, phone | Yes (univariate) | No (propensity weighting, matching, raking)
    Sohlberg et al. (2017) | Sweden | Election outcome | Phone | Yes (univariate) | N/A
    Brüggen et al. (2016) | Netherlands | Population register | F2F, web | Yes (univariate); Yes (bivariate) | No (Generalized Regression Estimation) – univariate, bivariate
    Kennedy et al. (2016) | USA | High-quality PS | Web, mail | No (univariate); No (multivariate) | N/A – univariate; N/A – multivariate
    Pasek (2016) | USA | High-quality PS | Phone | Yes (univariate); No (bivariate); Yes (longitudinal) | No (raking, propensity weighting) – univariate, bivariate, longitudinal
    Gittelman et al. (2015) | USA | High-quality PS | Phone | No (univariate) | No (poststratification)
    Ansolabehere and Schaffner (2014) | USA | High-quality PS, election outcome | Phone, mail | No (univariate); No (multivariate) | N/A
    Erens et al. (2014) | UK | High-quality PS | CASI | Yes (univariate) | N/A
    Steinmetz et al. (2014) | Netherlands | High-quality PS | Web | No (univariate); No (bivariate) | Yes (propensity weighting) – univariate, bivariate
    Ansolabehere and Rivers (2013) | USA | High-quality PS, election outcome | F2F | No (univariate) | N/A
    Szolnoki and Hoffmann (2013) | Germany | High-quality PS | F2F, phone | Yes (univariate) | N/A
    Chan and Ambrose (2011) | Canada | Unspecified | Web | No (univariate) | N/A
    Scherpenzeel and Bethlehem (2011) | Netherlands | Election outcome | F2F, web, unspecified | Yes (univariate) | N/A
    Yeager et al. (2011) | USA | High-quality PS | Phone, web | Yes (univariate) | No (poststratification)
    Chang and Krosnick (2009) | USA | High-quality PS | Phone, web | Yes (univariate) | No (raking)
    Walker, Pettit, and Rubinson (2009)(c) | USA | Unspecified | Phone, mail | Yes (univariate) | N/A
    Loosveldt and Sonck (2008) | Belgium | Census data | F2F | No (univariate) | No (poststratification, propensity weighting)
    Malhotra and Krosnick (2007) | USA | High-quality PS | F2F | Yes (univariate) | No (unspecified)
    Berrens et al. (2003) | USA | High-quality PS | Web, phone | No (univariate); No (multivariate) | Yes (raking, propensity weighting) – univariate, multivariate

    (a) The study results reported here are based on initial comparisons of accuracy as reported by the authors. Some studies use raw (unweighted) data in their initial comparisons, while others use data that are already weighted using either weights that the authors calculated themselves or the weights that a survey vendor delivered with the data.
    (b) We report whether (re)weighting has sufficiently reduced bias based on the authors' own judgments as reported in their conclusions. This column includes studies that compare raw (unweighted) data in their initial comparison of accuracy with the data after a weighting procedure was performed and studies that used weighted data in their initial comparison of accuracy and reweighted the data in subsequent comparisons. Studies that only reported raw data or only weighted data, without using any reweighting approaches, are labelled "not applicable" (N/A) in this column.
    (c) As reported by Callegaro et al. (2014a, 2014b).

    A prominent study from this line of research is Yeager et al. (2011). In a series of analyses of surveys conducted between 2004 and 2008 in the United States, the authors found that probability sample surveys were consistently more accurate than nonprobability sample surveys across many benchmark variables (primary demographics such as age, gender, and education; secondary demographics such as marital status and homeownership; and nondemographics such as health ratings and possession of a driver's license), even after poststratification weighting.
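    The sketch below shows the general form of such a benchmark comparison: the average absolute deviation of survey estimates from external benchmarks, computed separately for each survey. The benchmark values and survey estimates are invented for illustration and are not figures from any study in table 1.

```python
# Average absolute error (in percentage points) of survey estimates relative to benchmarks
import pandas as pd

benchmarks = pd.Series({"pct_female": 51.0, "pct_age_65plus": 21.0,
                        "pct_homeowner": 64.0, "pct_smoker": 17.0})

estimates = pd.DataFrame({
    "probability_survey":    {"pct_female": 51.8, "pct_age_65plus": 20.1,
                              "pct_homeowner": 65.2, "pct_smoker": 18.0},
    "nonprobability_survey": {"pct_female": 54.5, "pct_age_65plus": 14.9,
                              "pct_homeowner": 58.3, "pct_smoker": 22.4},
})

avg_abs_error = estimates.sub(benchmarks, axis=0).abs().mean()   # one value per survey
print(avg_abs_error)
```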

    Another influential study that is based on a particularly rich database was conducted with four probability sample face-to-face surveys, one probability sample online survey, and eighteen nonprobability sample online surveys (the so-called NOPVO [National Dutch Online Panel Comparison Study] project) in 2006 to 2008 in the Netherlands (Brüggen et al. 2016). In line with the findings from Yeager et al. (2011), the authors found that the probability face-to-face and internet surveys were consistently more accurate than the nonprobability internet surveys across a variety of sociodemographic variables and variables on health and life satisfaction.

    A recent example that focuses on the current debate about why polls mispredict election outcomes was published by Sturgis et al. (2018). Investigating why British polls mispredicted the outcome of the 2015 UK general election, Sturgis et al. (2018) examined data from twelve British pre-election polls and assessed a number of potential reasons for the polling debacle, such as whether voters changed their mind at the last minute (i.e., "late swing"), whether the mode in which the surveys were conducted (telephone or online) played a role, or whether polls failed to correct for the common problem of overestimating voter turnout (i.e., a lack of proper turnout weighting). The authors found that sample inaccuracy due to nonprobability sample selection likely had the biggest impact on the mispredictions.

    Although most studies have demonstrated that accuracy is higher in probability sample surveys than in nonprobability sample surveys, two studies have yielded mixed findings depending on the type of estimate examined (Pasek 2016; Dassonneville, Blais, Hooghe, and Deschouwer 2018). Both studies show that accuracy is higher in probability sample surveys for univariate estimates. Pasek (2016) also reports higher accuracy of probability sample surveys for longitudinal analyses. However, these studies found no difference in accuracy regarding bivariate (Pasek 2016) and multivariate (Dassonneville et al. 2018) estimates.

    Several studies have found no consistent superiority in accuracy of probability or nonprobability sample surveys over one another. These studies generally yielded mixed findings: a probability sample survey was found to be more accurate than some but not all nonprobability sample surveys examined (Kennedy, Mercer, Keeter, Hatley, McGeeney, and Gimenez 2016); or probability sample surveys were shown to be more accurate than nonprobability sample surveys on some variables while nonprobability sample surveys were more accurate than probability sample surveys on other variables (Loosveldt and Sonck 2008; Chan and Ambrose 2011; Steinmetz, Bianchi, Tijdens, and Biffignandi 2014). In some of the studies, the authors speculated that it might be survey mode rather than the sampling design that led to comparable accuracy (Berrens et al. 2003; Ansolabehere and Schaffner 2014; Gittelman, Thomas, Lavrakas, and Lange 2015).

    In general, it should be noted that sample accuracy assessments face some challenges. One common challenge of such studies is to disentangle mode effects (i.e., measurement bias) from sampling effects (i.e., selection bias). This challenge occurs because probability sample surveys are usually conducted offline (e.g., via face-to-face or telephone interviews), whereas nonprobability sample surveys are usually conducted online via nonprobability online panels. However, several studies show that it is possible to disentangle the mode effect from the sampling effect by comparing offline probability sample surveys with online probability sample surveys (mode effect) and comparing online probability sample surveys to online nonprobability sample surveys (sampling effect). The majority of these studies conclude that both offline and online probability sample surveys are more accurate than nonprobability online sample surveys (Chang and Krosnick 2009; Scherpenzeel and Bethlehem 2011; Yeager et al. 2011; Brüggen et al. 2016; Dutwin and Buskirk 2017; Blom et al. 2018; MacInnis et al. 2018).

    A related challenge that sample accuracy assessments of probability and nonprobability sample surveys face is the question of how to measure accuracy in a way that accounts for both sampling variability and systematic bias. This challenge occurs because often there is only one probability sample survey and one nonprobability sample survey available for sample accuracy assessment.


    However, several large-scale studies that compare a larger number of probability sample surveys with a larger number of nonprobability sample surveys have found that probability sample surveys are consistently more accurate than nonprobability sample surveys (Yeager et al. 2011; Brüggen et al. 2016; Blom et al. 2018; MacInnis et al. 2018; Sturgis et al. 2018). This suggests that although surveys vary on a number of design factors other than their sampling design (e.g., incentive schemes, contact frequency) and might sometimes be more or less accurate by chance, samples are generally more likely to have higher accuracy if they are based on probability sampling procedures rather than nonprobability sampling procedures.

    Another common challenge that sample accuracy assessments face is the availability of appropriate gold standard benchmarks. In the current literature, the most commonly used benchmarks are large-scale, high-quality probability sample surveys. A typical example of such a benchmark is the US American Community Survey (Yeager et al. 2011; MacInnis et al. 2018). Other benchmarks used in the literature are population census data (Legleye et al. 2018), election outcomes (Sohlberg et al. 2017), and population register data (Brüggen et al. 2016).

    All of these benchmarks have advantages and disadvantages. A key advantage of using a large-scale, high-quality probability sample survey is that the set of variables available for comparisons usually includes not only sociodemographic variables but also substantive variables on attitudes and behavior. A disadvantage is that large-scale, high-quality probability sample surveys are surveys themselves and might therefore contain typical survey errors, such as coverage, sampling, and nonresponse errors (Groves and Lyberg 2010).

    Census data and population register data have the advantage of not suffering from survey errors. However, such data are often not available for the current year and might therefore be outdated at the time of the study. Population register data might also be outdated, for example, if immigration, emigration, births, and deaths are not captured in a timely manner. In addition, census data and population register data are typically limited to a small set of sociodemographic characteristics. With regard to election outcomes, an advantage is that they are key variables of substantive interest to many social scientists. However, if survey data fail to accurately predict election outcomes, there are many potential explanations for this besides the sampling approach; see Sturgis et al. (2018) for a list of reasonable explanations tested after the British polling disaster of 2015.

    3.2 Weighting Approaches to Reduce Bias in Nonprobability Sample Surveys

    Many studies examining the accuracy of probability and nonprobability sample surveys have attempted to eliminate biases in nonprobability sample surveys.


    The majority of these studies found that weighting did not sufficiently reduce bias in nonprobability sample surveys (see table 1). Generally speaking, probability sample surveys were found to be more accurate than nonprobability sample surveys even after (re)weighting.

    Although the majority of studies found that weighting did not reduce the bias in nonprobability sample surveys sufficiently (table 1), some studies showed that weighting did reduce the bias somewhat. However, whether researchers considered the bias to be sufficiently reduced by weighting varied from study to study. For example, Berrens et al. (2003) considered the bias in a nonprobability sample survey sufficiently reduced even though an estimate of mean household income deviated from the benchmark by between 4.8 percentage points (after propensity weighting) and 11.9 percentage points (after raking). A study concluding that weighting approaches sufficiently reduced bias in nonprobability sample surveys also reported that weighting increased the variance of the estimates significantly (Steinmetz et al. 2014).

    Most of the studies listed in table 1 focus on the success of weighting procedures in reducing bias in nonprobability sample surveys. Only a few studies listed in table 1 have also assessed the success of weighting procedures in reducing bias in probability sample surveys. For example, MacInnis et al. (2018) reported that weighting reliably eliminated the small biases present in unweighted probability sample survey data. This is in line with research by Gittelman et al. (2015), who showed that poststratification weighting successfully reduced biases in a probability sample survey but was less successful across a number of nonprobability survey samples and even increased bias in one instance.

    The studies listed in table 1 used one or more common weighting procedures, such as raking, poststratification, and propensity weighting, to improve the accuracy of nonprobability sample survey measurements. Several other studies in the literature have investigated the effectiveness of various weighting procedures in reducing bias in nonprobability sample surveys without examining the accuracy of probability sample surveys. Table 2 provides an overview of these studies. A key inclusion requirement is that studies assess whether weighting nonprobability sample surveys reduced bias compared with unweighted estimates and, if so, to what extent. We again exclude all studies published in languages other than English, as well as studies that assess weighting procedures for nonprobability sample surveys only as a minor byproduct or that examine nonsurvey data.
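
    To make one of these procedures concrete, the sketch below implements raking (iterative proportional fitting) on a toy sample, adjusting case weights so that the weighted marginal distributions of two variables match assumed population margins. It is a generic illustration with invented data and target margins, not the implementation used in any of the studies listed in the tables.

```python
# Generic raking (iterative proportional fitting) sketch with invented data.
sample = [("f", "18-39"), ("f", "40+"), ("m", "40+"),
          ("m", "40+"), ("m", "18-39"), ("f", "40+")]
targets = {
    0: {"f": 0.51, "m": 0.49},        # position 0 in each tuple: gender margin
    1: {"18-39": 0.35, "40+": 0.65},  # position 1 in each tuple: age-group margin
}

weights = [1.0] * len(sample)  # start from equal weights

def weighted_share(var_index, category):
    """Weighted share of cases falling into one category of one variable."""
    total = sum(weights)
    return sum(w for case, w in zip(sample, weights) if case[var_index] == category) / total

for _ in range(50):  # alternate adjustments until the margins settle
    for var_index, margin in targets.items():
        for category, target_share in margin.items():
            factor = target_share / weighted_share(var_index, category)
            weights = [w * factor if case[var_index] == category else w
                       for case, w in zip(sample, weights)]

mean_w = sum(weights) / len(weights)
weights = [w / mean_w for w in weights]  # rescale so weights average to 1
print("raked weights:", [round(w, 2) for w in weights])
print("fitted share of 'f':", round(weighted_share(0, "f"), 3))
print("fitted share of '18-39':", round(weighted_share(1, "18-39"), 3))
```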

    In general, the majority of the studies that investigated the effectiveness of various weighting procedures in reducing bias in nonprobability sample surveys (table 2) reached the same conclusion as the studies that assessed weighting approaches in both probability and nonprobability sample surveys (table 1): weighting does not sufficiently reduce bias in nonprobability sample surveys. Only a few studies found that weighting could sufficiently reduce bias in nonprobability sample surveys.


    As table 2 shows, two of the studies that documented a sufficient reduction in the bias of nonprobability sample surveys applied multilevel regression and poststratification weights (Wang et al. 2015; Gelman, Goel, Rothschild, and Wang 2017), one study applied model-based poststratification (Goel, Obeng, and Rothschild 2015), and one study applied propensity weighting and calibration (Lee and Valliant 2009). Two of these studies used weighting to adjust nonprobability sample survey data to accurately predict election outcomes after the actual election outcomes were already known, which reduces confidence in the conclusions (Wang et al. 2015; Gelman et al. 2017).
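
    The poststratification step shared by the multilevel regression and poststratification and model-based poststratification approaches can be illustrated with a minimal sketch: cell-level estimates (whether raw cell means or, in MRP, predictions from a multilevel model) are combined using known population cell shares. The cell estimates and shares below are invented for illustration.

```python
# Poststratification: combine cell-level estimates using known population shares.
# cell -> (estimated outcome in that cell, population share of that cell)
cells = {
    ("18-39", "low education"):  (0.42, 0.20),
    ("18-39", "high education"): (0.55, 0.15),
    ("40+", "low education"):    (0.38, 0.40),
    ("40+", "high education"):   (0.50, 0.25),
}

# Population shares must sum to one.
assert abs(sum(share for _, share in cells.values()) - 1.0) < 1e-9

# In MRP, the cell estimates would be model predictions; here they are taken as given.
poststratified = sum(estimate * share for estimate, share in cells.values())
print(f"poststratified estimate: {poststratified:.3f}")
```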

    In sum, the majority of the research on weighting and accuracy finds that the inaccuracy of nonprobability samples cannot be reliably solved by weighting procedures. Some authors conducting such studies also offer explanations as to why their attempts to achieve accurate estimates from weighted nonprobability samples were not successful. Mercer, Lau, and Kennedy (2018), for instance, show that complex weighting procedures outperform basic weighting procedures. Furthermore, the authors show that, to obtain accurate estimates from nonprobability sample surveys by weighting, the availability of variables that predict the outcome of interest is more important than which statistical method is used.
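
    A minimal sketch of weighting a nonprobability sample to a reference probability sample within covariate cells (a simple relative of the propensity-based adjustments discussed above, not the specific procedure used by Mercer, Lau, and Kennedy) is given below with invented counts. It illustrates why variable availability matters: the weights can only correct imbalance on the covariates included in the adjustment, while any remaining within-cell differences are untouched.

```python
from collections import Counter

# Invented covariate profiles (education) for a nonprobability web sample and a
# reference probability sample; the counts are made up for this sketch.
nonprob = ["high"] * 70 + ["low"] * 30     # web panel over-represents "high"
reference = ["high"] * 40 + ["low"] * 60   # reference sample mirrors the population

np_counts, ref_counts = Counter(nonprob), Counter(reference)

# Weight each nonprobability case by how common its covariate cell is in the
# reference sample relative to the nonprobability sample.
weights = {cell: (ref_counts[cell] / len(reference)) / (np_counts[cell] / len(nonprob))
           for cell in np_counts}
print("cell weights:", {c: round(w, 2) for c, w in weights.items()})

# Invented cell means of some outcome in the nonprobability sample. Weighting moves
# the estimate toward the composition of the reference sample, but only on the
# covariates used to build the cells.
outcome = {"high": 0.6, "low": 0.3}
unweighted = sum(outcome[c] * np_counts[c] for c in np_counts) / len(nonprob)
numerator = sum(outcome[c] * np_counts[c] * weights[c] for c in np_counts)
denominator = sum(np_counts[c] * weights[c] for c in np_counts)
print(f"unweighted: {unweighted:.2f}, weighted: {numerator / denominator:.2f}")
```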

    4. CLOSING REMARKS

    In this article, we have reviewed conceptual approaches and empirical evidence on probability and nonprobability sample surveys. Probability sampling theory is well established and based on sound mathematical principles, whereas nonprobability sampling is not. Although there are potential justifications for drawing inferences from nonprobability samples, the rationale for many studies remains unarticulated, and the inferences from nonprobability sample surveys generally require stronger modeling assumptions than are necessary for probability samples. The basic problem with these modeling assumptions remains that they cannot be tested. We have therefore proposed a conceptual framework for nonprobability sample surveys to explicate these modeling assumptions, including practical suggestions about when it might be justified to make such assumptions (section 2).

    In addition, we have summarized the empirical evidence on the accuracy of probability and nonprobability sample surveys (section 3). Our literature overview shows that, even in the age of declining response rates, accuracy in probability sample surveys is generally higher than in nonprobability sample surveys. There is no empirical support for the claim that switching to nonprobability sample surveys is advisable because steadily declining response rates across the globe compromise probability sample survey data quality. Based on the accumulated empirical evidence, our key recommendation is to continue to rely on probability sample surveys.


    In the case that only nonprobability sample survey data are available, we recommend carefully choosing among the various modeling approaches based on their underlying assumptions. In order for researchers to be able to justify the modeling approaches used, we recommend obtaining as much information as possible about the data-generating process (see the transparency recommendations in the appendix).

    Table 2. Studies Exclusively Investigating Weighting Procedures to Reduce Bias in Nonprobability Sample Surveys

    Study | Benchmark | Does weighting sufficiently reduce bias in NPS?(a)
    Mercer et al. (2018) | High-quality PS | No (raking, propensity weighting, matching)
    Smyk, Tyrowicz, and Van der Velde (2018) | High-quality PS | No (propensity weighting)
    Gelman et al. (2017) | High-quality PS | No (raking)
    Gelman et al. (2017) | Election outcome | Yes (multilevel regression and poststratification)
    Goel et al. (2015) | High-quality PS | No (raking); Yes (model-based poststratification)
    Wang et al. (2015) | Election outcome | Yes (multilevel regression and poststratification)
    Lee and Valliant (2009) | High-quality PS | Yes (propensity weighting, calibration)
    Schonlau, van Soest, Kapteyn, and Couper (2009) | High-quality PS | No (propensity weighting, matching)
    Schonlau, van Soest, and Kapteyn (2007) | PS | No (propensity weighting)
    Lee (2006) | High-quality PS | No (propensity weighting)
    Duffy, Smith, Terhanian, and Bremer (2005) | High-quality PS | No (raking, propensity weighting)
    Schonlau, Zapert, Simon, Sanstad, and Marcus (2004) | PS | No (poststratification, propensity weighting)
    Taylor (2000) | High-quality PS and election outcome | Yes (raking, propensity weighting)

    (a) We report whether weighting sufficiently reduced bias based on the authors' own judgments as reported in their conclusions.

    Apart from these key recommendations, this report also shows that there are gaps in the existing literature. To be able to evaluate if and when nonprobability sample surveys can be used as an alternative to probability sample surveys, we need more insights into the success of nonprobability sample surveys in producing accurate estimates in bivariate and multivariate analyses, longitudinal analyses, and experimental research settings. In addition, we need more research into variance estimation and advanced weighting techniques, in particular with regard to collecting and utilizing weighting variables that are correlated with key survey variables and the data-generating process.

    Finally, this report shows that there is great variability across nonprobability sample surveys. Therefore, we would like to end this report with a general call for more transparency in the survey business. For users, researchers, and clients, it can be difficult to decide which vendors to trust with data collection. As long as many vendors are unwilling to disclose necessary information about the data collection and processing, researchers will remain unable to make informed decisions about vendors and will lack the information necessary to understand the limitations of their collected data. The availability of research reports that outline the methodology used for data collection and manipulation is therefore of utmost importance (Bethlehem 2017, Chapter 11).

    As clients, we can reward vendors who are willing to provide more methodological information to us. As a practical matter, this requires being explicit about our needs prior to contracting and reaching out to a broad set of nonprobability sample providers. Another form of action we can take as clients is to ensure that, when contracting vendors who belong to an organization with relevant standards, such as ESOMAR, or that have ISO 20252 or 26362 certification, these vendors disclose information as required by the respective code or certification (see the appendix and Bethlehem [2017, Chapter 12] for more information on relevant transparency guidelines and standards).


    Appendix: Transparency Guidelines

    Various codes of ethics and guidelines address the disclosure of methodological information for online panels (e.g., Arbeitskreis Deutscher Markt- und Sozialforschungsinstitute e.V. 2001; Interactive Marketing Research Organization [IMRO] 2015; International Organization for Standardization [ISO] 2009, 2012; ESOMAR 2012, 2014, 2015a, 2015b; American Association for Public Opinion Research 2015). The lack of information available from some online panel vendors can unfortunately make it impossible for researchers to comply with their own codes or certifications (e.g., ISO 2012, §4.5.1.4; American Association for Public Opinion Research 2015, §III.A.5-8, 12, 14). The unwillingness of some vendors to disclose necessary information is unfortunate for all concerned. Consumers of research are denied information needed to form an opinion as to the quality of the research. Researchers are unable to make informed decisions about vendors and lack information needed to understand the limitations of their data. Further, vendors themselves are unable to benefit from the methodological advances that would follow from greater availability of information on panel operations.

    What can be done? Given the abundance of guidelines, several of which are particularly helpful (IMRO 2015; ESOMAR 2015a), there is little need for another extensive set of recommendations. Therefore, we summarize common reporting recommendations in table A1 and, for the most part, refer the reader to existing guidelines addressing these points (particularly helpful sources are italicized). At times, we make additional recommendations that may go beyond reporting requirements; these contain the words "We recommend."

    In table A2, we also list relevant case-level data that researchers may wish to ensure the vendor will be able to provide. Such data will likely be most useful when the researcher requests them for all cases invited to participate in the survey, not only for those that completed the survey or were not removed for data quality reasons.


    Table A1. Reporting Recommendations

    Universe. What universe does the panel represent? How well does the panel cover that universe?
        References: American Association for Public Opinion Research (2015, §III.A.3); ESOMAR (2014, §5.2); ISO (2009, §4.4.1, §4.7); ISO (2012, §4.5.1.4, §7.2.e, §7.2.f.2)

    Sampling frame. What method or methods are used to recruit panel members? We recommend that the percentage of the sample recruited from different sources be included (e.g., online advertisement, email list, aggregator website).
        References: American Association for Public Opinion Research (2015, §III.A.5-8); ESOMAR (2012, §2, §5); ESOMAR (2014, §5.2); ESOMAR (2015a, §3.7); ESOMAR (2015b, §5.2, §6, §6.1); IMRO (2015, §4); ISO (2012, §4.5.1.4, §7.2.f)

    Panel size. How large is the panel? What is the percentage of active members? How is "active" defined?
        References: IMRO (2015, §14, §17); ISO (2009, §4.4.2)

    Replenishment and retirement. How frequently are new panel members added? Are panel members retired from the panel by the vendor? If so, what criteria are used to determine retirement? What is the panel's annual "churn rate" (the percentage of all panelists who are voluntarily or involuntarily retired each year)?
        References: IMRO (2015, §18, §19); ISO (2009, §4.5.1)

    Aggregator sites. Does the panel draw on aggregator sites, which are sites that allow respondents to select from multiple surveys for which they may qualify (see IMRO 2015)?
        References: IMRO (2015, §6)

    Sample from other panels. Was sample from other providers used in this study?
        References: ESOMAR (2012, §4)

    Blending. If there is more than one sample source, how are these blended together?
        References: ESOMAR (2012, §3); ESOMAR (2015b, §6.1)

    Sample drawn from frame. How was the sample selected from the panel? Any quotas or other specifications used in selection of the sample should be specified. How many units were drawn?
        References: ESOMAR (2012, §7); ESOMAR (2015a, §3.7); ESOMAR (2015b, §6.2); IMRO (2015, §23); ISO (2009, §4.6.1); ISO (2012, §4.5.1.4)

    Routers. Was a router used? If so, how did the vendor determine which surveys are considered for a participant? What is the priority basis for allocating participants to surveys? How is potential bias from the use of a router addressed?
        References: American Association for Public Opinion Research (2015, §III.A.14); ESOMAR (2012, §8-§11); ESOMAR (2015a, §3.7); ESOMAR (2015b, §6.2)

    Identity validation. How does the vendor confirm participants' identities? How does the vendor detect fraudulent participants?
        References: American Association for Public Opinion Research (2015, §III.A.17); ESOMAR (2012, §22); ESOMAR (2015a, §3.8); ESOMAR (2015b, §6.1); IMRO (2015, §24); ISO (2009, §4.3.4.2, §4.3.4.3)

    Within-panel deduplication. What procedures does the panel employ to ensure that no research participant can complete the survey more than once?
        References: American Association for Public Opinion Research (2015, §III.A.17); ESOMAR (2015a, §3.2, §3.8); IMRO (2015, §25)

    Cross-panel deduplication. If multiple sample sources are used, how does the panel ensure that no research participant can complete the survey more than once?
        References: American Association for Public Opinion Research (2015, §III.A.17); ESOMAR (2012, §3); ESOMAR (2015a, §3.8); IMRO (2015, §29)

    Category exclusion. How are panel members who have recently completed a survey on the same topic treated? If they are excluded, for how long or using what rules?
        References: IMRO (2015, §15); ISO (2009, §4.6.2)

    Participation limits. How often can a panel member be contacted to take part in a survey in a specified period?
        References: ESOMAR (2012, §19, §20); ESOMAR (2015a, §3.8); IMRO (2015, §16); ISO (2009, §4.5.1, §4.6.2)

    Other data quality checks. What checks are made for satisficing and fraud? These might include checks for speeding and patterned responses, trap questions, counts of missing or nonsubstantive responses (e.g., don't know), device fingerprinting, or logic checks.
        References: American Association for Public Opinion Research (2015, §III.A.17); ESOMAR (2012, §18); ESOMAR (2015a, §3.3, §3.8); ESOMAR (2015b, §6, §6.1, §6.4); IMRO (2015, §30); ISO (2009, §4.6.6, §4.7); ISO (2012, §7.2.k)

    Panel profile. How often are profiles updated?
        References: IMRO (2015, §21)

    Fieldwork dates. When was the survey fielded? If there were multiple sample sources, did the field period vary between sources?
        References: American Association for Public Opinion Research (2015, §III.A.4); ESOMAR (2014, §5.2); ESOMAR (2015a, §3.8); ESOMAR (2015b, §6.3); ISO (2009, §4.7); ISO (2012, §4.8.3, §7.2.g)

    Invitations and reminders. What text is used for invitations and any reminders? We recommend disclosing the maximum number of reminders sent.
        References: American Association for Public Opinion Research (2015, §III.A.16); ESOMAR (2012, §13); ISO (2009, §4.7)

    Incentives. What incentives, if any, are offered to panel members?
        References: American Association for Public Opinion Research (2015, §III.A.16); ESOMAR (2012, §14); ESOMAR (2015a, §3.7); ESOMAR (2015b, §6.1-2); IMRO (2015, §20); ISO (2009, §4.5.2); ISO (2012, §7.2.i)

    Questionnaire. In the event that the researcher did not create the questionnaire, the questionnaire should be available, including any screening questions. As web surveys are an intensely visual medium (e.g., Couper, Tourangeau, and Kenyon 2004 and Couper, Tourangeau, Conrad, and Crawford 2004), we recommend that screenshots be provided as part of reporting.
        References: American Association for Public Opinion Research (2015, §III.A.2, §III.A.14-15); ESOMAR (2014, §5.2); ESOMAR (2015a, §3.8); ESOMAR (2015b, §5.2, §6.3); ISO (2009, §4.7); ISO (2012, §7.2.l)

    Final dispositions and outcome rates. How many panel members were invited to complete the survey? What is the break-off rate? How are these calculated? We recommend that the definitions of outcome rates developed by Callegaro and DiSogra (2008) and DiSogra and Callegaro (2016) be used; these are largely incorporated in American Association for Public Opinion Research (2015).
        References: American Association for Public Opinion Research (2015, §III.A.18); ESOMAR (2015a, §3.7-8); ESOMAR (2015b, §6.1); IMRO (2015, §26); ISO (2009, §4.7); ISO (2012, §7.2.f.4)

    Handling of mobile devices/smartphones. We recommend documenting whether the questionnaire layout is adapted/optimized for smartphone completion and, if so, how it is adapted or optimized.
        References: Not referenced in any guideline. There is an extensive literature on methods for formatting web surveys on mobile devices and implications for data quality (e.g., Peytchev and Hill 2010; Chrzan and Saunders 2012; de Bruijne and Wijnant 2013a, 2013b; Link, Murphy, Schober, Buskirk, and Hunter Childs 2014; Mavletova and Couper 2014, 2016; Arn, Klug, and Kolodziejski 2015; Struminskaya, Weyandt, and Bosnjak 2015; Lugtig and Toepoel 2016; Revilla, Toninelli, and Ochoa 2016; Couper, Antoun, and Mavletova 2017; Peterson, Griffin, LaFrance, and Li 2017; Antoun, Katz, Argueta, and Wang 2018). Although not incorporated into other guidelines, it is important that clients are sufficiently informed, as handling mobile devices may have consequences for break-off and measurement.

    Table A2. Case-Level Data Recommendations

    Sample source
        Notes: Sample source (e.g., list, web advertisement)

    Within-survey paradata
        Notes: Date-time survey started and completed
        Duration between survey started and completed
        Frequency of invitations to surveys: each week, etc.
        Absence of invitations to concurrent surveys during a certain time (e.g., 2 consecutive weeks after the first invitation to participate in the survey)
        Device information provided for each session for surveys completed in multiple sessions: user agent string or equivalent information (e.g., operating system and