sotech.com | 800-576-2728
A Marketing Research Consultancy
Online Survey Sample and Data Quality Protocols
Socratic Technologies, Inc., has developed sophisticated sample scanning and quality assessment programs to identify and correct problems that may lead to reduced data reliability and bias.
Sample and Data Quality
Historical Perspective
From the earliest days of research, there have been problems with sample quality (i.e., poor recruiting, inaccurate screening, bias in sample pools, etc.). Potential respondents have attempted to submit multiple surveys (paper and pencil), lied to get into compensated studies (mall intercepts and focus groups), and displayed lazy answering habits (all forms of data collection).
In the age of Internet surveying, this is becoming a highly discussed topic because we now have the technology to measure sample problems and we can detect exactly how many people are involved in “bad survey behaviors.” While this puts a keen spotlight on the nature of problems, we also have the technology to correct many of these issues in real time.
So, because we are now aware of potential issues, we are better prepared than at any time in the past to deal with threats to data quality. This paper will detail the steps and procedures that we use at Socratic Technologies to ensure the highest data quality by correcting problems in both sample sourcing and bad survey behavior.
Sample Sources & Quality Procedures
The first line of defense in overall data quality is the sample source. Catching problems begins with examining the way panels are recruited.
According to a variety of industry sources, preidentified sample sources (versus Web intercepts using pop-up invitations or banner ads) now account for almost 80% of U.S. online research participants (and this proportion is growing). Examples include:
• Opt-in lists
• Customer databases
• National research panels
• Private communities
A common benefit of all of these sources is that they include a ready-to-use database from which a random or predefined sample can be selected and invited. In addition, prerecruitment helps to solidify the evidence of opt-in permission for contact or to more completely establish an existing business relationship, at least one of which is needed to meet the requirements of email contact under the federal CAN-SPAM Act of 2003.
In truth, panels of all kinds contain some level of bias driven by the way recruitment strategy is managed. At Socratic we rely on panels that are recruited primarily through direct invitation. We exclude sample sources that are recruited using a “get-paid-for-taking-surveys” approach. This ensures that the people we invite to our surveys are not participating for strictly mercenary purposes, which has been shown to distort answers (i.e., answering questions in such a way as to “please” the researcher in exchange for future monetary rewards).
In addition, we work with panel partners who undertake thorough profile verification and database cleaning procedures on an ongoing basis.
Panels and sample sources are like wine: If you start with poor grapes, no matter the skill of the winemaker, the wine is still poor. How panels are recruited determines the long-run quality of the respondents they produce.
Our approved panel partners regularly scan databases for:
• Unlikely duplicated Internet server addresses
• Series of similar addresses (abc@hotmail, bcd@hotmail, cde@hotmail, etc.)
• Replicated mailing addresses (for incentive checks)
• Other data that might indicate multiple sign-ups by the same individual
• Impossible changes to profiling information (e.g., a 34-year-old woman becoming an 18-year-old man)
• Lack of responsiveness (most drop panelists if they fail to respond to five invitations in a row)
• Non-credible qualifications (e.g., persons who consistently report ownership of or experience with every screening option)
• A history of questionable survey behavior (see “Cheating Probability Score” later in this document)
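Two of the scans above, similar email addresses at the same domain and replicated mailing addresses, lend themselves to a simple automated pass over the panel database. The sketch below (in Python) is illustrative only: the record fields, similarity cutoff, and normalization rules are assumptions, not Socratic’s actual implementation.

```python
from collections import defaultdict
from difflib import SequenceMatcher

def flag_suspicious_signups(records):
    """Flag panelist records that look like multiple sign-ups:
    near-identical email local parts at the same domain, or
    replicated mailing addresses (after light normalization)."""
    flagged = set()

    # Group email local parts by domain, then compare within each domain.
    by_domain = defaultdict(list)
    for rec in records:
        local, _, domain = rec["email"].partition("@")
        by_domain[domain.lower()].append((local.lower(), rec["id"]))

    for members in by_domain.values():
        for i, (loc_a, id_a) in enumerate(members):
            for loc_b, id_b in members[i + 1:]:
                # High character overlap (e.g. abc vs. bcd) is suspicious.
                if SequenceMatcher(None, loc_a, loc_b).ratio() >= 0.66:
                    flagged.update([id_a, id_b])

    # Replicated mailing addresses (for incentive checks).
    seen = {}
    for rec in records:
        key = " ".join(rec["mailing_address"].lower().split())
        if key in seen:
            flagged.update([seen[key], rec["id"]])
        else:
            seen[key] = rec["id"]

    return flagged
```

In practice a production scan would add address standardization (street abbreviations, unit numbers) and a review queue rather than outright removal.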
Figure 1: Socratic Technologies panel development Web site
Socratic’s Network of Global Panel Providers
The following list details the panel partners (subject to change) to whom we regularly turn for recruitment on a global basis.
VENDOR NAME COUNTRIES
3D interactive.com Australia
42 Market Research France
Accurate Market Research Mexico
Adperio US
Advaith Asia
AG3 Brazil, Argentina, Mexico, Chile
AIP Corporation Asia
Alterecho Belgium
Amry Research Russia, Ukraine
ARC Poland
Aurora UK
Aussie Survey UK & Australia
Authentic Response US
Beep World Austria, Switzerland, Germany
BestLife LATAM
Blueberries Israel
C&R Research Services, Inc. US
Campus Fund Raiser US
Cint All Countries
Clear Voice / Oceanside All Countries
Community View India
Corpscan India
Cotterweb US
Data Collect Czech Republic
Delvinia Canada
EC Global Panel LATAM, US
Eksen Turkey
Embrain Co. Asia
Empanel US
Empathy Panel Ireland
Empowered Comm. Australia
ePanel Marketing Research China
Erewards/ ResearchNOW All Countries
Esearch US, Canada, UK
EuroClix B.V. Panelcliz Netherlands
Flying Post UK, France, Germany
Focus Forward US
Gain Japan
Garcia Research Associates US
GMI All Countries
HRH Greece
IID Interface in Design Asia
Inquision South Africa, Turkey
Insight CN China, Hong Kong
Inzicht Netherlands, Belgium, France
iPanelOnline Asia
Ithink US
Itracks Canada
Ivox Belgium
Lab 42 All Countries
Lightspeed Research (UK Kantar Group) Italy, Spain, Germany, Australia, New Zealand, Netherlands, France, Sweden, UK, Switzerland
Livra LATAM
Luth Research US
M3 Research Nordics
Maktoob Research Middle East
Market Intelligence US, EU
Market Tools US, UK, France, Canada, Australia
Market-xcel India, Singapore
Masmi Hungary, Russia, Ukraine
Mc Million US
Mo Web EU
My Points US, Canada
Nerve planet India, China, Japan
Net, Intelligence & Research Korea
Netquest Portugal, Spain
ODC Service Italy, France, Germany, Spain, UK
Offerwise US
OMI Russia, Ukraine
Opinion Health US
Opinion Outpost/SSI All Countries
Opinions UAE, Saudi Arabia
Panel Base UK
Panel Service Africa South Africa
Panelbiz EU
Panthera Interactive All Countries
Precision Sample US
Public Opinious Canada
Pure Profile UK, US, Australia
Quick Rewards Russia, Ukraine, US, UK
Rakuten Research Japan
Resulta Asia
RPA Asia
Sample Bus Asia
Schlesinger Assoc. US
Seapanels Asia
Spec Span US
Spider Metrix Australia, UK, Canada, New Zealand, South Africa, US
STR Center All Countries
Telkoma South Africa
Testspin/WorldWide All Countries
Think Now Research US
TKL Interactive US
TNS New Zealand New Zealand
Toluna All Countries
United sample All Countries
Userneeds Nordics
uthink Canada
WebMD Market Research Services US
World One Research US, France, Germany, Spain
YOC Germany
YOUMINT India
Zapera (You Gov) All Countries
Unlike other data collection modes, the server technology used in online surveys gives the researcher far more control over in-process problems related to cheating and lazy behavior.
Anti-Cheating Protocols
As a first step in identifying and rejecting bad survey behavior, we need to differentiate between Cheating and Lazy Behavior. The solutions Socratic uses for handling each type of problem differ by class of delinquency.
Cheaters attempt to enter a survey multiple times in order to:
• Collect compensation
• Sabotage results
Lazy respondents don’t really think about the questions; they do the least amount of work in order to:
• Receive compensation
• Avoid the burden, boredom or fatigue of long, repetitious, difficult surveys
Many forms of possible cheating and lazy respondent behaviors can be detected using server-based data and response pattern recognition technologies. In some cases, bad respondents are immediately detected and rejected before they even begin the survey. This is critical for quality, because “illegitimate” or “duplicated” respondents decrease the value of every completed interview. Sometimes, we allow people to enter the survey, but then use pattern recognition software to detect “answer sequences” that warrant “tagging and bagging.” Note: While we inform cheaters that they’re busted and won’t be getting any incentive, we don’t tell them how they were caught!
One of our key tools in assessing the quality of a respondent is the Socratic Cheating Probability Score (CPS). A CPS looks at many possible problems and classifies the risk associated with accepting an interview as “valid and complete.”
However, we also need to be careful not to use a “medium probability score” as an automatic disqualifier. Just because the results are not what we expect doesn’t mean they are wrong! Marginal scores should be used to “flag” an interview, which should then be reviewed before rejecting. High scores are usually rejected mid-survey, before the respondent is qualified as having “completed.”
Here are some examples of how we use technology to detect and reject common respondent problems:
Repeat Survey Attempts
Some cheaters simply attempt to retake surveys over and over again. These are the easiest to detect and reject. To avoid self-selection bias, most large surveys today are done “by customized invitation” (CAN-SPAM 2003) and use a “handshake” protocol, preregistering individuals with verified profiling data in order to establish a double or triple opt-in status.
Cheater Solutions: Handshake Protocols
A handshake protocol entails generating a unique URL suffix code, which is used for the link to the survey in the email invitation. It is tied to a specific individual’s email address and/or panel member identification. Once it is marked as “complete” in the database, no other submissions are permitted on that person’s account. An example of this random suffix code is as follows:
http://sotechsurvey.com/survey/?pid=wx54Dlo1
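The mechanics of a single-use suffix code can be sketched in a few lines of Python. The function and variable names here are illustrative (not Socratic’s actual system); the key property is that a code is random, tied to one invitee, and rejected once marked complete.

```python
import secrets

def issue_invitation(survey_url, email, issued):
    """Generate a unique, hard-to-guess suffix code tied to one invitee
    and return the personalized survey link for the email invitation."""
    pid = secrets.token_urlsafe(6)  # 8-character random code, e.g. 'wx54Dlo1'-style
    issued[pid] = email             # tie the code to this invitee's address
    return f"{survey_url}?pid={pid}"

def admit(pid, issued, completed):
    """Admit only codes that were actually issued and not yet used."""
    return pid in issued and pid not in completed

def mark_complete(pid, completed):
    """Once marked complete, no further submissions on that code."""
    completed.add(pid)
```

Forged or reused codes are turned away at the door; the respondent never reaches the first question.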
Supplementing the invitation handshake, a cookie check is utilized. At the time a survey is finished (complete or termination), a cookie with the survey_id is placed on the user’s machine. At the start of all surveys, Socratic looks for a cookie bearing that survey_id and, if it is found, the user will not be allowed to take the survey again. The respondent ID is immediately blocked, so that even if the respondent removes the cookie later on, he or she still won’t be allowed back in.
But cookie checks are no longer sufficient by themselves to prevent multiple submission attempts. More advanced identification is needed.
For more advanced identity verification, Socratic utilizes an IP & Browser Config Check. This is a server-level test that is invisible to the respondent. Whenever a person’s browser hits a Web site, it exchanges information with the Web server in order for the Web pages (or survey pages) to display correctly. For responses to all surveys, a check can be made for multiple elements:
IP Address
The first level of validation comes from checking the IP address of the respondent’s computer. IP addresses are usually generated based on a tightly defined geography. So if someone is supposed to be in California, and their IP address indicates a China-based service, this would be flagged as a potential cheating attempt.
Browser String
Each browser sends a great deal of information about the user’s system to the survey server. These strings are then logged, and subsequent survey attempts are compared to determine whether exact matches are occurring. These are examples of browser strings:
• Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; FunWebProducts; Advanced Searchbar; .NET CLR 1.1.4322; .NET CLR 1.0.3705; KelkooToolbar 1.0.0)
• Mozilla/4.0 (compatible; MSIE 6.1; Windows 95/98/NT/ME/2000/XP; 10290201-SDM; SV1; .NET CLR 1.0.3)
Language Setting
Another browser-based information set that is transmitted consists of the language settings for the user’s system. These, too, are logged and compared to subsequent survey attempts:
• en-us,x-ns1pG7BO_dHNh7,x-ns2U3
• en-us,en;q=0.8,en-gb;q=0.5,sv;
• zh-cn;q=1.0,zh-hk;q=0.9,zh-tw;
• en-us, ja;q=0.90, ja-jp;q=0.93
Internal Clock Setting
Finally, the user’s computer system has an internal timekeeping function that continuously tracks the time of day and date to a number of decimal places. Each user’s computer will vary slightly, even within the same time zone or within the same company’s system.
When these four measurements are taken together, the probability of two exact matches on all readable elements is extremely low.
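Combining the four signals into a single fingerprint makes the duplicate check a one-line lookup. The sketch below is a simplified illustration (the signal names and hashing choice are assumptions, not a description of Socratic’s servers):

```python
import hashlib

def device_fingerprint(ip, user_agent, accept_language, clock_offset_ms):
    """Combine the four server-readable signals (IP address, browser
    string, language setting, internal clock offset) into one hash.
    Two exact matches across 'different' respondents are then highly
    improbable by chance."""
    raw = "|".join([ip, user_agent, accept_language, str(clock_offset_ms)])
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()

def is_duplicate_attempt(fingerprint, seen):
    """Record each fingerprint; report True on any repeat sighting."""
    if fingerprint in seen:
        return True
    seen.add(fingerprint)
    return False
```

A production system would weigh partial matches (same IP, different browser) rather than relying solely on exact hash collisions, since households and offices legitimately share addresses.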
Technology for cheating in online surveys has proliferated over the past 10 years and in some areas of the world has become a cottage industry. However, with the correct server technology, Socratic can detect the profiles of cheating applications and thwart them in real time, before a survey is completed.
Techno-Cheaters
Some cheaters are caught because they are trying to use technology to submit multiple surveys. Form populators and keystroke replicators are examples of auto-fill technologies.
Techno-Cheater Solutions
Total automation can be thwarted by creating non-machine-readable code keys that are used at the beginning of a survey to make sure a human being is responding rather than a computer “bot.” We refer to this as a Handshake Code Key Protocol.
One of the most popular Handshake Code Key Protocols is CAPTCHA. To prevent bots and other automated form completers from entering our surveys, a distorted image of a word or number can be displayed on the start screen of all Socratic projects. In order to gain access to a survey, the user has to enter the word or number shown in the image into a text box; if the entry does not match the image, the user will not be allowed to enter the survey. (Note: Some dispensation and use of alternative forms of code keys are available for visually impaired individuals.)
As computers become more and more sophisticated in their ability to detect patterns, the CAPTCHA distortions have become more complex.
[Figure: Examples of CAPTCHA images that can be “read” by image recognition bots (as of 2013), and images that cannot. Images adapted from Mori & Malik, ca. 2003, Breaking a Visual CAPTCHA, http://www.cs.berkeley.edu/~mori/gimpy/gimpy.html]
Lazy behavior is far more prevalent as a survey problem than outright cheating, primarily because it’s easier to defeat cheaters than people who aren’t paying attention. With new, more sophisticated algorithms, however, it is now possible to limit the influence of lazy respondents in mid-survey.
Lazy Respondent Behavior
A far more common problem with survey takers across all modes of data collection is people who just don’t take the time and effort to answer questions carefully. This can result in rushed surveys or surveys with replicated-pattern issues.
There are several reasons why respondents don’t pay attention:
• Problem 1: Just plain lazy
• Problem 2: Survey design is torturous
  – Too long
  – Boring/repetitious
  – Too difficult
  – Not enough compensation
  – No affinity with sponsor
But whatever the reason for lazy behavior, the symptoms are similar, and the preventative technologies are the same.
Speeders
In the case of rushed survey respondents (“speeders”), speed of submission can be used to detect surveys completed too quickly. One statistical metric that Socratic uses is the Minimum Survey Time Threshold. By adapting a normative formula for estimating the length of a survey based on the number of various types of questions, one can calculate an estimated time to completion and determine whether actual time to completion is significantly lower. This test is run at a predetermined point, before the survey has been completed.
Based on the time since starting, the number of closed-ended questions, and the number of open-ended questions, a determination will be made as to whether the respondent has taken an adequate amount of time to answer the questions.
If (Time < ((# of CEs * secs/CE) + (# of OEs * secs/OE)) * 0.5) Then FLAG.
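The threshold formula translates directly into code. In the Python sketch below, the per-question timing norms and the 0.5 multiplier are illustrative defaults, not Socratic’s actual normative values:

```python
def flag_speeder(elapsed_secs, n_closed, n_open,
                 secs_per_closed=7.5, secs_per_open=45.0, threshold=0.5):
    """Flag a respondent whose elapsed time is under half (threshold)
    of the estimated completion time, following:
    If (Time < ((# of CEs * secs/CE) + (# of OEs * secs/OE)) * 0.5) Then FLAG."""
    estimated = n_closed * secs_per_closed + n_open * secs_per_open
    return elapsed_secs < estimated * threshold
```

For a survey with 20 closed-ended and 2 open-ended questions, the estimate under these assumed norms is 240 seconds, so anyone reaching the checkpoint in under 120 seconds would be flagged.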
Replicated Patterns
Another common problem caused by lazy behavior is the appearance of patterned answers throughout a survey (e.g., choosing the first answer for every question, or selecting a single rating point for all attributes). These are fairly easy to detect, and the respondent can be “intercepted” in mid-survey and asked to reconsider patterned sequences. Socratic uses Pattern Recognition Protocols within a survey to detect and correct these types of problems.
Here are some of the logic-based solutions we apply for common patterning problems:
• XMas Treeing: This technique will identify those who “zig-zag” their answers (e.g., 1, 2, 3, 4, 5, 4, 3, 2, 1, etc.)
  – How to: When all attributes are completed, take the absolute value of all attribute-to-attribute differences. If the mean value is close to 1, flag the respondent.
• Straight-Lining: This technique will identify those who straight-line answers to a survey (e.g., taking the first choice on an answer set or entering 4, 4, 4, 4, 4, 4 on a matrix, etc.)
  – How to: Take the absolute value of each attribute-to-attribute difference and keep a running total. If the mean value is 0, flag the respondent.
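Both checks reduce to the same statistic: the mean absolute step between consecutive ratings in a battery. A minimal Python sketch (the 0.1 tolerance for the zig-zag test is an assumption):

```python
def pattern_flags(ratings):
    """Flag straight-lining (all identical ratings, mean step = 0) and
    'XMas treeing' (zig-zag, mean step ~= 1) in a rating battery."""
    diffs = [abs(b - a) for a, b in zip(ratings, ratings[1:])]
    mean_step = sum(diffs) / len(diffs)
    return {
        "straight_line": mean_step == 0,
        "xmas_tree": abs(mean_step - 1) < 0.1,  # nearly every step moves exactly one point
    }
```

A mid-survey interceptor would run this when the attribute block is completed and, on a flag, redisplay the grid with a polite request to reconsider.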
The majority of problems related to data quality can be detected before a survey is completed. However, a variety of ongoing checks can add even more assurance that respondents are who they claim to be and are located where they claim to be. Panel cleaning is necessary for long-run viability.
Random Answers
While these Pattern Recognition Protocols pick up many common problems, they cannot detect random answer submission (e.g., 1, 5, 3, 2, 5, 4, 3, 1, 1, etc.). For this we need another type of logic: Convergent/Divergent Validity tests.
This type of test relies on the assumption that similar questions should be answered in a similar fashion and polar opposites should receive inverse reactions. For example, if someone strongly agrees that a product concept is “expensive,” he or she should not also strongly agree that the same item is “inexpensive.” When these types of tests are in place, the survey designer has some flexibility to intercept a survey with “validity issues” and request that the respondent reconsider his or her answers.
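For a polar-opposite item pair on an agreement scale, the divergent check amounts to asking whether the two ratings roughly mirror each other. A hedged sketch in Python (the scale size and tolerance are illustrative choices):

```python
def divergent_validity_issue(rating_a, rating_b, scale_max=5, tolerance=1):
    """For a polar-opposite item pair (e.g. 'expensive' vs. 'inexpensive')
    on a 1..scale_max agreement scale, rating_b should roughly mirror
    rating_a: expected_b = (scale_max + 1) - rating_a. A deviation beyond
    the tolerance suggests the respondent is answering at random."""
    expected_b = (scale_max + 1) - rating_a
    return abs(rating_b - expected_b) > tolerance
```

Strongly agreeing with both “expensive” (5) and “inexpensive” (5) fails the check; a 5 paired with a 1 or 2 passes, leaving room for honest ambivalence within the tolerance.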
Cross-Survey Answer Block Sequences
Occasionally, other anti-cheating/anti-lazy behavior protocols will fail to detect a well-executed illegitimate survey. For this purpose, Socratic also scans for repeated sequences using a Record Comparison Algorithm. Questionnaires are continuously scanned, record-to-record, for major blocks of duplicated field contents (e.g., >65% identical answer sequences). Note: Some level of discretion will be needed on surveys for which great similarities of opinion or homogeneity in the target population are anticipated.
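The record-to-record scan can be sketched as a pairwise overlap comparison. This Python illustration assumes equal-length answer vectors and uses the >65% figure from above as its default; the record structure is hypothetical:

```python
def answer_overlap(answers_a, answers_b):
    """Fraction of questions with identical answers between two
    equal-length answer vectors."""
    matches = sum(a == b for a, b in zip(answers_a, answers_b))
    return matches / len(answers_a)

def duplicate_pairs(records, threshold=0.65):
    """Scan record-to-record for pairs whose answer overlap exceeds
    the threshold (e.g. >65% identical answer sequences)."""
    pairs = []
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if answer_overlap(records[i]["answers"], records[j]["answers"]) > threshold:
                pairs.append((records[i]["id"], records[j]["id"]))
    return pairs
```

As the note above warns, flagged pairs warrant analyst review rather than automatic rejection, since homogeneous target populations can produce legitimately similar records.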
Future development is also planned to scan open-ended comments for duplicated phrases and blocks of similar text within live surveys. Currently, this can only be done post hoc.
Post-Survey Panel Cleaning
Post-Survey Detection
For the panels managed by Socratic Technologies, the quality assurance program extends beyond sample cleaning and mid-survey error testing. We also continuously monitor issues that can only be detected post hoc.
Address Verification
Every third or fourth incentive payment should be made by check, or a notice mailed to a physical address. If people want their reward, they have to drop any aliases or geographic pretext in order for delivery to be completed, and oftentimes you can catch cheaters prior to distribution of an incentive. Of course, duplicated addresses, P.O. boxes, etc., are a giveaway. We also look for slight name derivatives not usually caught by banks, including:
• nicknames (Richard Smith and Dick Smith)
• use of initials (Richard Smith and R. Smith)
• unusual capitalization (Richard Smith and RiCHard SmiTH)
• small misspellings (Richard Smith and Richerd Smith)
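The four derivative patterns above can be caught by normalizing names before comparison. The Python sketch below is illustrative only: the nickname table is a tiny sample, and the 0.85 similarity cutoff for misspellings is an assumed value.

```python
import difflib

# Illustrative subset of a nickname table; a real one would be far larger.
NICKNAMES = {"dick": "richard", "rich": "richard",
             "bob": "robert", "bill": "william"}

def normalize_name(name):
    """Collapse case, punctuation, and common nicknames."""
    parts = name.lower().replace(".", "").split()
    return " ".join(NICKNAMES.get(p, p) for p in parts)

def likely_same_person(name_a, name_b, cutoff=0.85):
    """Check the four derivative patterns: nicknames and capitalization
    via normalization, initials via a first-letter match, and small
    misspellings via a fuzzy similarity ratio."""
    a, b = normalize_name(name_a), normalize_name(name_b)
    if a == b:
        return True
    pa, pb = a.split(), b.split()
    # Initial match: "r smith" vs. "richard smith"
    if pa[-1] == pb[-1] and pa[0][0] == pb[0][0] and (len(pa[0]) == 1 or len(pb[0]) == 1):
        return True
    # Small misspellings: high character-level similarity.
    return difflib.SequenceMatcher(None, a, b).ratio() >= cutoff
```

Matches found this way would feed the review queue for incentive holds, not automatic panelist removal.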
Conclusion
Many features and security checks are now available for assuring the validity of modern online research. These include pre-survey panel quality checks, mid-survey cheating and lazy behavior detection, and post-survey panel cleaning.
With these technologies in place, online research can now be more highly regulated than any other form of data collection.
Not all bad survey behavior is malicious; some is driven by poor survey design. Some discretion will always be a requirement of survey usability:
• Writing screeners that don’t telegraph qualification requirements
• Keeping survey length and burden to a reasonable level
• Minimizing the difficulty of compliance
• Enhancing the engagement levels of boring tasks
• Maximizing the communication that participation is worthwhile and appreciated
While Socratic’s techniques can flag possible cheating or lazy behavior, we believe that the analyst should not just automatically reject interviews, but should examine marginal cases for possible validity.
CONTACT
Socratic Technologies, Incorporated, is a leader in the science of computer-based and interactive research methods. Founded in 1994 and headquartered in San Francisco, it is a research-based consultancy that builds proprietary, interactive tools that accelerate and improve research methods for the study of global markets. Socratic Technologies specializes in product development, brand articulation, and advertising research for the business-to-business and consumer products sectors.
Registered Trademarks, Salesmarks and Copyrights The following product and service descriptors are protected and all rights are reserved. Brand Power RatingTM, BPRTM, Brand Power IndexTM, CATM, Configurator AnalysisTM, Customer Risk Quadrant AnalysisTM, NCURATM, ReportSafeTM, reSearch EngineTM, SABRTM, Site-Within-SurveyTM, Socratic CollageBuilderSM, Socratic ClutterBookSM, Socratic BrowserSM, Socratic BlurMeterSM, Socratic CardSortSM, Socratic ColorModelerSM, Socratic CommuniScoreTM, Socratic Forum®, Socratic CopyMarkupSM, Socratic Te-ScopeSM, Socratic PerceptometerSM, Socratic Usability LabSM, The Bruzzone ModelTM, Socratic ProductExhibitorSM, Socratic Concept HighlighterSM, Socratic Site DiagnosticSM, Socratic VirtualMagazineSM, Socratic VisualDifferentiatorSM, Socratic Web BoardsSM, Socratic Web SurveySM 2.0, Socratic WebComm ToolsetSM, SSDSM, Socratic WebPanel ToolsetSM, SWSSM 2.0, Socratic Commitment AnalysisTM, Socratic WebConnectSM, Socratic Advocacy Driver AnalysisTM.
Socratic Technologies, Inc. © 1994–2014. Reproduction in whole or part without written permission is prohibited. Federal law provides severe civil and criminal penalties for unauthorized duplication or use of this material in physical or digital forms, including for internal use. ISSN 1084-2624.
San Francisco Headquarters
Socratic Technologies, Inc.
2505 Mariposa Street
San Francisco, CA 94110-1424
T 415-430-2200 (800-5-SOCRATIC)

Chicago Regional Office
Socratic Technologies, Inc.
211 West Wacker Drive, Suite 1500
Chicago, IL 60606-1217
T 312-727-0200 (800-5-SOCRATIC)

Contact Us
sotech.com/contact