
A Marketing Research Consultancy

Online Survey Sample and Data Quality Protocols

Socratic Technologies, Inc., has developed sophisticated sample scanning and quality assessment programs to identify and correct problems that may lead to reduced data reliability and bias.



Sample and Data Quality

Historical Perspective

From the earliest days of research, there have been problems with sample quality (i.e., poor recruiting, inaccurate screening, bias in sample pools, etc.), and potential respondents have attempted to submit multiple surveys (paper and pencil), lied to get into compensated studies (mall intercepts and focus groups), and displayed lazy answering habits (all forms of data collection).

In the age of Internet surveying, this is becoming a highly discussed topic because we now have the technology to measure sample problems and we can detect exactly how many people are involved in “bad survey behaviors.” While this puts a keen spotlight on the nature of problems, we also have the technology to correct many of these issues in real time.

So, because we are now aware of potential issues, we are better prepared than at any time in the past to deal with threats to data quality. This paper will detail the steps and procedures that we use at Socratic Technologies to ensure the highest data quality by correcting problems in both sample sourcing and bad survey behavior.

Sample Sources & Quality Procedures

The first line of defense in overall data quality is the sample source. Catching problems begins with examining the way panels are recruited.

According to a variety of industry sources, preidentified sample sources (versus Web intercepts using pop-up invitations or banner ads) now account for almost 80% of U.S. online research participants (and this proportion is growing). Examples include:

• Opt-in lists

• Customer databases

• National research panels

• Private communities

A common benefit of all of these sources is that they include a ready-to-use database from which a random or predefined sample can be selected and invited. In addition, prerecruitment helps to solidify the evidence of an opt-in permission for contact or to more completely establish an existing business relationship—at least one of which is needed to meet the requirements of email contact under the federal CAN-SPAM Act of 2003.

In truth, panels of all kinds contain some level of bias driven by the way the recruitment strategy is managed. At Socratic we rely on panels that are recruited primarily through direct invitation. We exclude sample sources that are recruited using a "get-paid-for-taking-surveys" approach. This ensures that the people we invite to our surveys are not participating for strictly mercenary purposes, which has been shown to distort answers (i.e., answering questions in such a way as to "please" the researcher in exchange for future monetary rewards).

In addition, we work with panel partners who undertake thorough profile verification and database cleaning procedures on an ongoing basis.

Panels and sample sources are like wine: If you start with poor grapes, no matter what the skill of the winemaker, the wine is still poor. How panels are recruited determines the long-run quality of the respondents they produce.


Our approved panel partners regularly scan databases for:

• Unlikely duplicated Internet server addresses

• Series of similar addresses (abc@hotmail, bcd@hotmail, cde@hotmail, etc.; a simplified detection sketch for this and the replicated-address check appears after this list)

• Replicated mailing addresses (for incentive checks)

• Other data that might indicate multiple sign-ups by the same individual

• Impossible changes to profiling information (e.g., a 34-year-old woman becoming an 18-year-old man)

• Lack of responsiveness (most drop panelists if they fail to respond to five invitations in a row)

• Non-credible qualifications (e.g., persons who consistently report ownership or experience with every screening option)

• A history of questionable survey behavior (see "Cheating Probability Score" later in this document)
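As an illustration of how checks like these might be automated, the sketch below flags replicated mailing addresses and series of near-identical email addresses at the same domain. The record fields ("email", "mailing_address") and the similarity cut-off are illustrative assumptions, not a description of any partner's production logic.

```python
from collections import Counter
import difflib

def scan_panel(records, similarity_cutoff: float = 0.6):
    """Flag panelist records that look like multiple sign-ups by one person.

    `records` is assumed to be a list of dicts with 'email' and
    'mailing_address' keys (illustrative field names).
    """
    def norm_addr(s: str) -> str:
        return " ".join(s.lower().split())

    addr_counts = Counter(norm_addr(r["mailing_address"]) for r in records)
    emails = [r["email"].lower().partition("@") for r in records]

    flags = []
    for i, r in enumerate(records):
        reasons = []
        if addr_counts[norm_addr(r["mailing_address"])] > 1:
            reasons.append("replicated mailing address")
        local_i, _, dom_i = emails[i]
        # Series of similar addresses: another local part at the same domain
        # that is nearly identical (e.g. abc@hotmail vs. bcd@hotmail).
        for j, (local_j, _, dom_j) in enumerate(emails):
            if (j != i and dom_i == dom_j and
                    difflib.SequenceMatcher(None, local_i, local_j).ratio() >= similarity_cutoff):
                reasons.append("series of similar addresses")
                break
        if reasons:
            flags.append((r["email"], reasons))
    return flags
```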

Figure 1: Socratic Technologies panel development Web site


Socratic's Network of Global Panel Providers

The following list details the panel partners (subject to change) to whom we regularly turn for recruitment on a global basis.

VENDOR NAME COUNTRIES

3D interactive.com Australia

42 Market Research France

Accurate Market Research Mexico

Adperio US

Advaith Asia

AG3 Brazil, Argentina, Mexico, Chile

AIP Corporation Asia

Alterecho Belgium

Amry Research Russia, Ukraine

ARC Poland

Aurora UK

Aussie Survey UK & Australia

Authentic Response US

Beep World Austria, Switzerland, Germany

BestLife LATAM

Blueberries Israel

C&R Research Services, Inc. US

Campus Fund Raiser US

Cint All Countries

Clear Voice / Oceanside All Countries

Community View India

Corpscan India

Cotterweb US

Data Collect Czech Republic

Delvinia Canada

EC Global Panel LATAM, US

Eksen Turkey

Embrain Co. Asia

Empanel US

Empathy Panel Ireland

Empowered Comm. Australia

ePanel Marketing Research China

Erewards/ ResearchNOW All Countries

Esearch US, Canada, UK


EuroClix B.V. Panelcliz Netherlands

Flying Post UK, France, Germany

Focus Forward US

Gain Japan

Garcia Research Associates US

GMI All Countries

HRH Greece

IID Interface in Design Asia

Inquision South Africa, Turkey

Insight CN China, Hong Kong

Inzicht Netherlands, Belgium, France

iPanelOnline Asia

Ithink US

Itracks Canada

Ivox Belgium

Lab 42 All Countries

Lightspeed Research (UK Kantar Group) Italy, Spain, Germany, Australia, New Zealand, Netherlands, France, Sweden, UK, Switzerland

Livra LATAM

Luth Research US

M3 Research Nordics

Maktoob Research Middle East

Market Intelligence US, EU

Market Tools US, UK, France, Canada, Australia

Market-xcel India, Singapore

Masmi Hungary, Russia, Ukraine

Mc Million US

Mo Web EU

My Points US, Canada

Nerve planet India, China, Japan

Net, Intelligence & Research Korea

Netquest Portugal, Spain

ODC Service Italy, France, Germany, Spain, UK


Offerwise US

OMI Russia, Ukraine

Opinion Health US

Opinion Outpost/SSI All Countries

Opinions UAE, Saudi Arabia

Panel Base UK

Panel Service Africa South Africa

Panelbiz EU

Panthera Interactive All Countries

Precision Sample US

Public Opinious Canada

Pure Profile UK, US, Australia

Quick Rewards Russia, Ukraine, US, UK

Rakuten Research Japan

Resulta Asia

RPA Asia

Sample Bus Asia

Schlesinger Assoc. US

Seapanels Asia

Spec Span US

Spider Metrix Australia, UK, Canada, New Zealand, South Africa, US

STR Center All Countries

Telkoma South Africa

Testspin/WorldWide All Countries

Think Now Research US

TKL Interactive US

TNS New Zealand New Zealand

Toluna All Countries

United sample All Countries

Userneeds Nordics

uthink Canada

WebMD Market Research Services US

World One Research US, France, Germany, Spain

YOC Germany

YOUMINT India

Zapera (You Gov) All Countries


Unlike other data collection modes, the server technology used in online surveys gives the researcher far more control over in-process problems related to cheating and lazy behavior.

Anti-Cheating Protocols

As a first step in identifying and rejecting bad survey behavior, we need to differentiate between Cheating and Lazy Behavior. The solutions Socratic uses for handling each type of problem differ by class of delinquency.

Cheaters attempt to enter a survey multiple times in order to:

• Collect compensation

• Sabotage results

Lazy respondents, by contrast, don't really think about their answers; they do the least amount of work possible in order to:

• Receive compensation

• Avoid the burden, boredom or fatigue of long, repetitious, difficult surveys

Many forms of possible cheating and lazy respondent behaviors can be detected using server-based data and response pattern recognition technologies. In some cases, bad respondents are immediately detected and rejected before they even begin the survey. This is critical for quality, because "illegitimate" or "duplicated" respondents decrease the value of every completed interview. Sometimes, we allow people to enter the survey, but then use pattern recognition software to detect "answer sequences" that warrant "tagging and bagging." Note: While we inform cheaters that they're busted and won't be getting any incentive, we don't tell them how they were caught!

One of our key tools in assessing the quality of a respondent is the Socratic Cheating Probability Score (CPS). A CPS looks at many possible problems and classifies the risk associated with accepting an interview as "valid and complete."

However, we also need to be careful not to use a "medium probability score" as an automatic disqualifier. Just because the results are not what we expect doesn't mean they are wrong! Marginal scores should be used to "flag" an interview, which should then be reviewed before rejecting. High-scoring interviews are usually rejected mid-survey, before the respondent is qualified as having "completed."
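The exact inputs and weights of the CPS are not described here; as a rough illustration of the idea only, a score of this kind can be built as a weighted sum of individual problem flags, with separate cut-offs for "flag for review" and "reject." All of the flag names, weights, and thresholds below are assumptions.

```python
# Illustrative only: not Socratic's actual Cheating Probability Score model.
CPS_WEIGHTS = {
    "duplicate_fingerprint": 0.50,
    "ip_geography_mismatch": 0.30,
    "speeding": 0.25,
    "straight_lining": 0.20,
    "validity_contradiction": 0.20,
}

def cheating_probability_score(flags: dict) -> float:
    """Combine boolean problem flags into a 0-1 risk score (capped at 1.0)."""
    raw = sum(weight for name, weight in CPS_WEIGHTS.items() if flags.get(name))
    return min(raw, 1.0)

def disposition(score: float) -> str:
    """Map a score to an action: accept, flag for manual review, or reject."""
    if score >= 0.6:      # high risk: reject mid-survey
        return "reject"
    if score >= 0.3:      # medium risk: flag, then review before rejecting
        return "review"
    return "accept"
```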

Here are some examples of how we use technology to detect and reject common respondent problems:

Repeat Survey Attempts

Some cheaters simply attempt to retake surveys over and over again. These are the easiest to detect and reject. To avoid self-selection bias, most large surveys today are done "by customized invitation" (CAN-SPAM 2003) and use a "handshake" protocol, preregistering individuals with verified profiling data in order to establish a double or triple opt-in status.

Cheaters Solutions: Handshake Protocols

A handshake protocol entails generating a unique URL suffix-code, which is used for the link to the survey in the email invitation. It is tied to a specific individual's email address and/or panel member identification. Once it is marked as "complete" in the database, no other submissions are permitted on that person's account. An example of this random suffix code is as follows:

http://sotechsurvey.com/survey/?pid=wx54Dlo1
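A minimal sketch of the handshake bookkeeping follows, assuming a simple in-memory store keyed by panelist ID. Only the URL pattern is taken from the example above; the helper names, code length, and storage are illustrative assumptions.

```python
import secrets

# pid -> {"suffix": str, "status": "invited" | "complete"}  (illustrative store)
invitations: dict[str, dict] = {}

BASE_URL = "http://sotechsurvey.com/survey/?pid="  # pattern from the example above

def issue_invitation(panelist_id: str) -> str:
    """Generate a unique, hard-to-guess suffix tied to one panelist and return the survey URL."""
    suffix = secrets.token_urlsafe(6)  # random 'wx54Dlo1'-style code
    invitations[panelist_id] = {"suffix": suffix, "status": "invited"}
    return BASE_URL + suffix

def accept_submission(panelist_id: str, suffix: str) -> bool:
    """Allow a submission only if the suffix matches and the account is not already complete."""
    record = invitations.get(panelist_id)
    if record is None or record["suffix"] != suffix or record["status"] == "complete":
        return False
    record["status"] = "complete"  # no further submissions on this account
    return True
```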


Supplementing the invitation handshake, a cookie check is utilized. When a survey is finished (whether completed or terminated), a cookie bearing the survey_id is placed on the user's machine. At the start of every survey, Socratic looks for a cookie with that survey_id; if it is found, the user is not allowed to take the survey again, and the respondent ID is immediately blocked so that even if the respondent removes the cookie later on, he or she still won't be allowed back in.
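A simplified sketch of the cookie-plus-blocklist logic just described, assuming the survey platform hands the request cookies to us as a plain dict and lets us set a cookie via a callback; the names here are illustrative.

```python
blocked_respondent_ids: set[str] = set()  # persists even if the cookie is later deleted

def may_enter(survey_id: str, respondent_id: str, cookies: dict) -> bool:
    """Refuse entry if a completion cookie or a prior block is found."""
    if cookies.get(f"completed_{survey_id}") == "1" or respondent_id in blocked_respondent_ids:
        blocked_respondent_ids.add(respondent_id)  # block the ID as soon as the cookie is seen
        return False
    return True

def on_finish(survey_id: str, set_cookie) -> None:
    """On completion or termination, drop a cookie carrying the survey_id."""
    set_cookie(f"completed_{survey_id}", "1")
```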

But cookie checks are no longer sufficient by themselves to prevent multiple submission attempts. More advanced identification is needed.

For a more advanced identification verification, Socratic utilizes an IP & Browser Config Check. This is a server-level test that is invisible to the respondent. Whenever a person's browser hits a Web site, it exchanges information with the Web server in order for the Web pages (or survey pages) to display correctly. For responses to all surveys, a check can be made for multiple elements:

IP Address

The first level of validation comes from checking the IP address of the respondent's computer. IP addresses are usually assigned based on a tightly defined geography. So if someone is supposed to be in California, and their IP address indicates a China-based service, this would be flagged as a potential cheating attempt.

Browser String

Each browser sends a great deal of information about the user's system to the survey server. These strings are then logged, and subsequent survey attempts are compared to determine whether exact matches are occurring. These are examples of browser strings:

• Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; FunWebProducts; Advanced Searchbar; .NET CLR 1.1.4322; .NET CLR 1.0.3705; KelkooToolbar 1.0.0)

• Mozilla/4.0 (compatible; MSIE 6.1; Windows 95/98/NT/ME/2000/XP; 10290201-SDM; SV1; .NET CLR 1.0.3)

Language Setting

Another browser-based information set that is transmitted consists of the language settings for the user's system. These, too, are logged and compared to subsequent survey attempts:

• en-us,x-ns1pG7BO_dHNh7,x-ns2U3

• en-us,en;q=0.8,en-gb;q=0.5,sv;

• zh-cn;q=1.0,zh-hk;q=0.9,zh-tw;

• en-us, ja;q=0.90, ja-jp;q=0.93

Internal Clock Setting

Finally, the user's computer system has an internal time-keeping function that continuously monitors the time of day and date out to a number of decimal places. Each user's computer will vary slightly, even within the same time zone or within the same company's system.

When these four measurements are taken together, the probability of two different respondents producing exactly the same values on all readable elements is extremely low.
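One way to operationalize this comparison is to hash the four readable elements into a single device fingerprint and look for repeats. The sketch below is an assumption about how such a check could be written, not a description of Socratic's server code.

```python
import hashlib

seen_fingerprints: set[str] = set()

def device_fingerprint(ip: str, user_agent: str, accept_language: str, clock_offset: str) -> str:
    """Hash the four readable elements into a single comparable key."""
    raw = "|".join([ip, user_agent, accept_language, clock_offset])
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()

def is_probable_duplicate(ip: str, user_agent: str, accept_language: str, clock_offset: str) -> bool:
    """True if an identical combination of all four elements has already been seen."""
    fp = device_fingerprint(ip, user_agent, accept_language, clock_offset)
    if fp in seen_fingerprints:
        return True
    seen_fingerprints.add(fp)
    return False
```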

Technology for cheating in online surveys has proliferated over the past 10 years and in some areas of the world has become a cottage industry. However, with the correct server technology, Socratic can detect the profiles of cheating applications and thwart them in real time, prior to completing a survey.


Techno-Cheaters

Some cheaters are caught because they are trying to use technology to submit multiple surveys. Form populators and keystroke replicators are examples of auto-fill technologies.

Techno-Cheaters Solutions

Total automation can be thwarted by creating non-machine-readable code keys that are used at the beginning of a survey to make sure a human being is responding versus a computer "bot." We refer to this as a Handshake Code Key Protocol.

One of the most popular Handshake Code Key Protocols is CAPTCHA. To prevent bots and other automated form completers from entering our surveys, a distorted image of a word or number can be displayed on the start screen of all Socratic projects. In order to gain access to a survey, the user has to enter the word or number shown in the image into a text box; if the result does not match the image, the user will not be allowed to enter the survey. (Note: Some dispensation and use of alternative forms of code keys are available for visually impaired individuals.)
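Server-side, the code-key comparison itself is simple; the difficulty lies in rendering an image that bots cannot read, which is omitted here. A minimal sketch of generating and verifying a code key follows; the code length and case-insensitive comparison are assumptions.

```python
import secrets
import string

def new_code_key(length: int = 6) -> str:
    """Generate the word/number that will be rendered as a distorted image."""
    alphabet = string.ascii_uppercase + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))

def verify_code_key(expected: str, entered: str) -> bool:
    """Grant survey access only if the typed text matches the image's code."""
    return entered.strip().upper() == expected.upper()
```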

As computers become more and more sophisticated in their ability to detect patterns, the CAPTCHA distortions have become more complex.

Images that can be “read” by image recognition bots (as of 2013)

Images that cannot be “read” by image recognition bots

Images adapted from Mori & Malik, ca. 2003, Breaking a Visual CAPTCHA, http://www.cs.berkeley.edu/~mori/gimpy/gimpy.html

Lazy behavior is far more prevalent as a survey problem than outright cheating, primarily because it's easier to defeat cheaters than people who aren't paying attention. With new, more sophisticated algorithms, however, it is now possible to limit the influence of lazy respondents in mid-survey.


Lazy Respondent Behavior

A far more common problem with survey takers, across all modes of data collection, is respondents who just don't take the time and effort to answer questions carefully. This can result in rushed surveys or surveys with replicated-pattern issues.

There are several reasons why respondents don’t pay attention:

• Problem 1: Just plain lazy

• Problem 2: Survey design is torturous

  – Too long

  – Boring/repetitious

  – Too difficult

  – Not enough compensation

  – No affinity with sponsor

But whatever the reason for lazy behavior, the symptoms are similar, and the preventative technologies are the same.

Speeders

In the case of rushed survey respondents ("speeders"), speed of submission can be used to detect surveys completed too quickly. One statistical metric that Socratic uses is the Minimum Survey Time Threshold. By adapting a normative formula for estimating the length of a survey based on the number of various types of questions, one can calculate an estimated time to completion and determine if the actual time to completion is significantly lower. This test is run at a predetermined point, before the survey has been completed.

Based on the time since starting, the number of closed-ended questions, and the number of open-ended questions, a determination will be made as to whether the respondent has taken an adequate amount of time to answer the questions.

If (Time < (((# of CEs * secs/CE) + (# of OEs * secs/OE)) * 0.5)) Then FLAG.
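The rule above translates almost directly into code. In the sketch below, the per-question allowances (secs/CE and secs/OE) are left as parameters, since the normative values themselves are not given here; the defaults are illustrative assumptions.

```python
def is_speeder(elapsed_seconds: float,
               n_closed_ended: int,
               n_open_ended: int,
               secs_per_ce: float = 7.5,   # assumed allowance per closed-ended question
               secs_per_oe: float = 30.0   # assumed allowance per open-ended question
               ) -> bool:
    """Flag the respondent if elapsed time is under half the estimated completion time."""
    estimated = n_closed_ended * secs_per_ce + n_open_ended * secs_per_oe
    return elapsed_seconds < estimated * 0.5
```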

Replicated Patterns

Another common problem caused by lazy behavior is the appearance of patterned answers throughout a survey (e.g., choosing the first answer for every question, or selecting a single rating point for all attributes). These are fairly easy to detect, and the respondent can be "intercepted" in mid-survey and asked to reconsider patterned sequences. Socratic uses Pattern Recognition Protocols within a survey to detect and correct these types of problems.

Here are some of the logic-based solutions we apply for common patterning problems:

• XMas Treeing: This technique will identify those who "zig-zag" their answers (e.g., 1, 2, 3, 4, 5, 4, 3, 2, 1, etc.)

  – How to: When all attributes are completed, take the absolute value of each attribute-to-attribute difference. If the mean of these absolute differences is close to 1, flag the respondent.

• Straight-Lining: This technique will identify those who straight-line answers to a survey (e.g., taking the first choice on an answer set or entering 4, 4, 4, 4, 4, 4 on a matrix, etc.)

  – How to: Subtract each attribute (sub-question) rating from the previous one and keep a running total of the absolute differences. When all attributes are completed, if the mean absolute difference is 0, flag the respondent. (A sketch covering both checks follows this list.)
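Both checks reduce to the mean absolute difference between adjacent ratings in a grid. A sketch follows, assuming the ratings arrive as a simple list of numbers; the tolerance used for the XMas-tree check is an assumption.

```python
def mean_adjacent_abs_diff(ratings: list[float]) -> float:
    """Mean absolute difference between each attribute rating and the previous one."""
    diffs = [abs(b - a) for a, b in zip(ratings, ratings[1:])]
    return sum(diffs) / len(diffs) if diffs else 0.0

def flag_straight_lining(ratings: list[float]) -> bool:
    """All adjacent differences are zero, e.g. 4, 4, 4, 4, 4."""
    return len(ratings) > 1 and mean_adjacent_abs_diff(ratings) == 0.0

def flag_xmas_treeing(ratings: list[float], tolerance: float = 0.25) -> bool:
    """Zig-zag answering such as 1, 2, 3, 4, 5, 4, 3, 2, 1 gives a mean step close to 1."""
    return len(ratings) > 1 and abs(mean_adjacent_abs_diff(ratings) - 1.0) <= tolerance
```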

The majority of problems related to data quality can be detected before a survey is completed. However, a variety of ongoing checks can add even more assurance that respondents are who they claim to be and are located in the correct location. Panel cleaning is necessary for long-run viability.


Random Answers

While these Pattern Recognition Protocols pick up many common problems, they cannot detect random answer submission (e.g., 1, 5, 3, 2, 5, 4, 3, 1, 1, etc.). For this we need another type of logic: Convergent/Divergent Validity tests.

This type of test relies on the assumption that similar questions should be answered in a similar fashion and polar opposites should receive inverse reactions. For example, if someone strongly agrees that a product concept is "expensive," he or she should not also strongly agree that the same item is "inexpensive." When these types of tests are in place, the survey designer has some flexibility to intercept a survey with "validity issues" and request that the respondent reconsider his or her answers.
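A sketch of one such convergent/divergent check follows, assuming agreement is captured on a 1-to-5 scale and that the questionnaire designer supplies the pairs of polar-opposite items; the pair list and tolerance below are illustrative assumptions.

```python
# Pairs of polar-opposite attributes supplied by the questionnaire designer (illustrative).
OPPOSITE_PAIRS = [("expensive", "inexpensive"), ("easy_to_use", "difficult_to_use")]

def divergent_validity_issues(answers: dict, scale_max: int = 5, tolerance: int = 1) -> list:
    """Return the opposite pairs a respondent rated inconsistently.

    On a 1-5 agreement scale, the two ratings of a polar-opposite pair should
    roughly mirror each other, so their sum should be near scale_max + 1.
    """
    issues = []
    for a, b in OPPOSITE_PAIRS:
        if a in answers and b in answers:
            if abs((answers[a] + answers[b]) - (scale_max + 1)) > tolerance:
                issues.append((a, b))
    return issues

# Example: strongly agreeing (5) with both "expensive" and "inexpensive" is flagged.
# divergent_validity_issues({"expensive": 5, "inexpensive": 5}) -> [("expensive", "inexpensive")]
```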

Cross-Survey Answer Block Sequences

Occasionally, other anti-cheating/anti-lazy behavior protocols will fail to detect a well-executed illegitimate survey. For this purpose, Socratic also scans for repeated sequences using a Record Comparison Algorithm. Questionnaires are continuously scanned, record-to-record, for major blocks of duplicated field contents (e.g., >65% identical answer sequences). Note: Some level of discretion will be needed on surveys for which great similarities of opinion or homogeneity in the target population are anticipated.

Future development is also planned to scan open-ended comments for duplicated phrases and blocks of similar text within live surveys. Currently, this can only be done post hoc.

Post-Survey Panel Cleaning

Post-Survey Detection

For the panels managed by Socratic Technologies, the quality assurance program extends beyond the sample cleaning and mid-survey error testing. We also continuously monitor issues that can only be detected post hoc.

Address Verification

Every third or fourth incentive payment should be made by check, or a notice mailed to a physical address. If people want their reward, they have to drop any aliases or geographic pretext in order for delivery to be completed, and oftentimes you can catch cheaters prior to distribution of an incentive. Of course, duplicated addresses, P.O. boxes, etc., are a giveaway. We also look for slight name derivatives not usually caught by banks (a simplified matching sketch follows the list), including:

• nicknames (Richard Smith and Dick Smith)

• use of initials (Richard Smith and R. Smith)

• unusual capitalization (Richard Smith and RiCHard SmiTH)

• small misspellings (Richard Smith and Richerd Smith)
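A sketch of how such derivatives might be collapsed before incentive checks go out, using simple normalization, a small nickname table, and a fuzzy match for misspellings and odd capitalization; the nickname map and similarity cut-off are illustrative assumptions.

```python
import difflib

# Small illustrative nickname map; a production list would be far larger.
NICKNAMES = {"dick": "richard", "rick": "richard", "bob": "robert", "bill": "william"}

def normalize_name(full_name: str) -> str:
    """Lower-case, expand known nicknames, and drop a single-letter middle initial."""
    parts = [p.strip(".") for p in full_name.lower().split()]
    parts = [NICKNAMES.get(p, p) for p in parts]
    if len(parts) > 2 and len(parts[1]) == 1:
        parts = [parts[0], parts[-1]]
    return " ".join(parts)

def probably_same_payee(name_a: str, name_b: str, cutoff: float = 0.85) -> bool:
    """True for derivatives like 'Richard Smith' vs 'Dick Smith' or 'Richerd Smith'."""
    a, b = normalize_name(name_a), normalize_name(name_b)
    if a == b:
        return True
    # Fuzzy match catches small misspellings and unusual capitalization.
    return difflib.SequenceMatcher(None, a, b).ratio() >= cutoff
```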


Conclusion

Many features and security checks are now available for assuring the validity of modern online research. These include pre-survey panel quality checks, mid-survey detection of cheating and lazy behavior, and post-survey panel cleaning.

With these technologies in place, online research can now be more highly regulated than any other form of data collection.

Not all survey bad behavior is malicious; some is driven by poor survey design. Some discretion will always be a requirement of survey usability:

• Writing screeners that don’t telegraph qualification requirements

• Keeping survey length and burden to a reasonable level

• Minimizing the difficulty of compliance

• Enhancing the engagement levels of boring tasks

• Maximizing the communication that participation is worthwhile and appreciated

While Socratic's techniques can flag possible cheating or lazy behavior, we believe that the analyst should not just automatically reject interviews, but examine marginal cases for possible validity.


CONTACT

Socratic Technologies, Incorporated, is a leader in the science of computer-based and interactive research methods. Founded in 1994 and headquartered in San Francisco, it is a research-based consultancy that builds proprietary, interactive tools that accelerate and improve research methods for the study of global markets. Socratic Technologies specializes in product development, brand articulation, and advertising research for the business-to-business and consumer products sectors.

Registered Trademarks, Salesmarks and Copyrights The following product and service descriptors are protected and all rights are reserved. Brand Power RatingTM, BPRTM, Brand Power IndexTM, CATM, Configurator AnalysisTM, Customer Risk Quadrant AnalysisTM, NCURATM, ReportSafeTM, reSearch EngineTM, SABRTM, Site-Within-SurveyTM, Socratic CollageBuilderSM, Socratic ClutterBookSM, Socratic BrowserSM, Socratic BlurMeterSM, Socratic CardSortSM, Socratic ColorModelerSM, Socratic CommuniScoreTM, Socratic Forum®, Socratic CopyMarkupSM, Socratic Te-ScopeSM, Socratic PerceptometerSM, Socratic Usability LabSM, The Bruzzone ModelTM, Socratic ProductExhibitorSM, Socratic Concept HighlighterSM, Socratic Site DiagnosticSM, Socratic VirtualMagazineSM, Socratic VisualDifferentiatorSM, Socratic Web BoardsSM, Socratic Web SurveySM 2.0, Socratic WebComm ToolsetSM, SSDSM, Socratic WebPanel ToolsetSM, SWSSM 2.0, Socratic Commitment AnalysisTM, Socratic WebConnectSM, Socratic Advocacy Driver AnalysisTM.

Socratic Technologies, Inc. © 1994–2014. Reproduction in whole or part without written permission is prohibited. Federal law provides severe civil and criminal penalties for unauthorized duplication or use of this material in physical or digital forms, including for internal use. ISSN 1084-2624.

sotech.com | 800-576-2728

San Francisco Headquarters
Socratic Technologies, Inc., 2505 Mariposa Street, San Francisco, CA 94110-1424, T 415-430-2200 (800-5-SOCRATIC)

Chicago Regional Office
Socratic Technologies, Inc., 211 West Wacker Drive, Suite 1500, Chicago, IL 60606-1217, T 312-727-0200 (800-5-SOCRATIC)

Contact Us
sotech.com/contact