Discussion Alan Zaslavsky Harvard Medical School.

Fabrication as a Statistical Procedure

• Fabrication is like imputation
  – Duplication is like hot deck
  – Duplication with random modifications is like multiple imputation
  – Duplication is like weight modification
• Fabrication is a multilevel process
  – Interview, interviewer, area, … project level
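The weight-modification analogy can be made concrete. In the minimal sketch below, duplicating one record twice gives the same estimate as keeping a single copy with triple weight, and the Kish approximation for effective sample size suggests how little information the copies add. The function names and toy values are illustrative assumptions, not taken from the talk.

```python
def weighted_mean(values, weights):
    """Weighted mean of values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def kish_effective_n(weights):
    """Kish approximation to effective sample size: (sum w)^2 / sum(w^2)."""
    return sum(weights) ** 2 / sum(w * w for w in weights)


# Hypothetical genuine responses to one numeric item.
real = [3.0, 5.0, 4.0, 8.0]

# A fabricator pads the file by copying the last record twice.
duplicated = real + [8.0, 8.0]

# Equivalent view: four distinct records, the last one carrying weight 3.
weights = [1, 1, 1, 3]

mean_from_duplicates = sum(duplicated) / len(duplicated)
mean_from_weights = weighted_mean(real, weights)
effective_n = kish_effective_n(weights)

print(mean_from_duplicates, mean_from_weights, effective_n)
```

In this toy case the file nominally holds six records, but the effective sample size under the Kish approximation is lower than even the four distinct records, which is one way to read "paying more for less information".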

Fabrication as a Game

• Payoffs/risks to fabricator
  – Reduce effort while receiving payment
  – Risks greater for higher-level organization/person
• Detection/deterrence
• Costs/risks to data purchaser
  – Paying more for less information
  – Wrong decisions
  – Loss of credibility (cliff loss function)
• Risks may change with greater expertise on either side

Assumptions about Fabricators

• Fabricators are not very sophisticated
  – No fancy synthesis models
• Fabricators are not trying to work hard
  – Falsifying must be easier than data collection
  – Will not know how to “beat” moderately sophisticated detection techniques
• If fabricators try harder …
  – Good standard synthesis methods could be hard to detect
  – Learning on both sides

Fabrication on the Continuum of Survey Management

• Related to other survey errors at scale
  – Inadequately designed survey questions and tools
    • Not adapted to conditions under which survey fielded
  – Interviewer errors
    • Misinterpretation of questions, procedures
    • Interpersonal interview technique
    • Training and motivation
• Monitoring of “honesty”, accuracy, technique

Detection techniques

• Good survey management
  – Timely, at all levels
  – Recruitment, observation
  – Metadata and paradata
• Post-survey analysis
  – Replication of survey: interpenetrating samples
  – Subject-matter expertise
  – Statistical outliers (single and patterns)
• Earlier is better
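As one illustration of the "statistical outliers" item: with interpenetrating samples, interviewers work comparable random subsamples, so their summary statistics should look broadly alike. The sketch below flags interviewers whose mean on some item is far from the rest, using a robust z-score based on the median and the median absolute deviation. The function name, the 3.5 cutoff, and the toy data are assumptions for illustration; the talk does not prescribe a specific rule.

```python
import statistics


def flag_outlier_interviewers(means_by_interviewer, threshold=3.5):
    """Return interviewer IDs whose mean has a robust z-score above threshold.

    Robust z = 0.6745 * (value - median) / MAD, where MAD is the median
    absolute deviation. The threshold is an assumed, conventional choice.
    """
    values = list(means_by_interviewer.values())
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        return []  # no spread to scale by; this rule cannot rank anyone
    flagged = []
    for interviewer, value in means_by_interviewer.items():
        robust_z = 0.6745 * (value - med) / mad
        if abs(robust_z) > threshold:
            flagged.append(interviewer)
    return flagged


# Hypothetical per-interviewer means on one survey item.
means = {"A": 4.1, "B": 3.9, "C": 4.0, "D": 4.2, "E": 6.5}
flagged = flag_outlier_interviewers(means)
print(flagged)
```

A flag of this kind is only a prompt for follow-up (recontact, paradata review), not evidence of fabrication by itself; a genuinely unusual assignment area can produce the same pattern.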

Regina Faranda

• Extensive checking
  – Subject-matter and survey expertise
  – Checklist: QC
• Statistical assumptions?
  – Can be stated and tested

Rita Thissen

• Detailed specifics of monitoring and detection systems
  – Technology: CARI, CAPI, …
• (Anecdotes rarely heard)

Mike Robbins – Duplicate detection

• Duplicate detection is like record linkage
  – Likelihood ratio
• Duplicate detection also important in other settings
  – US Census (2000?): match 330M × 330M possible record pairs
• Would models be different for fabricated data, processing errors, repeated real interviews?
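The record-linkage analogy can be sketched in the style of a Fellegi-Sunter likelihood ratio: each item contributes evidence according to how much more likely agreement is between true duplicates than between unrelated respondents. The sketch below is a minimal version under stated assumptions: the per-item probabilities `m` (agreement given duplicate) and `u` (agreement given unrelated) are invented for illustration, items are treated as independent, and skipped items are ignored. None of these choices come from the talk.

```python
import math


def log_likelihood_ratio(rec_a, rec_b, m, u):
    """Sum of per-item log likelihood ratios for 'duplicate' vs 'unrelated'.

    m[i] = assumed P(item i agrees | records are duplicates)
    u[i] = assumed P(item i agrees | records are unrelated)
    Items where either answer is None (skipped) are treated as uninformative.
    """
    total = 0.0
    for a, b, m_i, u_i in zip(rec_a, rec_b, m, u):
        if a is None or b is None:
            continue
        if a == b:
            total += math.log(m_i / u_i)
        else:
            total += math.log((1 - m_i) / (1 - u_i))
    return total


# Hypothetical probabilities for three items.
m = [0.95, 0.95, 0.95]
u = [0.25, 0.50, 0.20]

llr_identical = log_likelihood_ratio([1, 2, 3], [1, 2, 3], m, u)
llr_mostly_different = log_likelihood_ratio([1, 2, 3], [4, 1, 3], m, u)
print(llr_identical, llr_mostly_different)
```

Pairs with a large positive score are candidate duplicates. The question on the slide about different models would correspond here to different `m` values: a fabricator who copies with random modifications, a processing error, and a repeated real interview would each imply a different agreement pattern between the two copies.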

Example: Medicare CAHPS survey

• Pulled ~5000 responses (out of ~400K/year)
• Examined 27 substantive items
• Complex features
  – Substantial amount of screening/skipped items
  – Multiple choice items
  – Blocks of closely related items

Agreement – all pairs

Best agreement: duplicates?
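The slides do not say how agreement was computed for these figures. One plausible sketch, assuming agreement is the share of matching answers among items that both respondents answered (so skipped items drop out), scores every pair and ranks them so the best-agreeing pairs can be inspected as possible duplicates. The function names and the toy records are hypothetical.

```python
from itertools import combinations


def agreement(rec_a, rec_b):
    """Share of matching answers among items answered in both records.

    Returns None when the two records have no answered item in common.
    """
    both = [(a, b) for a, b in zip(rec_a, rec_b)
            if a is not None and b is not None]
    if not both:
        return None
    return sum(a == b for a, b in both) / len(both)


def rank_pairs(records):
    """Score all unordered pairs and sort by agreement, highest first."""
    scored = []
    for (i, rec_a), (j, rec_b) in combinations(enumerate(records), 2):
        score = agreement(rec_a, rec_b)
        if score is not None:
            scored.append((score, i, j))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored


# Toy records with five items; None marks a skipped item.
records = [
    [1, 2, None, 4, 3],
    [1, 3, 2, 4, 1],
    [2, 2, 3, 1, 3],
    [1, 2, None, 4, 3],  # identical to record 0
]
ranked = rank_pairs(records)
print(ranked[0])

# Number of unordered pairs among 5000 responses.
n_pairs = 5000 * 4999 // 2
print(n_pairs)
```

With about 5000 responses this is roughly 12.5 million pairs, which is feasible to score exhaustively. A raw agreement share ignores how many items were actually compared and how common each answer is, so heavily skipped records or very popular answer patterns can produce high agreement without any duplication.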

Conclusions

• Know your data and survey methodology
• Thanks to speakers for sharing their experience and methods
