Survey Methodology Survey data entry/cleaning EPID 626 Lecture 10.

To do or not to do: Contracting the work

• During study planning, you should decide whether to do the data entry, management, and analysis yourself, or whether to contract with someone else to do it

• What are the advantages and disadvantages?

• When might you want to? When might you not want to?

Contracting

• Advantages
  – Specialized expertise
  – Potential ability to access national network of personnel
  – Reduction of load on study personnel
  – Third party (without financial or professional stake in results) increases legitimacy of the results

Contracting

• Disadvantages
  – Generally more expensive
    • Is this true? Discuss profits vs. expertise and efficiency
  – Lose direct control over quality of data and study conduct
  – May be more difficult to interpret data without having done the analysis

DIY: Now what?

• Data analysis plan

• Data entry

• Data diagnostics

• Data cleaning

• Data setup

Data analysis plan (DAP)

• Design from the protocol and the survey instrument
  – Note: they may be discrepant

• Aim:
  – Resolve discrepancies before you start working with the data
  – Establish a clear plan for data management and analysis

DAP elements

• Summarize methods

• For each survey objective, identify and describe the relevant variables

• Identify the analysis methods
  – Software
  – Statistical methods, tests, significance levels, definitions

DAP elements (2)

• Describe plan for handling:
  – missing values
  – out-of-range values
  – zeros if doing log transformations
  – data collapsing

• Describe subgroup or by-group analyses
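
One way to make these handling rules concrete is to write them down as code before any data arrive. The following is a minimal Python sketch, not a prescribed method: the missing-value codes, the valid age range, the log offset, and the age categories are all hypothetical choices that a real DAP would have to specify.

```python
import math

# Hypothetical DAP rules: every constant here is an assumption for illustration.
MISSING_CODES = {"", "-9", "999"}   # codes the instrument uses for "missing"
AGE_RANGE = (18, 99)                # plausible range for this survey population
LOG_ZERO_OFFSET = 0.5               # added before log() so zeros are defined


def clean_age(raw):
    """Return (value, flag); value is None when missing or out of range."""
    if raw.strip() in MISSING_CODES:
        return None, "missing"
    age = int(raw)
    if not AGE_RANGE[0] <= age <= AGE_RANGE[1]:
        return None, "out_of_range"
    return age, "ok"


def log_count(count):
    """Log-transform a non-negative count, offsetting so zero is allowed."""
    return math.log(count + LOG_ZERO_OFFSET)


def collapse_age(age):
    """Collapse age into the categories planned for analysis."""
    if age is None:
        return "unknown"
    return "18-39" if age < 40 else "40-64" if age < 65 else "65+"
```

Returning a flag alongside the value keeps a record of why a value was dropped, which can later be tabulated and reported.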

DAP elements (3)

• Set up dummy tables and graphs

• Review this DAP carefully and pass it around

Data entry

• Design a database that resembles the survey instrument in layout and format

• Pretest it extensively

• Designer should be present at the beginning of data entry to fix bugs

• Double data entry?

• Avoid necessity of interpretation by entry personnel
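
If double data entry is used, the two passes have to be compared field by field. The sketch below shows one possible comparison in Python; the record layout (a dict of records keyed by ID) and the example values are hypothetical.

```python
def compare_entries(first_pass, second_pass):
    """Return a list of (record_id, field, value_1, value_2) discrepancies.

    Each pass is a dict mapping record ID to a dict of field values.
    """
    discrepancies = []
    for record_id in sorted(set(first_pass) | set(second_pass)):
        rec1 = first_pass.get(record_id)
        rec2 = second_pass.get(record_id)
        if rec1 is None or rec2 is None:
            # Record was keyed in only one of the two passes.
            discrepancies.append(
                (record_id, "<record>", rec1 is not None, rec2 is not None))
            continue
        for field in sorted(set(rec1) | set(rec2)):
            if rec1.get(field) != rec2.get(field):
                discrepancies.append(
                    (record_id, field, rec1.get(field), rec2.get(field)))
    return discrepancies


# Hypothetical example: two clerks key the same two forms.
pass_a = {"001": {"age": "34", "sex": "F"}, "002": {"age": "51", "sex": "M"}}
pass_b = {"001": {"age": "43", "sex": "F"}, "002": {"age": "51", "sex": "M"}}
mismatches = compare_entries(pass_a, pass_b)
```

Each discrepancy should be resolved against the paper form, not by guessing which clerk was right.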

You and Your Data

Your first eight hours together

First things first

• Virus-check the files

• Write-protect original data

• Back up files and CRFs (case report forms)
  – On-site: hard drives, diskettes, safes
  – Off-site: safe deposit box
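
The write-protect and backup steps can be scripted so they are done the same way every time. This is a rough Python sketch under stated assumptions: the file and folder names are hypothetical, a temporary directory stands in for the study folder, and the checksum comparison is an added safeguard rather than something the lecture specifies.

```python
import hashlib
import os
import shutil
import stat
import tempfile


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def protect_and_back_up(original, backup_dir):
    """Make `original` read-only, copy it to `backup_dir`, verify the copy."""
    # Leave read permission only, for owner, group, and others.
    os.chmod(original, stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
    os.makedirs(backup_dir, exist_ok=True)
    backup = shutil.copy2(original, backup_dir)
    if sha256_of(original) != sha256_of(backup):
        raise IOError("backup does not match original: " + backup)
    return backup


# Demo in a throwaway directory standing in for the study folder.
workdir = tempfile.mkdtemp()
raw_path = os.path.join(workdir, "survey_raw.csv")
with open(raw_path, "w") as handle:
    handle.write("id,age\n001,34\n")
backup_path = protect_and_back_up(raw_path, os.path.join(workdir, "backup"))
```

All later cleaning would then be done on a working copy, never on the protected original.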

First things first (2)

• Import data
  – Error prone; be very careful here

• Validate and verify the data
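
Because import is error prone, it helps to make the import step fail loudly when the file is not shaped as expected. A minimal Python sketch follows; the column names and the expected row count are hypothetical, and `io.StringIO` stands in for an open file.

```python
import csv
import io

EXPECTED_COLUMNS = ["id", "sex", "age"]   # hypothetical variable names


def import_survey(handle, expected_rows):
    """Read a CSV export and raise if its shape is not what was expected."""
    reader = csv.DictReader(handle)
    if reader.fieldnames != EXPECTED_COLUMNS:
        raise ValueError("unexpected columns: %r" % (reader.fieldnames,))
    rows = list(reader)
    if len(rows) != expected_rows:
        raise ValueError(
            "expected %d rows, got %d" % (expected_rows, len(rows)))
    ids = [row["id"] for row in rows]
    if len(set(ids)) != len(ids):
        raise ValueError("duplicate record IDs in import")
    return rows


# io.StringIO stands in for an open file so the sketch is self-contained.
export = io.StringIO("id,sex,age\n001,F,34\n002,M,51\n")
records = import_survey(export, expected_rows=2)
```

The `csv` module reads every value as text, so an ID such as "001" keeps its leading zeros; converting to numbers is a separate, deliberate step.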

Validating and verifying data

• Run frequencies for categorical variables

• Run univariate statistics for continuous variables

• Examine key variables (those used in the evaluation of primary objectives)

• Look at variables by group (sex, age, etc.)
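
As an illustration of these first diagnostic runs, the Python sketch below computes frequencies for a categorical variable and simple univariate statistics for a continuous one. The records are made up, with two deliberate entry errors of the kind these runs tend to expose.

```python
from collections import Counter
import statistics

# Hypothetical records; None marks a missing value.
records = [
    {"sex": "F", "age": 34},
    {"sex": "M", "age": 51},
    {"sex": "F", "age": 29},
    {"sex": "f", "age": 430},   # two likely entry errors in one record
    {"sex": "M", "age": None},
]


def frequencies(rows, var):
    """Count how often each value of a categorical variable occurs."""
    return Counter(row[var] for row in rows)


def univariate(rows, var):
    """Summarize a continuous variable, reporting missing values separately."""
    values = [row[var] for row in rows if row[var] is not None]
    return {
        "n": len(values),
        "missing": len(rows) - len(values),
        "min": min(values),
        "median": statistics.median(values),
        "max": max(values),
    }


sex_freq = frequencies(records, "sex")
age_stats = univariate(records, "age")
```

Here the frequency table shows a stray lowercase "f" and the maximum age of 430 stands out immediately, neither of which would be obvious from scanning raw rows.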

Validating and verifying data (2)

• Recode missing values

• Calculate checks for error-prone variables
  – Ex. Check dates against time-to variables
  – Check anything that the interviewer had to calculate, such as a total score

• Derive any key variables that need to be calculated from other variables, and verify them too
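
The calculated checks mentioned above can be sketched as follows in Python. The field names (`enroll_date`, `event_date`, `days_to_event`, `items`, `total_score`) and the example record are hypothetical; the idea is only to recompute what was recorded and compare.

```python
from datetime import date


def check_record(rec):
    """Return a list of problems found in one record (empty if none)."""
    problems = []
    # A time-to variable in days should equal the gap between the two dates.
    gap = (rec["event_date"] - rec["enroll_date"]).days
    if gap != rec["days_to_event"]:
        problems.append(
            "days_to_event is %d but dates give %d"
            % (rec["days_to_event"], gap))
    # The interviewer-calculated total should equal the sum of the items.
    item_sum = sum(rec["items"])
    if item_sum != rec["total_score"]:
        problems.append(
            "total_score is %d but items sum to %d"
            % (rec["total_score"], item_sum))
    return problems


record = {
    "enroll_date": date(2001, 3, 1),
    "event_date": date(2001, 3, 15),
    "days_to_event": 41,          # transposed digits: the dates give 14
    "items": [2, 3, 1, 4],
    "total_score": 10,
}
problems = check_record(record)
```

A mismatch does not say which value is wrong, only that the record needs to be checked against the form.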

Validating and verifying data (3)

• Rearrange, combine, or separate datasets as needed for analysis
  – Ex. Split demographic data, primary outcome, secondary outcome data

• Annotate a survey instrument with variable names

• Create a data dictionary
  – Include variable name, type, length, and description or label
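
A first draft of the data dictionary can be generated from the data and then completed by hand. The Python sketch below derives name, type, and length from the records and takes labels from a separate mapping; the variables and labels shown are hypothetical.

```python
def build_data_dictionary(rows, labels):
    """Return one entry per variable: name, type, length, and label."""
    entries = []
    for name in rows[0]:
        values = [row[name] for row in rows if row[name] is not None]
        kinds = {type(value).__name__ for value in values}
        entries.append({
            "name": name,
            # More than one type listed here is itself a warning sign.
            "type": "/".join(sorted(kinds)) or "unknown",
            "length": max((len(str(value)) for value in values), default=0),
            "label": labels.get(name, ""),
        })
    return entries


rows = [
    {"id": "001", "age": 34, "sex": "F"},
    {"id": "002", "age": 101, "sex": None},
]
labels = {
    "id": "Respondent ID",
    "age": "Age in years at interview",
    "sex": "Sex (F/M)",
}
dictionary = build_data_dictionary(rows, labels)
```

An empty label in the output flags a variable that still needs a description.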

Validating and verifying data (4)

• Look for obvious errors
  – Ex. Spelling of medication or medical condition
  – Be very careful about correcting them
  – Document any changes
  – Think about a query system
  – May need interviewer to resolve errors
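
One simple way to document changes is to route every correction through a function that records the old value, the new value, the reason, and who made the change. The Python sketch below is an illustration only; the record, the misspelling, and the query number are hypothetical.

```python
def apply_correction(records, log, record_id, field, new_value, reason, who):
    """Change one value and append an audit entry that keeps the old value."""
    old_value = records[record_id][field]
    records[record_id][field] = new_value
    log.append({
        "record_id": record_id,
        "field": field,
        "old": old_value,
        "new": new_value,
        "reason": reason,
        "by": who,
    })


# Working copy of the data (never the write-protected original).
records = {"017": {"medication": "metforman"}}
change_log = []
apply_correction(
    records, change_log, "017", "medication", "metformin",
    reason="spelling confirmed with interviewer (query 12)",
    who="data manager",
)
```

Because the log keeps the old value, any correction can be reviewed or reversed later.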

Validating and verifying data (5)

• Run rough crosstabs for reference
  – Ex. Number by sex, group, and age
  – Use to track observations

• Create data listings
  – Very useful for reference and to identify problems in the data

• Check data coming from different sources
  – Be very careful with merging
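
Merging is a common place to lose or duplicate observations without noticing. The Python sketch below shows one cautious approach: a one-to-one merge that refuses duplicate keys and reports the records found in only one source. The dataset names and the `id` key are hypothetical.

```python
def careful_merge(left, right, key):
    """One-to-one merge on `key` that reports, rather than hides, mismatches."""
    def index(rows, name):
        by_key = {}
        for row in rows:
            if row[key] in by_key:
                raise ValueError(
                    "duplicate %s %r in %s" % (key, row[key], name))
            by_key[row[key]] = row
        return by_key

    left_by_key = index(left, "left")
    right_by_key = index(right, "right")
    merged = [
        dict(left_by_key[k], **right_by_key[k])
        for k in sorted(left_by_key.keys() & right_by_key.keys())
    ]
    left_only = sorted(left_by_key.keys() - right_by_key.keys())
    right_only = sorted(right_by_key.keys() - left_by_key.keys())
    return merged, left_only, right_only


demographics = [
    {"id": "001", "sex": "F"},
    {"id": "002", "sex": "M"},
    {"id": "003", "sex": "F"},
]
outcomes = [
    {"id": "001", "score": 10},
    {"id": "003", "score": 7},
    {"id": "004", "score": 9},
]
merged, demo_only, outcome_only = careful_merge(demographics, outcomes, "id")
```

Comparing the merged count and the two unmatched lists against the earlier rough crosstabs is one way to track observations through the merge.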

Validating and verifying data (6)

• Aside: Variable naming
  – Should be meaningful and descriptive
  – But be careful about overly descriptive names
    • Long variable names are difficult to manipulate
    • If meaning appears obvious, people won’t look it up

• Back all of this up in the same way you backed up the original data