Improving efficiency and accuracy in data management for naturalistic driving studies Rusan Chen Georgetown University

Improving efficiency and accuracy in data management for naturalistic driving studies

Rusan Chen

Georgetown University

Overview

• Naturalistic driving studies involve complicated, dynamic datasets

• Efficient data management is essential for replicable analysis results

• Based on my experience working on the 40-car Naturalistic Driving Study

Sound familiar?

You have multiple versions of the same file and don’t know which is which.

You cannot find an important file and think you may have deleted it.

There are two versions of the ‘latest’ draft of a paper, both named ‘final.doc’.

An efficient workflow requires proper:

• Organizing

• Documenting

• Automating

• Archiving

Organizing

• \Work and \Post directories are critical

• Once a file is posted, it is never changed!

Example:

C:\40Car

\ADS

\Work

\Post

40carAnalysis.doc
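
The rule that a posted file is never changed can be enforced by a small script rather than by memory. The following is a minimal Python sketch, not the workflow actually used in the study: the helper name `post_file` is hypothetical, and the demo runs in a temporary directory that merely stands in for C:\40Car. It copies a file from \Work to \Post, marks the copy read-only, and refuses to overwrite anything already posted.

```python
import shutil
import stat
import tempfile
from pathlib import Path


def post_file(work_file: Path, post_dir: Path) -> Path:
    """Copy a finished file from Work into Post and make the copy read-only.

    Hypothetical helper for illustration only.
    """
    post_dir.mkdir(parents=True, exist_ok=True)
    target = post_dir / work_file.name
    if target.exists():
        # A posted file is never changed: refuse to overwrite it.
        raise FileExistsError(f"{target.name} is already posted; use a new name")
    shutil.copy2(work_file, target)
    target.chmod(stat.S_IREAD)  # read-only for the owner
    return target


# Demo in a throwaway directory (an assumption standing in for the project folder).
with tempfile.TemporaryDirectory() as root:
    work = Path(root) / "40Car" / "Work"
    post = Path(root) / "40Car" / "Post"
    work.mkdir(parents=True)
    draft = work / "40carAnalysis.doc"
    draft.write_text("draft analysis notes")

    posted = post_file(draft, post)
    posted_is_read_only = not (posted.stat().st_mode & stat.S_IWUSR)

    try:
        post_file(draft, post)
        second_post_refused = False
    except FileExistsError:
        second_post_refused = True

    # Restore write permission so the temporary directory can be cleaned up.
    posted.chmod(stat.S_IREAD | stat.S_IWRITE)
```

Work in progress stays editable in \Work; a corrected result would be posted under a new name instead of replacing the old one.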

Organizing folders

• \Post \2009

\012710 survey questionnaire analysis

\013110 personality related to risky driving

\031110 predicting C/NC from g-force

\032710 SAS Glimmix

\033010 risky friends interaction

\052110 speeding analysis

\052410 perception of risk as mediators

\060610 high vs low risky drivers

Documenting

It is always better to document today than tomorrow

What to document?

• Date

• Purpose

• Data sources

• How to form new composite scores

• Steps of analysis

• Where to save the results

• To whom you sent the results
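
One way to make sure every item in the checklist above gets recorded is to fill in the same template for each analysis. The sketch below is only an illustration: the class `LogEntry`, its field names, and all example values (file names, recipient, the date read from the folder prefix 052110) are assumptions, not details from the study.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class LogEntry:
    """One research-log entry covering the items on the checklist."""
    entry_date: date
    purpose: str
    data_sources: list[str]
    composite_scores: str
    steps: list[str]
    results_saved_in: str
    sent_to: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Return the entry as plain text, ready to append to a log file."""
        lines = [
            f"Date: {self.entry_date.isoformat()}",
            f"Purpose: {self.purpose}",
            "Data sources: " + ", ".join(self.data_sources),
            f"Composite scores: {self.composite_scores}",
            "Steps of analysis:",
        ]
        lines += [f"  {i}. {step}" for i, step in enumerate(self.steps, start=1)]
        lines.append(f"Results saved in: {self.results_saved_in}")
        lines.append("Results sent to: " + (", ".join(self.sent_to) or "nobody yet"))
        return "\n".join(lines)


# All values below are made up for illustration.
entry = LogEntry(
    entry_date=date(2010, 5, 21),
    purpose="Speeding analysis",
    data_sources=["trip_summary.csv"],
    composite_scores="speeding_rate = seconds over limit / seconds driven",
    steps=["Merge trip and driver files", "Compute speeding_rate", "Fit model"],
    results_saved_in=r"\Post\052110 speeding analysis",
    sent_to=["project PI"],
)
log_text = entry.render()
print(log_text)
```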

Automating

• Data management involves doing the same task multiple times.

• Automating these tasks can save time and prevent errors

What to automate? (using macros and loops)

• To update, merge, and subset datasets

• To create and label new variables

• To check outliers

• To define and report missing values

• To fit a sequence of similar models

• To save analysis results
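
As an illustration of two of the tasks above (reporting missing values and checking outliers), here is a minimal Python sketch of the loop idea. The slides refer to macros and loops in statistical packages, so this is a stand-in, not the author's code. The variable names, the data values, the missing-value code -99, and the 1.5 × IQR outlier rule are all assumptions chosen for the example.

```python
import statistics

MISSING_CODES = {None, -99}  # assumed missing-value codes


def check_variable(values, missing_codes=MISSING_CODES):
    """Count missing values and flag outliers with the 1.5 * IQR rule."""
    valid = [v for v in values if v not in missing_codes]
    q1, _median, q3 = statistics.quantiles(valid, n=4)
    spread = 1.5 * (q3 - q1)
    low, high = q1 - spread, q3 + spread
    return {
        "n_valid": len(valid),
        "n_missing": len(values) - len(valid),
        "outliers": [v for v in valid if v < low or v > high],
    }


# Made-up data for two hypothetical variables.
data = {
    "speed": [62, 65, 58, 61, -99, 140, 63, 60],
    "gforce": [0.31, 0.28, None, 0.35, 0.30, None, 0.29],
}

# One loop applies the same check to every variable, so no variable is
# skipped and none is checked with slightly different code.
report = {name: check_variable(values) for name, values in data.items()}

for name, result in report.items():
    print(name, result)
```

Adding a new variable then means adding one entry to `data`, not copying and editing a block of code.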

Archiving: to protect your files

• Short-term: mirror

• Mid-term: backup

• Long-term: archive
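
The long-term archive step could be scripted along the following lines. This is a sketch under stated assumptions: the function name `archive_folder`, the zip format, the date-stamped file name, and the demo folder layout are all illustrative choices, and mirroring and routine backups are normally left to dedicated tools.

```python
import shutil
import tempfile
import zipfile
from datetime import date
from pathlib import Path


def archive_folder(source_dir: Path, archive_dir: Path, stamp: date) -> Path:
    """Zip source_dir into archive_dir under a date-stamped name."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    base = archive_dir / f"{source_dir.name}_{stamp:%Y%m%d}"
    if Path(f"{base}.zip").exists():
        # Never overwrite an existing archive.
        raise FileExistsError(f"{base}.zip already exists")
    return Path(shutil.make_archive(str(base), "zip", root_dir=source_dir))


# Demo in a throwaway directory with one made-up posted file.
with tempfile.TemporaryDirectory() as root:
    post = Path(root) / "Post"
    post.mkdir()
    (post / "40carAnalysis.doc").write_text("posted analysis notes")

    archive_path = archive_folder(post, Path(root) / "Archive", date(2010, 6, 6))
    archive_name = archive_path.name
    with zipfile.ZipFile(archive_path) as zf:
        archived_files = zf.namelist()

print(archive_name, archived_files)
```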

Thank you!

Reference

• Long, J. S. (2009). The Workflow of Data Analysis Using Stata. College Station, TX: Stata Press.