BY : LISSY VERMA SHRADDHA GUPTA. Data Collection ODK : Open Data Kit Demo Usher : Improving Data...

DATA COLLECTION AND IMPROVING DATA QUALITY BY : LISSY VERMA SHRADDHA GUPTA

Page 1: BY : LISSY VERMA SHRADDHA GUPTA.  Data Collection  ODK : Open Data Kit  Demo  Usher : Improving Data Quality  Purpose  Implementation  Results.

DATA COLLECTION AND

IMPROVING DATA QUALITY

BY : LISSY VERMA, SHRADDHA GUPTA

OUTLINE

Data Collection ODK : Open Data Kit Demo

Usher : Improving Data Quality Purpose Implementation Results

DATA COLLECTION

Data collection in developing areas is difficult.

None of the existing tools suffice.

New features are needed to meet the requirements of the field.

OPEN DATA KIT

ODK is a tool suite for collection and management of data on mobile phones.

The main objective is to provide open source tools.

OPEN DATA KIT

ODK COLLECT : Collects data.

ODK AGGREGATE : Stores, views and exports data.

ODK MANAGE : Remote device management.

A QUICK DEMO

AMPATH

AMPATH deployed ODK to collect medical data.

The deployment was found to be successful, minimizing delays and improving the lives of healthcare workers and other people.

Data Collection is Challenging

Expertise in form design

Double Entry : Costly

Data Cleaning

Past Work

Constraints

Combo-boxes.

Reduce Time

Automatically filled Leave-forms.

USHER: Improving Data Quality

ESCORTER : Guides towards correct entries.

Question Ordering in form.

Greedy Information Gain

Dynamically Reorder Questions

Predict Errors to Re-ask.

Contextualized Error Likelihood Principle.
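
The question-ordering idea above can be illustrated with a minimal sketch. It assumes previously collected form records are available as Python dictionaries and estimates entropies by simple counting; the function names `conditional_entropy` and `greedy_order` are hypothetical and are not taken from Usher itself. At each step the sketch picks the question whose answer is least predictable given the questions already placed.

```python
from collections import Counter
from math import log2


def conditional_entropy(records, target, given):
    """Empirical H(target | given) in bits, estimated by counting records."""
    n = len(records)
    # Count each (context, answer) pair and each context on its own.
    joint = Counter((tuple(r[g] for g in given), r[target]) for r in records)
    context = Counter(tuple(r[g] for g in given) for r in records)
    return -sum((c / n) * log2(c / context[ctx]) for (ctx, _), c in joint.items())


def greedy_order(records, fields):
    """Order questions greedily by highest conditional entropy first."""
    ordered, remaining = [], list(fields)
    while remaining:
        # The most uncertain remaining question, given those already ordered.
        best = max(remaining, key=lambda f: conditional_entropy(records, f, ordered))
        ordered.append(best)
        remaining.remove(best)
    return ordered


# Toy training set (invented for illustration only).
records = [
    {"sex": "F", "pregnant": "yes", "region": "north"},
    {"sex": "F", "pregnant": "no", "region": "south"},
    {"sex": "M", "pregnant": "no", "region": "east"},
    {"sex": "M", "pregnant": "no", "region": "west"},
]
order = greedy_order(records, ["sex", "pregnant", "region"])
```

In this toy data "region" is asked first because it has the most spread. Counting is only reliable for small forms with plenty of training data; a learned probabilistic model would replace it in practice.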

CURBSTONING

Concept : An unscrupulous door-to-door surveyor shirks work and asks only the important questions.

Greedy Information Gain

Uniform Prior : Equally likely inputs

Training Set

Context-specific Model Required

Bayesian Learning
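
One simple way to picture how a uniform prior and a training set combine is additive smoothing, sketched below. This is an illustrative assumption and not necessarily the learning procedure used by Usher; the function name `smoothed_distribution` and the parameter `alpha` are hypothetical. With no training data every input is equally likely, and each observed answer then shifts the estimate.

```python
from collections import Counter


def smoothed_distribution(observed, values, alpha=1.0):
    """Answer probabilities: a uniform prior updated with observed counts.

    `values` lists every allowed answer; `alpha` is the pseudo-count that
    the uniform prior gives to each one.
    """
    counts = Counter(observed)
    total = sum(counts[v] for v in values) + alpha * len(values)
    return {v: (counts[v] + alpha) / total for v in values}


prior_only = smoothed_distribution([], ["yes", "no"])
after_data = smoothed_distribution(["yes", "yes", "no"], ["yes", "no"])
```

Here `prior_only` assigns 0.5 to each answer, while `after_data` leans towards "yes" because it was seen more often in the toy training set.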

DATASETS

Patient dataset, collected at a rural HIV/AIDS clinic in Tanzania.

Survey dataset, responses from a 1986 poll about race and politics.

Probabilistic Relation : Form Questions

Bayesian Network for the patient dataset

Question layout generated by the algorithm

Re-ask Questions

Approximates Double Entry

Uncertainty : High Entropy

Outliers
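
The re-asking idea can be sketched as follows. It assumes a model has already produced a predicted answer distribution for each question, and it uses surprisal (how unlikely the entered value is) as a simplified stand-in for an error-likelihood score; the names `entropy`, `questions_to_reask` and `floor` are hypothetical. Only the most suspicious answers are asked again, which is how re-asking approximates double entry at lower cost.

```python
from math import log2


def entropy(dist):
    """Entropy in bits of a {value: probability} distribution."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)


def questions_to_reask(predicted, entered, k=1, floor=1e-6):
    """Return the k questions whose entered values look least likely."""
    def surprisal(question):
        p = predicted[question].get(entered[question], 0.0)
        return -log2(max(p, floor))  # floor avoids log2(0) for unseen values
    return sorted(entered, key=surprisal, reverse=True)[:k]


# Toy predictions and one entered form (invented for illustration only).
predicted = {
    "sex": {"F": 0.5, "M": 0.5},
    "pregnant": {"yes": 0.01, "no": 0.99},
}
entered = {"sex": "M", "pregnant": "yes"}
reask = questions_to_reask(predicted, entered, k=1)
```

In this example "pregnant" is selected, since the entered value is an outlier under the predicted distribution, while "sex" has high entropy but no single surprising value.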

Data-entry Feedback

Usher Components And Data-flow

Error Modeling

Accurate Prediction Results

THANK YOU

SUPPLEMENTARY SLIDES

DATA COLLECTION : PROBLEMS

Due to the digital divide between developing and developed areas, it is very difficult to collect and use data in developing regions.

The main problems are: lack of reliable infrastructure, lack of proper connectivity, and inadequate expertise.

Currently available tools for data collection, such as Pendragon Forms, Nokia Data Gathering, JavaRosa and RapidSMS, are difficult to deploy, hard to use, complicated to scale and rarely customizable.

OPEN DATA KIT

The Open Data Kit or simply ODK is a suite of tools for data collection that uses Google’s Android platform.

The main objectives of the technology are:

Modularising and customising tools

Use of open interfaces and standards

Long-term survival of tools

The components of ODK are:

1. ODK Collect : collects data using forms.

2. ODK Aggregate : ready-to-deploy online repository to store, view and export collected data.

3. ODK Build : enables users to generate forms.

4. ODK Voice : maps forms to sound snippets.

5. ODK Clinic : mobile medical record system.

6. ODK Manage : maintains a database of all phones for remote device management.

7. ODK Validate : validates forms.

Other tools include ODK Dropbox, ODK Rangefinder, ODK Tasks, ODK Listen and ODK Visualise.