BY : LISSY VERMA SHRADDHA GUPTA. Data Collection ODK : Open Data Kit Demo Usher : Improving Data...

Post on 20-Jan-2016



DATA COLLECTION AND

IMPROVING DATA QUALITY

BY : LISSY VERMA, SHRADDHA GUPTA

OUTLINE

Data Collection

ODK : Open Data Kit

Demo

Usher : Improving Data Quality

Purpose

Implementation

Results

DATA COLLECTION

Data collection in developing areas is difficult.

None of the existing tools suffices.

New features are needed to meet requirements in the field.

OPEN DATA KIT

ODK is a tool suite for collection and management of data on mobile phones.

The main objective is to provide open source tools.

OPEN DATA KIT

ODK COLLECT : Collects data.

ODK AGGREGATE : Stores, views and exports data.

ODK MANAGE : Remote device management.

A QUICK DEMO

AMPATH

AMPATH deployed ODK to collect data for medical purposes.

The deployment was found to be successful, minimizing delays and improving the lives of healthcare workers and other people.

Data Collection is Challenging

Expertise in form design

Double Entry : Costly

Data Cleaning

Past Work

Constraints

Combo-boxes.

Reduce Time

Automatically filled Leave-forms.

USHER: Improving Data Quality

ESCORTER : Guides towards correct entries.

Question Ordering in form.

Greedy Information Gain

Dynamically Reorder Questions

Predict Errors to Re-ask.

Contextualized Error Likelihood Principle.

CURBSTONING

Concept : An unscrupulous door-to-door surveyor shirks work and asks only the important questions.

Greedy Information Gain

Uniform Prior : Equally likely inputs

Training Set

Context-specific Model Required

Bayesian Learning
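The greedy information gain idea on this slide can be illustrated with a minimal sketch. It is not the Usher implementation: it assumes a small hypothetical training set and estimates entropies from raw counts rather than from a learned Bayesian network. The names `conditional_entropy`, `greedy_order` and the example fields are invented for illustration. At each step it picks the unasked question whose answer is least predictable given the questions already placed.

```python
from collections import Counter
from math import log2


def conditional_entropy(records, target, given):
    """Empirical H(target | given) in bits, estimated from raw counts."""
    n = len(records)
    joint = Counter((tuple(r[g] for g in given), r[target]) for r in records)
    context = Counter(tuple(r[g] for g in given) for r in records)
    h = 0.0
    for (ctx, _value), count in joint.items():
        # p(ctx, value) * log2 p(value | ctx)
        h -= (count / n) * log2(count / context[ctx])
    return h


def greedy_order(records, questions):
    """Repeatedly pick the question with the highest remaining uncertainty."""
    ordered = []
    remaining = list(questions)
    while remaining:
        best = max(remaining,
                   key=lambda q: conditional_entropy(records, q, ordered))
        ordered.append(best)
        remaining.remove(best)
    return ordered


# Hypothetical training set (not data from the slides).
records = [
    {"clinic": "A", "sex": "F", "pregnant": "yes"},
    {"clinic": "A", "sex": "F", "pregnant": "no"},
    {"clinic": "A", "sex": "M", "pregnant": "no"},
    {"clinic": "B", "sex": "M", "pregnant": "no"},
    {"clinic": "B", "sex": "F", "pregnant": "no"},
    {"clinic": "C", "sex": "F", "pregnant": "yes"},
    {"clinic": "C", "sex": "M", "pregnant": "no"},
    {"clinic": "D", "sex": "M", "pregnant": "no"},
]

order = greedy_order(records, ["pregnant", "sex", "clinic"])
print(order)
```

On this toy data the order comes out as clinic, sex, pregnant: the clinic field carries the most uncertainty up front, and once it is known, sex remains less predictable than pregnancy status.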

DATASETS

The patient dataset, collected at a rural HIV/AIDS clinic in Tanzania.

The survey dataset, responses from a 1986 poll about race and politics.

Probabilistic Relation : Form Questions

Bayesian Network for the patient dataset

Question layout generated by the algorithm

Re-ask Questions

Approximates Double Entry

Uncertainty : High Entropy

Outliers
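A rough sketch of how re-asking might be prioritised is given below. It is an illustration under stated assumptions, not the error model used by Usher: it assumes some model has already produced a predicted answer distribution for each field (the `predicted` dictionary is hypothetical), flags entered values the model considers very unlikely as outliers, and then fills the remaining re-ask budget with the fields whose predictions have the highest entropy. The function names, the `min_prob` threshold and the `budget` parameter are all invented for this example.

```python
from math import log2


def entropy(dist):
    """Shannon entropy in bits of a {value: probability} distribution."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)


def fields_to_reask(predicted, entered, min_prob=0.05, budget=2):
    """Pick up to `budget` fields to ask again.

    Outliers (entered values with predicted probability below min_prob)
    come first, followed by the most uncertain remaining fields.
    """
    outliers = [f for f, v in entered.items()
                if predicted[f].get(v, 0.0) < min_prob]
    uncertain = sorted((f for f in entered if f not in outliers),
                       key=lambda f: entropy(predicted[f]),
                       reverse=True)
    return (outliers + uncertain)[:budget]


# Hypothetical model output and one entered form.
predicted = {
    "sex": {"F": 0.5, "M": 0.5},
    "pregnant": {"yes": 0.02, "no": 0.98},
    "clinic": {"A": 0.9, "B": 0.1},
}
entered = {"sex": "F", "pregnant": "yes", "clinic": "A"}

print(fields_to_reask(predicted, entered))
```

Here the entered value for "pregnant" is flagged as an outlier, and "sex" is re-asked next because its predicted distribution is the most uncertain. Re-asking only a few such fields is what lets the approach approximate double entry at lower cost.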

Data-entry Feedback

Usher Components And Data-flow

Error Modeling

Accurate Prediction Results

THANK YOU

SUPPLEMENTARY SLIDES

DATA COLLECTION : PROBLEMS

Due to the digital divide between developing and developed areas, it is very difficult to collect and use data in developing regions.

The main problems are a lack of reliable infrastructure, a lack of proper connectivity, and inadequate expertise.

Currently available tools for data collection, such as Pendragon Forms, Nokia Data Gathering, JavaRosa and RapidSMS, are difficult to deploy, hard to use, complicated to scale and rarely customizable.

OPEN DATA KIT

The Open Data Kit or simply ODK is a suite of tools for data collection that uses Google’s Android platform.

The main objectives of the technology are :
Modularising and customising tools
Use of open interfaces and standards
Long-term survival of tools

The components of ODK are:
1. ODK Collect : collects data using Forms.
2. ODK Aggregate : ready-to-deploy online repository to store, view and export collected data.
3. ODK Build : enables users to generate forms.
4. ODK Voice : maps Forms to sound snippets.
5. ODK Clinic : mobile medical record system.
6. ODK Manage : maintains a database of all phones for remote device management.
7. ODK Validate : validates Forms.
Other tools include ODK Dropbox, ODK Rangefinder, ODK Tasks, ODK Listen and ODK Visualise.