Exploratory Data Analysis of Bahmni with R

Post on 15-Apr-2017



Purpose

Explore EMR data collected over a period of time to:

1. Derive insights
2. Observe trends
3. Establish probable correlations
4. Help the community get started exploring their own EMR data

Objectives/Agenda

1. Look at patient trends across various regions
2. Top 10 diagnoses reported
3. Pick the top diagnosis for further analysis
   a. Male/female ratio
   b. Top regions/villages
   c. Age distribution
   d. Year-wise trend
   e. Explore observations/results and chief complaints reported for these patients
4. Insights from the data and challenges
5. Quick peek into other insights that can be derived from this EMR data

Pre-requisites

1. Basic knowledge of
   a. Bahmni/OpenMRS data model and concept dictionary
   b. SQL
   c. R (RStudio IDE)
2. PC/Mac/Linux machine set up with
   a. MySQL client to connect to the MySQL server hosting the anonymized Bahmni DB (either a local or a remote server)
   b. R and RStudio installed

Why R?

1. Open source with great community support
2. Many built-in packages for descriptive and predictive analytics that can be used out of the box
   a. Very good mix of packages for querying and plotting data
3. Easy to learn and use

Let's get going

All hands-on exercises are performed on anonymized data.

Part 1

Fundamentals

1. Exploring tables and columns of interest
2. Using R/RStudio
   a. Connect to the MySQL DB
   b. Load required R packages
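The connection step can be sketched as follows. This is a minimal sketch, not the presenter's code: it assumes the DBI and RMySQL packages, a database named "openmrs", and the OpenMRS `person` and `person_address` tables with `gender`, `birthdate`, `city_village` and `voided` columns. The helper name `connect_bahmni`, the query and the environment variable names are hypothetical.

```r
# Packages assumed for this sketch; install once with
# install.packages(c("DBI", "RMySQL")).

# Hypothetical helper: open a connection to the anonymized Bahmni database.
connect_bahmni <- function(user, password, host = "localhost",
                           dbname = "openmrs", port = 3306) {
  # Fail early with a clear error if the assumed packages are missing.
  stopifnot(requireNamespace("DBI", quietly = TRUE),
            requireNamespace("RMySQL", quietly = TRUE))
  DBI::dbConnect(RMySQL::MySQL(), host = host, port = port,
                 dbname = dbname, user = user, password = password)
}

# Assumed query: one row per non-voided person with gender and city/village.
patients_sql <- "
  SELECT p.person_id, p.gender, p.birthdate, pa.city_village
  FROM person p
  JOIN person_address pa ON pa.person_id = p.person_id
  WHERE p.voided = 0 AND pa.voided = 0"

# Usage (needs a reachable server, so it is left commented out):
# con <- connect_bahmni(user = Sys.getenv("BAHMNI_DB_USER"),
#                       password = Sys.getenv("BAHMNI_DB_PASSWORD"))
# patients <- DBI::dbGetQuery(con, patients_sql)
# DBI::dbDisconnect(con)
```

Reading credentials from environment variables keeps them out of the script; adjust the host and database name to match the local setup.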

Patients across Regions

1. Number of patients reported across various cities/villages
2. Male/female ratio as a percentage
3. Percentage of patients from each region in the top 10 cities/villages
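The three summaries above can be sketched in base R. The data frame below is made-up illustration data, not Bahmni output, and the column names `city_village` and `gender` are assumed to match what a patient query would return.

```r
# Made-up example data standing in for the result of a patient query.
patients <- data.frame(
  city_village = c("A", "A", "B", "A", "C", "B", "A", "C"),
  gender       = c("M", "F", "F", "M", "F", "M", "F", "F"),
  stringsAsFactors = FALSE
)

# 1. Number of patients per city/village, largest first.
by_region <- sort(table(patients$city_village), decreasing = TRUE)

# Keep at most the 10 largest cities/villages.
top_regions <- head(by_region, 10)

# 2. Male/female split as a percentage of all patients.
gender_pct <- round(100 * prop.table(table(patients$gender)), 1)

# 3. Share of all patients that each top city/village accounts for.
region_pct <- round(100 * prop.table(by_region)[names(top_regions)], 1)

print(top_regions)
print(gender_pct)
print(region_pct)
```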

Patients Across Regions

Part 2

Top 10 diagnoses

1. Explore the distribution of the diagnoses reported across males/females
2. Pick the top 10 diagnoses and look at the male/female ratio
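A small sketch of ranking diagnoses and splitting them by gender, using made-up rows. It assumes one row per reported diagnosis with `diagnosis` and `gender` columns; in Bahmni these would first have to be assembled from the observation and concept tables.

```r
# Made-up example data: one row per reported diagnosis.
diagnoses <- data.frame(
  diagnosis = c("Gastritis", "Gastritis", "Diabetes",
                "Gastritis", "Hypertension", "Diabetes"),
  gender    = c("F", "M", "F", "F", "M", "M"),
  stringsAsFactors = FALSE
)

# Cross-tabulate diagnosis by gender.
counts <- table(diagnoses$diagnosis, diagnoses$gender)

# Rank diagnoses by total count and keep at most the top 10.
totals <- sort(rowSums(counts), decreasing = TRUE)
top    <- names(head(totals, 10))

# Gender counts for the top diagnoses, in ranked order.
top_counts <- counts[top, , drop = FALSE]

# Percentage of female patients within each top diagnosis.
female_share <- round(100 * top_counts[, "F"] / rowSums(top_counts), 1)

print(top_counts)
print(female_share)
```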

Top Diagnosis - Gastritis

Look at

1. Top 5 regions
   a. With male/female distribution
2. Age distribution for males/females in the top 5 regions
   a. Boxplot
   b. Histogram
3. Year-wise trend
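The age distribution and year-wise trend can be sketched with base R graphics. The rows below are made up, and the column names `age` and `diagnosis_date` are assumptions. The plots are drawn only in an interactive session such as RStudio.

```r
# Made-up example data: gastritis patients with age and diagnosis date.
gastritis <- data.frame(
  gender = c("F", "F", "F", "M", "M", "M"),
  age    = c(25, 40, 31, 52, 47, 38),
  diagnosis_date = as.Date(c("2014-03-01", "2015-06-12", "2015-11-30",
                             "2014-08-19", "2016-01-05", "2016-07-22")),
  stringsAsFactors = FALSE
)

# Median age per gender, the central line a boxplot would show.
age_summary <- tapply(gastritis$age, gastritis$gender, median)

# Number of cases per calendar year for the year-wise trend.
by_year <- table(format(gastritis$diagnosis_date, "%Y"))

if (interactive()) {
  # Boxplot of age split by gender.
  boxplot(age ~ gender, data = gastritis,
          main = "Age by gender", ylab = "Age (years)")
  # Histogram of age across all patients.
  hist(gastritis$age, main = "Age distribution", xlab = "Age (years)")
  # Line plot of yearly case counts.
  plot(as.integer(names(by_year)), as.vector(by_year), type = "b",
       xlab = "Year", ylab = "Cases", main = "Year-wise trend")
}

print(age_summary)
print(by_year)
```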

Top 10 Diagnoses

Gastritis - Deep Dive

Part 3

Explore results for top diagnosis - Gastritis

1. Gather all results for patients with gastritis
2. Look at important results for female patients to identify any trends

Top Chronic Diagnosis - Diabetes

1. Gather all the lab results
2. Explore HbA1c results
   a. Lack of consistent data
3. Analyze hemoglobin levels
   a. Outliers
   b. Flooring and capping
   c. Check for gender bias in the 12 to 18 age group
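One common way to flag outliers and then floor and cap them is the 1.5 x IQR rule, sketched below. The slides do not say which rule was used, so the rule and the multiplier are assumptions here, and the hemoglobin values are made up.

```r
# Made-up hemoglobin readings (g/dL), including two likely data-entry errors.
hemoglobin <- c(11.2, 12.5, 13.1, 12.8, 14.0, 11.9, 13.4, 1.3, 12.2, 130)

# First and third quartiles of the readings.
q <- quantile(hemoglobin, c(0.25, 0.75), names = FALSE)

# Assumed rule: values beyond 1.5 x IQR from the quartiles are outliers.
fence <- 1.5 * (q[2] - q[1])
lower <- q[1] - fence
upper <- q[2] + fence

# Flag the readings that fall outside the fences.
is_outlier <- hemoglobin < lower | hemoglobin > upper

# Flooring and capping: pull outliers back to the nearest fence.
capped <- pmin(pmax(hemoglobin, lower), upper)

print(hemoglobin[is_outlier])
print(range(capped))
```

Capping keeps every row in the data set, which matters when results are sparse. Dropping the flagged rows or replacing them with the median are alternative strategies.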

Exploring Results

Part 4

What’s next?

1. Better understanding of the data
2. Data cleaning and preparation
   a. Misspelled city/village names
   b. Outlier detection and replacement strategy
   c. Descriptive statistics, measures of central tendency, skewness, hypothesis testing
3. Feature transformation
   a. Extract new features
      i. For example, average sugar level from fasting and postprandial blood sugar levels
      ii. Binning variables such as age into infant, youth, adult, etc.
   b. Natural language processing (NLP)
      i. Chief complaints
4. Clustering of patients
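The two feature-extraction ideas listed under feature transformation can be sketched in base R. The data is made up, and the age cut-offs and the extra "child" and "senior" labels are assumptions chosen only for illustration.

```r
# Made-up example data: age in years and two blood sugar readings (mg/dL).
labs <- data.frame(
  age                = c(0.5, 9, 16, 34, 71),
  fasting_sugar      = c(NA, 88, 95, 130, 150),
  postprandial_sugar = c(NA, 120, 135, 190, 240)
)

# New feature: average of the fasting and postprandial readings.
# Rows with a missing reading stay NA rather than being silently averaged.
labs$avg_sugar <- rowMeans(labs[, c("fasting_sugar", "postprandial_sugar")])

# Binning: turn age into a categorical variable.
# Breaks are assumed cut-offs; intervals are closed on the left, e.g. [1, 13).
labs$age_group <- cut(
  labs$age,
  breaks = c(0, 1, 13, 25, 60, Inf),
  labels = c("infant", "child", "youth", "adult", "senior"),
  right  = FALSE
)

print(labs)
```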

References and Links

1. R & RStudio: https://www.rstudio.com/products/rstudio/
2. MySQL: https://dev.mysql.com/doc/refman/5.6/en/osx-installation-pkg.html
3. R-bloggers: https://www.r-bloggers.com/
4. Source code: https://github.com/karrtikiyer-tw/bahmni-eda
5. YouTube playlist

Thank you!

Please leave your feedback and suggestions via comments.