Introduction to the Doctor Social Graph Project
Brandon Weinberg : November 29, 2012
This presentation is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
Before I Start...
● As the Doctor Social Graph project rapidly progresses, obsolescence will kick in, rendering this content stale and "old news"
● This presentation was published on Slideshare 11/29/2012 when the Doctor Social Graph Project was quite new
● Details as of 11/29 are gradually emerging; most content in these slides is paraphrased from official project announcements thus far
● Let's get started!
Organizer
● Fred Trotter
● Celebrated Health IT Expert in USA
● One of the Designees of the Direct Project (Mandated HIE Protocol in USA)
● Co-Authored First Health IT Book for O'Reilly and Most Popular Book on Meaningful Use Standards: Meaningful Use And Beyond
● Values Open Source
Announcement
● Strata RX 2012 - O'Reilly Strata Conference
● October 16, 2012
● San Francisco
● Fred's Keynote Titled "The Ethos of Healthcare Data Science"
● This Was When Data Was Initially Released (Open Source Licensed), For Healthcare Data Scientists in Audience
Social Dataset
● Collaborative Relationship Data
● How Doctors, Hospitals, Labs and Other Healthcare Providers Collaborate To Treat Medicare Patients
● Data Includes: Referrals to Specialists
● Data Includes: Lab Providers and Hospitals A Doctor Often Works With
● Data Includes: Real Names and Addresses
● Representative of How USA Healthcare System is Delivering Care
Doctor Social Graphs
● Graphical Representations of Group Interactions During Medicare Treatment
● Diagrams Based on Math Models
● Use Nodes and Connections
● Nodes: Providers, e.g. Doctors, Hospitals, Labs, Etc
● Connections: Degree to Which Providers Work Together Treating Specific Patients
● Will Be Largest Real-Name Social Graph That is Publicly Available, Of Any Kind!
Doctor Social Graphs
● Visualization of Social Graph begins at 1:10
● http://youtu.be/L4C3cloZEQk
Other Social Graphs
● Facebook, Twitter, LinkedIn Exemplify Private Big Data Social Graphs
● Most Portions of Data Remain In-House
● Do You Know Any Data Scientists Good at Graphing and Graph Theory? They May Appreciate Doctor Social Graph
Preparing Data
● Initial Dataset Was Obtained by Fred Trotter
● He Filed A Freedom of Information Act Request Against Medicare Claims Database
● For Phase 1 Improvement, He Purchased Board Credentialing Data in All 50 States
● Cost Was $50-$1,000 Per State to Download
● Board Credentialing Data is Analogous to "Credit History" for Doctors, e.g. Medical Schools, Board Certifications and Board-Imposed Punishments
Preparing Data
● After Merging Initial (Referral and Teaming) Dataset with State Credentialing Data, the Data Was Formatted For Usability, e.g. Disparate Data Sources Will Be Formatted in CSV, JSON, XML
● Merged Dataset To Be Released in Late November or Early December to MedStartr Backers (Explained Later)
Doctor Performance
● Fairly Evaluate Doctor Quality in USA
● "My Most Important Project For This Data Is Simple: I Want To Create Algorithms To Rate Doctors That Patients Find Useful And That Doctors Find Fair." Fred Trotter (paragraph 10)
● "The Development of Objective, Fair and Useful Doctor Rating Systems" Fred Trotter
Doctor Performance
● Referrals From Doctors, For Example, May Be Used As Doctor "Votes" For Each Other
● See the Third Paragraph of "Why This Matters To Patients" For Challenges and Biases in Current Doctor Rating Systems
● Examples Abound of How Patients, Doctors, Insurance Companies, Hospitals, Labs, Academics, Scientists, Health Policy Makers and Others May Leverage Data For Their Particular Research Interests
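The "votes" idea can be sketched very roughly as a weighted tally of incoming referrals. This is only an illustration, not the rating algorithm the project intends to build: the function name `referral_votes` is hypothetical, and the rows are the three sample rows shown later in this deck. A raw tally like this will favor labs and hospitals, which receive the most "referrals", so a fair rating would need more than this.

```python
from collections import Counter

def referral_votes(edges):
    """Sum Seen_Count over incoming edges, treating each as votes for the second NPI."""
    votes = Counter()
    for first_npi, second_npi, seen_count in edges:
        # The provider seen second is the one "voted for" (an assumption of this sketch).
        votes[second_npi] += seen_count
    return votes

# Rows in (NPI_Seen_First, NPI_Seen_Second, Seen_Count) order.
edges = [
    ("1184710477", "1548387418", 55),
    ("1548387418", "1326047754", 62),
    ("1548387418", "1598971913", 24),
]

for npi, total in referral_votes(edges).most_common():
    print(npi, total)
```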
Hospital Performance
● Hospital Performance Data Sources Will Be Merged To Improve Dataset, e.g. in Phase 3
● Example Question: Which Cardiologists Refer to Hospitals With Poor Central Line Infection Rates?
● "We Want to Turn This Into the Ultimate Source For Open Doctor and Hospital Data." Fred Trotter
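Once hospital performance data is merged, the example question above becomes a simple join. The sketch below is hypothetical throughout: the NPIs are placeholders, the provider-type labels and infection rates are invented for illustration, and the merged dataset may be organized quite differently.

```python
def cardiologists_referring_to(edges, provider_type, infection_rate, threshold):
    """Map each cardiologist NPI to the flagged hospitals it sends patients to."""
    flagged = {}
    for first_npi, second_npi, seen_count in edges:
        is_cardiologist = provider_type.get(first_npi) == "cardiologist"
        # Providers with no known rate are treated as unflagged.
        is_poor_hospital = infection_rate.get(second_npi, 0.0) > threshold
        if is_cardiologist and is_poor_hospital:
            flagged.setdefault(first_npi, []).append((second_npi, seen_count))
    return flagged

# Placeholder data, invented purely for illustration.
edges = [
    ("1112223334", "5556667778", 40),
    ("1112223334", "9998887776", 25),
    ("2223334445", "5556667778", 30),
]
provider_type = {"1112223334": "cardiologist", "2223334445": "primary care"}
infection_rate = {"5556667778": 2.4, "9998887776": 0.6}  # hypothetical units

print(cardiologists_referring_to(edges, provider_type, infection_rate, threshold=1.0))
```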
Overview of Data
● 2011 Dataset is a 1.3 GB File
● 3.7 Million Entries
● Contains Nearly One Million Nodes
● Node = Person or Organization That Provided Health Care Service to a Medicare Patient
● Graph Data is Keyed Using National Provider Identifiers (NPIs)
NPI
● NPI = Unique Provider Number
● Individual and Organization Providers
● NPI is Mandated by HIPAA (as a Replacement for UPIN)
● Doctors and Hospitals Must Use Their NPI for Medical Billing, e.g. Medicare Billing or Prescribing Medication
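One detail the slides do not cover: the tenth digit of an NPI is a check digit, computed with the Luhn algorithm over the number prefixed with 80840. A minimal sketch of that check follows; the function name `is_valid_npi` is my own, and passing the check only shows a number is well formed, not that it has been issued to a provider.

```python
def is_valid_npi(npi: str) -> bool:
    """Check the Luhn check digit of a 10-digit NPI, using the 80840 prefix."""
    if len(npi) != 10 or not (npi.isascii() and npi.isdigit()):
        return False
    digits = [int(c) for c in "80840" + npi]
    total = 0
    # Walk right to left, doubling every second digit (Luhn algorithm).
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0

# The NPI searched for in the sample data in this deck.
print(is_valid_npi("1548387418"))
```

A check like this is a cheap way to catch rows that were damaged during extraction before loading the graph.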
Sample Data
● A few lines from a random search (grep) on a specific NPI...
    grep 1548387418 refer.2011.csv > Methodist_Hospital_Referrals.csv
    NPI_Seen_First,NPI_Seen_Second,Seen_Count
    1184710477,1548387418,55
    1548387418,1326047754,62
    1548387418,1598971913,24
● Pretty Cool, Huh? Full Sample is on Pastebin
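The same search can be done in Python, matching on the two NPI columns exactly instead of anywhere in the line as grep does. This is a sketch that assumes the file has the three comma-separated columns shown in the sample; `rows_for_npi` is a hypothetical helper name, and an in-memory string stands in for refer.2011.csv.

```python
import csv
import io

def rows_for_npi(lines, npi):
    """Return CSV rows in which `npi` appears in the first or second column."""
    matches = []
    for row in csv.reader(lines):
        # Skip blank or malformed lines rather than failing on them.
        if len(row) >= 2 and npi in (row[0], row[1]):
            matches.append(row)
    return matches

# The three data rows from the slide, standing in for the real file.
sample = io.StringIO(
    "1184710477,1548387418,55\n"
    "1548387418,1326047754,62\n"
    "1548387418,1598971913,24\n"
)
for first_npi, second_npi, seen_count in rows_for_npi(sample, "1548387418"):
    print(first_npi, second_npi, seen_count)
```

For the real file, pass an open file object, e.g. `open("refer.2011.csv", newline="")`, in place of `sample`.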
Tip For Providers
● Are You A Health Care Provider?
● Good Time To Update Your NPI Record
● e.g. No Need to List Your Home Address
● Public Database
● Updated Weekly
● Fred Built a Very Clean NPI Search Tool
● Or Use Government NPI Search Tool
Referral and Teaming
● Graph Has 49,685,586 Referring Party Pairs (Collaborative Relationships)
● When Providers Work On The Same Group of Patients Within The Same Time Frame = Teaming Relationship
● Interactions Traditionally Considered Referral Relationships = Majority of Data
● If Provider A Sees the Same Patient As Provider B Within 30 Days, It Counts As +1
Referral and Teaming
● What's Counted is How Often Two Providers Bill Medicare For The Same Patients Within 30 Days
● How Can Patient Identification Be Avoided, You May Ask?
● For Each Entry in Dataset, At Least 11 Patients Were Involved in Transaction
● 11 = CMS Standard
● 11 Solves "Elvis Problem"
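The counting rule and the 11-patient minimum can be sketched as follows. This is an illustration under assumptions, not the procedure actually used to produce the dataset: the claim tuples, the function name `count_pairs`, and details such as how same-day visits are ordered are all hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta

WINDOW = timedelta(days=30)   # "within 30 days"
MIN_PATIENTS = 11             # the CMS minimum described above

def count_pairs(claims, window=WINDOW, min_patients=MIN_PATIENTS):
    """claims: iterable of (patient_id, npi, date_seen) tuples.
    Returns {(npi_seen_first, npi_seen_second): count} for releasable pairs."""
    visits_by_patient = defaultdict(list)
    for patient_id, npi, seen_on in claims:
        visits_by_patient[patient_id].append((seen_on, npi))

    counts = defaultdict(int)
    patients = defaultdict(set)
    for patient_id, visits in visits_by_patient.items():
        visits.sort()
        for i, (first_date, first_npi) in enumerate(visits):
            for second_date, second_npi in visits[i + 1:]:
                if second_date - first_date > window:
                    break  # visits are sorted, so later ones are outside the window too
                if second_npi != first_npi:
                    counts[(first_npi, second_npi)] += 1
                    patients[(first_npi, second_npi)].add(patient_id)

    # Suppress any pair backed by fewer than min_patients distinct patients.
    return {pair: n for pair, n in counts.items() if len(patients[pair]) >= min_patients}
```

Only the pair and its count leave the function; patient identifiers are used for the threshold and then dropped, which mirrors how the released data omits patient-specific fields.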
Elvis Problem
● Everyone Knows Elvis' Doctor
● Everyone Knows Elvis' Doctor Has One Patient
● If Elvis' Doctor Refers to a Cardiologist, Then Everyone Knows Elvis Has Heart Problems
● At Least 11 Patients Take Part In Each Given "Referral Count"
● Enforcing a Minimum of 11 Patients in the Transaction Addresses Said Problem
Privacy
● Aside From Knowing a Score Reflects 11 or More Patients, Little Else Can Be Derived From Relationship Scores About Patients
● e.g. Referral Relationship Score = 1,100
● You Know it Reflects 11 or More Patients
● Was It 11 Patients With 100 Referrals Each?
● Was It 100 Patients With 11 Referrals Each?
● Bottom Line: Data Reflects the Relationship Score Between Two Nodes, While Omitting Patient-Specific Data
Privacy
● No Patient-Specific Data is Released in Dataset; Patient-Specific Data is Entirely Omitted (Not Deidentified)
● Doctors Who Bill Medicare Are Government Contractors; Some Will Be Surprised As Public Data Becomes Increasingly Accessible
● Freedom of Information Act Makes Government Contractor Data Available to Public for Accountability
Privacy
● It is Fair to Presume Organizations Are Already Using Such Healthcare Data
● e.g. Insurance Companies, Pharmacy Chains, Government, Etc
● Ironically, Patients and Doctors Have Had Least Access To Study Such Data
Data Overlay
● Information Will Be Discoverable By Overlaying Private or Public Data On Top of the Dataset
● Dataset With Medicare Referral and Teaming Patterns Was a Starting Point to Merge Data
● Dataset Will Be Steadily Improved
● In Phase 2, For Example, Publicly Available Nursing Home Data To Be Merged
Geo-Encoded
● Each Provider's NPI Record Contains Practice Location Address and Mailing Address
● Data Can Be Overlaid Geographically and Merged With Geo Databases
● Graph Gets Input From a Geo-Encoded Key
● 80%: Specific Latitude and Longitude
● 20%: Zip Code for General Location
● Localized Healthcare Data
Sample Data, Re-Examined
● 1112223334,5556667778,1111
● 1112223334 = NPI of Node That Saw Medicare Patient First
● 5556667778 = NPI of Node That Saw Medicare Patient Second
● 1111 = Number of Times This Happened in a 30-Day Period During A Year (Connection)
● 1111 = Relationship Score Between Real-Named Nodes and Connections
● Often (Not Always) the PCP = First Variable
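Read this way, each row is one directed connection in the graph. A minimal sketch of loading rows into a graph structure follows; the names `Connection`, `parse_row`, and `build_graph` are hypothetical, and the nested-dict layout is just one convenient choice.

```python
from typing import NamedTuple

class Connection(NamedTuple):
    seen_first: str    # NPI of the node that saw the patient first
    seen_second: str   # NPI of the node that saw the patient second
    score: int         # relationship score for this ordered pair

def parse_row(line: str) -> Connection:
    """Parse one 'first,second,count' row into a Connection."""
    first, second, score = line.strip().split(",")
    return Connection(first, second, int(score))

def build_graph(lines):
    """Build {seen_first: {seen_second: score}} from an iterable of rows."""
    graph = {}
    for line in lines:
        connection = parse_row(line)
        graph.setdefault(connection.seen_first, {})[connection.seen_second] = connection.score
    return graph

graph = build_graph(["1112223334,5556667778,1111"])
print(graph["1112223334"]["5556667778"])
```

NPIs are kept as strings because they are identifiers, not quantities; only the score is converted to an integer.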
Most Popular Referrals
● Fred Uploaded the Top 100 Organizations by Number of Nodes in Dataset to Pastebin
● One of Most Frequent "Referrals" is to Get Lab Work Done at LabCorp, Quest or Other Local Lab Providers
● Also Very Common "Referrals" are to Hospital Emergency Departments and Treatment Facilities Like DaVita
Taxonomy
● Public NPI File Has Provider-Type Ontology Classifying Doctor and Organization Types
● Hospitals, Primary Care Doctors, Specialist Types and Labs are Coded in NPI File in This Provider-Type Ontology, Which is Maintained by AMA's National Uniform Claim Committee
● Not Perfect, But Usually Accurate
Funding Overview
● Funding is Occasionally Needed to Improve Dataset and Fred Uses Crowdfunding Model
● Project is Currently Hosted on MedStartr (Healthcare Version of KickStarter)
● Backers Can Receive Early Access (6 Months) to a Rich Healthcare Dataset
● Entire Dataset Will Become Open Sourced in Mid-2013 and Free to the Public
● License To Be Creative Commons Attribution-ShareAlike 3.0 Unported License
Funding Overview
● MedStartr Backers Have Bought 1 of 2 Data Licenses
● Open Source Data License
● $100-$120: Access to Entire Database and Sharing of Any Integrated Data Required
● Proprietary-Friendly Data License
● $1,200-$5,000: Access to Entire Database and Sharing of Integrated Data Not Required
Funding Details
● For Phase 1 Improvements to the Initial Dataset, $23,720 Was Collected From 88 MedStartr Backers; 51 Receive Data
● 39 Get Open Source Data License and 12 Get Proprietary-Friendly Data License
● Data Price Rises Per Phase Between Now and Mid-2013 (Until Data Becomes Free)
● Dual-License = No Data Hoarding; Lets Organizations Pay Steep Price to Innovate in Private, Without Blocking Open Research
Crowdfunding
● Fred Effectively Said, "If A Few Hundred People Want To Pool Small Amounts of Money Together For This Project, I'll Buy and Prepare Public-Yet-Inaccessible Healthcare Data So Scientists Can Use It To Improve Healthcare, and It Will Never Be Hoarded."
● Clinical Trial Fundraiser Diabetes App
● Extend Features Patient Relationship App
● Not-Just-For-Profits: Transparent Funding
Call To Innovators
● "All of The Cool Discoveries in This Dataset Should Happen in the First Six Months." Fred Trotter
● "All of the Really Amazing Discoveries in This Dataset Will Be Made in the Next Few Months, By Those Who Either Attended Strata RX, or Who Participate in This Project." Fred Trotter
● Phase 2 Underway on MedStartr
Conclusion
● This presentation was made for people learning about the Doctor Social Graph project
● I hope it provides them a few things which make understanding the project and data easier and faster
● Have fun using the Doctor Social Graph
● Questions/Comments: Brandon Weinberg
● Email: [email protected]