Predictive Analytics for OpenFDA & Other Sources
Uploaded by nathan-ayers
![Page 1: Predictive Analytics for OpenFDA & Other Sources](https://reader035.fdocuments.in/reader035/viewer/2022062517/5681382c550346895d9fdaf3/html5/thumbnails/1.jpg)
Predictive Analytics for OpenFDA & Other Sources
October 6, 2014
![Page 2: Predictive Analytics for OpenFDA & Other Sources](https://reader035.fdocuments.in/reader035/viewer/2022062517/5681382c550346895d9fdaf3/html5/thumbnails/2.jpg)
Data Fusion to Know an Individual
![Page 3: Predictive Analytics for OpenFDA & Other Sources](https://reader035.fdocuments.in/reader035/viewer/2022062517/5681382c550346895d9fdaf3/html5/thumbnails/3.jpg)
OpenFDA Queries
https://api.fda.gov/drug/event.json?search=patient.drug.openfda.pharm_class_epc:"nonsteroidal+anti-inflammatory+drug"&count=patient.reaction.reactionmeddrapt.exact
End point: https://api.fda.gov/drug/event.json
search: find records where openfda.pharm_class_epc (pharmacologic class) contains "nonsteroidal anti-inflammatory drug".
count: count the values of the field patient.reaction.reactionmeddrapt (patient reactions).
![Page 4: Predictive Analytics for OpenFDA & Other Sources](https://reader035.fdocuments.in/reader035/viewer/2022062517/5681382c550346895d9fdaf3/html5/thumbnails/4.jpg)
https://api.fda.gov/drug/event.json?search=patient.drug.openfda.pharm_class_epc:%22nonsteroidal+anti-inflammatory+drug%22&count=patient.reaction.reactionmeddrapt.exact
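The encoded URL above can be assembled programmatically. The following is a minimal sketch, not part of the original slides; the helper name `build_event_query` is hypothetical, and it only builds the URL string without sending a request.

```python
from urllib.parse import quote

BASE_URL = "https://api.fda.gov/drug/event.json"


def build_event_query(search_field, phrase, count_field=None):
    """Return an openFDA drug event URL for an exact-phrase search.

    Hypothetical helper for illustration; it performs no network call.
    """
    # Percent-encode the phrase, then use '+' for spaces and %22 for the
    # surrounding double quotes, matching the encoded URL on the slide.
    encoded_phrase = "%22" + quote(phrase, safe="").replace("%20", "+") + "%22"
    url = f"{BASE_URL}?search={search_field}:{encoded_phrase}"
    if count_field is not None:
        url += f"&count={count_field}"
    return url


url = build_event_query(
    "patient.drug.openfda.pharm_class_epc",
    "nonsteroidal anti-inflammatory drug",
    count_field="patient.reaction.reactionmeddrapt.exact",
)
print(url)
```

Leaving out `count_field` gives a plain search that returns full records instead of counts.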
![Page 5: Predictive Analytics for OpenFDA & Other Sources](https://reader035.fdocuments.in/reader035/viewer/2022062517/5681382c550346895d9fdaf3/html5/thumbnails/5.jpg)
Important OpenFDA data types
What the drug is supposed to fix: Established Pharmacologic Class (EPC) - pharm_class_epc
How the drug works: Mechanism of Action (MOA) - pharm_class_moa
What the drug affects: Physiologic Effect (PE) - pharm_class_pe
What is in the drug: Chemical Structure (CS) - pharm_class_cs
![Page 6: Predictive Analytics for OpenFDA & Other Sources](https://reader035.fdocuments.in/reader035/viewer/2022062517/5681382c550346895d9fdaf3/html5/thumbnails/6.jpg)
https://api.fda.gov/drug/event.json?search=patient.drug.openfda.pharm_class_epc:%22Serotonin+and+Norepinephrine+Reuptake+Inhibitor%22
Safety Report ID
Biographical Data
Adverse Reactions
Drug Information
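As a rough illustration of how the four parts of a safety report map onto a record, here is a sketch that pulls them out of one result. The `sample_record` values are made up, and only a handful of openFDA field names are included; the helper `summarize_report` is hypothetical.

```python
# Made-up record, trimmed to a few openFDA drug event field names.
sample_record = {
    "safetyreportid": "0000000-0",
    "patient": {
        "patientonsetage": "45",
        "patientsex": "2",
        "reaction": [
            {"reactionmeddrapt": "NAUSEA"},
            {"reactionmeddrapt": "DIZZINESS"},
        ],
        "drug": [
            {"medicinalproduct": "EXAMPLE DRUG", "drugindication": "DEPRESSION"},
        ],
    },
}

# patientsex is coded as a string: 0 = unknown, 1 = male, 2 = female.
SEX_CODES = {"0": "unknown", "1": "male", "2": "female"}


def summarize_report(record):
    """Collect report ID, biographical data, reactions, and drug names."""
    patient = record.get("patient", {})
    return {
        "safety_report_id": record.get("safetyreportid"),
        "age": patient.get("patientonsetage"),
        "sex": SEX_CODES.get(patient.get("patientsex"), "unknown"),
        "reactions": [r.get("reactionmeddrapt") for r in patient.get("reaction", [])],
        "drugs": [d.get("medicinalproduct") for d in patient.get("drug", [])],
    }


summary = summarize_report(sample_record)
print(summary)
```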
![Page 7: Predictive Analytics for OpenFDA & Other Sources](https://reader035.fdocuments.in/reader035/viewer/2022062517/5681382c550346895d9fdaf3/html5/thumbnails/7.jpg)
More OpenFDA data types
How serious is the reaction: serious (1 for Yes, 2 for No)
• "serious": "1",
• "seriousnesscongenitalanomali": "1",
• "seriousnessdeath": "1",
• "seriousnessdisabling": "1",
• "seriousnesshospitalization": "1",
• "seriousnesslifethreatening": "1",
• "seriousnessother": "1"
What is the drug indicated for: drugindication
Circumstances for taking drug: patient.drug.drugadditional
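The seriousness flags arrive as strings, so a small decoding step helps before modeling. The sketch below assumes the flags sit at the top level of a record; the function name `seriousness_summary` and the sample records are hypothetical.

```python
SERIOUSNESS_FIELDS = [
    "seriousnesscongenitalanomali",
    "seriousnessdeath",
    "seriousnessdisabling",
    "seriousnesshospitalization",
    "seriousnesslifethreatening",
    "seriousnessother",
]


def seriousness_summary(record):
    """Return (is_serious, reasons) decoded from the string flags."""
    # "serious" is "1" for yes and "2" for no.
    is_serious = record.get("serious") == "1"
    # A reason applies when its flag is present and set to "1".
    reasons = [
        field.replace("seriousness", "", 1)
        for field in SERIOUSNESS_FIELDS
        if record.get(field) == "1"
    ]
    return is_serious, reasons


# Made-up records for illustration.
print(seriousness_summary({"serious": "1", "seriousnesshospitalization": "1"}))
print(seriousness_summary({"serious": "2"}))
```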
![Page 8: Predictive Analytics for OpenFDA & Other Sources](https://reader035.fdocuments.in/reader035/viewer/2022062517/5681382c550346895d9fdaf3/html5/thumbnails/8.jpg)
Predictions on OpenFDA Data
Hierarchical Clustering (“unsupervised learning”) on Manufacturers by Drug Class and Adverse Events
Generates insights and further questions to explore, such as:
Do some adverse events dominate all others?
What is the role of retail distributors rather than manufacturers: is it an artifact of the data, or something that happens between them and the patient?
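To make the clustering step concrete, here is a small agglomerative (bottom-up hierarchical) clustering sketch in plain Python. The slides do not say which linkage, distance metric, or tool was used, so average linkage with Euclidean distance is an assumption, and the manufacturer names and counts are made up.

```python
from itertools import combinations
from math import dist

# Made-up adverse event counts per manufacturer
# (columns: three hypothetical drug classes).
counts = {
    "Maker A": [900, 850, 700],
    "Maker B": [880, 800, 760],
    "Maker C": [120, 100, 90],
    "Maker D": [100, 130, 80],
    "Maker E": [400, 420, 390],
}


def average_linkage(cluster_a, cluster_b):
    """Mean Euclidean distance over all cross-cluster pairs."""
    pairs = [(a, b) for a in cluster_a for b in cluster_b]
    return sum(dist(counts[a], counts[b]) for a, b in pairs) / len(pairs)


def agglomerate(names, n_clusters):
    """Merge the two closest clusters until n_clusters remain."""
    clusters = [[name] for name in names]
    while len(clusters) > n_clusters:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: average_linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters


groups = agglomerate(list(counts), n_clusters=3)
print(groups)
```

With these made-up numbers the two high-count makers end up together, the two low-count makers end up together, and the mid-range maker stays alone, which mirrors the kind of grouping the slides describe.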
![Page 9: Predictive Analytics for OpenFDA & Other Sources](https://reader035.fdocuments.in/reader035/viewer/2022062517/5681382c550346895d9fdaf3/html5/thumbnails/9.jpg)
Manufacturers by All Drug Classes
Group distinguished by an abnormally large number of adverse events for the products they make; includes Mylan and Teva.
Group troubling for the large number of adverse events for the products they make; includes AbbVie and Pfizer.
Group above average in the number of product adverse events; includes the private-label companies CVS, Kroger, Wal-Mart, and Publix.
Other manufacturers, not troubling in the number of adverse events.
![Page 10: Predictive Analytics for OpenFDA & Other Sources](https://reader035.fdocuments.in/reader035/viewer/2022062517/5681382c550346895d9fdaf3/html5/thumbnails/10.jpg)
Manufacturers by All Adverse Events
Other manufacturers, not troubling in the number of adverse events.
Group of 1 (Mylan), highly distinguished by an abnormally large number of adverse events for the products it makes.
Group troubling for the large number of adverse events for the products they make; includes Teva and the grocery store chain Kroger.
Group above average in the number of product adverse events; includes big pharma maker Merck.
![Page 11: Predictive Analytics for OpenFDA & Other Sources](https://reader035.fdocuments.in/reader035/viewer/2022062517/5681382c550346895d9fdaf3/html5/thumbnails/11.jpg)
Conditional Probability Models (Bayes) Very Helpful for Predictions
| Model Type | % Correct on Age | % Correct on Gender |
|---|---|---|
| Random Forest | 48% | 55% |
| Support Vector Machine | 48% | 55% |
| Decision Trees | 14% | 9% |
| Naïve Bayes | 64% | 78% |
![Page 12: Predictive Analytics for OpenFDA & Other Sources](https://reader035.fdocuments.in/reader035/viewer/2022062517/5681382c550346895d9fdaf3/html5/thumbnails/12.jpg)
Why is Bayes So Much Better?
Works on Conditional Probability
Utilizes Much More of What We Already Know
Probability of Age 18to34 (drug) | Rating % Age 18to34 (drug)
![Page 13: Predictive Analytics for OpenFDA & Other Sources](https://reader035.fdocuments.in/reader035/viewer/2022062517/5681382c550346895d9fdaf3/html5/thumbnails/13.jpg)
Bayes is Conditional Probability
Intuition is "What are the chances of X, given that I know Y?"
This will always be better than flipping a coin, as in the case of gender prediction.
The probability of Female (F) for any given Drug (T) is the same as the probability of the Drug given Female, times the probability of being Female, divided by the probability of the Drug: P(F | T) = P(T | F) × P(F) / P(T).
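A worked example may help here. The sketch below applies Bayes' rule, P(F | T) = P(T | F) × P(F) / P(T), to made-up report counts; the function name and all numbers are illustrative and do not come from the OpenFDA data.

```python
def posterior_female_given_drug(n_female_and_drug, n_female, n_drug, n_total):
    """P(Female | Drug) from raw counts, via Bayes' rule."""
    p_drug_given_female = n_female_and_drug / n_female  # P(T | F)
    p_female = n_female / n_total                       # P(F)
    p_drug = n_drug / n_total                           # P(T)
    return p_drug_given_female * p_female / p_drug


# Made-up counts: 1000 reports, 520 from females, 200 mention the drug,
# and 150 are both female and mention the drug.
p = posterior_female_given_drug(150, 520, 200, 1000)
print(round(p, 2))
```

The result equals 150 / 200, the share of the drug's reports that come from females, which is a quick check that the three terms were combined correctly.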
![Page 14: Predictive Analytics for OpenFDA & Other Sources](https://reader035.fdocuments.in/reader035/viewer/2022062517/5681382c550346895d9fdaf3/html5/thumbnails/14.jpg)
Bayes Results for Single Person Households
| Genre | Gender Accuracy | Age Accuracy | Size | Weight | Weighted Gender Accuracy | Weighted Age Accuracy |
|---|---|---|---|---|---|---|
| ADVENTURE | 75.4% | 62.0% | 16,565 | 1.001 | 75.5% | 62.1% |
| AUDIENCE PARTICIPATION | 84.1% | 78.8% | 46,283 | 1.003 | 84.4% | 79.0% |
| AWARD CEREMONIES | 60.4% | 42.6% | 655 | 1.000 | 60.4% | 42.6% |
| CHILD - LIVE | 78.6% | 67.7% | 4,868 | 1.000 | 78.6% | 67.7% |
| CHILD DAY - ANIMATION | 74.7% | 59.3% | 3,487 | 1.000 | 74.7% | 59.4% |
| CHILD MULTI-WEEKLY | 81.6% | 73.2% | 1,916,697 | 1.144 | 93.3% | 83.8% |
| CHILDREN'S NEWS | 76.0% | 33.3% | 300 | 1.000 | 76.0% | 33.3% |
| COMEDY VARIETY | 76.7% | 68.9% | 326,770 | 1.025 | 78.6% | 70.6% |
| CONCERT MUSIC | 67.8% | 54.6% | 2,822 | 1.000 | 67.9% | 54.6% |
| CONVERSATIONS, COLLOQUIES | 76.8% | 63.3% | 113,290 | 1.009 | 77.5% | 63.9% |
| DAYTIME DRAMA | 81.1% | 62.5% | 20,478 | 1.002 | 81.2% | 62.6% |
| DEVOTIONAL | 64.0% | 47.8% | 1,344 | 1.000 | 64.0% | 47.8% |
| EVENING ANIMATION | 80.7% | 76.7% | 481,722 | 1.036 | 83.6% | 79.5% |
| FEATURE FILM | 74.5% | 62.7% | 449,549 | 1.034 | 77.0% | 64.8% |
| FORMAT VARIES | 76.6% | 56.0% | 1,127 | 1.000 | 76.6% | 56.0% |
| GENERAL DOCUMENTARY | 74.6% | 63.9% | 2,004,256 | 1.150 | 85.8% | 73.5% |
| GENERAL DRAMA | 75.0% | 63.6% | 1,949,243 | 1.146 | 86.0% | 72.9% |
| GENERAL VARIETY | 73.4% | 62.1% | 377,859 | 1.028 | 75.5% | 63.8% |
| INSTRUCTION, ADVICE | 79.1% | 67.2% | 1,000,586 | 1.075 | 85.0% | 72.2% |
| NEWS | 77.8% | 65.4% | 971,951 | 1.073 | 83.5% | 70.1% |
| NEWS DOCUMENTARY | 77.5% | 63.2% | 100,634 | 1.008 | 78.1% | 63.7% |
| OFFICIAL POLICE | 46.6% | 29.2% | 1,009 | 1.000 | 46.6% | 29.2% |
| PARTICIPATION VARIETY | 75.3% | 62.3% | 174,900 | 1.013 | 76.3% | 63.1% |
| POPULAR MUSIC | 77.0% | 67.5% | 458,606 | 1.034 | 79.6% | 69.8% |
| POPULAR MUSIC STANDARD | 69.0% | 50.5% | 2,335 | 1.000 | 69.0% | 50.5% |
| PRIVATE DETECTIVE | 71.5% | 71.5% | 20,522 | 1.002 | 71.6% | 71.7% |
| QUIZ GIVE AWAY | 79.1% | 68.7% | 76,822 | 1.006 | 79.5% | 69.1% |
| QUIZ PANEL | 79.8% | 63.4% | 1,700 | 1.000 | 79.8% | 63.4% |
| SCIENCE FICTION | 76.1% | 65.3% | 24,219 | 1.002 | 76.2% | 65.4% |
| SITUATION COMEDY | 75.4% | 61.3% | 1,124,687 | 1.084 | 81.8% | 66.5% |
| SPORTS ANTHOLOGY | 83.8% | 64.8% | 52,166 | 1.004 | 84.1% | 65.0% |
| SPORTS COMMENTARY | 79.0% | 68.7% | 993,734 | 1.075 | 84.9% | 73.9% |
| SPORTS EVENT | 75.0% | 62.2% | 204,127 | 1.015 | 76.2% | 63.1% |
| SPORTS NEWS | 81.1% | 68.3% | 15,275 | 1.001 | 81.2% | 68.4% |
| SUSPENSE/MYSTERY | 81.3% | 70.9% | 342,405 | 1.026 | 83.4% | 72.7% |
| UNCLASSIFIED | 77.8% | 62.8% | 38,060 | 1.003 | 78.0% | 63.0% |
| WESTERN DRAMA | 75.6% | 63.8% | 4,300 | 1.000 | 75.7% | 63.9% |
| AVERAGE | 75.4% | 62.1% | 13,325,353 | | 77.5% | 63.9% |
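The slides do not define the Weight column. The figures are, however, consistent with weight = 1 + size / total size, and weighted accuracy = accuracy × weight, to within rounding for the rows checked. The sketch below works through that inferred formula; treat it as an assumption rather than the author's stated method.

```python
# Size shown in the AVERAGE row, which appears to be the total across genres.
TOTAL_SIZE = 13_325_353


def weighted_accuracy(accuracy_pct, size, total=TOTAL_SIZE):
    """Return (weight, weighted accuracy) under the inferred formula."""
    weight = 1 + size / total
    return round(weight, 3), round(accuracy_pct * weight, 1)


# CHILD MULTI-WEEKLY, gender accuracy 81.6%, size 1,916,697.
# The table shows weight 1.144 and weighted accuracy 93.3%.
print(weighted_accuracy(81.6, 1_916_697))

# ADVENTURE, gender accuracy 75.4%, size 16,565.
# The table shows weight 1.001 and weighted accuracy 75.5%.
print(weighted_accuracy(75.4, 16_565))
```

A few rows differ from the table by 0.1 percentage point under this formula, presumably because the published accuracies were already rounded.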
![Page 15: Predictive Analytics for OpenFDA & Other Sources](https://reader035.fdocuments.in/reader035/viewer/2022062517/5681382c550346895d9fdaf3/html5/thumbnails/15.jpg)
Simplifying the Problem Set
| Household type | Count | Task |
|---|---|---|
| Single Households | 123K | Age / Gender models by Drug |
| Multi-Person Households | 500K | |
| Same Gender & Same Age Class | 21K | nothing to predict |
| Same Gender & Diff. Age Class | 44K | predict age |
| Diff. Gender & Same Age Class | 303K | predict gender |
| Diff. Gender & Diff. Age Class | 133K | predict both |
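The split into household types can be expressed as a small rule. This is a sketch under the assumption that each household is available as a list of (gender, age class) pairs; the function name and the sample households are hypothetical.

```python
def prediction_task(members):
    """Return the prediction task for one household.

    members: list of (gender, age_class) tuples, one per person.
    """
    if len(members) == 1:
        # One person: every record in the household belongs to them.
        return "single household"
    same_gender = len({gender for gender, _ in members}) == 1
    same_age = len({age_class for _, age_class in members}) == 1
    if same_gender and same_age:
        return "nothing to predict"
    if same_gender:
        return "predict age"
    if same_age:
        return "predict gender"
    return "predict both"


# Made-up households for illustration.
print(prediction_task([("F", "18to34")]))
print(prediction_task([("F", "18to34"), ("F", "35to54")]))
print(prediction_task([("F", "35to54"), ("M", "35to54")]))
print(prediction_task([("F", "18to34"), ("M", "55plus")]))
```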
![Page 16: Predictive Analytics for OpenFDA & Other Sources](https://reader035.fdocuments.in/reader035/viewer/2022062517/5681382c550346895d9fdaf3/html5/thumbnails/16.jpg)
2 Stage Models
Stage 1: Single Households, Age / Gender Models by Drug
Stage 2: Multi-person households, Age / Gender Conditional Probability
• Same Gender & Diff. Age Class: predict age
• Diff. Gender & Same Age Class: predict gender
• Diff. Gender & Diff. Age Class: predict both
![Page 17: Predictive Analytics for OpenFDA & Other Sources](https://reader035.fdocuments.in/reader035/viewer/2022062517/5681382c550346895d9fdaf3/html5/thumbnails/17.jpg)
Age Conditional Probabilities
![Page 18: Predictive Analytics for OpenFDA & Other Sources](https://reader035.fdocuments.in/reader035/viewer/2022062517/5681382c550346895d9fdaf3/html5/thumbnails/18.jpg)
Full Bayes Model
Using all the independent variables:
Where MAX is the predicted Age or Gender class, the one with the highest probability given all the known conditional probabilities.
NOTE: The MAX prediction for Age is constrained by ID. Each ID has only 2 possible Age classes, since these are known, so if the model predicts an Age class outside the boundaries of an ID, pick the Age class with the next-highest probability.
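The constrained MAX rule in the note can be sketched as follows. The model equation itself is not reproduced in this transcript, so the code assumes the standard naive Bayes form (class prior times the product of per-feature conditional probabilities, with features treated as independent). The priors, likelihoods, and feature names are made up, and smoothing for unseen features is left out.

```python
from math import log


def predict_age_class(features, priors, likelihoods, allowed_classes):
    """Pick the highest-scoring Age class that is allowed for this ID."""
    scores = {}
    for age_class, prior in priors.items():
        # Sum of logs instead of a product of probabilities, to avoid underflow.
        score = log(prior)
        for feature in features:
            score += log(likelihoods[age_class][feature])
        scores[age_class] = score
    # Walk the classes from most to least probable and return the first
    # one inside the ID's known set of Age classes.
    for age_class in sorted(scores, key=scores.get, reverse=True):
        if age_class in allowed_classes:
            return age_class
    return None


# Made-up model parameters for illustration.
priors = {"18to34": 0.3, "35to54": 0.4, "55plus": 0.3}
likelihoods = {
    "18to34": {"drug_x": 0.5, "reaction_y": 0.4},
    "35to54": {"drug_x": 0.2, "reaction_y": 0.3},
    "55plus": {"drug_x": 0.1, "reaction_y": 0.2},
}
features = ["drug_x", "reaction_y"]

unconstrained = predict_age_class(features, priors, likelihoods, set(priors))
constrained = predict_age_class(features, priors, likelihoods, {"35to54", "55plus"})
print(unconstrained, constrained)
```

Here the unconstrained winner is 18to34, but for an ID whose known Age classes are 35to54 and 55plus, the rule falls back to 35to54, the next-highest class inside the boundary.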