The Experts' Perspective of 'Ask-an-Expert': An Interview ...
MODELING THE EXPERT · Experts are Human 15.071x –Modeling the Expert: An Introduction to...
Transcript of MODELING THE EXPERT · Experts are Human 15.071x –Modeling the Expert: An Introduction to...
![Page 1: MODELING THE EXPERT · Experts are Human 15.071x –Modeling the Expert: An Introduction to Logistic Regression 2 • Experts are limited by memory and time • Healthcare Quality](https://reader036.fdocuments.in/reader036/viewer/2022071020/5fd4d7f06e9a1d37e6146ba4/html5/thumbnails/1.jpg)
MODELING THE EXPERT An Introduction to Logistic Regression
15.071 – The Analytics Edge
![Page 2: MODELING THE EXPERT · Experts are Human 15.071x –Modeling the Expert: An Introduction to Logistic Regression 2 • Experts are limited by memory and time • Healthcare Quality](https://reader036.fdocuments.in/reader036/viewer/2022071020/5fd4d7f06e9a1d37e6146ba4/html5/thumbnails/2.jpg)
Ask the Experts!
15.071x –Modeling the Expert: An Introduction to Logistic Regression 1
• Critical decisions are often made by people with expert knowledge
• Healthcare Quality Assessment • Good quality care educates patients and controls costs • Need to assess quality for proper medical interventions
• No single set of guidelines for defining quality of healthcare
• Health professionals are experts in quality of care assessment
![Page 3: MODELING THE EXPERT · Experts are Human 15.071x –Modeling the Expert: An Introduction to Logistic Regression 2 • Experts are limited by memory and time • Healthcare Quality](https://reader036.fdocuments.in/reader036/viewer/2022071020/5fd4d7f06e9a1d37e6146ba4/html5/thumbnails/3.jpg)
Experts are Human
15.071x –Modeling the Expert: An Introduction to Logistic Regression 2
• Experts are limited by memory and time
• Healthcare Quality Assessment • Expert physicians can evaluate quality by examining a
patient’s records
• This process is time consuming and inefficient • Physicians cannot assess quality for millions of patients
![Page 4: MODELING THE EXPERT · Experts are Human 15.071x –Modeling the Expert: An Introduction to Logistic Regression 2 • Experts are limited by memory and time • Healthcare Quality](https://reader036.fdocuments.in/reader036/viewer/2022071020/5fd4d7f06e9a1d37e6146ba4/html5/thumbnails/4.jpg)
Replicating Expert Assessment
15.071x –Modeling the Expert: An Introduction to Logistic Regression 3
• Can we develop analytical tools that replicate expert assessment on a large scale?
• Learn from expert human judgment • Develop a model, interpret results, and adjust the model
• Make predictions/evaluations on a large scale
• Healthcare Quality Assessment • Let’s identify poor healthcare quality using analytics
![Page 5: MODELING THE EXPERT · Experts are Human 15.071x –Modeling the Expert: An Introduction to Logistic Regression 2 • Experts are limited by memory and time • Healthcare Quality](https://reader036.fdocuments.in/reader036/viewer/2022071020/5fd4d7f06e9a1d37e6146ba4/html5/thumbnails/5.jpg)
Claims Data
• Electronically available
• Standardized
• Not 100% accurate
• Under-reporting is common
• Claims for hospital visits can be vague
15.071x –Modeling the Expert: An Introduction to Logistic Regression 1
Medical Claims Diagnosis, Procedures,
Doctor/Hospital, Cost
Pharmacy Claims Drug, Quantity, Doctor,
Medication Cost
![Page 6: MODELING THE EXPERT · Experts are Human 15.071x –Modeling the Expert: An Introduction to Logistic Regression 2 • Experts are limited by memory and time • Healthcare Quality](https://reader036.fdocuments.in/reader036/viewer/2022071020/5fd4d7f06e9a1d37e6146ba4/html5/thumbnails/6.jpg)
Creating the Dataset – Claims Samples
• Large health insurance claims database
• Randomly selected 131 diabetes patients
• Ages range from 35 to 55 • Costs $10,000 – $20,000 • September 1, 2003 – August
31, 2005
15.071x –Modeling the Expert: An Introduction to Logistic Regression 2
Claims Sample
![Page 7: MODELING THE EXPERT · Experts are Human 15.071x –Modeling the Expert: An Introduction to Logistic Regression 2 • Experts are limited by memory and time • Healthcare Quality](https://reader036.fdocuments.in/reader036/viewer/2022071020/5fd4d7f06e9a1d37e6146ba4/html5/thumbnails/7.jpg)
Creating the Dataset – Expert Review
• Expert physician reviewed claims and wrote descriptive notes:
15.071x –Modeling the Expert: An Introduction to Logistic Regression 3
Claims Sample
Expert Review “Ongoing use of narcotics”
“Only on Avandia, not a good first choice drug”
“Had regular visits, mammogram, and immunizations”
“Was given home testing supplies”
![Page 8: MODELING THE EXPERT · Experts are Human 15.071x –Modeling the Expert: An Introduction to Logistic Regression 2 • Experts are limited by memory and time • Healthcare Quality](https://reader036.fdocuments.in/reader036/viewer/2022071020/5fd4d7f06e9a1d37e6146ba4/html5/thumbnails/8.jpg)
Creating the Dataset – Expert Assessment
• Rated quality on a two-point scale (poor/good)
15.071x –Modeling the Expert: An Introduction to Logistic Regression 4
Claims Sample
Expert Review
Expert Assessment
“I’d say care was poor – poorly treated diabetes”
“No eye care, but overall I’d say high quality”
![Page 9: MODELING THE EXPERT · Experts are Human 15.071x –Modeling the Expert: An Introduction to Logistic Regression 2 • Experts are limited by memory and time • Healthcare Quality](https://reader036.fdocuments.in/reader036/viewer/2022071020/5fd4d7f06e9a1d37e6146ba4/html5/thumbnails/9.jpg)
Creating the Dataset – Variable Extraction
• Dependent Variable • Quality of care
• Independent Variables • ongoing use of narcotics • only on Avandia, not a good first
choice drug
• Had regular visits, mammogram, and immunizations
• Was given home testing supplies
15.071x –Modeling the Expert: An Introduction to Logistic Regression 5
Claims Sample
Expert Review
Expert Assessment
Variable Extraction
![Page 10: MODELING THE EXPERT · Experts are Human 15.071x –Modeling the Expert: An Introduction to Logistic Regression 2 • Experts are limited by memory and time • Healthcare Quality](https://reader036.fdocuments.in/reader036/viewer/2022071020/5fd4d7f06e9a1d37e6146ba4/html5/thumbnails/10.jpg)
Creating the Dataset – Variable Extraction
• Dependent Variable • Quality of care
• Independent Variables • Diabetes treatment • Patient demographics
• Healthcare utilization
• Providers • Claims
• Prescriptions
15.071x –Modeling the Expert: An Introduction to Logistic Regression 6
Claims Sample
Expert Review
Expert Assessment
Variable Extraction
![Page 11: MODELING THE EXPERT · Experts are Human 15.071x –Modeling the Expert: An Introduction to Logistic Regression 2 • Experts are limited by memory and time • Healthcare Quality](https://reader036.fdocuments.in/reader036/viewer/2022071020/5fd4d7f06e9a1d37e6146ba4/html5/thumbnails/11.jpg)
Predicting Quality of Care
15.071x –Modeling the Expert: An Introduction to Logistic Regression 7
• The dependent variable is modeled as a binary variable • 1 if low-quality care, 0 if high-quality care
• This is a categorical variable • A small number of possible outcomes
• Linear regression would predict a continuous outcome
• How can we extend the idea of linear regression to situations where the outcome variable is categorical? • Only want to predict 1 or 0 • Could round outcome to 0 or 1 • But we can do better with logistic regression
![Page 12: MODELING THE EXPERT · Experts are Human 15.071x –Modeling the Expert: An Introduction to Logistic Regression 2 • Experts are limited by memory and time • Healthcare Quality](https://reader036.fdocuments.in/reader036/viewer/2022071020/5fd4d7f06e9a1d37e6146ba4/html5/thumbnails/12.jpg)
Logistic Regression
15.071x –Modeling the Expert: An Introduction to Logistic Regression 1
• Predicts the probability of poor care • Denote dependent variable “PoorCare” by • h
• Then ) • Independent variables • Uses the Logistic Response Function
• Nonlinear transformation of linear regression equation to produce number between 0 and 1
P (y = 1) =1
1 + e�(�0+�1x1+�2x2+...+�kxk)
Formulas - Logistic Regression
p1, p2, . . . , pn
g1, g2, . . . , gm
⇧
⇧ (pi)
pi
⇧ (gj)
gj
⇧ (p1) , . . . ,⇧ (pn)
⇧ (g1) , . . . ,⇧ (gm)
L (⇧) = ⇧ (p1) . . .⇧ (pn) [1� ⇧ (g1)] . . . [1� ⇧ (gm)]
⇧ (p1) = 0.9,⇧ (p2) = 0.8,⇧ (q1) = 0.5
⇧ (p1) = ⇧ (p2) = ⇧ (q1) = 0.5
x1, x2, . . . , xk
⇧ (x1, x2, . . . , xk) = logistic (�0 + �1x1 + . . .+ �kxk)
logistic (r) =
1
1 + exp (�r)
�0, �1, . . . , �k
L (⇧)
⇧ (Visits,Narcotics) = logistic (�2.4 + 0.06⇥ Visits + 0.08⇥ Narcotics)
1
P (y = 0) = 1� P (y = 1)
P (y = 1)
y
![Page 13: MODELING THE EXPERT · Experts are Human 15.071x –Modeling the Expert: An Introduction to Logistic Regression 2 • Experts are limited by memory and time • Healthcare Quality](https://reader036.fdocuments.in/reader036/viewer/2022071020/5fd4d7f06e9a1d37e6146ba4/html5/thumbnails/13.jpg)
Understanding the Logistic Function
• Positive values are predictive of class 1
• Negative values are predictive of class 0
-4 -2 0 2 4
0.0
0.2
0.4
0.6
0.8
1.0
P(y
= 1
)
15.071x –Modeling the Expert: An Introduction to Logistic Regression 2
P (y = 1) =1
1 + e�(�0+�1x1+�2x2+...+�kxk)
![Page 14: MODELING THE EXPERT · Experts are Human 15.071x –Modeling the Expert: An Introduction to Logistic Regression 2 • Experts are limited by memory and time • Healthcare Quality](https://reader036.fdocuments.in/reader036/viewer/2022071020/5fd4d7f06e9a1d37e6146ba4/html5/thumbnails/14.jpg)
Understanding the Logistic Function
15.071x –Modeling the Expert: An Introduction to Logistic Regression 3
• The coefficients are selected to • Predict a high probability for the poor care cases • Predict a low probability for the good care cases
P (y = 1) =1
1 + e�(�0+�1x1+�2x2+...+�kxk)
![Page 15: MODELING THE EXPERT · Experts are Human 15.071x –Modeling the Expert: An Introduction to Logistic Regression 2 • Experts are limited by memory and time • Healthcare Quality](https://reader036.fdocuments.in/reader036/viewer/2022071020/5fd4d7f06e9a1d37e6146ba4/html5/thumbnails/15.jpg)
Understanding the Logistic Function
15.071x –Modeling the Expert: An Introduction to Logistic Regression 4
• We can instead talk about Odds (like in gambling)
• Odds > 1 if y = 1 is more likely • Odds < 1 if y = 0 is more likely
P (y = 1) =1
1 + e�(�0+�1x1+�2x2+...+�kxk)
Odds =P (y = 1)
P (y = 0)
![Page 16: MODELING THE EXPERT · Experts are Human 15.071x –Modeling the Expert: An Introduction to Logistic Regression 2 • Experts are limited by memory and time • Healthcare Quality](https://reader036.fdocuments.in/reader036/viewer/2022071020/5fd4d7f06e9a1d37e6146ba4/html5/thumbnails/16.jpg)
The Logit
15.071x –Modeling the Expert: An Introduction to Logistic Regression 5
• It turns out that
• This is called the “Logit” and looks like linear regression
• The bigger the Logit is, the bigger
log(Odds) = �0 + �1x1 + �2x2 + . . .+ �kxk
P (y = 1)
![Page 17: MODELING THE EXPERT · Experts are Human 15.071x –Modeling the Expert: An Introduction to Logistic Regression 2 • Experts are limited by memory and time • Healthcare Quality](https://reader036.fdocuments.in/reader036/viewer/2022071020/5fd4d7f06e9a1d37e6146ba4/html5/thumbnails/17.jpg)
Model for Healthcare Quality
15.071x –Modeling the Expert: An Introduction to Logistic Regression 1
• Plot of the independent variables • Number of Office
Visits • Number of Narcotics
Prescribed
• Red are poor care • Green are good care
![Page 18: MODELING THE EXPERT · Experts are Human 15.071x –Modeling the Expert: An Introduction to Logistic Regression 2 • Experts are limited by memory and time • Healthcare Quality](https://reader036.fdocuments.in/reader036/viewer/2022071020/5fd4d7f06e9a1d37e6146ba4/html5/thumbnails/18.jpg)
Threshold Value
15.071x –Modeling the Expert: An Introduction to Logistic Regression 1
• The outcome of a logistic regression model is a probability
• Often, we want to make a binary prediction • Did this patient receive poor care or good care?
• We can do this using a threshold value t
• If P(PoorCare = 1) ≥ t, predict poor quality • If P(PoorCare = 1) < t, predict good quality
• What value should we pick for t?
![Page 19: MODELING THE EXPERT · Experts are Human 15.071x –Modeling the Expert: An Introduction to Logistic Regression 2 • Experts are limited by memory and time • Healthcare Quality](https://reader036.fdocuments.in/reader036/viewer/2022071020/5fd4d7f06e9a1d37e6146ba4/html5/thumbnails/19.jpg)
Threshold Value
15.071x –Modeling the Expert: An Introduction to Logistic Regression 2
• Often selected based on which errors are “better”
• If t is large, predict poor care rarely (when P(y=1) is large)
• More errors where we say good care, but it is actually poor care • Detects patients who are receiving the worst care
• If t is small, predict good care rarely (when P(y=1) is small)
• More errors where we say poor care, but it is actually good care • Detects all patients who might be receiving poor care
• With no preference between the errors, select t = 0.5 • Predicts the more likely outcome
![Page 20: MODELING THE EXPERT · Experts are Human 15.071x –Modeling the Expert: An Introduction to Logistic Regression 2 • Experts are limited by memory and time • Healthcare Quality](https://reader036.fdocuments.in/reader036/viewer/2022071020/5fd4d7f06e9a1d37e6146ba4/html5/thumbnails/20.jpg)
Selecting a Threshold Value
15.071x –Modeling the Expert: An Introduction to Logistic Regression 3
Compare actual outcomes to predicted outcomes using a confusion matrix (classification matrix)
Predicted = 0 Predicted = 1
Actual = 0 True Negatives (TN) False Positives (FP)
Actual = 1 False Negatives (FN) True Positives (TP)
![Page 21: MODELING THE EXPERT · Experts are Human 15.071x –Modeling the Expert: An Introduction to Logistic Regression 2 • Experts are limited by memory and time • Healthcare Quality](https://reader036.fdocuments.in/reader036/viewer/2022071020/5fd4d7f06e9a1d37e6146ba4/html5/thumbnails/21.jpg)
Receiver Operator Characteristic (ROC) Curve
• True positive rate (sensitivity) on y-axis • Proportion of poor
care caught
• False positive rate (1-specificity) on x-axis
• Proportion of good care labeled as poor care
15.071x –Modeling the Expert: An Introduction to Logistic Regression 1
Receiver Operator Characteristic Curve
False positive rate
True
pos
itive
rate
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.2
0.4
0.6
0.8
1.0
![Page 22: MODELING THE EXPERT · Experts are Human 15.071x –Modeling the Expert: An Introduction to Logistic Regression 2 • Experts are limited by memory and time • Healthcare Quality](https://reader036.fdocuments.in/reader036/viewer/2022071020/5fd4d7f06e9a1d37e6146ba4/html5/thumbnails/22.jpg)
Selecting a Threshold using ROC
• Captures all thresholds simultaneously
• High threshold • High specificity • Low sensitivity
• Low Threshold • Low specificity • High sensitivity
15.071x –Modeling the Expert: An Introduction to Logistic Regression 2
Receiver Operator Characteristic Curve
False positive rate
True
pos
itive
rate
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.2
0.4
0.6
0.8
1.0
![Page 23: MODELING THE EXPERT · Experts are Human 15.071x –Modeling the Expert: An Introduction to Logistic Regression 2 • Experts are limited by memory and time • Healthcare Quality](https://reader036.fdocuments.in/reader036/viewer/2022071020/5fd4d7f06e9a1d37e6146ba4/html5/thumbnails/23.jpg)
Selecting a Threshold using ROC
• Choose best threshold for best trade off • cost of failing to
detect positives • costs of raising false
alarms
15.071x –Modeling the Expert: An Introduction to Logistic Regression 3
Receiver Operator Characteristic Curve
False positive rate
True
pos
itive
rate
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.2
0.4
0.6
0.8
1.0
![Page 24: MODELING THE EXPERT · Experts are Human 15.071x –Modeling the Expert: An Introduction to Logistic Regression 2 • Experts are limited by memory and time • Healthcare Quality](https://reader036.fdocuments.in/reader036/viewer/2022071020/5fd4d7f06e9a1d37e6146ba4/html5/thumbnails/24.jpg)
Selecting a Threshold using ROC
• Choose best threshold for best trade off • cost of failing to
detect positives • costs of raising false
alarms
15.071x –Modeling the Expert: An Introduction to Logistic Regression 4
Receiver Operator Characteristic Curve
False positive rate
True
pos
itive
rate
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.2
0.4
0.6
0.8
1.0
0.07
0.25
0.44
0.63
0.81
1
![Page 25: MODELING THE EXPERT · Experts are Human 15.071x –Modeling the Expert: An Introduction to Logistic Regression 2 • Experts are limited by memory and time • Healthcare Quality](https://reader036.fdocuments.in/reader036/viewer/2022071020/5fd4d7f06e9a1d37e6146ba4/html5/thumbnails/25.jpg)
Selecting a Threshold using ROC
• Choose best threshold for best trade off • cost of failing to
detect positives • costs of raising false
alarms
15.071x –Modeling the Expert: An Introduction to Logistic Regression 5
Receiver Operator Characteristic Curve
False positive rate
True
pos
itive
rate
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.2
0.4
0.6
0.8
1.0
0.07
0.25
0.44
0.63
0.81
1
00.1
0.2
0.30.40.50.6
0.7
0.80.9
1
![Page 26: MODELING THE EXPERT · Experts are Human 15.071x –Modeling the Expert: An Introduction to Logistic Regression 2 • Experts are limited by memory and time • Healthcare Quality](https://reader036.fdocuments.in/reader036/viewer/2022071020/5fd4d7f06e9a1d37e6146ba4/html5/thumbnails/26.jpg)
Interpreting the Model
15.071x –Modeling the Expert: An Introduction to Logistic Regression 1
• Multicollinearity could be a problem • Do the coefficients make sense? • Check correlations
• Measures of accuracy
![Page 27: MODELING THE EXPERT · Experts are Human 15.071x –Modeling the Expert: An Introduction to Logistic Regression 2 • Experts are limited by memory and time • Healthcare Quality](https://reader036.fdocuments.in/reader036/viewer/2022071020/5fd4d7f06e9a1d37e6146ba4/html5/thumbnails/27.jpg)
Compute Outcome Measures
15.071x –Modeling the Expert: An Introduction to Logistic Regression 2
Confusion Matrix:
N = number of observations Overall accuracy = ( TN + TP )/N Overall error rate = ( FP + FN)/N Sensitivity = TP/( TP + FN) False Negative Error Rate = FN/( TP + FN) Specificity = TN/( TN + FP) False Positive Error Rate = FP/( TN + FP)
Predicted Class = 0 Predicted Class = 1
Actual Class = 0 True Negatives (TN) False Positives (FP)
Actual Class = 1 False Negatives (FN) True Positives (TP)
![Page 28: MODELING THE EXPERT · Experts are Human 15.071x –Modeling the Expert: An Introduction to Logistic Regression 2 • Experts are limited by memory and time • Healthcare Quality](https://reader036.fdocuments.in/reader036/viewer/2022071020/5fd4d7f06e9a1d37e6146ba4/html5/thumbnails/28.jpg)
Making Predictions
15.071x –Modeling the Expert: An Introduction to Logistic Regression 3
• Just like in linear regression, we want to make predictions on a test set to compute out-of-sample metrics
> predictTest = predict(QualityLog, type=“response”, newdata=qualityTest)
• This makes predictions for probabilities
• If we use a threshold value of 0.3, we get the following confusion matrix
Predicted Good Care Predicted Poor Care
Actually Good Care 19 5
Actually Poor Care 2 6
![Page 29: MODELING THE EXPERT · Experts are Human 15.071x –Modeling the Expert: An Introduction to Logistic Regression 2 • Experts are limited by memory and time • Healthcare Quality](https://reader036.fdocuments.in/reader036/viewer/2022071020/5fd4d7f06e9a1d37e6146ba4/html5/thumbnails/29.jpg)
Area Under the ROC Curve (AUC)
15.071x –Modeling the Expert: An Introduction to Logistic Regression 4
• Just take the area under the curve
• Interpretation • Given a random
positive and negative, proportion of the time you guess which is which correctly
• Less affected by sample balance than accuracy
AUC = 0.775
Receiver Operator Characteristic Curve
False positive rate
True
pos
itive
rate
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.2
0.4
0.6
0.8
1.0
![Page 30: MODELING THE EXPERT · Experts are Human 15.071x –Modeling the Expert: An Introduction to Logistic Regression 2 • Experts are limited by memory and time • Healthcare Quality](https://reader036.fdocuments.in/reader036/viewer/2022071020/5fd4d7f06e9a1d37e6146ba4/html5/thumbnails/30.jpg)
Area Under the ROC Curve (AUC)
15.071x –Modeling the Expert: An Introduction to Logistic Regression 5
• What is a good AUC? • Maximum of 1 (perfect prediction)
Receiver Operator Characteristic Curve
False positive rate
True
pos
itive
rate
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.2
0.4
0.6
0.8
1.0
![Page 31: MODELING THE EXPERT · Experts are Human 15.071x –Modeling the Expert: An Introduction to Logistic Regression 2 • Experts are limited by memory and time • Healthcare Quality](https://reader036.fdocuments.in/reader036/viewer/2022071020/5fd4d7f06e9a1d37e6146ba4/html5/thumbnails/31.jpg)
Area Under the ROC Curve (AUC)
15.071x –Modeling the Expert: An Introduction to Logistic Regression 6
• What is a good AUC? • Maximum of 1 (perfect prediction)
• Minimum of 0.5 (just guessing)
Receiver Operator Characteristic Curve
False positive rate
True
pos
itive
rate
0.0 0.2 0.4 0.6 0.8 1.0
0.0
0.2
0.4
0.6
0.8
1.0
![Page 32: MODELING THE EXPERT · Experts are Human 15.071x –Modeling the Expert: An Introduction to Logistic Regression 2 • Experts are limited by memory and time • Healthcare Quality](https://reader036.fdocuments.in/reader036/viewer/2022071020/5fd4d7f06e9a1d37e6146ba4/html5/thumbnails/32.jpg)
Conclusions
15.071x –Modeling the Expert: An Introduction to Logistic Regression 1
• An expert-trained model can accurately identify diabetics receiving low-quality care • Out-of-sample accuracy of 78% • Identifies most patients receiving poor care
• In practice, the probabilities returned by the logistic regression model can be used to prioritize patients for intervention
• Electronic medical records could be used in the future
![Page 33: MODELING THE EXPERT · Experts are Human 15.071x –Modeling the Expert: An Introduction to Logistic Regression 2 • Experts are limited by memory and time • Healthcare Quality](https://reader036.fdocuments.in/reader036/viewer/2022071020/5fd4d7f06e9a1d37e6146ba4/html5/thumbnails/33.jpg)
The Competitive Edge of Models
15.071x –Modeling the Expert: An Introduction to Logistic Regression 2
• While humans can accurately analyze small amounts of information, models allow larger scalability
• Models do not replace expert judgment • Experts can improve and refine the model
• Models can integrate assessments of many experts into one final unbiased and unemotional prediction