From Population to Individual Drug Dosing in Chronic Illness
Intelligent Control for Management of Renal Anemia
Challenges in Dynamic Treatment Regimes and Multistage Decision-Making
Adam E Gaweda
University of Louisville
Department of Medicine
Kidney Disease Program
June 21, 2007, Challenges in Dynamic Treatment Regimes and Multistage Decision-Making
Overview
Anemia management
Dose-response modeling
Model-based control in drug dosing
Model-free control in drug dosing
Anemia Management: Biological vs. clinical
[Figure; surviving label: rHuEPO]
Anemia Management: Clinical guidelines
Dosing guidelines (NKF – KDOQI)
– Maintain Haemoglobin (Hb) between 11 and 12 g/dL (Hematocrit (Hct) between 33 and 36%).
– Titration of EPO:
    “If the increase in Hb after EPO initiation or after a dose increase has been less than 1 g/dL over a 2- to 4-week period, the dose of EPO should be increased by 50%. If the absolute rate of increase of Hb after EPO initiation or after a dose increase exceeds 3 g/dL per month (eg, an increase from a Hgb 7 to 10 g/dL), or if the Hgb exceeds the target, reduce the weekly dose of EPO by 25%. When the weekly EPO dose is being increased or decreased, a change may be made in the amount administered in a given dose and/or the frequency of dosing.”
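The quoted titration rule can be read as a small decision function. The sketch below is only an illustration: the function name, the use of 12 g/dL as "the target" that Hgb may exceed, the conversion of the observed change to a monthly rate (4 weeks taken as one month), and the choice to check the decrease rule first are all assumptions, not part of the guideline text.

```python
def kdoqi_titrate(weekly_dose, hb_change, weeks, hb_now, target_high=12.0):
    """Return the new weekly EPO dose after one observation period.

    weekly_dose : current weekly EPO dose (any consistent unit)
    hb_change   : Hb change over the period, in g/dL
    weeks       : length of the period, in weeks
    hb_now      : current Hb, in g/dL
    target_high : assumed upper end of the target range (g/dL)
    """
    # Assumption: one month is treated as 4 weeks.
    rate_per_month = hb_change / weeks * 4.0

    # Rise faster than 3 g/dL per month, or Hb above target: reduce by 25%.
    if rate_per_month > 3.0 or hb_now > target_high:
        return weekly_dose * 0.75

    # Rise of less than 1 g/dL over a 2- to 4-week period: increase by 50%.
    if 2 <= weeks <= 4 and hb_change < 1.0:
        return weekly_dose * 1.5

    # Otherwise the quoted rule prescribes no change.
    return weekly_dose
```

For example, `kdoqi_titrate(10000, 0.5, 4, 10.0)` applies the 50% increase, while `kdoqi_titrate(10000, 0.2, 4, 12.5)` applies the 25% reduction because Hb is above the assumed upper target.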
Anemia Management: Current state-of-the-art
Anemia Management Protocols (AMP)
– Frequency of Hb observation:
    Every 4 weeks if Hb within the target
    Every 2 weeks if Hb outside of the target
– EPO dose adjustment:
    Minimum adjustment amount 10% (of current dose)
    Maximum decrease 50% (if Hb > 15 g/dL)
    Maximum increase 70% (if Hb < 9 g/dL)
– Problem with AMP:
    Based on average response.
    Only 1/3 of the patient population achieve the target.
Can we improve the outcome of anemia management by making it patient-specific using control theory and machine learning techniques?
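The AMP limits listed above can be captured in two helper functions. This is a rough sketch under stated assumptions: the slide does not say how a proposed adjustment is computed, so `limit_adjustment` only bounds a fractional change supplied by the caller; treating changes smaller than 10% as "no change" is one possible reading of the minimum adjustment amount; the 11 to 12 g/dL target range is taken from the KDOQI guideline slide. All names are hypothetical.

```python
# Assumption: the AMP target range equals the KDOQI range of 11 to 12 g/dL.
TARGET_LOW, TARGET_HIGH = 11.0, 12.0


def weeks_until_next_hb(hb):
    """Observation interval: 4 weeks inside the target range, 2 weeks outside."""
    return 4 if TARGET_LOW <= hb <= TARGET_HIGH else 2


def limit_adjustment(change):
    """Bound a proposed fractional dose change (e.g. +0.25 for +25%).

    Assumption: changes smaller than the 10% minimum are dropped entirely.
    Decreases are capped at 50% and increases at 70%.
    """
    if abs(change) < 0.10:
        return 0.0
    return max(-0.50, min(0.70, change))
```

Because both functions depend only on the current Hb and a population-wide rule, they make the slide's point concrete: nothing in the protocol adapts to how an individual patient has responded so far.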
Dose-response modeling: Overview
In control system design and simulation, a good process model is priceless.
Models of erythropoiesis:
– Physiological model (Uehlinger et al. 1992)
– PK/PD model (Brockmöller et al. 1992)
– Bayesian network model (Bellazzi et al. 1993)
– Artificial Neural Network (ANN) models (Martin Guerrero et al. 2003, Gaweda et al. 2003, Gabutti et al. 2006)
Dose-response modeling: Population vs. subpopulation modeling
[Diagram, population modeling] Whole population: dose/response data set (batch) → Model 1
[Diagram, subpopulation modeling] Dose/response data subsets (batch) → Model 1 and Model 2, followed by selection
    Subpopulation 1, e.g. responders (EPO/Hb < threshold)
    Subpopulation 2, e.g. non-responders (EPO/Hb ≥ threshold)
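The split into subpopulations shown in the diagram can be sketched as a simple partition on the EPO/Hb ratio. The record layout (`"epo"`, `"hb"` keys), the function name, and the numeric threshold used in the usage lines are hypothetical; the slide does not give the threshold value.

```python
def split_by_response(patients, threshold):
    """Partition patient records into responders and non-responders.

    Each record is a dict with an EPO dose under "epo" and an Hb level
    under "hb" (hypothetical layout). A ratio EPO/Hb below the threshold
    marks a responder; a ratio at or above it marks a non-responder.
    """
    responders, non_responders = [], []
    for record in patients:
        ratio = record["epo"] / record["hb"]
        if ratio < threshold:
            responders.append(record)
        else:
            non_responders.append(record)
    return responders, non_responders


# Usage with made-up numbers; the threshold of 1.0 is purely illustrative.
cohort = [
    {"id": "p1", "epo": 6.0, "hb": 12.0},   # ratio 0.5
    {"id": "p2", "epo": 20.0, "hb": 10.0},  # ratio 2.0
]
responders, non_responders = split_by_response(cohort, threshold=1.0)
```

Each subset would then be used to fit its own dose-response model, with the same ratio test selecting which model to apply to a new patient.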
Dose-response modeling: Example of response prediction
Dose-response modeling: Open problems
Prediction seems to “lag” behind the actual value
– Do our data allow us to build a model that shows the true effect of EPO on Hb (Hct)?
    Let’s estimate a dynamic linear model Hb(k+1) = f( Hb(k), EPO(k) ):
        Hb_m(k+1) = 0.82 Hb(k) + 0.011 EPO(k) + 1.91
    Let’s now estimate a model of ΔHb(k+1) = f( EPO(k) ):
        ΔHb_m(k+1) = 0.015 EPO(k) - 0.23
    Both models achieve comparable accuracy.
    The second model “explains” the dose effect better.
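The two model structures above can be estimated with ordinary least squares. The following is a minimal sketch, not the estimation procedure used for the slide: the function names are hypothetical, ΔHb(k+1) is assumed to mean Hb(k+1) - Hb(k), and the data are synthetic, generated noise-free from the first model's published coefficients purely to check that the fit recovers them.

```python
import numpy as np


def fit_level_model(hb, epo):
    """Fit Hb(k+1) = a*Hb(k) + b*EPO(k) + c; returns (a, b, c)."""
    X = np.column_stack([hb[:-1], epo[:-1], np.ones(len(hb) - 1)])
    coef, *_ = np.linalg.lstsq(X, hb[1:], rcond=None)
    return coef


def fit_delta_model(hb, epo):
    """Fit dHb(k+1) = b*EPO(k) + c, assuming dHb(k+1) = Hb(k+1) - Hb(k)."""
    X = np.column_stack([epo[:-1], np.ones(len(hb) - 1)])
    coef, *_ = np.linalg.lstsq(X, np.diff(hb), rcond=None)
    return coef


# Synthetic series (hypothetical dose units), generated from the first model.
rng = np.random.default_rng(0)
epo = rng.uniform(0.0, 40.0, size=200)
hb = np.empty(200)
hb[0] = 11.0
for k in range(199):
    hb[k + 1] = 0.82 * hb[k] + 0.011 * epo[k] + 1.91

level_coef = fit_level_model(hb, epo)   # coefficients of Hb(k), EPO(k), constant
delta_coef = fit_delta_model(hb, epo)   # coefficients of EPO(k), constant
```

In the first model most of the predictive power comes from the Hb(k) term, which is one way to see why its predictions appear to lag; the second model has only the dose as a regressor, so whatever it explains is attributed to EPO.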
Dose-response modeling: Open problems
Our data come from clinical treatment (“closed-loop system”)
– How does that affect the model?
[Figure, left panel: output distribution; count vs. Hb (g/dL)]
[Figure, right panel: absolute prediction error vs. output; error vs. Hb (g/dL)]
Martin Guerrero et al. report the same phenomenon.
Model-based control: Model Predictive Control (MPC)
Rationale for using Model Predictive Control
– There is a delay between EPO administration and Hb response (about 17 days, from EPO manufacturer information).
– The relationship between EPO dose and Hb increase is nonlinear (monotonically increasing with saturation, Uehlinger et al. 1992).
– The effect of EPO continues throughout the lifetime of red blood cells (up to 120 days).
– We plan to include constraints on EPO dose in the future (such as minimization of the total dose or minimization of dose changes).
Model-based control: MPC - Schematic diagram
[Diagram: blocks MODEL (population), CONTROLLER, PATIENT; signals EPO, EPO*, Hb_m, Hb]
MODEL (population):
    Hb(k+1) = Hb(k) + FNN(EPO(k), EPO(k-1), EPO(k-2))
CONTROLLER:
\[
EPO^{*} = \arg\min_{EPO(1),\,EPO(2),\,EPO(3)} \sum_{i=1}^{3} \left( 11.5 - Hb_m(i) \right)^2
\]
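The controller's optimization over a three-step dose sequence can be sketched with an exhaustive search over a small dose grid. This is an illustration only: `toy_response` is a hypothetical saturating function standing in for the neural network FNN, the dose grid and units are made up, and a real controller would use the trained population model and a proper optimizer.

```python
import itertools
import math

TARGET_HB = 11.5  # g/dL, the set point in the controller objective


def toy_response(e_k, e_k1, e_k2):
    """Hypothetical stand-in for FNN(EPO(k), EPO(k-1), EPO(k-2)).

    Monotonically increasing in every dose and saturating, as the
    slides describe the dose-response; the numbers are invented.
    """
    drive = 0.2 * e_k + 0.5 * e_k1 + 0.3 * e_k2
    return 1.5 * (1.0 - math.exp(-drive / 10.0)) - 0.5


def predict_hb(hb_now, last_two_doses, plan, response=toy_response):
    """Roll Hb(k+1) = Hb(k) + response(...) forward over the dose plan.

    last_two_doses is (EPO(k-2), EPO(k-1)), oldest first.
    """
    doses = list(last_two_doses) + list(plan)
    hb, path = hb_now, []
    for i in range(len(plan)):
        hb = hb + response(doses[i + 2], doses[i + 1], doses[i])
        path.append(hb)
    return path


def mpc_plan(hb_now, last_two_doses, dose_grid, horizon=3):
    """Return the dose sequence minimizing sum_i (11.5 - Hb_m(i))^2."""
    def cost(plan):
        path = predict_hb(hb_now, last_two_doses, plan)
        return sum((TARGET_HB - hb) ** 2 for hb in path)

    return min(itertools.product(dose_grid, repeat=horizon), key=cost)


grid = [0, 5, 10, 20, 40]                  # hypothetical dose levels
plan_low = mpc_plan(9.0, (10, 10), grid)   # patient well below target
plan_high = mpc_plan(13.5, (10, 10), grid) # patient well above target
```

In receding-horizon use only the first element of the returned plan would be administered, and the optimization would be repeated once the next Hb measurement arrives.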
Model-based control: MPC Clinical trial - setup
Trial population:
– 60 patients:
    30 controls (dosed by physicians) / 30 treatment (dosed by MPC)
    45 African-American / 15 Caucasian
    35 males / 25 females
    Average age 58, min 21, max 84
Trial length:
– 8 months
    2 months “wash-out” period / 6 months for outcome analysis
Treatment goal:
– maintain Hb at 11.5 g/dL
– performance measure: mean absolute deviation from 11.5
Model-based control: MPC - Clinical trial results (thus far)
Mean |11.5-Hb| by month

         Control               Treatment
Month    Mean    Std Dev       Mean    Std Dev
1        1.08    0.87          1.16    0.95
2        1.12    1.07          1.34    1.10
3        0.95    0.79          1.07    0.87
4        0.75    0.81          0.74    0.73
5        0.96    0.77          0.96    1.26
6        0.98    0.71          0.83    0.60
7        1.17    1.18          0.87    0.78
Model-based control: Open problems
Simulating MPC
– How do we accurately represent the mismatch between the model and the patient?
– How do we effectively simulate adverse events?
Measuring success
– We try to individualize the treatment, yet we use a mean performance measure. What are the alternatives?
    Individual performance measures (e.g. within-subject StDev of Hb)?
– How do we eliminate the influence of Hb changes due to adverse events on the performance measure?
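The contrast between a group-level and an individual-level measure can be made concrete with a few lines of code. This is a sketch with invented Hb series; the function names and the patient data are hypothetical, and the within-subject standard deviation is only the candidate measure named on the slide, not a recommended one.

```python
from statistics import mean, stdev

TARGET_HB = 11.5  # g/dL


def group_mean_abs_deviation(hb_by_patient):
    """Mean |11.5 - Hb| pooled over all patients and visits."""
    return mean(
        abs(TARGET_HB - hb)
        for series in hb_by_patient.values()
        for hb in series
    )


def within_subject_sd(hb_by_patient):
    """Sample standard deviation of Hb, computed per patient."""
    return {pid: stdev(series) for pid, series in hb_by_patient.items()}


# Invented example: three very different Hb courses.
example = {
    "A": [11.5, 11.5, 11.5, 11.5],  # steady, on target
    "B": [10.5, 12.5, 10.5, 12.5],  # oscillating around the target
    "C": [12.5, 12.5, 12.5, 12.5],  # steady, but 1 g/dL too high
}
pooled = group_mean_abs_deviation(example)
per_patient = within_subject_sd(example)
```

Patients B and C contribute the same absolute deviation to the pooled measure, yet only B shows Hb cycling; conversely, the within-subject SD cannot tell the on-target patient A from the off-target patient C. Neither measure alone settles the question raised above.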
Model-free control: Reinforcement Learning
Drug administration in chronic conditions is a trial-and-error control process that resembles reinforcement learning:

disease symptoms – initial state (s_0)
(standard) initial dose – action (a_0)
k = 1
Repeat (infinitely)
    evaluate patient (remission/progression/side effects) – new state (s_k), reward (r_k)
    adjust dosing strategy – update state-action table/function (Q_k), extract policy (π_k)
    administer new dose – action (a_k)
    k = k + 1
End
Model-free control: Q-Learning simulation - Schematic diagram
[Diagram: blocks Q-LEARNING AGENT, POLICY (π), PATIENT SIMULATOR (subpopulation model); signals EPO (a), IRON (disturbance), Hb (s)]
PATIENT SIMULATOR (subpopulation model):
    Hb(k+1) = F( Hb(k), EPO(k), IRON(k) )
POLICY (π):
    R_i: IF Hb = Hb_i THEN EPO = EPO_i
Q-LEARNING AGENT:
\[
Q_{k+1}(s_k, a_k) = Q_k(s_k, a_k) + \alpha \left[ r_{k+1} + \gamma \max_{a} Q_k(s_{k+1}, a) - Q_k(s_k, a_k) \right]
\]
\[
\pi_{k+1} = \arg\max_{a} Q_{k+1}(s_{k+1}, a)
\]
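The agent block corresponds to the standard tabular Q-learning update followed by greedy policy extraction. The sketch below shows one update step; the learning rate `alpha`, the discount factor `gamma`, their default values, the discretized states, and the dose levels are all assumptions chosen for illustration.

```python
from collections import defaultdict


def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    """One Q-learning step:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_b Q(s_next,b) - Q(s,a)).
    """
    best_next = max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])


def greedy_action(Q, s, actions):
    """Policy extraction: the action with the largest Q-value in state s."""
    return max(actions, key=lambda b: Q[(s, b)])


# Usage: states are discretized Hb levels, actions are EPO dose levels
# (both hypothetical). The table starts at zero everywhere.
Q = defaultdict(float)
actions = [0, 5, 10]
q_update(Q, s=10.0, a=5, r=1.0, s_next=11.5, actions=actions)
best = greedy_action(Q, 10.0, actions)
```

After this single rewarded step the entry for state 10.0 and dose 5 becomes the largest in its row, so the greedy policy for Hb = 10.0 selects that dose, which is exactly the rule form "IF Hb = Hb_i THEN EPO = EPO_i".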
Model-free control: Reward function
\[
r_{k+1} =
\begin{cases}
1, & Hb(k+1) = 11.5 \\
0.5, & Hb(k) < Hb(k+1) < 11.5 \ \text{or}\ 11.5 < Hb(k+1) < Hb(k) \\
-1, & Hb(k+1) < Hb(k) < 11.5 \ \text{or}\ 11.5 < Hb(k) < Hb(k+1) \\
0, & \text{otherwise}
\end{cases}
\]
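The piecewise reward on this slide can be written as a short function. Because the comparison operators and the sign of the last reward value did not survive in the slide text, this sketch assumes strict inequalities and a reward of -1 for moving away from the target; both are assumptions, as is the function name.

```python
TARGET_HB = 11.5  # g/dL


def reward(hb_prev, hb_next, target=TARGET_HB):
    """Reward r(k+1) for the transition Hb(k) -> Hb(k+1).

    Assumed reading: 1 on reaching the target, 0.5 for moving toward it
    without crossing, -1 for moving away from it, and 0 otherwise.
    """
    if hb_next == target:
        return 1.0
    if hb_prev < hb_next < target or target < hb_next < hb_prev:
        return 0.5
    if hb_next < hb_prev < target or target < hb_prev < hb_next:
        return -1.0
    return 0.0
```

Under this reading a transition that crosses the target, such as 11.0 to 12.0 g/dL, falls into the "otherwise" case and earns no reward.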
Model-free control: Q-table update
Dose-response relationship (EPO to ΔHb) is monotonically increasing with saturation (Uehlinger et al. 1992).
Let’s update multiple entries in the Q-table at a time:
– If Hb(k) < 11.5 and Hb(k+1) ≤ Hb(k), or Hb(k) = 11.5 and Hb(k+1) < Hb(k), then update Q(s, a) for all s ≤ Hb(k) and all a ≤ EPO(k)
– If Hb(k) > 11.5 and Hb(k+1) ≥ Hb(k), or Hb(k) = 11.5 and Hb(k+1) > Hb(k), then update Q(s, a) for all s ≥ Hb(k) and all a ≥ EPO(k)
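The two rules can be sketched as a function that lists which Q-table entries one observation should update. It assumes the comparisons missing from the first rule mirror the ≥ signs of the second (that is, ≤), that states and doses lie on discrete grids, and that a transition matching neither rule updates only the visited entry; the names and grids are hypothetical.

```python
def entries_to_update(hb_prev, hb_next, epo, hb_grid, epo_grid, target=11.5):
    """Return the (state, action) pairs to update after one transition."""
    # Dose was too low: Hb below target and not rising (or on target and falling).
    dose_too_low = (hb_prev < target and hb_next <= hb_prev) or (
        hb_prev == target and hb_next < hb_prev
    )
    # Dose was too high: Hb above target and not falling (or on target and rising).
    dose_too_high = (hb_prev > target and hb_next >= hb_prev) or (
        hb_prev == target and hb_next > hb_prev
    )

    if dose_too_low:
        # By monotonicity, equal or smaller doses at equal or lower Hb
        # would also have been insufficient.
        return [(s, a) for s in hb_grid if s <= hb_prev
                for a in epo_grid if a <= epo]
    if dose_too_high:
        # Equal or larger doses at equal or higher Hb would also have
        # been excessive.
        return [(s, a) for s in hb_grid if s >= hb_prev
                for a in epo_grid if a >= epo]
    # Assumption: otherwise only the visited entry is updated.
    return [(hb_prev, epo)]


hb_grid = [10.0, 10.5, 11.0, 11.5, 12.0]  # hypothetical state grid (g/dL)
epo_grid = [0, 5, 10]                     # hypothetical dose levels
low_block = entries_to_update(10.5, 10.0, 5, hb_grid, epo_grid)
```

Spreading one observation over a whole block of the table is what lets the agent learn from far fewer patient visits than entry-by-entry Q-learning would need.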
Model-free control: Q-Learning - Simulated clinical trial
Trial population:
– 200 individuals with various degrees of response to EPO
– 100 distinct responders / 100 distinct non-responders
– In the first run, all individuals dosed by AMP
– In the second run, all individuals dosed by policy updated on-line by Q-learning
Trial length:
– 24 months
Treatment goal:
– drive Hb to, and maintain at, 11.5 g/dL
– performance measure: mean absolute deviation from 11.5
Model-free control: Q-Learning - Simulation results
Mean |11.5-Hb| by month

         AMP                   Q-learning
Month    Mean    Std Dev       Mean    Std Dev
7        0.89    0.69          1.04    0.85
8        0.89    0.52          0.70    0.33
9        1.29    0.35          0.58    0.29
10       1.08    0.33          0.51    0.30
11       0.59    0.26          0.40    0.26
12       0.17    0.14          0.35    0.25
13       0.36    0.22          0.32    0.23
14       0.54    0.25          0.29    0.20
15       0.59    0.30          0.25    0.19
16       0.55    0.35          0.21    0.17
17       0.43    0.32          0.19    0.15
18       0.30    0.26          0.17    0.14
19       0.22    0.18          0.16    0.13
20       0.23    0.18          0.14    0.11
21       0.26    0.19          0.13    0.12
22       0.27    0.19          0.13    0.11
23       0.28    0.20          0.13    0.12
24       0.26    0.19          0.13    0.10
25       0.25    0.19          0.11    0.10
26       0.24    0.17          0.12    0.10
27       0.23    0.17          0.13    0.10
28       0.22    0.17          0.11    0.10
29       0.20    0.17          0.12    0.10
Conclusions and open problems
We believe that we are on a good path to successfully individualize anemia management using the presented techniques.
However, we need to address the following:
– How do we produce reliable dose-response models that perform well on under-represented data instances?
– What performance measure do we need to use in order to adequately evaluate the success of an individualized treatment?
Acknowledgments
UofL Division of Nephrology
– George R Aronoff
– Michael E Brier
– Alfred A Jacobs
UofL Dept Electrical and Computer Engineering
– Mehmet K Muezzinoglu
– Jacek M Zurada
Michael E Brier has been sponsored by a Department of Veterans Affairs Merit Review Grant.
Adam E Gaweda is sponsored by NIDDK (1K25DK072085-01A2).