From Population to Individual Drug Dosing in Chronic Illness
Intelligent Control for Management of Renal Anemia

Challenges in Dynamic Treatment Regimes and Multistage Decision-Making

Adam E Gaweda
University of Louisville
Department of Medicine
Kidney Disease Program

June 21, 2007

Overview

– Anemia management
– Dose-response modeling
– Model-based control in drug dosing
– Model-free control in drug dosing



Anemia Management: Biological vs. clinical

rHuEPO



Anemia Management: Clinical guidelines

Dosing guidelines (NKF – KDOQI)
– Maintain Haemoglobin (Hb) between 11 and 12 g/dL (Hematocrit (Hct) between 33 and 36%).
– Titration of EPO: "If the increase in Hb after EPO initiation or after a dose increase has been less than 1 g/dL over a 2- to 4-week period, the dose of EPO should be increased by 50%. If the absolute rate of increase of Hb after EPO initiation or after a dose increase exceeds 3 g/dL per month (eg, an increase from a Hgb 7 to 10 g/dL), or if the Hgb exceeds the target, reduce the weekly dose of EPO by 25%. When the weekly EPO dose is being increased or decreased, a change may be made in the amount administered in a given dose and/or the frequency of dosing."
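The quoted titration rule can be read as a small decision function. The sketch below is one possible reading, not the guideline's own algorithm: the function name, the argument names, the conversion of a rise over `weeks` into a monthly rate (4 weeks per month), and the extra condition that the dose is only increased while Hb is still below the 11 g/dL lower target are all assumptions made for illustration.

```python
def titrate_weekly_epo(weekly_dose, hb_rise, weeks, hb_now,
                       target_low=11.0, target_high=12.0):
    """Return an adjusted weekly EPO dose under one reading of the quoted rule.

    hb_rise is the change in Hb (g/dL) observed over `weeks` weeks since
    EPO initiation or the last dose increase.
    """
    # Assumption: a month is treated as 4 weeks when computing the rate.
    monthly_rate = hb_rise / weeks * 4.0

    # Too fast a rise, or Hb above target: reduce the weekly dose by 25%.
    if monthly_rate > 3.0 or hb_now > target_high:
        return weekly_dose * 0.75

    # Rise below 1 g/dL over a 2- to 4-week period: increase by 50%.
    # Assumption: only applied while Hb is still below the target range.
    if 2 <= weeks <= 4 and hb_rise < 1.0 and hb_now < target_low:
        return weekly_dose * 1.5

    return weekly_dose


if __name__ == "__main__":
    print(titrate_weekly_epo(10000, hb_rise=0.5, weeks=4, hb_now=9.0))
    print(titrate_weekly_epo(10000, hb_rise=3.5, weeks=4, hb_now=10.5))
```

The rule reacts only to the most recent change and to fixed percentage steps, which is the population-level behaviour the rest of the talk tries to improve on.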



Anemia Management: Current state-of-the-art

Anemia Management Protocols (AMP)
– Frequency of Hb observation:
    Every 4 weeks if Hb within the target
    Every 2 weeks if Hb outside of the target
– EPO dose adjustment:
    Minimum adjustment amount 10% (of current dose)
    Maximum decrease 50% (if Hb > 15 g/dL)
    Maximum increase 70% (if Hb < 9 g/dL)
– Problem with AMP:
    Based on average response.
    Only 1/3 of the patient population achieve the target.

Can we improve the outcome of anemia management by making it patient-specific using control theory and machine learning techniques?
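The AMP observation schedule and adjustment limits listed above can be sketched as two helpers. The slide does not say how the protocol chooses an adjustment between the stated extremes, so the second function only bounds a change proposed elsewhere. The 11 to 12 g/dL target range is taken from the KDOQI slide, and rounding a nonzero change smaller than 10% up to 10% is an assumption; all names are hypothetical.

```python
TARGET_LOW, TARGET_HIGH = 11.0, 12.0  # g/dL, assumed from the KDOQI range


def weeks_until_next_hb(hb):
    """Every 4 weeks if Hb is within the target, every 2 weeks otherwise."""
    return 4 if TARGET_LOW <= hb <= TARGET_HIGH else 2


def bound_adjustment(proposed_change):
    """Clamp a proposed fractional dose change to the AMP limits.

    proposed_change is a fraction of the current dose (+0.2 means +20%).
    """
    if proposed_change == 0:
        return 0.0
    # Maximum decrease 50%, maximum increase 70%.
    clamped = max(-0.50, min(0.70, proposed_change))
    # Assumption: a nonzero change below the 10% minimum is rounded to 10%.
    if abs(clamped) < 0.10:
        clamped = 0.10 if clamped > 0 else -0.10
    return clamped


if __name__ == "__main__":
    print(weeks_until_next_hb(10.2), bound_adjustment(0.9))
```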



Dose-response modeling: Overview

In control system design and simulation, a good process model is priceless.

Models of erythropoiesis:
– Physiological model (Uehlinger et al. 1992)
– PK / PD model (Brockmöller et al. 1992)
– Bayesian network model (Bellazzi et al. 1993)
– Artificial Neural Network (ANN) models (Martin Guerrero et al. 2003, Gaweda et al. 2003, Gabutti et al. 2006)



Dose-response modeling: Population vs. subpopulation modeling

[Diagram, population modeling: whole population → dose-response data set (batch) → Model 1.]

[Diagram, subpopulation modeling: selection splits the dose-response data into subsets (batch): Subpopulation 1, e.g. responders (EPO/Hb below a threshold) → Model 1; Subpopulation 2, e.g. non-responders (EPO/Hb at or above the threshold) → Model 2.]



Dose-response modeling: Example of response prediction



Dose-response modeling: Open problems

Prediction seems to "lag" behind the actual value
– Do our data allow us to build a model that shows the true effect of EPO on Hb (Hct)?
    Let's estimate a dynamic linear model Hb(k+1) = f( Hb(k), EPO(k) ):
        Hb_m(k+1) = 0.82 Hb(k) + 0.011 EPO(k) + 1.91
    Let's now estimate a model of ΔHb(k+1) = f( EPO(k) ):
        ΔHb_m(k+1) = 0.015 EPO(k) - 0.23

Both models achieve comparable accuracy.

The second model "explains" the dose effect better.
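A short worked example may help show how the two fitted models differ in what they say about the dose. The coefficients are the ones on the slide; the EPO unit is not stated in the source, so doses below are in whatever unit the models were fitted in. The function names and the "maintenance dose" calculation are illustrative additions.

```python
def hb_model_1(hb, epo):
    """Dynamic linear model: Hb_m(k+1) = 0.82 Hb(k) + 0.011 EPO(k) + 1.91."""
    return 0.82 * hb + 0.011 * epo + 1.91


def delta_hb_model_2(epo):
    """Increment model: dHb_m(k+1) = 0.015 EPO(k) - 0.23."""
    return 0.015 * epo - 0.23


def maintenance_dose_model_1(hb_target):
    """Dose at which model 1 predicts Hb(k+1) = Hb(k) = hb_target."""
    return ((1.0 - 0.82) * hb_target - 1.91) / 0.011


def maintenance_dose_model_2():
    """Dose at which model 2 predicts no change in Hb."""
    return 0.23 / 0.015


def zero_dose_steady_state_model_1():
    """Hb level that model 1 settles at when EPO = 0."""
    return 1.91 / (1.0 - 0.82)


if __name__ == "__main__":
    print(round(maintenance_dose_model_1(11.5), 2))   # about 14.55
    print(round(maintenance_dose_model_2(), 2))       # about 15.33
    print(round(zero_dose_steady_state_model_1(), 2)) # about 10.61
```

With no EPO at all, the first model still settles near 10.6 g/dL because most of its prediction comes from the previous Hb value, whereas the second model predicts a decline of 0.23 g/dL per step. This is one way to see why the increment model attributes more of the response to the dose.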



Dose-response modeling: Open problems

Our data come from clinical treatment ("closed-loop system")
– How does that affect the model?

[Figure, two panels: "output distribution" (count vs. Hb (g/dL)) and "absolute prediction error vs. output" (error vs. Hb (g/dL)).]

Martin Guerrero et al. report the same phenomenon.



Model-based control: Model Predictive Control (MPC)

Rationale for using Model Predictive Control
– There is a delay between EPO administration and Hb response (about 17 days, from EPO manufacturer information).
– The relationship between EPO dose and Hb increase is nonlinear (monotonically increasing with saturation – Uehlinger et al. 1992).
– The effect of EPO continues throughout the lifetime of red blood cells (up to 120 days).
– We plan to include constraints on EPO dose (in the future), such as minimization of the total dose or minimization of dose changes.



Model-based control: MPC - Schematic diagram

[Diagram blocks: MODEL (population), PATIENT, CONTROLLER; signals: EPO, EPO*, Hb, Hb_m.]

MODEL (population):
Hb(k+1) = Hb(k) + FNN(EPO(k),EPO(k-1),EPO(k-2))

CONTROLLER:
$$EPO^* = \arg\min_{EPO(1),\,EPO(2),\,EPO(3)} \sum_{i=1}^{3} \left( 11.5 - Hb_m(i) \right)^2$$
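A minimal sketch of a three-step model predictive controller of this kind is given below: it searches over candidate dose sequences and picks the one whose predicted Hb stays closest, in the squared-error sense, to 11.5 g/dL. The neural network FNN is not given in the source, so `dose_effect` is a hypothetical saturating stand-in with made-up coefficients. The candidate dose grid and the brute-force search are also assumptions, chosen for clarity rather than efficiency.

```python
import itertools
import math

TARGET_HB = 11.5  # g/dL


def dose_effect(epo_k, epo_k1, epo_k2):
    """Hypothetical stand-in for FNN(EPO(k), EPO(k-1), EPO(k-2)).

    Monotonically increasing in each dose and saturating; the weights and
    constants are invented for illustration only.
    """
    weighted = 0.5 * epo_k + 0.3 * epo_k1 + 0.2 * epo_k2
    return 1.5 * (1.0 - math.exp(-weighted / 10.0)) - 0.5


def predict(hb, past_doses, planned_doses):
    """Roll Hb(k+1) = Hb(k) + FNN(...) forward over the planned doses.

    past_doses is (EPO(k-1), EPO(k-2)).
    """
    history = list(past_doses)
    path = []
    for dose in planned_doses:
        hb = hb + dose_effect(dose, history[0], history[1])
        path.append(hb)
        history = [dose, history[0]]  # shift the dose history by one step
    return path


def mpc_dose(hb, past_doses, candidates):
    """Return (first dose to administer, full three-step plan)."""
    def cost(plan):
        return sum((TARGET_HB - h) ** 2 for h in predict(hb, past_doses, plan))

    best_plan = min(itertools.product(candidates, repeat=3), key=cost)
    return best_plan[0], best_plan


if __name__ == "__main__":
    doses = [0, 5, 10, 15, 20, 30]
    print(mpc_dose(9.0, (10, 10), doses))
    print(mpc_dose(13.0, (10, 10), doses))
```

Only the first dose of the plan is administered; at the next Hb measurement the search is repeated from the new state, which is what makes the scheme receding-horizon.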



Model-based control: MPC Clinical trial - setup

Trial population:
– 60 patients:
    30 controls (dosed by physicians) / 30 treatment (dosed by MPC)
    45 African-American / 15 Caucasian
    35 males / 25 females
    Average age 58, min 21, max 84

Trial length:
– 8 months
    2 months "wash-out" period / 6 months for outcome analysis

Treatment goal:
– maintain Hb at 11.5 g/dL
– performance measure: mean absolute deviation from 11.5



Model-based control: MPC - Clinical trial results (thus far)

Mean |11.5-Hb| by month

         Control                  Treatment
Month    Mean    Std Deviation    Mean    Std Deviation
1        1.08    0.87             1.16    0.95
2        1.12    1.07             1.34    1.10
3        0.95    0.79             1.07    0.87
4        0.75    0.81             0.74    0.73
5        0.96    0.77             0.96    1.26
6        0.98    0.71             0.83    0.60
7        1.17    1.18             0.87    0.78



Model-based control: Open problems

Simulating MPC
– How do we accurately represent the mismatch between the model and the patient?
– How do we effectively simulate adverse events?

Measuring success
– We try to individualize the treatment, yet we use a mean performance measure. What are the alternatives?
    Individual performance measures (e.g. within-subject StDev of Hb)?
– How do we eliminate the influence of Hb changes due to adverse events on the performance measure?
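To make the contrast between the two kinds of measure concrete, the sketch below computes the mean absolute deviation from 11.5 g/dL used in the trial and the within-subject standard deviation suggested as an alternative. The two Hb series are invented for illustration and are not trial data.

```python
from statistics import mean, stdev


def mean_abs_deviation(hb_series, target=11.5):
    """Mean |target - Hb| over one patient's Hb measurements."""
    return mean([abs(target - hb) for hb in hb_series])


def within_subject_std(hb_series):
    """Sample standard deviation of one patient's Hb measurements."""
    return stdev(hb_series)


if __name__ == "__main__":
    # Hypothetical patients: one stable but off target, one oscillating.
    stable = [10.5, 10.5, 10.5, 10.5]
    oscillating = [10.5, 12.5, 10.5, 12.5]
    for name, series in (("stable", stable), ("oscillating", oscillating)):
        print(name, mean_abs_deviation(series), within_subject_std(series))
```

Both hypothetical patients have the same mean absolute deviation (1.0 g/dL), yet only the second shows any within-subject variability, so the two measures reward different aspects of control.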



Model-free control: Reinforcement Learning

Drug administration in chronic conditions is a trial-and-error control process that resembles reinforcement learning:

    disease symptoms – initial state (s_0)
    (standard) initial dose – action (a_0)
    k = 1
    Repeat (infinitely)
        evaluate patient (remission/progression/side effects) – new state (s_k), reward (r_k)
        adjust dosing strategy – update state-action table/function (Q_k), extract policy (π_k)
        administer new dose – action (a_k)
        k = k + 1
    End



Model-free control: Q-Learning simulation - Schematic diagram

[Diagram blocks: Q-LEARNING AGENT, POLICY (π), PATIENT SIMULATOR (subpopulation model); signals: EPO (a), Hb (s), IRON (disturbance).]

PATIENT SIMULATOR (subpopulation model):
Hb(k+1) = F( Hb(k), EPO(k), IRON(k) )

POLICY (π):
R_i: IF Hb = Hb_i THEN EPO = EPO_i

Q-LEARNING AGENT:
$$\pi_{k+1} = \arg\max_{a} Q_{k+1}(s_{k+1}, a)$$
$$Q_{k+1}(s_k, a_k) = Q_k(s_k, a_k) + \alpha \left[ r_{k+1} + \gamma \max_{a} Q_k(s_{k+1}, a) - Q_k(s_k, a_k) \right]$$
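The agent/simulator loop can be sketched with a standard tabular Q-learning update. Everything numeric here is an assumption made for illustration: the Hb grid, the dose levels, the learning rate, discount and exploration rate, the linear patient simulator (which ignores iron), and the reward, which is simply the negative distance from 11.5 g/dL rather than the reward function used in the talk.

```python
import random

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1       # assumed learning settings
TARGET = 11.5                               # g/dL
HB_GRID = [9.0 + 0.5 * i for i in range(11)]  # states: 9.0 .. 14.0 g/dL
DOSES = [0, 5, 10, 15, 20, 25, 30]          # actions: hypothetical doses


def state_of(hb):
    """Index of the Hb grid point nearest to hb."""
    return min(range(len(HB_GRID)), key=lambda i: abs(HB_GRID[i] - hb))


def simulate(hb, dose, rng):
    """Hypothetical patient: linear dose effect plus noise, kept on the grid."""
    hb_next = hb + 0.05 * dose - 0.75 + rng.gauss(0.0, 0.1)
    return max(HB_GRID[0], min(HB_GRID[-1], hb_next))


def reward(hb_next):
    """Assumed reward: the closer to target, the better."""
    return -abs(TARGET - hb_next)


def q_update(Q, s, a, r, s_next):
    """One-step tabular Q-learning update of Q[s][a]."""
    Q[s][a] += ALPHA * (r + GAMMA * max(Q[s_next]) - Q[s][a])


def greedy_action(Q, s):
    """Policy extraction: index of the best-valued action in state s."""
    return max(range(len(Q[s])), key=lambda a: Q[s][a])


def train(steps=20000, seed=0):
    """Run epsilon-greedy Q-learning against the simulated patient."""
    rng = random.Random(seed)
    Q = [[0.0] * len(DOSES) for _ in HB_GRID]
    hb = 10.0
    for _ in range(steps):
        s = state_of(hb)
        if rng.random() < EPSILON:
            a = rng.randrange(len(DOSES))   # explore
        else:
            a = greedy_action(Q, s)         # exploit current policy
        hb_next = simulate(hb, DOSES[a], rng)
        q_update(Q, s, a, reward(hb_next), state_of(hb_next))
        hb = hb_next
    return Q


if __name__ == "__main__":
    Q = train()
    for i, hb in enumerate(HB_GRID):
        print(hb, DOSES[greedy_action(Q, i)])
```

The printed table plays the role of the rule-based policy on the slide: one dose per Hb level, read off the Q-table.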



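Model-free control: Reward function

[Equation: piecewise reward r(k+1), defined by comparing Hb(k) and Hb(k+1) with the 11.5 g/dL target. The surviving fragments show cases with the values 1 and 0.5 and a final case "0, otherwise"; the conditions attached to each value are not recoverable from this copy.]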



Model-free control: Q-table update

Dose-response relationship (EPO to ΔHb) is monotonically increasing with saturation (Uehlinger et al. 1992).

Let's update multiple entries in the Q-table at a time:
– If Hb(k) < 11.5 and Hb(k+1) ≤ Hb(k), or Hb(k) = 11.5 and Hb(k+1) < Hb(k), then update Q( s, a ) for all s ≤ Hb(k) and all a ≤ EPO(k)
– If Hb(k) > 11.5 and Hb(k+1) ≥ Hb(k), or Hb(k) = 11.5 and Hb(k+1) > Hb(k), then update Q( s, a ) for all s ≥ Hb(k) and all a ≥ EPO(k)
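The idea behind the multi-entry update is that, with a monotonic dose-response, a dose that failed to raise a low Hb would also have failed at any lower dose and any lower Hb, and symmetrically for a high Hb. The sketch below reads the comparisons in the first rule as "less than or equal", mirroring the second rule. The slide does not say what value is written into the affected entries, so adding a fixed `penalty` to each is an assumption, as are the function and argument names.

```python
def block_update(Q, hb_grid, doses, hb_k, hb_next, epo_k, penalty, target=11.5):
    """Penalize every (state, action) entry implied by monotonicity.

    Q is a list of rows (one per Hb grid point), each a list of values
    (one per dose). Returns the list of (state index, action index) updated.
    """
    # Dose was too low: Hb below target did not rise (or fell from target).
    too_low = (hb_k < target and hb_next <= hb_k) or \
              (hb_k == target and hb_next < hb_k)
    # Dose was too high: Hb above target did not fall (or rose from target).
    too_high = (hb_k > target and hb_next >= hb_k) or \
               (hb_k == target and hb_next > hb_k)

    updated = []
    for i, hb in enumerate(hb_grid):
        for j, dose in enumerate(doses):
            low_hit = too_low and hb <= hb_k and dose <= epo_k
            high_hit = too_high and hb >= hb_k and dose >= epo_k
            if low_hit or high_hit:
                Q[i][j] += penalty  # assumed form of the update
                updated.append((i, j))
    return updated


if __name__ == "__main__":
    grid, doses = [10.0, 11.0, 12.0], [0, 10, 20]
    Q = [[0.0] * len(doses) for _ in grid]
    print(block_update(Q, grid, doses, hb_k=11.0, hb_next=10.5,
                       epo_k=10, penalty=-1.0))
```

One observation then informs a whole block of the table, which is how the approach speeds up learning when each patient yields only one Hb measurement every few weeks.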



Model-free control: Q-Learning - Simulated clinical trial

Trial population:
– 200 individuals with various degrees of response to EPO
– 100 distinct responders / 100 distinct non-responders
– In the first run, all individuals dosed by AMP
– In the second run, all individuals dosed by policy updated on-line by Q-learning

Trial length:
– 24 months

Treatment goal:
– drive Hb to, and maintain at, 11.5 g/dL
– performance measure: mean absolute deviation from 11.5



Model-free control: Q-Learning - Simulation results

Mean |11.5-Hb| by month

         AMP                      Q-learning
Month    Mean    Std Deviation    Mean    Std Deviation
7        0.89    0.69             1.04    0.85
8        0.89    0.52             0.70    0.33
9        1.29    0.35             0.58    0.29
10       1.08    0.33             0.51    0.30
11       0.59    0.26             0.40    0.26
12       0.17    0.14             0.35    0.25
13       0.36    0.22             0.32    0.23
14       0.54    0.25             0.29    0.20
15       0.59    0.30             0.25    0.19
16       0.55    0.35             0.21    0.17
17       0.43    0.32             0.19    0.15
18       0.30    0.26             0.17    0.14
19       0.22    0.18             0.16    0.13
20       0.23    0.18             0.14    0.11
21       0.26    0.19             0.13    0.12
22       0.27    0.19             0.13    0.11
23       0.28    0.20             0.13    0.12
24       0.26    0.19             0.13    0.10
25       0.25    0.19             0.11    0.10
26       0.24    0.17             0.12    0.10
27       0.23    0.17             0.13    0.10
28       0.22    0.17             0.11    0.10
29       0.20    0.17             0.12    0.10



Conclusions and open problems

We believe that we are on a good path to successfully individualize anemia management using the presented techniques.

However, we need to address the following:
– How do we produce reliable dose-response models that perform well on under-represented data instances?
– What performance measure do we need to use in order to adequately evaluate the success of an individualized treatment?



Acknowledgments

UofL Division of Nephrology
– George R Aronoff
– Michael E Brier
– Alfred A Jacobs

UofL Dept Electrical and Computer Engineering
– Mehmet K Muezzinoglu
– Jacek M Zurada

Michael E Brier has been sponsored by Department of Veterans Affairs Merit Review Grant.
Adam E Gaweda is sponsored by NIDDK (1K25DK072085-01A2).