Transcript of Bernoulli 0704

  • Slide 1/31

    Q-Learning and Dynamic Treatment Regimes

    S.A. Murphy

    Univ. of Michigan

    IMS/Bernoulli: July, 2004

  • Slide 2/31

    Outline

    Dynamic Treatment Regimes

    Optimal Q-functions and Q-learning

    The Problem & Goal

    Finite Sample Bounds

    Outline of Proof

    Shortcomings and Open Problems

  • Slide 3/31

    Dynamic Treatment Regimes

    ---- Multi-stage decision problems: repeated decisions are made over time on each patient.

    ---- Used in the management of addictions, mental illnesses, HIV infection, and cancer.

  • Slide 4/31

    k Decisions

    Observations made prior to the t-th decision

    Action at the t-th decision

    Primary outcome
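    The slide's symbols did not survive extraction. A standard notation for this setup, used in the reconstructions below (bars denote histories), is

    \[
    (O_1, A_1, O_2, A_2, \ldots, O_k, A_k, O_{k+1}), \qquad \bar O_t = (O_1, \ldots, O_t), \quad \bar A_t = (A_1, \ldots, A_t),
    \]

    where O_t collects the observations made prior to the t-th decision, A_t is the action at the t-th decision, and the primary outcome Y is a known function of the full trajectory.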

  • Slide 5/31

    A dynamic treatment regime is a vector of decision rules, one per decision.

    If the regime is implemented, then each action is chosen by applying the corresponding decision rule to the history available at that decision.
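    In this notation (the slide's formula was not captured; this is an assumed reconstruction), a regime d = (d_1, ..., d_k) sets

    \[
    A_t = d_t(\bar O_t, \bar A_{t-1}), \qquad t = 1, \ldots, k.
    \]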

  • Slide 6/31

    Goal: Estimate the decision rules that maximize the mean primary outcome.

    Data: A data set of n finite horizon trajectories, each with randomized actions.

    The probabilities with which the actions are assigned are known randomization probabilities.
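    Written out in the same reconstructed notation, each trajectory is generated with actions randomized according to known probabilities p_t:

    \[
    P\big(A_t = a_t \mid \bar O_t, \bar A_{t-1}\big) = p_t\big(a_t \mid \bar O_t, \bar A_{t-1}\big), \qquad t = 1, \ldots, k.
    \]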

  • Slide 7/31

    Optimal Q-functions and Q-learning:

    Definition: E_d denotes expectation when the actions are chosen according to the regime d.
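    In particular, the mean outcome under a regime d can be written as its value; the symbol V below is introduced here for readability and was not captured from the slides:

    \[
    V(d) = E_d[\,Y\,].
    \]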

  • Slide 8/31

    Q-functions:

    The Q-functions for the optimal regime are given recursively, for t = k, k-1, ..., 1.
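    The recursion itself was lost in extraction; the standard form, consistent with the verbal definition above, is

    \[
    Q_k^{*}(\bar o_k, \bar a_k) = E\big[\,Y \mid \bar O_k = \bar o_k,\ \bar A_k = \bar a_k\,\big],
    \]
    \[
    Q_t^{*}(\bar o_t, \bar a_t) = E\big[\,\max_{a_{t+1}} Q_{t+1}^{*}(\bar O_{t+1}, \bar A_t, a_{t+1}) \mid \bar O_t = \bar o_t,\ \bar A_t = \bar a_t\,\big],
    \]

    so Q_t^* is the mean outcome obtained by taking action a_t now and acting optimally thereafter.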

  • Slide 9/31

    Q-functions:

    The optimal regime is given by pointwise maximization of the Q-functions.
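    In the same notation:

    \[
    d_t^{*}(\bar o_t, \bar a_{t-1}) \in \arg\max_{a_t} Q_t^{*}(\bar o_t, \bar a_{t-1}, a_t), \qquad t = 1, \ldots, k.
    \]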

  • Slide 10/31

    Q-learning:

    Given a model for the Q-functions, minimize the empirical squared error over the model parameters, and set the fitted terminal Q-function to the minimizer.
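    With a working model Q_k(.; \theta_k) and E_n denoting the empirical average over the n trajectories (a reconstruction; the slide's objective was not captured), the terminal step is

    \[
    \hat\theta_k \in \arg\min_{\theta_k} E_n\Big[\big(Y - Q_k(\bar O_k, \bar A_k; \theta_k)\big)^2\Big], \qquad \hat Q_k = Q_k(\,\cdot\,; \hat\theta_k).
    \]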

  • Slide 11/31

    Q-learning:

    For each t = k-1, ..., 1, minimize the empirical squared error for stage t over the stage-t parameters, set the fitted stage-t Q-function to the minimizer, and so on.
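    In the same reconstructed notation, the stage-t step regresses the predicted best future value on the stage-t history and action:

    \[
    \hat\theta_t \in \arg\min_{\theta_t} E_n\Big[\big(\max_{a_{t+1}} \hat Q_{t+1}(\bar O_{t+1}, \bar A_t, a_{t+1}) - Q_t(\bar O_t, \bar A_t; \theta_t)\big)^2\Big], \qquad \hat Q_t = Q_t(\,\cdot\,; \hat\theta_t).
    \]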

  • Slide 12/31

    Q-Learning:

    The estimated regime is given by plugging the fitted Q-functions into the argmax rule.
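    In the same notation:

    \[
    \hat d_t(\bar o_t, \bar a_{t-1}) \in \arg\max_{a_t} \hat Q_t(\bar o_t, \bar a_{t-1}, a_t), \qquad t = 1, \ldots, k.
    \]

    To make the whole procedure concrete, here is a minimal sketch of batch Q-learning with linear working models, for k = 2 decisions and binary actions coded 0/1. All names (q_learning, phi1, phi2, and so on) are illustrative assumptions, not the paper's notation or software.

```python
import numpy as np

def lstsq_fit(Phi, y):
    """Minimize the empirical squared error E_n[(y - Phi @ theta)^2]."""
    theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return theta

def q_learning(O1, A1, O2, A2, Y, phi1, phi2):
    """Two-stage batch Q-learning with linear working models.

    phi1(O1, a) and phi2(O1, A1, O2, a) build the feature matrices of
    the stage-1 and stage-2 Q-models; a is a vector of candidate actions.
    """
    # Stage 2 (terminal): regress the primary outcome on history + action.
    theta2 = lstsq_fit(phi2(O1, A1, O2, A2), Y)

    # Pseudo-outcome: fitted value of the best stage-2 action.
    zeros, ones = np.zeros_like(A2), np.ones_like(A2)
    V2 = np.maximum(phi2(O1, A1, O2, zeros) @ theta2,
                    phi2(O1, A1, O2, ones) @ theta2)

    # Stage 1: regress the pseudo-outcome on stage-1 history + action.
    theta1 = lstsq_fit(phi1(O1, A1), V2)
    return theta1, theta2

def stage1_rule(theta1, phi1, O1):
    """Estimated stage-1 decision rule: pick the argmax action."""
    n = len(O1)
    q1 = phi1(O1, np.ones(n)) @ theta1
    q0 = phi1(O1, np.zeros(n)) @ theta1
    return (q1 > q0).astype(int)
```

    For instance, with scalar observations, phi2 = lambda O1, A1, O2, a: np.column_stack([np.ones(len(a)), O1, A1, O2, a, a * O2]) gives an action-by-observation interaction model, and the whole procedure reduces to two ordinary least squares fits run backward in time.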

  • Slide 13/31

    The Problem & Goal:

    Most learning (e.g. estimation) methods utilize a model for all or parts of the multivariate distribution of the data. Such a model implicitly constrains the class of possible decision rules in the dynamic treatment regime: call this the constrained class of regimes.

    X is a vector with many components (high dimensional), thus the model is likely incorrect; view the model and the constrained class of regimes as approximation classes.

  • Slide 14/31

    Goal: Given a learning method and approximation classes, assess the ability of the learning method to produce the best decision rules in the class.

    Ideally, construct an upper bound for the shortfall of the estimated regime, where the estimator of the regime is written \hat d and E_d denotes expectation when the actions are chosen according to the rule d.
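    Plausibly (the displayed formula was lost in extraction) the quantity to bound is the shortfall of the estimated regime \hat d relative to the best regime in the class:

    \[
    \max_{d \in \mathcal{D}} E_d[\,Y\,] \;-\; E_{\hat d}[\,Y\,],
    \]

    where \mathcal{D} denotes the constrained class of regimes; the symbol \mathcal{D} is introduced here and is not from the slides.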

  • Slide 15/31

    Goal: Given a learning method, a model, and an approximation class, construct a finite sample upper bound for this shortfall.

    This upper bound should be composed of quantities that are minimized in the learning method.

    The learning method here is Q-learning.

  • Slide 16/31

    Finite Sample Bounds:

    Primary Assumptions:

    (1) [condition not captured in the transcript] for L > 1.

    (2) Number of possible actions is finite.

  • Slide 17/31

    Definition: [formula not captured in the transcript], where E, without a subscript, denotes expectation when the actions are randomized.

  • Slide 18/31

    Results:

    Approximation Error: [bound not captured in the transcript]

    The minimum is over [terms not captured in the transcript].
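    The displayed result was lost in extraction. For orientation only (an assumed reading, not the slide's statement): finite sample bounds of this kind decompose the shortfall into two pieces,

    \[
    \max_{d \in \mathcal{D}} E_d[\,Y\,] - E_{\hat d}[\,Y\,] \;\le\; (\text{approximation error}) + (\text{estimation error}),
    \]

    with the approximation error measuring how well the working models can represent the optimal Q-functions, and the estimation error shrinking with n.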

  • Slide 19/31

    Define the space [formula not captured in the transcript].

    The estimation error involves the complexity of this space.

  • Slide 20/31

    Estimation Error:

    For δ > 0, with probability at least 1 - δ, [bound not captured in the transcript] holds for n satisfying [condition not captured in the transcript].

  • Slide 21/31

    If the space is finite, then n needs only to satisfy [condition not captured in the transcript]; that is, [restated form not captured].

  • Slide 22/31

    Outline of Proof:

    The Q-functions for a fixed regime d are given by the following recursion.
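    The recursion itself was lost; the standard regime-specific form, which replaces the max in the optimal recursion by the action the regime actually selects, is

    \[
    Q_k^{d}(\bar o_k, \bar a_k) = E\big[\,Y \mid \bar O_k = \bar o_k,\ \bar A_k = \bar a_k\,\big],
    \]
    \[
    Q_t^{d}(\bar o_t, \bar a_t) = E\big[\,Q_{t+1}^{d}\big(\bar O_{t+1}, \bar A_t,\, d_{t+1}(\bar O_{t+1}, \bar A_t)\big) \mid \bar O_t = \bar o_t,\ \bar A_t = \bar a_t\,\big].
    \]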

  • Slide 23/31

    Proof Outline

    (1) [step not captured in the transcript]

  • Slide 24/31

    Proof Outline

    (2) [step not captured in the transcript]

    It turns out that also [formula not captured in the transcript].

  • Slide 25/31

    Proof Outline

    (3) [step not captured in the transcript]

  • Slide 26/31

    Shortcomings and Open Problems

  • Slide 27/31

    Recall Estimation Error:

    For δ > 0, with probability at least 1 - δ, [bound not captured in the transcript] holds for n satisfying [condition not captured in the transcript].

  • Slide 28/31

    Open Problems

    Is there a learning method that can learn the best decision rule in an approximation class, given a data set of n finite horizon trajectories?

    Sieve estimators or regularized estimators?

    Dealing with high dimensional X: feature extraction, feature selection.

  • Slide 29/31

    This seminar can be found at:

    http://www.stat.lsa.umich.edu/~samurphy/seminars/ims_bernoulli_0704.ppt

    The paper can be found at:

    http://www.stat.lsa.umich.edu/~samurphy/papers/Qlearning.pdf

    [email protected]

  • Slide 30/31

    Recall Proof Outline

    (2) [step not captured in the transcript]

    It turns out that also [formula not captured in the transcript].

  • Slide 31/31

    Recall Proof Outline

    (1) [step not captured in the transcript]