Bernoulli 0704
-
7/29/2019 Bernoulli 0704
1/31
Q-Learning and Dynamic Treatment Regimes
S.A. Murphy
Univ. of Michigan
IMS/Bernoulli: July, 2004
-
Outline
Dynamic Treatment Regimes
Optimal Q-functions and Q-learning
The Problem & Goal
Finite Sample Bounds
Outline of Proof
Shortcomings and Open Problems
-
Dynamic Treatment Regimes
---- Multi-stage decision problems: repeated decisions are made over time on each patient.
---- Used in the management of Addictions, Mental Illnesses, HIV infection and Cancer
-
k Decisions
Observations made prior to tth decision
Action at tth decision
Primary Outcome:
-
A dynamic treatment regime is a vector of decision
rules, one per decision
If the regime is implemented then
-
Goal: Estimate the decision rules that maximize mean
Data: Data set of n finite horizon trajectories, each with randomized actions.
are randomization probabilities.
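As a rough illustration of the kind of data set described here, the sketch below simulates n finite horizon trajectories in which each binary action is randomized with a known probability. The function name `simulate_trajectories`, the transition dynamics, and the choice of the final observation as the primary outcome are all hypothetical and serve only to make the example runnable; they are not taken from the talk.

```python
import numpy as np

def simulate_trajectories(n, k=2, p=0.5, seed=0):
    """Simulate n trajectories with k decisions and randomized binary actions.

    Everything about the dynamics below is a made-up assumption for illustration.
    Returns X of shape (n, k + 1), A of shape (n, k), and Y of shape (n,).
    """
    rng = np.random.default_rng(seed)
    X = np.zeros((n, k + 1))          # X[:, t] is observed before decision t + 1
    A = np.zeros((n, k), dtype=int)   # A[:, t] is the action at decision t + 1
    X[:, 0] = rng.normal(size=n)
    for t in range(k):
        # Randomized action: 1 with known probability p, independent of history.
        A[:, t] = rng.binomial(1, p, size=n)
        # Hypothetical dynamics: action 1 helps when the current observation
        # is positive and hurts when it is negative.
        effect = (2 * A[:, t] - 1) * np.sign(X[:, t])
        X[:, t + 1] = 0.5 * X[:, t] + effect + rng.normal(size=n)
    Y = X[:, k]                       # hypothetical primary outcome
    return X, A, Y
```

Because the randomization probability `p` is fixed by design, it is known to the analyst, which is the role the randomization probabilities play in this setting.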
-
Optimal Q-functions and Q-learning:
Definition:
denotes expectation when the actions are chosen
according to the regime
-
Q-functions:
The Q-functions for the optimal regime are given recursively by
For t = k, k-1, ...
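A standard way to write such a backward recursion is sketched below. The notation is an assumption rather than the talk's own: H_t denotes the history available before the t-th decision (all earlier observations and actions together with the current observation), A_t the t-th action, and Y the primary outcome.

```latex
\begin{align*}
Q_k(h_k, a_k) &= E\left[\, Y \mid H_k = h_k,\ A_k = a_k \,\right], \\
Q_t(h_t, a_t) &= E\left[\, \max_{a_{t+1}} Q_{t+1}(H_{t+1}, a_{t+1}) \;\middle|\; H_t = h_t,\ A_t = a_t \,\right],
\qquad t = k-1, \ldots, 1.
\end{align*}
```

Under this form, each Q-function is the conditional mean outcome when action a_t is taken now and the best available action is taken at every later decision.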
-
Q-functions:
The optimal regime is given by
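In the usual formulation, the optimal rule at each decision picks an action that maximizes the corresponding optimal Q-function. Writing d_t^* for the t-th optimal decision rule and h_t for the history before the t-th decision (both symbols are assumptions, not notation recovered from the slide), this reads:

```latex
d_t^{*}(h_t) \in \arg\max_{a_t} Q_t(h_t, a_t), \qquad t = 1, \ldots, k.
```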
-
Q-learning:
Given a model for the Q-functions, minimize
over
Set
-
Q-learning:
For each t = k-1, ..., 1 minimize
over
And set
and so on.
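The backward fitting procedure can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the talk's implementation: the working model is a hypothetical linear one, Q_t(x, a; theta) = theta . (1, x, a, a*x), it uses only the current scalar observation rather than the full history, and each minimization is an ordinary least squares fit. The names `features`, `fit_q_learning`, and `greedy_action` are invented for this example.

```python
import numpy as np

def features(x, a):
    # Hypothetical working model: Q_t(x, a; theta) = theta . (1, x, a, a * x).
    return np.column_stack([np.ones_like(x), x, a, a * x])

def fit_q_learning(X, A, Y, actions=(0, 1)):
    """Backward-recursive least squares Q-learning.

    X: (n, k) observations, A: (n, k) actions, Y: (n,) primary outcome.
    Returns a list of k coefficient vectors, one per decision.
    """
    n, k = A.shape
    thetas = [None] * k
    target = Y.astype(float)  # at the last decision, regress the outcome itself
    for t in reversed(range(k)):
        Phi = features(X[:, t], A[:, t])
        # Minimize the empirical squared error between target and Q_t(.; theta).
        thetas[t], *_ = np.linalg.lstsq(Phi, target, rcond=None)
        # Target for the previous decision: fitted Q_t maximized over actions.
        q_all = np.column_stack(
            [features(X[:, t], np.full(n, a)) @ thetas[t] for a in actions]
        )
        target = q_all.max(axis=1)
    return thetas

def greedy_action(theta, x, actions=(0, 1)):
    # Estimated decision rule: pick the action with the largest fitted Q-value.
    x = np.atleast_1d(np.asarray(x, dtype=float))
    q_all = np.column_stack(
        [features(x, np.full(x.shape, a)) @ theta for a in actions]
    )
    return np.asarray(actions)[q_all.argmax(axis=1)]
```

Calling `fit_q_learning` on arrays of observations, randomized actions, and outcomes returns one coefficient vector per decision; `greedy_action(thetas[t], x)` then evaluates the estimated rule for decision t + 1 at observation `x`. If the linear working model is wrong, the fitted rules are only the best the procedure finds within that approximation class.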
-
Q-learning:
The estimated regime is given by
-
The Problem & Goal:
Most learning (e.g. estimation) methods utilize a model
for all or parts of the multivariate distribution of
implicitly constrains the class of possible decision rules in
the dynamic treatment regime: call this constrained class,
is a vector with many components (high dimensional) thus
the model is likely incorrect; view and as approximation
classes.
-
Goal: Given a learning method and approximation classes,
assess the ability of the learning method to produce the best decision
rules in the class.
Ideally construct an upper bound for
where is the estimator of the regime
denotes expectation when the actions are chosen according
to the rule
-
Goal: Given a learning method, model and approximation class, construct a finite sample upper bound for
This upper bound should be composed of quantities that are
minimized in the learning method.
Learning Method is Q-learning.
-
Finite Sample Bounds:
Primary Assumptions:
(1)
for L>1.
(2) Number of possible actions is finite.
-
Definition:
where E, without a subscript, denotes expectation when
the actions are randomized.
-
Results:
Approximation Error:
The minimum is over with
-
Define
The estimation error involves the complexity of this
space.
-
Estimation Error:
For with probability at least 1-
for n satisfying
-
If is finite then n needs only to satisfy
that is,
-
Outline of Proof:
The Q-functions for regime are given by
-
Proof Outline
(1)
-
Proof Outline
(2)
It turns out that also
-
Proof Outline
(3)
-
Shortcomings and Open Problems
-
Recall Estimation Error:
For with probability at least 1-
for n satisfying
-
Open Problems
Is there a learning method that can learn the best
decision rule in an approximation class given a dataset of n finite horizon trajectories?
Sieve Estimators or Regularized Estimators?
Dealing with high dimensional X: feature extraction, feature selection.
-
This seminar can be found at:
http://www.stat.lsa.umich.edu/~samurphy/seminars/ims_bernoulli_0704.ppt
The paper can be found at:
http://www.stat.lsa.umich.edu/~samurphy/papers/Qlearning.pdf
-
Recall Proof Outline
(2)
It turns out that also
-
Recall Proof Outline
(1)