A Decision-Theoretic Model of Assistance - Evaluation, Extension and Open Problems
Sriraam Natarajan, Kshitij Judah, Prasad Tadepalli and Alan Fern
School of EECS, Oregon State University
Outline
- Introduction
- Decision-Theoretic Model
- Experiment with folder predictor
- Incorporating Relational Hierarchies
- Open Problems
- Conclusion
Motivation
- Several assistant systems have been proposed to:
  - Assist users in daily tasks
  - Reduce their cognitive load
- Examples: CALO (CALO 2003), COACH (Boger et al. 2005), etc.
- Problems with previous work:
  - Fine-tuned to particular application domains
  - Utilize specialized technologies
  - Lack an overarching framework
Interaction Model
[Figure: an episode of user-assistant interaction. From the initial state W1, the user (action set U) takes an action, after which the assistant (action set A) takes one or more actions; control alternates through states W2 to W8 until the goal is achieved at W9 and the user says "Thank you". The assistant's goal is to minimize the user's actions.]
Markov Decision Process
- MDP: $(S, A, T, R, I)$
- Policy $\pi$: mapping from $S$ to $A$
- $V(\pi) = E\left(\sum_{t=1}^{T} r_t\right)$, where $T$ = length of the episode
- Optimal policy $\pi^* = \arg\max_{\pi} V(\pi)$
- A Partially Observable Markov Decision Process (POMDP) adds:
  - $O$: the set of observations
  - $\mu(o \mid s)$: a distribution over observations $o \in O$ given the current state $s$
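The value and optimal-policy definitions above can be made concrete with finite-horizon backward induction. The sketch below is a minimal illustration on a hypothetical two-state MDP; the dictionary encoding of $T$ and $R$ and the function name `finite_horizon_optimal` are assumptions, not part of the slides. It returns the optimal values for an episode of length `horizon` and a time-indexed optimal policy.

```python
from typing import Dict, List, Tuple

# Hypothetical tabular encoding: T[s][a] lists (next_state, probability) pairs,
# and R[s][a] is the immediate reward r for taking action a in state s.
Transitions = Dict[str, Dict[str, List[Tuple[str, float]]]]
Rewards = Dict[str, Dict[str, float]]


def finite_horizon_optimal(T: Transitions, R: Rewards, horizon: int):
    """Backward induction: V_k(s) = max_a [R(s, a) + sum_s' T(s, a, s') V_{k-1}(s')]."""
    V = {s: 0.0 for s in T}  # zero steps to go: no more reward
    policy: List[Dict[str, str]] = []
    for _ in range(horizon):
        new_V: Dict[str, float] = {}
        step_policy: Dict[str, str] = {}
        for s, actions in T.items():
            # One-step lookahead value of each action in state s
            q = {a: R[s][a] + sum(p * V[s2] for s2, p in outcomes)
                 for a, outcomes in actions.items()}
            best = max(q, key=q.get)
            new_V[s], step_policy[s] = q[best], best
        V = new_V
        policy.insert(0, step_policy)  # policy[t] is the decision rule at step t
    return V, policy


# Toy example: from "a", "go" earns 1 and moves to "b", where "stay" earns 2 per step.
T = {"a": {"stay": [("a", 1.0)], "go": [("b", 1.0)]}, "b": {"stay": [("b", 1.0)]}}
R = {"a": {"stay": 0.0, "go": 1.0}, "b": {"stay": 2.0}}
V, policy = finite_horizon_optimal(T, R, horizon=3)
print(V, policy[0])  # {'a': 5.0, 'b': 6.0} {'a': 'go', 'b': 'stay'}
```

The value of the episode in the slide's sense is `V[I]` for the initial state `I`; the policy is time-indexed because the horizon is finite.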
Decision-Theoretic Model (Fern et al. 07)
- Assistant: history-dependent stochastic policy $\pi'(a \mid w, O)$
- Observables: world states, agent's actions
- Hidden: agent's goals
- An episode begins at state $w$ with goal $g$
- $C(w, g, \pi, \pi')$: cost of the episode
- Objective: compute $\pi'$ that minimizes $E[C(I, G_0, \pi, \pi')]$
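For a fixed assistant policy, the objective $E[C(I, G_0, \pi, \pi')]$ can be estimated by simulation: sample a goal from the prior $G_0$, run an episode, and average the costs. The sketch below assumes a hypothetical one-dimensional world (integer positions, start $I = 0$, goals are target positions) in which only user actions are costed; the world, both assistant policies and all function names are illustrative, not the authors' setup.

```python
import random
from typing import Callable, Dict, List

# Hypothetical 1-D world: states are integer positions and every episode starts
# at I = 0. A goal g is a target position; the cost C counts user actions only.
Assistant = Callable[[int, List[int]], int]  # (w, observed user steps) -> step, 0 = noop


def user_policy(w: int, g: int) -> int:
    """pi(w, g): the user always steps one position toward the goal."""
    return 1 if g > w else -1


def noop_assistant(w: int, history: List[int]) -> int:
    """An assistant policy that never acts."""
    return 0


def copycat_assistant(w: int, history: List[int]) -> int:
    """A hypothetical heuristic: repeat the user's last observed step."""
    return history[-1] if history else 0


def episode_cost(g: int, assistant: Assistant) -> int:
    """C(I, g, pi, pi'): number of user actions until the goal g is reached."""
    w, history, cost = 0, [], 0
    while w != g:
        u = user_policy(w, g)
        w, cost = w + u, cost + 1
        history.append(u)
        if w != g:
            w += assistant(w, history)  # assistant actions cost the user nothing
    return cost


def expected_cost(prior: Dict[int, float], assistant: Assistant,
                  n: int = 1000, seed: int = 0) -> float:
    """Monte Carlo estimate of E[C(I, G0, pi, pi')], sampling goals from the prior G0."""
    rng = random.Random(seed)
    goals = rng.choices(list(prior), weights=list(prior.values()), k=n)
    return sum(episode_cost(g, assistant) for g in goals) / n


prior = {3: 0.5, -3: 0.5}
print(expected_cost(prior, noop_assistant), expected_cost(prior, copycat_assistant))  # 3.0 2.0
```

Because the user here is deterministic and the two goals are symmetric, every sampled episode has the same cost, so the estimates are exact; with a stochastic user the same loop gives a noisy estimate whose accuracy grows with `n`.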
Assistant POMDP
Given the MDP $\langle W, A, A', T, C, I \rangle$, the goal distribution $G_0$ and the user policy $\pi$, the assistant POMDP is defined as:
- State space: $W \times G$
- Action set: $A'$
- Transition function $T'$:

$$
T'\big((w,g), a', (w',g')\big) =
\begin{cases}
0 & \text{if } g \neq g' \\
T(w, a', w') & \text{if } g = g' \text{ and } a' \neq \text{noop} \\
P\big(T(w, \pi(w,g)) = w'\big) & \text{if } g = g' \text{ and } a' = \text{noop}
\end{cases}
$$

- Cost model $C'$:

$$
C'\big((w,g), a'\big) =
\begin{cases}
C(w, a') & \text{if } a' \neq \text{noop} \\
E[C(w, a)] \text{ where } a \sim \pi(w, g) & \text{if } a' = \text{noop}
\end{cases}
$$
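The definitions of $T'$ and $C'$ can be built mechanically from tabular $T$, $C$ and a stochastic user policy $\pi(a \mid w, g)$: on a noop the user acts, so the world dynamics and the cost are averaged over the user's action distribution. The sketch below assumes a hypothetical dictionary layout for the tables and the names `make_assistant_pomdp` and `NOOP`; it is an illustration, not the authors' implementation.

```python
# Hypothetical tabular inputs (not from the slides):
#   T[(w, a)]       -> dict of next world state w' -> probability
#   C[(w, a)]       -> cost of action a in world state w
#   user_pi[(w, g)] -> dict of user action a -> probability pi(a | w, g)
NOOP = "noop"


def make_assistant_pomdp(T, C, user_pi):
    """Build T' and C' over states (w, g) as defined for the assistant POMDP."""

    def T_prime(wg, a_prime, wg_next) -> float:
        (w, g), (w_next, g_next) = wg, wg_next
        if g != g_next:
            return 0.0  # the user's goal is fixed within an episode
        if a_prime != NOOP:
            return T[(w, a_prime)].get(w_next, 0.0)  # assistant acts directly
        # noop: the user acts, so average the world dynamics over pi(a | w, g)
        return sum(p * T[(w, a)].get(w_next, 0.0) for a, p in user_pi[(w, g)].items())

    def C_prime(wg, a_prime) -> float:
        w, g = wg
        if a_prime != NOOP:
            return C[(w, a_prime)]
        # noop: expected cost of the user's own action
        return sum(p * C[(w, a)] for a, p in user_pi[(w, g)].items())

    return T_prime, C_prime


# Toy example: from "start", a user with goal "gL" goes "left" with probability 0.9.
T = {("start", "left"): {"L": 1.0}, ("start", "right"): {"R": 1.0}}
C = {("start", "left"): 1.0, ("start", "right"): 1.0}
user_pi = {("start", "gL"): {"left": 0.9, "right": 0.1}}
T_prime, C_prime = make_assistant_pomdp(T, C, user_pi)
print(T_prime(("start", "gL"), NOOP, ("L", "gL")))     # 0.9
print(T_prime(("start", "gL"), "right", ("R", "gL")))  # 1.0
```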
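[Figure: the assistant POMDP as a graphical model over the hidden goal $G$, states $S_t$ and $S_{t+1}$, world states $W_t$ and $W_{t+1}$, user actions $A_t$ and $A_{t+1}$, and assistant actions $A'_t$ and $A'_{t+1}$.]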
Approximate Solution Approach
[Figure: the assistant, consisting of a goal recognizer that maintains $P(G)$ and an action-selection component, observes the environment ($W_t$, $O_t$); the user takes actions $U_t$ and the assistant takes actions $A_t$.]
Online action selection cycle:
1) Estimate the posterior goal distribution given the observations
2) Select an action via myopic heuristics
Goal Estimation
[Figure: at the current state $W_t$ the assistant holds the goal posterior $P(G \mid O_t)$ given observations up to time $t$; the new observation (user action $U_t$ and next state $W_{t+1}$) yields the updated goal posterior $P(G \mid O_{t+1})$.]
Given:
- $P(G \mid O_t)$: goal posterior at time $t$
- $P(U_t \mid G, W_t)$: user policy, which must be learned
- $O_{t+1}$: new observation of the user action and world state
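The slide leaves the update rule implicit. Assuming the world dynamics do not depend on the goal once the user's action is known, Bayes' rule gives $P(G \mid O_{t+1}) \propto P(U_t \mid G, W_t)\, P(G \mid O_t)$. The sketch below applies that rule with a tabular user policy; the table layout, the function name `update_goal_posterior` and the example goals and actions are hypothetical.

```python
from typing import Dict, Hashable, Tuple

Posterior = Dict[Hashable, float]
# Hypothetical learned user policy: user_policy[(w, g)][u] = P(U_t = u | G = g, W_t = w)
UserPolicy = Dict[Tuple[Hashable, Hashable], Dict[Hashable, float]]


def update_goal_posterior(posterior: Posterior, user_policy: UserPolicy,
                          w: Hashable, u: Hashable) -> Posterior:
    """One Bayesian step: P(g | O_{t+1}) is proportional to P(u | g, w) * P(g | O_t)."""
    unnormalized = {g: p * user_policy[(w, g)].get(u, 0.0) for g, p in posterior.items()}
    z = sum(unnormalized.values())
    if z == 0.0:
        # A learned policy would normally be smoothed so that this cannot happen.
        raise ValueError("observed action has zero probability under every goal")
    return {g: v / z for g, v in unnormalized.items()}


user_policy = {
    ("desktop", "report"): {"open_word": 0.8, "open_ppt": 0.2},
    ("desktop", "slides"): {"open_word": 0.2, "open_ppt": 0.8},
}
posterior = update_goal_posterior({"report": 0.5, "slides": 0.5}, user_policy,
                                  "desktop", "open_word")
print(posterior)  # approximately {'report': 0.8, 'slides': 0.2}
```

Calling the function once per observed user action implements step 1 of the online cycle; the quality of the posterior depends directly on how well the user policy table has been learned.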
Action Selection: Assistant POMDP
[Figure: the assistant POMDP over $A'_t$, $W_t$, $W_{t+1}$, $W_{t+2}$, the user action $U$ and the goal $G$, alongside the corresponding assistant MDP over $A'_t$, $W_t$, $W_{t+2}$.]
- Assume we know the user's goal $G$ and policy:
  - We can create a corresponding assistant MDP over assistant actions
  - We can compute $Q(A, W, G)$, the value of taking assistive action $A$ when the user's goal is $G$
- Select the action that maximizes the expected (myopic) value:

$$Q(A, W) = \sum_{G} P(G \mid O_t)\, Q(A, W, G)$$
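The selection rule is a posterior-weighted average of per-goal Q-values followed by an argmax. The sketch below assumes the per-goal values $Q(A, W, G)$ are already available as a function (here a hypothetical lookup table that ignores $W$); the function name `select_assistant_action` is illustrative.

```python
from typing import Callable, Dict, Hashable, Iterable, Tuple

# Q(a, w, g): value of assistive action a in world state w when the user's goal is g.
QFunction = Callable[[Hashable, Hashable, Hashable], float]


def select_assistant_action(actions: Iterable[Hashable], w: Hashable,
                            posterior: Dict[Hashable, float],
                            q: QFunction) -> Tuple[Hashable, Dict[Hashable, float]]:
    """Return argmax_a sum_g P(g | O_t) Q(a, w, g) together with every action's expected value."""
    values = {a: sum(p * q(a, w, g) for g, p in posterior.items()) for a in actions}
    return max(values, key=values.get), values


# Hypothetical per-goal Q-values for three assistive actions
q_table = {
    ("open_word", "report"): 1.0, ("open_word", "slides"): -1.0,
    ("open_ppt", "report"): -1.0, ("open_ppt", "slides"): 1.0,
    ("noop", "report"): 0.0, ("noop", "slides"): 0.0,
}
best, values = select_assistant_action(
    ["open_word", "open_ppt", "noop"], "desktop",
    {"report": 0.8, "slides": 0.2},
    lambda a, w, g: q_table[(a, g)],  # this toy Q ignores w
)
print(best)  # open_word
```

Including `noop` among the candidates lets the assistant decline to act when no assistive action has positive expected value under the current goal uncertainty.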
Folder Predictor
- Previous work (Bao et al. 2006):
  - No repredictions
  - Does not consider new folders
- Decision-theoretic model:
  - Naturally handles repredictions
  - Uses a mixture density to obtain the distribution
- Data set: a set of Open and saveAs requests
- Folder hierarchy: 226 folders
- Prior distribution initialized according to the model of Bao et al.:

$$P(f) = \mu_0 P_0(f) + (1 - \mu_0) P_l(f)$$
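The slides do not say what $P_0$ and $P_l$ are, so the sketch below simply treats them as two given distributions over folders and forms their convex combination with weight $\mu_0$; the result is again a distribution whenever both components are. The function name, folder paths and probabilities are hypothetical.

```python
from typing import Dict


def mixture_prior(p0: Dict[str, float], pl: Dict[str, float], mu0: float) -> Dict[str, float]:
    """P(f) = mu0 * P0(f) + (1 - mu0) * Pl(f); folders missing from a component get 0 there."""
    if not 0.0 <= mu0 <= 1.0:
        raise ValueError("mu0 must lie in [0, 1]")
    folders = set(p0) | set(pl)
    return {f: mu0 * p0.get(f, 0.0) + (1.0 - mu0) * pl.get(f, 0.0) for f in folders}


# Hypothetical folder distributions
prior = mixture_prior({"/papers": 0.5, "/code": 0.5}, {"/papers": 1.0}, mu0=0.5)
print(sorted(prior.items()))  # [('/code', 0.25), ('/papers', 0.75)]
```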
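Avg. no. of clicks per Open/saveAs:

                          No repredictions   With repredictions
Restricted folder set     1.3724             1.319
All folders considered    1.34               1.2344

Current TaskTracer corresponds to the restricted folder set without repredictions (1.3724); the Full Assistant Framework corresponds to all folders considered with repredictions (1.2344).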
Incorporating Relational Hierarchies
- Tasks are hierarchical
  - e.g., writing a paper
- Tasks have a natural class-subclass hierarchy
  - e.g., papers to ICML or IJCAI involve similar subtasks
- Tasks are chosen based on some attribute of the world
  - e.g., grad students work on a paper closer to the deadline
- Goal: combine these ideas to
  - Specify prior knowledge easily
  - Accelerate learning of the parameters
Doorman Domain
[Figure: relational task hierarchy for the Doorman domain. Tasks and actions: Gather(R), Attack(E), Collect(R), Deposit(R,S), DestroyCamp(E), KillDragon(D), Goto(L), Pickup(R), DropOff(R,S), Destroy(E), Kill(D), Move(X), Open(D). Constraints: L = R.Loc, L = S.Loc, L = E.Loc, L = D.Loc, R.Type = S.Type, E.Type = D.Type.]
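One way to read the constraints in the figure is as bindings from a parent task's arguments to its children's arguments. The sketch below assumes, as the labels suggest but the slides do not state explicitly, that Collect(R) decomposes into Goto(L) and Pickup(R) with L = R.Loc, and that Deposit(R,S) decomposes into Goto(L) and DropOff(R,S) with L = S.Loc and R.Type = S.Type. The `Obj` class, its `kind` and `loc` fields, and the `expand` function are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Obj:
    """A domain object with the attributes used in the hierarchy's constraints."""
    name: str
    kind: str              # stands in for the .Type attribute
    loc: Tuple[int, int]   # stands in for the .Loc attribute


Subtask = Tuple[str, tuple]


def expand(task: str, args: tuple) -> List[Subtask]:
    """Expand one parameterized task into child subtasks (assumed decompositions)."""
    if task == "Collect":
        (r,) = args
        return [("Goto", (r.loc,)), ("Pickup", (r,))]  # bind L = R.Loc
    if task == "Deposit":
        r, s = args
        if r.kind != s.kind:  # constraint R.Type = S.Type
            raise ValueError(f"{r.name} cannot be deposited at {s.name}")
        return [("Goto", (s.loc,)), ("DropOff", (r, s))]  # bind L = S.Loc
    raise KeyError(f"no decomposition defined for {task}")


gold = Obj("gold1", "gold", (2, 3))
gold_store = Obj("store1", "gold", (0, 0))
print(expand("Collect", (gold,)))
print(expand("Deposit", (gold, gold_store)))
```

Because the rules are written over variables rather than particular objects, the same two rules cover every resource and storehouse; this is one way a relational hierarchy could let parameters be shared across objects of the same class.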
Performance of different models
[Figure: savings (y-axis, 0 to 0.9) versus number of episodes (x-axis, in units of 10, from 1 to 25) for four models: Relational Hierarchies, Hierarchical Model, Flat Model and Relational Model.]
Open Problems: Partial Observability of the User
- Currently the user completely observes the environment
  - Not the case in the real world: the user need not know what is in the refrigerator
- The assistant can completely observe the world
- The current system does not consider the user's exploratory actions
- The setting is similar to interactive POMDPs (Doshi et al.)
  - Environment: POMDP
  - Belief states of the POMDP are belief states of the user
  - The state space needs to be extended to capture the user's beliefs
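If the environment is a POMDP, the user's belief would evolve by the standard POMDP belief update, $b'(s') \propto \mu(o \mid s') \sum_{s} T(s, a, s')\, b(s)$, using the observation model $\mu(o \mid s)$ from the earlier MDP slide; an assistant would then have to track this belief as part of its state. The sketch below is a minimal tabular version; the refrigerator states, observations and probabilities are hypothetical.

```python
from typing import Dict, Hashable, Tuple

Belief = Dict[Hashable, float]


def belief_update(b: Belief, a: Hashable, o: Hashable,
                  T: Dict[Tuple[Hashable, Hashable], Dict[Hashable, float]],
                  mu: Dict[Hashable, Dict[Hashable, float]]) -> Belief:
    """b'(s') is proportional to mu(o | s') * sum_s T(s, a, s') * b(s)."""
    predicted: Belief = {}
    for s, p in b.items():  # prediction step through the transition model
        for s_next, pt in T[(s, a)].items():
            predicted[s_next] = predicted.get(s_next, 0.0) + pt * p
    unnormalized = {s: p * mu[s].get(o, 0.0) for s, p in predicted.items()}
    z = sum(unnormalized.values())
    if z == 0.0:
        raise ValueError("observation is impossible under the current belief")
    return {s: v / z for s, v in unnormalized.items()}


# Hypothetical refrigerator example: opening the door does not change its contents,
# and the user's (noisy) glance reports "see_milk" with the probabilities below.
T = {("milk", "open_fridge"): {"milk": 1.0}, ("no_milk", "open_fridge"): {"no_milk": 1.0}}
mu = {"milk": {"see_milk": 0.9, "no_sign": 0.1}, "no_milk": {"see_milk": 0.2, "no_sign": 0.8}}
b = belief_update({"milk": 0.5, "no_milk": 0.5}, "open_fridge", "see_milk", T, mu)
print(b)  # P(milk) = 9/11, about 0.82
```

Opening the fridge here is an exploratory action in the slide's sense: it changes only the user's belief, not the world, which is exactly what a fully observable user model cannot represent.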
Open Problems: Large State Space
- Solving the POMDP is impractical
  - Kitchen domain (Fern et al.): 140,000 states
- Prune certain regions of the search space (Electric Elves)
- Can use user trajectories as training examples

Parallel subgoals/actions
- The assistant and user execute actions in parallel
- Useful for executing parallel subgoals: the user writes a paper while the assistant runs experiments
- Identification of the possible parallel actions
- The assistant can change the goal stack of the user
- Goal estimation has to include the user's response
Open Problems: Changing Goals
- The user can change goals midway, e.g., to work on a different project
- Currently, the system would converge to the new goal slowly
- Explicitly model this possibility
- Borrow ideas from user modeling to predict changing goals

Expanding set of goals
- A large number of dishes can be cooked

Forgetting subgoals
- e.g., forgetting to attach a document to an email
- Explicitly model this possibility: borrow ideas from the cognitive science literature
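One standard way to model goal changes explicitly, borrowed from hidden Markov models and offered here only as an illustration, is to let the goal switch with a small probability $\rho$ before each observation: $P(g) \leftarrow (1 - \rho) P(g) + \rho / |G|$, followed by the usual Bayesian update. The sketch also reproduces the slow convergence noted above. The parameter `rho`, the two-goal likelihoods and both function names are hypothetical.

```python
from typing import Dict, Optional

Posterior = Dict[str, float]


def switching_update(posterior: Posterior, likelihood: Posterior, rho: float) -> Posterior:
    """Goal-switch prediction followed by a Bayesian update.

    With probability rho the user abandons the current goal for one drawn
    uniformly from all goals (rho is a hypothetical switching parameter).
    """
    n = len(posterior)
    predicted = {g: (1.0 - rho) * p + rho / n for g, p in posterior.items()}
    unnormalized = {g: predicted[g] * likelihood[g] for g in predicted}
    z = sum(unnormalized.values())
    return {g: v / z for g, v in unnormalized.items()}


def steps_to_recognize_switch(rho: float, n_before: int = 20,
                              threshold: float = 0.5, max_steps: int = 1000) -> Optional[int]:
    """Feed n_before actions typical of goal A, then actions typical of goal B;
    return how many B-actions it takes before P(B) exceeds the threshold."""
    favors_a = {"A": 0.8, "B": 0.2}  # P(observed action | goal), hypothetical
    favors_b = {"A": 0.2, "B": 0.8}
    posterior = {"A": 0.5, "B": 0.5}
    for _ in range(n_before):
        posterior = switching_update(posterior, favors_a, rho)
    for k in range(1, max_steps + 1):
        posterior = switching_update(posterior, favors_b, rho)
        if posterior["B"] > threshold:
            return k
    return None


print(steps_to_recognize_switch(rho=0.0), steps_to_recognize_switch(rho=0.1))
```

With `rho=0.0` the posterior needs roughly as many contrary observations as it has already seen for the old goal (about 20 here); with `rho=0.1` the switching term keeps some probability on every goal, so the new goal is recognized after two observations. The price is a posterior that never becomes fully confident.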
Conclusion
- Proposed a general framework based on decision theory
- Experiments in a real-world domain
  - Repredictions are useful
- Currently working on a relational hierarchical model
- Outlined several open problems
- Motivated the necessity of using sophisticated user models

Thank you!