A Decision-Theoretic Model of Assistance - Evaluation, Extension and Open Problems
![Page 1: A Decision-Theoretic Model of Assistance - Evaluation, Extension and Open Problems](https://reader033.fdocuments.in/reader033/viewer/2022051619/56812a46550346895d8d8151/html5/thumbnails/1.jpg)
A Decision-Theoretic Model of Assistance - Evaluation, Extension and Open Problems
Sriraam Natarajan, Kshitij Judah, Prasad Tadepalli and Alan Fern
School of EECS, Oregon State University
![Page 2: A Decision-Theoretic Model of Assistance - Evaluation, Extension and Open Problems](https://reader033.fdocuments.in/reader033/viewer/2022051619/56812a46550346895d8d8151/html5/thumbnails/2.jpg)
Outline
  Introduction
  Decision-Theoretic Model
  Experiment with folder predictor
  Incorporating Relational Hierarchies
  Open Problems
  Conclusion
![Page 3: A Decision-Theoretic Model of Assistance - Evaluation, Extension and Open Problems](https://reader033.fdocuments.in/reader033/viewer/2022051619/56812a46550346895d8d8151/html5/thumbnails/3.jpg)
Motivation
Several assistant systems proposed to
  Assist users in daily tasks
  Reduce their cognitive load
Examples: CALO (CALO 2003), COACH (Boger et al. 2005) etc.
Problems with previous work
  Fine-tuned to particular application domains
  Utilize specialized technologies
  Lack an overarching framework
![Page 4: A Decision-Theoretic Model of Assistance - Evaluation, Extension and Open Problems](https://reader033.fdocuments.in/reader033/viewer/2022051619/56812a46550346895d8d8151/html5/thumbnails/4.jpg)
Interaction Model
[Diagram: User (action set U), Assistant (action set A) and a Goal. A user action leads from the initial state W1 to W2.]
![Page 5: A Decision-Theoretic Model of Assistance - Evaluation, Extension and Open Problems](https://reader033.fdocuments.in/reader033/viewer/2022051619/56812a46550346895d8d8151/html5/thumbnails/5.jpg)
Interaction Model
[Diagram: User and Assistant. A user action leads from the initial state W1 to W2; assistant actions then lead through W3, W4 and W5. Assistant's goal: minimize the user's actions.]
![Page 6: A Decision-Theoretic Model of Assistance - Evaluation, Extension and Open Problems](https://reader033.fdocuments.in/reader033/viewer/2022051619/56812a46550346895d8d8151/html5/thumbnails/6.jpg)
Interaction Model
[Diagram: User and Assistant, with the Goal. States W1 (initial state) through W6, reached by alternating user actions and assistant actions.]
![Page 7: A Decision-Theoretic Model of Assistance - Evaluation, Extension and Open Problems](https://reader033.fdocuments.in/reader033/viewer/2022051619/56812a46550346895d8d8151/html5/thumbnails/7.jpg)
Interaction Model
[Diagram: User and Assistant (action set A). States W1 (initial state) through W8, reached by alternating user actions and assistant actions. Assistant's goal: minimize the user's actions.]
![Page 8: A Decision-Theoretic Model of Assistance - Evaluation, Extension and Open Problems](https://reader033.fdocuments.in/reader033/viewer/2022051619/56812a46550346895d8d8151/html5/thumbnails/8.jpg)
Interaction Model
[Diagram: User and Assistant. States W1 (initial state) through W9, reached by alternating user actions and assistant actions. At W9 the goal is achieved and the user says "Thank you".]
![Page 9: A Decision-Theoretic Model of Assistance - Evaluation, Extension and Open Problems](https://reader033.fdocuments.in/reader033/viewer/2022051619/56812a46550346895d8d8151/html5/thumbnails/9.jpg)
![Page 10: A Decision-Theoretic Model of Assistance - Evaluation, Extension and Open Problems](https://reader033.fdocuments.in/reader033/viewer/2022051619/56812a46550346895d8d8151/html5/thumbnails/10.jpg)
Markov Decision Process
MDP – $(S, A, T, R, I)$
Policy ($\pi$) – Mapping from $S$ to $A$
$V(\pi) = E\left(\sum_{t=1}^{T} r_t\right)$, $T$ = length of episode
Optimal Policy ($\pi^*$) $= \arg\max_{\pi} V(\pi)$
A Partially Observable Markov Decision Process (POMDP):
  $O$ is the set of observations
  $\mu(o \mid s)$ is a distribution over observations $o \in O$ given current state $s$
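To make the value definition $V(\pi)$ concrete, here is a minimal Python sketch that computes the expected total reward of a fixed policy over a finite episode. It is only an illustration under stated assumptions: the dictionary encodings of `T`, `R` and `policy`, the single start state (instead of an initial distribution $I$), the deterministic policy, and all names are hypothetical and do not come from the slides.

```python
def evaluate_policy(T, R, policy, start, horizon):
    """Expected total reward E[sum_{t=1..horizon} r_t] of a deterministic policy.

    T:      dict (s, a) -> {s_next: probability}   (assumed encoding)
    R:      dict (s, a) -> reward                  (assumed encoding)
    policy: dict s -> a
    """
    dist = {start: 1.0}  # distribution over states at the current step
    value = 0.0
    for _ in range(horizon):
        next_dist = {}
        for s, p in dist.items():
            a = policy[s]
            value += p * R[(s, a)]  # expected reward collected at this step
            for s_next, q in T[(s, a)].items():
                next_dist[s_next] = next_dist.get(s_next, 0.0) + p * q
        dist = next_dist
    return value


# Toy two-state MDP (hypothetical numbers).
T = {("s0", "go"): {"s0": 0.5, "s1": 0.5}, ("s1", "stay"): {"s1": 1.0}}
R = {("s0", "go"): -1.0, ("s1", "stay"): 0.0}
policy = {"s0": "go", "s1": "stay"}
v2 = evaluate_policy(T, R, policy, "s0", horizon=2)
```

Propagating the state distribution forward avoids sampling noise; for large state spaces a Monte Carlo estimate would be the more usual choice.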
![Page 11: A Decision-Theoretic Model of Assistance - Evaluation, Extension and Open Problems](https://reader033.fdocuments.in/reader033/viewer/2022051619/56812a46550346895d8d8151/html5/thumbnails/11.jpg)
Decision-Theoretic Model (Fern et al. 07)
Assistant: History-dependent stochastic policy $\pi'(a \mid w, O)$
Observables: World states, Agent's actions
Hidden: Agent's goals
Episode begins at state $w$ with goal $g$
$C(w, g, \pi, \pi')$: Cost of episode
Objective: compute $\pi'$ that minimizes $E[C(I, G_0, \pi, \pi')]$
![Page 12: A Decision-Theoretic Model of Assistance - Evaluation, Extension and Open Problems](https://reader033.fdocuments.in/reader033/viewer/2022051619/56812a46550346895d8d8151/html5/thumbnails/12.jpg)
Assistant POMDP
Given MDP $\langle W, A, A', T, C, I \rangle$, $G_0$ and $\pi$, the assistant POMDP is defined as:
  State space is $W \times G$
  Action set is $A'$
  Transition function $T'$ is
$$T'((w,g), a', (w',g')) = \begin{cases} 0 & \text{if } g \neq g' \\ T(w, a', w') & \text{if } a' \neq \text{noop} \\ P(T(w, \pi(w,g)) = w') & \text{if } a' = \text{noop} \end{cases}$$
  Cost model $C'$ is
$$C'((w,g), a') = \begin{cases} C(w, a') & \text{if } a' \neq \text{noop} \\ E[C(w, a)] \text{ where } a \text{ is distributed according to } \pi & \text{if } a' = \text{noop} \end{cases}$$
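The transition rule of the assistant POMDP can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the dictionary encodings of `T` and of the user policy `pi`, the `NOOP` constant and the function name `assistant_transition` are assumptions made for the example, as is passing user and assistant actions through the same `T`.

```python
NOOP = "noop"  # hypothetical name for the assistant's no-op action


def assistant_transition(T, pi, state, a_prime, next_state):
    """Probability T'((w, g), a', (w', g')) in the assistant POMDP.

    T:  dict (w, action) -> {w_next: probability}   (assumed encoding)
    pi: dict (w, g) -> {user_action: probability}   (assumed encoding)
    """
    (w, g), (w_next, g_next) = state, next_state
    if g != g_next:
        return 0.0  # the goal component of the state never changes
    if a_prime != NOOP:
        return T.get((w, a_prime), {}).get(w_next, 0.0)
    # On noop the user acts: marginalize the world transition over pi.
    return sum(p_a * T.get((w, a), {}).get(w_next, 0.0)
               for a, p_a in pi.get((w, g), {}).items())


# Toy model (hypothetical states, goals and actions).
T = {("w1", "u_step"): {"w2": 1.0},
     ("w1", "u_wait"): {"w1": 1.0},
     ("w1", "a_help"): {"w3": 0.9, "w1": 0.1}}
pi = {("w1", "g1"): {"u_step": 0.75, "u_wait": 0.25}}

p_noop = assistant_transition(T, pi, ("w1", "g1"), NOOP, ("w2", "g1"))
p_help = assistant_transition(T, pi, ("w1", "g1"), "a_help", ("w3", "g1"))
p_goal_change = assistant_transition(T, pi, ("w1", "g1"), "a_help", ("w3", "g2"))
```

The noop branch is where the user model enters: the assistant's view of the world dynamics depends on the hidden goal only through the user's policy.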
![Page 13: A Decision-Theoretic Model of Assistance - Evaluation, Extension and Open Problems](https://reader033.fdocuments.in/reader033/viewer/2022051619/56812a46550346895d8d8151/html5/thumbnails/13.jpg)
Assistant POMDP
[Diagram: graphical model over time steps $t$ and $t+1$ with nodes $G$, $W_t$, $S_t$, $A_t$, $A'_t$, $W_{t+1}$, $S_{t+1}$, $A_{t+1}$, $A'_{t+1}$.]
![Page 14: A Decision-Theoretic Model of Assistance - Evaluation, Extension and Open Problems](https://reader033.fdocuments.in/reader033/viewer/2022051619/56812a46550346895d8d8151/html5/thumbnails/14.jpg)
Approximate Solution Approach
[Diagram: the Assistant, containing a Goal Recognizer and an Action Selection component, interacts with the User and the Environment. Labels: $U_t$, $W_t$, $O_t$, $P(G)$, $A_t$.]
Online action selection cycle
  1) Estimate posterior goal distribution given observation
  2) Action selection via myopic heuristics
![Page 15: A Decision-Theoretic Model of Assistance - Evaluation, Extension and Open Problems](https://reader033.fdocuments.in/reader033/viewer/2022051619/56812a46550346895d8d8151/html5/thumbnails/15.jpg)
Goal Estimation
[Diagram: in the current state $W_t$ the assistant holds $P(G \mid O_t)$, the goal posterior given observations up to time $t$. The new observation ($U_t$, $W_{t+1}$) yields the updated goal posterior $P(G \mid O_{t+1})$.]
Given
  $P(G \mid O_t)$: Goal posterior at time $t$
  $P(U_t \mid G, W_t)$: User policy (must learn user policy)
  $O_{t+1}$: New observation of user action and world state
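The goal-estimation step can be sketched as a single Bayes update, $P(G \mid O_{t+1}) \propto P(U_t \mid G, W_t)\, P(G \mid O_t)$. The Python below is a minimal sketch under assumptions: it takes the user policy as a given callable (the slides note it must be learned), it assumes the world transition does not depend on the goal so that it cancels in the normalization, and the names `update_goal_posterior` and `toy_policy` are hypothetical.

```python
def update_goal_posterior(posterior, user_policy, world_state, user_action):
    """One Bayes step: P(G | O_{t+1}) proportional to P(U_t | G, W_t) * P(G | O_t).

    posterior:   dict goal -> P(G | O_t)
    user_policy: callable (action, goal, world_state) -> P(U_t | G, W_t)
                 (assumed interface)
    """
    unnormalized = {g: p * user_policy(user_action, g, world_state)
                    for g, p in posterior.items()}
    total = sum(unnormalized.values())
    if total == 0.0:
        # Action impossible under every goal: keep the previous belief.
        return dict(posterior)
    return {g: p / total for g, p in unnormalized.items()}


# Toy user policy (hypothetical numbers): P(action | goal), same in every state.
POLICY_TABLE = {("open", "g1"): 0.8, ("open", "g2"): 0.2,
                ("close", "g1"): 0.2, ("close", "g2"): 0.8}


def toy_policy(action, goal, world_state):
    return POLICY_TABLE.get((action, goal), 0.0)


prior = {"g1": 0.5, "g2": 0.5}
after_open = update_goal_posterior(prior, toy_policy, "w0", "open")
```

Applying the update once per observed user action gives the running posterior that the action-selection step consumes.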
![Page 16: A Decision-Theoretic Model of Assistance - Evaluation, Extension and Open Problems](https://reader033.fdocuments.in/reader033/viewer/2022051619/56812a46550346895d8d8151/html5/thumbnails/16.jpg)
Action Selection: Assistant POMDP
[Diagram: the assistant POMDP over $W_t$, $W_{t+1}$, $W_{t+2}$ with user action $U$, goal $G$ and assistant action $A'_t$, next to the corresponding assistant MDP over $W_t$ and $W_{t+2}$ with assistant action $A'_t$.]
Assume we know the user goal $G$ and policy
  Can create a corresponding assistant MDP over assistant actions
  Can compute $Q(A, W, G)$ giving value of taking assistive action $A$ when user's goal is $G$
Select action that maximizes expected (myopic) value:
$$Q(A, W) = \sum_{G} P(G \mid O_t)\, Q(A, W, G)$$
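The myopic selection rule amounts to averaging the per-goal values under the current goal posterior and taking the best action. The following Python is a minimal sketch, assuming $Q(A, W, G)$ is already available as a callable; the function name `select_action`, the toy actions and the toy values are hypothetical.

```python
def select_action(actions, goal_posterior, q_value, world_state):
    """Pick the assistant action maximizing Q(A, W) = sum_G P(G | O_t) * Q(A, W, G).

    goal_posterior: dict goal -> P(G | O_t)
    q_value:        callable (action, world_state, goal) -> Q(A, W, G)
                    (assumed interface)
    """
    def expected_q(action):
        return sum(p * q_value(action, world_state, g)
                   for g, p in goal_posterior.items())
    return max(actions, key=expected_q)


# Toy values (hypothetical): opening the right door helps, the wrong one hurts.
Q_TABLE = {("open_door_1", "g1"): 1.0, ("open_door_1", "g2"): -1.0,
           ("open_door_2", "g1"): -1.0, ("open_door_2", "g2"): 1.0,
           ("noop", "g1"): 0.0, ("noop", "g2"): 0.0}


def toy_q(action, world_state, goal):
    return Q_TABLE[(action, goal)]


ACTIONS = ["open_door_1", "open_door_2", "noop"]
choice_a = select_action(ACTIONS, {"g1": 0.7, "g2": 0.3}, toy_q, "w0")
choice_b = select_action(ACTIONS, {"g1": 0.2, "g2": 0.8}, toy_q, "w0")
```

When several actions tie, Python's `max` returns the first one in `actions`, so the ordering of the action list acts as a tie-breaker in this sketch.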
![Page 17: A Decision-Theoretic Model of Assistance - Evaluation, Extension and Open Problems](https://reader033.fdocuments.in/reader033/viewer/2022051619/56812a46550346895d8d8151/html5/thumbnails/17.jpg)
![Page 18: A Decision-Theoretic Model of Assistance - Evaluation, Extension and Open Problems](https://reader033.fdocuments.in/reader033/viewer/2022051619/56812a46550346895d8d8151/html5/thumbnails/18.jpg)
![Page 19: A Decision-Theoretic Model of Assistance - Evaluation, Extension and Open Problems](https://reader033.fdocuments.in/reader033/viewer/2022051619/56812a46550346895d8d8151/html5/thumbnails/19.jpg)
Folder Predictor
Previous work (Bao et al. 2006):
  No repredictions
  Does not consider new folders
Decision-Theoretic Model
  Naturally handles repredictions
  Considers mixture density to obtain the distribution
Data set – set of requests of Open and saveAs
Folder hierarchy – 226 folders
Prior distribution initialized according to the model of Bao et al.
$$P(f) = \mu_0 P_0(f) + (1 - \mu_0) P_l(f)$$
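The mixture density over folders can be sketched in a few lines of Python. The slide does not define $P_0$ and $P_l$, so this sketch simply treats them as two arbitrary distributions over folder names; the function name `mixture_prior` and the toy folders are hypothetical.

```python
def mixture_prior(p0, pl, mu0):
    """P(f) = mu0 * P0(f) + (1 - mu0) * Pl(f) for every folder in either input.

    p0, pl: dict folder -> probability (folders missing from one dict count as 0)
    mu0:    mixing weight in [0, 1]
    """
    if not 0.0 <= mu0 <= 1.0:
        raise ValueError("mu0 must lie in [0, 1]")
    folders = set(p0) | set(pl)
    return {f: mu0 * p0.get(f, 0.0) + (1.0 - mu0) * pl.get(f, 0.0)
            for f in folders}


# Toy distributions over two folders (hypothetical numbers).
p0 = {"papers": 0.5, "talks": 0.5}
pl = {"talks": 1.0}
prior = mixture_prior(p0, pl, mu0=0.25)
```

Because both inputs sum to one and the weights sum to one, the result is again a probability distribution over the union of folders.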
![Page 20: A Decision-Theoretic Model of Assistance - Evaluation, Extension and Open Problems](https://reader033.fdocuments.in/reader033/viewer/2022051619/56812a46550346895d8d8151/html5/thumbnails/20.jpg)
Avg. no. of clicks per open/saveAs

| | No Reprediction | With Repredictions |
|---|---|---|
| Restricted folder set | 1.3724 (Current Tasktracer) | 1.34 |
| All folders considered | 1.319 | 1.2344 (Full Assistant Framework) |
![Page 21: A Decision-Theoretic Model of Assistance - Evaluation, Extension and Open Problems](https://reader033.fdocuments.in/reader033/viewer/2022051619/56812a46550346895d8d8151/html5/thumbnails/21.jpg)
![Page 22: A Decision-Theoretic Model of Assistance - Evaluation, Extension and Open Problems](https://reader033.fdocuments.in/reader033/viewer/2022051619/56812a46550346895d8d8151/html5/thumbnails/22.jpg)
Incorporating Relational Hierarchies
Tasks are hierarchical
  Writing a paper
Tasks have a natural class – subclass hierarchy
  Papers to ICML or IJCAI involve similar subtasks
Tasks are chosen based on some attribute of the world
  Grad students work on a paper closer to the deadline
Goal: Combine these ideas to
  Specify prior knowledge easily
  Accelerate learning of the parameters
![Page 23: A Decision-Theoretic Model of Assistance - Evaluation, Extension and Open Problems](https://reader033.fdocuments.in/reader033/viewer/2022051619/56812a46550346895d8d8151/html5/thumbnails/23.jpg)
Doorman Domain
![Page 24: A Decision-Theoretic Model of Assistance - Evaluation, Extension and Open Problems](https://reader033.fdocuments.in/reader033/viewer/2022051619/56812a46550346895d8d8151/html5/thumbnails/24.jpg)
[Diagram: relational task hierarchy. Tasks: Gather(R), Attack(E), Collect(R), Deposit(R,S), DestroyCamp(E), KillDragon(D), Goto(L), Pickup(R), DropOff(R,S), Kill(D), Destroy(E), Move(X), Open(D). Constraints: L = R.Loc, L = S.Loc, L = D.Loc, L = E.Loc, R.Type = S.Type, E.Type = D.Type.]
![Page 25: A Decision-Theoretic Model of Assistance - Evaluation, Extension and Open Problems](https://reader033.fdocuments.in/reader033/viewer/2022051619/56812a46550346895d8d8151/html5/thumbnails/25.jpg)
Performance of different models
[Plot: savings (y-axis, ticks 0 to 0.9) against number of episodes x 10 (x-axis, 1 to 25) for four models: Relational Hierarchies, Hierarchical Model, Flat Model, Relational Model.]
![Page 26: A Decision-Theoretic Model of Assistance - Evaluation, Extension and Open Problems](https://reader033.fdocuments.in/reader033/viewer/2022051619/56812a46550346895d8d8151/html5/thumbnails/26.jpg)
![Page 27: A Decision-Theoretic Model of Assistance - Evaluation, Extension and Open Problems](https://reader033.fdocuments.in/reader033/viewer/2022051619/56812a46550346895d8d8151/html5/thumbnails/27.jpg)
Open Problems
Partial Observability of the user
  Currently user completely observes the environment
  Not the case in real-world – User need not know what is in the refrigerator
  Assistant can completely observe the world
  Current system does not consider user's exploratory actions
  Setting is similar to interactive POMDPs (Doshi et al.)
    Environment – POMDP
    Belief states of the POMDP are belief states of the user
    State space needs to be extended to capture user's beliefs
![Page 28: A Decision-Theoretic Model of Assistance - Evaluation, Extension and Open Problems](https://reader033.fdocuments.in/reader033/viewer/2022051619/56812a46550346895d8d8151/html5/thumbnails/28.jpg)
Open Problems
Large State space
  Solving POMDP is impractical
  Kitchen Domain (Fern et al.) – 140000 states
  Prune certain regions of the search space (Electric Elves)
  Can use user trajectories as training examples
Parallel subgoals/actions
  Assistant and user execute actions in parallel
  Useful to execute parallel subgoals - User writes paper, assistant runs experiments
  Identification of the possible parallel actions
  The assistant can change the goal stack of the user
  Goal estimation has to include the user's response
![Page 29: A Decision-Theoretic Model of Assistance - Evaluation, Extension and Open Problems](https://reader033.fdocuments.in/reader033/viewer/2022051619/56812a46550346895d8d8151/html5/thumbnails/29.jpg)
Open Problems
Changing goals
  User can change goal midway - Work on a different project
  Currently, the system would converge to the goal slowly
  Explicitly model this possibility
  Borrow ideas from user modeling to predict changing goals
Expanding set of goals
  A large number of dishes can be cooked
Forgetting subgoals
  Forgetting to attach a document to the email
  Explicitly model this possibility – borrow ideas from cognitive science literature
![Page 30: A Decision-Theoretic Model of Assistance - Evaluation, Extension and Open Problems](https://reader033.fdocuments.in/reader033/viewer/2022051619/56812a46550346895d8d8151/html5/thumbnails/30.jpg)
![Page 31: A Decision-Theoretic Model of Assistance - Evaluation, Extension and Open Problems](https://reader033.fdocuments.in/reader033/viewer/2022051619/56812a46550346895d8d8151/html5/thumbnails/31.jpg)
Conclusion
Propose a general framework based on decision-theory
Experiments in a real-world domain
Repredictions are useful
Currently working on a relational hierarchical model
Outlined several open problems
Motivated the necessity of using sophisticated user models
![Page 32: A Decision-Theoretic Model of Assistance - Evaluation, Extension and Open Problems](https://reader033.fdocuments.in/reader033/viewer/2022051619/56812a46550346895d8d8151/html5/thumbnails/32.jpg)
Thank you!!!