Learning Motion Prediction Models for Opponent Interception


Transcript of Learning Motion Prediction Models for Opponent Interception

Page 1: Learning Motion Prediction Models for Opponent Interception

Learning Motion Prediction Models for Opponent Interception

Bulent Tastan, David Chang, Gita Sukthankar

Page 2: Learning Motion Prediction Models for Opponent Interception

Intercepting Opponents
• The ability to intercept opponents is an important aspect of many adversarial games.
• Human players
  – exhibit user-specific movement preferences
  – don't necessarily prefer the shortest routes
  – are capable of intercepting opponents in a partially occluded map
• This paper presents a method for learning user path preferences from data and planning interception routes.


Page 3: Learning Motion Prediction Models for Opponent Interception

Framework


(1) Learning motion models

(2) Opponent tracking using Particle Filters

(3) Planning to intercept

Page 4: Learning Motion Prediction Models for Opponent Interception

Example Scenario
• The bot plays a series of repeated games against a human to learn the human's evasion strategies.
• The human needs to safely cross the area and is initially occluded from the bot.
• The bot's goal is to intercept the human player before they leave the map.
• The training map is a subsection of a larger Unreal Tournament maze.


Page 5: Learning Motion Prediction Models for Opponent Interception

Related Work
• Particle filters for opponent modeling
  – A sequential Monte Carlo state estimation technique in which the current probability distribution is represented as a set of particles and importance weights, which are resampled and reweighted based on observed data
  – Multiple aspects of the framework can be configured
    • (Bererton 2004): modify the number of particles to adapt the difficulty level of the AI
    • (Weber et al. 2011): StarCraft unit movement
    • (Hladky and Bulitko 2008): learning motion models from game logs in Counter-Strike
• Our method can learn motion models from a small number of logs and generalize them using inverse reinforcement learning.

Page 6: Learning Motion Prediction Models for Opponent Interception

Learning to intercept


Data collection

Max-Entropy IRL

Page 7: Learning Motion Prediction Models for Opponent Interception

Data Collection
• Gather a small set of traces from a specific human player.
• The player is instructed to use a small subset of the entrances/exits on the map.
• Log traces are converted into a feature-based representation.
• Features include: 1) distance to corners, 2) distance to map center, 3) distance to nearest exit, 4) quadrant information (binary descriptor); a sketch of this feature computation follows the list.
• The data is used to learn a user-specific model of the player's evasion preferences in the form of expected feature counts.
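The slides list the features but not their exact form, so the following is a minimal sketch, assuming 2-D positions, of how one logged position could be turned into that feature vector and how a trace's feature counts could be accumulated. The names (state_features, trace_feature_counts) and the lack of normalization are illustrative choices, not taken from the paper.

import numpy as np

def state_features(pos, corners, center, exits):
    """Feature vector for one logged position: distance to each map corner,
    distance to the map center, distance to the nearest exit, and a binary
    quadrant descriptor (which side of the center the player is on)."""
    pos = np.asarray(pos, dtype=float)
    d_corners = [np.linalg.norm(pos - np.asarray(c)) for c in corners]
    d_center = np.linalg.norm(pos - np.asarray(center))
    d_exit = min(np.linalg.norm(pos - np.asarray(e)) for e in exits)
    quadrant = [float(pos[0] >= center[0]), float(pos[1] >= center[1])]
    return np.array(d_corners + [d_center, d_exit] + quadrant)

def trace_feature_counts(trace, corners, center, exits):
    """Sum the per-position features along one logged trace; averaging these
    sums over traces gives the expected feature counts used for learning."""
    return sum(state_features(p, corners, center, exits) for p in trace)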

Page 8: Learning Motion Prediction Models for Opponent Interception

Learning to intercept


Max-Entropy IRL (Ziebart et al., 2008)

Page 9: Learning Motion Prediction Models for Opponent Interception

Inverse Reinforcement Learning

[Diagram: demonstrations (policy) → IRL → reward vector]

• Inverse RL is a mechanism for learning the implicit reward structure of an MDP from demonstrations of a policy.
• The problem is underconstrained: many reward vectors map to the same policy.
• Assumption: the human player is acting to optimize a hidden reward metric.
• Iterative process: select a reward, create a policy that optimizes that reward, then modify the reward to make the policy "more similar" to the demonstrations.
• There are many different ways of defining the similarity function.

Page 10: Learning Motion Prediction Models for Opponent Interception

Max Entropy IRL (Ziebart et al, 2008)

Input: frequency with which features were viewed in the player traces
Reward model: linear weighted combination of features
Output: feature weights

1) Calculate the expectation of the features viewed along the player's trajectories:
   $E[f_O] = \sum_{m=1}^{M} f_{traj_m}$
2) The policy is expressed as the probability of the player taking an action conditioned on the state and the parameters:
   $\pi_\omega(a \mid s) = P(a \mid s, \omega)$
3) Find the weights that maximize the log likelihood of seeing the trajectories:
   $\arg\max_\omega L(\omega) = \sum_{m=1}^{M} \log P(traj_m \mid \omega)$
4) Use gradient descent to improve the weights:
   $\nabla L(\omega) = E[f_O] - E[f_{\pi_\omega}]$
5) A forward-backward procedure is used to calculate the feature expectation $E[f_{\pi_\omega}]$ for the current weights.
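To make steps 1–5 concrete, here is a minimal sketch of the Max-Entropy IRL update on a small discrete MDP, assuming a tabular state space and a linear state reward; the function and variable names (maxent_irl, F, P, p0, f_obs) and the plain gradient-ascent step are illustrative assumptions, not the paper's implementation.

import numpy as np

def maxent_irl(F, P, p0, T, f_obs, lr=0.05, iters=200):
    """Minimal Max-Entropy IRL sketch (in the spirit of Ziebart et al. 2008).
    F     : (S, K) feature vector for each state
    P     : (A, S, S) transition model, P[a, s, s'] = Pr(s' | s, a)
    p0    : (S,) initial state distribution
    T     : trajectory horizon
    f_obs : (K,) empirical per-trajectory feature counts from the traces
    Returns the learned feature weights omega."""
    S, K = F.shape
    omega = np.zeros(K)
    for _ in range(iters):
        r = F @ omega                                       # linear reward per state
        # Backward pass: soft value iteration yields a stochastic policy pi(a|s).
        V = np.zeros(S)
        for _ in range(T):
            Q = r[:, None] + np.einsum('ast,t->sa', P, V)   # Q[s, a]
            m = Q.max(axis=1, keepdims=True)
            V = (m + np.log(np.exp(Q - m).sum(axis=1, keepdims=True))).ravel()
        pi = np.exp(Q - V[:, None])
        # Forward pass: expected state visitation frequencies under pi.
        D, svf = p0.copy(), np.zeros(S)
        for _ in range(T):
            svf += D
            D = np.einsum('s,sa,ast->t', D, pi, P)
        f_exp = F.T @ svf                                   # expected feature counts
        omega += lr * (f_obs - f_exp)                       # gradient ascent step
    return omega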

Page 11: Learning Motion Prediction Models for Opponent Interception

Framework


Page 12: Learning Motion Prediction Models for Opponent Interception

Framework


(2) Opponent tracking using Particle Filters

Page 13: Learning Motion Prediction Models for Opponent Interception

Particle Filter Opponent Tracking
• Generate candidate paths using IRL.
• Candidate paths are used as the motion model for the particle filter.
• The particle filter can be run forward in time to predict where the opponent will be at longer time horizons.
• Evaluate two motion models (sketched below):
  – The IRL motion model uses the paths.
  – Brownian motion assumes there is no path information.
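The slides name the two motion models without giving equations, so the following is a rough sketch of how each could advance a single particle per time step; the waypoint-path representation, speed, and noise levels are assumptions.

import numpy as np

rng = np.random.default_rng(0)

def brownian_step(pos, sigma=1.0):
    """Brownian motion model: no path information, just a noisy random walk."""
    return pos + rng.normal(0.0, sigma, size=2)

def irl_path_step(pos, path, speed=1.0, sigma=0.3):
    """IRL motion model: move the particle toward the next waypoint of its
    assigned candidate path (generated from the learned IRL policy), with a
    small amount of noise; fall back to a random walk if the path runs out."""
    if not path:
        return brownian_step(pos, sigma)
    direction = np.asarray(path[0]) - pos
    dist = np.linalg.norm(direction)
    if dist <= speed:
        path.pop(0)                      # waypoint reached, advance along the path
        step = direction
    else:
        step = speed * direction / dist
    return pos + step + rng.normal(0.0, sigma, size=2)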

Page 14: Learning Motion Prediction Models for Opponent Interception

Particle Filter Tracker
• Generate a set of particles that match the prior probability distribution.
• Use the motion model to make the prediction for the next time step.
• Reweight the particles based on the observation (if any).
• Use importance sampling to resample the particles based on the new weights (see the loop sketched below).
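A minimal sketch of that predict / reweight / resample loop, assuming 2-D particle positions and a Gaussian observation likelihood; the likelihood form and resampling scheme are standard choices assumed here, not details taken from the slides.

import numpy as np

rng = np.random.default_rng(1)

def pf_step(particles, weights, motion_fn, observation=None, obs_sigma=2.0):
    """One particle-filter update.
    particles   : (N, 2) particle positions
    weights     : (N,) importance weights summing to one
    motion_fn   : motion model applied to each particle position
    observation : observed opponent position, or None while occluded"""
    # Predict: push every particle through the motion model.
    particles = np.array([motion_fn(p) for p in particles])
    # Reweight: Gaussian likelihood of the observation (skipped if occluded).
    if observation is not None:
        d2 = np.sum((particles - np.asarray(observation)) ** 2, axis=1)
        weights = weights * np.exp(-0.5 * d2 / obs_sigma ** 2)
        weights = weights / weights.sum()
    # Resample: draw N particles in proportion to their weights, reset weights.
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx], np.full(len(particles), 1.0 / len(particles))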

Page 15: Learning Motion Prediction Models for Opponent Interception

PF without Path Info

Page 16: Learning Motion Prediction Models for Opponent Interception

PF with Max-Entropy IRL

Page 17: Learning Motion Prediction Models for Opponent Interception

Tracking Error Results

• Verification that the prediction part of the system works.
• Measure the tracking error for specific entrance and goal pairs.
• The error is reduced substantially using the IRL motion model.
• However, tracking is not the whole story: the planning model matters as well.

Page 18: Learning Motion Prediction Models for Opponent Interception

Framework


(3) Planning to intercept

Page 19: Learning Motion Prediction Models for Opponent Interception

Planning to intercept
• Centroid: the center of the entire particle set.
• Uncertainty Elimination: choose the point that places the maximum number of particles within the bot's sensor radius.
• Cluster: particles are clustered, and the best cluster centroid is selected.
(A sketch of the three target-selection rules follows.)

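The slides only name the three target-selection rules, so here is a minimal sketch of how each might choose an interception target from the predicted particle set; treating the particles themselves as candidate points for uncertainty elimination, and using k-means with the heaviest cluster for the cluster planner, are assumptions.

import numpy as np

def centroid_target(particles, weights):
    """Centroid planner: weighted center of the entire particle set."""
    return np.average(particles, axis=0, weights=weights)

def uncertainty_elimination_target(particles, weights, sensor_radius):
    """Uncertainty-elimination planner (as interpreted here): choose the
    candidate point that puts the largest particle mass inside the bot's
    sensor radius; the candidates are the particle positions themselves."""
    best, best_mass = particles[0], -1.0
    for c in particles:
        mass = weights[np.linalg.norm(particles - c, axis=1) <= sensor_radius].sum()
        if mass > best_mass:
            best, best_mass = c, mass
    return best

def cluster_target(particles, weights, k=3, iters=10):
    """Cluster planner: k-means the particles and return the centroid of
    the cluster carrying the most total weight."""
    rng = np.random.default_rng(2)
    centers = particles[rng.choice(len(particles), size=k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(particles[:, None] - centers[None], axis=2)
        labels = np.argmin(dists, axis=1)
        centers = np.array([particles[labels == j].mean(axis=0)
                            if np.any(labels == j) else centers[j]
                            for j in range(k)])
    masses = np.array([weights[labels == j].sum() for j in range(k)])
    return centers[np.argmax(masses)]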

Page 20: Learning Motion Prediction Models for Opponent Interception

Centroid Planner

Page 21: Learning Motion Prediction Models for Opponent Interception

Uncertainty Elimination Planner

Page 22: Learning Motion Prediction Models for Opponent Interception

Cluster Planner

Page 23: Learning Motion Prediction Models for Opponent Interception

Planners


Page 24: Learning Motion Prediction Models for Opponent Interception

Evaluations
• Models: IRL, Brownian motion

• Planners: centroid, uncertainty elimination, and cluster

• Delays: running the particle filter at different time horizons


Page 25: Learning Motion Prediction Models for Opponent Interception

Results


Page 26: Learning Motion Prediction Models for Opponent Interception

Results


Page 27: Learning Motion Prediction Models for Opponent Interception

Results


Page 28: Learning Motion Prediction Models for Opponent Interception

Conclusion and Future Work
• We introduce a general method for learning and incorporating user-specific evasion models into adversarial planning.
• The motion model has the most impact on the results.
• But the choice of planner and time horizon matters as well: without good planning the modeling benefits are lost.
• Future work: combining the prediction model with a hierarchical POMDP planner.


Page 29: Learning Motion Prediction Models for Opponent Interception

Thank You
Questions?


Page 30: Learning Motion Prediction Models for Opponent Interception

The 8th Annual Conference on Artificial Intelligence and Interactive Digital Entertainment

General Chair: Mark Riedl
Program Chair: Gita Sukthankar

October 8-9, Palo Alto, California

Please come join us next month at AIIDE in Palo Alto, CA!