CS344M Autonomous Multiagent Systems - Department of ...todd/cs344m/slides/week5b-pp4.pdf · CS344M...

CS344MAutonomous Multiagent Systems

Todd Hester

Department or Computer ScienceThe University of Texas at Austin

Good Afternoon, ColleaguesAre there any questions?

Todd Hester


• Changes from 2011 to now

• Do different formations in different situations?

• How does UT’s walk engine work?

• Has the formation code been released? copied?

• Why does world model give 0s for some players? Unseen?

Todd Hester


• Changes from 2011 to now

• Do different formations in different situations?

• How does UT’s walk engine work?

• Has the formation code been released? copied?

• Why does world model give 0s for some players? Unseen?

• Todd: Why not run CMA-ES to optimize role positions too?

Todd Hester

Logistics

• Assignment 4 due today

Todd Hester

Logistics


• Next week’s readings posted

Todd Hester

Logistics


• Next week’s readings posted

• Final project proposal assigned

Todd Hester

Final Projects

• Proposal (10/11): 3+ pages

• What you’re going to do; graded on writing

Todd Hester

Final Projects



• Progress Report (11/8): 5+ pages + binaries + logs

• What you’ve been doing; graded on writing

Todd Hester

Final Projects



• Progress Report (11/8): 5+ pages + binaries + logs

• What you’ve been doing; graded on writing

• Peer Review (11/15): review 2 progress reports

• Clear? suggestions?; graded on writing and feedbackquality

Todd Hester

Final Projects

• Team (12/4): source + binaries

• The tournament entry; make sure it runs!

Todd Hester

Final Projects



• Final Report (12/6): 8+ pages

• A term paper; the main component of your grade

Todd Hester

Final Projects





• Tournament (12/17): nothing due

• Oral presentation

Todd Hester

Final Projects





• Tournament (12/17): nothing due

• Oral presentation

Due at beginning of classes

Todd Hester

Final Project info

• All writing is individual!

Todd Hester

Final Project info


• Two hard copies and one electronic copy

Todd Hester

Final Project info



• Due at beginning of class

Todd Hester

Final Project info




• One idea: Re-implement an idea from one of thereadings

Todd Hester

Final Project info





• Be careful with machine learning

Todd Hester

Final Project info





• Be careful with machine learning

• Example final report on website

Todd Hester

Overview of the Readings• Darwin: genetic programming approach

Todd Hester


• Stone and McAllester: Architecture for action selection

Todd Hester



• Riley et al: Coach competition, extracting models

Todd Hester




• Kuhlmann et al: Learning for coaching

Todd Hester





• Withopf and Riedmiller: Reinforcement learning

Todd Hester






• MacAlpine et al: UT Austin Villa 2011

Todd Hester






• MacAlpine et al: UT Austin Villa 2011

• Barrett et al: SPL Kicking strategy

Todd Hester

Evolutionary Computation• Motivated by biological evolution: GA, GP

Todd Hester


• Search through a space

Todd Hester



− Need a representation, fitness function− Probabilistically apply search operators to set of points

in search space

Todd Hester




in search space

• Randomized, parallel hill-climbing through space

Todd Hester




in search space


• Learning is an optimization problem (fitness)

Todd Hester




in search space


• Learning is an optimization problem (fitness)

Some slides from Machine Learning [Mitchell, 1997]

Todd Hester

Darwin United

• More ambitious follow-up to Luke, 97 (made 2nd round)

Todd Hester

Darwin United


• Motivated in part by Peter’s detailed team construction

Todd Hester

Darwin United



• Evolves whole teams — lexicographic fitness function

Todd Hester

Darwin United




• Evolved on huge (at the time) hypercube

Todd Hester

Darwin United





• Lots of spinning, but figured out dribbling, offsides

Todd Hester

Darwin United






• 1-1-1 record. Tied a good team, but didn’t advance

Todd Hester

Darwin United






• 1-1-1 record. Tied a good team, but didn’t advance

• Success of the method, but not pursued

Todd Hester

Architecture for Action Selection

• (other slides, video)

Todd Hester



• downsides

Todd Hester



• downsides

• Keepaway

Todd Hester

Coaching• Learn best strategy to play a fixed team

Todd Hester


• Give high level advice to players at low frequency

Todd Hester



• Focus on learning formations

Todd Hester




• Learn when successful teams passed/kicked

Todd Hester





• Learn when opponent will pass and try to block

Todd Hester






• What if players switch roles?

Todd Hester







• Why just imitate another team?

Todd Hester







• Why just imitate another team?

• Other slides

Todd Hester

Reinforcement Learning

• RL Slides

Todd Hester


• RL Slides

• Extend to grid soccer

Todd Hester


• RL Slides


• Large state space, joint actions

Todd Hester


• RL Slides



• Address this with state aliasing, options

Todd Hester


• RL Slides




• Successfully learn the task, use for some of team behavior

Todd Hester


• RL Slides




• Successfully learn the task, use for some of team behavior

• However, takes 12 million actions to learn

Todd Hester

UT Austin Villa 2011

• Other slides

Todd Hester


• Other slides

• Why not use CMA-ES on role positions as well?

Todd Hester


• Other slides

• Why not use CMA-ES on role positions as well?

• Changes for 2012?

Todd Hester

Kicking Under Uncertainty• Previous SPL approach: always rotate to kick at goal

Todd Hester


• Kick engine to kick at various distances/headings

Todd Hester



• Adjust to seen ball location

Todd Hester




• Select first kick that moves ball up field

Todd Hester





• Figure

Todd Hester





• Figure

• Emphasis on quickness

Todd Hester





• Figure

• Emphasis on quickness

• Now: Better model of opponents -> Know if we have moretime

Todd Hester

Learning KeepawayKEEPAWAY SLIDES

Todd Hester

Learning Commentary

• David Chen and Ray Mooney

Todd Hester

Coordination Graphs• n agents, each choose an action Ai

Todd Hester


• A = A1 × . . .×An

Todd Hester


• A = A1 × . . .×An

• Ri(A) 7→ IR

Todd Hester


• A = A1 × . . .×An

• Ri(A) 7→ IR

• Coordination problem: R1 = . . . = Rn = R

Todd Hester


• A = A1 × . . .×An

• Ri(A) 7→ IR


• Nash equilibrium: no agent could do better given whatothers are doing.

Todd Hester


• A = A1 × . . .×An

• Ri(A) 7→ IR


• Nash equilibrium: no agent could do better given whatothers are doing.

• May be more than one (chicken)

Todd Hester

Example from the paper• Understand the rule syntax

Todd Hester


• Form the coordination graph

Todd Hester



• First eliminate rules based on context

Todd Hester




• What does it mean for G3 to collect all relevant rules?

Todd Hester





• What does it mean for G3 to maximize over all actions ofa1 and a2?

Todd Hester






• How are the results propagated back?

Todd Hester






• How are the results propagated back?

• Let’s try again with G1 eliminated first

Todd Hester

Application to soccer• Make the world discrete by assigning roles, using high-

level predicates

Todd Hester


level predicates

• Assume global state information

Todd Hester


level predicates


• Finds pass sequences and starts players moving ahead oftime.

Todd Hester


level predicates


• Finds pass sequences and starts players moving ahead oftime.

• Note the results: with and without coordination.

Todd Hester

Reactive Deliberation

• A hybrid approach

• Executor: carry out reactive behaviors

• Deliberator: evaluate possible high-level schema withparameters; generate bids

• Deliberator takes time, but something keeps happeningalways.

• In effect: deliberator commits to schema for some time

Todd Hester

CS344M Autonomous Multiagent Systems - Department of ...todd/cs344m/slides/week5b-pp4.pdf · CS344M...

Documents

Transcript of CS344M Autonomous Multiagent Systems - Department of ...todd/cs344m/slides/week5b-pp4.pdf · CS344M...