Marvin Zhang 08/10/2016
Lecture 29: Artificial Intelligence
Some slides are adapted from CS 188 (Artificial Intelligence)
Roadmap
• This week (Applications), the goals are:
  • To go beyond CS 61A and see examples of what comes next
  • To wrap up CS 61A!
Introduction
Functions
Data
Mutability
Objects
Interpretation
Paradigms
Applications
Artificial Intelligence (AI)
• The subfield of computer science that studies how to create programs that:
  • Think like humans?
    • Well, we don’t really know how humans think
  • Act like humans?
    • Quick, what’s 17548 * 44?
    • Humans can often behave irrationally
  • Think rationally?
    • What we really care about, though, is behavior
  • Act rationally
• A better name for artificial intelligence would be computational rationality
Applications
• Artificial intelligence has a wide range of applications, including examples such as:
  • Natural language processing
  • Computer vision
  • Robotics
  • Game playing
Game Playing
• Games have historically been a popular area of study in artificial intelligence, in part because they drive the study and implementation of efficient AI algorithms
  • If you’re interested, two recent-ish results include playing Atari games at human expert levels and playing Go beyond top human levels
• Many breakthroughs in AI research have come from building systems that play games, including advances in:
  • Reinforcement learning (Checkers, Atari)
  • Rational meta-reasoning (Reversi/Othello)
  • Game tree search algorithms (Go)
• We will build AI systems today that play Hog and Ants!
Playing Hog
Using Markov Decision Processes
Hog
• Two player dice game
• Take turns rolling 0 to 10 dice and accumulating the sum into your overall score, until someone reaches 100
• Several special rules to keep track of:
  • Pig Out, Free Bacon, Hog Tied, Hog Wild, Hogtimus Prime
  • And the notorious Swine Swap
• In the last question of this project, you had to implement a final strategy that beats always_roll(6) at least 70% of the time
  • This is AI-like, except you (probably) hand-designed the “intelligence” into your strategy
• We can get up to ~85% win rate against always_roll(6)! I’ll show you how, using AI techniques and algorithms
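A minimal sketch of what a strategy looks like in code. It assumes a strategy is a function from the two scores to a number of dice; the two-argument signature and the `hand_designed` rule are illustrative assumptions, not the project's reference code.

```python
def always_roll(n):
    """Return a strategy that rolls n dice no matter what the scores are."""
    def strategy(score, opponent_score):
        return n
    return strategy


def hand_designed(score, opponent_score, goal=100):
    """Hypothetical hand-tuned rule: roll fewer dice when close to the goal.

    The threshold (10 points) and the dice counts (3 and 6) are made up
    for illustration; they are not tuned values from the lecture.
    """
    return 3 if goal - score <= 10 else 6


baseline = always_roll(6)
print(baseline(0, 0))          # the baseline ignores both scores
print(hand_designed(95, 40))   # close to the goal, so roll fewer dice
```

The hand-designed rule encodes a human guess about good play. The rest of the lecture is about computing such a rule from a model of the game instead.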
Agents and Environments
• Many, if not most, problems in AI are formalized using the concepts of an agent and an environment
• The agent perceives information about the environment and performs actions that may change the environment
• This is a natural way to describe many games, robotic systems, humans, and much more
[Diagram: Agent and Environment; percepts flow from the environment to the agent, and actions flow from the agent to the environment]
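The percept/action loop can be sketched in a few lines. Everything here is hypothetical and only for illustration: `DiceEnvironment`, `cautious_agent`, and `run` are made-up names, and the environment is a one-player dice race with no special rules.

```python
import random


class DiceEnvironment:
    """Hypothetical environment: holds one score and rolls dice for the agent."""

    def __init__(self, goal=100, seed=0):
        self.goal = goal
        self.score = 0
        self.rng = random.Random(seed)

    def percept(self):
        """What the agent gets to see: here, just the current score."""
        return self.score

    def step(self, num_dice):
        """Apply the agent's action; return True once the goal is reached."""
        self.score += sum(self.rng.randint(1, 6) for _ in range(num_dice))
        return self.score >= self.goal


def cautious_agent(score):
    """Hypothetical agent: maps a percept (score) to an action (dice to roll)."""
    return 2 if score >= 90 else 6


def run(env, agent, max_turns=1000):
    """Alternate percept -> action -> environment update until the game ends."""
    for turn in range(1, max_turns + 1):
        action = agent(env.percept())
        if env.step(action):
            return turn
    return None


env = DiceEnvironment(seed=0)
turns = run(env, cautious_agent)
print(turns, env.score)
```

The agent never touches the environment's internals. It only receives percepts and returns actions, which is the separation the diagram describes.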
Hog Agents and Environments
• In the game of Hog, who is the agent?
  • You, or the computer
• What is the environment?
  • It’s the whole game!
  • Your opponent
    • (We are considering the opposing agent to be part of the environment, because it’s simpler this way)
  • Your score and your opponent’s score
  • The rules of the game
• In AI, the problem we care about is figuring out how the agent should choose its actions, given what it perceives, so as to positively shape its environment
[Diagram: Agent and Environment; percepts flow from the environment to the agent, and actions flow from the agent to the environment]
Markov Decision Processes
• To do this for Hog, we will formalize our environment as a Markov Decision Process (MDP)
• This means that we have to specify:
• A set of states S, which are the states of the environment
• For Hog, we just need the two scores to represent states
• A set of actions A, which are the actions the agent can take
• This is how many dice the agent chooses to roll
• A reward function R(s), which is the reward for each state s
• We get a positive/negative reward only when we win/lose
• A transition function T(s, a, s’), which tells us the probability of going to state s’ starting from state s and choosing action a
• We get this from dice probabilities and rules of the game
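As a rough illustration of the four pieces above, here is a minimal Python sketch of an MDP for a simplified dice game. It is not the Hog project code: the goal score `GOAL`, the action set `ACTIONS`, the rule that any 1 makes the turn worth 1 point, and all function names are assumptions made for this example, and the opponent's turn is left out.

```python
from fractions import Fraction
from itertools import product

GOAL = 20            # hypothetical small goal so the state space stays tiny
ACTIONS = (1, 2, 3)  # hypothetical: how many dice the agent may roll


def turn_score_distribution(num_dice, sides=6):
    """Return {points: probability} for one turn under a simplified rule:
    the turn scores the sum of the dice, or 1 point if any die shows a 1."""
    dist = {}
    p = Fraction(1, sides ** num_dice)  # every ordered roll is equally likely
    for roll in product(range(1, sides + 1), repeat=num_dice):
        points = 1 if 1 in roll else sum(roll)
        dist[points] = dist.get(points, 0) + p
    return dist


def reward(state):
    """R(s): +1 in a winning state, -1 in a losing state, 0 everywhere else."""
    score, opponent_score = state
    if score >= GOAL:
        return 1
    if opponent_score >= GOAL:
        return -1
    return 0


def transition(state, action):
    """T(s, a, .): return {next_state: probability} after rolling `action` dice.
    States are (score, opponent_score) pairs; scores are capped at GOAL."""
    score, opponent_score = state
    result = {}
    for points, prob in turn_score_distribution(action).items():
        next_state = (min(score + points, GOAL), opponent_score)
        # Several outcomes can land on the same capped state, so accumulate.
        result[next_state] = result.get(next_state, 0) + prob
    return result
```

In this sketch, T(s, a, s’) is `transition(s, a).get(s_next, 0)`, and the probabilities for any state and action sum to 1.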
![Page 71: Lecture 29: Artificial Intelligence - University of …cs61a/su16/assets/slides/...Game Playing • Games have historically been a popular area of study in artificial intelligence,](https://reader033.fdocuments.in/reader033/viewer/2022051800/5ac143a57f8b9ad73f8cb953/html5/thumbnails/71.jpg)
Policies
• Now, with our MDP, we can formalize our problem
• Our agent has a policy 𝜋, which is a function that takes in a state and outputs the action to take for that state
• The policies that the computer uses were called strategies in the project
• Our goal is to find the optimal policy 𝜋* that maximizes the expected amount of reward the agent receives
• In our case, this means maximizing the win rate against some fixed opponent, such as always_roll(6)
• How do we find this optimal policy? The reward function gives us very little information because it is 0 except for winning and losing states
• We need something that tells us which states we are more or less likely to win from
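To make "a policy is a function from states to actions" concrete, here is a small Python sketch. The name `always_roll` comes from the slide, but its exact signature here (taking the two scores) is an assumption based on the state description, and `make_threshold_policy` is a purely hypothetical example of a second policy.

```python
def always_roll(n):
    """Return a policy that rolls n dice in every state."""
    def policy(score, opponent_score):
        return n
    return policy


def make_threshold_policy(lead, cautious=2, bold=6):
    """Hypothetical policy: roll fewer dice once we lead by at least `lead`."""
    def policy(score, opponent_score):
        if score - opponent_score >= lead:
            return cautious
        return bold
    return policy


fixed_opponent = always_roll(6)
our_policy = make_threshold_policy(lead=10)
```

Both policies have the same shape: given a state (the two scores), they return an action (a number of dice). Finding the optimal policy means choosing, among all functions of this shape, the one with the highest expected reward.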
![Page 83: Lecture 29: Artificial Intelligence - University of …cs61a/su16/assets/slides/...Game Playing • Games have historically been a popular area of study in artificial intelligence,](https://reader033.fdocuments.in/reader033/viewer/2022051800/5ac143a57f8b9ad73f8cb953/html5/thumbnails/83.jpg)
Value Functions
• Reward function: R(s) = reward of being in state s
• Value function: V(s) = value of being in state s
• The value is the long-term expected reward
• How do we determine the value of a state? With recursion!
• The value of a state is the reward of the state plus the value of the state we end up in next.
• We take a maximum over all possible actions because we want to find the value for the optimal policy
• We use a summation and T(s, a, s’) because there may be several different states we could end up in
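The recursion described in these bullets can be written as a single equation. This is a reconstruction from the bullets, using only the symbols already defined (R, T, V, and the action set A); it is not copied from the slide, and it assumes no discounting:

$$V(s) = R(s) + \max_{a \in A} \sum_{s'} T(s, a, s')\, V(s')$$

The term R(s) is the reward of the state, the maximum over a picks the best action, and the sum weights the value of each possible next state s’ by the probability of ending up there.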
![Page 92: Lecture 29: Artificial Intelligence - University of …cs61a/su16/assets/slides/...Game Playing • Games have historically been a popular area of study in artificial intelligence,](https://reader033.fdocuments.in/reader033/viewer/2022051800/5ac143a57f8b9ad73f8cb953/html5/thumbnails/92.jpg)
Value Iteration
• We may have to compute V(s) multiple times in order to get it right, because the value of later states s’ can change and this can affect the value of s
• This leads us to an algorithm known as value iteration:
• Repeat:
• For all states s, determine V(s)
• If V doesn’t change, return the policy 𝜋 that, given a state s, chooses the action a that maximizes the expected value of the next state s’
• We can show that this policy is optimal, under the correct assumptions! But let’s not do the math
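The loop above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the lecture's demo code: there is no discount factor, every game is assumed to end eventually, and the function names, the `tolerance` and `max_sweeps` parameters, and the four-state toy MDP are all hypothetical.

```python
def value_iteration(states, actions, R, T, tolerance=1e-9, max_sweeps=10_000):
    """Return (V, policy). R(s) is the reward of s, T(s, a) returns
    {next_state: probability}, and actions(s) lists the legal actions.
    A state with no actions is terminal, so its value is just R(s)."""
    V = {s: 0.0 for s in states}

    def expected_next_value(s, a):
        return sum(p * V[s2] for s2, p in T(s, a).items())

    for _ in range(max_sweeps):
        # For all states s, determine V(s) from the previous values.
        new_V = {}
        for s in states:
            best = max((expected_next_value(s, a) for a in actions(s)),
                       default=0.0)
            new_V[s] = R(s) + best
        change = max(abs(new_V[s] - V[s]) for s in states)
        V = new_V
        if change <= tolerance:  # V did not change, so stop repeating
            break

    # The policy picks the action with the best expected next-state value.
    policy = {s: max(actions(s), key=lambda a: expected_next_value(s, a))
              for s in states if actions(s)}
    return V, policy


# Hypothetical toy MDP: two decision states and two terminal states.
TRANSITIONS = {
    ("start", "safe"): {"mid": 1.0},
    ("start", "risky"): {"win": 0.3, "lose": 0.7},
    ("mid", "safe"): {"win": 0.6, "lose": 0.4},
    ("mid", "risky"): {"win": 0.5, "lose": 0.5},
}
STATES = ["start", "mid", "win", "lose"]


def toy_actions(s):
    return ["safe", "risky"] if s in ("start", "mid") else []


def toy_reward(s):
    return {"win": 1.0, "lose": -1.0}.get(s, 0.0)


def toy_transition(s, a):
    return TRANSITIONS[(s, a)]


V, policy = value_iteration(STATES, toy_actions, toy_reward, toy_transition)
```

On this toy MDP the values settle after a few sweeps: the value of "mid" must be found before the value of "start" can be right, which is why the values have to be recomputed more than once. The resulting policy chooses "safe" in both decision states.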
![Page 101: Lecture 29: Artificial Intelligence - University of …cs61a/su16/assets/slides/...Game Playing • Games have historically been a popular area of study in artificial intelligence,](https://reader033.fdocuments.in/reader033/viewer/2022051800/5ac143a57f8b9ad73f8cb953/html5/thumbnails/101.jpg)
Algorithms for MDPs
• We now have an algorithm that will find us the optimal policy for playing against always_roll(6)!
• It also does quite well against other opponents
• This algorithm, value iteration, is just a special case of a family of algorithms for solving MDPs by alternating between two steps:
• Policy evaluation: Determine the value of each state s, but using the current policy rather than the optimal policy
• Policy improvement: Improve the current policy to a new policy using the value function found in the first step
• Value iteration combines these two steps into one!
• Let’s see the optimal policy in action
(demo)
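For contrast with value iteration, the two alternating steps (evaluate the current policy, then improve it) can be kept separate, as in the sketch below. Everything here is an assumption made for illustration: the toy MDP, the function names, and the fixed number of evaluation sweeps, which is enough for this tiny example but is not a general stopping rule.

```python
# Hypothetical toy MDP: T[(state, action)] -> {next_state: probability}.
T = {
    ("start", "safe"): {"mid": 1.0},
    ("start", "risky"): {"win": 0.3, "lose": 0.7},
    ("mid", "safe"): {"win": 0.6, "lose": 0.4},
    ("mid", "risky"): {"win": 0.5, "lose": 0.5},
}
R = {"start": 0.0, "mid": 0.0, "win": 1.0, "lose": -1.0}
ACTIONS = ("safe", "risky")
DECISION_STATES = ("start", "mid")  # "win" and "lose" are terminal


def expected_next_value(V, s, a):
    return sum(p * V[s2] for s2, p in T[(s, a)].items())


def evaluate_policy(policy, sweeps=100):
    """Step 1: value of each state when following `policy` (no maximum)."""
    V = dict(R)  # terminal states keep their reward as their value
    for _ in range(sweeps):
        for s in DECISION_STATES:
            V[s] = R[s] + expected_next_value(V, s, policy[s])
    return V


def improve_policy(V):
    """Step 2: in each state, pick the action that looks best under V."""
    return {s: max(ACTIONS, key=lambda a: expected_next_value(V, s, a))
            for s in DECISION_STATES}


def alternate_until_stable(policy):
    """Alternate the two steps until the policy stops changing."""
    while True:
        V = evaluate_policy(policy)
        new_policy = improve_policy(V)
        if new_policy == policy:
            return V, policy
        policy = new_policy


V, policy = alternate_until_stable({"start": "risky", "mid": "risky"})
```

Starting from the policy that always plays "risky", one round of evaluation and improvement is enough to switch both states to "safe", and a second round confirms that the policy no longer changes.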
![Page 102: Lecture 29: Artificial Intelligence - University of …cs61a/su16/assets/slides/...Game Playing • Games have historically been a popular area of study in artificial intelligence,](https://reader033.fdocuments.in/reader033/viewer/2022051800/5ac143a57f8b9ad73f8cb953/html5/thumbnails/102.jpg)
Playing Ants
Using rollout-based methods
![Page 113: Lecture 29: Artificial Intelligence - University of …cs61a/su16/assets/slides/...Game Playing • Games have historically been a popular area of study in artificial intelligence,](https://reader033.fdocuments.in/reader033/viewer/2022051800/5ac143a57f8b9ad73f8cb953/html5/thumbnails/113.jpg)
Reinforcement Learning (RL)
• In the reinforcement learning setting, we still model our environment as an MDP, except now we don’t know our reward function R(s) or transition function T(s, a, s’)
• This is very much like the real world, and here’s an analogy: suppose you go on a date with someone
• You are the agent, the other person and the setting are the environment, and you don’t know the environment that well
• At the beginning of the date, you might not know how to act, so you try different things to see how the other person responds
• As the date goes on, you slowly figure out how you should act based on what you’ve tried so far, and how it went
[Figure: "You" and "Someone" on a date, with speech bubbles appearing in sequence: "Do you like cats?", "Ew, no.", "Oh… yeah, me neither.", "So… do you like dogs?"]
![Page 114: Lecture 29: Artificial Intelligence - University of …cs61a/su16/assets/slides/...Game Playing • Games have historically been a popular area of study in artificial intelligence,](https://reader033.fdocuments.in/reader033/viewer/2022051800/5ac143a57f8b9ad73f8cb953/html5/thumbnails/114.jpg)
Reinforcement Learning (RL)
• In the reinforcement learning setting, we still model our environment as an MDP, except now we don’t know our reward function R(s) or transition function T(s, a, s’)
• This is very much like the real world, and here’s an analogy: suppose you go on a date with someone
• You are the agent, the other person and the setting are the environment, and you don’t know the environment that well
• At the beginning of the date, youmight not know how to act, so youtry different things to see how theother person responds
• As the date goes on, you slowlyfigure out how you should actbased on what you’ve tried so far,and how it went Some
oneYou
I love dogs!
![Page 115: Lecture 29: Artificial Intelligence - University of …cs61a/su16/assets/slides/...Game Playing • Games have historically been a popular area of study in artificial intelligence,](https://reader033.fdocuments.in/reader033/viewer/2022051800/5ac143a57f8b9ad73f8cb953/html5/thumbnails/115.jpg)
Reinforcement Learning (RL)
• In the reinforcement learning setting, we still model our environment as an MDP, except now we don’t know our reward function R(s) or transition function T(s, a, s’)
• This is very much like the real world, and here’s an analogy: suppose you go on a date with someone
• You are the agent, the other person and the setting are the environment, and you don’t know the environment that well
• At the beginning of the date, youmight not know how to act, so youtry different things to see how theother person responds
• As the date goes on, you slowlyfigure out how you should actbased on what you’ve tried so far,and how it went Some
oneYou
Omg me too!!
![Page 116: Lecture 29: Artificial Intelligence - University of …cs61a/su16/assets/slides/...Game Playing • Games have historically been a popular area of study in artificial intelligence,](https://reader033.fdocuments.in/reader033/viewer/2022051800/5ac143a57f8b9ad73f8cb953/html5/thumbnails/116.jpg)
Reinforcement Learning (RL)
• In the reinforcement learning setting, we still model our environment as an MDP, except now we don’t know our reward function R(s) or transition function T(s, a, s’)
• This is very much like the real world, and here’s an analogy: suppose you go on a date with someone
• You are the agent, the other person and the setting are the environment, and you don’t know the environment that well
• At the beginning of the date, youmight not know how to act, so youtry different things to see how theother person responds
• As the date goes on, you slowlyfigure out how you should actbased on what you’ve tried so far,and how it went
• With some luck, and the right algorithm,you may learn how to act optimally!
SomeoneYou
Omg me too!!
![Page 117: Lecture 29: Artificial Intelligence - University of …cs61a/su16/assets/slides/...Game Playing • Games have historically been a popular area of study in artificial intelligence,](https://reader033.fdocuments.in/reader033/viewer/2022051800/5ac143a57f8b9ad73f8cb953/html5/thumbnails/117.jpg)
Reinforcement Learning (RL)
• In the reinforcement learning setting, we still model our environment as an MDP, except now we don’t know our reward function R(s) or transition function T(s, a, s’)
• This is very much like the real world, and here’s an analogy: suppose you go on a date with someone
• You are the agent, the other person and the setting are the environment, and you don’t know the environment that well
• At the beginning of the date, youmight not know how to act, so youtry different things to see how theother person responds
• As the date goes on, you slowlyfigure out how you should actbased on what you’ve tried so far,and how it went
• With some luck, and the right algorithm,you may learn how to act optimally!
SomeoneYou
DATE: SUCCESS
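As a rough illustration of this setting (not from the lecture), the sketch below hides a toy reward and transition rule inside a hypothetical `HiddenMDP` class. The agent can only call `step` and observe what comes back, much like trying topics on a date. The class name, the two actions, and the probabilities are all made up for the example.

```python
import random

class HiddenMDP:
    """A toy environment whose reward and transition rules are hidden from the agent."""

    def __init__(self, seed=0):
        self._rng = random.Random(seed)
        self.state = 'start'

    def actions(self):
        return ['talk about dogs', 'talk about taxes']

    def step(self, action):
        """Apply an action and return (next_state, reward). The rules stay private."""
        # Hypothetical transition rule: dogs usually go over well, taxes usually do not.
        chance_happy = 0.9 if action == 'talk about dogs' else 0.2
        next_state = 'happy' if self._rng.random() < chance_happy else 'bored'
        # Hypothetical reward rule R(s): +1 in a happy state, -1 in a bored one.
        reward = 1 if next_state == 'happy' else -1
        self.state = next_state
        return next_state, reward

def try_actions(env, trials=200):
    """Try each action `trials` times and return its average observed reward."""
    averages = {}
    for action in env.actions():
        total = 0
        for _ in range(trials):
            _, reward = env.step(action)
            total += reward
        averages[action] = total / trials
    return averages
```

Calling `try_actions(HiddenMDP())` should give a clearly higher average for the dog topic, even though the agent never reads the rules directly.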
![Page 124: Lecture 29: Artificial Intelligence - University of …cs61a/su16/assets/slides/...Game Playing • Games have historically been a popular area of study in artificial intelligence,](https://reader033.fdocuments.in/reader033/viewer/2022051800/5ac143a57f8b9ad73f8cb953/html5/thumbnails/124.jpg)
RL Algorithms
• Algorithms for reinforcement learning must solve a more general problem than algorithms like value iteration, because we don’t know how our environment works
• We have to make sure to try different actions to determine which ones work well in our environment
• This is called exploration
• However, we also want to make sure to use actions that we have already found to be good
• This is called exploitation
• Balancing exploration and exploitation is a key problem that RL algorithms must address, and there are many different ways to handle this
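One common way to strike this balance is an epsilon-greedy rule, sketched below. The slides do not name a specific rule at this point, so treat the function name `epsilon_greedy`, the `estimates` dictionary, and the `epsilon` parameter as assumptions made for illustration.

```python
import random

def epsilon_greedy(estimates, epsilon, rng=random):
    """Pick an action from `estimates`, a dict mapping action -> estimated value.

    With probability `epsilon`, explore by choosing uniformly at random;
    otherwise exploit by choosing the action with the highest estimate.
    """
    actions = list(estimates)
    if rng.random() < epsilon:
        return rng.choice(actions)          # exploration
    return max(actions, key=estimates.get)  # exploitation
```

A small `epsilon` (say 0.1) mostly exploits what has worked so far, while still occasionally trying actions whose estimates may be wrong.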
![Page 132: Lecture 29: Artificial Intelligence - University of …cs61a/su16/assets/slides/...Game Playing • Games have historically been a popular area of study in artificial intelligence,](https://reader033.fdocuments.in/reader033/viewer/2022051800/5ac143a57f8b9ad73f8cb953/html5/thumbnails/132.jpg)
RL for Ants
• It’s a little weird to use MDPs and RL for Ants. Why?
• Everything is deterministic
• This means that we don’t need a transition function, and we actually do know how our environment works
• However, the state space for Ants is very, very large
• So even though we could specify how our environment works, it is very difficult to code it and for our program to utilize all of this information
• A more reasonable approach is thus to only look at a subset of states and actions, e.g., the more likely ones, and find an approximation that hopefully works for all states
• Now, it makes sense to use MDPs and RL for Ants
![Page 141: Lecture 29: Artificial Intelligence - University of …cs61a/su16/assets/slides/...Game Playing • Games have historically been a popular area of study in artificial intelligence,](https://reader033.fdocuments.in/reader033/viewer/2022051800/5ac143a57f8b9ad73f8cb953/html5/thumbnails/141.jpg)
Rollout-based Policy Iteration
• In reinforcement learning and some other settings, a rollout is essentially a simulation, where the agent takes a certain number of actions in the environment
• Algorithms that use rollouts to find a policy are sometimes called rollout-based algorithms
• One such algorithm is rollout-based policy iteration, which approximates the value function V(s) using rollouts
• For every state seen during the rollouts, the value of that state is the average of the rewards after that state for every rollout that included that state
• For the unseen states, we assign them values by looking at the seen states that seem the most similar
• We balance exploration and exploitation by sometimes selecting a random action, rather than using our policy
• Let’s see a policy trained using this algorithm in action
(demo)
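The value-estimation step can be sketched as follows. This is one possible reading of the description, not the code used in the demo: it assumes each rollout is a list of `(state, reward)` pairs, that “the rewards after that state” means the total reward collected after the first visit to that state, and that similarity between states is given by a caller-supplied `distance` function. The names `estimate_values` and `value_of` are hypothetical.

```python
from collections import defaultdict

def estimate_values(rollouts):
    """Estimate V(s) for every state seen in `rollouts`.

    Each rollout is a list of (state, reward) pairs in the order visited.
    """
    returns = defaultdict(list)
    for rollout in rollouts:
        seen = set()
        for i, (state, _) in enumerate(rollout):
            if state in seen:
                continue  # count only the first visit in this rollout
            seen.add(state)
            # Total reward collected after visiting this state.
            returns[state].append(sum(r for _, r in rollout[i + 1:]))
    # Average over every rollout that included the state.
    return {s: sum(rs) / len(rs) for s, rs in returns.items()}

def value_of(state, values, distance):
    """Return the estimate for `state`, borrowing from the most similar seen state if needed."""
    if state in values:
        return values[state]
    nearest = min(values, key=lambda seen_state: distance(state, seen_state))
    return values[nearest]
```

For Ants, a real `distance` would have to compare whole game states; in this sketch any function of two states that returns a number will do.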
![Page 150: Lecture 29: Artificial Intelligence - University of …cs61a/su16/assets/slides/...Game Playing • Games have historically been a popular area of study in artificial intelligence,](https://reader033.fdocuments.in/reader033/viewer/2022051800/5ac143a57f8b9ad73f8cb953/html5/thumbnails/150.jpg)
Summary
• Artificial intelligence is all about building programs that act rationally, i.e., computational rationality
• Game playing is an important and natural domain for much of artificial intelligence research and development
• We built an agent that plays Hog optimally against always_roll(6), using MDPs and value iteration
• We built an agent that plays Ants pretty well, using reinforcement learning and rollout-based methods
• However, applications of AI go far beyond games and stretch into almost every area of everyday life
• If you’re interested, take:
• CS 188 (Introduction to Artificial Intelligence)
• CS 189 (Introduction to Machine Learning)
![Page 151: Lecture 29: Artificial Intelligence - University of …cs61a/su16/assets/slides/...Game Playing • Games have historically been a popular area of study in artificial intelligence,](https://reader033.fdocuments.in/reader033/viewer/2022051800/5ac143a57f8b9ad73f8cb953/html5/thumbnails/151.jpg)
Thank you