Where will AGI come from? Y Conf, June 10, 2017 andrej @karpathy


Where will AGI come from?

Y Conf, June 10, 2017 · andrej @karpathy


“Deep Learning” search popularity

(chart; takes off around 2012) 2012+ image recognition, 2010+ speech recognition, 2014+ machine translation, etc.


(from @ML_Hipster)


CS231n: Convolutional Neural Networks for Visual Recognition (Stanford Class)

2015: 150 students
2016: 330 students
2017: 750 students
2018: ??? (max students per class is capped at 999)


The Current State of Machine Intelligence 3.0 [Shivon Zilis]


In popular media...


Two comments:

1. AI today is still very narrow*.

2. *but thanks to Deep Learning, we can repurpose solution components faster.


Example: AlphaGo (see my Medium post “AlphaGo, in context”)


Convenient properties of Go:
1. Deterministic. No noise in the game.
2. Fully observed. Each player has complete information.
3. Discrete action space. Finite number of actions possible.
4. Perfect simulator. The effect of any action is known exactly.
5. Short episodes. ~200 actions per game.
6. Clear + fast evaluation. According to Go rules.
7. Huge dataset available. Human vs human games.


Q: “Can we run AlphaGo on a robot for the Amazon Picking Challenge?”


Q: “Can we run AlphaGo on a robot for the Amazon Picking Challenge?”

A:
1. Deterministic. No noise in the game.
2. Fully observed. Each player has complete information.
3. Discrete action space. Finite number of actions possible.
4. Perfect simulator. The effect of any action is known exactly.
5. Short episodes. ~200 actions per game.
6. Clear + fast evaluation. According to Go rules.
7. Huge dataset available. Human vs human games.


1. Deterministic. No noise in the game. (OK)
2. Fully observed. Each player has complete information. (OKish)
3. Discrete action space. Finite number of actions possible. (OK)
4. Perfect simulator. The effect of any action is known exactly. (TROUBLE)
5. Short episodes. ~200 actions per game. (challenge)
6. Clear + fast evaluation. According to Go rules. (challenge)
7. Huge dataset available. Human vs human games. (not good)


Summary so far:
1. Rapid rise in interest in AI.
2. AI is still narrow.
3. AI tech works in some cases and can be repurposed much faster.


“What if we succeed in making it not narrow?”

Nick Bostrom, Stephen Hawking, Bill Gates, Elon Musk, Sam Altman, Stuart Russell, Eliezer Yudkowsky, ...

~2014+


Normal hype cycle


AI is different.


“AGI imminent.”

“Oh no, AI winter imminent. My funding is about to dry up again.”

Meanwhile, in Academia...


Talk Outline:
- Supervised learning - “it works, just scale up!”
- Unsupervised learning - “it will work, if we only scale up!”
- AIXI - “guys, I can write down optimal AI.”
- Brain simulation - “this will work one day, right?”
- Artificial Life - “just do what nature did.”
- Something not on our radar

Where could AGI come from?




Supervised Learning: Collect lots of labeled data, train a neural network on it.
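This recipe can be sketched end to end. Below is a hypothetical minimal example, with a single logistic unit standing in for the neural network and a made-up toy dataset, just to make the "collect labels, fit by gradient descent" loop concrete:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy labeled dataset: 2D points, label 1 if x + y > 0.
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

# "Neural network": a single logistic unit, trained by gradient descent
# on the cross-entropy loss.
w = np.zeros(2)
b = 0.0
lr = 0.5
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # predicted probabilities
    grad_w = X.T @ (p - y) / len(y)          # gradient of the loss w.r.t. w
    grad_b = np.mean(p - y)
    w -= lr * grad_w
    b -= lr * grad_b

acc = np.mean(((1.0 / (1.0 + np.exp(-(X @ w + b)))) > 0.5) == y)
print(f"train accuracy: {acc:.2f}")
```

The data here is linearly separable, so even this one-unit "network" fits it; real supervised learning swaps in a deep network and far more labeled data.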


How do we get labels of intelligent behavior?


see: “Short Story on AI: A Cognitive Discontinuity”, Nov 14, 2015 (link)


Amazon Mechanical Turk
CORE IDEA: collect data from humans, then train a big Neural Net to mimic what humans do.


Amazon Mechanical Turk ++ (diagram): humans connected over SSH produce lots of training data.

Big Neural Network. STATE: vision, audio, joint positions/velocities. TASK: description. ACTION: joint torques, etc. LABEL: the ACTION taken by the human. OBJECTIVE: make these equal.


Amazon Mechanical Turk ++, Step 2: autonomy (diagram).

Big Neural Network. STATE: vision, audio, joint positions/velocities. TASK: description. ACTION: joint torques, etc.


What would this AI look like?


Possible hint: char-rnn

The cat sat on a ma_?


Possible hint: char-rnn (diagram).

Big Neural Network. STATE: previous characters. TASK: none. ACTION: next character. LABEL: the next character produced by the human. OBJECTIVE: make these equal.
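As a stand-in for the char-rnn idea, here is a minimal next-character predictor. It is a bigram count model, not an RNN (it conditions only on the last character rather than all previous ones), and the training corpus is made up for illustration:

```python
from collections import Counter, defaultdict

# A tiny stand-in for char-rnn: a bigram count model over characters.
# (A real char-rnn conditions on ALL previous characters; this toy
# model conditions only on the last one.)
corpus = "the cat sat on a mat. the cat sat on a hat. the rat sat on a mat."

# counts[prev][next] = how often `next` followed `prev` in the corpus
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def predict_next(text):
    """Most frequent character to follow the last character of `text`."""
    return counts[text[-1]].most_common(1)[0][0]

print(predict_next("The cat sat on a ma"))  # -> 't'
```

Even this crude model completes “The cat sat on a ma_” with “t”; the slide's point is that scaling the same predict-the-next-label setup gives increasingly human-like output.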


Generate text from the model, at successive stages: at first; after training for a bit; after training more; after training more.


open-source textbook on algebraic geometry (LaTeX source)


The low-level gestalt is right, but the high-level, long-term structure is missing. This is mitigated with more data / larger models.


AIs in this approach…
- Imitate/generate human-like actions
- Can these AIs be creative?
- Can they assemble a room of chairs/tables?
- Can they make human domination schemes?


AIs in this approach…
- Imitate/generate human-like actions
- Can these AIs be creative? (Kind of)
- Can they assemble a room of chairs/tables? (Yes)
- Can they make human domination schemes? (No.)


Talk Outline:
- Supervised learning - “it works, just scale up!”
- Unsupervised learning - “it will work, if we only scale up!”
- AIXI - “guys, I can write down optimal AI.”
- Brain simulation - “this will work one day, right?”
- Artificial Life - “just do what nature did.”
- Something not on our radar

Where could AGI come from?


Unsupervised Learning: Big generative models.

1. Initialize a Big Neural Network
2. Train it to compress a huge amount of data on the internet
3. ???
4. Profit


Example 2: (variational) autoencoders
(also see: autoregressive models, Generative Adversarial Networks, etc.)

(diagram): the network is trained to compute the identity function through an information bottleneck of 30 numbers (it must compress the data to 30 numbers in order to reconstruct it later).
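The bottleneck idea can be illustrated with the linear special case: a linear autoencoder trained with squared error recovers the principal subspace, so an SVD can stand in for the trained network. The data and the 3-number code size below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Data that secretly lives near a 3-dimensional subspace of R^50.
latent = rng.normal(size=(500, 3))
mixing = rng.normal(size=(3, 50))
X = latent @ mixing + 0.01 * rng.normal(size=(500, 50))

# A linear autoencoder with MSE loss learns the top principal subspace,
# so the SVD gives us the optimal linear "encoder" directly.
k = 3  # bottleneck: each 50-number sample compressed to k numbers
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

code = Xc @ Vt[:k].T    # encode: 50 numbers -> k numbers
recon = code @ Vt[:k]   # decode: k numbers -> 50 numbers

err = np.mean((Xc - recon) ** 2) / np.mean(Xc ** 2)
print(f"relative reconstruction error with a {k}-number code: {err:.5f}")
```

Because the data really does have 3 degrees of freedom, a 3-number code reconstructs it almost perfectly; a real (variational) autoencoder does the nonlinear version of this with deep encoder and decoder networks.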


Example 2: (variational) autoencoders

Meddle with the code, then “decode” to the image


Another example: work at OpenAI, “Unsupervised Sentiment Neuron” (Alec Radford et al.)

1. Train a large char-rnn on a large corpus of unlabeled reviews from Amazon.
2. One of the neurons automagically “discovers” a small sentiment classifier (this high-level feature must help predict the next character).

(char-rnn also optimizes compression of data; prediction and compression are closely linked.)
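The prediction/compression link in the parenthetical above can be made concrete: under an (arithmetic) coder, a model that assigns probability p to the next character costs about -log2(p) bits for it, so a better predictor compresses better. A toy illustration, with made-up text and models:

```python
import math
from collections import Counter

text = "hello hello hello hello world"

# Encoding cost of `text` under a predictive model: roughly -log2 p(c)
# bits per character c. Better prediction => fewer bits => better compression.
def bits(text, prob):
    return sum(-math.log2(prob(c)) for c in text)

alphabet = set(text)
uniform = lambda c: 1.0 / len(alphabet)   # model that knows nothing
freq = Counter(text)
unigram = lambda c: freq[c] / len(text)   # model with learned frequencies

print(f"uniform model: {bits(text, uniform):.1f} bits")
print(f"unigram model: {bits(text, unigram):.1f} bits")
```

The frequency-aware model always needs fewer bits than the uniform one on non-uniform text, which is why a network trained to compress is forced to become a good predictor (and to discover features like sentiment along the way).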


Basic idea (diagram): all of the internet → Big Neural Network + compression objective.


What would this AI look like?
- The neural network has a powerful “brain state”:
  - Given any input data, we could get e.g. 10,000 numbers of the network’s “thoughts” about the data.
  - Given any vector of 10,000 numbers, we could maybe ask the network to generate samples of data that correspond.
- Does it want to take over the world? (No; it has no agency, no planning, etc.)


Talk Outline:
- Supervised learning - “it works, just scale up!”
- Unsupervised learning - “it will work, if we only scale up!”
- AIXI - “guys, I can write down optimal AI.”
- Brain simulation - “this will work one day, right?”
- Artificial Life - “just do what nature did.”
- Something not on our radar

Where could AGI come from?


AIXI
- Algorithmic information theory applied to general artificial intelligence (Marcus Hutter).
- Allows for a formal definition of “Universal Intelligence” (Shane Legg).
- A Bayesian Reinforcement Learning agent over the hypothesis space of all Turing machines.


(diagram: two distributions over Turing machines)
- Prior probability: “simpler worlds” are more likely.
- Likelihood: which TMs are consistent with my experience so far?
- Multiply vertically to get a posterior.

System identification: which Turing machine am I in? If I knew, I could plan perfectly.


We can write down the optimal agent’s action at time t (from http://www.vetta.org/documents/Machine_Super_Intelligence.pdf), where:


- the complete history of interactions up to this point (time t), out to the horizon (time m);
- all possible future action-state sequences;
- a weighted average of the total discounted reward, across all possible Turing Machines;
- the weights are [prior] x [likelihood] for each Turing machine (the prior based on the description length of the TM, in bits).
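For reference, a standard statement of the expectimax expression those annotations describe (the slide's formula did not survive extraction; this follows the cited Machine Super Intelligence notes, with U the universal Turing machine and \ell(q) the description length of program q):

```latex
a_t = \arg\max_{a_t} \sum_{o_t r_t} \;\cdots\; \max_{a_m} \sum_{o_m r_m}
\left( r_t + \cdots + r_m \right)
\sum_{q \,:\, U(q,\, a_1 \ldots a_m) \,=\, o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}
```

The inner sum over programs q consistent with the history is exactly the [prior] x [likelihood] weighting above: each consistent Turing machine contributes weight 2^{-\ell(q)}, so simpler worlds count more.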


There are just a few problems...


Attempts have been made... I like “A Monte-Carlo AIXI Approximation” from Veness et al. 2011: https://www.aaai.org/Papers/JAIR/Vol40/JAIR-4004.pdf


What would this agent look like?

- We need to feed it a reward signal. Might be very hard to write down. Might lead to “perverse instantiations” (e.g. paper clip maximizers etc.)

- Or maybe humans have a dial that gives the reward. But its actions might not be fully observable to humans.

- Very computationally intractable. Also, people are really not good at writing complex code. (e.g. for “AIXI approximation”).

- This agent could be quite scary. Definitely has agency.


Talk Outline:- Supervised learning - “it works, just scale up!”- Unsupervised learning - “it will work, if we only scale up!”- AIXI - “guys, I can write down optimal AI.”- Brain simulation - “this will work one day, right?”- Artificial Life - “just do what nature did.”- Something not on our radar

Where could AGI come from?


Brain simulation: BRAIN Initiative, Human Brain Project, optogenetics, multi-electrode arrays, connectomics, Neuralink, ...


Brain simulation
- How to measure a complete brain state?
- At what level of abstraction?
- How to model the dynamics?
- How do you simulate the “environment” to feed into the senses?
- Various ethical dilemmas.
- Timescale: neuroscientists are bearish.


Talk Outline:
- Supervised learning - “it works, just scale up!”
- Unsupervised learning - “it will work, if we only scale up!”
- AIXI - “guys, I can write down optimal AI.”
- Brain simulation - “this will work one day, right?”
- Artificial Life - “just do what nature did.”
- Something not on our radar

Where could AGI come from?


How did intelligence arise in nature?


We don’t have to redo 4B years of evolution.
- Work at a higher level of abstraction. We don’t have to simulate chemistry etc. to get intelligent networks.
- Intelligent design. We can meddle with the system and initialize with RL agents, etc.


Intelligence is the ability to win, in the face of world dynamics and a changing population of other intelligent agents with similar goals.


The intelligence “cognitive toolkit” includes, but is not limited to:

● attention: the at-will ability to selectively "filter out" parts of the input judged not relevant to the current top-down goal, e.g. the "cocktail party effect".

● working memory: structures/processes that temporarily store and manipulate information (7 +/- 2 items). Related: the phonological loop, a special part of working memory dedicated to storing a few seconds of sound (e.g. when you repeat a 7-digit phone number in your mind to keep it in memory); also the visuospatial sketchpad and an episodic buffer.

● long-term memory, of quite a few suspected different types: procedural memory (e.g. driving a car), semantic memory (e.g. the name of the current President), episodic memory (for autobiographical sequences of events, e.g. where one was during 9/11).

● knowledge representation: the ability to rapidly learn and incorporate facts into some "world model" that can be inferred over in what looks to be approximately Bayesian ways; the ability to detect and resolve contradictions, or propose experiments that disambiguate cases; the ability to keep track of what source provided a piece of information and later down-weigh its confidence if the source is suddenly judged untrustworthy.

● spatial reasoning: some crude "game engine" model of a scene and its objects and attributes, with all the complex built-in biases that only get properly revealed by optical illusions. Spatial memory: cells in the brain that keep track of the connectivity of the world and do something like an automatic "SLAM", putting together information from different senses to position the brain in the world.

● reasoning by analogy: e.g. applying a proverb such as "that’s locking the barn door after the horse has gone" to a current situation.

● emotions: heuristics that make our genes more likely to spread, e.g. frustration.

● a forward simulator, which lets us roll forward and consider abstract events and situations.

● various skill-acquisition heuristics: practicing something repeatedly, including the abstract ideas of "resetting" an experiment, deciding when an experiment is finished, and what its outcomes were; the heuristic inclination for "fun", experimentation, and curiosity; the heuristic of empowerment, the idea that it is better to take actions that leave more options available in the future.

● consciousness / theory of mind: the understanding that other agents are like me but also slightly different in unknown ways; empathy (e.g. the cringy feeling when seeing someone else get hurt); imitation learning, the heuristic of paying attention to and then later repeating what the other agents are doing.


Conclusion: we need to create environments that incentivize the emergence of a cognitive toolkit.


Conclusion: we need to create environments that incentivize the emergence of a cognitive toolkit.

Doing it wrong: incentivizes a lookup table of correct moves.


Doing it wrong: incentivizes a lookup table of correct moves.
Doing it right: incentivizes cognitive tools.


Benefits of multi-agent environments:

- variety - the environment is parameterized by its agent population, so an optimal strategy must be dynamically derived, and cannot be statically “baked” as behaviors / reflexes into a network.

- natural curriculum - the difficulty of the environment is determined by the skill of the other agents.
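Both points can be seen in a tiny self-play sketch (a made-up fictitious-play example in rock-paper-scissors, not from the talk): each agent best-responds to the other's empirical move frequencies, so neither ever faces a stationary environment, and no fixed reflex stays optimal.

```python
# Rock-paper-scissors self-play with "best respond to the opponent's
# empirical move frequencies" learners (fictitious play). Each agent's
# best move keeps shifting because the *other* agent is also learning:
# the environment is non-stationary by construction.
MOVES = ["rock", "paper", "scissors"]
BEATS = {"rock": "paper", "paper": "scissors", "scissors": "rock"}

# counts[i] = smoothed counts of agent i's past moves (what the opponent sees)
counts = [{m: 1 for m in MOVES}, {m: 1 for m in MOVES}]

def best_response(opp_counts):
    likely = max(opp_counts, key=opp_counts.get)  # opponent's modal move so far
    return BEATS[likely]                          # play the move that beats it

history = []
for _ in range(300):
    a = best_response(counts[1])  # agent 0 responds to agent 1's history
    b = best_response(counts[0])  # and vice versa
    counts[0][a] += 1
    counts[1][b] += 1
    history.append(a)

# Play keeps cycling through all three moves: nothing can be statically "baked in".
print({m: history.count(m) for m in MOVES})
```

Running it shows the agents endlessly cycling rock, paper, scissors; each phase of play is a response to the other agent's current habits, which is the "natural curriculum" in miniature.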


Why? Trends.

Q: What about the optimization?
A: Optimize over the whole thing: the architecture, the initialization, the learning rule. Write very little (or no) explicit code.

(example: small tensorflow graph)


In Computer Vision... (chart: datasets, by how large they are, vs. models, by how well they work; a "possibility frontier" separates the feasible region from the zone of “not going to happen.”)

- 70s - 90s: hard-coded models (edge detection etc., no learning); Lena (10^0, a single image).
- 90s - 2012: image features (SIFT etc., learning linear classifiers on top); Caltech 101 (~10^4 images), Pascal VOC (~10^5 images).
- 2013: ConvNets (learn the features, structure hard-coded); ImageNet (~10^6 images).
- 2017, projection: CodeGen (learn the weights and the structure); Google/FB images on the web (~10^9+ images).


In Reinforcement Learning... (chart: environments, by how much they measure/incentivise general intelligence (more multi-agent / non-stationary / real-world-like), vs. agents, by how impressive they are (more learning, more compute); again with a "possibility frontier" and a zone of “not going to happen.”)

- 70s - 90s: hard-coded agents (LISP programs, no learning); BlocksWorld (SHRDLU etc).
- 90s - 2012: Value Iteration etc. (~discrete MDPs, linear function approximators); Cartpole etc. (and bandits, gridworld, ...a few toy tasks).
- 2013: DQN, PG (deep nets, hard-coded various tricks); MuJoCo/ATARI/Universe (~a few dozen envs).
- 2017: RL^2 (learn the RL algorithm, structure fixed); simple multi-agent envs.
- Projection: CodeGen (learn structure and learning algorithm); digital worlds (complex multi-agent envs), Reality.


With increasing computational resources, the trend is towards more learning/optimization, and less explicit design.

1970: one of many explicit (LISP) programs that made up SHRDLU.

50 years later: “Neural Architecture Search with Reinforcement Learning” (Zoph & Le); “Large-Scale Evolution of Image Classifiers”.


“Learning to Cooperate, Compete, and Communicate” OpenAI blog post, 2017

- 4 red agents cooperate to chase 2 green agents

- 2 green agents want to reach blue “water”


What would this look like?
- Achieve completely uninterpretable “proto-AIs” first, similar to simple animals, but with fairly complete cognitive toolkits.
- Evolved AIs are a synthetic species that lives among us.
- We will shape them to love humans, similar to how we shaped dogs.
- “AI safety” will become a primarily empirical discipline, not a mathematical one as it is today.
- Some might try to evolve bad AIs, equivalent to combat dogs.
- We might have to make it illegal to evolve AI strains, or upper-bound the amount of computation per person and closely track all computational resources on Earth.


Talk Outline:
- Supervised learning - “it works, just scale up!”
- Unsupervised learning - “it will work, if we only scale up!”
- AIXI - “guys, I can write down optimal AI.”
- Brain simulation - “this will work one day, right?”
- Artificial Life - “just do what nature did.”
- Something not on our radar

Where could AGI come from?



Data from very large VR MMORPG worlds?


Combination of some of the above?

- E.g. take the artificial life approach, but allow agents to access the high-level representations of a big, pre-trained generative model.


In order of promisingness:

- Artificial Life - “just do what nature did.”
- Something not on our radar
- Supervised learning - “it works, just scale up!”
- Unsupervised learning - “it will work, if we only scale up!”
- AIXI - “guys, I can write down optimal AI.”
- Brain simulation - “this will work one day, right?”

Conclusion


What do you think?

(Thank you!)

(poll options) SL · UL · AIXI · BrainSim · ALife · Other

http://bit.ly/2r54rfe


Cool Related Pointers

Sebastian’s post, which inspired the title of this talk: http://www.nowozin.net/sebastian/blog/where-will-artificial-intelligence-come-from.html

Rodney Brooks paper: https://www.researchgate.net/publication/222486990_Intelligence_Without_Representation