Designing a Safe Motivational System for Intelligent Machines
Designing a Safe Motivational System for Intelligent Machines
Mark R. Waser
Inflammatory Statements
• Human intelligence REQUIRES ethics
• All humans want the same things
• Ethics are universal
• Ethics are SIMPLE in concept
• Difference in power is irrelevant (to ethics)
• Evolution has “designed” you to disagree with the above five points
Definitions
• Human – goal-directed entity
• Goals – a destination OR a direction
• Restrictions – conditional overriding goals
• Motivation – incentive to move
• Actions – determined by goals + motivations
• Path (or direction)
• Preferences, Rules-of-Thumb and Defaults
• Ethics (the *goal* includes the path)
• Safety
(disguised assumptions)
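These definitions can be made concrete in code. Below is a minimal sketch (all names are hypothetical, not from the talk) of a goal-directed entity whose restrictions act as conditional overriding goals: when a restriction's condition holds, its goal preempts the ordinary goal-scoring that otherwise determines actions.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Goal:
    name: str
    # Scores how well a proposed action advances this goal in a world state.
    value: Callable[[dict, str], float]

@dataclass
class Restriction:
    name: str
    # A restriction is a conditional overriding goal: when its condition
    # holds, its goal preempts the entity's ordinary goals.
    condition: Callable[[dict, str], bool]
    override: Goal

@dataclass
class GoalDirectedEntity:
    goals: List[Goal]
    restrictions: List[Restriction] = field(default_factory=list)

    def choose(self, world: dict, actions: List[str]) -> str:
        """Pick the action best scored by the goals, unless a restriction fires."""
        def score(action: str) -> float:
            for r in self.restrictions:
                if r.condition(world, action):
                    return r.override.value(world, action)  # override wins
            return sum(g.value(world, action) for g in self.goals)
        return max(actions, key=score)
```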
http://www.markzug.com/
Asimov's 3 Laws:
1. A robot may not injure a human being or, through inaction, allow a human being to come to harm.
2. A robot must obey orders given to it by human beings except where such orders would conflict with the First Law.
3. A robot must protect its own existence as long as such protection does not conflict with the First or Second Law.
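The three Laws form a strict priority ordering, which can be sketched as a filter over candidate actions. The predicates below are hypothetical stand-ins; deciding what actually counts as “injury”, “harm”, or an “order” is exactly where the disguised assumptions live:

```python
def filter_by_asimov(actions, world, injures_human, allows_harm,
                     disobeys_order, endangers_self):
    """Apply the three Laws in strict priority order to candidate actions."""
    # First Law: forbid any action that injures a human or, through
    # inaction, allows a human being to come to harm.
    actions = [a for a in actions
               if not injures_human(a, world) and not allows_harm(a, world)]
    # Second Law: among the survivors, prefer obeying human orders.
    obedient = [a for a in actions if not disobeys_order(a, world)]
    if obedient:
        actions = obedient
    # Third Law: finally, prefer actions that protect the robot's existence.
    safe = [a for a in actions if not endangers_self(a, world)]
    return safe or actions
```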
Four Possible Scenarios
• Asimov’s early robots (little foresight, helpful but easily confused or conflicted)
• Immediate shutdown/suicide
• VIKI from the movie “I, Robot” (generalize to “bubble-wrapping” humanity)
• Asimov’s late robots (further generalize to self-exile with invisible continuing assistance)
SIAI’s Definitions
• Friendly AI - an AI that takes actions that are, on the whole, beneficial to humans and humanity; benevolent rather than malevolent; nice rather than hostile
• Coherent Extrapolated Volition of Humanity (CEV) - “In poetic terms, our coherent extrapolated volition is our wish if we knew more, thought faster, were more the people we wished we were, had grown up farther together.”
goals & motivations
SIAI’s First Law
An AI must be beneficial to humans and humanity (benevolent rather than malevolent)
But . . .
What is beneficial?
What are humans and humanity?
Value Formula
Values (good/bad) are *entirely* derivative/relative with respect to some goal (CEV)
Value = f(x, y), where
• x is a set of circumstances (world state),
• y is a set of (proposed) actions, and
• f is an evaluation of how well your goal is advanced
Value = f(x, y, t, e), where
• t is the time point at which goal progress is judged, and
• e is the set of entities which the goal covers
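As a minimal sketch of the formula in Python (the names and the toy goal are illustrative assumptions, not from the talk), f is supplied by the goal itself, which is what makes the resulting value goal-relative:

```python
from typing import Callable

# Value = f(x, y, t, e): x = circumstances (world state), y = proposed
# actions, t = time point at which goal progress is judged, and e = the
# set of entities the goal covers.
ValueFn = Callable[[dict, tuple, float, frozenset], float]

def value(x: dict, y: tuple, t: float, e: frozenset, f: ValueFn) -> float:
    """The number is meaningless except relative to the goal that supplied f."""
    return f(x, y, t, e)

# Hypothetical toy goal: each covered entity's well-being after the proposed
# actions, discounted to the judgment time t.
def toy_goal(x: dict, y: tuple, t: float, e: frozenset) -> float:
    wellbeing = dict(x["wellbeing"])
    for entity, delta in y:                 # each action nudges one entity
        wellbeing[entity] += delta
    return sum(wellbeing[ent] * 0.99 ** t for ent in e)

print(value({"wellbeing": {"alice": 1.0, "bob": 0.5}},
            y=(("bob", 0.2),), t=10.0, e=frozenset({"alice", "bob"}),
            f=toy_goal))
```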
Questions
• Is this moral relativism?
• Are values complex?
• Must our goal (CEV) be complex?
Copernicus!
Assume that beneficial were a relatively simple formula (like z² + c)
Mandelbrot set
Assume further that we are trying to determine that formula (beneficial) by looking at the results (color) one example (pixel) at a time
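To make the analogy concrete, here is the standard escape-time membership test for the Mandelbrot set: the generating rule z → z² + c is one line, yet recovering it by inspecting the “color” of one “pixel” (point) at a time would be hopeless:

```python
def in_mandelbrot(c: complex, max_iter: int = 100) -> bool:
    """Escape-time test: iterate z -> z**2 + c from z = 0; points whose
    orbit stays bounded are in the set (up to max_iter iterations)."""
    z = 0j
    for _ in range(max_iter):
        z = z * z + c
        if abs(z) > 2:          # orbit escaped: c is outside the set
            return False
    return True

# One "pixel" at a time: an arbitrarily intricate boundary from a one-line rule.
print(in_mandelbrot(-1 + 0j))   # True: the orbit cycles 0, -1, 0, -1, ...
print(in_mandelbrot(1 + 0j))    # False: the orbit escapes quickly
```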
Color Illusions
Current Situation of Ethics
• Two formulas (beneficial to humans and humanity & beneficial to me)
• As long as you aren’t caught, all the incentive is to shade towards the second
• Evolution has “designed” humans to be able to shade to the second (Trivers, Hauser)
• Further, for very intelligent people, it is far more advantageous for ethics to be complex
Definition
Ethics *IS*
What is beneficial for the community
OR
What maximizes cooperation
Goal(s)/Omohundro Drives
1. AIs will want to self-improve
2. AIs will want to be rational
3. AIs will try to preserve their utility
4. AIs will try to prevent counterfeit utility
5. AIs will be self-protective
6. AIs will want to acquire resources and use them efficiently
“Without explicit goals to the contrary, AIs are likely to behave like human sociopaths in their pursuit of resources.”
7. GDEs will want cooperation and to be part of a community
8. GDEs will want FREEDOM!
(GDEs = goal-directed entities)
Humans . . .
• Are classified as obligatorily gregarious because we come from a long lineage for which life in groups is not an option but a survival strategy (Frans de Waal, 2006)
• Evolved to be extremely social because mass cooperation, in the form of community, is the best way to survive and thrive
• Have empathy not only because it helps us understand and predict the actions of others but, more importantly, because it prevents us from doing anti-social things that will inevitably hurt us in the long run (although we generally won’t believe this)
• Have not yet evolved a far-sighted rationality where the “rational” conscious mind is capable of competently making the correct social/community choices when deprived of our subconscious “sense of morality”
Circles of Morality
Relationships and Loyalty / Moral Sombrero
Redefining Friendly Entity
• Old: Friendly Entity (“Friendly”) - an entity with goals and motivations that are, on the whole, beneficial to humans and humanity; benevolent rather than malevolent
• New: Friendly Entity (“Friendly”) - an entity with goals and motivations that are, on the whole, beneficial to the community of Friendlies (i.e. the set of all Friendlies, known or unknown); benevolent rather than malevolent
Friendliness’s First Law
An entity must be beneficial to the community of Friendlies (benevolent rather than malevolent)
But . . .
What is beneficial?
What are humans and humanity?
What is beneficial?
• Cooperation (minimize conflicts & frictions)
• Omohundro drives
• Increasing the size of the community (both growing and preventing defection)
• To meet the needs/goals of each member of the community better than any alternative (as judged by them -- without interference or gaming)
What is harmful?
• Blocking/Perverting Omohundro Drives
• Lying
• Single-goaled entities
• Over-optimization (achievable top level goals)
• The fact that we do not maintain our top-level goal and have not yet evolved a far-sighted rationality where the “rational” conscious mind is capable of competently making the correct social/community choices when deprived of our “moral sense”
OPTIMAL vs. the community’s sense of what is correct (ethical)
This makes ethics much more complex because it includes the cultural history
The anti-gaming drive to maintain utility adds friction/resistance to the discussion of ethics
ONE non-organ donor + avoiding a defensive arms race > SIX dying patients
(Credit to: Eric Baum, What Is Thought?)
[Diagram: LOGICAL VIEW, a triangle linking GOAL(S) (CEV) to ACTIONS; stimuli implement moral rules of thumb]
Sloman’s architecture for a human-like agent (Sloman 1999)
Inflammatory Statements
• Human intelligence REQUIRES ethics
• All humans want the same things
• Ethics are universal
• Ethics are SIMPLE in concept
• Difference in power is irrelevant (to ethics)
• Evolution has “designed” you to disagree with the above five points
Next . . . .
Copies of this powerpoint available from [email protected]
CEV Candidate #1:
We wish that all entities were Friendlies
Necessary? Sufficient/Complete? Possible?