Designing a Safe Motivational System for Intelligent Machines
Designing a Safe Motivational System for Intelligent Machines
Mark R. Waser
Inflammatory Statements
• Human intelligence REQUIRES ethics
• All humans want the same things
• Ethics are universal
• Ethics are SIMPLE in concept
• Difference in power is irrelevant (to ethics)
• Evolution has “designed” you to disagree with the above five points
Definitions
• Human – goal-directed entity
• Goals – a destination OR a direction
• Restrictions – conditional overriding goals
• Motivation – incentive to move
• Actions – determined by goals + motivations
• Path (or direction)
• Preferences, Rules-of-Thumb and Defaults
• Ethics (the *goal* includes the path)
• Safety
(disguised assumptions)
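These definitions can be made concrete in code. Below is a minimal sketch (all names are hypothetical, not from the talk) of a goal-directed entity whose restrictions act as conditional overriding goals: when a restriction's condition holds, its goal preempts the ordinary goal-scoring that otherwise determines actions.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Goal:
    name: str
    # Scores how well a proposed action advances this goal in a world state.
    value: Callable[[dict, str], float]

@dataclass
class Restriction:
    name: str
    # A restriction is a conditional overriding goal: when its condition
    # holds, its goal preempts the entity's ordinary goals.
    condition: Callable[[dict, str], bool]
    override: Goal

@dataclass
class GoalDirectedEntity:
    goals: List[Goal]
    restrictions: List[Restriction] = field(default_factory=list)

    def choose(self, world: dict, actions: List[str]) -> str:
        """Pick the action best scored by the goals, unless a restriction fires."""
        def score(action: str) -> float:
            for r in self.restrictions:
                if r.condition(world, action):
                    return r.override.value(world, action)  # override wins
            return sum(g.value(world, action) for g in self.goals)
        return max(actions, key=score)
```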
http://www.markzug.com/
Asimov's 3 Laws:
1. A robot may not injure a human being or, through inaction, allow a human being to come to harm.
2. A robot must obey orders given to it by human beings except where such orders would conflict with the First Law.
3. A robot must protect its own existence as long as such protection does not conflict with the First or Second Law.
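The three Laws form a strict priority ordering, which can be sketched as a filter over candidate actions. The predicates below are hypothetical stand-ins; deciding what actually counts as “injury”, “harm”, or an “order” is exactly where the disguised assumptions live:

```python
def filter_by_asimov(actions, world, injures_human, allows_harm,
                     disobeys_order, endangers_self):
    """Apply the three Laws in strict priority order to candidate actions."""
    # First Law: forbid any action that injures a human or, through
    # inaction, allows a human being to come to harm.
    actions = [a for a in actions
               if not injures_human(a, world) and not allows_harm(a, world)]
    # Second Law: among the survivors, prefer obeying human orders.
    obedient = [a for a in actions if not disobeys_order(a, world)]
    if obedient:
        actions = obedient
    # Third Law: finally, prefer actions that protect the robot's existence.
    safe = [a for a in actions if not endangers_self(a, world)]
    return safe or actions
```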
Four Possible Scenarios
• Asimov’s early robots (little foresight, helpful but easily confused or conflicted)
• Immediate shutdown/suicide
• VIKI from the movie “I, Robot” (generalize to “bubble-wrapping” humanity)
• Asimov’s late robots (further generalize to self-exile with invisible continuing assistance)
SIAI’s Definitions
• Friendly AI - an AI that takes actions that are, on the whole, beneficial to humans and humanity; benevolent rather than malevolent; nice rather than hostile
• Coherent Extrapolated Volition of Humanity (CEV) - “In poetic terms, our coherent extrapolated volition is our wish if we knew more, thought faster, were more the people we wished we were, had grown up farther together.”
goals & motivations
SIAI’s First Law
An AI must be beneficial to humans and humanity (benevolent rather than malevolent)
But . . .
What is beneficial?
What are humans and humanity?
Value Formula
Values (good/bad) are *entirely* derivative/relative with respect to some goal (CEV)
Value = f(x, y), where
• x is a set of circumstances (world state),
• y is a set of (proposed) actions, and
• f is an evaluation of how well your goal is advanced
Value = f(x, y, t, e), where
• t is the time point at which goal progress is judged, and
• e is the set of entities which the goal covers
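As a minimal sketch of the formula in Python (the names and the toy goal are illustrative assumptions, not from the talk), f is supplied by the goal itself, which is what makes the resulting value goal-relative:

```python
from typing import Callable

# Value = f(x, y, t, e): x = circumstances (world state), y = proposed
# actions, t = time point at which goal progress is judged, and e = the
# set of entities the goal covers.
ValueFn = Callable[[dict, tuple, float, frozenset], float]

def value(x: dict, y: tuple, t: float, e: frozenset, f: ValueFn) -> float:
    """The number is meaningless except relative to the goal that supplied f."""
    return f(x, y, t, e)

# Hypothetical toy goal: each covered entity's well-being after the proposed
# actions, discounted to the judgment time t.
def toy_goal(x: dict, y: tuple, t: float, e: frozenset) -> float:
    wellbeing = dict(x["wellbeing"])
    for entity, delta in y:                 # each action nudges one entity
        wellbeing[entity] += delta
    return sum(wellbeing[ent] * 0.99 ** t for ent in e)

print(value({"wellbeing": {"alice": 1.0, "bob": 0.5}},
            y=(("bob", 0.2),), t=10.0, e=frozenset({"alice", "bob"}),
            f=toy_goal))
```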
Questions
• Is this moral relativism?
• Are values complex?
• Must our goal (CEV) be complex?
Copernicus!
Assume that beneficial were a relatively simple formula (like z² + c)
Mandelbrot set
Assume further that we are trying to determine that formula (beneficial) by looking at the results (color) one example (pixel) at a time
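To make the analogy concrete, here is the standard escape-time membership test for the Mandelbrot set: the generating rule z → z² + c is one line, yet recovering it by inspecting the “color” of one “pixel” (point) at a time would be hopeless:

```python
def in_mandelbrot(c: complex, max_iter: int = 100) -> bool:
    """Escape-time test: iterate z -> z**2 + c from z = 0; points whose
    orbit stays bounded are in the set (up to max_iter iterations)."""
    z = 0j
    for _ in range(max_iter):
        z = z * z + c
        if abs(z) > 2:          # orbit escaped: c is outside the set
            return False
    return True

# One "pixel" at a time: an arbitrarily intricate boundary from a one-line rule.
print(in_mandelbrot(-1 + 0j))   # True: the orbit cycles 0, -1, 0, -1, ...
print(in_mandelbrot(1 + 0j))    # False: the orbit escapes quickly
```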
Color Illusions
Current Situation of Ethics
• Two formulas (beneficial to humans and humanity & beneficial to me)
• As long as you aren’t caught, all the incentive is to shade towards the second
• Evolution has “designed” humans to be able to shade to the second (Trivers, Hauser)
• Further, for very intelligent people, it is far more advantageous for ethics to be complex
Definition
Ethics *IS*
What is beneficial for the community
OR
What maximizes cooperation
Goal(s)/Omohundro Drives
1. AIs will want to self-improve
2. AIs will want to be rational
3. AIs will try to preserve their utility
4. AIs will try to prevent counterfeit utility
5. AIs will be self-protective
6. AIs will want to acquire resources and use them efficiently
“Without explicit goals to the contrary, AIs are likely to behave like human sociopaths in their pursuit of resources.”
7. GDEs will want cooperation and to be part of a community
8. GDEs will want FREEDOM!
(GDEs = goal-directed entities)
Humans . . .
• Are classified as obligatorily gregarious because we come from a long lineage for which life in groups is not an option but a survival strategy (Frans de Waal, 2006)
• Evolved to be extremely social because mass cooperation, in the form of community, is the best way to survive and thrive
• Have empathy not only because it helps us understand and predict the actions of others but, more importantly, because it prevents us from doing anti-social things that will inevitably hurt us in the long run (although we generally won’t believe this)
• Have not yet evolved a far-sighted rationality where the “rational” conscious mind is capable of competently making the correct social/community choices when deprived of our subconscious “sense of morality”
Circles of Morality
Relationships and Loyalty / Moral Sombrero
Redefining Friendly Entity
• Old: Friendly Entity (“Friendly”) - an entity with goals and motivations that are, on the whole, beneficial to humans and humanity; benevolent rather than malevolent
• New: Friendly Entity (“Friendly”) - an entity with goals and motivations that are, on the whole, beneficial to the community of Friendlies (i.e. the set of all Friendlies, known or unknown); benevolent rather than malevolent
Friendliness’s First Law
An entity must be beneficial to the community of Friendlies (benevolent rather than malevolent)
But . . .
What is beneficial?
What are humans and humanity?
What is beneficial?
• Cooperation (minimize conflicts & frictions)
• Omohundro drives
• Increasing the size of the community (both growing and preventing defection)
• To meet the needs/goals of each member of the community better than any alternative (as judged by them -- without interference or gaming)
What is harmful?
• Blocking/Perverting Omohundro Drives
• Lying
• Single-goaled entities
• Over-optimization (achievable top level goals)
• The fact that we do not maintain our top-level goal and have not yet evolved a far-sighted rationality where the “rational” conscious mind is capable of competently making the correct social/community choices when deprived of our “moral sense”
OPTIMAL vs. the community’s sense of what is correct (ethical)
This makes ethics much more complex because it includes the cultural history
The anti-gaming drive to maintain utility adds friction/resistance to the discussion of ethics
ONE non-organ donor + avoiding a defensive arms race > SIX dying patients
(Credit to: Eric Baum, What Is Thought?)
[Diagram: LOGICAL VIEW, a triangle linking GOAL(S) (CEV) to ACTIONS; stimuli implement moral rules of thumb]
Sloman’s architecture for a human-like agent (Sloman 1999)
Inflammatory Statements
• Human intelligence REQUIRES ethics
• All humans want the same things
• Ethics are universal
• Ethics are SIMPLE in concept
• Difference in power is irrelevant (to ethics)
• Evolution has “designed” you to disagree with the above five points
Next . . . .
Copies of this powerpoint available from [email protected]
CEV Candidate #1:
We wish that all entities were Friendlies
Necessary? Sufficient/Complete? Possible?