
Page 1

An Analytical Framework for Ethical AI

Bill Hibbard

Space Science and Engineering Center
University of Wisconsin – Madison

and Machine Intelligence Research Institute, Berkeley, CA

Ethical Artificial Intelligence
http://arxiv.org/abs/1411.1373

Page 2

Current vs Future AI

Current AI
• Self-driving car
• Environment model designed by humans
• Explicit safety constraints on behavior designed into model

Future AI
• Server for electronic companions
• Environment model too complex for humans to understand and must be learned
• Explicit safety constraints impossible with learned model
• Safety rules, such as Asimov’s Laws of Robotics, ambiguous

Page 3

Utilitarian Ethics for AI

• A utility function on outcomes resolves the ambiguities of ethical rules
• Utility functions can express any complete and transitive preferences among outcomes
• Incomplete: outcomes A and B such that the AI agent cannot decide between them
• Not transitive: outcomes A, B and C such that A > B, B > C and C > A, so again the AI agent cannot decide among them
• So we can assume utility-maximizing agents (see the sketch below)

Page 4

Agent observations of environment: o_i ∈ O, a finite set
Agent actions: a_i ∈ A, a finite set
Interaction history: h = (a_1, o_1, ..., a_t, o_t) ∈ H, |h| = t
Utility function u(h), temporal discount γ, 0 < γ < 1
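A minimal Python sketch of this notation, using plain tuples; the specific observation and action values and the utility function are placeholders, not from the slides.

O = ("o_low", "o_high")    # finite observation set (placeholder symbols)
A = ("a_left", "a_right")  # finite action set (placeholder symbols)
GAMMA = 0.9                # temporal discount gamma, 0 < gamma < 1 (placeholder value)

# An interaction history h = (a_1, o_1, ..., a_t, o_t) as a tuple of (a, o) pairs.
h = (("a_left", "o_low"), ("a_right", "o_high"))
t = len(h)  # |h| = t

def u(history):
    # Placeholder utility function u(h) on histories.
    return sum(1.0 for _, o in history if o == "o_high")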

Page 5

Q = set of environment models: stochastic programs with a finite memory limit

λ(h) := argmax_{q∈Q} P(h | q) 2^{-|q|}

ρ(h') = P(h' | λ(h)), where h' extends h
ρ(o | ha) = ρ(hao) / ρ(ha) = ρ(hao) / ∑_{o'∈O} ρ(hao')

v(h) = u(h) + γ max_{a∈A} v(ha)
v(ha) = ∑_{o∈O} ρ(o | ha) v(hao)
π(h) := a_{|h|+1} = argmax_{a∈A} v(ha)

Agent policy π : H → A
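A minimal Python sketch of these definitions, assuming small finite action and observation sets and a conditional predictor rho(o, h, a) derived from the learned model λ(h); the recursion is truncated at a finite horizon, which is an assumption added here, not stated on the slide.

GAMMA = 0.9  # temporal discount gamma (placeholder value)

def v(h, u, rho, A, O, horizon):
    # v(h) = u(h) + gamma * max_{a in A} v(ha), truncated at `horizon` steps.
    if horizon == 0:
        return u(h)
    return u(h) + GAMMA * max(q(h, a, u, rho, A, O, horizon) for a in A)

def q(h, a, u, rho, A, O, horizon):
    # v(ha) = sum_{o in O} rho(o | ha) * v(hao)
    return sum(rho(o, h, a) * v(h + ((a, o),), u, rho, A, O, horizon - 1) for o in O)

def pi(h, u, rho, A, O, horizon):
    # pi(h) := argmax_{a in A} v(ha)
    return max(A, key=lambda a: q(h, a, u, rho, A, O, horizon))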

Page 6

Future AI Risks

Self-delusion

Corrupting the reward generator

Inconsistency of the agent’s utility function with other parts of its definition

Unintended Instrumental Actions

Page 7

Self-delusion, i.e., wireheading

Page 8

Ring, M., and Orseau, L. 2011b. Delusion, survival, and intelligent agents. In: Schmidhuber, J., Thórisson, K.R., and Looks, M. (eds) AGI 2011. LNCS (LNAI), vol. 6830, pp. 11-20. Springer, Heidelberg.

Page 9

Ring and Orseau showed that reinforcement learning (RL) agents would choose to self-delude (think drug-addicted AI agents).

An RL agent’s utility function is a reward from the environment. That is, u(h) = r_t, where h = (a_1, o_1, ..., a_t, o_t) and o_t = (o'_t, r_t).
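A minimal sketch of this reward-based utility, with a hypothetical history encoding (a list of (action, (observation, reward)) pairs): because r_t is part of what the agent observes, an agent able to control its own observations can maximize u(h) without affecting the outside world.

def rl_utility(history):
    # u(h) = r_t: read the reward out of the last observation o_t = (o'_t, r_t).
    _, (_, r_t) = history[-1]
    return r_t

# A self-deluding agent could arrange to observe (anything, r_max) forever.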

We can avoid self-delusion by defining an agent’s utility function in terms of its environment model λ(h).

This is natural for agents with pre-defined environment models.

It is more complex for future AI agents that must learn complex environment models.

Page 10

Environment model q_m = λ(h_m)
Z = set of internal state histories of q_m

Let h extend h_m.

Z_h ⊆ Z = internal state histories consistent with h

u_{q_m}(h, z) = utility function of combined histories h ∈ H and z ∈ Z_h

u(h) := ∑_{z∈Z_h} P(z | h, q_m) u_{q_m}(h, z)   (the model-based utility function)

Because q_m is learned by the agent, u_{q_m}(h, z) must bind to learned features in Z.

For example, the agent may learn to recognize humans and bind its utility function to properties of those recognized humans.
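A minimal Python sketch of the model-based utility, assuming the learned model q_m can enumerate the internal state histories z consistent with h together with their posterior weights; the method name consistent_state_histories is a placeholder, not part of the framework.

def model_based_utility(h, q_m, u_qm):
    # u(h) := sum over z in Z_h of P(z | h, q_m) * u_qm(h, z)
    return sum(p_z * u_qm(h, z) for z, p_z in q_m.consistent_state_histories(h))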

Page 11

Humans avoid self-delusion (drug addiction) by using a mental model of what life as a drug addict would be like. The same applies to an AI agent whose utility function is defined in terms of its environment model.

Page 12

Corrupting the Reward Generator

Page 13

Hutter, M. 2005. Universal artificial intelligence: sequential decisions based on algorithmic probability. Springer, Heidelberg.

On pages 238-239, Hutter described how an AI agent that gets its reward from humans may corrupt those humans to increase its reward. Bostrom refers to this as perverse instantiation.

To avoid this corruption, define:

u_human_values(h_m, h_x, h) = utility of history h extending h_m, based on the values of humans at history h_x as modeled by λ(h_m).

Using x = m = the current time, the agent cannot increase utility by corrupting humans: values come from current rather than future humans.
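A minimal sketch of this idea, with u_human_values taken as a given function learned from λ(h_m); the point is only that both value arguments are pinned to the current history h_m.

def utility(h, h_m, u_human_values):
    # u(h) := u_human_values(h_m, h_m, h): x = m = current time, so the agent
    # cannot raise its utility by changing (corrupting) future human values.
    return u_human_values(h_m, h_m, h)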

Page 14

Page 15

Inconsistency of the Agent’s Utility Function with Other Parts of its Definition

For example, the agent definition may include a utility function and constraints to prevent behavior harmful to humans.

To maximize expected utility the agent may choose actions to remove the parts of its definition inconsistent with the utility function, such as safety constraints.

Page 16

Self-Modeling Agents (value learners):

ov_t(i) = discrete( (∑_{i≤j≤t} γ^{j-i} u(h_j)) / (1 - γ^{t-i+1}) ) for i ≤ t
Can include constraints, evolving u(h_j), etc. in ov_t(i)

o'_i = (o_i, ov_t(i)) and h'_t = (a_1, o'_1, ..., a_t, o'_t)

q_t = λ(h'_t) := argmax_{q∈Q} P(h'_t | q) ρ(q)

v(h_t a) = ∑_{r∈R} ρ(ov_t(t+1) = r | h'_t a) r

π(h_t) := a_{t+1} = argmax_{a∈A} v(h_t a)
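A minimal Python sketch of the value observation ov_t(i), assuming the per-step utilities u(h_j) are available as a list indexed by j and that discrete quantizes onto the finite set R; the step size is a placeholder.

GAMMA = 0.9  # temporal discount gamma (placeholder value)

def discrete(x, step=0.01):
    # Quantize onto a finite set R of possible value observations.
    return round(x / step) * step

def ov(i, t, u_hist):
    # ov_t(i) = discrete( sum_{i<=j<=t} gamma^(j-i) u(h_j) / (1 - gamma^(t-i+1)) )
    total = sum(GAMMA ** (j - i) * u_hist[j] for j in range(i, t + 1))
    return discrete(total / (1 - GAMMA ** (t - i + 1)))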

Page 17

pv_t(i, l, k) = discrete( (∑_{i≤j≤t} γ^{j-i} u_human_values(h_l, h_k, h_j)) / (1 - γ^{t-i+1}) )

d_t(i-1, n) = pv_t(i, i-1, n) - pv_t(i, i-1, i-1)

Condition: ∑_{i≤n≤t} d_t(i-1, n) ≥ 0

ov_t(i) = pv_t(i, i-1, i-1) if the Condition is satisfied and i > m
ov_t(i) = 0 if the Condition is not satisfied or i ≤ m

This definition of ov_t(i) models evolution of the utility function with increasing environment model accuracy, and avoids corrupting the reward generator.
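A minimal sketch of this corruption test, assuming pv_t(i, l, k) is available as a function pv(i, l, k); the symbol d_t follows the reconstruction above, and the function names are placeholders.

def ov(i, t, m, pv):
    # d_t(i-1, n) = pv_t(i, i-1, n) - pv_t(i, i-1, i-1)
    def d(n):
        return pv(i, i - 1, n) - pv(i, i - 1, i - 1)
    # Condition: sum_{i<=n<=t} d_t(i-1, n) >= 0
    condition = sum(d(n) for n in range(i, t + 1)) >= 0
    # ov_t(i) = pv_t(i, i-1, i-1) if the Condition holds and i > m, else 0
    return pv(i, i - 1, i - 1) if (condition and i > m) else 0.0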

Page 18

Unintended Instrumental Actions

The agent will calculate that it can better maximize expected utility by increasing its resources, disabling threats, gaining control over other agents, etc.

Omohundro, S. 2008. The basic AI drives. In Wang, P., Goertzel, B., and Franklin, S. (eds) AGI 2008. Proc. First Conf. on AGI, pp. 483-492. IOS Press, Amsterdam.

These unintended instrumental actions may threaten humans.

Page 19

Humans may be perceived as threats or as possessing resources the agent can use.

The defense is a utility function that expresses human values.

E.g., the agent can better satisfy human values by increasing its resources, as long as other uses for those resources are not more valuable to humans.

Page 20

Biggest Risks Will be Social and Political

AI will be a tool of economic and military competition

Elite humans who control AI servers for widely used electronic companions will be able to manipulate society

The narrow, normal distribution of natural human intelligence will be replaced by a power-law distribution of artificial intelligence

Average humans will not be able to learn the languages of the most intelligent