CSCI 1300
Artificial Intelligence Lecture
Mike Mozer
December 4, 2003
Computer Science
Operating Systems
Programming Languages
Networking
Security
Theory
Artificial Intelligence
Artificial Intelligence
Natural Language Understanding
Speech Recognition
Computer Vision
Robotics
Reasoning
Planning
Machine Learning
Machine Learning
Supervised Learning:
spam filters (hotmail.com)
ALVINN (autonomous vehicle navigation)
Unsupervised Learning:
collaborative filtering (amazon.com)
fault monitoring
Reinforcement Learning:
TD-Gammon (champion backgammon playing program)
elevator controller
adaptive home lighting/heating control
Reinforcement Learning: A Simple Example
Suppose you are in one of two states: hungry or sleepy.
Suppose you can take one of two actions: go to Turley’s or lie on bed.
Reward contingencies:
hungry -> go to Turley’s: reward
hungry -> lie on bed: no reward
sleepy -> go to Turley’s: no reward
sleepy -> lie on bed: reward
Reward depends on what action you take in a given state.
Reinforcement Learning: A Simple Example
How do you learn to take the correct action?
Trial and error!
Through experience, the system can learn to predict the reward that will be obtained for a given action in the current state:
reward(action | state)
This is also notated as “Q(state, action)”
Given the expected reward, the agent can choose the best action:
if Q(hungry, Turley’s) > Q(hungry, lie on bed) then go to Turley’s
else lie on bed
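This decision rule is easy to state in code. A minimal sketch in Python, using the hungry/sleepy example above (the Q values shown are illustrative placeholders, not learned values):

# Q table for the two-state, two-action example above.
# Values are illustrative expected rewards.
Q = {
    ("hungry", "go to Turley's"): 1.0,
    ("hungry", "lie on bed"):     0.0,
    ("sleepy", "go to Turley's"): 0.0,
    ("sleepy", "lie on bed"):     1.0,
}

ACTIONS = ["go to Turley's", "lie on bed"]

def best_action(state):
    """Choose the action with the highest expected reward Q(state, action)."""
    return max(ACTIONS, key=lambda a: Q[(state, a)])

print(best_action("hungry"))  # -> go to Turley's
print(best_action("sleepy"))  # -> lie on bed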
Reinforcement Learning in the Real World
Issues:
Delayed reinforcement (e.g., car accident due to worn tires)
Occasional reinforcement (e.g., chess playing)
Short term versus long term rewards (e.g., skipping class)
Exploration versus exploitation (e.g., trying new restaurants)
Partially observable state (e.g., viral infection)
Multiple agents (e.g., multiple elevators)
[Diagram: agent-environment timeline over time intervals 1–7, with state s1…s7, action a1…a7, and instantaneous reinforcement r1…r7 at each interval]
Elevator Control
Q learning (Watkins, 1989; Watkins & Dayan, 1992)
Q(x,u): If action u is taken in state x, what is the minimum cost we can expect to obtain?
Policy based on Q values:

    \pi(x_t) = \begin{cases} \arg\min_u Q(x_t, u) & \text{with probability } 1 - \theta \\ \text{random action} & \text{with probability } \theta \end{cases}

where \theta is the exploration rate.

Incremental update rule for Q values:

    Q(x_t, u_t) \leftarrow (1 - \alpha)\, Q(x_t, u_t) + \alpha \left[ c_t + \lambda \min_u Q(x_{t+1}, u) \right]

where \alpha is the learning rate, \lambda is the discount factor, and c_t is the instantaneous cost. Because Q values here are expected costs, the update backs up the minimum over next actions, matching the argmin policy above.

Given fully observable state, infinite exploration, etc., Q learning is guaranteed to converge on the optimal policy.
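As a concrete sketch, one step of this procedure in Python (the dictionary-based Q table, parameter values, and function names are illustrative assumptions, not from the slides):

import random

alpha = 0.1   # learning rate
lam   = 0.9   # discount factor (lambda)
theta = 0.05  # exploration rate

def choose_action(Q, x, actions):
    """Exploration policy: random action with probability theta,
    otherwise the minimum-expected-cost action argmin_u Q(x, u)."""
    if random.random() < theta:
        return random.choice(actions)
    return min(actions, key=lambda u: Q[(x, u)])

def q_update(Q, x, u, cost, x_next, actions):
    """One step: Q(x,u) <- (1 - alpha) Q(x,u) + alpha [c + lambda min_u' Q(x',u')]"""
    target = cost + lam * min(Q[(x_next, up)] for up in actions)
    Q[(x, u)] = (1 - alpha) * Q[(x, u)] + alpha * target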
The Adaptive House
Michael Mozer+*, Robert Dodier#, Debra Miller*, Marc Anderson*, Josh Anderson✩, Diane Lukianow✩, Dan Bertini#, Tom Moyer†, Matt Bronder*, Charles Myers✩, Michael Colagrosso*, Tom Pennell*, Robert Cruickshank#, James Ries✩, Brian Daugherty*, Erik Skorpen✩, Mark Fontenot‡, Joel Sloss✩, Okechukwu Ikeako✩, Lucky Vidmar*, Paul Kooros✩, Matthew Weeks✩
University of Colorado
*Department of Computer Science; +Institute of Cognitive Science; #Department of Civil, Environmental, and Architectural Engineering; ✩Department of Electrical and Computer Engineering; †Department of Mechanical Engineering; ‡Department of Aerospace Engineering
http://www.cs.colorado.edu/~mozer/adaptive-house
The adaptive house
Not a programmable house, but a house that programs itself.
House adapts to the lifestyle of the inhabitants.
House monitors environmental state and senses actions of the inhabitant.
House learns inhabitants’ schedules, preferences, and occupancy patterns.
House uses this information to achieve two objectives:
(1) anticipate inhabitant needs
(2) conserve energy
Domain: home comfort systems
• air heating
• lighting
• water heating
• ventilation
The adaptive house
[Photos: the residence in Marshall, Colorado, outside of Boulder; some of the gang; the great room; bedrooms and bathrooms; sensors; the water heater; the furnace; controls; computers]
Training signals
Actions performed by inhabitant specify setpoints ➜ anticipation of inhabitant desires
Gas and electricity costs ➜ energy conservation
A reinforcement learning framework
Each constraint has an associated cost:
discomfort cost if inhabitant preferences are neglected
energy cost depends on device and intensity setting
The optimal control policy minimizes the expected average cost

    J(t_0) = \lim_{\kappa \to \infty} E\left[ \frac{1}{\kappa} \sum_{t = t_0 + 1}^{t_0 + \kappa} \left( d(x_t) + e(u_t) \right) \right]

where t = index over nonoverlapping time intervals, t_0 = current time interval, u_t = control decision for interval t, x_t = environmental state during interval t, d = discomfort cost, and e = energy cost.
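As a sketch of what this objective means in practice, its finite-horizon empirical version can be computed from logged data in Python (the function and its arguments are illustrative assumptions, not from the slides):

def average_cost(states, decisions, d, e):
    """Empirical average of d(x_t) + e(u_t) over kappa logged intervals;
    the objective J takes the expectation of this as kappa -> infinity."""
    kappa = len(states)
    return sum(d(x) + e(u) for x, u in zip(states, decisions)) / kappa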
ACHE (Adaptive Control of Home Environments)
Separate control system for each task:
air temperature regulation: furnace, space heaters, fans, dampers, blinds
lighting regulation: wall sconces, overhead lights
water temperature regulation: hot water heater
[Diagram: ACHE takes environmental state and inhabitant actions as input, issues device decisions, and receives setpoints and energy costs as training signals]
General architecture of ACHE
[Diagram: the instantaneous environmental state feeds a state transformation (yielding the state representation) and an occupancy model (yielding occupied zones); predictors supply future state information; a setpoint generator combines these into a setpoint profile, which a device regulator turns into control decisions]
Lighting control
What makes lighting control a challenge?
Twenty-two banks of lights, each with 16 intensity levels; seven banks of lights in the great room alone
Motion-triggered lighting does not work
Lighting moods
Two constraints must be satisfied simultaneously:
• maintaining lighting according to inhabitant preferences
• conserving energy
Range of time scales involved
Sluggishness of system
Resolving the sluggishness dilemma
Anticipator: Neural network that predicts which zone(s) will become occupied in the next two seconds
Input:
1, 3, and 6 second averages of motion signals (36)
instantaneous and 2 second average of door status (20)
instantaneous, 1 second, and 3 second averages of sound level (33)
current zone occupancy status and durations (16)
time of day (2)
Output:
p(zone i becomes occupied in next 2 seconds | currently unoccupied) (8)
Runs every 250 ms
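The input sizes above sum to 36 + 20 + 33 + 16 + 2 = 107 features, mapped to 8 zone probabilities. A minimal NumPy sketch of such a network (the single hidden layer and its size are assumptions; the slides do not specify the architecture):

import numpy as np

N_IN, N_HIDDEN, N_OUT = 107, 20, 8  # hidden size is an assumption

rng = np.random.default_rng(0)
W1 = rng.normal(0, 0.1, (N_HIDDEN, N_IN))
b1 = np.zeros(N_HIDDEN)
W2 = rng.normal(0, 0.1, (N_OUT, N_HIDDEN))
b2 = np.zeros(N_OUT)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def anticipator(x):
    """Map the 107-element sensor vector to 8 zone probabilities:
    p(zone i becomes occupied in next 2 s | currently unoccupied)."""
    h = np.tanh(W1 @ x + b1)
    return sigmoid(W2 @ h + b2)  # probabilities in (0, 1)

# Called every 250 ms with the current feature vector:
p = anticipator(np.zeros(N_IN))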
Training the anticipator
Occupancy model provides the training signal.
Two types of errors:
miss
false alarm
Training procedure: given a partially trained net, collect misses and false alarms; retrain the net when 200 additional examples have been collected; TD algorithm for misses.
[Diagram: a miss is trained on the sequence of states state(t − 2000 ms), state(t − 1750 ms), ..., state(t − 250 ms) leading up to the moment zone i becomes occupied; a false alarm is trained on state(t) with zone i vacant]
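A sketch of how training examples might be assembled from a miss, following the diagram above (the log structure and target values are assumptions; the slides do not give the exact TD targets):

def miss_examples(state_log, t, zone_i, n_steps=8, dt_ms=250):
    """For a miss at time t (zone_i became occupied without being predicted),
    pair each state snapshot from t - 2000 ms through t - 250 ms with a
    positive target for zone_i. state_log maps millisecond timestamps to
    feature vectors."""
    examples = []
    for k in range(n_steps, 0, -1):
        x = state_log[t - k * dt_ms]       # snapshot k steps before occupancy
        examples.append((x, zone_i, 1.0))  # target: zone becomes occupied
    return examples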
[Plot: hit/(miss + false alarm) versus number of training examples, 0 to 60000]
Examples of anticipator performance
Lighting controller costs
Energy cost: 7.2 cents per kWh
Discomfort cost: 1 cent per device whose level is manually adjusted
Anticipator miss cost: 0.1 cent per device that was off and should have been on
Anticipator false alarm cost: 0.1 cent per device that was turned on
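Tallying these rates for one evaluation interval is straightforward; a sketch (the function and its arguments are illustrative, not from the slides):

ENERGY_RATE = 7.2   # cents per kWh
DISCOMFORT  = 1.0   # cents per manually adjusted device
MISS        = 0.1   # cents per device off that should have been on
FALSE_ALARM = 0.1   # cents per device turned on needlessly

def lighting_cost(kwh_used, n_adjusted, n_misses, n_false_alarms):
    """Total lighting-controller cost in cents for one interval."""
    return (ENERGY_RATE * kwh_used
            + DISCOMFORT * n_adjusted
            + MISS * n_misses
            + FALSE_ALARM * n_false_alarms)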
Results
• about three months of data collection
• events logged only from 19:00 – 06:59
[Plot: discomfort cost and energy cost (cents, 0 to 0.4) versus number of events (0 to 8000)]
Air temperature control
[Plots: Sunday March 6, 2000, versus time of day (0–20 h): furnace off/on, occupancy away/home, and p(state change) from 0 to 1]
Comparison of control policies using artificial occupancy data
[Plots: mean cost ($/day) versus variability index (1 to 0) for productivity loss = 1.0 hr and 3.0 hr; policies compared: Neurothermostat, constant temperature, setback thermostat, and occupancy-triggered control]
Comparison of control policies using real occupancy data
Mean Daily Cost

                        productivity loss
                        ρ = 1     ρ = 3
Neurothermostat         $6.77     $7.05
constant temperature    $7.85     $7.85
occupancy triggered     $7.49     $8.66
setback thermostat      $8.12     $9.74