Homework 5 due Tuesday, November 13th, 11:59pm

Class notes

Real-World Robot Learning: Safety and Flexibility

CS294-112: Deep Reinforcement Learning

Gregory Kahn

Why should you care? Safety and flexibility.

Outline

Topics
• Safety
• Flexibility

Algorithms
• Imitation learning
• Model-free
• Model-based

Topics (safety, flexibility) × algorithms (imitation learning, model-free, model-based):
2 × 3 = 6 papers we'll cover

By no means the best / only papers on these topics

Goal

Learn a control policy that maps observations to controls:
observation → policy → control

Assumption

● Able to generate good trajectories using an expert policy: a human expert, or trajectory optimization (cost function, optimization, full state information)
● The expert is available only during training

● Problem: training and test distributions differ

Pipeline: trajectory optimization → gather expert trajectories → supervised learning

The training trajectories and the learned policy's trajectories diverge: the policy reaches states not in the training set! [Ross et al. 2010]

Dataset Aggregation (DAgger) [Ross et al. 2011]

● Problem: training and test distributions differ
● Solution: execute the learned policy during training, gather expert labels for the visited states, and do supervised learning on the aggregated dataset
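A minimal sketch of the DAgger loop, assuming hypothetical `env`, `policy`, and `expert_action` interfaces (none of these names are from the slides):

```python
def dagger(env, policy, expert_action, n_iters=10, horizon=100):
    dataset = []  # aggregated (observation, expert action) pairs
    for _ in range(n_iters):
        obs = env.reset()
        for _ in range(horizon):
            # Execute the *learned* policy, so training visits the states
            # the learner actually reaches...
            action = policy.act(obs)
            # ...but label every visited state with the expert's action.
            dataset.append((obs, expert_action(obs)))
            obs, done = env.step(action)
            if done:
                break
        policy.fit(dataset)  # supervised learning on the aggregated dataset
    return policy
```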

Safety during training

● DAgger mixes the actions: the executed policy is a mixture of the expert's and the learner's actions

Policy Learning using Adaptive Trajectory Optimization (PLATO)

● DAgger mixes the actions; PLATO mixes the objectives
● The sampling policy optimizes the task cost J, so it avoids high-cost states while staying close to the learner
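As a sketch, the PLATO sampling (teacher) policy at each step solves an MPC problem that trades off the task cost against staying close to the learner; the exact form below is from memory of Kahn et al. 2016 and should be treated as an approximation:

```latex
\pi^{\text{teacher}}(\cdot \mid x_t)
  = \arg\min_{\pi}\;
    \mathbb{E}_{\pi}\!\left[ J(x_{t:T}) \right]
    + \lambda\, D_{\mathrm{KL}}\!\big( \pi(\cdot \mid x_t) \,\|\, \pi_\theta(\cdot \mid o_t) \big)
```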

Algorithm comparisons

approach            | sampling policy | safe | similar training and test distributions
--------------------|-----------------|------|----------------------------------------
supervised learning | expert          | yes  | no
DAgger              | mixture         | no   | yes
PLATO               | adaptive MPC    | yes  | yes

Experiments: final neural network policies
[videos: Canyon and Forest environments]

Experiments: metrics
[plots: Canyon and Forest environments]

Goal

[diagram: exploring toward the goal can reach states marked NOT SAFE]

Shielding

● Pre-emptive shielding: the shield restricts the agent's available actions before it chooses; like learning in a transformed MDP
● Post-posed shielding: the agent proposes an action and the shield overrides it if unsafe; the shield can be used at test time
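A minimal sketch of post-posed shielding; `is_safe` and `safe_fallback_action` are hypothetical stand-ins for a shield synthesized from the safety specification and the known conservative dynamics:

```python
def shielded_step(env, agent, state):
    action = agent.act(state)                 # the agent proposes an action
    if not is_safe(state, action):            # the shield checks it
        action = safe_fallback_action(state)  # and overrides if unsafe
    next_state, reward, done = env.step(action)
    # The agent learns from the action that was actually executed.
    agent.observe(state, action, reward, next_state)
    return next_state, done
```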

How to shield: linear temporal logic

● Encode safety with temporal logic

● Assumption: Known approximate/conservative transition dynamics

Experiments

Safety criteria:
- Don't crash

Experiments

Safety criteria:
- Don't run out of oxygen
- If there is enough oxygen, don't surface without divers

Goal

How to do reinforcement learning in an unknown environment, without destroying the robot during training, using only onboard images

Approach

Learn a collision prediction model for the unknown environment: a neural network maps the raw image and command velocities to a collision prediction

Model-based RL using a collision prediction model

● Gather trajectories using an MPC controller (may experience collisions) → data
● Train an uncertainty-aware collision prediction model: a deep neural network with uncertainty estimates from bootstrapping and dropout
● Form a speed-dependent, uncertainty-aware collision cost
● Encourage safe, low-speed collisions by reasoning about the model's uncertainty; the robot increases speed as the model becomes more confident

Collision cost

high speed + predicted collision + large uncertainty → large cost
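A hedged sketch of what a speed-dependent, uncertainty-aware collision cost could look like; the functional form and the coefficient `k_unc` are assumptions, not the paper's exact cost:

```python
def collision_cost(speed, p_collision, uncertainty, k_unc=1.0):
    # Cost grows with commanded speed, with the predicted collision
    # probability, and with the model's uncertainty about that prediction,
    # so the planner commands low speeds when collisions are likely or
    # the model is unsure.
    return speed * (p_collision + k_unc * uncertainty)
```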

Estimating neural network output uncertainty: bootstrapping

● Training time: resample the data with replacement into datasets D1, D2, D3; train models M1, M2, M3
● Test time: feed the input to all of M1, M2, M3 and use the spread of their outputs as the uncertainty estimate
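A minimal numpy sketch of the bootstrap ensemble; `make_model` is a hypothetical constructor for the prediction network, and `X`, `y` are arrays:

```python
import numpy as np

def train_bootstrap_ensemble(X, y, make_model, n_models=3):
    models = []
    n = len(X)
    for _ in range(n_models):
        idx = np.random.randint(0, n, size=n)  # resample with replacement
        model = make_model()
        model.fit(X[idx], y[idx])              # each model sees a different dataset
        models.append(model)
    return models

def predict_with_uncertainty(models, x):
    preds = np.array([m.predict(x) for m in models])
    # Mean prediction, plus disagreement between the bootstrap models
    # as the uncertainty estimate.
    return preds.mean(axis=0), preds.std(axis=0)
```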

Estimating neural network output uncertainty: dropout

● Training time: train a single model with dropout
● Test time: keep dropout active and run several stochastic forward passes; the spread of the outputs estimates the uncertainty
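A minimal Monte Carlo dropout sketch in PyTorch (an assumption: the slides do not specify a framework). Calling `model.train()` keeps dropout stochastic at test time; it also affects batch norm, so this is only a sketch for dropout-only models:

```python
import torch

def mc_dropout_predict(model, x, n_samples=20):
    model.train()  # keep dropout layers stochastic at test time
    with torch.no_grad():
        preds = torch.stack([model(x) for _ in range(n_samples)])
    # Mean over the stochastic passes, spread as the uncertainty estimate.
    return preds.mean(dim=0), preds.std(dim=0)
```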

Preliminary real-world experiments

Not accounting for uncertainty → higher-speed collisions

Accounting for uncertainty → lower-speed collisions

Successful flight past the obstacle

Safety takeaways

• Tradeoff between safety and exploration
• Safety guarantees require expert oversight, or a known environment plus known dynamics
• Uncertainty can play a key role

Goal

Condition the policy on a user-specified command

Approach

Option A: feed the command to the network as an input
Option B: branch the network on the command
  + empirically better
  - only works for discrete commands
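A minimal PyTorch sketch of Option B; the layer sizes, feature encoder, and command set are illustrative assumptions (in the real system the encoder would be a convolutional network over the camera image):

```python
import torch
import torch.nn as nn

class BranchedPolicy(nn.Module):
    def __init__(self, obs_dim=64, feat_dim=128, action_dim=2, n_commands=3):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, feat_dim), nn.ReLU())
        # One output head per discrete command (e.g. left / straight / right);
        # the command selects the branch instead of being fed as an input.
        self.heads = nn.ModuleList(
            [nn.Linear(feat_dim, action_dim) for _ in range(n_commands)]
        )

    def forward(self, obs_features, command):
        z = self.encoder(obs_features)
        return self.heads[command](z)
```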

Approach: important details

• Data augmentation (see the sketch below):
  • Contrast
  • Brightness
  • Tone
  • Gaussian blur
  • Salt-and-pepper noise
  • Region dropout
• Adding noise to the expert
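A hedged numpy sketch of two of the listed augmentations (contrast/brightness jitter and salt-and-pepper noise); the parameter ranges are illustrative, not the ones used in the paper:

```python
import numpy as np

def augment(img, rng):
    # `img` is an H x W x C uint8 image; `rng` is np.random.default_rng(...).
    img = img.astype(np.float32)
    # Contrast / brightness jitter: scale and shift pixel values.
    img = img * rng.uniform(0.8, 1.2) + rng.uniform(-20.0, 20.0)
    # Salt-and-pepper noise: set a random ~1% of pixels to black or white.
    mask = rng.random(img.shape[:2]) < 0.01
    img[mask] = rng.choice([0.0, 255.0], size=(int(mask.sum()), 1))
    return np.clip(img, 0, 255).astype(np.uint8)
```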

[slides adapted from Tuomas Haarnoja]

Goal

Task 1: Reach → reaching skill
Task 2: Avoid → avoidance skill
Combine them into a reaching-while-avoiding skill

[diagram: the skills as regions in the space of trajectories]

Policy Composition

Task 1: Reach + Task 2: Avoid → Task 1+2: Reach and avoid

Reusability! How well the composed skill works is related to the divergence between the two constituent policies

[diagram: reaching, avoidance, and reaching-while-avoiding skills in the space of trajectories]

Task 1: stacking policy; Task 2: avoidance policy → Task 1+2: combined policy
[videos: stacking, avoidance, and combined behaviors]
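For maximum-entropy (soft Q-learning) policies, the composition can be sketched as averaging the constituent Q-functions; this follows Haarnoja et al. 2018 as I recall it, so treat the exact constants as an assumption:

```latex
Q_{1+2}(s, a) \approx \tfrac{1}{2}\big( Q_1(s, a) + Q_2(s, a) \big),
\qquad
\pi_{1+2}(a \mid s) \propto \exp\big( Q_{1+2}(s, a) \big)
```

The error of this approximation relative to the true Q-function of the joint task can be bounded by a term that grows with the divergence between the constituent policies, which is why similar skills compose well.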

Standard reinforcement learning: train (data → policy), then test
● Data inefficient
● Expert in the loop
● Inflexible: each new task needs its own data and its own policy

CAPs approach: train (data → event cues, labeled by a detector), then test
● Data efficient
● Detector in the loop
● Flexible

Detect → Predict → Control

Detect → Predict → Control

Detect: a detector labels the data with event cues
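A hedged sketch of the predict-and-control side of this pipeline: an action-conditioned model predicts future event cues, and control picks the action sequence that maximizes a user-specified reward over those cues. The `predictor`, the cue names, and the sampler are illustrative assumptions, not the paper's interface:

```python
import numpy as np

def caps_control(predictor, obs, sample_action_seq, reward_weights, n_candidates=100):
    best_seq, best_reward = None, -np.inf
    for _ in range(n_candidates):
        actions = sample_action_seq()      # candidate action sequence
        # Predict future event cues (e.g. collision, heading error)
        # conditioned on the observation and the candidate actions.
        events = predictor(obs, actions)   # dict: cue name -> predicted values
        # The task is specified at test time as weights over the cues,
        # so a new task needs new weights, not new data.
        reward = sum(w * np.sum(events[name]) for name, w in reward_weights.items())
        if reward > best_reward:
            best_seq, best_reward = actions, reward
    return best_seq[0]                     # MPC-style: execute the first action
```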

CAPs experiment tasks:
● Drive in the right lane
● Drive in either lane
● Drive at 7 m/s
● Avoid collisions

Collision avoidance: CAPs vs. DQL
[comparison videos]

Heading task: avoid collisions, follow the goal heading, move towards doors

Flexibility takeaways

• Carefully construct how your policy / model deals with goals
• Model-free methods require extra care to reuse
• Model-based methods are flexible by construction
