
ORF 544

Stochastic Optimization and Learning
Spring, 2019

Warren Powell
Princeton University

http://www.castlelab.princeton.edu

© 2018 W.B. Powell


Week 1 - Monday

Introduction

© 2019 Warren Powell Slide 2


Sample applications


My experience

Economics
» Analysis of SREC certificates for valuing solar energy credits.

» System for valuing 10-year forward contracts for electricity.

» Optimizing cash for mutual funds.


Learning the market in Africa

How do African consumers respond to energy pricing strategies for recharging cell phones?

» Cell phone use is widespread in Africa, but the lack of a reliable power grid complicates recharging cell phone batteries.

» A low cost strategy is to encourage an entrepreneurial market to develop which sells energy from small, low cost solar panels.

» We do not know the demand curve, and we need to learn it as quickly as possible.


Drug discovery

Designing molecules

» X and Y are sites where we can hang substituents to change the behavior of the molecule


Drug discovery

We express our belief using a linear, additive QSAR model:

» \( Y = \theta_0 + \sum_{\text{sites } i} \sum_{\text{substituents } j} \theta_{ij} X_{ij} \)

» \( X^m = (X^m_{ij})_{ij} \) = indicator variable for molecule \( m \).


Drug discovery

Knowledge gradient versus pure exploration for 99 compounds

[Figure: performance under best possible (y-axis) versus number of molecules tested, out of 99 (x-axis); curves for pure exploration and optimal learning.]


Health applications

Health sciences
» Sequential design of experiments for drug discovery
» Drug delivery – Optimizing the design of protective membranes to control drug release
» Medical decision making – Optimal learning for medical treatments.


State-dependent applications

Materials science
» Optimizing payloads: reactive species, biomolecules, fluorescent markers, …

» Controllers for robotic scientist for materials science experiments

» Optimizing nanoparticles to maximize photoconductivity


E-commerce

Revenue management
» Optimizing prices to maximize total revenue for a particular night in a hotel.

Ad-click optimization
» How much to bid for ads on the internet.

Personalized offer optimization
» Designing offers for individual customers.


Policy function approximations

Battery arbitrage – When to charge, when to discharge, given volatile LMPs


Policy function approximations

[Figure: electricity price over time periods 1 to 71 (price axis from 0.00 to 140.00), with the charge and discharge levels marked.]

Grid operators require that batteries bid charge and discharge prices an hour in advance.

We have to search for the best values for the policy parameters \( \theta^{\text{charge}} \) and \( \theta^{\text{discharge}} \).


Policy function approximations

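Our policy function might be the parametric model (this is nonlinear in the parameters):

\[
X^{\pi}(S_t \mid \theta) =
\begin{cases}
+1 & \text{if } p_t < \theta^{\text{charge}} \\
0 & \text{if } \theta^{\text{charge}} \le p_t \le \theta^{\text{discharge}} \\
-1 & \text{if } p_t > \theta^{\text{discharge}}
\end{cases}
\]

Energy in storage: \( R_t \)

Price of electricity: \( p_t \)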
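The threshold rule on this slide can be sketched in a few lines of Python. This is only an illustration: the sign convention (+1 charge, 0 hold, -1 discharge), the function name `battery_policy`, the parameter names and the sample prices are assumptions, and storage limits are ignored.

```python
def battery_policy(price, theta_charge, theta_discharge):
    """Threshold policy: charge when the price is low, discharge when it is high.

    Returns +1 (charge), -1 (discharge) or 0 (hold). The sign convention and
    the parameter names are assumptions made for this sketch.
    """
    if price < theta_charge:
        return 1
    if price > theta_discharge:
        return -1
    return 0


# Hypothetical prices and thresholds, chosen only to exercise each branch.
prices = [25.0, 40.0, 95.0, 60.0, 18.0]
decisions = [battery_policy(p, theta_charge=30.0, theta_discharge=80.0) for p in prices]
print(decisions)
```

The policy is a step function of the two thresholds, which is why it is nonlinear in the parameters even though it is trivial to evaluate.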


Policy function approximations

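Finding the best policy
» We need to maximize

\[
\max_{\theta} F(\theta) = \mathbb{E} \sum_{t=0}^{T} \gamma^{t}\, C\big(S_t, X^{\pi}_t(S_t \mid \theta)\big)
\]

» We cannot compute the expectation, so we run simulations:

[Figure: simulated price path with the charge and discharge levels marked.]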
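One way to read "we run simulations" is to replace the expectation with an average over sampled price paths and then search over the two thresholds. The sketch below does this with a crude grid search. Everything specific in it is an assumption for illustration: the uniform price model, the one-unit battery, the absence of discounting, and the names `simulate_profit` and `estimate_F`.

```python
import random


def simulate_profit(theta_charge, theta_discharge, prices, capacity=1.0):
    """Total profit of the threshold policy along one sampled price path."""
    energy, profit = 0.0, 0.0
    for p in prices:
        if p < theta_charge and energy < capacity:
            energy += 1.0          # buy one unit of energy
            profit -= p
        elif p > theta_discharge and energy >= 1.0:
            energy -= 1.0          # sell one unit of energy
            profit += p
    return profit


def estimate_F(theta_charge, theta_discharge, paths):
    """Sample average of the profit over the simulated paths."""
    total = sum(simulate_profit(theta_charge, theta_discharge, path) for path in paths)
    return total / len(paths)


# Hypothetical price model: independent uniform prices, 72 periods per path.
rng = random.Random(0)
paths = [[rng.uniform(10.0, 130.0) for _ in range(72)] for _ in range(200)]

# Grid search over threshold pairs with charge < discharge, reusing the same
# sample paths for every candidate so the comparisons are less noisy.
grid = range(20, 130, 10)
candidates = [(c, d) for c in grid for d in grid if c < d]
best_theta = max(candidates, key=lambda th: estimate_F(th[0], th[1], paths))
best_value = estimate_F(best_theta[0], best_theta[1], paths)
print(best_theta, round(best_value, 2))
```

A grid is only workable because there are two parameters; the derivative-based and derivative-free search methods listed later in these slides are the usual alternatives.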


Energy storage

How much energy to store in a battery to handle the volatility of wind and spot prices to meet demands?


Planning cash reserves

How much money should we hold in cash given variable market returns and interest rates to meet the needs of a business?

Bonds

Stock prices


Electricity forward contracts


Fleet management

Fleet management problem
» Optimize the assignment of drivers to loads over time.
» Tremendous uncertainty in loads being called in.


Nomadic trucker illustration

Pre-decision state: we see the demands.

[Figure: four loads on offer, paying $300, $150, $350 and $450.]

\( S_t = (\text{TX}, \hat{D}_t) \)


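Nomadic trucker illustration

We use initial value function approximations…

\( \bar{V}^0(\text{CO}) = 0, \quad \bar{V}^0(\text{MN}) = 0, \quad \bar{V}^0(\text{CA}) = 0, \quad \bar{V}^0(\text{NY}) = 0 \)

[Figure: four loads on offer, paying $300, $150, $350 and $450.]

\( S_t = (\text{TX}, \hat{D}_t) \)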


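Nomadic trucker illustration

… and make our first choice:

[Figure: four loads on offer, paying $300, $150, $350 and $450.]

\( \bar{V}^0(\text{CO}) = 0, \quad \bar{V}^0(\text{CA}) = 0, \quad \bar{V}^0(\text{NY}) = 0, \quad \bar{V}^0(\text{MN}) = 0 \)

\( S^x_t = (\text{NY}) \)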


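Nomadic trucker illustration

Update the value of being in Texas.

\( \bar{V}^1(\text{TX}) = 450 \)

[Figure: four loads on offer, paying $300, $150, $350 and $450.]

\( \bar{V}^0(\text{CO}) = 0, \quad \bar{V}^0(\text{CA}) = 0, \quad \bar{V}^0(\text{NY}) = 0, \quad \bar{V}^0(\text{MN}) = 0 \)

\( S^x_t = (\text{NY}) \)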


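Nomadic trucker illustration

Now move to the next state, sample new demands and make a new decision.

[Figure: four loads on offer, paying $600, $400, $180 and $125.]

\( \bar{V}^0(\text{CO}) = 0, \quad \bar{V}^0(\text{CA}) = 0, \quad \bar{V}^0(\text{NY}) = 0, \quad \bar{V}^1(\text{TX}) = 450, \quad \bar{V}^0(\text{MN}) = 0 \)

\( S_{t+1} = (\text{NY}, \hat{D}_{t+1}) \)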


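Nomadic trucker illustration

Update value of being in NY.

\( \bar{V}^0(\text{NY}) = 600 \)

[Figure: four loads on offer, paying $600, $400, $180 and $125.]

\( \bar{V}^0(\text{CO}) = 0, \quad \bar{V}^0(\text{CA}) = 0, \quad \bar{V}^1(\text{TX}) = 450, \quad \bar{V}^0(\text{MN}) = 0 \)

\( S^x_{t+1} = (\text{CA}) \)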


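Nomadic trucker illustration

Move to California.

[Figure: four loads on offer, paying $150, $400, $200 and $350.]

\( \bar{V}^0(\text{CA}) = 0, \quad \bar{V}^0(\text{CO}) = 0, \quad \bar{V}^1(\text{TX}) = 450, \quad \bar{V}^0(\text{NY}) = 600, \quad \bar{V}^0(\text{MN}) = 0 \)

\( S_{t+2} = (\text{CA}, \hat{D}_{t+2}) \)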


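Nomadic trucker illustration

Make decision to return to TX and update value of being in CA.

[Figure: four loads on offer, paying $150, $400, $200 and $350.]

\( \bar{V}^0(\text{CA}) = 800, \quad \bar{V}^0(\text{CO}) = 0, \quad \bar{V}^1(\text{TX}) = 450, \quad \bar{V}^0(\text{NY}) = 600, \quad \bar{V}^0(\text{MN}) = 0 \)

\( S_{t+2} = (\text{CA}, \hat{D}_{t+2}) \)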


Nomadic trucker illustration

Updating the value function:

» We are updating the previous post-decision state (describe).

Old value: \( \bar{V}^1(\text{TX}) = \$450 \)

New estimate: \( \hat{v}^2(\text{TX}) = \$800 \)

How do we merge old with new?

\[
\bar{V}^2(\text{TX}) = (1 - \alpha)\,\bar{V}^1(\text{TX}) + \alpha\,\hat{v}^2(\text{TX}) = (0.90)\,\$450 + (0.10)\,\$800 = \$485
\]
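The merge of the old value with the new estimate is plain exponential smoothing. A minimal Python sketch, using the slide's own numbers (old value $450, new estimate $800, stepsize 0.10); the function and variable names are illustrative only.

```python
def smooth(old_value, new_estimate, alpha):
    """Exponential smoothing: (1 - alpha) * old + alpha * new."""
    return (1.0 - alpha) * old_value + alpha * new_estimate


v_bar = {"TX": 450.0}        # current value function approximation for Texas
v_hat_tx = 800.0             # newly sampled estimate of the value of being in Texas
v_bar["TX"] = smooth(v_bar["TX"], v_hat_tx, alpha=0.10)
print(round(v_bar["TX"], 2))
```

A small stepsize keeps most of the old estimate, which damps the noise in a single sampled value; a stepsize of 1 would simply overwrite the old value.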


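Nomadic trucker illustration

An updated value of being in TX.

\( \bar{V}^1(\text{TX}) = 485 \)

[Figure: four loads on offer, paying $275, $800, $385 and $125.]

\( \bar{V}^0(\text{CO}) = 0, \quad \bar{V}^0(\text{NY}) = 600, \quad \bar{V}^0(\text{CA}) = 800, \quad \bar{V}^0(\text{MN}) = 0 \)

\( S_{t+3} = (\text{TX}, \hat{D}_{t+3}) \)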


Real-time logistics

Uber
» Provides real-time, on-demand transportation.
» Drivers are encouraged to enter or leave the system using pricing signals and informational guidance.

Decisions:
» How to price to get the right balance of drivers relative to customers.
» Assigning and routing drivers to manage Uber-created congestion.
» Real-time management of drivers.
» Pricing (trips, new services, …)
» Policies (rules for managing drivers, customers, …)


Effect of Current Decision on the Future

Closest car… but it moves the car away from the busy downtown area and strands the other car in a low-density area.

Assigning the car in the less dense area allows the closer car to handle potential demands in more dense areas.


Cost function approximations

[Figure: riders matched to cars.]


Optimizing over time

t t+1 t+2

The assignment of cars to riders evolves over time, with new riders arriving, along with updates of cars available.


t t+1 t+2

Autonomous EVs


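Matching buyers with sellers

Now we have a logistic curve for each origin-destination pair (i,j):

\[
P^{Y}(p, a \mid \theta) = \frac{e^{\theta^{0}_{ij} + \theta^{p}_{ij}\, p + \theta^{a}_{ij}\, a}}{1 + e^{\theta^{0}_{ij} + \theta^{p}_{ij}\, p + \theta^{a}_{ij}\, a}}
\]

The number of offers for each (i,j) pair is relatively small. We need to generalize the learning across hundreds to thousands of markets.

[Figure: buyer and seller, with the offered price between them.]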
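To make the logistic curve concrete, here is a small Python sketch that evaluates it for one (i,j) market. The coefficient values, the function name `response_probability` and the reading of the coefficients as an intercept plus one weight each on the price p and on a are assumptions for illustration; the slide does not give numbers.

```python
import math


def response_probability(price, a, theta0, theta_p, theta_a):
    """Logistic curve e^u / (1 + e^u) with u = theta0 + theta_p * price + theta_a * a."""
    u = theta0 + theta_p * price + theta_a * a
    return 1.0 / (1.0 + math.exp(-u))    # algebraically equal to e^u / (1 + e^u)


# Hypothetical coefficients for a single market. With a negative price
# coefficient the response probability falls as the offered price rises.
theta = {"theta0": 4.0, "theta_p": -0.02, "theta_a": 0.5}
curve = [response_probability(p, a=0.0, **theta) for p in (100.0, 200.0, 300.0)]
print([round(v, 3) for v in curve])
```

With only a handful of offers per market, each market's three coefficients are poorly determined on their own, which is the motivation for sharing information across markets.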


Industrial sponsors

Air Liquide
» Largest industrial gases company with 64,000 employees.
» Consumes 0.1 percent of global electricity.

Challenges
» Faces a variety of challenges to manage risk:
» Spikes in natural gas prices, electricity prices.
» Pipeline outages due to storms.


An energy generation portfolio


Wind in the U.S.


Energy from wind

1 year

Wind power from all PJM wind farms

Jan Feb March April May June July Aug Sept Oct Nov Dec


Energy from wind

30 days

Wind from all PJM wind farms


Solar energy

Princeton solar array

Solar energy in the PJM region


Solar energy

PSE&G solar farms

Sept Oct Nov Dec Jan Feb March April May June July Aug


Solar energy

Solar from a single solar farm


Solar energy

Within-day sample trajectories


Locational marginal prices on the grid

LMPs – Locational marginal prices

$977/MW !!!


Locational marginal prices on the grid


Planning under uncertainty

Battery arbitrage – When to charge, when to discharge, given volatile LMPs


Planning under uncertainty

Snapshot of electricity prices for New Jersey:


Planning under uncertainty

Stochastic prices


Planning under uncertainty

The models in these papers allow decisions to see into the future:
» Strategy posed by the battery manufacturer: “Buy low, sell high”

[Figure: price over time periods 1 to 71 (price axis from 0.00 to 140.00), with alternating “Buy” and “Sell” points marked along the price path.]


Don’t gamble; take all your savings and buy some good stock and hold it till it goes up, then sell it. If it don’t go up, don’t buy it.

Will Rogers

It is not enough to mix “optimization” (intelligent decision making) and uncertainty. You have to be sure that each decision has access to the information available at the time.

Planning under uncertainty


From deterministic to stochastic

Imagine that you would like to solve the time-dependent linear program:

\[
\min_{x_0, \ldots, x_T} \sum_{t=0}^{T} c_t x_t
\]

» subject to

\[
A_0 x_0 = b_0, \qquad A_t x_t - B_{t-1} x_{t-1} = b_t, \quad t \ge 1.
\]

We can convert this to a proper stochastic model by replacing \( x_t \) with \( X^{\pi}_t(S_t) \) and taking an expectation:

\[
\min_{\pi} \mathbb{E} \sum_{t=0}^{T} c_t X^{\pi}_t(S_t)
\]

The policy \( X^{\pi}_t(S_t) \) has to satisfy \( A_t x_t = R_t \), with transition function:

\[
S_{t+1} = S^{M}(S_t, x_t, W_{t+1})
\]


Known customers in outage

Unknown outages

Outage calls(known)

Network outages(unknown)

Storm


Problem Description - Emergency Storm Response

[Figure: network diagram with locations marked “X” and “call”.]


Escalation Algorithm

[Figure: network diagram with locations marked “X” and “call”.]


Lookahead policies

Decision trees:


Decision Outcome Decision Outcome Decision


Monte Carlo tree search

Steps of MCTS:

C. Browne, E. Powley, D. Whitehouse, S. Lucas, P. I. Cowling, P. Rohlfshagen, S. Tavener, D. Perez, S. Samothrakis and S. Colton, “A survey of Monte Carlo tree search methods,” IEEE Transactions on Computational Intelligence and AI in Games, vol. 4, no. 1, pp. 1–49, March 2012.


Monte Carlo tree search

AlphaGo
» Much more complex state space.
» Uses hybrid of policies:
• MCTS
• PFA
• VFA


Canonical problems


Stochastic programming
Markov decision processes
Reinforcement learning
Optimal control
Model predictive control
Robust optimization
Approximate dynamic programming
Online computation
Simulation optimization
Stochastic search
Decision analysis
Stochastic control
Dynamic programming and control
Optimal learning
Bandit problems


Canonical problems

Decision trees


Canonical problems

Stochastic search (derivative based)
» Basic problem:

\[
\max_{x} \mathbb{E}\, F(x, W)
\]

» Stochastic gradient

\[
x^{n+1} = x^{n} + \alpha_n \nabla_x F(x^{n}, W^{n+1})
\]

» Asymptotic convergence:

\[
\lim_{n \to \infty} \mathbb{E}\, F(x^{n}, W) = \mathbb{E}\, F(x^{*}, W)
\]

» Finite time performance

\[
\max_{\pi} \mathbb{E}\, F(x^{\pi,n}, W) \quad \text{where } \pi \text{ is an algorithm (or policy)}
\]

Manufacturing network (x=design)
Unit commitment problem (x=day ahead decisions)
Inventory system (x=design, replenishment policy)
Battery system (x=choice of material)
Patient treatment cost (x=drug, treatments)
Trucking company (x=fleet size and mix)
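A stochastic gradient iteration of this kind can be sketched on a toy problem. The example below is an assumption chosen for illustration, not one from the slides: a newsvendor-style objective F(x, W) = price * min(x, W) - cost * x with W uniform on [0, 100], and a harmonic stepsize a/(n+1). For these hypothetical numbers the maximizer is x* = 60, where P(W > x) = cost/price.

```python
import random


def grad_F(x, w, price=10.0, cost=4.0):
    """Sample gradient in x of F(x, W) = price * min(x, W) - cost * x."""
    return (price - cost) if x < w else -cost


def stochastic_gradient(n_iters=20000, x0=0.0, a=20.0, seed=0):
    """Run x^{n+1} = x^n + alpha_n * grad F(x^n, W^{n+1}) with alpha_n = a / (n + 1)."""
    rng = random.Random(seed)
    x = x0
    for n in range(n_iters):
        w = rng.uniform(0.0, 100.0)      # sample the new information W^{n+1}
        alpha = a / (n + 1)              # harmonic stepsize (an assumed choice)
        x = x + alpha * grad_F(x, w)     # step in the direction of the sampled gradient
    return x


x_final = stochastic_gradient()
print(round(x_final, 2))
```

The asymptotic result says this iteration settles near x*; the finite time view asks instead how good the answer is after a fixed budget of iterations, which depends heavily on the stepsize rule.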


Canonical problems

Ranking and selection (derivative free)
» Basic problem:

\[
\max_{x \in \{x_1, \ldots, x_M\}} \mathbb{E}\, F(x, W)
\]

» We need to design a policy \( X^{\pi}(S^{n}) \) that finds a design \( x^{\pi,N} \) given by

\[
\max_{\pi} \mathbb{E}\, F(x^{\pi,N}, W)
\]

» We refer to this objective as maximizing the final reward.


Canonical problems

Multi-armed bandit problems
» We learn the reward from playing each “arm”
» We need to find a policy \( X^{\pi}(S^{n}) \) for playing machine x that maximizes:

\[
\max_{\pi} \mathbb{E} \sum_{n=0}^{N-1} F\big(X^{\pi}(S^{n}), W^{n+1}\big)
\]

where

\( W^{n+1} \) = “winnings” (new information)
\( S^{n} \) = state of knowledge (what we know about each slot machine)
\( x^{n} = X^{\pi}(S^{n}) \) (choose next “arm” to play)

We refer to this problem as maximizing cumulative reward.
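As one concrete example of a bandit policy, the sketch below uses the standard UCB1 rule on Bernoulli arms. UCB1 is a stand-in chosen for illustration; the slide does not say which policy is intended. The success probabilities and the function name `ucb1` are hypothetical. The state of knowledge is the pair (play counts, average winnings) for each arm.

```python
import math
import random


def ucb1(success_probs, n_plays=2000, seed=0):
    """Play Bernoulli arms with the UCB1 rule; return play counts and total winnings."""
    rng = random.Random(seed)
    k = len(success_probs)
    counts = [0] * k          # state of knowledge: number of plays of each arm ...
    means = [0.0] * k         # ... and average winnings observed on each arm
    total = 0.0
    for n in range(n_plays):
        if n < k:
            x = n             # play every arm once to initialize the estimates
        else:
            x = max(range(k),
                    key=lambda i: means[i] + math.sqrt(2.0 * math.log(n) / counts[i]))
        w = 1.0 if rng.random() < success_probs[x] else 0.0   # winnings W^{n+1}
        counts[x] += 1
        means[x] += (w - means[x]) / counts[x]                # update the belief
        total += w                                            # cumulative reward
    return counts, total


counts, total = ucb1([0.2, 0.5, 0.8])
print(counts, total)
```

Because the objective is cumulative reward, every play of a weak arm has a cost, so the policy has to trade off learning about an arm against earning from the arm that currently looks best.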


Canonical problems

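(Discrete) Markov decision processes
» Bellman’s optimality equation

\[
\begin{aligned}
V_t(S_t) &= \min_{a_t \in \mathcal{A}_t} \Big( C(S_t, a_t) + \mathbb{E}\big\{ V_{t+1}(S_{t+1}) \mid S_t \big\} \Big) \\
&= \min_{a_t \in \mathcal{A}_t} \Big( C(S_t, a_t) + \sum_{s'} p(S_{t+1} = s' \mid S_t, a_t)\, V_{t+1}(s') \Big)
\end{aligned}
\]

» This is also the same as solving

\[
\min_{\pi} \mathbb{E} \Big\{ \sum_{t=0}^{T} C\big(S_t, X^{\pi}_t(S_t)\big) \,\Big|\, S_0 \Big\}
\]

where the optimal policy has the form

\[
X^{\pi}_t(S_t) = \arg\min_{x_t} \Big( C(S_t, x_t) + \mathbb{E}\big\{ V_{t+1}(S_{t+1}) \mid S_t, x_t \big\} \Big)
\]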
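For a small discrete problem, Bellman's equation can be solved by stepping backward in time. The Python sketch below does this for a hypothetical two-state, two-action problem; the costs, transition probabilities and the name `backward_dp` are invented for illustration, and the sketch assumes a terminal value of zero with decisions at t = 0, …, T-1.

```python
def backward_dp(costs, trans, horizon):
    """Finite-horizon Bellman recursion (minimizing cost, terminal value zero).

    costs[s][a] is C(s, a); trans[s][a][s2] is p(s2 | s, a).
    Returns value tables V[t][s] and decisions policy[t][s].
    """
    states = sorted(costs)
    V = [dict() for _ in range(horizon + 1)]
    policy = [dict() for _ in range(horizon)]
    V[horizon] = {s: 0.0 for s in states}
    for t in reversed(range(horizon)):
        for s in states:
            best_a, best_q = None, float("inf")
            for a in sorted(costs[s]):
                # one-period cost plus expected downstream value
                q = costs[s][a] + sum(trans[s][a][s2] * V[t + 1][s2] for s2 in states)
                if q < best_q:
                    best_a, best_q = a, q
            V[t][s], policy[t][s] = best_q, best_a
    return V, policy


# Hypothetical two-state, two-action problem.
costs = {0: {0: 1.0, 1: 2.0}, 1: {0: 3.0, 1: 0.5}}
trans = {0: {0: [0.5, 0.5], 1: [1.0, 0.0]},
         1: {0: [0.0, 1.0], 1: [0.5, 0.5]}}
V, policy = backward_dp(costs, trans, horizon=2)
print(V[0], policy[0])
```

The triple loop over time, states and actions, with a sum over next states inside, is what makes this approach impractical once the state is a vector.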


Value function approximations

Reinforcement learning (computer science)
» Q-learning (for discrete actions)

\[
\begin{aligned}
\hat{q}^{n}(s^{n}, a^{n}) &= r(s^{n}, a^{n}) + \gamma \max_{a'} \bar{Q}^{n-1}(s', a') \\
\bar{Q}^{n}(s^{n}, a^{n}) &= (1 - \alpha_{n-1})\, \bar{Q}^{n-1}(s^{n}, a^{n}) + \alpha_{n-1}\, \hat{q}^{n}(s^{n}, a^{n}) \\
\text{Policy: } \pi(s) &= \arg\max_{a} \bar{Q}^{n}(s, a)
\end{aligned}
\]

» The second edition of Reinforcement Learning (Sutton and Barto, 2018) includes other solution approaches:
• Policy search (Boltzmann policies)
• Upper confidence bounding
• Monte Carlo tree search
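The two Q-learning updates and the greedy policy can be sketched on a toy problem. Everything specific below is an assumption for illustration: a three-state chain where reaching the last state pays 1, a constant stepsize of 0.5 in place of a declining one, a discount factor of 0.9, and purely random exploration while learning.

```python
import random

GAMMA, ALPHA = 0.9, 0.5            # hypothetical discount factor and stepsize
ACTIONS = (0, 1)                   # 0 = move left, 1 = move right
TERMINAL = 2


def step(s, a):
    """Toy chain 0 - 1 - 2: reaching state 2 pays 1 and ends the episode."""
    s_next = min(s + 1, TERMINAL) if a == 1 else max(s - 1, 0)
    reward = 1.0 if s_next == TERMINAL else 0.0
    return reward, s_next


def q_learning(n_episodes=500, seed=0):
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in (0, 1, TERMINAL) for a in ACTIONS}
    for _ in range(n_episodes):
        s = 0
        for _ in range(100):                       # cap on episode length
            a = rng.choice(ACTIONS)                # explore with random actions
            r, s_next = step(s, a)
            future = 0.0 if s_next == TERMINAL else max(Q[(s_next, b)] for b in ACTIONS)
            q_hat = r + GAMMA * future             # sampled estimate of the Q-factor
            Q[(s, a)] = (1 - ALPHA) * Q[(s, a)] + ALPHA * q_hat   # smoothing update
            s = s_next
            if s == TERMINAL:
                break
    return Q


Q = q_learning()
greedy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in (0, 1)}
print(greedy)
```

The update needs only the observed reward and next state, never the transition probabilities, which is what separates it from solving Bellman's equation directly.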


Canonical problems

Optimal stopping I
» Model:
• Exogenous process:
\( p_1, p_2, \ldots, p_T \) = sequence of stock prices
• Decision:
\( X_t(\omega) = 1 \) if we stop and sell at time \( t \), 0 otherwise
• Reward:
\( p_t \) = price received if we stop at time \( t \)
» Optimization problem:

\[
\max_{\tau} \mathbb{E}\, p_{\tau} X_{\tau}
\]

where \( \tau \) is a “stopping time” (or “\( \mathcal{F}_t \)-measurable function”)


Canonical problems

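Optimal stopping II
» Model:
• Exogenous process:
\( p_1, p_2, \ldots, p_T \) = sequence of stock prices
\( \bar{p}_t = (1 - \alpha)\, \bar{p}_{t-1} + \alpha\, p_t \)
• State:
\( R_t = 1 \) if we are holding asset, 0 otherwise
\( S_t = (R_t, p_t, \bar{p}_t) \)
• Policy:
\( X^{\pi}(S_t \mid \theta) = 1 \) if \( p_t \ge \bar{p}_t + \theta \), 0 otherwise
» Optimization problem:

\[
\max_{\pi} \mathbb{E} \sum_{t=0}^{T} p_t X^{\pi}(S_t \mid \theta) = \max_{\theta} \mathbb{E} \sum_{t=0}^{T} p_t X^{\pi}(S_t \mid \theta)
\]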


Canonical problems

Linear quadratic regulation (LQR)
» A popular optimal control problem in engineering involves solving:

\[
\min_{u_0, \ldots, u_T} \mathbb{E} \sum_{t=0}^{T} \Big( (x_t)^{T} Q\, x_t + (u_t)^{T} R\, u_t \Big)
\]

» where:
\( x_t \) = State at time \( t \)
\( u_t \) = Control at time \( t \) (must be \( \mathcal{F}_t \)-measurable)
\( x_{t+1} = f(x_t, u_t) + w_t \) (\( w_t \) is random at time \( t \))

» Possible to show that the optimal policy looks like:

\[
U^{*}_t(x_t) = K_t x_t
\]

where \( K_t \) is a complicated function of Q and R.
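To show what "a complicated function of Q and R" looks like in the simplest case, the sketch below runs the backward Riccati recursion for a scalar system. It assumes linear dynamics x_{t+1} = a x_t + b u_t + w_t with zero-mean noise, which is more specific than the general f on the slide; the function name and the numbers are illustrative.

```python
def scalar_lqr_gains(a, b, q, r, horizon):
    """Backward Riccati recursion for x_{t+1} = a*x_t + b*u_t + w_t (scalar case).

    Cost: sum over t = 0..horizon of q*x_t**2 + r*u_t**2, with r > 0.
    Returns gains K[t] for t = 0..horizon-1 (so u_t = K[t] * x_t) and the
    cost-to-go weights P[t]. Zero-mean additive noise does not change the gains.
    """
    P = [0.0] * (horizon + 1)
    K = [0.0] * horizon
    P[horizon] = q                       # in the last period the best control is u = 0
    for t in reversed(range(horizon)):
        denom = r + b * b * P[t + 1]
        K[t] = -(a * b * P[t + 1]) / denom
        P[t] = q + a * a * P[t + 1] - (a * b * P[t + 1]) ** 2 / denom
    return K, P


K, P = scalar_lqr_gains(a=1.0, b=1.0, q=1.0, r=1.0, horizon=1)
print(K, P)
```

For this one-step case the result can be checked by hand: minimizing x^2 + u^2 + (x + u)^2 over u gives u = -x/2 and a cost of 1.5 x^2.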


Canonical problems

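Stochastic programming
» A (two-stage) stochastic programming problem

\[
\min_{x_0 \in \mathcal{X}_0} c_0 x_0 + \mathbb{E}\, Q(x_0, \xi_1)
\]

where

\[
Q(x_0, \xi_1(\omega)) = \min_{x_1(\omega) \in \mathcal{X}_1(\omega)} c_1(\omega)\, x_1(\omega)
\]

» This is the canonical form of stochastic programming, which might also be written over multiple periods:

\[
\min\; c_0 x_0 + \sum_{\omega \in \Omega} p(\omega) \sum_{t=1}^{T} c_t(\omega)\, x_t(\omega)
\]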


Canonical problems

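Stochastic programming
» A (two-stage) stochastic programming policy

\[
\min_{x_t \in \mathcal{X}_t} c_t x_t + \mathbb{E}\, Q(x_t, \xi_{t+1})
\]

where

\[
Q(x_t, \xi_{t+1}(\omega)) = \min_{x_{t+1}(\omega) \in \mathcal{X}_{t+1}(\omega)} c_{t+1}(\omega)\, x_{t+1}(\omega)
\]

» This is the canonical form of stochastic programming, which might also be written over multiple periods:

\[
\min\; c_t x_t + \sum_{\omega \in \Omega} p(\omega) \sum_{t'=t+1}^{t+H} c_{tt'}(\omega)\, x_{tt'}(\omega)
\]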


Canonical problems

A robust optimization problem would be written

$$\min_{x \in \mathcal{X}} \max_{w \in \mathcal{W}(\theta)} F(x, w)$$

» This means finding the best design $x$ for the worst outcome $w$ in an “uncertainty set” $\mathcal{W}(\theta)$.

» This has been adapted to multiperiod problems

$$\min_{x_0,\ldots,x_T} \max_{(w_0,\ldots,w_T) \in \mathcal{W}(\theta)} \sum_{t'=0}^{T} c_{t'}(w_{t'}) x_{t'}$$
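The min-max structure can be sketched by enumeration when both the design set and the uncertainty set are finite. Everything specific here is an assumption for illustration: the cost function `F`, the integer demand grid, and the choice of uncertainty set as all demands within $\theta$ of a nominal value.

```python
def robust_design(F, designs, uncertainty_set):
    """Return (x, worst_case_value) solving min_x max_w F(x, w) by enumeration."""
    best_x, best_val = None, float("inf")
    for x in designs:
        worst = max(F(x, w) for w in uncertainty_set)  # inner max over outcomes
        if worst < best_val:                           # outer min over designs
            best_x, best_val = x, worst
    return best_x, best_val


def uncertainty_set(nominal, theta, grid):
    """Hypothetical W(theta): outcomes on the grid within theta of the nominal value."""
    return [w for w in grid if abs(w - nominal) <= theta]


def F(x, w):
    """Hypothetical cost: capacity x at unit cost 1, unmet demand penalized at 4."""
    return 1 * x + 4 * max(0, w - x)


x_star, value = robust_design(F, range(0, 21), uncertainty_set(10, 3, range(0, 21)))
print(x_star, value)
```

Widening $\theta$ enlarges the uncertainty set and pushes the design toward the most pessimistic outcome, which is why $\theta$ is usually treated as a tunable parameter.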


Canonical problems

A robust optimization problem would be written

$$\min_{x \in \mathcal{X}} \max_{w \in \mathcal{W}(\theta)} F(x, w)$$

» This means finding the best design $x$ for the worst outcome $w$ in an “uncertainty set” $\mathcal{W}(\theta)$.

» … but it is often used as a policy

$$\min_{x_t,\ldots,x_{t+H}} \max_{(w_t,\ldots,w_{t+H}) \in \mathcal{W}(\theta)} \sum_{t'=t}^{t+H} c_{t'}(w_{t'}) x_{t'}$$


Canonical problems

Stochastic control (from text by Rene Carmona)

» This is mathematically correct, but does not suggest a path to computation, or even how to model a real problem.


Canonical problems

Why do we need a unified framework?

» The classical frameworks and algorithms are fragile.

» Small changes to problems invalidate optimality conditions, or make algorithmic approaches intractable.

» Practitioners need robust approaches that will provide high quality solutions for all problems.


A modeling framework


Modeling

We propose to model problems along five fundamental dimensions:

» State variables
» Decision variables
» Exogenous information
» Transition function
» Objective function

» This framework draws heavily from the Markov decision process and control theory communities, but it is not the standard form used anywhere.


Modeling dynamic problems

The system state:

» Controls community: $x_t$ = "Information state"

» Operations research/MDP/Computer science: $S_t = (R_t, I_t, K_t)$ = System state, where:

• $R_t$ = Resource state (physical state): location/status of truck/train/plane, energy in storage
• $I_t$ = Information state: prices, weather
• $K_t$ = Knowledge state ("belief state"): belief about traffic delays, belief about the status of equipment

Bizarrely, only the controls community has a tradition of actually defining state variables. We return to state variables later.


Modeling dynamic problems

Decisions:

» Markov decision processes/Computer science: $a_t$ = Discrete action

» Control theory: $u_t$ = Low-dimensional continuous vector

» Operations research: $x_t$ = Usually a discrete or continuous but high-dimensional vector of decisions.

At this point, we do not specify how to make a decision. Instead, we define the function $X^\pi(s)$ (or $A^\pi(s)$ or $U^\pi(s)$), where $\pi$ specifies the type of policy. "$\pi$" carries information about the type of function $f$, and any tunable parameters $\theta \in \Theta^f$.
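One way to read "$\pi$ carries the type of function $f$ and its tunable parameters $\theta$" is as a function bundled with its parameters. The sketch below uses a hypothetical order-up-to rule as $f$; the state layout (a dict with key `"R"`) and all names are assumptions, not the course's code.

```python
def order_up_to(state, theta):
    """Hypothetical function type f: if inventory R is below theta[0], order up to theta[1]."""
    low, high = theta
    inventory = state["R"]
    return high - inventory if inventory < low else 0


def make_policy(f, theta):
    """Bundle a function type f and tunable parameters theta into a policy X^pi(s)."""
    return lambda state: f(state, theta)


policy = make_policy(order_up_to, theta=(5, 20))
print(policy({"R": 3}))  # inventory below 5, so order up to 20
print(policy({"R": 8}))  # inventory at or above 5, so order nothing
```

Searching over policies then means searching over both the function type `f` and its parameters `theta`.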


Problem classes

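Types of decisions

» Binary: $x \in \mathcal{X} = \{0, 1\}$
» Finite: $x \in \mathcal{X} = \{1, 2, \ldots, M\}$
» Continuous scalar: $x \in \mathcal{X} = [a, b]$
» Continuous vector: $x = (x_1, \ldots, x_K),\ x_k \in \mathbb{R}$
» Discrete vector: $x = (x_1, \ldots, x_K),\ x_k \in \mathbb{Z}$
» Categorical: $x = (a_1, \ldots, a_I)$, where $a_i$ is a category (e.g. red/green/blue)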


Modeling dynamic problems

Exogenous information:

$W_t$ = New information that first became known at time $t$ $= (\hat{R}_t, \hat{D}_t, \hat{p}_t, \hat{E}_t)$

$\hat{R}_t$ = Equipment failures, delays, new arrivals; new drivers being hired to the network
$\hat{D}_t$ = New customer demands
$\hat{p}_t$ = Changes in prices
$\hat{E}_t$ = Information about the environment (temperature, ...)

Note: Any variable indexed by $t$ is known at time $t$. This convention, which is not standard in control theory, dramatically simplifies the modeling of information.

Below, we let $\omega$ represent a sequence of actual observations $W_1, W_2, \ldots$. $W_t(\omega)$ refers to a sample realization of the random variable $W_t$.
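A sample path $\omega$ can be sketched as a list of observations, with the indexing convention enforced by only exposing $W_1,\ldots,W_t$ at time $t$. The distributions, field names, and helper names below are assumptions chosen purely for illustration.

```python
import random


def sample_path(T, seed):
    """Generate one sample path omega = (W_1, ..., W_T); path[t - 1] holds W_t.

    Hypothetical distributions: demand D_hat uniform on {0, ..., 10},
    price change p_hat uniform on {-1, 0, 1}.
    """
    rng = random.Random(seed)
    return [
        {"D_hat": rng.randint(0, 10), "p_hat": rng.choice([-1, 0, 1])}
        for _ in range(T)
    ]


def known_at(path, t):
    """Information available at time t: W_1, ..., W_t (nothing indexed later than t)."""
    return path[:t]


omega = sample_path(T=5, seed=1)
print(known_at(omega, 2))  # the observations W_1 and W_2
```

Fixing the seed fixes $\omega$, so two policies can be compared on exactly the same sequence of observations.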


Modeling dynamic problems

The transition function

$$S_{t+1} = S^M(S_t, x_t, W_{t+1})$$

$R_{t+1} = R_t + x_t + \hat{R}_{t+1}$ (Inventories)
$p_{t+1} = p_t + \hat{p}_{t+1}$ (Spot prices)
$D_{t+1} = D_t + \hat{D}_{t+1}$ (Market demands)

Also known as the: “System model”, “State transition model”, “Plant model”, “Plant equation”, “Transition law”, “Transfer function”, “Transformation function”, “Law of motion”, “Model”

For many applications, these equations are unknown. This is known as “model-free” dynamic programming.
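When the equations are known, the transition function is just a function from $(S_t, x_t, W_{t+1})$ to $S_{t+1}$. A minimal sketch of the inventory, price, and demand updates, assuming the state and the exogenous information are stored as dicts with the hypothetical keys shown:

```python
def transition(state, x, W):
    """S_{t+1} = S^M(S_t, x_t, W_{t+1}) for the additive inventory/price/demand updates.

    state has keys "R", "p", "D"; W has keys "R_hat", "p_hat", "D_hat"
    (key names are illustrative assumptions).
    """
    return {
        "R": state["R"] + x + W["R_hat"],  # inventories
        "p": state["p"] + W["p_hat"],      # spot prices
        "D": state["D"] + W["D_hat"],      # market demands
    }


S0 = {"R": 10, "p": 30, "D": 4}
S1 = transition(S0, x=5, W={"R_hat": -2, "p_hat": 3, "D_hat": 1})
print(S1)
```

Returning a new dict rather than modifying `state` in place keeps earlier states available when simulating or comparing policies.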


Modeling dynamic problems

The objective function

Dimensions of objective functions

» Type of performance metric
» Final cost vs. cumulative cost
» Expectation or risk measures
» Mathematical properties (convexity, monotonicity, continuity, unimodularity, …)
» Time to compute (fractions of seconds to minutes, to hours, to days or months)


Elements of a dynamic model

Objective functions

» Cumulative reward (“online learning”)

$$\max_\pi \mathbb{E}\left\{ \sum_{t=0}^{T} C(S_t, X^\pi(S_t), W_{t+1}) \mid S_0 \right\}$$

• Policies have to work well over time.

» Final reward (“offline learning”)

$$\max_\pi \mathbb{E}\left\{ F(x^{\pi,N}, \hat{W}) \mid S_0 \right\}$$

• We only care about how well the final decision $x^{\pi,N}$ works.

» Risk

$$\max_\pi \rho\left( C(S_0, X^\pi(S_0)), C(S_1, X^\pi(S_1)), \ldots, C(S_T, X^\pi(S_T)) \mid S_0 \right)$$


Elements of a dynamic model

The complete model:

» Objective function

• Cumulative reward (“online learning”)

$$\max_\pi \mathbb{E}\left\{ \sum_{t=0}^{T} C(S_t, X^\pi(S_t), W_{t+1}) \mid S_0 \right\}$$

• Final reward (“offline learning”)

$$\max_\pi \mathbb{E}\left\{ F(x^{\pi,N}, \hat{W}) \mid S_0 \right\}$$

• Risk:

$$\max_\pi \rho\left( C(S_0, X^\pi(S_0)), C(S_1, X^\pi(S_1)), \ldots, C(S_T, X^\pi(S_T)) \mid S_0 \right)$$

» Transition function:

$$S_{t+1} = S^M(S_t, x_t, W_{t+1})$$

» Exogenous information:

$$S_0, W_1, W_2, \ldots, W_T$$
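Putting the pieces together, the cumulative-reward objective can be estimated by simulating a policy over sample paths and averaging. This is a sketch on a hypothetical inventory problem: the contribution function, the demand distribution, the prices, and the order-up-to policy are all assumptions made for illustration.

```python
import random


def contribution(S, x, W, price=5, cost=2):
    """C(S_t, x_t, W_{t+1}): revenue from demand actually served minus ordering cost."""
    return price * min(S + x, W) - cost * x


def transition(S, x, W):
    """S_{t+1} = S^M(S_t, x_t, W_{t+1}): inventory left after serving demand W."""
    return max(0, S + x - W)


def cumulative_reward(policy, S0, path):
    """Sum of C(S_t, X^pi(S_t), W_{t+1}) for t = 0, ..., len(path) - 1 on one sample path."""
    S, total = S0, 0
    for W in path:
        x = policy(S)                    # decision uses only the current state
        total += contribution(S, x, W)   # W is revealed after the decision
        S = transition(S, x, W)
    return total


def estimate_value(policy, S0, T, n_paths, seed=0):
    """Monte Carlo estimate of the expected cumulative reward given S_0."""
    rng = random.Random(seed)
    totals = []
    for _ in range(n_paths):
        path = [rng.randint(0, 10) for _ in range(T)]  # assumed demand distribution
        totals.append(cumulative_reward(policy, S0, path))
    return sum(totals) / n_paths


def order_up_to(theta):
    """Hypothetical policy X^pi(S | theta): order enough to bring inventory up to theta."""
    return lambda S: max(0, theta - S)


for theta in (2, 6, 10):
    print(theta, estimate_value(order_up_to(theta), S0=0, T=20, n_paths=500))
```

Because every `theta` is evaluated with the same seed, the comparison uses the same sample paths, which makes tuning `theta` a simple search over this estimate.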


Elements of a dynamic model

The modeling process

» I conduct a conversation with a domain expert to fill in the elements of a problem:

• State variables: What we need to know (and only what we need)
• Decision variables: What we control
• New information: What we didn’t know when we made our decision
• Transition function: How the state variables evolve
• Objective function: Performance metrics


Course overview


Course overview

Requirements

» Weekly problem sets. These will emphasize modeling and the testing of different algorithmic strategies

» Take-home midterm and final

» “Diary problem” – You will propose a problem of your choosing, and we will periodically return to this problem to develop a model and suggest solution approaches.

» Active participation in class – We have a diverse class with different backgrounds and contexts. I need active discussion to relate the notation to your problems.


Course overview

Programming

» Most numerical work will require nothing more than Excel.

» We have two libraries available for testing more advanced algorithms:

• Python – This is a newer set of modules.
• Matlab – We have an older package, MOLTE, that was developed to illustrate a wide range of algorithms.


Course overview

Prerequisites

» ORF 544 primarily needs a basic course in statistics and probability. From this we will need:

• General understanding of probability distributions and random variables.
• Basic understanding of conditional probability and Bayes’ theorem.
• In this course, we will emphasize recursive statistics, which is covered in chapter 3.

» So why is this a grad course?

• Translating real problems into notation (mathematical modeling) requires some patience and an interest in solving problems using analytics.
• Stochastic modeling, which means modeling the flow of decisions and information, can be fairly subtle.


Course overview

What you have to do.

» Most important, ask questions!!! If you are not going to ask questions, then just read the book!

» The course will follow the structure of the book Stochastic Optimization and Learning. I will present selected topics from most chapters, with the goal that you will feel comfortable reading the rest of the chapter on your own.

» You will best understand the material when you can relate the notation to problems you are familiar with.

» I will do my best to provide a range of applications, but it really helps when you contribute your own problems.
