The Optimizing-Simulator: Merging Optimization and Simulation Using Approximate Dynamic Programming


© 2005 Warren B. Powell Slide 1

The Optimizing-Simulator: Merging Optimization and Simulation Using Approximate Dynamic Programming

Winter Simulation Conference, December 5, 2005

Warren Powell, CASTLE Laboratory, Princeton University

http://www.castlelab.princeton.edu


© 2005 Warren B. Powell Slide 2

Yellow Freight System


© 2005 Warren B. Powell Slide 3

Yellow Freight System


© 2005 Warren B. Powell Slide 4

The fractional jet ownership industry

© 2005 Warren B. Powell Slide 5

NetJets Inc.

© 2005 Warren B. Powell Slide 6

© 2005 Warren B. Powell Slide 7

© 2005 Warren B. Powell Slide 8

Schneider National

© 2005 Warren B. Powell Slide 9

Schneider National

© 2005 Warren B. Powell Slide 10

© 2005 Warren B. Powell Slide 11

Air Mobility Command

(Figure: an airbase diagram showing fuel, cargo handling, ramp space, maintenance, and cargo holding.)

© 2005 Warren B. Powell Slide 13

The challenges

Needs for simulation:
» Are we using the right mix of people and equipment?
» What is the effect of new policies regarding the management of people and equipment?
» What is the marginal contribution from serving customers?
» What is the effect of last-minute demands on the system?

© 2005 Warren B. Powell Slide 14

The challenges

We need simulation technology that accomplishes the following:
» Decisions have to handle high-dimensional states and actions (assigning different types of resources to different types of tasks).
» The simulator has to produce "good" behaviors not just at a point in time, but over time (decisions have to think about the future).
» Performance statistics must match historical performance.

© 2005 Warren B. Powell Slide 15

Outline

Modeling and problem representation

© 2005 Warren B. Powell Slide 16

Modeling

Resources can have a number of attributes:

a = (Location, Equipment type)

a = (Location, ETA, Equipment type, Train priority, Pool, Due for maint, Home shop)

a = (Location, ETA, A/C type, Fuel level, Home shop, Crew, Eqpt1, ..., Eqpt100)

a = (Location, ETA, Bus. segment, Single/team, Domicile, Drive hours, Duty hours, 8 day history, Days from home)

© 2005 Warren B. Powell Slide 17

Modeling

The attribute vector:

a_t = (a_1, a_2, ..., a_n)

The resource state variable:

R_ta = number of resources with attribute a at time t
R_t = (R_ta), a ∈ A
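As a concrete illustration (not part of the original slides), here is a minimal Python sketch of an attribute vector and the resource state variable; the field names and example values are invented purely for illustration.

```python
from collections import Counter
from typing import NamedTuple

# One possible encoding of an attribute vector "a" (fields are illustrative,
# not taken from the slides).
class Attribute(NamedTuple):
    location: str
    eta: int            # time periods until the resource is available
    equipment_type: str

def resource_state(resources):
    """R_t: R_t[a] = number of resources with attribute vector a at time t."""
    return Counter(resources)

R_t = resource_state([
    Attribute("Colorado", 0, "C-17"),
    Attribute("Colorado", 0, "C-17"),
    Attribute("Germany", 2, "C-5"),
])
print(R_t[Attribute("Colorado", 0, "C-17")])  # -> 2
```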

© 2005 Warren B. Powell Slide 18

Modeling

Decision set function:

D(a) = set of decision types we can use to act on a resource with attribute a.

Applying a decision d to a resource with attribute vector a_t = (a_1, a_2, ..., a_n) produces a modified resource label a_{t+1}.

© 2005 Warren B. Powell Slide 19

Modeling

The "modify" function:

M(a_{t-1}, d, W_t) = (a_t, c_t)

The information process:

W_t = vector of information arriving during time interval t.
Ex: new customer requests, equipment failures, weather delays.
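A hedged sketch of the modify function, continuing the Attribute type from the earlier sketch; the decision encoding and the transition logic are assumptions made for illustration only.

```python
# Sketch of M(a_{t-1}, d, W_t) = (a_t, c_t): applying decision d to a resource
# with attribute a, given the new information W_t, returns the modified
# attribute and the contribution/cost of the decision.
def modify(a, d, W):
    if d[0] == "move":                              # d = ("move", destination)
        travel_time = W.get(("travel_time", a.location, d[1]), 1)
        a_next = a._replace(location=d[1], eta=travel_time)
        cost = -1000.0 * travel_time                # repositioning cost (illustrative)
    else:                                           # d = ("hold",)
        a_next = a._replace(eta=max(a.eta - 1, 0))
        cost = 0.0
    return a_next, cost
```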

© 2005 Warren B. Powell Slide 20

Modeling

Decisions

The decision function:

x_t = X^π(I_t), where π ∈ Π = set of decision functions (policies), and I_t is the information available for making a decision.

x_tad = number of resources with attribute a that we can act on with decision d using the information available at time t.
x_t = (x_tad), a ∈ A, d ∈ D
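For concreteness, a minimal sketch of a decision function X^π(I_t); the "policy" here is simply myopic, and decision_set and contribution stand in for the user-supplied decision set D(a) and one-period contribution.

```python
def decision_function(R_t, decision_set, contribution):
    """Return x_t as a dict with x[(a, d)] = number of resources with
    attribute a acted on with decision d (a simple myopic choice)."""
    x = {}
    for a, count in R_t.items():
        best_d = max(decision_set(a), key=lambda d: contribution(a, d))
        x[(a, best_d)] = count
    return x
```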

© 2005 Warren B. Powell Slide 21

Approximate dynamic programming

Information and decision processes:

(Figure: a timeline in which the decisions x_0, x_1, ..., x_6, determined by a policy, alternate with the exogenous information process W_1, W_2, ..., W_6.)

© 2005 Warren B. Powell Slide 22

Modeling

System dynamics (classical view):

Given a decision function (policy) X^π(S_t) and exogenous information process W_t, we can model the evolution of the state of our system using:

S_{t+1} = f(S_t, X^π(S_t), W_{t+1})
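As an illustration (not from the slides), a sketch of this transition equation rolled into a simulation loop; policy, transition, and sample_information are placeholders for the user-supplied decision function, modify/transition logic, and exogenous information process.

```python
# A sketch of the classical simulation loop S_{t+1} = f(S_t, X^pi(S_t), W_{t+1}).
def simulate(S0, policy, transition, sample_information, T):
    S = S0
    history = [S0]
    for t in range(T):
        x_t = policy(t, S)                   # decision from the current state
        W_next = sample_information(t + 1)   # exogenous information W_{t+1}
        S = transition(S, x_t, W_next)       # S_{t+1} = f(S_t, x_t, W_{t+1})
        history.append(S)
    return history
```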

© 2005 Warren B. Powell Slide 23

Modeling

(Figure: the state S_t is acted on by the decision function X^π(S_t) to give the post-decision state S_t^x; the new information W_{t+1} then produces S_{t+1}.)

© 2005 Warren B. Powell Slide 24

Modeling

User provides a model of the physical system:
» Data: the resource vector R_t and the information process W_t
» Software: the decision set function D(a) and the modify function M(a_t, d, W_{t+1})

Our research goal: the decision function X^π(I_t).

© 2005 Warren B. Powell Slide 25

Outline

The optimizing simulator

© 2005 Warren B. Powell Slide 26

Optimizing over time

Resources

© 2005 Warren B. Powell Slide 27

Optimizing over time

Tasks

© 2005 Warren B. Powell Slide 28

Optimizing over time

(Figure: assignment networks over periods t, t+1, t+2, contrasting optimizing at a point in time with optimizing over time.)

© 2005 Warren B. Powell Slide 29

The optimizing simulator

(Flowchart: set t = 0; make a decision at time t; update the system state at t+1; set t = t+1; repeat while t < T.)

Classical simulation:
» Simple
» Extremely flexible

But . . .
» Limited solution quality
» Often requires extensive user-defined tables to guide the simulation.
» Can respond to changes in inputs in an unpredictable way.

© 2005 Warren B. Powell Slide 30

The optimizing simulator

Optimization:
» Intelligent
» Responds naturally to new datasets.

But . . .
» Struggles to handle the complexity of real operations.
» Does not model the evolution of information.
» Might be “too intelligent”?

min Σ_t c_t x_t
subject to:  A_t x_t − B_{t−1} x_{t−1} = b_t
             D_t x_t ≤ u_t
             x_t ≥ 0

© 2005 Warren B. Powell Slide 31

Multicommodity flow

(Figure: a multicommodity flow network over time, space, and type.)

© 2005 Warren B. Powell Slide 32

The optimizing simulator

Simulation
» Strengths
• Extremely flexible
• High level of detail
» Weaknesses
• Low level of “intelligence”
• Lower solution quality
• May have difficulty “behaving” properly with new scenarios.
• Difficulty adapting to random outcomes.

Optimization
» Strengths
• High level of intelligence
• System behaves “optimally” even with new datasets
• Reduces data set preparation.
» Weaknesses
• Strict rules on problem structure
• Low level of detail
• Inflexible!

To simulate or to optimize . . .

. . . Why are we asking this question?

© 2005 Warren B. Powell Slide 33

Decision-making technologies

Cost-based
» The standard assumption of math programming.
» Easily handles tradeoffs.
» Easily handles high dimensions.
» Can be difficult to tune to get the right behavior.

Rule-based
» Typically associated with AI.
» Very flexible.
» Difficult to code tradeoffs.
» Struggles with higher-dimensional states.

© 2005 Warren B. Powell Slide 34

The four information classes:
» Knowledge K_t
» Forecasts of exogenous events Ω_t
» Forecasts of impacts on others V̄_t
» Expert knowledge ρ

© 2005 Warren B. Powell Slide 35

The four information classes

» Knowledge K_t

© 2005 Warren B. Powell Slide 36

Knowledge

Rule-based: one aircraft and one requirement

(Figure: a map of aircraft and requirements at California, Germany, New Jersey, Colorado, Taiwan, England, and New Jersey.)

© 2005 Warren B. Powell Slide 37

Knowledge

Cost based: one requirement and multiple aircraft

(Figure: a map of aircraft and requirements at California, Germany, New Jersey, Colorado, Taiwan, England, and New Jersey.)

© 2005 Warren B. Powell Slide 38

Knowledge

Costs allow you to make tradeoffs (e.g., for an assignment between California and Germany):

Issue: “cost”/“bonus”
Repositioning cost: -$17,000
Appropriate a/c type: +5,000
Utilization: +8,000
Requires modifications: -3,000
Special maintenance at airbase: -1,000
Total “cost”: -8,000
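As simple arithmetic, the tradeoff on this slide can be written as a sum of the issue-level costs and bonuses; the numbers below are the illustrative values from the slide.

```python
# Each issue contributes a "cost" or "bonus"; the assignment is scored by their sum.
assignment_terms = {
    "Repositioning cost": -17_000,
    "Appropriate a/c type": +5_000,
    "Utilization": +8_000,
    "Requires modifications": -3_000,
    "Special maintenance at airbase": -1_000,
}
print(sum(assignment_terms.values()))  # -> -8000, the total "cost" on the slide
```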

© 2005 Warren B. Powell Slide 39

Knowledge

Cost based: multiple requirements and aircraft

(Figure: a map of aircraft and requirements at California, Germany, New Jersey, Colorado, Taiwan, England, and New Jersey.)

© 2005 Warren B. Powell Slide 40

The information classes

» Knowledge K_t
» Forecasts of exogenous events Ω_t

© 2005 Warren B. Powell Slide 41

Forecasts of exogenous information

(Figure: aircraft and requirements at California, Germany, New Jersey, Colorado, Taiwan, England, and New Jersey.)

X^π(I_t) involves solving a linear program/network model.

Resources that are known now…

© 2005 Warren B. Powell Slide 42

Forecasts of exogenous information

(Figure: the assignment network now contains the aircraft and requirement locations twice, once for resources that are known now and once for those that are forecasted.)

X^π(I_t) involves solving a linear program/network model.

Resources that are known now…

© 2005 Warren B. Powell Slide 43

Forecasts of exogenous information

(Figure: R_t collects the aircraft and requirements that are known now, while R_tt' for t' > t collects those forecasted for future times.)

… and are forecasted for the future.

© 2005 Warren B. Powell Slide 44

The information classes

» Knowledge K_t
» Forecasts of exogenous events Ω_t
» Forecasts of impacts on others V̄_t

© 2005 Warren B. Powell Slide 45

Approximate dynamic programming

Decisions now may need to know the impact on future decisions:
» What is the cost of assigning this type of aircraft to move a requirement?
» What is the value of having a certain number of aircraft in a region?
» Should this requirement be satisfied now? Later? Never?

For these questions, it is important that we optimize over time.

(Figure: at time t, resources with attributes a_1 and a_2 are linked to downstream attributes with values V(a'_1) and V(a'_2).)

© 2005 Warren B. Powell Slide 48

The optimization challenge

?

© 2005 Warren B. Powell Slide 49

State variables

Systems evolve through a cycle of exogenous and endogenous information

(Figure: ω = (R̂_1, R̂_2, ..., R̂_6) is the exogenous information process, interleaved with the decisions x_0, x_1, ..., x_6.)

© 2005 Warren B. Powell Slide 50

State variables

Systems evolve through a cycle of exogenous and endogenous information

(Figure: the exogenous information R̂_1, ..., R̂_6, the decisions x_0, ..., x_6, and the resulting resource states R_0, R_1, ..., R_6 over time.)

© 2005 Warren B. Powell Slide 51

Approximate dynamic programming

Using this state variable, we obtain the optimality equations:

V_t(R_t) = max_{x_t ∈ X_t} { C_t(R_t, x_t) + E[ V_{t+1}(R_{t+1}) | R_t ] }

Problem: the curse of dimensionality. In fact, there are three curses:
» State space
» Outcome space
» Action space (feasible region)

© 2005 Warren B. Powell Slide 52

Approximate dynamic programming

The computational challenge:

V_t(R_t) = max_{x_t ∈ X_t} { C_t(R_t, x_t) + E[ V_{t+1}(R_{t+1}) | R_t ] }

» How do we find V_{t+1}(R_{t+1})?
» How do we compute the expectation?
» How do we find the optimal solution?

© 2005 Warren B. Powell Slide 53

Approximate dynamic programming

A possible approximation strategy:

We start with:
V_t(R_t) = max_{x_t} { C_t(R_t, x_t) + E[ V_{t+1}(R_{t+1}) | R_t ] }
(We can't compute this, and we don't know what V_{t+1} is; we need to approximate V.)

We solve this for a sample realization ω:
Ṽ_t(R_t, ω) = max_{x_t} { C_t(R_t, x_t) + V_{t+1}(R_{t+1}(ω)) }

Now substitute in a function approximation:
Ṽ_t(R_t, ω) = max_{x_t} { C_t(R_t, x_t) + V̄_{t+1}(R_{t+1}(ω)) }

© 2005 Warren B. Powell Slide 54

Approximate dynamic programming

One big problem….

Ṽ_t(R_t, ω) = max_{x_t} { C_t(R_t, x_t) + V̄_{t+1}(R_{t+1}(ω)) }

Seeing R_{t+1} is cheating!

© 2005 Warren B. Powell Slide 55

Approximate dynamic programming

Alternative: Change the definition of the state variable:

(Figure: the same timeline of information R̂_t and decisions x_t, with the state now measured after the decision is made: the post-decision resource state.)

© 2005 Warren B. Powell Slide 56

Approximate dynamic programming

Now our optimality equation looks like (the expectation sits outside of the “max” operator, around the post-decision state variable):

V^x_{t-1}(R^x_{t-1}) = E{ max_{x_t ∈ X_t} [ C_t(R_t, x_t) + V^x_t(R^x_t(R_t, x_t)) ] | R^x_{t-1} }

We drop the expectation and solve the conditional problem for a sample ω:

Ṽ^x_{t-1}(R^x_{t-1}, ω) = max_{x_t ∈ X_t(ω)} [ C_t(R_t(ω), x_t) + V^x_t(R^x_t(ω), x_t) ]

Finally, we substitute in our (“convenient”) value function approximation:

Ṽ^x_{t-1}(R^x_{t-1}, ω) = max_{x_t ∈ X_t(ω)} [ C_t(R_t(ω), x_t) + V̄^x_t(R^x_t(ω), x_t) ]

© 2005 Warren B. Powell Slide 57

Approximate dynamic programming

Approximating the value function:
» We choose approximations of the form:

Linear (in the resource state):
V̄_t(R_t) = Σ_{a ∈ A} v̄_ta · R_ta
Best when assets are complex, which means that R_ta is small (typically 0 or 1).

Piecewise linear, separable:
V̄_t(R_t) = Σ_{a ∈ A} V̄_ta(R_ta)
Best when assets are simple, which means that R_ta may be larger.
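A minimal sketch of these two approximation forms (an illustration, not the production implementation), assuming R_t is a dict mapping an attribute a to the count R_ta, v_bar maps a to a scalar slope, and slopes maps a to a list of marginal values for the 1st, 2nd, ... unit of that attribute.

```python
def linear_vfa(R_t, v_bar):
    """Linear in the resource state: V(R_t) = sum_a v_bar[a] * R_ta."""
    return sum(v_bar.get(a, 0.0) * R_ta for a, R_ta in R_t.items())

def separable_pwl_vfa(R_t, slopes):
    """Piecewise linear, separable: V(R_t) = sum_a V_a(R_ta), with V_a concave
    and piecewise linear on the integers."""
    total = 0.0
    for a, R_ta in R_t.items():
        total += sum(slopes.get(a, [])[: int(R_ta)])
    return total
```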

© 2005 Warren B. Powell Slide 58

Approximate dynamic programming

A myopic decision rule (policy):

x^n_t(ω) = arg max_{x_t ∈ X_t(ω)} C_t(R_t(ω), x_t)

A decision rule that looks into the future:

x^n_t(ω) = arg max_{x_t ∈ X_t(ω)} [ C_t(R_t(ω), x_t) + V̄^x_t(R^x_t(ω), x_t) ]
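As a sketch (with placeholder helper functions that are assumptions, not part of the slides), the two decision rules differ only in whether the value function approximation of the post-decision state is added to the one-period contribution.

```python
def myopic_rule(R_t, feasible_decisions, contribution):
    # Maximize the one-period contribution only.
    return max(feasible_decisions(R_t), key=lambda x: contribution(R_t, x))

def lookahead_rule(R_t, feasible_decisions, contribution, post_decision_state, vfa):
    # Add the value function approximation of the post-decision resource state.
    return max(
        feasible_decisions(R_t),
        key=lambda x: contribution(R_t, x) + vfa(post_decision_state(R_t, x)),
    )
```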

© 2005 Warren B. Powell Slide 59

Approximate dynamic programming

Simulating a myopic policy (periods t, t+1, t+2):

© 2005 Warren B. Powell Slide 60

Approximate dynamic programming

A myopic decision rule (policy):

x^n_t(ω) = arg max_{x_t ∈ X_t(ω)} C_t(R_t(ω), x_t)

A decision rule that looks into the future:

x^n_t(ω) = arg max_{x_t ∈ X_t(ω)} [ C_t(R_t(ω), x_t) + V̄^x_t(R^x_t(ω), x_t) ]

© 2005 Warren B. Powell Slide 61

Approximate dynamic programming

(Figure: resources a_1 and a_2 linked to downstream values V̄(a'_1) and V̄(a'_2).)

© 2005 Warren B. Powell Slide 62

Option 1: Send directly to customers
Option 2: Send to regional depots
Option 3: Send to classification yards

Classification yards

© 2005 Warren B. Powell Slide 64

Approximate dynamic programming

Two-stage resource allocation under uncertainty

© 2005 Warren B. Powell Slide 65

Approximate dynamic programming

We obtain piecewise-linear recourse functions for each region.

© 2005 Warren B. Powell Slide 66

Approximate dynamic programming

The function is piecewise linear on the integers. We approximate the value of cars in the future using a separable approximation.

(Figure: profits as a piecewise-linear function of the number of vehicles at a location, 0-5.)

© 2005 Warren B. Powell Slide 67

Approximate dynamic programming

To capture nonlinear behavior, each link captures the marginal reward of an additional car.

© 2005 Warren B. Powell Slide 68

Approximate dynamic programming

© 2005 Warren B. Powell Slide 69

Approximate dynamic programming

© 2005 Warren B. Powell Slide 70

Approximate dynamic programming

(Figure: resource vectors R^n_1, ..., R^n_5 feeding the assignment network at iteration n.)

© 2005 Warren B. Powell Slide 71

Approximate dynamic programming

We estimate the functions by sampling from our distributions.

(Figure: for a sample ω, the resource vectors R^n_1, ..., R^n_5 are assigned to sampled demands D^n_1(ω), D^n_2(ω), D^n_3(ω); the assignment yields the marginal values v^n_1(ω), ..., v^n_5(ω).)
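A hedged sketch of this sampling step: for a sampled demand realization ω, the marginal value of one more unit of attribute a is approximated by a finite difference of the subproblem objective and smoothed into the running estimate. Here solve_subproblem and sample_omega are placeholders for the time-t assignment problem and the demand distribution.

```python
def sample_marginal_values(R, solve_subproblem, sample_omega, v_bar, stepsize=0.1):
    omega = sample_omega()
    base = solve_subproblem(R, omega)
    for a in R:
        R_plus = dict(R)
        R_plus[a] += 1                       # one more unit of attribute a
        v_hat = solve_subproblem(R_plus, omega) - base
        # Smooth the sampled marginal value into the running estimate.
        v_bar[a] = (1 - stepsize) * v_bar.get(a, 0.0) + stepsize * v_hat
    return v_bar
```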

© 2005 Warren B. Powell Slide 72

Approximate dynamic programming

The time t subproblem:

(Figure: resources R_t1, R_t2, R_t3 at time t flow into downstream nodes such as (i-1, t+3), (i, t+1), and (i+1, t+5), which carry the value function approximations V̄^n_ta; the subproblem produces left and right gradients (v̂^n_t-, v̂^n_t+) for each resource.)

© 2005 Warren B. Powell Slide 73

Approximate dynamic programming

Left and right gradients are found by solving flow-augmenting path problems.

(Figure: the gradient v̂^n_t3 for resource R_t3 at node (i-1, t+3).)

The right derivative (the value of one more unit of that resource) is a flow-augmenting path from that node to the supersink.

© 2005 Warren B. Powell Slide 74

Approximate dynamic programming

Left and right derivatives are used to build up a nonlinear approximation of the subproblem.

(Figure: the approximation V̄_t(R_1t) built around the observed point R^k_1t.)

© 2005 Warren B. Powell Slide 75

Approximate dynamic programming

Left and right derivatives are used to build up a nonlinear approximation of the subproblem.

(Figure: the left derivative v^{k-}_t and right derivative v^{k+}_t give the slopes of V̄_t(R_1t) to the left and right of R^k_1t.)

© 2005 Warren B. Powell Slide 76

Approximate dynamic programming

Each iteration adds new segments, as well as refining old ones.

(Figure: at iteration k+1, new slopes v^{(k+1)-}_t and v^{(k+1)+}_t are added around the new point R^{k+1}_1t.)
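One simple way to fold sampled left and right derivatives into a piecewise-linear approximation is sketched below; this is an assumption for illustration, not necessarily the exact update used in the talk.

```python
# slopes[k] estimates the marginal value of the (k+1)-th unit; v_minus and v_plus
# are the sampled left and right derivatives at the observed level R_obs.
def update_pwl(slopes, R_obs, v_minus, v_plus, stepsize=0.1):
    while len(slopes) <= R_obs:              # make room for the updated segments
        slopes.append(slopes[-1] if slopes else 0.0)
    if R_obs >= 1:
        slopes[R_obs - 1] = (1 - stepsize) * slopes[R_obs - 1] + stepsize * v_minus
    slopes[R_obs] = (1 - stepsize) * slopes[R_obs] + stepsize * v_plus
    # One smoothing pass that pushes the slopes back toward concavity
    # (marginal values nonincreasing in the count).
    for k in range(1, len(slopes)):
        if slopes[k] > slopes[k - 1]:
            slopes[k] = slopes[k - 1] = 0.5 * (slopes[k] + slopes[k - 1])
    return slopes
```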

© 2005 Warren B. Powell Slide 77

Approximate dynamic programming

(Figure: the approximate value function vs. the number of resources for f(s) = ln(1+s); the approximation after 1, 2, 5, 10, 15, and 20 iterations is compared with the exact function.)

© 2005 Warren B. Powell Slide 78

Simulating a myopic policy

Approximate dynamic programming


© 2005 Warren B. Powell Slide 79

Simulating a myopic policy

Approximate dynamic programming

© 2005 Warren B. Powell Slide 80

Using value functions to anticipate the future

Approximate dynamic programming

(Figure: the “here and now” assignment problem at time t linked to its downstream impacts.)

© 2005 Warren B. Powell Slide 81

Approximate dynamic programming

Using value functions to anticipate the future

© 2005 Warren B. Powell Slide 82

Approximate dynamic programming

Using value functions to anticipate the future

© 2005 Warren B. Powell Slide 83

Approximate dynamic programming

Using value functions to anticipate the future

© 2005 Warren B. Powell Slide 84

© 2005 Warren B. Powell Slide 85

© 2005 Warren B. Powell Slide 86

© 2005 Warren B. Powell Slide 87

© 2005 Warren B. Powell Slide 88

Approximate dynamic programming

Approximate DP vs. LP

(Figure: percent of the objective upper bound (80-100%) vs. iteration number (1-100) for Agg_PWLinear_1, Agg_PWLinear_2, Agg_PWLinear_3, DisAgg_Linear, DisAgg_PWLinear, and Decomp_Location, compared against the mathematical optimum.)

© 2005 Warren B. Powell Slide 89

Downloadable at www.castlelab.princeton.edu

© 2005 Warren B. Powell Slide 90

The information classes

» Knowledge K_t
» Forecasts of exogenous events Ω_t
» Forecasts of impacts on others V̄_t
» Expert knowledge ρ

© 2005 Warren B. Powell Slide 91

Low dimensional patterns

Old modeling approach: engineering costs

x* = arg min c x
subject to: A x = b, x ≥ 0

The cost function expresses the objectives and any desired “behavior”; the constraints express the “physics”.

© 2005 Warren B. Powell Slide 92

Flows from history

© 2005 Warren B. Powell Slide 93

Flows from history

Flows from the model

© 2005 Warren B. Powell Slide 94

Low dimensional patterns

Bottom up/top down modeling:

» Patterns: specify the behaviors you want at a general level.
» Engineering: specify costs, driver availability, work rules, routing preferences, load availability.

© 2005 Warren B. Powell Slide 95

Low dimensional patterns

Pattern matching

x* = arg min { c x + θ H(x, ρ) }

Here cx is the cost function and the second term captures “behavior”: H is the “happiness” function, which measures the degree to which model behavior agrees with a knowledgeable expert,

H(x, ρ) = || G(x) − ρ ||, where G(x) is an aggregation function.
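A minimal sketch of this objective, using an L1 norm for the happiness function for concreteness (the slide does not specify the norm); the flows x, costs c, aggregation G, and expert pattern ρ are all illustrative placeholders.

```python
def pattern_objective(c, x, G, rho, theta):
    """c.x + theta * ||G(x) - rho||_1"""
    engineering_cost = sum(c[k] * x[k] for k in x)
    gx = G(x)                                   # aggregate flows into pattern space
    happiness_penalty = sum(abs(gx[p] - rho.get(p, 0.0)) for p in gx)
    return engineering_cost + theta * happiness_penalty
```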

© 2005 Warren B. Powell Slide 96

Low dimensional patterns

Patterns and aggregation:
» What we do:
• We define patterns based on an aggregation of the attributes of a single vehicle.
• Patterns indicate the desirability of a single decision.
» Patterns can be expressed at different levels of aggregation, simultaneously:
• Don’t send C-5’s into Saudi Arabia.
• Don’t send C-5’s needing maintenance into Saudi Arabia.
• Don’t send C-5’s needing maintenance loaded with freight to southeast Asia into Saudi Arabia.
» Patterns are not hard rules: they express desirable or undesirable patterns of behavior.

© 2005 Warren B. Powell Slide 97

Flows from history

Flows from the model

© 2005 Warren B. Powell Slide 98

Flows from history

Flows from the model

© 2005 Warren B. Powell Slide 99

Low dimensional patterns

Length of haul calibration (teams)

(Figure: length of haul over iterations 1-10 for solo capacity with and without the pattern, plotted against the historical minimum and maximum; the vertical scale runs from 600 to 850.)

© 2005 Warren B. Powell Slide 100

Low dimensional patterns

Patterns can come from history:

© 2005 Warren B. Powell Slide 101

Low dimensional patterns

… or an expert:

© 2005 Warren B. Powell Slide 102

The information classes

» Knowledge K_t
» Forecasts of exogenous events Ω_t
» Forecasts of impacts on others V̄_t
» Expert knowledge ρ

© 2005 Warren B. Powell Slide 103

The military airlift problem

© 2005 Warren B. Powell Slide 104

Decision functions and information classes (the optimizing simulator, ordered by increasing information sets):

» (RB:R-A) Rule-based: I_t = R_tt
» (MP:R-AL/KNAN) Myopic cost-based, one requirement to a list of aircraft, known now and actionable now: I_t = (R_tt, c_t)
» (MP:R-AL/KNAF) Myopic cost-based, one requirement to a list of aircraft, known now and actionable in the future: I_t = ((R_tt')_{t' ≥ t}, c_t)
» (MP:RL-AL/KNAN) Myopic cost-based, a list of requirements to a list of aircraft, known now and actionable now: I_t = (R_tt, c_t)
» (MP:RL-AL/KNAF) Myopic cost-based, a list of requirements to a list of aircraft, known now and actionable in the future: I_t = ((R_tt')_{t' ≥ t}, c_t)
» (RH) Rolling horizon: I_t = { ((R_t't'')_{t'' ≥ t'}, c_t') | t' ∈ T^ph }
» (ADP) Approximate dynamic programming: I_t = { ((R_t't'')_{t'' ≥ t'}, c_t', V̄_t') | t' ∈ T^ph }
» (EK) Expert knowledge: I_t = { ((R_t't'')_{t'' ≥ t'}, c_t', V̄_t', ρ) | t' ∈ T^ph }

© 2005 Warren B. Powell Slide 105

Costs of different policies

(Figure: costs in millions of dollars (0-250) for the (RB:R-A), (MP:RL-AL/KNAN), and (ADP) policies, broken into transportation cost, late delivery cost, repair cost, and total cost; annotations mark the rule-based policy, the choice of aircraft, actionable-now and actionable-future information, and value functions along the axis of increasing information sets.)

© 2005 Warren B. Powell Slide 106

Throughput curves of policies

(Figure: cumulative pounds delivered (millions, 0-50) over time periods 0-210, comparing the cumulative expected throughput with the (RB:R-A), (MP:R-AL/KNAN), (MP:RL-AL/KNAN), (MP:RL-AL/KNAF), and (ADP) policies, which use increasing information sets.)

© 2005 Warren B. Powell Slide 107

Throughput curves of policies

(Figure: the same cumulative throughput curves as on the previous slide.)

© 2005 Warren B. Powell Slide 108

Areas between the cumulative expected thruput curve and different policy thruput curves

(Figure: the area between the cumulative expected throughput curve and each policy's throughput curve, in pound-days (millions, 0-400), for the (RB:R-A), (MP:R-AL/KNAN), (MP:RL-AL/KNAN), (MP:RL-AL/KNAF), and (ADP) policies, which use increasing information sets.)

© 2005 Warren B. Powell Slide 109

Outline

Recent experiments with modeling airlift operations

© 2005 Warren B. Powell Slide 110

Random demands and equipment failures

© 2005 Warren B. Powell Slide 111

Pilots

Aircraft

Customers

© 2005 Warren B. Powell Slide 112

Case study

Questions:

» What is the effect of uncertain demands on a military airlift schedule?

» What is the effect of equipment failures?

» How does adaptive learning change the effect of randomness on the performance of the simulation?

» What is the effect of advance information?

© 2005 Warren B. Powell Slide 113

(Figure: total contribution (roughly 250,000-330,000) over about 100 learning iterations for eight cases crossing deterministic vs. random demands, with vs. without equipment failures, and with vs. without learning.)

© 2005 Warren B. Powell Slide 114

(Figure: the same learning curves, highlighting the case with deterministic demands and no failures, with and without learning.)

© 2005 Warren B. Powell Slide 115

(Figure: the same learning curves, highlighting the case with deterministic demands and equipment failures, with and without learning.)

© 2005 Warren B. Powell Slide 116

(Figure: the same learning curves, highlighting the case with random demands and no failures, with and without learning.)

© 2005 Warren B. Powell Slide 117

(Figure: the same learning curves, highlighting the case with random demands and equipment failures, with and without learning.)

© 2005 Warren B. Powell Slide 118

Effect of advance notice

(Figure: effect of advance booking: percent coverage (86-100%) for prebooking 0, 2, and 6 hours, without learning.)

© 2005 Warren B. Powell Slide 119

Effect of advance notice

(Figure: effect of advance booking: percent coverage (86-100%) for prebooking 0, 2, and 6 hours, with and without learning.)

© 2005 Warren B. Powell Slide 120

Midair refueling: initial solution

© 2005 Warren B. Powell Slide 121

Midair refueling: initial solution

Path followed by tanker (moves up and down Atlantic).

© 2005 Warren B. Powell Slide 122

Midair refueling: initial solution

Second plane crashes

First plane refuels

Green: full of fuel. Yellow to red: nearing empty. Black: empty (plane crashes).

© 2005 Warren B. Powell Slide 123

Midair refueling: exploration

Learning over many iterations.

© 2005 Warren B. Powell Slide 124

Planes learn to meet in the middle so both can refuel.

Midair refueling: final solution

© 2005 Warren B. Powell Slide 125

Outline

Calibrating a model for a major truckload motor carrier

© 2005 Warren B. Powell Slide 126

Schneider National

© 2005 Warren B. Powell Slide 127

Schneider National

© 2005 Warren B. Powell Slide 128

© 2005 Warren B. Powell Slide 129

Truckload trucking

Questions for the model:
» What types of drivers should they hire?
• Domicile?
• Single drivers vs. teams?
» What is the value of knowing about customer requests farther in the future?
» What is the profitability of different customers?
» What is the value of increasing terminal capacity?

© 2005 Warren B. Powell Slide 130

Truckload trucking

Length of haul (LOH)

(Figure: length of haul (0-1600) by capacity category (US_SOLO, US_IC, US_TEAM), comparing the simulation with the historical minimum and maximum.)

© 2005 Warren B. Powell Slide 131

Truckload trucking

(Figure: revenue per work unit (WU) and utilization by capacity category (US_SOLO, US_IC, US_TEAM), comparing the simulation with the historical minimum and maximum.)

© 2005 Warren B. Powell Slide 132

Truckload trucking

Challenge:
» We want to know the marginal value of each type of driver.
» A driver type is determined by its attribute vector:

a = (Location, Domicile, Driver type), with 100 locations, 100 domiciles, and 3 driver types.

» There are 30,000 driver “types”!!!
» We need to take the “derivative” of our simulation for each type.
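As simple arithmetic, the 30,000 figure is the product of the attribute cardinalities on this slide; the dictionary keys below are just labels for those three attributes.

```python
from math import prod

attribute_sizes = {"location": 100, "domicile": 100, "driver_type": 3}
print(prod(attribute_sizes.values()))  # -> 30000 driver "types"
```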

© 2005 Warren B. Powell Slide 133

Multistage problems

(Figure: resource state (by type) over times t, t+1, t+2; the decision function X^π(R_t) at time t yields marginal value estimates v̂^n_t1, v̂^n_t2, v̂^n_t3.)

© 2005 Warren B. Powell Slide 134

Multistage problems

(Figure: at time t+1, the decision function X^π(R_{t+1}) yields marginal value estimates v̂^n_{t+1,1}, v̂^n_{t+1,2}, v̂^n_{t+1,3}.)

© 2005 Warren B. Powell Slide 135

Multistage problems

(Figure: at time t+2, the decision function X^π(R_{t+2}) yields marginal value estimates v̂^n_{t+2,1}, v̂^n_{t+2,2}, v̂^n_{t+2,3}.)

© 2005 Warren B. Powell Slide 136

Multistage problems

(Figure: the forward pass at time t, with X^π(R_t) and the marginal values v̂^n_t1, v̂^n_t2, v̂^n_t3.)

© 2005 Warren B. Powell Slide 137

Multistage problems

(Figure: the forward pass at time t+1, with X^π(R_{t+1}) and the marginal values v̂^n_{t+1,1}, v̂^n_{t+1,2}, v̂^n_{t+1,3}.)

© 2005 Warren B. Powell Slide 138

Multistage problems

(Figure: the forward pass at time t+2, with X^π(R_{t+2}) and the marginal values v̂^n_{t+2,1}, v̂^n_{t+2,2}, v̂^n_{t+2,3}.)

© 2005 Warren B. Powell Slide 139

Backward pass

(Figure: the sequence of decision functions X^π(R_t), X^π(R_{t+1}), X^π(R_{t+2}) across the horizon.)

© 2005 Warren B. Powell Slide 140

Backward pass

(Figure: the backward pass at time t+2, showing the marginal value v̂^n_{t+2,1}.)

© 2005 Warren B. Powell Slide 141

Backward pass

(Figure: the backward pass at time t+1, showing the marginal value v̂^n_{t+1,2}.)

© 2005 Warren B. Powell Slide 142

Backward pass

(Figure: the backward pass at time t, showing the marginal value v̂^n_t3.)

© 2005 Warren B. Powell Slide 143

Backward pass

(Figure: the completed backward pass from time t+2 back to time t.)

© 2005 Warren B. Powell Slide 144

Driver fleet optimization

(Figure: simulation objective function (1,800,000-1,900,000) vs. number of drivers (580-650) for ten sample paths s1-s10, their average, and the predicted value; annotations mark the base case and +5, +10, +20, +30, +40, +50, and +60 resources.)

© 2005 Warren B. Powell Slide 145

Driver fleet optimization

(Figure: the same objective-function curves: sample paths s1-s10, their average, and the prediction.)

© 2005 Warren B. Powell Slide 146

Driver fleet optimization

(Figure: the same objective-function curves, with the average across sample paths highlighted.)

© 2005 Warren B. Powell Slide 147

Driver fleet optimization

(Figure: estimated marginal values (roughly -500 to 3,500) for 20 driver types.)

© 2005 Warren B. Powell Slide 148

Add drivers

© 2005 Warren B. Powell Slide 149

Reduce drivers

© 2005 Warren B. Powell Slide 150

Questions?