Data-Driven Optimization John Turner Assistant Professor The Paul Merage School of Business University of California, Irvine October 24, 2014: UCI Data Science Initiative


Analytics – Unlocking the Potential of Big Data

→ Descriptive – “What is happening?” Querying, Reporting, Data Capturing, Filtering & Analysis

→ Predictive – “What will likely happen?” Statistical Methods (Regression), Forecasting & Data Mining

→ Prescriptive – “What should we do?” Optimization, Simulation, Quantitative Models

John Turner - UCI Data Science Initiative 2

Optimization

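\[
\begin{aligned}
\min \quad & f(x) \\
\text{s.t.} \quad & g_i(x) \le 0, \quad i = 1, \dots, m \\
& h_i(x) = 0, \quad i = 1, \dots, p \\
& x \in X
\end{aligned}
\]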


Applications for Optimization

• logistics
• chip design
• timetabling
• production planning
• air traffic routing
• scheduling
• supply chain management
• portfolio optimization
• marketing campaign design
• network design

Problems, Formulations, and Instances

Define Problem

Construct Mathematical Model (Formulation)

Solve one or more Instances

Communicate Results / Automate the Process

Algebraic Manipulation

Solver (CPLEX, Gurobi, KNITRO, etc.)
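The distinction between a formulation and its instances can be sketched in a few lines. This is a minimal illustration, not anything shown in the talk: the knapsack model, the `Instance` class, and all data are hypothetical, and brute-force enumeration stands in for a real solver such as those named above. Maximizing here is the same as minimizing the negated objective in the general form.

```python
from dataclasses import dataclass
from itertools import product


@dataclass
class Instance:
    """Data for one instance of the (hypothetical) knapsack formulation."""
    values: list      # objective coefficient of each item
    weights: list     # constraint coefficient of each item
    capacity: float   # right-hand side of the single constraint


def solve(instance):
    """Formulation: max sum(v_j * x_j) s.t. sum(w_j * x_j) <= capacity, x_j in {0, 1}.

    Solved by enumerating every binary vector, which is only viable for tiny instances.
    """
    best_x, best_value = None, float("-inf")
    for x in product((0, 1), repeat=len(instance.values)):
        weight = sum(w * xj for w, xj in zip(instance.weights, x))
        if weight > instance.capacity:
            continue  # infeasible: violates the capacity constraint
        value = sum(v * xj for v, xj in zip(instance.values, x))
        if value > best_value:
            best_x, best_value = x, value
    return best_x, best_value


# One formulation, two instances: only the data changes.
small = Instance(values=[6, 5, 4], weights=[3, 2, 2], capacity=4)
large = Instance(values=[6, 5, 4, 3], weights=[3, 2, 2, 1], capacity=5)

print(solve(small))
print(solve(large))
```

The model lives in `solve`; each `Instance` only supplies numbers. Enumeration doubles in cost with every added item, which is why large instances need a dedicated solver.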


Computational Challenges

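\[
\begin{aligned}
\min \quad & f(x) \\
\text{s.t.} \quad & g_i(x) \le 0, \quad i = 1, \dots, m \\
& h_i(x) = 0, \quad i = 1, \dots, p \\
& x \in X
\end{aligned}
\]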

Nonconvex objective

Too many constraints

Nonlinear constraints

Too many variables

Nonconvex domain (e.g., discrete $X$)

→ Large-Scale Instances Require Specialized Decomposition or Approximation Schemes


Two Examples

1. Planning Online Advertising
Turner, J. 2012. The Planning of Guaranteed Targeted Display Advertising. Operations Research, 60(1) 18–33.

2. Locating Trauma Centers at Nation-Scale
Cho, S-H., H. Jang, T. Lee, J. Turner. 2014. Simultaneous Location of Trauma Centers and Helicopters for Emergency Medical Service Planning. Operations Research, 62(4) 751–771.


Different “viewer types”

Different “targeting requirements”

Impression goals for each advertiser

[Figure: Impression Goals for each advertiser and Supply]

We use optimization to find an “optimal allocation”
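A toy version of such an allocation can be sketched as a flow problem. This is an illustration only: the slide does not spell out the objective, so plain maximum flow (Edmonds-Karp) is swapped in as a stand-in, and the viewer types, supplies, goals, and eligibility lists below are hypothetical.

```python
from collections import deque


def max_flow(capacity, source, sink):
    """Edmonds-Karp maximum flow on a dict-of-dicts capacity graph."""
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0)  # reverse edges start at 0
    total = 0
    while True:
        # Breadth-first search for a shortest augmenting path.
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, cap in residual[u].items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return total, residual  # no augmenting path left
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(residual[a][b] for a, b in path)
        for a, b in path:
            residual[a][b] -= push
            residual[b][a] += push
        total += push


# Hypothetical data: impressions available per viewer type, goal per advertiser,
# and which viewer types satisfy each advertiser's targeting requirements.
supply = {"v1": 4, "v2": 5, "v3": 2}
goals = {"A": 5, "B": 3, "C": 2}
eligible = {"A": ["v1", "v2"], "B": ["v2", "v3"], "C": ["v3"]}

capacity = {"source": dict(supply)}
for v in supply:
    capacity[v] = {}
for adv, viewers in eligible.items():
    for v in viewers:
        capacity[v][adv] = supply[v]
    capacity[adv] = {"sink": goals[adv]}

served, residual = max_flow(capacity, "source", "sink")
allocation = {(v, adv): capacity[v][adv] - residual[v][adv]
              for adv, viewers in eligible.items() for v in viewers}
print(served, allocation)
```

Each entry of `allocation` is the number of impressions from one viewer type assigned to one advertiser. Maximum flow only asks whether the goals can be met; a planning model would normally add an objective that also ranks the feasible allocations.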


Targeting can specify:
• gender
• age
• geographic region
• income level
• day of week / time of day
• the sports you like, etc.

Combinatorial Explosion

Trillions of possible combinations!!
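The count grows as the product of the number of levels of each attribute. The sketch below uses made-up cardinalities (none of these numbers come from the talk) to show how quickly a handful of targeting attributes reaches the trillions.

```python
from math import prod

# Hypothetical number of distinct levels per targeting attribute.
attribute_levels = {
    "gender": 3,
    "age_bracket": 10,
    "geographic_region": 200,
    "income_level": 8,
    "day_and_hour": 7 * 24,      # day of week x hour of day
    "interest_flags": 2 ** 20,   # 20 yes/no interests, e.g. sports you like
}

# Every combination of levels is a distinct viewer type.
viewer_types = prod(attribute_levels.values())
print(f"{viewer_types:,} possible viewer types")
```

With these assumed cardinalities the product is about 8.5 trillion, dominated by the yes/no interest flags, each of which doubles the count.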


Supply highly variable
Expected supply close to zero!

Questions:
1. What is the “best” way to aggregate?
2. How to interpret the aggregate solution?
3. How good is the aggregate solution?

Aggregate → Solve → Disaggregate


Implications
• Allow the optimization to segment viewers into buckets
• This becomes an iterative process
• Solve → Learn → Revise → Repeat
• Data requirements change at each iteration


Data-Driven Optimization

Sequential Process
1. Construct Estimates of Model Parameters (Forecasting / Data Mining)
2. Solve Optimization Model

Iterative Process
1. Start with Coarse Approximations for Model Parameters (Cheap Forecasts)
2. Solve Optimization Model
3. Learn & Revise Model
4. Re-sample Data to Improve Forecasts
5. Repeat Steps 2-4

Take-away: Can often get to within 1% of optimality with less than 100 judiciously-chosen audience segments
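The solve, learn, revise, repeat pattern can be sketched on a deliberately simple problem. This is not the audience-segmentation procedure from the talk. It is a generic toy in which the decision is "pick the k most valuable items", a forecast is an interval containing the true value, and a narrower interval stands in for a more expensive forecast. The item names, values, and the `forecast`, `solve`, and `iterate` helpers are all hypothetical.

```python
# Hypothetical true values; the optimizer only ever sees forecast intervals.
TRUE_VALUE = {"a": 41.0, "b": 37.5, "c": 36.0, "d": 12.0, "e": 8.0}


def forecast(item, precision):
    """Return a half-open interval [low, low + precision) containing the true value."""
    low = (TRUE_VALUE[item] // precision) * precision
    return low, low + precision


def solve(intervals, k):
    """Pick the k items with the largest interval midpoints (ties keep input order)."""
    ranked = sorted(intervals, key=lambda i: sum(intervals[i]) / 2, reverse=True)
    return ranked[:k]


def iterate(items, k, start_precision=32.0, max_rounds=10):
    """Assumes 0 < k < len(items). Returns the chosen items and final precisions."""
    precision = {i: start_precision for i in items}            # cheap forecasts first
    intervals = {i: forecast(i, precision[i]) for i in items}
    for _ in range(max_rounds):
        chosen = solve(intervals, k)                            # solve
        rejected = [i for i in items if i not in chosen]
        worst_chosen = min(intervals[i][0] for i in chosen)     # learn
        best_rejected = max(intervals[i][1] for i in rejected)
        if worst_chosen >= best_rejected:
            break  # no rejected item can beat a chosen one
        for i in items:                                         # revise
            low, high = intervals[i]
            if low < best_rejected and high > worst_chosen:
                precision[i] /= 2                               # refine only where it matters
                intervals[i] = forecast(i, precision[i])
    return chosen, precision


chosen, precision = iterate(list(TRUE_VALUE), k=2)
print(chosen, precision)
```

In this run only the items near the decision boundary (`b` and `c`) are refined to the finest level, while `d` and `e` keep their initial coarse forecast. That mirrors the point that data requirements change from one iteration to the next.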


Example 2: Demand for Trauma Care in Korea

• Population ~ 50 million
• Area 100,210 km² (~ Indiana)
• 7 metro cities & 9 provinces
• Number of trauma cases** = 190,196*
• ~ 382.1 trauma / 100k

[Map legend: 0, 10, 50, 100, 200, 400, 600, 600~]

* Stats in year 2008
** Patients with EMR-ISS score ≥ 15
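As a quick consistency check on the slide's figures, the case count and the per-100k rate together imply a population, assuming the rate is cases per 100,000 residents (the slide does not state the denominator). The variable names below are illustrative.

```python
cases = 190_196          # trauma cases reported on the slide (2008)
rate_per_100k = 382.1    # rate reported on the slide

# Population implied by the two reported figures.
implied_population = cases / rate_per_100k * 100_000

# Rate that a population of exactly 50 million would give instead.
rate_at_50m = cases / 50_000_000 * 100_000

print(f"implied population: {implied_population:,.0f}")
print(f"rate at 50 million: {rate_at_50m:.1f} per 100k")
```

The implied population is about 49.8 million, in line with the rounded "~ 50 million". A population of exactly 50 million would give about 380.4 per 100k.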

Candidate Sites

Candidate Trauma Center Sites (n=38)

Candidate Heliport Sites (n=54 total: 16 NEMA* + 38 @ TC’s)

*NEMA = Korean National Emergency Management Agency sites (heliports currently used for fire & other EMS missions)

[Figure: sites chosen under the Decoupled, No-Congestion, and Integrated approaches for T10H5, T10H15, and T10H25]

For each approach (decoupled, no-congestion, and integrated), one marker indicates the sites chosen for T10H5; a second indicates the sites chosen for T10H15 but not for T10H5; a third indicates the sites chosen for T10H25 but not for T10H15.

Take-away: Solutions of varying quality can look visually similar.

[Figure: Successful Patients (%), on an axis from 50% to 100%, versus # Helicopters (5, 10, 15, 20, 25), in three panels for 10, 12, and 14 trauma centers (T10H5 to T14H25), comparing the Integrated, NoCongestion, and Decoupled approaches]

Take-away: An integrated model solved approximately is often better than solving two more-precise models in sequence.
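Why sequencing can lose to an integrated model is visible even on a tiny example. The sketch below is a toy, not the model from the paper: sites and patients sit on a line, the coverage rules and all numbers are hypothetical, congestion is ignored, and both approaches are solved exactly by enumeration, so it shows only the cost of decoupling, not the effect of approximation.

```python
from itertools import combinations

# Hypothetical 1-D instance: patient location -> number of patients.
DEMAND = {0: 7, 4: 3, 6: 3, 10: 4}
TC_SITES = [0, 5]          # candidate trauma center locations
HELI_SITES = [0, 5, 10]    # candidate helicopter base locations
GROUND_RADIUS = 1          # max patient-to-center distance by ambulance
AIR_BUDGET = 6             # max base -> patient -> center distance by helicopter


def covered(p, centers, helipads):
    """A patient is served if reachable by ground or by helicopter in time."""
    by_ground = any(abs(p - t) <= GROUND_RADIUS for t in centers)
    by_air = any(abs(h - p) + abs(p - t) <= AIR_BUDGET
                 for h in helipads for t in centers)
    return by_ground or by_air


def served(centers, helipads):
    return sum(n for p, n in DEMAND.items() if covered(p, centers, helipads))


def decoupled(n_tc, n_heli):
    """Place centers for ground coverage first, then helicopters given the centers."""
    centers = max(combinations(TC_SITES, n_tc), key=lambda c: served(c, ()))
    helipads = max(combinations(HELI_SITES, n_heli), key=lambda h: served(centers, h))
    return centers, helipads, served(centers, helipads)


def integrated(n_tc, n_heli):
    """Choose centers and helicopter bases jointly."""
    centers, helipads = max(
        ((c, h) for c in combinations(TC_SITES, n_tc)
         for h in combinations(HELI_SITES, n_heli)),
        key=lambda ch: served(*ch),
    )
    return centers, helipads, served(centers, helipads)


print("decoupled: ", decoupled(1, 1))
print("integrated:", integrated(1, 1))
```

Here the decoupled approach puts the center where ground coverage is highest and serves 10 of 17 patients. The integrated approach accepts slightly worse ground coverage so that a helicopter can reach the largest cluster, and serves 13.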


Big Data, Big Challenges

Conclusions:
• Prediction and Optimization work well together
• Predict AND Optimize, instead of Predict THEN Optimize.
