Data-Driven...
Transcript of Data-Driven...
Data-Driven Optimization John Turner
Assistant Professor The Paul Merage School of Business
University of California, Irvine
October 24, 2014: UCI Data Science Initiative
Analytics – Unlocking the Potential of Big Data
è Descriptive – “What is happening?” Querying, Reporting, Data Capturing, Filtering & Analysis
è Predictive – “What will likely happen?” Statistical Methods (Regression), Forecasting & Data Mining
è Prescriptive – “What should we do?” Optimization, Simulation, Quantitative Models
John Turner - UCI Data Science Initiative 2
Optimization
█■min 𝑓(𝑥) @ s.t. 𝑔↓𝑖 (𝑥)≤0, 𝑖=1..𝑚@ ℎ↓𝑖 (𝑥)=0, 𝑖=1..𝑝@ 𝑥∈𝑋
John Turner - UCI Data Science Initiative 3
Applica'ons for Op'miza'on
logistics chip design
timetabling production planning
air traffic routing
scheduling
supply chain management
portfolio optimization
marketing campaign design
network design
Problems, Formula'ons, and Instances
Define Problem
Construct Mathematical Model (Formulation)
Solve one or more Instances
Communicate Results / Automate the Process
Algebraic Manipulation
Solver (CPLEX, Gurobi, KNITRO, etc.)
John Turner - UCI Data Science Initiative 5
Computational Challenges
█■min 𝑓(𝑥) @ s.t. 𝑔↓𝑖 (𝑥)≤0, 𝑖=1..𝑚@ ℎ↓𝑖 (𝑥)=0, 𝑖=1..𝑝@ 𝑥∈𝑋
Nonconvex objective
Too many constraints
Nonlinear constraints
Too many variables
Nonconvex domain (e.g., discrete 𝑋) )
à Large-Scale Instances Require Specialized Decomposition or Approximation Schemes
John Turner - UCI Data Science Initiative 6
Two Examples 1. Planning Online Advertising
Turner, J. 2012. The Planning of Guaranteed Targeted Display Advertising. Operations Research, 60(1) 18--33.
2. Locating Trauma Centers at Nation-Scale Cho, S-H., H. Jang, T. Lee, J. Turner. 2014. Simultaneous Location of Trauma Centers and Helicopters for Emergency Medical Service Planning. Operations Research, 62(4) 751--771.
John Turner - UCI Data Science Initiative 7
Impression goals for each advertiser
1
5
2
1
Impression Goals
John Turner - UCI Data Science Initiative 12
1
3
2
2
1
1
5
2
1
Impression Goals
4
5
Supply
We use optimization to find an “optimal allocation”
John Turner - UCI Data Science Initiative 13
Targeting can specify: • gender • age • geographic region • income level • day of week / time of day • the sports you like, etc.
Combinatorial Explosion
Trillions of possible combinations!!
John Turner - UCI Data Science Initiative 14
Supply highly variable Expected supply close to zero!
Questions: 1. What is “best” way to aggregate? 2. How to interpret aggregate solution? 3. How good is aggregate solution?
Solve
Disaggregate
Aggregate
John Turner - UCI Data Science Initiative 15
Implications • Allow the optimization to segment viewers
into buckets • This becomes an iterative process
• Solve à Learn à Revise à Repeat • Data requirements change at each iteration
John Turner - UCI Data Science Initiative 16
Data-‐Driven Op'miza'on
Sequen&al Process 1. Construct Es&mates of
Model Parameters (Forecas&ng / Data Mining)
2. Solve Op&miza&on Model
Itera&ve Process 1. Start with Coarse
Approxima&ons for Model Parameters (Cheap Forecasts)
2. Solve Op&miza&on Model
3. Learn & Revise Model 4. Re-‐sample Data to
Improve Forecasts 5. Repeat Steps 2-‐4
Take-away: Can often get to within 1% of optimality with less than 100 judiciously-chosen audience segments
17
Example 2: Demand for Trauma Care in Korea
l Popula&on ~ 50 million l Area 100,210 km2 (~ Indiana) l 7 metro ci&es & 9 provinces
l Number of trauma cases** = 190,196* l ~ 382.1 trauma / 100k
10 50 100 200 400 600 600~ 0 * Stats in year 2008 ** Patients with EMR-ISS score ≥ 15 18
Candidate Trauma Center Sites (n=38)
Candidate Heliport Sites (n=54 total: 16 NEMA*
+ 38 @ TC’s)
Candidate Sites
*NEMA = Korean National Emergency Management Agency sites (heliports currently used for fire & other EMS missions) 19
Decoupled
N
o-Congestion
Integrated
T10H5 T10H15 T10H25
indicates the sites chosen for T10H5;
indicates the sites chosen for T10H15 but not for T10H5;
indicates the sites chosen for T10H25 but not for T10H15
For each approach (decoupled, no-congestion, and integrated),
Take-away: Solutions of varying quality can look visually similar.
20
10 trauma centers
Successful Pa'ents (%)
12 trauma centers 14 trauma centers
50%
60%
70%
80%
90%
100%
T10H
5
T10H
10
T10H
15
T10H
20
T10H
25
Integrated NoCongestion Decoupled
T14H
5
T14H
10
T14H
15
T14H
20
T14H
25
T12H
5
T12H
10
T12H
15
T12H
20
T12H
25
# Helicopters # Helicopters # Helicopters
Take-away: An integrated model solved approximately is often better than solving two more-precise models in sequence.
21
Big Data, Big Challenges Conclusions: l Predic&on and Op&miza&on work well together l Predict AND Op&mize, instead of Predict THEN Op&mize.
22