Optimization Issues for Huge Datasets and Long Computation
Michael Ferris
University of Wisconsin, Computer Sciences
Qun Chen, Jin-Ho Lim, Jeff Linderoth, Miron Livny, Todd Munson, Mary Vernon, Meta Voelker
Update on Gamma Knife
• In use at U. Maryland Hospitals
• Covered by Business Week (Apr 2001)
• Better models, faster solution
• Requires less user input
• Skeletonization is key improvement
Skeleton Starting Points
[Figure: four panels. a. Target area. b. A single line skeleton of an image. c. 8 initial shots are identified (1-4mm, 2-8mm, 5-14mm). d. An optimal solution: 8 shots (1-4mm, 2-8mm, 5-14mm).]
Run Time Comparison
Average run time (standard deviation), by size of tumor

Method    Small                    Medium                          Large
Random    2 min 33 sec (40 sec)    17 min 20 sec (3 min 48 sec)    373 min 2 sec (90 min 8 sec)
SLSD      1 min 2 sec (17 sec)     15 min 57 sec (3 min 12 sec)    23 min 54 sec (4 min 54 sec)
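The run times in the table above can be turned into speedup ratios with a few lines of arithmetic. The sketch below only restates the averages from the slide in seconds; the variable names are illustrative and not part of the original talk.

```python
def to_seconds(minutes, seconds):
    """Convert a 'M min S sec' entry from the table into seconds."""
    return 60 * minutes + seconds

# Average run times copied from the table (standard deviations omitted).
random_start = {
    "Small": to_seconds(2, 33),
    "Medium": to_seconds(17, 20),
    "Large": to_seconds(373, 2),
}
slsd_start = {
    "Small": to_seconds(1, 2),
    "Medium": to_seconds(15, 57),
    "Large": to_seconds(23, 54),
}

# Ratio of Random to SLSD average run time for each tumor size.
speedup = {size: random_start[size] / slsd_start[size] for size in random_start}

for size, ratio in speedup.items():
    print(f"{size}: {ratio:.2f}x")
```

On these averages the ratio is roughly 2.5 for small tumors, 1.1 for medium, and 15.6 for large, so the gain appears concentrated in the large cases.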
Data Mining & Optimization
• Computation: Processor/Memory
• Algorithms: Optimization
• Models: Statistical/AI
• Data Mining Application: Databases
• Prediction, Categorization, Separation
• Equations, LP, QP, MIP, NLP
• GAMS, Matlab, so/dll
• Serial, Parallel, Condor
Optimization
• Global
• Exact
• Constrained
• Stochastic
• Large scale
• Fast convergence
• CPU + Memory + Smarts

• Local
• Approximate
• Unconstrained
• Deterministic
• Small scale
• Termination
MIP formulation
minimize $c^T x$
subject to $Ax \le b$, $l \le x \le u$,
and some $x_j$ integer

Problems are specified in an application-convenient format: GAMS, AMPL, or MPS
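To make the roles of $c$, $A$, $b$, $l$ and $u$ concrete, here is a minimal sketch that solves a tiny instance by brute-force enumeration. It assumes every variable is integer (the slide only requires some to be) and uses made-up data; it is not how FATCOP or CPLEX solve such problems, and `solve_tiny_ip` is a hypothetical helper name.

```python
from itertools import product

def solve_tiny_ip(c, A, b, lower, upper):
    """Brute-force: minimize c^T x subject to A x <= b, lower <= x <= upper, x integer.

    Only sensible for a handful of variables with small bounds.
    """
    best_x, best_val = None, float("inf")
    ranges = [range(lo, hi + 1) for lo, hi in zip(lower, upper)]
    for x in product(*ranges):
        # Check every row of A x <= b.
        feasible = all(
            sum(a_ij * x_j for a_ij, x_j in zip(row, x)) <= b_i
            for row, b_i in zip(A, b)
        )
        if feasible:
            val = sum(c_j * x_j for c_j, x_j in zip(c, x))
            if val < best_val:
                best_x, best_val = x, val
    return best_x, best_val

# Hypothetical data: maximize 5*x1 + 4*x2, written as a minimization.
c = [-5, -4]
A = [[6, 4], [1, 2]]
b = [24, 6]
lower, upper = [0, 0], [10, 10]

best_x, best_val = solve_tiny_ip(c, A, b, lower, upper)
print(best_x, best_val)
```

Enumeration grows exponentially with the number of variables, which is why practical solvers rely on branch and bound with LP relaxations instead.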
Data delivery: pay-per-view
• Optimization model for regional caches:
  minimize: $C_{remote} + P \, C_{regional}$
  over all possible cached objects/segments
  subject to:
  – $C_{regional} \le N_{channels}$
  – regional storage $\le N_{segments}$
  – regional server stores 0, k or K segments of each object
• MIP (large number of objects/segments)
The “Seymour Problem”
• Set covering problem used in proof of four color theorem
• CPLEX 6.0 and Condor (2 option files)
• Running since June 23, 1999
• Currently >590 days CPU time per job
• (13 million nodes; 2.4 million nodes)
FAT COP
• FAT - large # of processors
  – opportunistic environment (Condor)
• COP - Master Worker control
  – fault tolerant: task exit, host suspend
  – portable parallel programming
• Mixed Integer Program Solver
  – Branch and Bound: LP relaxations
  – MPS file, AMPL or GAMS input
[Figure: FATCOP software layers. Labels: Application Problem (GAMS, AMPL, MPS); FATCOP; MW; Condor-PVM; PVM; Internet Protocol; LP Solver Interface (CPLEX, OSL, SOPLEX, MINOS, ...)]
MIP Technology
• Each task is a subtree, time limit
  – Diving heuristic
  – Cutting planes (global)
  – Pseudocosts
  – Preprocessing
• Master checkpoint
• Worker has state, how to share info?
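The "each task is a subtree, time limit" pattern can be sketched as a master that hands out subtree roots and a worker that explores one subtree until a budget runs out, then returns its unexplored nodes. This is a serial simulation under stated assumptions: a node budget stands in for the time limit, the knapsack data and the weak bound are made up, and none of the names come from FATCOP or MW.

```python
from collections import deque

# Hypothetical 0-1 knapsack data used only for illustration.
VALUES = [60, 100, 120, 70]
WEIGHTS = [10, 20, 30, 25]
CAPACITY = 50

def evaluate(node):
    """node is a tuple of 0/1 decisions for the first len(node) items.

    Returns (value so far, optimistic bound), or None if over capacity.
    """
    weight = sum(w for w, bit in zip(WEIGHTS, node) if bit)
    if weight > CAPACITY:
        return None
    value = sum(v for v, bit in zip(VALUES, node) if bit)
    bound = value + sum(VALUES[len(node):])  # weak but valid upper bound
    return value, bound

def worker(root, node_limit, incumbent):
    """Explore the subtree under root depth-first for at most node_limit nodes.

    Returns (best value known, unexplored nodes handed back to the master).
    """
    stack, best, visited = [root], incumbent, 0
    while stack and visited < node_limit:
        node = stack.pop()
        visited += 1
        result = evaluate(node)
        if result is None or result[1] <= best:
            continue  # infeasible or pruned by bound
        if len(node) == len(VALUES):
            best = result[0]  # complete assignment that beats the incumbent
            continue
        stack.append(node + (0,))
        stack.append(node + (1,))
    return best, stack

def master(node_limit=4):
    """Dispatch subtree tasks one at a time; returns (optimal value, tasks run)."""
    pool = deque([()])  # the root node: no decisions made yet
    incumbent, tasks = 0, 0
    while pool:
        root = pool.popleft()
        best, leftovers = worker(root, node_limit, incumbent)
        incumbent = max(incumbent, best)
        pool.extend(leftovers)
        tasks += 1
    return incumbent, tasks

print(master(node_limit=4))
```

In this sketch the only state the master holds is the pool of open nodes and the incumbent, which is also what a master checkpoint would need to save. Sharing the incumbent only at task boundaries is one simple answer to the "how to share info?" question, at the cost of weaker pruning inside a running task.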
FATCOP Daily Log
Note machine reboot at approx 3:00 am (night)
Back to Seymour
• Schmieta, Pataki, Linderoth and MCF
  – explored to depth 8 in tree
  – applied cuts at each of these 256 nodes
  – solved in parallel, using whatever resources available (CPLEX, FATCOP,...)
• Problem solved with over 1 year CPU
  – over 10 million nodes, 11,000 hours
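The figure of 256 nodes follows from binary branching to depth 8, and 11,000 CPU hours is indeed more than a year. A small sketch of that arithmetic, assuming each level of the tree is a two-way branch (the slide does not say which variables were branched on):

```python
from itertools import product

depth = 8

# One subproblem per choice of branch direction (0 = down, 1 = up) at each level.
subproblems = list(product((0, 1), repeat=depth))
print(len(subproblems))  # 2 ** 8 subproblems

# Total CPU time quoted on the slide, expressed in days.
cpu_hours = 11_000
cpu_days = cpu_hours / 24
print(f"{cpu_days:.1f} days")
```

Each of the 256 tuples identifies an independent subproblem, which is what makes it possible to solve them in parallel on whatever resources are available.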
Seymour Node 319
• FATCOP
  – 47.0 hrs with 2,887,808 nodes
  – average number of machines used: 108
• CPLEX
  – 12 days, 10 hrs with 356,600 nodes
  – single machine, clique cuts useful
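The two runs on node 319 can be compared with simple arithmetic. The sketch below uses only the figures on the slide; the variable names are illustrative. Since the two solvers explored different trees (different node counts, and clique cuts in the CPLEX run), the ratios should be read as rough indicators rather than a controlled benchmark.

```python
# Figures from the slide.
fatcop_hours = 47.0
fatcop_nodes = 2_887_808
fatcop_machines = 108          # average number of machines used

cplex_hours = 12 * 24 + 10     # 12 days, 10 hrs on a single machine
cplex_nodes = 356_600

# Wall-clock comparison and node throughput.
wall_clock_ratio = cplex_hours / fatcop_hours
fatcop_nodes_per_hour = fatcop_nodes / fatcop_hours
cplex_nodes_per_hour = cplex_nodes / cplex_hours

# Total machine time consumed by the parallel run.
fatcop_machine_hours = fatcop_hours * fatcop_machines

print(f"wall-clock ratio (CPLEX / FATCOP): {wall_clock_ratio:.2f}")
print(f"FATCOP nodes per hour: {fatcop_nodes_per_hour:.0f}")
print(f"CPLEX nodes per hour:  {cplex_nodes_per_hour:.0f}")
print(f"FATCOP machine-hours:  {fatcop_machine_hours:.0f}")
```

The CPLEX run took about 6.3 times as long in wall-clock terms, while the FATCOP run consumed far more total machine time, which is the usual trade-off in an opportunistic environment.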
Large datasets
• Enormous computational resources can sometimes facilitate solution
• X-validation, slice modeling
• What about the data?
• In particular, what if the problem does not fit in core?
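One common answer to the out-of-core question is to stream the data from disk in fixed-size chunks and accumulate whatever the algorithm needs (here, a loss and its derivative) without ever holding the full dataset in memory. The sketch below is an assumption about how that might look, not the approach from the talk: the one-parameter least-squares loss, the binary record layout, and the function names are all hypothetical.

```python
import os
import struct
import tempfile

RECORD = struct.Struct("<dd")  # one (x, y) pair of float64 values per record

def write_records(path, pairs):
    """Write (x, y) pairs to a flat binary file."""
    with open(path, "wb") as f:
        for x, y in pairs:
            f.write(RECORD.pack(x, y))

def chunked_loss_and_gradient(path, w, records_per_chunk=1024):
    """Accumulate sum_i (w*x_i - y_i)^2 and its derivative in w, chunk by chunk."""
    loss = grad = 0.0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(RECORD.size * records_per_chunk)
            if not chunk:
                break
            # Only one chunk is resident in memory at a time.
            for x, y in RECORD.iter_unpack(chunk):
                r = w * x - y
                loss += r * r
                grad += 2.0 * r * x
    return loss, grad

# Usage with a small synthetic dataset where y = 3x exactly.
with tempfile.TemporaryDirectory() as tmp:
    data_path = os.path.join(tmp, "data.bin")
    write_records(data_path, [(float(x), 3.0 * x) for x in range(10)])
    loss_at_0 = chunked_loss_and_gradient(data_path, 0.0, records_per_chunk=4)
    loss_at_3 = chunked_loss_and_gradient(data_path, 3.0, records_per_chunk=4)

print(loss_at_0, loss_at_3)
```

Because the loss is a sum over records, the chunk size affects only memory use, not the result.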
NCP functions
Definition: $\phi(a, b) = 0 \iff 0 \le a \perp b \ge 0$
Example: $\phi_{\min}(a, b) := \min\{a, b\}$
Example: $\phi_{FB}(a, b) := \sqrt{a^2 + b^2} - a - b$
$\Phi(x) = 0 \iff 0 \le x \perp F(x) \ge 0$
Componentwise definition: $\Phi_i(x) := \phi(x_i, F_i(x))$
Semismooth results
How can you use these?
• Specialized codes
  – Asynchronous I/O
• Specialized platforms
  – Condor (executable per architecture)
• Specific input formats
  – GAMS, Matlab
• Handholding operation
Model centric toolbox
• GAMS optimization model
• Solvers: LP, QP, MIP, NLP, MINLP
• Other model formats (gms2xx)
• Matlab programming environment
• Model data exchange
• Condor Resource Manager
• Data warehouse
• Specialized input