Solving Multistage Stochastic Linear Programs on the Computational Grid
Jerry Shen
June 8, 2004
Stochastic Programming (SP)
Random
Uncertainty
Problems with limited information
Can it really help me to win money in the stock market?
I remember I took IE 495 last year …
Really, what is it?
Who cares? I don’t!
My Answer
It is hard but important, since it is going to be a topic in my thesis!
First Answer in Google …
SP is a framework for modeling optimization problems that involve uncertainty.
Two-Stage SP with Recourse
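$$\min_{x}\; c'x + \mathcal{Q}(x) \quad \text{s.t.}\quad Ax = b,\; x \ge 0$$

where $\mathcal{Q}(x) = \sum_{i=1}^{N} p_i\, Q_i(x)$ is the expected recourse cost of choosing $x$ in the first stage, and

$$Q_i(x) \stackrel{\text{def}}{=} \min_{y_i}\; q_i' y_i \quad \text{s.t.}\quad W y_i = h_i - T_i x,\; y_i \ge 0$$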
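To make the expected recourse cost concrete, here is a minimal sketch for a toy simple-recourse problem. The problem is an assumption made for illustration and is not from the slides: one first-stage variable x, $W = [1, -1]$, $T_i = 1$, and hypothetical penalties `q_plus` (shortage) and `q_minus` (surplus), so each second-stage LP has a closed-form solution and no LP solver is needed.

```python
def recourse_cost(x, h, q_plus, q_minus):
    """Q_i(x) for: min q_plus*y_plus + q_minus*y_minus
    s.t. y_plus - y_minus = h - x, y >= 0 (closed form, assumes q_plus + q_minus >= 0)."""
    shortfall = h - x
    if shortfall >= 0:
        return q_plus * shortfall
    return q_minus * (-shortfall)


def expected_recourse(x, scenarios, q_plus, q_minus):
    """Q(x) = sum_i p_i * Q_i(x); scenarios is a list of (p_i, h_i) pairs."""
    return sum(p * recourse_cost(x, h, q_plus, q_minus) for p, h in scenarios)


def first_stage_objective(c, x, scenarios, q_plus, q_minus):
    """c*x + Q(x), the quantity minimised in the first stage."""
    return c * x + expected_recourse(x, scenarios, q_plus, q_minus)


# Hypothetical data: three scenarios with probabilities 0.3, 0.4, 0.3.
scenarios = [(0.3, 2.0), (0.4, 4.0), (0.3, 6.0)]
q_at_4 = expected_recourse(4.0, scenarios, q_plus=3.0, q_minus=0.5)
obj_at_4 = first_stage_objective(1.0, 4.0, scenarios, q_plus=3.0, q_minus=0.5)
```

With real data each Q_i(x) would be the optimal value of an LP; the closed form only stands in for that solve.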
L-Shaped Method
Solve the first-stage (root) node
  Get a proposal solution x and pass it to the children
Solve the second-stage (children) nodes
  Evaluate the recourse cost Q(x)
  Generate (feasibility and optimality) cuts and add them to the root node
Repeat
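The loop above can be sketched on a toy problem. Everything specific here is an assumption for illustration: a single first-stage variable, simple recourse with hypothetical penalties `q_plus` and `q_minus` (so second-stage duals are known in closed form and only optimality cuts arise), and a master problem solved by enumerating cut crossings instead of calling an LP solver. It is a single-cut variant, not the SMART code.

```python
def solve_scenario(x, h, q_plus, q_minus):
    """Closed-form second-stage solve: returns (Q_i(x), dual multiplier pi_i)."""
    if h - x >= 0:
        return q_plus * (h - x), q_plus
    return q_minus * (x - h), -q_minus


def solve_master(c, cuts, lo, hi):
    """Minimise c*x + theta s.t. theta >= a + b*x for every cut, lo <= x <= hi.

    In one dimension the optimum lies at a bound or where two cuts cross,
    so enumerating those points replaces an LP solver in this sketch.
    """
    candidates = {lo, hi}
    for k, (a1, b1) in enumerate(cuts):
        for a2, b2 in cuts[k + 1:]:
            if b1 != b2:
                crossing = (a2 - a1) / (b1 - b2)
                if lo <= crossing <= hi:
                    candidates.add(crossing)

    def theta_at(x):
        return max(a + b * x for a, b in cuts)

    x = min(sorted(candidates), key=lambda v: c * v + theta_at(v))
    return x, theta_at(x)


def l_shaped(c, scenarios, q_plus, q_minus, lo, hi, tol=1e-9, max_iter=50):
    """Return (x, objective) once the master's theta matches Q(x)."""
    cuts = [(0.0, 0.0)]  # theta >= 0 is valid here: recourse costs are non-negative
    for _ in range(max_iter):
        x, theta = solve_master(c, cuts, lo, hi)      # root proposes x
        q_val = alpha = beta = 0.0
        for p, h in scenarios:                         # children evaluate Q(x)
            cost, pi = solve_scenario(x, h, q_plus, q_minus)
            q_val += p * cost
            alpha += p * pi * h                        # cut intercept
            beta -= p * pi                             # cut slope (T_i = 1)
        if q_val - theta <= tol:
            return x, c * x + q_val
        cuts.append((alpha, beta))                     # optimality cut to the root
    raise RuntimeError("no convergence within max_iter")


x_opt, value = l_shaped(1.0, [(0.3, 2.0), (0.4, 4.0), (0.3, 6.0)], 3.0, 0.5, 0.0, 10.0)
```

On this hypothetical data the loop stops after a handful of cuts; feasibility cuts never appear because the toy second stage is feasible for every x.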
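Multi-Stage SP with Recourse

$$\min_{x_0}\; c'x_0 + \mathcal{Q}_0(x_0) \quad \text{s.t.}\quad Ax_0 = b,\; x_0 \ge 0$$

where $\mathcal{Q}_0(x_0)$ is the expected recourse cost of choosing $x_0$ in the first stage. For each node $i \in N(t)$, $t = 1, \dots, T$, with parent $p(i)$:

$$Q_i(x_{p(i)}) \stackrel{\text{def}}{=} \min_{x_i}\; q_i'x_i + \mathcal{Q}_t(x_i) \quad \text{s.t.}\quad W x_i = h_i - T_i x_{p(i)},\; x_i \ge 0$$

$$\mathcal{Q}_{t-1}(x_{p(i)}) = \sum_{j \,:\, p(j) = p(i)} p_j\, Q_j(x_{p(i)})$$

so $\mathcal{Q}_t$ is defined for $t = 0, \dots, T-1$, and the term $\mathcal{Q}_t(x_i)$ is absent at the last stage $t = T$.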
Nested Decomposition Method
A recursive version of the L-Shaped Method
Node n generates a proposal solution x_n and passes it to its children S(n) … and the children pass theirs on to the grandchildren
Cuts are generated and passed back to the parent to evaluate Q_t(x_{p(n)})
Multi-Stage = Multi Two-Stage??
Not that easy!!
In Two-Stage, a node is either a parent or a child.
In Multi-Stage, a node is a parent, a child, or both. (RYG Method)
Scenario tree blow-up problem
  Imagine there are 10 independent variables with 1000 scenarios per variable … in 5 stages, the total number of scenarios at the last stage will be 10^16
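The slide does not spell out how the figure is obtained. One reading that reproduces it, assumed here and not confirmed by the source, is that every node branches into 10 × 1000 = 10^4 outcomes and that four of the five stages branch (the root stage is deterministic). The helper names are hypothetical.

```python
def scenarios_at_last_stage(outcomes_per_stage: int, num_stages: int) -> int:
    """Leaves of a scenario tree whose root is stage 1 and every node has the same branching."""
    return outcomes_per_stage ** (num_stages - 1)


def total_nodes(outcomes_per_stage: int, num_stages: int) -> int:
    """All nodes in the tree, i.e. the number of LPs a nested decomposition would hold."""
    return sum(outcomes_per_stage ** t for t in range(num_stages))


# Assumed reading of the slide: 10 variables x 1000 scenarios each = 10**4 outcomes per stage.
leaves = scenarios_at_last_stage(10 * 1000, 5)
```

Whatever the exact branching, the count grows exponentially in the number of stages, which is the point of the slide.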
How to deal with the computation?
Having a super efficient algorithm, being a super smart guy.
  Mom said: “You are not and will never be God”
Having a very fast computer with huge memory.
  I am not a millionaire
Having lots of machines working together by using smart methods.
  Grid Computing!!
The Computational Grid
It is like the national power grid
  Users can seamlessly draw computational power whenever they need it
Possible?
  A lot of computational resources on the Internet are wasted; they can be brought together to solve very large problems
Difficulties Security Interfaces Heterogeneity Dynamic Communication
Tools for Grid Computing
Condor http://www.cs.wisc.edu/condor
  Manages collections of “distributively owned” workstations. It is a pool.
  It is good at
    Solving lots of independent tasks like Monte Carlo methods
  It is not good at
    Parallel applications with non-trivial control structures like optimization algorithms
Tools for Grid Computing
MW http://www.cs.wisc.edu/condor/mw
  Master-Worker paradigm.
    Master assigns tasks to the workers
    Workers perform tasks and report results to the master
    Workers do not communicate with each other
MW - SMART
  A nested-decomposition code for multistage stochastic linear programming
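The master-worker paradigm can be illustrated with a few lines of Python. This is only a sketch of the pattern using the standard library's thread pool; it is not the MW API, and the task payload (summing squares) is a hypothetical stand-in for solving an LP.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def worker(task):
    """Hypothetical worker: performs one task in isolation and reports the result."""
    task_id, payload = task
    return task_id, sum(v * v for v in payload)


def master(tasks, num_workers=4):
    """Assign every task to the pool and collect results as workers report back.

    Workers never talk to each other; all coordination goes through the master.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        futures = [pool.submit(worker, task) for task in tasks]
        for future in as_completed(futures):
            task_id, value = future.result()
            results[task_id] = value
    return results


results = master([(0, [1, 2]), (1, [3]), (2, [])])
```

In a nested-decomposition code the master would also decide, from each returned result, which tasks become available next; here the task list is fixed up front.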
SMART
SmartMaster
  Sends out works (LPs); if there are no more works, the job is done!
SmartTask
  Keeps records, packs and unpacks tasks (works or results)
SmartWorker
  Accepts works, solves them, and reports results.
Controller
  Creates the scenario tree, records the node states, and tells SmartMaster what tasks to send next.
CutManager
  Keeps records of all cuts, stored by stage.
Tasks
Task type:
  Work – One or more nodes in the same period, with proposals from their parents.
  Result – Proposals for children or cuts for parents.
Direction:
  Forward – Given the proposals from the parents, find the proposals for the children, or a feasibility cut for the parents. (Not for the last stage.)
  Backward – Given the proposals and model values from the parents, find the optimality cuts for the parents, if there are any.
• A Forward result is stored in the task nodes themselves
• A Backward result is stored in the parent node
Node State
Color & Direction:
  Red (R) – Node n is blocked. (Nothing useful can be gained by evaluating this node)
  Yellow Forward (YF) – Node n is ready to be evaluated in a forward work
  Yellow Backward (YB) – Node n is ready to be evaluated in a backward work
  Green Forward (GF) – Node n is being evaluated in a forward work and is waiting for the forward result
  Green Backward (GB) – Node n is being evaluated in a backward work or is waiting for the backward result
Node State
Latest Task ID
  ID of the latest task that evaluates node n
Number of Children Reported
  How many children have finished their work and reported backward results
Number of Cuts Reported
  How many new cuts have been reported in backward results while the parent is waiting
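The node state described on these two slides maps naturally onto a small record. The sketch below is an assumption about how it could look, not the SMART data structure: the class, field, and method names are hypothetical, and only the colors and counters come from the slides.

```python
from dataclasses import dataclass
from enum import Enum


class Color(Enum):
    R = "red: blocked"
    YF = "yellow forward: ready for a forward work"
    YB = "yellow backward: ready for a backward work"
    GF = "green forward: waiting for the forward result"
    GB = "green backward: waiting for the backward result"


@dataclass
class NodeState:
    color: Color = Color.R
    latest_task_id: int = -1
    num_children: int = 0
    num_children_reported: int = 0
    num_cuts_reported: int = 0

    def dispatch(self, task_id: int, forward: bool) -> None:
        """Yellow -> green when the node is packed into a task; remember the task ID."""
        expected = Color.YF if forward else Color.YB
        if self.color is not expected:
            raise ValueError(f"node is {self.color.name}, not ready for this direction")
        self.color = Color.GF if forward else Color.GB
        self.latest_task_id = task_id

    def accepts_result(self, task_id: int) -> bool:
        """A result is stale if the node has since been put into a newer task."""
        return self.latest_task_id <= task_id


state = NodeState(color=Color.YF, num_children=3)
state.dispatch(task_id=7, forward=True)
```

Keeping the latest task ID on the node is what lets the controller ignore results from tasks that were overtaken by a newer proposal.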
[Animation: node states on the scenario tree as tasks T0, T1, … are sent out and returned. F = forward, B = backward.]
Root node is ready for a forward work.
Root node is evaluated in a forward work (T0).
T0 done! The root node solved the forward work and got x_0^1. Now it is waiting to hear from its children. Children are ready for forward works.
Nodes in stage 1 are evaluated in forward works (T1, T2).
T1 done
Going on …
What happens if T2 is infeasible?
  “I don’t like the proposal, give me another!”
T4 done! No matter what the result is, it cannot change the node states any more.
Start a new iteration
What happens if …
  T5 is done?
  T12 never returns?
  T14 is infeasible?
Controller RYG Method 1 RegisterTaskCompletion(Task t)
if t.way == Forward then
    for all nodes n in t.givenX do
        if n.ContainInfeasiblityCut then
            Store the cut in CutManager, store the cut index in nodeinfo[t-1,n]
            if n.color == GB && n.latestID ≤ t.ID then
                n.color ← YB (YF if t.stage == 2 //n is root)
            end if
            for all nodes m in Descendants(n) do
                m.color ← R
            end for
        else
            for all nodes m in t with the same parent do
                if m.color == GF && m.latestID ≤ t.ID then
                    Store proposals
                    m.color ← GB
                    S(m).color ← YF (YB if t.stage == T-1 //S(m) is at last stage)
                end if
            end for
        end if
        update n.numCutReported
    end for
Controller RYG Method 1 (Cont.) RegisterTaskCompletion(Task t)
else if t.way == Backward then
    for all nodes n in t.givenX do
        if n.ContainInfeasiblityCut then
            Store the cut in CutManager, store the cut index in nodeinfo[t-1,n]
            if n.color == GB && n.latestID ≤ t.ID then
                n.color ← YB (YF if t.stage == 2 //n is root)
            end if
            for all nodes m in Descendants(n) do
                m.color ← R
            end for
        else
            for all nodes m in t with the same parent do
                if m.color == GB && m.latestID ≤ t.ID then
                    m.color ← R
                    P(m).numChildReported ++
                end if
            end for
        end if
        update n.numCutReported
Controller RYG Method 1 (Cont.) RegisterTaskCompletion(Task t)
        if n.color == GB && n.latestID ≤ t.ID then
            if (n.numChildReported / n.numChildren ≥ 1 && n.numCutReported / n.numPossibleCut ≥ 2)
               || (n.numChildReported / n.numChildren == 1 && n.numCutReported ≥ 1) then
                n.color ← YB (YF if t.stage == 2 //n is root)
            else if n.numChildReported / n.numChildren == 1 && n.numCutReported == 0 then
                n.color ← YB (R if t.stage == 2 //n is root)
            end if
        end if
    end for
end if
Controller RYG Method 2 AnalyzeTreeStatus(Period t, LatestId ID)
if t == 0 then
    if all nodes n in tree . color == R then Algorithm Terminate
    else if root.color == YF then
        Create root task with cuts stored in root node, root.color ← GF
        Update root node.LatestID
    end if
else
    for all nodes n in t do
        if n.color == YF then push n into ReadyForForwardTaskList
        end if
    end for
    if ReadyForForwardTaskList.num > 3 then
        Create forward tasks, nodes.color ← GF
        Update latestID for nodes in tasks
    end if
    for all clusters i in t do
        if all node n in cluster i . color == YB then push i into ReadyForBackwardTaskList
        end if
    end for
    if ReadyForBackwardTaskList.num > 4 then
        Create backward tasks, nodes.color ← GB
        Update latestID for nodes
    end if
end if
Is the code working?
Solves small multistage SP test problems on a single processor.
Works fine when going from a single processor to multiple processors.
The parallel efficiency is terrible.
Testing in Beowulf (Parallel)
21:11:58 Number of (different) workers: 64
21:11:58 Wall clock time for this job: 291.6457
21:11:58 Total time workers were alive (up): 11564.4345
21:11:58 Total wall clock time of workers: 49.9986
21:11:58 Total cpu time used by all workers: 47.8942
21:11:58 Total time workers were suspended: 0.0000
21:11:58 Averaged benchmark factor : 0.0000
21:11:58 Equivalent benchmark factor : 0.0000
21:11:58 Minimum benchmark factor : 0.0000
21:11:58 Maximum benchmark factor : 0.0000
21:11:58 Average Number Present Workers : 39.6523
21:11:58 Average Number NonSuspended Workers : 39.6523
21:11:58 Average Number Active Workers : 0.1642
21:11:58 Equivalent Pool Performance : 0.0000
21:11:58 Equivalent Run Time : 0.0000
21:11:58 Overall Parallel Performance : 0.0043
21:11:58 Total Number of benchmark tasks : 0
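The summary figures in the log can be related to the raw times with three ratios. These ratios are an interpretation that happens to match the printed values to four decimals; they are not taken from MW's documentation, and the dictionary keys are hypothetical names for the logged quantities.

```python
# Times copied from the log above (units as printed by MW, presumably seconds).
log = {
    "job_wall_clock": 291.6457,   # Wall clock time for this job
    "workers_up": 11564.4345,     # Total time workers were alive (up)
    "workers_wall": 49.9986,      # Total wall clock time of workers
    "workers_cpu": 47.8942,       # Total cpu time used by all workers
}

# On average, how many workers were present during the run.
avg_present = log["workers_up"] / log["job_wall_clock"]

# On average, how many workers were actually computing.
avg_active = log["workers_cpu"] / log["job_wall_clock"]

# Fraction of the time workers were alive that they spent working on tasks.
parallel_performance = log["workers_wall"] / log["workers_up"]
```

Read this way, about 40 workers were present on average but fewer than one was busy at any moment, which is what "terrible" parallel efficiency looks like in numbers.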
What is the problem?
Tasks are not big enough; most of the time workers are communicating with the master rather than doing the actual computation.
The master is too busy and does not assign tasks smartly enough.
Current Work and Ideas
Aggregation of nodes
At some stage, the nodes start to swallow their descendants and become big “families” – large deterministic equivalents
  Makes the size of tasks reasonable
  Reduces the number of cuts necessary
Current Work and Ideas
Node/Task Sequencing
Speed up the algorithm
Solve the problem in two phases
  First, we randomly pick some paths (or sub-trees) through the scenario tree to get some useful cuts
  Then, do a Fast Forward Fast Backward
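The first phase, picking random paths through the scenario tree, could look like the sketch below. The tree representation (a dictionary from node to list of children), the uniform choice among children, and the function names are all assumptions made for illustration; the slides do not say how paths are drawn.

```python
import random


def sample_path(children, root, rng):
    """Walk from the root to a leaf, choosing one child uniformly at each node."""
    path = [root]
    while children.get(path[-1]):
        path.append(rng.choice(children[path[-1]]))
    return path


def sample_paths(children, root, count, seed=0):
    """Draw `count` root-to-leaf paths; the seed makes the draw repeatable."""
    rng = random.Random(seed)
    return [sample_path(children, root, rng) for _ in range(count)]


# Hypothetical 3-stage tree: node 0 is the root, nodes 3..6 are leaves.
tree = {0: [1, 2], 1: [3, 4], 2: [5, 6]}
paths = sample_paths(tree, root=0, count=5)
```

Each sampled path is a chain of single-child subproblems, so solving along it is cheap and still produces cuts for the nodes it visits.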
Current Work and Ideas
Intelligent tasking strategy
Decide a reasonable number of nodes (in a forward task) or clusters (in a backward task)
  Should be a dynamic strategy
  Things that might be considered in the strategy: the stage, the number of workers currently available, the number of nodes currently available
Current Work and Ideas
Intelligent clustering strategy
Decide a reasonable number of nodes in one cluster
  Currently we think of this as a static strategy
  Things that might be considered in the strategy: the stage, the number of children of the node
Current Work and Ideas
Regularization
Implement a trust region method
Probably only on the master problem for the first stage
Current Work and Ideas
Purging cuts
As the iterations go on, lots of cuts accumulate at each node, and many of them are unnecessary; this takes a lot of memory!
Purging all the cuts at one node will break the convergence of the algorithm unless the sub-problem under this node is solved to optimality
Maybe we can do this once in a while, after several iterations or when the number of cuts reaches a certain amount
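One way the purging trigger could be sketched is shown below. It is a hypothetical heuristic, not the author's method: cuts are pairs (a, b) meaning theta >= a + b*x for a scalar x, purging is triggered every `every` iterations or when more than `max_cuts` are stored, and only cuts that are slack at the current master solution are dropped. Dropping only slack cuts is an added assumption and does not by itself guarantee convergence.

```python
def purge_cuts(cuts, x, theta, iteration, every=10, max_cuts=100, tol=1e-9):
    """Return the cuts to keep at one node.

    cuts: list of (a, b) with theta >= a + b*x; (x, theta) is the current master solution.
    """
    due = (iteration > 0 and iteration % every == 0) or len(cuts) > max_cuts
    if not due:
        return list(cuts)
    # Keep only cuts that are binding (zero slack) at the current solution.
    return [(a, b) for a, b in cuts if theta - (a + b * x) <= tol]


cuts = [(0.0, 0.0), (12.0, -3.0), (9.9, -1.95)]
kept_early = purge_cuts(cuts, x=4.0, theta=2.1, iteration=3)
kept_due = purge_cuts(cuts, x=4.0, theta=2.1, iteration=10)
```

The two thresholds correspond to the two triggers mentioned on the slide; suitable values would have to be found by experiment.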
Current Work and Ideas
Making workers “experienced”
A big part of a work is the cuts generated before. It takes time to pack and unpack this information, and it also depends on the communication.
We are investigating whether the existing cuts can be stored locally at the workers.
In a new work, only updated cut information is included.
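The idea of shipping only updated cut information can be sketched with a ledger on the master and a cache on each worker. The class and method names are hypothetical, and the sketch assumes that workers persist between tasks and that cuts are only ever appended (no purging), so "what is new" is simply a suffix of the per-stage cut list.

```python
class MasterCutLedger:
    """Master side: all cuts by stage, plus how many each worker has already received."""

    def __init__(self):
        self.cuts = {}   # stage -> list of cuts
        self.sent = {}   # (worker_id, stage) -> number of cuts already shipped

    def add_cut(self, stage, cut):
        self.cuts.setdefault(stage, []).append(cut)

    def delta_for(self, worker_id, stage):
        """Cuts for this stage that the worker has not seen yet."""
        all_cuts = self.cuts.get(stage, [])
        start = self.sent.get((worker_id, stage), 0)
        self.sent[(worker_id, stage)] = len(all_cuts)
        return all_cuts[start:]


class WorkerCutCache:
    """Worker side: cuts kept locally between tasks."""

    def __init__(self):
        self.cuts = {}   # stage -> list of cuts

    def apply_update(self, stage, new_cuts):
        self.cuts.setdefault(stage, []).extend(new_cuts)


ledger, cache = MasterCutLedger(), WorkerCutCache()
ledger.add_cut(1, "cut-a")
ledger.add_cut(1, "cut-b")
cache.apply_update(1, ledger.delta_for("w1", 1))   # first work: both cuts are shipped
ledger.add_cut(1, "cut-c")
second_delta = ledger.delta_for("w1", 1)           # later work: only the new cut
cache.apply_update(1, second_delta)
```

On a pool where workers come and go, a newly arrived worker simply starts with an empty cache and receives the full list on its first task.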