
How to use your favorite MIP Solver: modeling, solving, cannibalizing

Andrea Lodi
University of Bologna, Italy

January-February, 2012 @ Universitat Wien

A. Lodi, How to use your favorite MIP Solver

Setting

• We consider a general Mixed Integer Program in the form:

$$\max\{c^T x : Ax \le b,\ x \ge 0,\ x_j \in \mathbb{Z}\ \ \forall j \in I\} \qquad (1)$$

where matrix A does not have a special structure.

• Thus, the problem is solved through branch-and-bound and the bounds are computed by

iteratively solving the LP relaxations through a general-purpose LP solver.

• The course mainly covers MIP, but whenever possible we will discuss how crucial the LP component (the engine) is, and how much the whole framework is built on top of the capability of solving LPs effectively.

• Roughly speaking, using the LP computation as a tool, MIP solvers integrate the

branch-and-bound and the cutting plane algorithms through variations of the general

branch-and-cut scheme [Padberg & Rinaldi 1987] developed in the context of the Traveling

Salesman Problem (TSP).
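The branch-and-bound scheme described in the bullets above can be sketched in a few lines. This is only a minimal illustration under a strong assumption: it handles a 0-1 knapsack (a single constraint), because the LP relaxation of that special case can be solved greedily without an LP library. It is not a general-purpose MIP solver, and the function names and instance data are hypothetical.

```python
from fractions import Fraction


def lp_bound(values, weights, capacity, fixed):
    """Solve the LP relaxation of a 0-1 knapsack with some variables fixed.

    Returns (bound, j) where j is the index of the fractional variable,
    (bound, None) if the LP optimum is integral, or (None, None) if the
    fixings are infeasible. Assumes positive integer weights.
    """
    cap = capacity - sum(weights[j] for j, v in fixed.items() if v == 1)
    if cap < 0:
        return None, None
    bound = Fraction(sum(values[j] for j, v in fixed.items() if v == 1))
    free = [j for j in range(len(values)) if j not in fixed]
    # Greedy by value/weight ratio is optimal for the knapsack LP relaxation.
    free.sort(key=lambda j: Fraction(values[j], weights[j]), reverse=True)
    for j in free:
        if cap == 0:
            break
        if weights[j] <= cap:
            cap -= weights[j]
            bound += values[j]
        else:
            # Item j only fits partially: x_j is the fractional variable.
            return bound + Fraction(values[j] * cap, weights[j]), j
    return bound, None


def branch_and_bound(values, weights, capacity):
    """Depth-first branch-and-bound; returns (optimal value, nodes processed)."""
    best, nodes = Fraction(0), 0      # the empty knapsack is always feasible
    stack = [{}]                      # each node is a dict of fixed variables
    while stack:
        fixed = stack.pop()
        nodes += 1
        bound, frac = lp_bound(values, weights, capacity, fixed)
        if bound is None or bound <= best:
            continue                  # prune: infeasible, or bound no better
        if frac is None:
            best = bound              # LP optimum is integral: new incumbent
            continue
        stack.append({**fixed, frac: 0})   # branch x_frac = 0
        stack.append({**fixed, frac: 1})   # branch x_frac = 1
    return best, nodes


# Hypothetical instance, chosen only for illustration.
VALUES, WEIGHTS, CAPACITY = [10, 13, 7, 8], [3, 4, 2, 3], 7
root_bound, _ = lp_bound(VALUES, WEIGHTS, CAPACITY, {})
optimum, nodes_processed = branch_and_bound(VALUES, WEIGHTS, CAPACITY)
print(float(root_bound), int(optimum), nodes_processed)
```

The root LP bound is strictly above the integer optimum on this instance, which is exactly the gap that branching (and, in a real solver, cutting planes) has to close.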


Outline

1. The building blocks of a MIP solver.

We will run through the first 50 exciting years of MIP by showing some crucial milestones, and we will highlight the building blocks that make today's solvers effective from both a performance and an application viewpoint.

2. How to use a MIP solver as a sophisticated (heuristic) framework.

Today's MIP solvers should not be regarded as black-box exact tools. In fact, they provide countless options for their smart use as hybrid algorithmic frameworks, which can be especially interesting in applied contexts. We will review some of those options and possible hybridizations, including some real-world applications.



3. Modeling and algorithmic tips to make a solver effective in practice.

The capability of a solver to produce good, potentially optimal, solutions depends on the

selection of the right model and the use of the right algorithmic tools the solver provides. We

will discuss useful tips, from simple to sophisticated, which allow a smart use of a MIP solver.

Finally, we will show that this is NOT the end of the story and many challenges for MIP

technology are still to be faced.


PART 3

1. The building blocks of a MIP solver

2. How to use a MIP solver as a sophisticated (heuristic) framework

3. Modeling and algorithmic tips to make a solver effective in practice


• Outline:

– Solving difficult MIPs [heavily based on Klotz & Newman 2011, PART-II-2.pdf]

1. Lack of node throughput because of LP solving

2. Lack of progress in the best (mixed-)integer solution

3. Lack of progress in the best bound

4. Lack of node throughput because of numerical instability

– MIP Challenges


Solving difficult MIPs: four common reasons

• There are four common reasons why integer programs can require a significant amount of solution time.

1. There is a lack of node throughput due to troublesome linear programming node solves.

2. There is a lack of progress in the best integer solution, i.e., the lower bound.

3. There is a lack of progress in the best upper bound.

4. There is insufficient node throughput due to numerical instability in the problem data or excessive memory usage.


Lack of node throughput because of LP solving

Alternate optimal bases can result in different branching variable selections. Different branching selections, in turn, can cause significant performance variation if the model formulation or optimizer features are not sufficiently robust to consistently solve the model quickly. This notion of performance variability in integer programs is discussed in more detail in Danna (2008). However, regardless of whether an integer program is consistently or only occasionally difficult to solve, the guidelines described in this section can help address the performance problem. We now discuss each potential performance bottleneck and suggest an associated remedy.

3.1 Lack of Node Throughput Due to Troublesome Linear Programming Node Solves

Because processing each node in the branch-and-bound tree requires the solution of a linear program, the choice of a linear programming algorithm can profoundly influence performance. An interior point method may be used for the root node solve; it is less frequently used than the simplex method at the child nodes because it lacks a basis and hence the ability to start with an initial solution, an important ability when processing tens or hundreds of thousands of nodes. However, conducting different runs in which the practitioner invokes, alternately, the primal or the dual simplex method at the child nodes is a good idea. Consider the following two node logs, the former corresponding to solving the root and child node linear programs with the dual simplex method and the latter with the primal simplex method.

Node Log #1: Node Linear Programs Solved with Dual Simplex

        Nodes                                            Cuts/
   Node   Left     Objective  IInf  Best Integer     Best Node    ItCnt
      0      0      -89.0000     6                    -89.0000     5278
      0      0      -89.0000     6                    Fract: 4    12799
      0      2      -89.0000     6                    -89.0000    12799
      1      1    infeasible                          -89.0000    20767
      2      2      -89.0000     5                    -89.0000    27275
      3      1    infeasible                          -89.0000    32502
...
      8      2      -89.0000     8                    -89.0000    65717
      9      1    infeasible                          -89.0000    73714
...
Solution time = 177.33 sec. Iterations = 73714 Nodes = 10 (1)


Lack of node throughput because of LP solving (cont.d)

Node Log #2: Node Linear Programs Solved with Primal Simplex

        Nodes                                            Cuts/
   Node   Left     Objective  IInf  Best Integer     Best Node    ItCnt
      0      0      -89.0000     5                    -89.0000     6603
      0      0      -89.0000     5                    Fract: 5     7120
      0      2      -89.0000     5                    -89.0000     7120
      1      1    infeasible                          -89.0000     9621
      2      2      -89.0000     5                    -89.0000    10616
      3      1    infeasible                          -89.0000    12963
...
      8      2      -89.0000     8                    -89.0000    21522
      9      1    infeasible                          -89.0000    23891
...
Solution time = 54.37 sec. Iterations = 23891 Nodes = 10 (1)

The iteration count for the root node solve shown in Node Log #1, which occurred without any advanced start information, is 5,278. Computing the average iteration count across all node LP solves, there are 11 solves (10 nodes, and 1 extra solve for cut generation at node 0) and 73,714 iterations, which were performed in a total of 177 seconds. The summary output indicates in parentheses that one unexplored node remains. So, the average solution time per solve is approximately 16 seconds, and the average number of iterations per solve is about 6,701. In Node Log #2, the solution time is 54 seconds, at which point the algorithm has performed 11 solves, and the iteration count is 23,891. Hence, the average number of iterations per solve is about 2,172. Thus, in Node Log #1, the 10 child node LPs require more iterations, 6,844 on average, than the root node LP (which requires 5,278), despite the advanced basis at the child node solves that was absent at the root node solve. Any time this is true, or even when the average node LP iteration count is more than 30-50% of the root node iteration count, an opportunity for improving node LP solve times exists by changing algorithms or algorithmic settings. In Node Log #2, the 10 child node LPs require 1,729 iterations on average, far fewer than the 6,603 required by the root node LP.


Lack of node throughput because of LP solving (cont.d)

• Summary of the Dual Simplex output:

– 5,278 iterations without any advanced start information,

– overall, 73,714 iterations and 177 seconds for 11 solves, i.e., approximately 6,701 iterations and 16 seconds per solve.

• Summary of the Primal Simplex output:

– 6,603 iterations without any advanced start information,

– overall, 23,891 iterations and 54 seconds for 11 solves, i.e., approximately 2,172 iterations and 5 seconds per solve.

• Thus, with the dual Simplex, any solve subsequent to the first one requires on average 6,844 = (73,714 - 5,278)/10 iterations, which is higher than the 5,278 iterations required for the first solve!

• Any time this is true, or even when the average node LP iteration count is more than 30-50% of the root node iteration count, an opportunity for improving node LP solve times exists by changing algorithms or algorithmic settings.

• Indeed, in the primal Simplex case, any additional solve requires on average only 1,729 iterations, which is much smaller than the 6,603 of the first solve.
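The arithmetic behind this diagnosis is easy to automate. The sketch below recomputes the averages from the figures printed in the two node logs; the function name and the returned dictionary keys are hypothetical, and the 30-50% threshold is the rule of thumb quoted above, not a solver parameter.

```python
def node_lp_averages(root_iterations, total_iterations, total_seconds, num_solves):
    """Summarize LP effort from the summary line of a branch-and-bound log."""
    child_solves = num_solves - 1
    child_avg = (total_iterations - root_iterations) / child_solves
    return {
        "iters_per_solve": total_iterations / num_solves,
        "seconds_per_solve": total_seconds / num_solves,
        "child_iters_per_solve": child_avg,
        # Ratio of the average child solve to the root solve; values above
        # roughly 0.3-0.5 suggest trying another LP algorithm at the nodes.
        "child_to_root_ratio": child_avg / root_iterations,
    }


# Figures taken from Node Log #1 (dual simplex) and Node Log #2 (primal simplex).
dual = node_lp_averages(5278, 73714, 177.33, 11)
primal = node_lp_averages(6603, 23891, 54.37, 11)

for name, stats in (("dual", dual), ("primal", primal)):
    print(
        name,
        round(stats["iters_per_solve"]),
        round(stats["child_iters_per_solve"]),
        f"{stats['child_to_root_ratio']:.2f}",
    )
```

On these logs the ratio is above 1 for the dual simplex and below 0.3 for the primal simplex, which matches the conclusion drawn in the bullets.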


Lack of progress in the best (mixed-)integer solution

... (or even all) of the penalty variables set to nonzero values.

• Solve a related, auxiliary problem to get a solution (e.g., the Feasopt method in CPLEX, which looks for feasible solutions by minimizing infeasibilities), provided that the gain from the starting solution exceeds the auxiliary solve time.

• Use the solution from a previous solve for the next solve when solving a sequence of models.

To see the advantages of providing a starting point, compare Node Log #5 with Node Log #4. Log #4 shows that CPLEX with default settings takes about 1589 seconds to find a first feasible solution, with an associated gap of 4.18%. Log #5 illustrates the results obtained by solving a sequence of 5 faster optimizations (see Lambert et al. (2011) for details) to obtain a starting solution with a gap of 2.23%. The total computation time to obtain the starting solution was 623 seconds. So, a first solution is obtained faster by providing an initial feasible solution, and if we let the algorithm with the initial solution run for an additional 1589 − 623 = 966 seconds, the gap for the instance with the initial solution improves to 1.53%.

Node Log #4: No initial practitioner-supplied solution

Root relaxation solution time = 131.45 sec.

        Nodes                                            Cuts/
   Node   Left     Objective  IInf  Best Integer     Best Node    ItCnt     Gap
      0      0   1.09590e+07  2424                 1.09590e+07   108111
      0      0   1.09570e+07  2531                     Cuts: 4   108510
      0      0   1.09405e+07  2476                     Cuts: 2   109208
Heuristic still looking.
      0      2   1.09405e+07  2476                 1.09405e+07   109208
Elapsed real time = 384.09 sec. (tree size = 0.01 MB)
      1      3   1.08913e+07  2488                 1.09405e+07   109673
      2      4   1.09261e+07  2326                 1.09405e+07   109977
...
   1776   1208   1.05645e+07    27                 1.09164e+07   474242
   1814   1246   1.05588e+07    31                 1.09164e+07   478648
   1847   1277   1.05554e+07   225                 1.09164e+07   484687
*  1880+  1300                       1.04780e+07   1.09164e+07   491469   4.18%
   1880   1302   1.05474e+07   228   1.04780e+07   1.09164e+07   491469   4.18%

Node Log #5: An initial solution supplied by the practitioner

        Nodes                                            Cuts/
   Node   Left     Objective  IInf  Best Integer     Best Node    ItCnt     Gap
*     0+     0                       1.07197e+07                 108111     ---
      0      0   1.09590e+07  2424   1.07197e+07   1.09590e+07   108111   2.23%
      0      0   1.09570e+07  2531   1.07197e+07       Cuts: 4   108538   2.21%
...
    485    433   1.09075e+07  2398   1.07197e+07   1.08840e+07   244077   1.53%
    487    434   1.08237e+07  2303   1.07197e+07   1.08840e+07   244350   1.53%
    497    439   1.08637e+07  1638   1.07197e+07   1.08840e+07   245391   1.53%
    501    443   1.08503e+07  1561   1.07197e+07   1.08840e+07   245895   1.53%
...
   1263    674   1.08590e+07  2574   1.07197e+07   1.08840e+07   314814   1.53%

In the absence of a readily identifiable initial solution, various branching strategies can aid in obtaining initial and subsequent solutions. These branching strategies may be purely based on the ...




Lack of progress in the best bound

P1 represents the convex hull of all integer feasible solutions of the MIP, while P2 represents the feasible region of the LP relaxation. Adding cuts yields the region P3, which contains all integer solutions of the MIP, but contains only a subset of the fractional solutions feasible for P2.

$$P_1 := \operatorname{conv}\{x \in \mathbb{Z}^n : Ax \le b,\ x \ge 0\}$$
$$P_2 := \{x \in \mathbb{R}^n : Ax \le b,\ x \ge 0\}$$
$$P_3 := P_2 \cap \{x \in \mathbb{R}^n : a_i^T x \le b_i,\ i = 1, 2, 3\}$$

Cuts must satisfy

1) $a_i^T x \le b_i \quad \forall x \in P_1$ (validity)

2) $\exists\, x \in P_2 : a_i^T x > b_i$ (separation)

Figure 2: Convex hull (nested regions P1, P3, P2, with the cuts $a_1^T x \le b_1$, $a_2^T x \le b_2$, $a_3^T x \le b_3$ bounding P3)

Node Log #6 exemplifies progress in the best integer solution but not in the best bound:

Node Log #6: Progress in Best Integer Solution but not in the Best Bound

        Nodes                                            Cuts/
   Node   Left     Objective  IInf  Best Integer     Best Node    ItCnt     Gap
    300    296     2018.0000    27     3780.0000      560.0000     3703  85.19%
*   300+   296                   0     2626.0000      560.0000     3703  78.67%
*   393    368                   0     2590.0000      560.0000     4405  78.38%
    400    372      560.0000   291     2590.0000      560.0000     4553  78.38%
    500    472      810.0000   175     2590.0000      560.0000     5747  78.38%
...
*  7740+  5183                   0     1710.0000      560.0000    66026  67.25%
   7800   5240     1544.0000   110     1710.0000      560.0000    66279  67.25%
   7900   5325      944.0000   176     1710.0000      560.0000    66801  67.25%
   8000   5424     1468.0000    93     1710.0000      560.0000    67732  67.25%
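The Gap column in these logs can be reproduced from the Best Integer and Best Node columns. The sketch below assumes the usual relative-gap convention, |best bound − best integer| / (1e-10 + |best integer|); the function name is hypothetical, and the formula is stated here as an assumption that happens to match every gap printed in Node Logs #4 to #6.

```python
def relative_mip_gap(best_integer, best_bound):
    """Relative gap between the incumbent value and the best bound.

    Assumed convention: |bound - incumbent| / (1e-10 + |incumbent|).
    The small constant only guards against division by zero.
    """
    return abs(best_bound - best_integer) / (1e-10 + abs(best_integer))


# (best integer, best node) pairs read off Node Log #6 (a minimization model).
for incumbent, bound in [(3780.0, 560.0), (2626.0, 560.0), (1710.0, 560.0)]:
    print(f"{incumbent:9.1f} {bound:8.1f} {100 * relative_mip_gap(incumbent, bound):6.2f}%")
```

In Node Log #6 the gap shrinks only because the incumbent improves; the bound stays at 560, so the denominator changes while the best node does not move.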


Lack of progress in the best bound (cont.d)

• Most solvers offer parameter settings that can help improve progress of the best node or tighten the formulation of the model by moving the value of the linear programming relaxation closer to that of the optimal integer objective. Especially,

– Best-bound-first node selection.

– Strong branching.

– Probing.

– More aggressive levels of cut generation.

• If those mechanisms fail, then the user must carefully look at the model and

– Change the model formulation by using alternate variable definitions.

– Revisit the use of elastic/indicator variables, i.e., those relaxing a constraint by allowing for violations (penalized in the objective function).

– Look at additional cutting planes, either within the standard families that might not have been discovered by the solver or specific to the model at hand.



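• Let us consider again cut generation and the following small MIP:

$$
\begin{aligned}
\max\quad & 3x_1 + 2x_2 + x_3 + 2x_4 + x_5 && (2)\\
\text{subject to}\quad & x_1 + x_2 \le 1 && (3)\\
& x_1 + x_3 \le 1 && (4)\\
& x_2 + x_3 \le 1 && (5)\\
& 4x_3 + 3x_4 + 5x_5 \le 10 && (6)\\
& x_1 + 3x_4 \le 2 && (7)\\
& 3x_2 + 4x_5 \le 5 && (8)\\
& x \in \{0, 1\}^5 && (9)
\end{aligned}
$$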
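The validity and separation conditions for a cut can be checked by hand on a model this small. The sketch below enumerates the binary points of (2)-(9) and tests one candidate cut, the clique inequality x1 + x2 + x3 ≤ 1 obtained by combining (3)-(5). That particular cut and the fractional test point are choices made here for illustration, not taken from the slides.

```python
from fractions import Fraction
from itertools import product

# Constraints (3)-(8) as (coefficients, right-hand side) pairs.
CONSTRAINTS = [
    ((1, 1, 0, 0, 0), 1),
    ((1, 0, 1, 0, 0), 1),
    ((0, 1, 1, 0, 0), 1),
    ((0, 0, 4, 3, 5), 10),
    ((1, 0, 0, 3, 0), 2),
    ((0, 3, 0, 0, 4), 5),
]
OBJECTIVE = (3, 2, 1, 2, 1)


def dot(a, x):
    return sum(ai * xi for ai, xi in zip(a, x))


def lp_feasible(x):
    """True if x satisfies the bounds 0 <= x <= 1 and constraints (3)-(8)."""
    return all(0 <= xi <= 1 for xi in x) and all(dot(a, x) <= b for a, b in CONSTRAINTS)


# All integer feasible points and the optimal one, by brute force.
integer_points = [x for x in product((0, 1), repeat=5) if lp_feasible(x)]
best_point = max(integer_points, key=lambda x: dot(OBJECTIVE, x))

# Candidate cut (an assumption of this sketch): x1 + x2 + x3 <= 1.
CUT = ((1, 1, 1, 0, 0), 1)

# Validity: every integer feasible point satisfies the cut.
cut_is_valid = all(dot(CUT[0], x) <= CUT[1] for x in integer_points)

# Separation: some point of the LP relaxation violates the cut.
half = Fraction(1, 2)
x_frac = (half, half, half, half, Fraction(7, 8))
cut_separates = lp_feasible(x_frac) and dot(CUT[0], x_frac) > CUT[1]

print(best_point, dot(OBJECTIVE, best_point), cut_is_valid, cut_separates)
```

Enumeration is only feasible because the model has five binaries; it is meant to make the two conditions concrete, not to suggest a separation procedure.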



• Adding cuts does not always help branch-and-bound performance.

• While cuts can remove integer infeasibilities, they also result in more constraints in each node LP.

• More constraints can make these LPs harder to solve. Without a commensurate speed-up in solution time associated with processing fewer nodes, cuts may not be worth adding.

• Some solvers have internal logic to automatically assess the trade-offs between adding cuts and node LP solve time. However, if the solver lacks such logic or fails to make a good decision, the user may need to look at the branch-and-bound output.

• In other cases, the computational effort required to derive the cuts needed to effectively solve the model may exceed the performance benefit they provide.


Lack of node throughput because of numerical instability

• Because the solver solves LPs at each node of the branch-and-bound tree, the practitioner must be careful to avoid LP numerical performance issues (see Section 3 of Klotz and Newman, reading material PART-II-1.pdf).

• Especially important is avoiding, when possible, large differences in orders of magnitude in the data, to preclude the introduction of unnecessary round-off error. Such differences in input values create round-off error in floating point calculations, which makes it difficult for the algorithm to distinguish between this error and a legitimate value [Koch et al. 2011].

• Elastic/indicator variables (referred to as "big-M's") correspond to logic expressions such as "if z = 0, then x = 0", which are imposed through the use of arbitrarily large coefficients:

$$x - 100000000000\,z \le 0 \qquad (10)$$

$$0 \le x \le 5000; \quad z \ \text{binary} \qquad (11)$$

• However, the coefficient $10^{11}$ can be safely and effectively replaced by 5000, which prevents a solution with $z = 10^{-8}$ and $x = 1000$ from being feasible.
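The effect of the big-M coefficient can be checked numerically. The sketch below evaluates constraint (10) at the point mentioned in the last bullet, once with the coefficient 10^11 and once with 5000. The function name and the feasibility tolerance of 1e-6 are assumptions made for illustration; they are not settings of any particular solver.

```python
def bigm_satisfied(x, z, big_m, feas_tol=1e-6):
    """True if x - big_m * z <= 0 holds up to the feasibility tolerance."""
    return x - big_m * z <= feas_tol


# A value of z this small is typically accepted as "integer" (z = 0)
# by a solver's integrality tolerance.
z_almost_zero = 1e-8

# With M = 1e11 the "switched off" variable x can still take the value 1000.
loose_allows_x = bigm_satisfied(1000.0, z_almost_zero, 1e11)

# With M = 5000 (the upper bound of x) the same point is cut off.
tight_allows_x = bigm_satisfied(1000.0, z_almost_zero, 5000.0)

# Largest x that the tight formulation admits for this z.
largest_x_tight = 5000.0 * z_almost_zero

print(loose_allows_x, tight_allows_x, largest_x_tight)
```

Choosing M equal to the tightest known bound on x keeps the leak through a near-zero z negligible and also avoids mixing coefficients of very different magnitude in the same row.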


Tightening the formulation

• As anticipated, sometimes the user might be required to add problem-specific cutting planes. However, before doing that, it is often useful to identify the elements of the model that make it difficult, specifically, those that contain the constraints and variables from which useful cuts can be derived.

– Simplify the model if necessary.
For example, try to identify any constraints or integrality restrictions that are not involved in the slow performance by systematically removing constraints and restrictions and solving the resulting model.

– Identify the constraints that prevent the objective from improving.
With a maximization problem, this typically means identifying the constraints that force prizes not to be gained.

– Determine how removing integrality restrictions allows the root node relaxation to improve.
In weak formulations, the root node relaxation objective tends to be significantly better than the optimal objective of the associated MIP. The variables with fractional solutions in the root node relaxation help identify the constraints and variables that motivate additional cuts.


Tightening the formulation (cont.d)

• Model characteristics from which to derive cuts are

– Linear or logical combinations of constraints.
As discussed, combining constraints is the basis of current cut generation techniques. Knowledge of the problem at hand can suggest which constraints should be combined.

– The optimization of one or more related models.
Extracting a small(er) instance with the same characteristics as the problem at hand to experiment with is often very instructive.

– Use of the incumbent solution objective value.
Template cuts are based on detecting infeasibilities, while optimality cuts might lead in some special cases to effective partitions of the solution space.

– Disjunctions.

– The exploitation of infeasibility.
Infeasibility considerations on the model might allow useless pieces of the search tree to be removed.


Tightening the formulation, Example 1

• The very small MIP

$$13429x_1 + 26850x_2 + 26855x_3 + 40280x_4 + 40281x_5 + 53711x_6 + 53714x_7 + 67141x_8 = 45094583 \qquad (12)$$

$$x_j \ge 0,\ \text{integer},\quad j = 1, \dots, 8 \qquad (13)$$

presents the following (disappointing) computational behavior:


Tightening the formulation, Example 1 (cont.d)

Running CPLEX 12.2.0.2 with default settings results in no conclusion after over 7 hours and 2 billion nodes, as illustrated in Node Log #7:

Node Log #7

           Nodes                                         Cuts/
        Node    Left   Objective  IInf  Best Integer  Best Node     ItCnt   Gap
...
  2054970910   13066      0.0000     1                   0.0000  25234328
Elapsed real time = 27702.98 sec. (tree size = 2.70 MB, solutions = 0)
  2067491472   14446      0.0000     1                   0.0000  25388082
  2080023238   12892      0.0000     1                   0.0000  25542160
  2092548561   15366      0.0000     1                   0.0000  25696280
...
-------
Total (root+branch&cut) = 28302.29 sec.

MIP - Node limit exceeded, no integer solution.
Current MIP best bound = 0.0000000000e+00 (gap is infinite)
Solution time = 28302.31 sec. Iterations = 25787898 Nodes = 2100000004 (16642)

However, note that all the coefficients in the model are very close to integer multiples of the coefficient of x1. Therefore, we can separate the left hand side into the part that is an integer multiple of this coefficient, and the much smaller remainder terms:

$$13429\,\underbrace{(x_1 + 2x_2 + 2x_3 + 3x_4 + 3x_5 + 4x_6 + 4x_7 + 5x_8)}_{x} \qquad (19)$$

$$-\,8x_2 - 3x_3 - 7x_4 - 6x_5 - 5x_6 - 2x_7 - 4x_8 \qquad (20)$$

$$= 3358 \cdot 13429 + 1 = 3359 \cdot 13429 - 13428 \qquad (21)$$

This constraint resembles the one from which we previously derived the mixed integer rounding cut. But, instead of separating the integer and fractional components, we separate the components that are exact multiples of the coefficient of x1 from the remaining terms.


Tightening the formulation, Example 1 (cont.d)

• The behavior is then improved by the addition of the two cuts

$$x_1 + 2x_2 + 2x_3 + 3x_4 + 3x_5 + 4x_6 + 4x_7 + 5x_8 \ge 3359 \qquad (14)$$

$$8x_2 + 3x_3 + 7x_4 + 6x_5 + 5x_6 + 2x_7 + 4x_8 \ge 13428 \qquad (15)$$

We now perform the disjunction on x in an analogous manner, again using the nonnegativity of the variables.

$$x \le 3358 \ \Rightarrow\ \underbrace{-8x_2 - 3x_3 - 7x_4 - 6x_5 - 5x_6 - 2x_7 - 4x_8}_{\le 0} \ \ge 1 \qquad (22)$$

Thus, if x ≤ 3358, the model is infeasible. Therefore, infeasibility implies that x ≥ 3359 is a valid cut. We can derive an additional cut from the other side of the disjunction on x:

$$x \ge 3359 \ \Rightarrow\ -8x_2 - 3x_3 - 7x_4 - 6x_5 - 5x_6 - 2x_7 - 4x_8 \le -13428 \qquad (23)$$

This analysis shows that we either have an infeasible model, or that constraints (24) (using the infeasibility argument above) and (25) (multiplying (23) through by -1) are globally valid cuts.

$$x_1 + 2x_2 + 2x_3 + 3x_4 + 3x_5 + 4x_6 + 4x_7 + 5x_8 \ge 3359 \qquad (24)$$

$$8x_2 + 3x_3 + 7x_4 + 6x_5 + 5x_6 + 2x_7 + 4x_8 \ge 13428 \qquad (25)$$

Adding these cuts enables CPLEX 12.2.0.2 to easily identify that the model is infeasible (see Node Log #8).

Node Log #8

        Nodes                                            Cuts/
   Node   Left     Objective  IInf  Best Integer     Best Node    ItCnt     Gap
      0      0        0.0000     1                      0.0000        1
      0      0        0.0000     2                  MIRcuts: 1        3
      0      0        0.0000     2                  MIRcuts: 1        5
      0      0        cutoff                                          5

Mixed integer rounding cuts applied: 1
...
MIP - Integer infeasible.
Current MIP best bound is infinite.
Solution time = 0.46 sec. Iterations = 5 Nodes = 0
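The decomposition used in this derivation is plain integer arithmetic and can be verified directly. The sketch below recomputes, for each coefficient of the equality constraint, the nearest multiple of 13429 and the remainder, and checks the two ways of writing the right-hand side. The variable names are hypothetical; the numbers are those of the example.

```python
A1 = 13429                      # coefficient of x1
COEFFS = [13429, 26850, 26855, 40280, 40281, 53711, 53714, 67141]
RHS = 45094583

# Nearest integer multiple of A1 for each coefficient (integer rounding).
multiples = [(c + A1 // 2) // A1 for c in COEFFS]

# Amount by which each coefficient falls short of that multiple.
remainders = [m * A1 - c for m, c in zip(multiples, COEFFS)]

# Each coefficient is recovered as multiple * A1 - remainder.
recombined = [m * A1 - r for m, r in zip(multiples, remainders)]

print(multiples)    # coefficients of the aggregated term multiplied by 13429
print(remainders)   # coefficients of the small remainder term
print(RHS == 3358 * A1 + 1, RHS == 3359 * A1 - 13428)
```

The first list gives the left-hand side of the first cut and the second list gives the left-hand side of the second cut, so the whole derivation rests on the two identities for the right-hand side.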


Tightening the formulation, Example 2

• The following Mixed Integer Quadratic Program

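$$\max \ \sum_{i=1}^{n} \sum_{j=i+1}^{n} d_{ij}\, x_i x_j \qquad (16)$$

$$\text{subject to} \quad \sum_{j=1}^{n} x_j \le k \qquad (17)$$

$$x \in \{0, 1\}^n \qquad (18)$$

can be classically reformulated as a MIP by introducing binary variables $z_{ij} = x_i x_j$ and the constraints

$$z_{ij} \le x_i \quad \forall i, j \qquad (19)$$

$$z_{ij} \le x_j \quad \forall i, j \qquad (20)$$

$$x_i + x_j \le 1 + z_{ij} \quad \forall i, j. \qquad (21)$$

• The performance of the CPLEX solver is as follows.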


Tightening the formulation, Example 2 (cont.d)

... while (30) forces $z_{ij}$ to 1. So, regardless of the values of $x_i$ and $x_j$, $z_{ij} = x_i x_j$, and we can replace occurrences of $x_i x_j$ with $z_{ij}$ to obtain the linearized reformulation above.

Using this linearized model with n = 60 and k = 24, Node Log #9 gives the results. The instance of the model has 1830 binary variables and 5311 constraints; CPLEX processes over 4 million nodes before running out of memory after about 4 hours. This level of performance indicates significant potential for improvement. Due to the large size of the branch-and-bound tree, we set CPLEX's file parameter to instruct CPLEX to efficiently swap the memory associated with the branch-and-bound tree to disk. This enables the run to proceed further than with default settings, in which CPLEX stores the tree in physical memory. All other parameter settings remain at defaults, so CPLEX makes use of all four available processors. CPLEX runs for just over four hours, terminating when the size of the swap file for the branch-and-bound tree exceeds memory limits. At that point the solution has an objective value of 3483.0000, proven to be within 51.32% of optimal. Although we do not provide the output here, the original MIQP formulation in (MIQP) performs even worse.

Node Log #9

         Nodes                                            Cuts/
     Node     Left     Objective  IInf  Best Integer     Best Node     ItCnt      Gap
*       0+       0                            0.0000                    2247      ---
        0        0     7640.4000  1830        0.0000     7640.4000      2247      ---
*       0+       0                           19.0000     7640.4000      2247      ---
...
*       0+       0                         3185.0000     7445.4286      2286  133.77%
        0        2     7628.5333  1829     3185.0000     7445.4286      2286  133.77%
       35       37     6579.2308  1378     3185.0000     7445.4286      6615  133.77%
...
  4332613  3675298     4936.6750  1099     3483.0000     5270.8377  1.78e+08   51.33%
  4341075  3682375     3889.4643   714     3483.0000     5270.4545  1.79e+08   51.32%
...
CPLEX Error 1803: Failure on temporary file write.

Solution pool: 25 solutions saved.

MIP - Error termination, no tree: Objective = 3.4830000000e+03
Current MIP best bound = 5.2704102564e+03 (gap = 1787.41, 51.32%)
Solution time = 15031.18 sec. Iterations = 178699476 Nodes = 4342299 (3682262)

Experimentation with non-default parameter settings as described in Section 3 yields modest performance improvements, but does not come close to enabling CPLEX to find an optimal solution to the model.

We carefully examine a smaller model instance with n = 3 and k = 2 to assess how removing integrality restrictions yields an artificially high objective function value:

$$
\begin{aligned}
\max\quad & 3z_{12} + 4z_{13} + 5z_{23}\\
\text{subject to}\quad & x_1 + x_2 + x_3 \le 2\\
& z_{12} - x_1 \le 0\\
& z_{12} - x_2 \le 0\\
& x_1 + x_2 \le 1 + z_{12}\\
& z_{13} - x_1 \le 0\\
& z_{13} - x_3 \le 0\\
& x_1 + x_3 \le 1 + z_{13}\\
& z_{23} - x_2 \le 0\\
& z_{23} - x_3 \le 0\\
& x_2 + x_3 \le 1 + z_{23}\\
& x_1, x_2, x_3, z_{12}, z_{13}, z_{23}\ \text{binary}
\end{aligned}
$$

The optimal solution to this MILP consists of setting $z_{23} = x_2 = x_3 = 1$, yielding an objective value of 5. By contrast, relaxing integrality enables a fractional solution consisting of setting all x and z variables to 2/3, yielding a much better objective value of 8. Note that the difference ...


Tightening the formulation, Example 2, improved

... $x_1 = x_2 = \dots = x_k = 1$, and $x_{k+1} = \dots = x_n = 0$. From (32), $z_{ij} = 1$ if and only if $1 \le i \le k$, $1 \le j \le k$, and $i < j$. We can therefore count the number of z variables that equal 1 when $x_1 = x_2 = \dots = x_k = 1$. Specifically, there are $k(k-1)$ pairs $(i, j)$ with $i \ne j$, but only half of them have $i < j$. So, at most $k(k-1)/2$ of the $z_{ij}$ variables can be set to 1 when k of the x variables are set to 1. In other words,

$$\sum_{i=1}^{n} \sum_{j=i+1}^{n} z_{ij} \le k(k-1)/2$$

is a globally valid cut.
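The small instance with n = 3 and k = 2 is a convenient place to check this cut. The sketch below, written under the assumption that the linearized constraints are exactly the ones listed for that instance, enumerates all binary assignments, confirms the integer optimum, and tests the all-2/3 fractional point against the cut with right-hand side k(k − 1)/2 = 1. Function and variable names are hypothetical.

```python
from fractions import Fraction
from itertools import product

K = 2
D = {(0, 1): 3, (0, 2): 4, (1, 2): 5}    # objective weights d_ij, i < j
PAIRS = sorted(D)


def feasible(x, z):
    """Cardinality constraint plus the three linearization rows per pair."""
    if sum(x) > K:
        return False
    return all(
        z[p] <= x[p[0]] and z[p] <= x[p[1]] and x[p[0]] + x[p[1]] <= 1 + z[p]
        for p in PAIRS
    )


def objective(z):
    return sum(D[p] * z[p] for p in PAIRS)


def satisfies_cut(z):
    return sum(z.values()) <= Fraction(K * (K - 1), 2)


# Brute force over all binary assignments of x and z.
integer_solutions = [
    (xs, dict(zip(PAIRS, zs)))
    for xs in product((0, 1), repeat=3)
    for zs in product((0, 1), repeat=3)
    if feasible(xs, dict(zip(PAIRS, zs)))
]
milp_optimum = max(objective(z) for _, z in integer_solutions)
products_match = all(z[p] == x[p[0]] * x[p[1]] for x, z in integer_solutions for p in PAIRS)
cut_valid = all(satisfies_cut(z) for _, z in integer_solutions)

# Fractional point of the LP relaxation: every variable equal to 2/3.
two_thirds = Fraction(2, 3)
x_frac = (two_thirds,) * 3
z_frac = {p: two_thirds for p in PAIRS}
lp_value = objective(z_frac)
cut_separates = feasible(x_frac, z_frac) and not satisfies_cut(z_frac)

print(milp_optimum, lp_value, products_match, cut_valid, cut_separates)
```

The fractional point has objective 8 against an integer optimum of 5, and it is exactly the kind of point the cut removes.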

Adding this cut to the instance with n = 60 and k = 24 enables CPLEX to solve the model to optimality in just over 2 hours and 30 minutes on the same machine using identical settings as the previous run without the cut. (See Node Log #10.) Note that the cut tightened the formulation significantly, as can be seen by the much better root node objective value of 4552.4000, which compares favorably to the root node objective value of 7640.4000 on the instance without the cut. Furthermore, the cut enabled CPLEX to add numerous zero-half cuts to the model that it could not add with the original formulation. The zero-half cuts resulted in additional progress in the best node value that was essential to solving the model to optimality in a reasonable amount of time.

Node Log #10

        Nodes                                            Cuts/
   Node   Left     Objective  IInf  Best Integer     Best Node     ItCnt     Gap
*     0+     0                            0.0000                    1161     ---
      0      0     4552.4000   750        0.0000     4552.4000      1161     ---
*     0+     0                            6.0000     4552.4000      1161     ---
...
*     0+     0                         3477.0000     3924.7459     37882  12.88%
      0      2     3924.7459  1281     3477.0000     3924.7459     37882  12.88%
Elapsed real time = 51.42 sec. (tree size = 0.01 MB, solutions = 31)
      1      3     3919.3378  1212     3477.0000     3924.7459     39886  12.88%
      2      4     3910.8201  1243     3477.0000     3924.7459     42289  12.88%
      3      5     3910.8041  1144     3477.0000     3919.3355     44070  12.72%
...
 125571   7819        cutoff           3590.0000     3599.7046  60456851   0.27%
Nodefile size = 196.38 MB (168.88 MB after compression)
*126172   7231      integral     0     3591.0000     3599.7046  60571398   0.24%
 127700   5225        cutoff           3591.0000     3598.0159  60769494   0.20%
 131688      6        cutoff           3591.0000     3592.5939  60980430   0.04%

Zero-half cuts applied: 2244

Solution pool: 44 solutions saved.

MIP - Integer optimal solution: Objective = 3.5910000000e+03

Given the modest size of the model, a run time of 2.5 hours to optimality suggests potential for additional improvements in the formulation. However, by adding one globally valid cut, we see a dramatic performance improvement nonetheless. Furthermore, the derivation of this cut draws heavily on the guidelines proposed for tightening the formulation. By using a small instance of the model, we can easily identify how removal of integrality restrictions enables the objective to improve. Furthermore, we use infeasibility to derive the cut: by recognizing that the simplified MILP model is infeasible when z12 + z13 + z23 ≥ 2, we show that z12 + z13 + z23 ≤ 1 is a valid cut.

5 Conclusion

Today’s hardware and software allow practitioners to formulate and solve increasingly large and detailed models. However, optimizers have become less straightforward, often providing many methods for implementing their algorithms to enhance performance given various mathematical structures. Additionally, the literature regarding methods to increase the tractability of mixed integer linear programming problems contains a high degree of theoretical sophistication. Both of these facts might lead a practitioner to conclude that developing the skills necessary to successfully solve difficult mixed integer programs is too time consuming or difficult. This paper attempts to refute that perception, illustrating that practitioners can implement many techniques for improving performance without expert knowledge in the underlying theory of integer programming, thereby enabling them to solve larger and more detailed models with existing technology.



MIP Challenges

• Overall, a big challenge from both the performance and the modeling viewpoints is accuracy. This is an old issue that has become very important now that MIP solvers can really solve the problems.

The MIPLIB 2010 paper [Koch et al. 2011] includes, for the first time, scripts to run automated tests in a predefined way, and a solution checker to test the accuracy of provided solutions using exact arithmetic.

• Some difficult MIPs are encountered because of:

– bad modeling, i.e.,

∗ the model has numerical difficulties,

∗ the MIP modeling capability is not sufficient with respect to the real problem;

– large size;

– knapsack constraints with huge coefficients and general-integer variables with large bounds;

– scheduling components with disjunctive constraints and fundamental continuous variables.
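The idea of checking a solution in exact arithmetic, mentioned above for MIPLIB 2010, can be sketched with Python's rational numbers. This is not the MIPLIB checker itself: it is a minimal illustration for a model in the form (1), max{cTx : Ax ≤ b, x ≥ 0, xj ∈ Z for j ∈ I}, and the function name and input conventions are assumptions made here.

```python
from fractions import Fraction


def check_solution(A, b, x, integer_idx):
    """Return the list of violations of Ax <= b, x >= 0, x_j integer (j in I).

    All data are converted to exact rationals, so no tolerance is involved.
    Pass decimal values as strings (e.g. "2.5") to avoid binary rounding.
    """
    xs = [Fraction(v) for v in x]
    violations = []
    for r, (row, rhs) in enumerate(zip(A, b)):
        lhs = sum(Fraction(a) * v for a, v in zip(row, xs))
        if lhs > Fraction(rhs):
            violations.append(f"row {r}: {lhs} > {rhs}")
    for j, v in enumerate(xs):
        if v < 0:
            violations.append(f"x[{j}] = {v} is negative")
        if j in integer_idx and v.denominator != 1:
            violations.append(f"x[{j}] = {v} is not integer")
    return violations


if __name__ == "__main__":
    A = [[1, 1], [2, -1]]
    b = [4, 3]
    print(check_solution(A, b, [2, 1], integer_idx={0}))
    print(check_solution(A, b, ["2.5", "1.5"], integer_idx={0}))
```

A floating-point checker would need a feasibility tolerance at each comparison; with rationals, a solution is either exactly feasible or it is reported as violated.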


MIP Challenges, performance

• The performance of MIP solvers can/must be improved in many different directions.


Among them, my favorite ones are:

– branching vs cutting

– sophisticated techniques for general-integer and continuous variables

– performance variability

– revisiting good “old” methods

– cutting plane exploitation

– symmetric MIPs


MIP Challenges: branching vs cutting

[Figure: the fractional point x∗ with the hyperplanes αTx = α0 and αTx = α0 + 1. Panel “first wisdom”: the points x∗,1 and x∗,2 together with the hyperplane βTx = β0. Panel “second wisdom”: x∗ with the hyperplane βTx = β0 and the hyperplanes αTx = α0 and αTx = α0 + 1.]
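The two hyperplanes αTx = α0 and αTx = α0 + 1 in the figure can be read as the boundaries of a split disjunction αTx ≤ α0 or αTx ≥ α0 + 1; this reading is an assumption, since the slide shows only the labels. Under that assumption, and with α integer and supported on integer variables, the following sketch computes α0 for a given point x∗ and checks whether x∗ is excluded by both sides. The function name is hypothetical.

```python
import math
from fractions import Fraction


def split_disjunction(alpha, x_star):
    """Return (alpha0, excluded) for the disjunction
    alpha^T x <= alpha0  or  alpha^T x >= alpha0 + 1.

    alpha0 is floor(alpha^T x*); `excluded` is True when x* lies strictly
    between the two hyperplanes, i.e. it violates both sides.
    """
    value = sum(Fraction(a) * Fraction(v) for a, v in zip(alpha, x_star))
    alpha0 = math.floor(value)
    excluded = alpha0 < value < alpha0 + 1
    return alpha0, excluded


if __name__ == "__main__":
    # Branching on a single variable is the special case alpha = unit vector.
    print(split_disjunction([1, 0], ["0.5", "1"]))
    # A general disjunction on the sum of two variables.
    print(split_disjunction([1, 1], ["0.5", "1"]))
    # This direction does not separate the point: alpha^T x* is integer.
    print(split_disjunction([1, -1], ["0.5", "0.5"]))
```

The same inequality αTx ≤ α0 or αTx ≥ α0 + 1 can serve either as a branching rule or as the disjunction from which a cut is derived, which is the link between branching and cutting that the slide points at.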


MIP Challenges, branching vs cutting (cont.d)

• The previous slide highlights a possibility of using traditional cutting plane theory in the branching context [Karamanov & Cornuejols 2005, 2010].

• It seems that a better coordination of these two fundamental ingredients of MIP solvers is crucial for strong improvements.

• In the context of hard knapsack constraints, branching on variables is not effective, while (pure) basis reduction methods have proven to be very powerful [Eisenbrand; Aardal; Pataki; . . . ].

• On the other hand, a tight integration of basis reduction techniques within MIP solvers has not yet been achieved. One possibility for such an integration is the use of partial reformulations, but an intriguing option is exploiting these reformulations to generate cuts in the original space of variables [Aardal & Wolsey 2009].

• Branching on appropriate disjunctions has recently been proposed in the context of highly symmetric MIPs [Ostrowski, Linderoth, Rossi & Smriglio 2009].

• Finally, the use of bilevel programming for computing strong multiple disjunctions (i.e., disjunctions involving more than 2 children) has recently been shown to be effective for special 0-1 MIPs [Lodi, Ralphs, Rossi & Smriglio 2011].
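Why branching on variables can fail on a knapsack-type constraint, while branching on a suitable disjunction succeeds, can be seen on a toy instance that is not taken from the slides: the infeasible problem 2(x1 + · · · + xn) = n with x binary and n odd. The sketch below counts branch-and-bound nodes without an LP solver, using the fact that a node with some variables fixed is LP-feasible exactly when (number fixed to 1) ≤ n/2 ≤ (number fixed to 1) + (number free). Function names are hypothetical.

```python
from functools import lru_cache


def variable_branching_nodes(n: int) -> int:
    """Nodes explored when branching on single variables for
    2 * (x_1 + ... + x_n) = n, x binary, n odd (an infeasible instance)."""
    if n % 2 == 0:
        raise ValueError("n must be odd")

    @lru_cache(maxsize=None)
    def count(free: int, ones: int) -> int:
        # LP relaxation of the node: `ones` variables fixed to 1,
        # `free` variables in [0, 1], all others fixed to 0.
        if 2 * ones > n or 2 * (ones + free) < n:
            return 1  # LP infeasible: the node is pruned
        # The free variables must sum to a half-integer, so one of them is
        # fractional; branch on it (fix it to 0 or to 1).
        return 1 + count(free - 1, ones) + count(free - 1, ones + 1)

    return count(n, 0)


def split_branching_nodes(n: int) -> int:
    """Nodes explored when branching once on
    sum(x) <= floor(n/2)  or  sum(x) >= floor(n/2) + 1."""
    if n % 2 == 0:
        raise ValueError("n must be odd")
    low = n // 2
    left_feasible = 2 * low >= n          # needs sum(x) = n/2 <= low
    right_feasible = 2 * (low + 1) <= n   # needs sum(x) = n/2 >= low + 1
    if left_feasible or right_feasible:
        raise AssertionError("unexpected LP-feasible child")
    return 3  # root plus two immediately infeasible children


if __name__ == "__main__":
    for n in (3, 5, 11, 21):
        print(n, variable_branching_nodes(n), split_branching_nodes(n))
```

In this toy case the node count of variable branching grows quickly with n, while one general disjunction closes the tree at once; basis reduction methods can be seen as a systematic way of finding such directions.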


MIP Challenges, performance (cont.d)

• A very important class of MIPs is 0/1 IPs. Many of the sophisticated techniques already discussed were originally proposed for this class and eventually extended to general MIPs.

• For example, branching on variables is particularly natural and effective in the 0/1 case, while it is not when general-integer variables play a central role.

• Another example is given by the models in which continuous variables are important: for those variables MIP solvers do not do much (heuristics, strengthening, . . . ).

• An (urgent) MIP challenge is definitely dealing with general-integer and continuous variables with special-purpose techniques.

• Cutting plane generation has been a key step for the success of MIP solvers but: are we using cuts in the best way? By far not!

• Fundamental questions about the use of cutting planes remain open, among which:

– stabilization/saturation issues,

– cut selection,

– cut interaction.



• The already discussed performance variability (some good/neutral features that might not be monotonically helpful, or, worse, can deteriorate performance) [Koch et al. 2011] is due to imperfect tie-breaking but is also related to the interaction of key ingredients of MIP.

• This is the case of finding a (near-)optimal solution very early in the search tree: it directly improves the quality of the primal bound but might sometimes hurt in proving optimality (or at least does not help).

• A deeper understanding through sophisticated testing techniques is needed [Hooker; McGeoch; Margot].

• The negative example suggests an additional very crucial question: besides preventing good primal solutions from hurting the optimality proof, how can we use them to obtain a strong speed-up instead?

• Good “old” methods have been rediscovered and revisited over the years, Gomory Mixed Integer cuts being the most notable example. Recently:

– strong Benders cutting planes [Fischetti, Salvagnin & Zanette 2009];

– cutting plane use with the lexicographic simplex [Zanette, Fischetti & Balas 2010];

– cutting planes from group relaxation [Gomory; Richard; Dey; Wolsey; Dash & Gunluk; . . . ].
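As a reminder of what a Gomory Mixed Integer (GMI) cut looks like, here is a minimal sketch using the textbook formula, which the slides do not spell out. It assumes a simplex tableau row x_B + Σ a_j x_j = b whose basic variable x_B is integer-constrained, with b fractional and all nonbasic variables nonnegative and at their lower bound 0. With f0 = frac(b) and f_j = frac(a_j), the cut is Σ g_j x_j ≥ f0, where for integer x_j the coefficient g_j is f_j if f_j ≤ f0 and f0(1 − f_j)/(1 − f0) otherwise, and for continuous x_j it is a_j if a_j ≥ 0 and −a_j f0/(1 − f0) otherwise. The function names are hypothetical.

```python
import math
from fractions import Fraction


def frac(value: Fraction) -> Fraction:
    """Fractional part in [0, 1), also for negative values."""
    return value - math.floor(value)


def gmi_cut(int_coefs, cont_coefs, rhs):
    """Coefficients of the GMI cut  sum(g_j * x_j) >= f0  from a tableau row.

    int_coefs:  row coefficients of the integer nonbasic variables
    cont_coefs: row coefficients of the continuous nonbasic variables
    rhs:        right-hand side b of the row (must be fractional)
    """
    f0 = frac(Fraction(rhs))
    if f0 == 0:
        raise ValueError("the row has an integer right-hand side: no cut")
    ratio = f0 / (1 - f0)
    int_cut = []
    for a in int_coefs:
        f = frac(Fraction(a))
        int_cut.append(f if f <= f0 else ratio * (1 - f))
    cont_cut = []
    for a in cont_coefs:
        a = Fraction(a)
        cont_cut.append(a if a >= 0 else -ratio * a)
    return int_cut, cont_cut, f0


if __name__ == "__main__":
    # Row: x_B + 0.25 y1 + 1.75 y2 + 2 s1 - 3 s2 = 2.5 (y integer, s continuous)
    print(gmi_cut(["0.25", "1.75"], [2, -3], "2.5"))
```

Exact rationals are used only to keep the example free of rounding noise; a solver works in floating point and has to guard the same formula against numerical trouble, which connects to the accuracy issue raised earlier.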


MIP Challenges, the modeling viewpoint

• Besides developing additional tools in the spirit of the ones described before (among all the possible ones, I would like a tool for detecting minimal sources of numerical instability), the main challenge from a modeling/application viewpoint seems to be dissemination.

• More precisely, an interesting direction would be extending the modeling (and solving) capability of the MIP framework.

• Two success stories in this direction are:

1. SCIP (Solving Constraint Integer Programs [Achterberg 2007]), whose main feature is a tight integration of Constraint Programming (CP) and SATisfiability techniques within a MIP solver.

2. Bonmin (Basic Open-source Nonlinear Mixed INteger programming [Bonami et al. 2008]), which has been developed for convex MINLP within the framework of the MIP solver Cbc [Forrest].


MIP Modeling, CP and SCIP

• SCIP can handle arbitrary (non-linear) constraints in a Constraint Programming fashion.

– A global constraint defines combinatorially a portion of the feasible region, i.e., it is able to check the feasibility of an assignment of values to variables.

– Moreover, a global constraint contains an algorithm that prunes (filters) values from the variable domains so as to reduce the search space as much as possible.

• In other words, a higher-level modeling layer has been added, of which MIP is just one of the options, so as to allow a beneficial interaction among different modeling and solving technologies.

• This is especially effective for those applications, like some classes of scheduling problems, in which none of those technologies, in isolation, would outperform the others [Heinz & Beck 2012].
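The two roles of a global constraint described above, checking an assignment and filtering domains, can be illustrated with a generic all-different constraint. This is a plain Python sketch of the Constraint Programming idea, not SCIP's constraint handler interface; the class and method names are hypothetical, and the filtering is deliberately weak (it only propagates fixed values).

```python
class AllDifferent:
    """All variables in `variables` must take pairwise different values."""

    def __init__(self, variables):
        self.variables = list(variables)

    def check(self, assignment) -> bool:
        """Feasibility check of a complete assignment {variable: value}."""
        values = [assignment[v] for v in self.variables]
        return len(set(values)) == len(values)

    def filter(self, domains) -> bool:
        """Prune `domains` ({variable: set of values}) in place.

        Whenever a variable is fixed (singleton domain), its value is removed
        from the other domains; this repeats until nothing changes.
        Returns False if some domain becomes empty (the node is infeasible).
        """
        changed = True
        while changed:
            changed = False
            for v in self.variables:
                if len(domains[v]) != 1:
                    continue
                (value,) = domains[v]
                for w in self.variables:
                    if w != v and value in domains[w]:
                        domains[w].discard(value)
                        changed = True
                        if not domains[w]:
                            return False
        return True


if __name__ == "__main__":
    constraint = AllDifferent(["a", "b", "c"])
    domains = {"a": {1}, "b": {1, 2}, "c": {1, 2, 3}}
    print(constraint.filter(domains), domains)
    print(constraint.check({"a": 1, "b": 2, "c": 3}))
```

In a framework that mixes CP and MIP, the filtered domains translate into tighter variable bounds for the LP relaxation, which is one form of the interaction among technologies mentioned above.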


MIP Modeling, Convex (and Non-Convex) MINLPs and Bonmin

• A network design example in water distribution, instance fossolo.

• The model does not have special difficulties besides the so-called Hazen-Williams equation modeling pressure loss in water pipes. However, such an equation is very “bad” . . .

• A classical MIP model from the 80’s linearizes such an equation BUT Cplex does not find any feasible solution for fossolo in 2 days of CPU time (!!), while Bonmin finds a very accurate one in a few seconds. Using the diameters computed by Bonmin, the MIP does not certify the solution to be feasible even allowing 1,000 linearization points.
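To give a feeling for what linearizing the Hazen-Williams equation involves, the sketch below measures the worst-case error of a piecewise-linear interpolation of the head loss as a function of the flow. Everything specific here is an assumption: the slides do not give the formula, so the commonly quoted SI form h = 10.67 · L · Q^1.852 / (C^1.852 · d^4.8704) is used, the pipe data are made up, and the classical MIP model may linearize the equation differently.

```python
def head_loss(flow, length, diameter, roughness):
    """Hazen-Williams head loss (commonly quoted SI form, an assumption)."""
    return 10.67 * length * flow ** 1.852 / (
        roughness ** 1.852 * diameter ** 4.8704
    )


def max_linearization_error(f, lo, hi, n_points, samples_per_segment=20):
    """Largest sampled gap between f and its piecewise-linear interpolant
    built on n_points equally spaced breakpoints in [lo, hi]."""
    step = (hi - lo) / (n_points - 1)
    worst = 0.0
    for s in range(n_points - 1):
        left = lo + s * step
        f_left, f_right = f(left), f(left + step)
        for t in range(1, samples_per_segment):
            lam = t / samples_per_segment
            chord = (1 - lam) * f_left + lam * f_right
            worst = max(worst, abs(chord - f(left + lam * step)))
    return worst


if __name__ == "__main__":
    # Hypothetical pipe: 100 m long, 0.2 m diameter, roughness coefficient 100.
    def pipe(q):
        return head_loss(q, length=100.0, diameter=0.2, roughness=100.0)

    for n in (10, 100, 1000):
        print(n, max_linearization_error(pipe, 0.0, 0.05, n))
```

The error shrinks as breakpoints are added but never reaches zero, so a solution that satisfies the nonlinear equation exactly can still fall outside the tolerances of the linearized model; this is consistent with, though not a proof of, the behavior reported for fossolo.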


End of the course: concluding remarks

• We have seen

Finally, we discussed some MIP Challenges.

• In summary, MIP technology provides, through its commercial and noncommercial solvers, a challenging, reliable, flexible and effective environment for application-oriented optimization.

1. challenging: a lot of good theoretical, methodological and experimental work is needed;

2. reliable: the software tools are stable;

3. flexible: it is open to hybridization, cannibalization, extensions;

4. effective: problems that were considered impossible only a few years ago can regularly be solved nowadays.

• All of the above look like solid reasons for developing the skills for using (and, why not, improving on) the MIP technology.

