
Mixed Integer Nonlinear Programming (MINLP)

Sven Leyffer

MCS Division, Argonne National Laboratory, leyffer@mcs.anl.gov

Jeff Linderoth

ISE Department, Lehigh University, jtl3@lehigh.edu


Overview

1 Introduction, Applications, and Formulations

2 Classical Solution Methods

3 Modern Developments in MINLP

4 Implementation and Software


Part I

Introduction, Applications, and Formulations


The Problem of the Day

Mixed Integer Nonlinear Program (MINLP)

    minimize_{x,y}  f(x, y)
    subject to      c(x, y) ≤ 0
                    x ∈ X, y ∈ Y integer

f, c smooth (convex) functions

X, Y polyhedral sets, e.g. Y = {y ∈ [0, 1]^p | Ay ≤ b}

y ∈ Y integer ⇒ hard problem

f, c not convex ⇒ very hard problem

Why the MI?

We can use 0-1 (binary) variables for a variety of purposes:
• Modeling yes/no decisions
• Enforcing disjunctions
• Enforcing logical conditions
• Modeling fixed costs
• Modeling piecewise linear functions

If the variable is associated with a physical entity that is indivisible, then it must be integer.

Number of aircraft carriers to produce: Gomory's initial motivation.

A Popular MINLP Method

Dantzig's Two-Phase Method for MINLP, Adapted by Leyffer and Linderoth

1 Convince the user that he or she does not wish to solve a mixed integer nonlinear programming problem at all!

2 Otherwise, solve the continuous relaxation (NLP) and round off the minimizer to the nearest integer.

For 0-1 problems, or those in which |y| is "small", the continuous approximation to the discrete decision is not accurate enough for practical purposes.

Conclusion: MINLP methods must be studied!

Example: Core Reload Operation (Quist, A.J., 2000)

max. reactor efficiency after reload
subject to diffusion PDE & safety

diffusion PDE ≈ nonlinear equation ⇒ integer & nonlinear model

avoid the reactor becoming sub-critical or overheated

look for cycles for moving bundles: e.g. 4 → 6 → 8 → 10, i.e. a bundle is moved from node 4 to node 6, ...

model with binaries x_ilm ∈ {0, 1}: x_ilm = 1 ⇔ node i has bundle l of cycle m

AMPL Model of Core Reload Operation

Exactly one bundle per node:

    Σ_{l=1}^{L} Σ_{m=1}^{M} x_ilm = 1   ∀i ∈ I

AMPL model:

    var x {I,L,M} binary;
    subject to Bundle {i in I}: sum {l in L, m in M} x[i,l,m] = 1;

Multiple choice: one of the most common uses of IP.

Full AMPL model c-reload.mod at www.mcs.anl.gov/~leyffer/MacMINLP/

Gas Transmission Problem (De Wolf and Smeers, 2000)

Belgium has no gas!
• All natural gas is imported from Norway, Holland, or Algeria.
• Supply gas to all demand points in a network in a minimum-cost fashion.
• Gas is pumped through the network with a series of compressors.
• There are constraints on the pressure of the gas within the pipe.

Pressure Loss is Nonlinear

Assume horizontal pipes and steady-state flows.

The pressure loss across a pipe is related to the flow rate f as

    p_in² − p_out² = (1/Ψ) sign(f) f²

Ψ: "friction factor"

Gas Transmission: Problem Input

• Network (N, A), A = A_p ∪ A_a
  A_a: active arcs have a compressor; the flow rate can increase on the arc
  A_p: passive arcs simply conserve the flow rate
• N_s ⊆ N: set of supply nodes
• c_i, i ∈ N_s: purchase cost of gas
• s_i^L, s_i^U: lower and upper bounds on gas "supply" at node i
• p_i^L, p_i^U: lower and upper bounds on gas pressure at node i
• s_i, i ∈ N: supply at node i
  s_i > 0 ⇒ gas added to the network at node i
  s_i < 0 ⇒ gas removed from the network at node i to meet demand
• f_ij, (i, j) ∈ A: flow along arc (i, j)
  f_ij > 0 ⇒ gas flows i → j
  f_ij < 0 ⇒ gas flows j → i

Gas Transmission Model

    min  Σ_{j∈N_s} c_j s_j
    subject to
         Σ_{j:(i,j)∈A} f_ij = s_i                     ∀i ∈ N
         sign(f_ij) f_ij² − Ψ_ij (p_i² − p_j²) = 0    ∀(i,j) ∈ A_p
         sign(f_ij) f_ij² − Ψ_ij (p_i² − p_j²) ≥ 0    ∀(i,j) ∈ A_a
         s_i ∈ [s_i^L, s_i^U]                          ∀i ∈ N
         p_i ∈ [p_i^L, p_i^U]                          ∀i ∈ N
         f_ij ≥ 0                                      ∀(i,j) ∈ A_a

Your First Modeling Trick

Don't include nonlinearities or nonconvexities unless necessary!

Replace p_i² ← ρ_i:

    sign(f_ij) f_ij² − Ψ_ij (ρ_i − ρ_j) = 0   ∀(i,j) ∈ A_p
    f_ij² − Ψ_ij (ρ_i − ρ_j) ≥ 0              ∀(i,j) ∈ A_a
    ρ_i ∈ [(p_i^L)², (p_i^U)²]                ∀i ∈ N

This trick only works because
1 the p_i² terms appear only in the bound constraints,
2 and f_ij ≥ 0 ∀(i,j) ∈ A_a.

This model is still nonconvex: sign(f_ij) f_ij² is a nonconvex function, and some solvers do not like sign.

Dealing with sign(·): The NLP Way

Use auxiliary binary variables to indicate the direction of flow. Let |f_ij| ≤ F ∀(i,j) ∈ A_p and define

    z_ij = 1 if f_ij ≥ 0:  enforced by f_ij ≥ −F(1 − z_ij)
    z_ij = 0 if f_ij ≤ 0:  enforced by f_ij ≤ F z_ij

Note that sign(f_ij) = 2 z_ij − 1, so write the constraint as

    (2 z_ij − 1) f_ij² − Ψ_ij (ρ_i − ρ_j) = 0.

(A small numeric check of this trick follows.)
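
A minimal numeric check of the indicator trick above (not from the slides): once z matches the flow direction, as the two big-M inequalities force it to, the term (2z − 1) f² reproduces sign(f) f².

    # Check (2z - 1) * f**2 == sign(f) * f**2 when z matches the sign of f.
    import math

    def sign_term(f):
        return math.copysign(1.0, f) * f**2 if f != 0 else 0.0

    def indicator_term(f, z):
        return (2 * z - 1) * f**2

    for f in [-3.0, -0.5, 0.0, 0.5, 3.0]:
        z = 1 if f >= 0 else 0   # the value the big-M constraints enforce
        assert abs(indicator_term(f, z) - sign_term(f)) < 1e-12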

Special Ordered Sets

Sven thinks this 'NLP trick' is pretty cool.

It is not how it is done in De Wolf and Smeers (2000): a heuristic for finding a good starting solution, then a local optimization approach based on a piecewise-linear simplex method.

Another (similar) approach involves approximating the nonlinear function by piecewise-linear segments, but searching for the globally optimal solution: Special Ordered Sets of Type 2.

If the "multidimensional" nonlinearity cannot be removed, resort to Special Ordered Sets of Type 3.

Portfolio Management

N: universe of assets to purchase
x_i: amount of asset i to hold
B: budget

    min_{x ∈ R_+^{|N|}}  { u(x) | Σ_{i∈N} x_i = B }

Markowitz: u(x) := −α^T x + λ x^T Q x
• α: expected returns
• Q: variance-covariance matrix of expected returns
• λ: risk-aversion parameter
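
A tiny numeric sketch of the Markowitz utility above, assuming numpy and made-up data for α, Q, and λ:

    import numpy as np

    alpha = np.array([0.10, 0.08, 0.05])     # expected returns (assumed data)
    Q = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.03, 0.01],
                  [0.00, 0.01, 0.02]])       # covariance matrix (assumed data)
    lam = 3.0                                # risk-aversion parameter

    def u(x):
        # u(x) = -alpha^T x + lambda * x^T Q x
        return -alpha @ x + lam * (x @ Q @ x)

    x = np.array([0.5, 0.3, 0.2])            # a budget-feasible allocation (B = 1)
    print(u(x))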

More Realistic Models

b ∈ R^{|N|}: vector of "benchmark" holdings

Benchmark tracking: u(x) := (x − b)^T Q (x − b)

Constraint on E[return]: α^T x ≥ r

Limit names: |{i ∈ N : x_i > 0}| ≤ K
• use binary indicator variables to model the implication x_i > 0 ⇒ y_i = 1
• the implication is modeled with variable upper bounds:

    x_i ≤ B y_i   ∀i ∈ N
    Σ_{i∈N} y_i ≤ K

Even More Models

Min holdings: (x_i = 0) ∨ (x_i ≥ m)
• model the implication x_i > 0 ⇒ x_i ≥ m via x_i > 0 ⇒ y_i = 1 ⇒ x_i ≥ m:

    x_i ≤ B y_i,  x_i ≥ m y_i   ∀i ∈ N

Round lots: x_i ∈ {k L_i, k = 1, 2, ...}:

    x_i − z_i L_i = 0,  z_i ∈ Z_+   ∀i ∈ N

Vector h of initial holdings:
• transactions: t_i = |x_i − h_i|
• turnover: Σ_{i∈N} t_i ≤ ∆
• transaction costs: Σ_{i∈N} c_i t_i in the objective
• market impact: Σ_{i∈N} γ_i t_i² in the objective

Multiproduct Batch Plants (Kocis and Grossmann, 1988)

M: batch processing stages
N: different products
H: horizon time
Q_i: required quantity of product i
t_ij: processing time of product i at stage j
S_ij: "size factor" of product i at stage j
B_i: batch size of product i ∈ N
V_j: stage-j size: V_j ≥ S_ij B_i ∀i, j
N_j: number of machines at stage j
C_i: longest stage time for product i: C_i ≥ t_ij / N_j ∀i, j

Multiproduct Batch Plants

    min  Σ_{j∈M} α_j N_j V_j^{β_j}
    s.t. V_j − S_ij B_i ≥ 0          ∀i ∈ N, ∀j ∈ M
         C_i N_j ≥ t_ij              ∀i ∈ N, ∀j ∈ M
         Σ_{i∈N} (Q_i / B_i) C_i ≤ H
         bound constraints on V_j, C_i, B_i, N_j
         N_j ∈ Z                     ∀j ∈ M

Modeling Trick #2

Horizon-time constraint and objective function nonconvex. :-(

Sometimes variable transformations work! Substitute

    v_j = ln(V_j),  n_j = ln(N_j),  b_i = ln(B_i),  c_i = ln(C_i)

to obtain (a numeric check of the substitution follows)

    min  Σ_{j∈M} α_j exp(n_j + β_j v_j)
    s.t. v_j − b_i ≥ ln(S_ij)            ∀i ∈ N, ∀j ∈ M
         c_i + n_j ≥ ln(t_ij)            ∀i ∈ N, ∀j ∈ M
         Σ_{i∈N} Q_i exp(c_i − b_i) ≤ H
         (transformed) bound constraints on v_j, c_i, b_i

which is convex for α_j, Q_i > 0.
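
A quick sanity check of the log transformation above (assuming numpy): V_j ≥ S_ij B_i holds exactly when v_j − b_i ≥ ln(S_ij) with v = ln V, b = ln B.

    import numpy as np

    rng = np.random.default_rng(0)
    for _ in range(1000):
        S, V, B = rng.uniform(0.5, 2.0, size=3)
        original = V >= S * B
        logged = np.log(V) - np.log(B) >= np.log(S)
        assert original == logged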

How to Handle the Integrality?

But what to do about the integrality?

    1 ≤ N_j ≤ N_j^U,  N_j ∈ Z   ∀j ∈ M
    ⇒ n_j ∈ {0, ln(2), ln(3), ...}

Introduce binaries Y_kj = 1 if n_j takes the value ln(k), and 0 otherwise:

    n_j − Σ_{k=1}^{K} ln(k) Y_kj = 0   ∀j ∈ M
    Σ_{k=1}^{K} Y_kj = 1               ∀j ∈ M

This model is available at http://www-unix.mcs.anl.gov/~leyffer/macminlp/problems/batch.mod
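
A small sketch of this discrete-log representation (not from the slides): pick a machine count k, set the corresponding Y, and recover n_j = ln(k).

    import math

    K = 4
    k_choice = 3                                   # assumed value of N_j
    Y = [1 if k == k_choice else 0 for k in range(1, K + 1)]
    n_j = sum(math.log(k) * Y[k - 1] for k in range(1, K + 1))
    assert sum(Y) == 1 and abs(n_j - math.log(k_choice)) < 1e-12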

A Small Smattering of Other Applications

Chemical engineering applications:
• process synthesis (Kocis and Grossmann, 1988)
• batch plant design (Grossmann and Sargent, 1979)
• cyclic scheduling (Jain and Grossmann, 1998)
• design of distillation columns (Viswanathan and Grossmann, 1993)
• pump configuration optimization (Westerlund, Pettersson and Grossmann, 1994)

Forestry/paper:
• production (Westerlund, Isaksson and Harjunkoski, 1995)
• trim-loss minimization (Harjunkoski, Westerlund, Porn and Skrifvars, 1998)

Topology optimization (Sigmund, 2001)

Part II

Classical Solution Methods


Classical Solution Methods for MINLP

1 Classical Branch-and-Bound

2 Outer Approximation & Benders Decomposition

3 Hybrid Methods
  • LP/NLP Based Branch-and-Bound
  • Integrating SQP with Branch-and-Bound

Branch-and-Bound

Solve the relaxed NLP (0 ≤ y ≤ 1 continuous relaxation) ... its solution value provides a lower bound.

Branch on some non-integral y_i; solve NLPs & branch until
1 the node is infeasible,
2 the node is integer feasible ⇒ get an upper bound (U),
3 the lower bound ≥ U ⇒ the node is dominated by the upper bound.

[Figure: search tree branching on y_i = 0 / y_i = 1; leaves are infeasible, integer feasible, or dominated by the upper bound]

Search until no unexplored nodes remain on the tree. (A compact sketch of the whole loop follows.)
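
A compact, illustrative sketch of NLP-based branch-and-bound, assuming scipy (this is not the authors' implementation): minimize f(x) subject to c(x) ≤ 0 with x[i] integer for i in int_vars.

    import math
    import numpy as np
    from scipy.optimize import minimize

    def branch_and_bound(f, c, bounds, int_vars, tol=1e-6):
        best_val, best_x = math.inf, None
        cons = [{'type': 'ineq', 'fun': lambda x: -c(x)}]   # c(x) <= 0
        stack = [list(bounds)]                              # depth-first search
        while stack:
            box = stack.pop()
            x0 = np.array([(l + u) / 2.0 for l, u in box])
            res = minimize(f, x0, method='SLSQP', bounds=box, constraints=cons)
            if not res.success or res.fun >= best_val:
                continue                                    # fathom: infeasible or dominated
            frac = [i for i in int_vars
                    if abs(res.x[i] - round(res.x[i])) > tol]
            if not frac:                                    # integer feasible: update U
                best_val, best_x = res.fun, res.x
                continue
            i = frac[0]                                     # branch on first fractional var
            down, up = list(box), list(box)
            down[i] = (box[i][0], math.floor(res.x[i]))
            up[i] = (math.ceil(res.x[i]), box[i][1])
            stack += [down, up]
        return best_val, best_x

    # e.g. two integer variables, one convex constraint:
    val, x = branch_and_bound(
        lambda x: (x[0] - 1.3)**2 + (x[1] - 2.7)**2,
        lambda x: np.array([x[0] + x[1] - 10.0]),
        [(0, 5), (0, 5)], int_vars=[0, 1])
    print(val, x)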

Variable Selection for Branch-and-Bound

Assume y_i ∈ {0, 1} for simplicity; let (x̄, ȳ) be the fractional solution of the parent node and f̄ = f(x̄, ȳ).

1 Maximal fractional branching: choose the y_i closest to 1/2 (a one-line sketch follows):

    max_i min(1 − ȳ_i, ȳ_i)

2 Strong branching: (approximately) solve both NLP children for every candidate i:

    f_i^± ← minimize_{x,y}  f(x, y)
            subject to  c(x, y) ≤ 0, x ∈ X, y ∈ Y, y_i = 1/0

  and pick the branching variable y_i that changes the objective the most:

    max_i min(f_i^+, f_i^−)
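
A one-line sketch of maximal fractional branching: score each binary by min(1 − ȳ_i, ȳ_i) and branch on the highest score.

    def most_fractional(y_relaxed):
        # index of the relaxed binary closest to 1/2
        return max(range(len(y_relaxed)),
                   key=lambda i: min(1 - y_relaxed[i], y_relaxed[i]))

    assert most_fractional([0.9, 0.45, 0.1]) == 1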

Node Selection for Branch-and-Bound

Which node n on tree T should be solved next?

1 Depth-first search: select the deepest node in the tree
  • minimizes the number of NLP nodes stored
  • exploits warm-starts (MILP/MIQP only)

2 Best estimate: choose the node with the best expected integer solution,

    min_{n∈T}  f_p(n) + Σ_{i: y_i fractional} min{ e_i^+ (1 − y_i), e_i^− y_i }

  where f_p(n) = value of the parent node and e_i^± are pseudo-costs, summing pseudo-cost estimates over all integers in the subtree.

Outer Approximation (Duran and Grossmann, 1986)

Motivation: avoid solving a huge number of NLPs.
• Exploit MILP/NLP solvers: decompose the integer/nonlinear parts.

Key idea: reformulate the MINLP as an (implicit) MILP.
• Solve an alternating sequence of MILPs & NLPs.

NLP subproblem for fixed y_j:

    NLP(y_j):  minimize_x  f(x, y_j)
               subject to  c(x, y_j) ≤ 0,  x ∈ X

Main assumption: f, c are convex.

Outer Approximation (Duran and Grossmann, 1986)

• let (x_j, y_j) solve NLP(y_j)
• linearize f, c about (x_j, y_j) =: z_j
• new objective variable η ≥ f(x, y)
• MINLP (P) ≡ MILP (M):

    (M):  minimize_{z=(x,y), η}  η
          subject to  η ≥ f_j + ∇f_j^T (z − z_j)   ∀y_j ∈ Y
                      0 ≥ c_j + ∇c_j^T (z − z_j)   ∀y_j ∈ Y
                      x ∈ X, y ∈ Y integer

SNAG: need linearizations for all y_j ∈ Y.

Outer Approximation (Duran and Grossmann, 1986)

(M^k): lower bound (underestimates the convex f, c)
NLP(y_j): upper bound U (fixed y_j)

Alternate: the NLP(y_j) subproblem gives a new linearization; the MILP master program finds a new y; repeat until the MILP is infeasible, then STOP.
⇒ stop if lower bound ≥ upper bound (a schematic of the loop follows)
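
A schematic of the OA loop in code; solve_nlp and solve_milp_master are hypothetical stand-ins for an NLP solver (returning the point, its value, and its linearization) and a MILP solver over the accumulated cuts.

    import math

    def outer_approximation(solve_nlp, solve_milp_master, y0, tol=1e-6):
        upper, best, cuts, y = math.inf, None, [], y0
        while True:
            x, fval, lin = solve_nlp(y)             # solve NLP(y_j)
            if fval < upper:                        # new incumbent / upper bound
                upper, best = fval, (x, y)
            cuts.append(lin)                        # add linearization to master
            feasible, y, lower = solve_milp_master(cuts)
            if not feasible or lower >= upper - tol:
                return upper, best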

Outer Approximation & Benders Decomposition

Take the OA cuts at z_j := (x_j, y_j) ... wlog X = R^n:

    η ≥ f_j + ∇f_j^T (z − z_j)   and   0 ≥ c_j + ∇c_j^T (z − z_j)

Sum with weights (1, λ_j), where λ_j are the multipliers of NLP(y_j):

    η ≥ f_j + λ_j^T c_j + (∇f_j + ∇c_j λ_j)^T (z − z_j)

The KKT conditions of NLP(y_j) give ∇_x f_j + ∇_x c_j λ_j = 0, so the x components can be eliminated from this valid inequality in y:

    ⇒ η ≥ f_j + (∇_y f_j + ∇_y c_j λ_j)^T (y − y_j)

NB: µ_j = ∇_y f_j + ∇_y c_j λ_j is the multiplier of y = y_j in NLP(y_j).
References: (Geoffrion, 1972)

LP/NLP Based Branch-and-Bound

AIM: avoid re-solving the MILP master (M).

Consider MILP branch-and-bound:
• interrupt the MILP when an integer-feasible y_j is found ⇒ solve NLP(y_j), get x_j
• linearize f, c about (x_j, y_j) ⇒ add the linearization to the tree
• continue the MILP tree-search
• ... until lower bound ≥ upper bound

LP/NLP Based Branch-and-Bound

need access to MILP solver ... call back exploit good MILP (branch-cut-price) solver (Akrotirianakis et al., 2001) use Gomory cuts in tree-search

preliminary results: order of magnitude faster than OA same number of NLPs, but only one MILP

similar ideas for Benders & Extended Cutting Plane methods

recent implementation by CMU/IBM group

References: (Quesada and Grossmann, 1992)

62 / 160

Integrating SQP & Branch-and-Bound

AIM: avoid solving each NLP node to convergence.

Sequential Quadratic Programming (SQP) → solve a sequence of QPs (QP^k) at every node:

    (QP^k):  minimize_d  f_k + ∇f_k^T d + ½ d^T H_k d
             subject to  c_k + ∇c_k^T d ≤ 0
                         x_k + d_x ∈ X
                         y_k + d_y ∈ Y

Early branching: after a QP step, choose a non-integral y_i^{k+1}, branch & continue SQP.
References: (Borchers and Mitchell, 1994; Leyffer, 2001)

Integrating SQP & Branch-and-Bound

SNAG: (QP^k) is not a lower bound ⇒ no fathoming from the upper bound.

Remedy: exploit the OA underestimating property (Leyffer, 2001):
• add the objective cut f_k + ∇f_k^T d ≤ U − ε to (QP^k)
• fathom the node if (QP^k) is inconsistent

NB: if (QP^k) is inconsistent and the trust region is active ⇒ do not fathom.

Comparison of Classical MINLP Techniques

Summary of numerical experience:

1 Quadratic OA master: usually fewer iterations, but the MIQP is harder to solve
2 NLP branch-and-bound faster than OA ... depends on the MIP solver
3 LP/NLP-based BB an order of magnitude faster than OA ... also faster than B&B
4 Integrated SQP-B&B up to 3× faster than B&B ≈ number of QPs per node
5 ECP works well if function/gradient evaluations are expensive

Part III

Modern Developments in MINLP


Modern Methods for MINLP

1 Formulations
  • relaxations
  • good formulations: big-M's and disaggregation

2 Cutting Planes
  • cuts from relaxations and special structures
  • cuts from integrality

3 Handling Nonconvexity
  • envelopes
  • methods

Relaxations

    z(S) := min_{x∈S} f(x),   z(T) := min_{x∈T} f(x)

[Figure: feasible set S contained in its relaxation T]

Independent of f and of S ⊆ T:  z(T) ≤ z(S).

If x*_T = argmin_{x∈T} f(x) and x*_T ∈ S, then x*_T = argmin_{x∈S} f(x).

UFL: Uncapacitated Facility Location

Facilities: J;  Customers: I

    min  Σ_{j∈J} f_j x_j + Σ_{i∈I} Σ_{j∈J} f_ij y_ij
    s.t. Σ_{j∈J} y_ij = 1            ∀i ∈ I
         Σ_{i∈I} y_ij ≤ |I| x_j      ∀j ∈ J           (1)
    OR   y_ij ≤ x_j                  ∀i ∈ I, j ∈ J    (2)

Which formulation is to be preferred?

|I| = |J| = 40, random costs:
• Formulation (1): 53,121 seconds to the optimal solution.
• Formulation (2): 2 seconds to the optimal solution.

Valid Inequalities

Sometimes we can get a better formulation by dynamically improving it.

An inequality π^T x ≤ π_0 is a valid inequality for S if π^T x ≤ π_0 ∀x ∈ S.
Alternatively: max_{x∈S} π^T x ≤ π_0.

Thm (Hahn-Banach): Let S ⊂ R^n be a closed, convex set, and let x̂ ∉ S. Then there exists π ∈ R^n such that

    π^T x̂ > max_{x∈S} π^T x

[Figure: hyperplane π^T x = π_0 separating x̂ from S]

Nonlinear Branch-and-Cut

Consider the MINLP

    minimize_{x,y}  f_x^T x + f_y^T y
    subject to      c(x, y) ≤ 0
                    y ∈ {0, 1}^p,  0 ≤ x ≤ U

Note the linear objective. This is WLOG:

    min f(x, y)  ⇔  min η  s.t.  η ≥ f(x, y)

It's Actually Important!

We want to approximate the convex hull of integer solutions, but without a linear objective function the solution of the relaxation might occur in the interior. No separating hyperplane! :-(

    min  (y_1 − 1/2)² + (y_2 − 1/2)²
    s.t. y_1 ∈ {0, 1},  y_2 ∈ {0, 1}

[Figures: the relaxed solution (ȳ_1, ȳ_2) in the interior of the unit square, and the lifted set with objective variable η ≥ (y_1 − 1/2)² + (y_2 − 1/2)²]

Valid Inequalities From Relaxations

Idea: inequalities valid for a relaxation are valid for the original.

Generating valid inequalities for a relaxation is often easier.

Separation problem over T: given x̂ and T, find (π, π_0) such that π^T x̂ > π_0 and π^T x ≤ π_0 ∀x ∈ T.

[Figure: S ⊆ T and a hyperplane π^T x = π_0 cutting off x̂]

Simple Relaxations

Idea: consider one-row relaxations.

If P = {x ∈ {0,1}^n | Ax ≤ b}, then for any row i, P_i = {x ∈ {0,1}^n | a_i^T x ≤ b_i} is a relaxation of P.

If the intersection of the relaxations is a good approximation to the true problem, then the inequalities will be quite useful.

Crowder et al. (1983) is the seminal paper showing this to be true for IP.

MINLP: single (linear) row relaxations are also valid ⇒ the same inequalities can be used.

Knapsack Covers

    K = {x ∈ {0,1}^n | a^T x ≤ b}

A set C ⊆ N is a cover if Σ_{j∈C} a_j > b.
A cover C is a minimal cover if C \ {j} is not a cover ∀j ∈ C.

If C ⊆ N is a cover, then the cover inequality

    Σ_{j∈C} x_j ≤ |C| − 1

is a valid inequality for K. Sometimes (minimal) cover inequalities are facets of conv(K). (A separation sketch follows.)
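
A small heuristic sketch of cover separation at a fractional point x̄; exact separation is itself a knapsack problem, so this greedy version only illustrates the idea:

    def violated_cover(a, b, xbar, eps=1e-9):
        # Greedily build a cover from the items with the largest xbar values.
        order = sorted(range(len(a)), key=lambda j: -xbar[j])
        cover, weight = [], 0.0
        for j in order:
            cover.append(j)
            weight += a[j]
            if weight > b:            # C is now a cover
                break
        else:
            return None               # all items fit: no cover exists
        # Check whether sum_{j in C} x_j <= |C| - 1 is violated at xbar.
        if sum(xbar[j] for j in cover) > len(cover) - 1 + eps:
            return cover
        return None

    print(violated_cover([3, 4, 5], 8, [0.8, 0.9, 0.6]))   # e.g. cover [1, 0, 2]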

Other Substructures

Single-node flow (Padberg et al., 1985):

    S = { x ∈ R_+^{|N|}, y ∈ {0,1}^{|N|} | Σ_{j∈N} x_j ≤ b, x_j ≤ u_j y_j ∀j ∈ N }

Knapsack with a single continuous variable (Marchand and Wolsey, 1999):

    S = { x ∈ R_+, y ∈ {0,1}^{|N|} | Σ_{j∈N} a_j y_j ≤ b + x }

Set packing (Borndorfer and Weismantel, 2000):

    S = { y ∈ {0,1}^{|N|} | Ay ≤ e },   A ∈ {0,1}^{|M|×|N|},  e = (1, 1, ..., 1)^T

The Chvatal-Gomory Procedure

A general procedure for generating valid inequalities for integer programs.

Let the columns of A ∈ R^{m×n} be denoted a_1, a_2, ..., a_n, and let S = {y ∈ Z_+^n | Ay ≤ b}.

1 Choose nonnegative multipliers u ∈ R_+^m.
2 u^T Ay ≤ u^T b is a valid inequality (Σ_{j∈N} u^T a_j y_j ≤ u^T b).
3 Σ_{j∈N} ⌊u^T a_j⌋ y_j ≤ u^T b (since y ≥ 0).
4 Σ_{j∈N} ⌊u^T a_j⌋ y_j ≤ ⌊u^T b⌋ is valid for S, since Σ_{j∈N} ⌊u^T a_j⌋ y_j is an integer.

Simply amazing: this simple procedure suffices to generate every valid inequality for an integer program. (A one-step numeric example follows.)
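
One Chvatal-Gomory rounding step in code (assuming numpy), following exactly the four-step recipe above:

    import numpy as np

    def cg_cut(A, b, u):
        # floor(u^T a_j) y <= floor(u^T b) for nonnegative multipliers u
        return np.floor(u @ A), np.floor(u @ b)

    A = np.array([[2.0, 3.0],
                  [1.0, 4.0]])
    b = np.array([7.0, 9.0])
    u = np.array([0.5, 0.25])
    coeffs, rhs = cg_cut(A, b, u)
    print(coeffs, rhs)    # [1. 2.] 5.0, i.e. the cut y1 + 2 y2 <= 5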

Extension to MINLP (Cezik and Iyengar, 2005)

This simple idea also extends to mixed 0-1 conic programming:

    minimize_{z:=(x,y)}  f^T z
    subject to  Az ⪰_K b
                y ∈ {0,1}^p,  0 ≤ x ≤ U

K: homogeneous, self-dual, proper, convex cone;  x ⪰_K y ⇔ (x − y) ∈ K

Gomory On Cones (Cezik and Iyengar, 2005)

LP:   K_l = R_+^n
SOCP: K_q = {(x_0, x) | x_0 ≥ ‖x‖}
SDP:  K_s = {x = vec(X) | X = X^T, X p.s.d.}

Dual cone: K* := {u | u^T z ≥ 0 ∀z ∈ K}

The extension is clear from the following equivalence:

    Az ⪰_K b  ⇔  u^T Az ≥ u^T b  ∀u ⪰_{K*} 0

Many classes of nonlinear inequalities can be represented as Ax ⪰_{K_q} b or Ax ⪰_{K_s} b.

Using Gomory Cuts in MINLP (Akrotirianakis et al., 2001)

LP/NLP based branch-and-bound solves MILP instances:

    minimize_{z:=(x,y), η}  η
    subject to  η ≥ f_j + ∇f_j^T (z − z_j)   ∀y_j ∈ Y^k
                0 ≥ c_j + ∇c_j^T (z − z_j)   ∀y_j ∈ Y^k
                x ∈ X, y ∈ Y integer

Create Gomory mixed-integer cuts from the linearizations

    η ≥ f_j + ∇f_j^T (z − z_j),   0 ≥ c_j + ∇c_j^T (z − z_j)

Akrotirianakis et al. (2001) show modest improvements.

Research question: other cut classes?
Research question: exploit the "outer approximation" property?

Disjunctive Cuts for MINLP (Stubbs and Mehrotra, 1999)

Extension of disjunctive cuts for MILP (Balas, 1979; Balas et al., 1993).

Continuous relaxation (z := (x, y)):

    C := {z | c(z) ≤ 0, 0 ≤ y ≤ 1, 0 ≤ x ≤ U}
    C̃ := conv{z ∈ C | y ∈ {0,1}^p}
    C_j^0 := {z ∈ C | y_j = 0},   C_j^1 := {z ∈ C | y_j = 1}

Let

    M_j(C) := { z = λ_0 u_0 + λ_1 u_1 | λ_0 + λ_1 = 1, λ_0, λ_1 ≥ 0, u_0 ∈ C_j^0, u_1 ∈ C_j^1 }

and let P_j(C) be the projection of M_j(C) onto z. Then

    P_j(C) = conv(C ∩ {y_j ∈ {0,1}})   and   P_{1...p}(C) = C̃

[Figures: the continuous relaxation, its convex hull, and the integer-feasible set in (x, y)-space]

Disjunctive Cuts: Example

    minimize_{x,y}  { x | (x − 1/2)² + (y − 3/4)² ≤ 1, −2 ≤ x ≤ 2, y ∈ {0, 1} }

[Figure: the sets C_j^0 and C_j^1 and a fractional point ẑ = (x̂, ŷ)]

Given ẑ with ŷ_j ∉ {0, 1}, find a separating hyperplane:

    minimize_z  ‖z − ẑ‖   subject to  z ∈ P_j(C)

Disjunctive Cuts Example

    z* := argmin_z  ‖z − ẑ‖   (in any norm, e.g. ‖·‖₂² or ‖·‖_∞)
    s.t.  λ_0 u_0 + λ_1 u_1 = z
          λ_0 + λ_1 = 1
          (−0.16, 0) ≤ u_0 ≤ (0.66, 1)
          (−0.47, 0) ≤ u_1 ≤ (1.47, 1)
          λ_0, λ_1 ≥ 0

NONCONVEX (because of the bilinear terms λ_k u_k)

What to do? (Stubbs and Mehrotra, 1999)

Look at the perspective of c(z):

    P(c(z), µ) = µ c(z/µ)

Think of z̃ = µz. The perspective gives a convex reformulation M_j(C̃) of M_j(C), where

    C̃ := { (z, µ) | µ c_i(z/µ) ≤ 0, 0 ≤ µ ≤ 1, 0 ≤ x ≤ µU, 0 ≤ y ≤ µ }

with the convention c(0/0) = 0 ⇒ convex representation. (A numeric sketch follows.)
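
A numeric sketch of the perspective function above: for a convex c, the map (z, µ) → µ c(z/µ) is its perspective, with c(0/0) taken as 0.

    def perspective(c, z, mu, eps=1e-12):
        # convention: the perspective is 0 at mu = 0
        return mu * c(z / mu) if mu > eps else 0.0

    c = lambda z: z**2 - 1.0           # a convex constraint function, c(z) <= 0
    print(perspective(c, 0.5, 0.25))   # 0.25 * ((0.5/0.25)**2 - 1) = 0.75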

Disjunctive Cuts Example

    C̃ = { (x, y, µ) | µ[(x/µ − 1/2)² + (y/µ − 3/4)² − 1] ≤ 0,
                      −2µ ≤ x ≤ 2µ,  0 ≤ y ≤ µ,  0 ≤ µ ≤ 1 }

[Figures: the lifted sets C̃_j^0 and C̃_j^1 in (x, y, µ)-space]

Example, cont.

    C̃_j^0 = {(z, µ) | y_j = 0},   C̃_j^1 = {(z, µ) | y_j = µ}

Take v_0 ← µ_0 u_0 and v_1 ← µ_1 u_1:

    min  ‖z − ẑ‖
    s.t. v_0 + v_1 = z
         µ_0 + µ_1 = 1
         (v_0, µ_0) ∈ C̃_j^0
         (v_1, µ_1) ∈ C̃_j^1
         µ_0, µ_1 ≥ 0

Solution to the example: (x*, y*) = (−0.401, 0.780).

Separating hyperplane: ψ^T (z − z*) ≥ 0, where ψ ∈ ∂‖z* − ẑ‖.

Example, Cont.

    ψ = ( 2(x* + 0.5), 2(y* − 0.75) ) = (0.198, 0.061)

giving the cut

    0.198 x + 0.061 y ≥ −0.032

[Figure: the line 0.198x + 0.061y = −0.032 separating ẑ from C_j^0 ∪ C_j^1]

Nonlinear Branch-and-Cut (Stubbs and Mehrotra, 1999)

• Can do this at all nodes of the branch-and-bound tree.
• Generalizes the disjunctive approach from MILP: solve one convex NLP per cut.
• Generalizes Sherali and Adams (1990) and Lovasz and Schrijver (1991): tighten cuts by adding a semidefinite constraint.
• Stubbs and Mehrotra (2002) also show how to generate convex quadratic inequalities, but the computational results are not that promising.

Generalized Disjunctive Programming (Raman and Grossmann, 1994; Lee and Grossmann, 2000)

Consider the disjunctive NLP

    minimize_{x,Y}  Σ f_i + f(x)
    subject to  [ Y_i : c_i(x) ≤ 0, f_i = γ_i ]  ∨  [ ¬Y_i : B_i x = 0, f_i = 0 ]   ∀i ∈ I
                0 ≤ x ≤ U,  Ω(Y) = true,  Y ∈ {true, false}^p

Application: process synthesis
• Y_i represents the presence/absence of units
• B_i x = 0 eliminates variables if a unit is absent
Exploit the disjunctive structure
• special branching ... OA/GBD algorithms

Big-M formulation (notoriously bad), M > 0:

    c_i(x) ≤ M(1 − y_i)
    −M y_i ≤ B_i x ≤ M y_i
    f_i = y_i γ_i
    Ω(Y) converted to linear inequalities

Convex hull representation:

    x = v_i1 + v_i0,   λ_i1 + λ_i0 = 1
    λ_i1 c_i(v_i1 / λ_i1) ≤ 0,   B_i v_i0 = 0
    0 ≤ v_ij ≤ λ_ij U,   0 ≤ λ_ij ≤ 1,   f_i = λ_i1 γ_i

Dealing with Nonconvexities

Functional nonconvexity causes serious problems:
• branch-and-bound must have a true lower bound (global solution)
• underestimate nonconvex functions and solve the relaxation: this provides a lower bound
• if the relaxation is not exact, then branch

Dealing with Nonconvex Constraints

If the nonconvexity is in the constraints, we may need to both overestimate and underestimate the function to get a convex region.

Envelopes

f : Ω → R

Convex envelope vex_Ω(f): the pointwise supremum of convex underestimators of f over Ω.

Concave envelope cav_Ω(f): the pointwise infimum of concave overestimators of f over Ω.

Branch-and-Bound Global Optimization Methods

Under/overestimate "simple" parts of (factorable) functions individually:
• bilinear terms
• trilinear terms
• fractional terms
• univariate convex/concave terms

A general nonconvex function f(x) can be underestimated over a region [l, u] by "overpowering" the function with a quadratic that is ≤ 0 on the region of interest (a sketch follows):

    L(x) = f(x) + Σ_{i=1}^{n} α_i (l_i − x_i)(u_i − x_i)

Refs: (McCormick, 1976; Adjiman et al., 1998; Tawarmalani and Sahinidis, 2002)
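
A small sketch of the quadratic underestimator above for f(x) = sin(x) on [0, 2π]: since (l − x)(u − x) ≤ 0 on the box, L underestimates f, and α ≥ max(−f″)/2 = 1/2 makes L convex.

    import math

    l, u, alpha = 0.0, 2 * math.pi, 0.5

    def f(x):
        return math.sin(x)

    def L(x):
        # alpha-underestimator: f(x) + alpha * (l - x) * (u - x)
        return f(x) + alpha * (l - x) * (u - x)

    xs = [l + k * (u - l) / 100 for k in range(101)]
    assert all(L(x) <= f(x) + 1e-12 for x in xs)   # L underestimates f on [l, u]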

Bilinear Terms

The convex and concave envelopes of the bilinear function xy over a rectangular region

    R := {(x, y) ∈ R² | l_x ≤ x ≤ u_x, l_y ≤ y ≤ u_y}

are given by the expressions

    vex_R(xy) = max{ l_y x + l_x y − l_x l_y,  u_y x + u_x y − u_x u_y }
    cav_R(xy) = min{ u_y x + l_x y − l_x u_y,  l_y x + u_x y − u_x l_y }
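
The McCormick envelopes above, stated directly in code with a sandwich check at a sample point:

    def vex_R(x, y, lx, ux, ly, uy):
        return max(ly * x + lx * y - lx * ly, uy * x + ux * y - ux * uy)

    def cav_R(x, y, lx, ux, ly, uy):
        return min(uy * x + lx * y - lx * uy, ly * x + ux * y - ux * ly)

    # On R = [0, 2] x [1, 3]: vex_R <= x*y <= cav_R
    x, y = 1.2, 2.5
    assert vex_R(x, y, 0, 2, 1, 3) <= x * y <= cav_R(x, y, 0, 2, 1, 3)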

Worth 1000 Words?

[Figures: surface plots of xy, its convex envelope vex_R(xy), and its concave envelope cav_R(xy) over R]

Summary

• MINLP: good relaxations are important.
• Relaxations can be improved
  statically: better formulation/preprocessing
  dynamically: cutting planes
• Nonconvex MINLP: methods exist, again based on relaxations.
• Tight relaxations are an active area of research.
• Lots of empirical questions remain.

Part IV

Implementation and Software


Implementation and Software for MINLP

1 Special Ordered Sets

2 Implementation & Software Issues


Special Ordered Sets of Type 1

SOS1: Σ λ_i = 1 & at most one λ_i is nonzero

Example 1: d ∈ {d_1, ..., d_p} discrete diameters
⇔ d = Σ λ_i d_i and {λ_1, ..., λ_p} is SOS1
⇔ d = Σ λ_i d_i and Σ λ_i = 1 and λ_i ∈ {0, 1}
... d is a convex combination with coefficients λ_i

Example 2: nonlinear function c(y) of a single integer y
⇔ y = Σ_i i λ_i and c = Σ_i c(i) λ_i and {λ_1, ..., λ_p} is SOS1

References: (Beale, 1979; Nemhauser and Wolsey, 1988; Williams, 1993) ...

Special Ordered Sets of Type 1

SOS1: Σ λ_i = 1 & at most one λ_i is nonzero

Branching on SOS1:
1 reference row a_1 < ... < a_p, e.g. the diameters
2 fractionality: ā := Σ a_i λ_i
3 find t : a_t < ā ≤ a_{t+1}
4 branch: λ_{t+1} = ... = λ_p = 0  or  λ_1 = ... = λ_t = 0

Special Ordered Sets of Type 2

SOS2: Σ λ_i = 1 & at most two adjacent λ_i nonzero

Example: approximation of a nonlinear function z = z(x)
• breakpoints x_1 < ... < x_p
• function values z_i = z(x_i)
• piecewise linear: x = Σ λ_i x_i, z = Σ λ_i z_i, {λ_1, ..., λ_p} is SOS2

... a convex combination of two adjacent breakpoints ... (a sketch follows)
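
A sketch of the SOS2 λ-representation above: for x between two breakpoints, only the two adjacent λ's are nonzero, and Σ λ_i z_i reproduces the piecewise-linear interpolant.

    import bisect

    def sos2_lambdas(x, xs):
        t = bisect.bisect_right(xs, x) - 1       # segment with xs[t] <= x
        t = min(max(t, 0), len(xs) - 2)
        lam = [0.0] * len(xs)
        theta = (x - xs[t]) / (xs[t + 1] - xs[t])
        lam[t], lam[t + 1] = 1.0 - theta, theta  # two adjacent nonzeros
        return lam

    xs = [0.0, 1.0, 2.0, 4.0]
    zs = [0.0, 1.0, 4.0, 16.0]                   # z(x) = x^2 at the breakpoints
    lam = sos2_lambdas(1.5, xs)
    print(sum(l * z for l, z in zip(lam, zs)))   # 2.5, vs the true z(1.5) = 2.25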

Special Ordered Sets of Type 2

SOS2: Σ λ_i = 1 & at most two adjacent λ_i nonzero

Branching on SOS2:
1 reference row a_1 < ... < a_p, e.g. a_i = x_i
2 fractionality: ā := Σ a_i λ_i
3 find t : a_t < ā ≤ a_{t+1}
4 branch: λ_{t+1} = ... = λ_p = 0  or  λ_1 = ... = λ_{t−1} = 0

Special Ordered Sets of Type 3

Example: approximation of a 2D function u = g(v, w)

Triangularization of the [v_L, v_U] × [w_L, w_U] domain:
1 v_L = v_1 < ... < v_k = v_U
2 w_L = w_1 < ... < w_l = w_U
3 function values u_ij := g(v_i, w_j)
4 λ_ij: weight of vertex (i, j)

    v = Σ λ_ij v_i,   w = Σ λ_ij w_j,   u = Σ λ_ij u_ij,   1 = Σ λ_ij ... is SOS3

[Figure: triangulated (v, w)-grid]

Special Ordered Sets of Type 3

SOS3: Σ λ_ij = 1 & the set condition holds
1 v = Σ λ_ij v_i ... convex combinations
2 w = Σ λ_ij w_j
3 u = Σ λ_ij u_ij

{λ_11, ..., λ_kl} satisfies the set condition
⇔ ∃ triangle ∆ : {(i, j) : λ_ij > 0} ⊂ ∆,
i.e. the nonzeros lie in a single triangle ∆.

Branching on SOS3

λ violates the set condition:
• compute centers: v̄ = Σ λ_ij v_i & w̄ = Σ λ_ij w_j
• find s, t such that v_s ≤ v̄ < v_{s+1} & w_t ≤ w̄ < w_{t+1}
• branch on v or w:
  vertical branching:   Σ_L λ_ij = 1  or  Σ_R λ_ij = 1
  horizontal branching: Σ_T λ_ij = 1  or  Σ_B λ_ij = 1

Extension to SOS-k

Example: electricity transmission network:

    c(x) = 4x_1 − x_2² − 0.2 · x_2 x_4 sin(x_3)

(Martin et al., 2005) extend SOS3 to SOS-k models for any k
⇒ a function of p variables on an N-point grid needs N^p λ's

Alternative (Gatzke, 2005):
• exploit the computational graph ≈ automatic differentiation
• only need SOS2 & SOS3 ... replace the nonconvex parts
• piecewise polyhedral approximation

Software for MINLP

• Outer approximation: DICOPT++ (& AIMMS)
  NLP solvers: CONOPT, MINOS, SNOPT; MILP solvers: CPLEX, OSL
• Branch-and-bound solvers: SBB & MINLP
  NLP solvers: CONOPT, MINOS, SNOPT & FilterSQP; variable & node selection; SOS1 & SOS2 support
• Global MINLP: BARON & MINOPT — underestimators & branching; CPLEX, MINOS, SNOPT, OSL
• Online tools: MINLP World, MacMINLP & NEOS
  MINLP World: www.gamsworld.org/minlp/
  NEOS server: www-neos.mcs.anl.gov/

COIN-OR

http://www.coin-or.org

COmputational INfrastructure for Operations Research:
• a library of (interoperable) software tools for optimization
• a development platform for open-source projects in the OR community

Possibly relevant modules:
• OSI: Open Solver Interface
• CGL: Cut Generation Library
• CLP: Coin Linear Programming toolkit
• CBC: Coin Branch and Cut
• IPOPT: Interior Point OPTimizer for NLP
• NLPAPI: NonLinear Programming API

MINLP with COIN-OR

New implementation of LP/NLP based BB:
• MIP branch-and-cut: CBC & CGL
• NLPs: IPOPT interior point ... OK for NLP(y_j)

New hybrid method:
• solve more NLPs at non-integer y_j ⇒ better outer approximation
• allow a complete MIP solve at some nodes ⇒ generates new integer assignments
• ... faster than DICOPT++, SBB
• simplifies to OA and B&B at the extremes ... less efficient

... see Bonami et al. (2005) ... coming in 2006.

Conclusions

• MINLP: a rich modeling paradigm; the most popular solver class on NEOS

• Algorithms for MINLP: branch-and-bound (branch-and-cut), outer approximation, et al.

• "MINLP solvers lag 15 years behind MIP solvers"

⇒ many research opportunities!!!

Part V

References


C. Adjiman, S. Dallwig, C. A. Floudas, and A. Neumaier. A global optimization method, αBB, for general twice-differentiable constrained NLPs — I. Theoretical advances. Computers and Chemical Engineering, 22:1137–1158, 1998.

I. Akrotirianakis, I. Maros, and B. Rustem. An outer approximation based branch-and-cut algorithm for convex 0-1 MINLP problems. Optimization Methods and Software, 16:21–47, 2001.

E. Balas. Disjunctive programming. In Annals of Discrete Mathematics 5: Discrete Optimization, pages 3–51. North Holland, 1979.

E. Balas, S. Ceria, and G. Cornuejols. A lift-and-project cutting plane algorithm for mixed 0-1 programs. Mathematical Programming, 58:295–324, 1993.

E. M. L. Beale. Branch-and-bound methods for mathematical programming systems. Annals of Discrete Mathematics, 5:201–219, 1979.

P. Bonami, L. Biegler, A. Conn, G. Cornuejols, I. Grossmann, C. Laird, J. Lee, A. Lodi, F. Margot, N. Sawaya, and A. Wächter. An algorithmic framework for convex mixed integer nonlinear programs. Technical report, IBM Research Division, Thomas J. Watson Research Center, 2005.

B. Borchers and J. E. Mitchell. An improved branch and bound algorithm for mixed integer nonlinear programming. Computers and Operations Research, 21(4):359–367, 1994.

R. Borndorfer and R. Weismantel. Set packing relaxations of some integer programs. Mathematical Programming, 88:425–450, 2000.

M. T. Cezik and G. Iyengar. Cuts for mixed 0-1 conic programming. Mathematical Programming, 2005. To appear.

H. Crowder, E. L. Johnson, and M. W. Padberg. Solving large scale zero-one linear programming problems. Operations Research, 31:803–834, 1983.

D. De Wolf and Y. Smeers. The gas transmission problem solved by an extension of the simplex algorithm. Management Science, 46:1454–1465, 2000.

M. Duran and I. E. Grossmann. An outer-approximation algorithm for a class of mixed-integer nonlinear programs. Mathematical Programming, 36:307–339, 1986.

A. M. Geoffrion. Generalized Benders decomposition. Journal of Optimization Theory and Applications, 10:237–260, 1972.

I. E. Grossmann and R. W. H. Sargent. Optimal design of multipurpose batch plants. Ind. Engng. Chem. Process Des. Dev., 18:343–348, 1979.

I. Harjunkoski, T. Westerlund, R. Porn, and H. Skrifvars. Different transformations for solving non-convex trim-loss problems by MINLP. European Journal of Operational Research, 105:594–603, 1998.

V. Jain and I. E. Grossmann. Cyclic scheduling of continuous parallel-process units with decaying performance. AIChE Journal, 44:1623–1636, 1998.

G. R. Kocis and I. E. Grossmann. Global optimization of nonconvex mixed-integer nonlinear programming (MINLP) problems in process synthesis. Industrial Engineering Chemistry Research, 27:1407–1421, 1988.

S. Lee and I. Grossmann. New algorithms for nonlinear disjunctive programming. Computers and Chemical Engineering, 24:2125–2141, 2000.

S. Leyffer. Integrating SQP and branch-and-bound for mixed integer nonlinear programming. Computational Optimization & Applications, 18:295–309, 2001.

L. Lovasz and A. Schrijver. Cones of matrices and set functions, and 0-1 optimization. SIAM Journal on Optimization, 1, 1991.

H. Marchand and L. Wolsey. The 0-1 knapsack problem with a single continuous variable. Mathematical Programming, 85:15–33, 1999.

A. Martin, M. Moller, and S. Moritz. Mixed integer models for the stationary case of gas network optimization. Technical report, Darmstadt University of Technology, 2005.

G. P. McCormick. Computability of global solutions to factorable nonconvex programs: Part I — Convex underestimating problems. Mathematical Programming, 10:147–175, 1976.

G. L. Nemhauser and L. A. Wolsey. Integer and Combinatorial Optimization. John Wiley, New York, 1988.

M. Padberg, T. J. Van Roy, and L. Wolsey. Valid linear inequalities for fixed charge problems. Operations Research, 33:842–861, 1985.

I. Quesada and I. E. Grossmann. An LP/NLP based branch-and-bound algorithm for convex MINLP optimization problems. Computers and Chemical Engineering, 16:937–947, 1992.

A. J. Quist. Application of Mathematical Optimization Techniques to Nuclear Reactor Reload Pattern Design. PhD thesis, Technische Universiteit Delft, Thomas Stieltjes Institute for Mathematics, The Netherlands, 2000.

R. Raman and I. E. Grossmann. Modeling and computational techniques for logic based integer programming. Computers and Chemical Engineering, 18:563–578, 1994.

H. D. Sherali and W. P. Adams. A hierarchy of relaxations between the continuous and convex hull representations for zero-one programming problems. SIAM Journal on Discrete Mathematics, 3:411–430, 1990.

O. Sigmund. A 99 line topology optimization code written in Matlab. Structural Multidisciplinary Optimization, 21:120–127, 2001.

R. Stubbs and S. Mehrotra. Generating convex polynomial inequalities for mixed 0-1 programs. Journal of Global Optimization, 24:311–332, 2002.

R. A. Stubbs and S. Mehrotra. A branch-and-cut method for 0-1 mixed convex programming. Mathematical Programming, 86:515–532, 1999.

M. Tawarmalani and N. V. Sahinidis. Convexification and Global Optimization in Continuous and Mixed-Integer Nonlinear Programming: Theory, Algorithms, Software, and Applications. Kluwer Academic Publishers, Boston, MA, 2002.

J. Viswanathan and I. E. Grossmann. Optimal feed location and number of trays for distillation columns with multiple feeds. I&EC Research, 32:2942–2949, 1993.

T. Westerlund, J. Isaksson, and I. Harjunkoski. Solving a production optimization problem in the paper industry. Report 95-146-A, Department of Chemical Engineering, Åbo Akademi, Åbo, Finland, 1995.

T. Westerlund, F. Pettersson, and I. E. Grossmann. Optimization of pump configurations as a MINLP problem. Computers & Chemical Engineering, 18(9):845–858, 1994.

H. P. Williams. Model Solving in Mathematical Programming. John Wiley & Sons Ltd., Chichester, 1993.