Directions in Optimization/ Nonlinear Programming › ... › retreat › slides ›...

Directions in Optimization/Nonlinear Programming Stats Dept Retreat, Oct 27, 2012 Mihai Anitescu

Optimization interests

•  Linear, Quadratic, Conic, Semiinfinite, Nonconvex – but primarily nonlinear programming,

•  Variational inequalities -- Nonlinear Complementarity—differential variational inequalities

•  Stochastic programming:

minx f (x)

s.t. g(x) ≤ 0 ∈K( )h(x) = 0

x − x( )F(x) ≥ 0,∀x ∈K x ∈K ⊥ F(x)∈ K *

f x( ) = f xd , x u( ){ }u( ) = φ xd , x u( )( )du∫

1. DIFFERENTIAL VARIATIONAL INEQUALITIES

Mihai Anitescu, STAT 343, Autumn 12. Not for use outside UC

Differential variational inequalities -- DVI

•  DVIs appear whenever both dynamics and inequalities/ switching appear in model description.

•  Dense granular flow. The second most-manipulated industrial material after water!

•  Microstructure evolution (how does fatigue appear in steel)?

•  Model predictive control: What is the optimal way to control a complex dynamical system such as power grid?

DVI formulation

•  Differential variational inequalities: Mixture of differential equations and variational inequalities.

•  In the case of complementarity,

Dvi Approaches •  Recall, DVI (for C=R+)

•  Smoothing

•  Followed by forward Euler.

Easy to implement!! But Stiff!

•  I specialize in time-stepping. Can be much faster IF you can solve the subproblem

x = f t,x t( ),u t( )( );u ≥ 0 ⊥ F t,x t( ),u t( )( ) ≥ 0

x = f t,x t( ),u t( )( );ui Fi t,x t( ),u t( )( ) = ε , i = 1,2,…nu

uin Fi t

n−1,xn−1,un−1( ) = ε , i = 1,2,…nuxn+1 = xn + hf tn ,xn ,un( );

( )( )

1 1 1 1

1 1 1 1

, , ;

0 , , 0

n n n n n

n n n n

x x hf t x u

u F t x u

+ + + +

+ + + +

= +

≥ ⊥ ≥

Granular flow: PBNR •  160’000 Uranium-Graphite

spheres, 600’000 contacts on average

•  Two millions of primal variables, six millions of dual variables.

•  A new convex subproblem approach which is much faster.

2( ) ( )( ) ( )1 2

( ) ( ) ( ) ( ) ( ) ( )1 1 2 2

1 2

( ) ( )

( ) ( ) ( ) ( )1 2 1 1 2 2

( ) ( )

0 ( ) 0 1 2

argminj jj j

n

j j j j j jn c

j … p

j jn

j j T j T j

c

dvM c n t t f q v k t q vdt

dq vdtc q j … p

v t v tµ β β

β β

β β β β⎛ ⎞⎜ ⎟⎝ ⎠

⎛ ⎞⎜ ⎟⎝ ⎠

= , , ,

⎛ ⎞ ⎛ ⎞ ⎛ ⎞⎜ ⎟ ⎜ ⎟ ⎜ ⎟⎝ ⎠ ⎝ ⎠ ⎝ ⎠≥ +

= + + + , + , ,

=

≥ ⊥ Φ ≥ , = , , ,

⎡ ⎤, = +⎣ ⎦

∑

Microstructure Evolution

8

Evolution of phases: e.g defects in solids

Applications: •  Data Assimilation (Estimation), Transmission Planning, Power State Forecasting,

Generation Control, Buildings Optimization, Markets

DO Large-Scale NLP

Discretization Data

Dynamic Optimization (DO)– Model Predictive Control

•  It is very expensive … but if I write its optimality conditions and look at it as a differential variational inequality, I can approximate it with limited amount of work per step !

Outstanding Research Issues

•  Many applications are “undermodeled” due to “fear of switches” in dynamics, lots to do in modeling. Ice Sheet Modeling, Microstructure Evolution, Hybrid Systems.

•  What are proper splittings in time that have best stability/accuracy properties?

•  Convergence theory for PDE-based DVI. •  Efficient solvers for problems whose active set changes often?

E.g how do we adapt multigrid? •  How do we reuse information optimally from step to step (hot-

starting). For example, interior-point is notorious for not being able to do that?

•  Etc.


2. DECISION UNDER UNCERTAINTY : STOCHASTIC OPTIMIZATION


A leading paradigm for optimization under uncertainty paradigm: stochastic programming.

•  Two-stage stochastic programming with recourse (“here-and-now”)

• 

Mihai Anitescu - Optimization under uncertainty

12

{ }0

0 0 ( ) ) ( ,

x xMin f x M xin f ω⎡ ⎤+ ⎣ ⎦Es.t. g0 (x0 ) = b0 s.t. h (x ,ω ) = b(ω ) − g (x0 ,ω )

x0 ≥ 0 x (ω ) ≥ 0

Minx0 ,x1,x2 ,…,xS

f 0 (x0 )+1S

f k (xk )k =1

S

∑g0 (x0 ) = b0gk (x0 ) + hk (xk ) = bk ,

x0 ≥ 0, xk ≥ 0, k = 1,...,S .

subj. to. 1 2, , , Sξ ξ ξ…

Second-stage random data ( )ξ ω

continuous discrete

Sampling

Inference Analysis

M samples

Sample average approximation (SAA)

or bootstrapping

x0*

x0N

x0N − x0

* ~ ?

2.1 A PERHAPS UNEXPECTED APPLICATION. SCALABLE MAX LIKELIHOOD FOR GAUSSIAN PROCESSES


Maximum Likelihood Estimation (MLE)

•  A family of covariance functions parameterized by θ: φ(x; θ)

•  Maximize the log-likelihood to estimate θ:

•  First order optimality: (also known as score equations)

14

maxθL(θ ) = log (2π )−n/2 (detK )−1/2 exp(−yTK −1y / 2){ }

= −12yTK −1y− 1

2log(detK )− n

2log2π

12yTK −1(∂ jK )K

−1y− 12tr K −1(∂ jK )#$ %&= 0

Maximum Likelihood Estimation (MLE)

The log-det term poses a significant challenge for large-scale computations •  Cholesky of K: Prohibitively expensive! •  log(det K) = tr(log K): Need some matrix function methods to

handle the log •  No existing method to evaluate the log-det term in sufficient

accuracy

15

maxθ

− 12yTK −1y− 1

2log(detK )− n

2log2π

Sample Average Approximation of Maximum Likelihood Estimation (MLE)

We (Anitescu et al. 2012) consider approximately solving the first order optimality instead: •  A randomized trace estimator tr(A) = E[uTAu]

–  u has i.i.d. entries taking ±1 with equal probability •  It becomes a stochastic nonlinear equation •  As N tends to infinity, the solution approaches the true estimate •  Numerically, one must solve linear systems with O(N) right-hand

sides. 16

12yTK −1(∂ jK )K

−1y− 12

tr K −1(∂ jK )#$ %&

≈ 12yTK −1(∂ jK )K

−1y− 12N

uiT

i=1

N

∑ K −1(∂ jK )#$ %&ui = 0

Convergence of Stochastic Programming - SAA

•  Let

•  First result: where

17

θ : truth

θ̂ : sol of 12yTK −1(∂ jK )K

−1y− 12

tr K −1(∂ jK )#$ %&= 0

θ̂ N : sol of F = 12yTK −1(∂ jK )K

−1y− 12N

uiT

i=1

N

∑ K −1(∂ jK )#$ %&ui = 0

[V N ]−1/2 (θ̂ N −θ̂ ) D" →" standard normal, V N = [J N ]−T ΣN [J N ]−1

J N =∇F(θ̂ N ) and ΣN = cov{F(θ̂ N )}

Simulation: We scale

•  Truth θ = [7, 10], Matern ν = 1.5

18

104 1066

7

8

9

10

11

matrix dimension n

1

2

104 106101

102

103

104

105

matrix dimension n

time

(sec

onds

)

64x64 grid 2.56 mins func eval: 7

128x128 grid6.62 mins func eval: 7

256x256 grid1.1 hours func eval: 8



2.1 Stochastic Programming for GP

•  Prof. Stein will have some as well •  How do we precondition? •  Are there classes of processes for which we can prove global

convergence? •  Can we prove global convergence without function values? •  Can we exploit hierarchical structure? •  Can we approximate the matrix-vector multiplication efficiently

AND maintain positive definiteness? •  (The ScalaGAUSS project).


2.2: DECISION (CONTROL, DESIGN, PLANNING) OF ENERGY SYSTEMS UNDER UNCERTAINTY


Ambient Condition Effects in Energy Systems Operation of Energy Systems is Strongly Affected by Ambient Conditions - Power Grid Management: Predict Spatio-Temporal Demands (Douglas, et.al. 1999)

- Power Plants: Generation levels affected by air humidity and temperature (General Electric)

- Petrochemical: Heating and Cooling Utilities (ExxonMobil)

- Buildings: Heating and Cooling Needs (Braun, et.al. 2004)

- (Focus) Next Generation Energy Systems assume a major renewable energy penetration: Wind + Solar + Fossil (Beyer, et.al. 1999)

-  Increased reliance on renewables must account for variability of ambient conditions, which cannot be done deterministically …

-  We must optimize operational and planning decisions accounting for the uncertainty in ambient conditions (and others, e.g. demand)

-  Optimization Under Uncertainty.

Wind Power Profiles Mihai Anitescu - Optimization under uncertainty

21

Multifaceted Mathematics for Complex Energy Systems (M2ACS) Project Director: Mihai Anitescu, Argonne National Lab

22

Goals: •  By taking a holistic view, develop deep mathematical

understanding and effective algorithms to remove current bottlenecks in analysis, simulation, and optimization of complex energy systems.

•  Address the mathematical and computational complexities of analyzing, designing, planning, maintaining, and operating the nation's electrical energy systems and related infrastructure.

Integrated Novel Mathematics Research: •  Predictive modeling that accounts for uncertainty and

errors •  Mathematics of decisions that allow hierarchical, data-

driven and real-time decision making •  Scalable algorithms for optimization and dynamic

simulation •  Integrative frameworks leveraging model reduction

and multiscale analysis

Long-Term DOE Impact: •  Development of new mathematics at the

intersection of multiple mathematical sub-domains

•  Addresses a broad class of applications for complex energy systems, such as :

•  Planning for power grid and related infrastructure

•  Analysis and design for renewable energy integration

Team: Argonne National Lab (Lead), Pacific Northwest National Lab, Sandia National Lab, University of Wisconsin, University of Chicago

Representative decision-making activities and their time scales in electric power systems. Image courtesy of Chris de Marco (U-Wisconsin).

Multifaceted Mathematics for Complex Energy Systems

Stochastic Predictive Control

Dynamic System Model

Stochastic

Optimization

Weather Model

Low-Level

Control

Forecast

Energy System

Set-Points

Forecast & Uncertainty

Measurements

Stochastic NLMPC

Min

x0f0 (x0 ) + E Minx f (x,ω )

⎡⎣

⎤⎦{ }

subj. to. g0 x0( ) = b0gi x0 ,xi( ) = bi i =1,2…Sx0 ≥ 0, xi ≥ 0

Two-stage Stoch Prog


23

Stochastic Unit Commitment with Wind Power

•  Wind Forecast – WRF(Weather Research and Forecasting) Model –  Real-time grid-nested 24h simulation –  30 samples require 1h on 500 CPUs (Jazz@Argonne)


24

1min COST

s.t. , ,

, ,

ramping constr., min. up/down constr.

wind

wind

p u dsjk jk jk

s j ks

sjk kj

windsjk

j

wik ksj

ndsk

jjk

j

c c cN

p D s k

p D R s k

p

p

∈ ∈ ∈

∈

∈

∈

∈

⎛ ⎞= + +⎜ ⎟

⎝ ⎠+ = ∈ ∈

+ ≥ + ∈ ∈

∑ ∑∑

∑ ∑

∑ ∑S N T

N

N

N

N

S T

S T

Zavala & al 2010.

Thermal Units Schedule? Minimize Cost Satisfy Demand Have a Reserve Dispatch through network

0 24 48 720

200

400

600

800

1000

1200

Tota

l P

ow

er

[MW

]

Time [hr]

  Unit commitment & energy dispatch with uncertain wind power generation for the State of Illinois, assuming 20% wind power penetration, using the same windfarm sites as the one existing today.

 Full integration with 10 thermal units to meet demands. Consider dynamics of start-up, shutdown, set-point changes

  The solution is only 1% more expensive then the one with exact information. Solution on average infeasible at 10%.

wind power


25

Demand Samples Wind

Thermal

Wind power forecast and stochastic programming

2.2 Stochastic Programming Challenges

•  How do we produce efficient confidence intervals for SAA (as the number of samples may be limited)?

•  How do I solve the sharply increased problem efficiently (we solve problems with 3B variables, but about 10 times slower than we would like)?

•  How do we deal with integer variables (millions of them ) •  How do we insert economic actors (economic equilibria) when

we have integer variables? •  Can we use machine learning concepts to reduce decision space? •  How do we achieve stability, low memory and fast convergence? •  ….


Directions in Optimization/ Nonlinear Programming › ... › retreat › slides ›...

Documents

Transcript of Directions in Optimization/ Nonlinear Programming › ... › retreat › slides ›...