Directions in Optimization/ Nonlinear Programming › ... › retreat › slides ›...

26
Directions in Optimization/ Nonlinear Programming Stats Dept Retreat, Oct 27, 2012 Mihai Anitescu

Transcript of Directions in Optimization/ Nonlinear Programming › ... › retreat › slides ›...

  • Directions in Optimization/Nonlinear Programming Stats Dept Retreat, Oct 27, 2012 Mihai Anitescu

  • Optimization interests

    •  Linear, Quadratic, Conic, Semiinfinite, Nonconvex – but primarily nonlinear programming,

    •  Variational inequalities -- Nonlinear Complementarity—differential variational inequalities

    •  Stochastic programming:

    minx f (x)

    s.t. g(x) ≤ 0 ∈K( )h(x) = 0

    x − x( )F(x) ≥ 0,∀x ∈K x ∈K ⊥ F(x)∈ K *

    f x( ) = f xd , x u( ){ }u( ) = φ xd , x u( )( )du∫

  • 1. DIFFERENTIAL VARIATIONAL INEQUALITIES

    Mihai Anitescu, STAT 343, Autumn 12. Not for use outside UC

  • Differential variational inequalities -- DVI

    •  DVIs appear whenever both dynamics and inequalities/ switching appear in model description.

    •  Dense granular flow. The second most-manipulated industrial material after water!

    •  Microstructure evolution (how does fatigue appear in steel)?

    •  Model predictive control: What is the optimal way to control a complex dynamical system such as power grid?

  • DVI formulation

    •  Differential variational inequalities: Mixture of differential equations and variational inequalities.

    •  In the case of complementarity,

  • Dvi Approaches •  Recall, DVI (for C=R+)

    •  Smoothing

    •  Followed by forward Euler.

    Easy to implement!! But Stiff!

    •  I specialize in time-stepping. Can be much faster IF you can solve the subproblem

    x = f t,x t( ),u t( )( );u ≥ 0 ⊥ F t,x t( ),u t( )( ) ≥ 0

    x = f t,x t( ),u t( )( );ui Fi t,x t( ),u t( )( ) = ε , i = 1,2,…nu

    uin Fi t

    n−1,xn−1,un−1( ) = ε , i = 1,2,…nuxn+1 = xn + hf tn ,xn ,un( );

    ( )( )

    1 1 1 1

    1 1 1 1

    , , ;

    0 , , 0

    n n n n n

    n n n n

    x x hf t x u

    u F t x u

    + + + +

    + + + +

    = +

    ≥ ⊥ ≥

  • Granular flow: PBNR •  160’000 Uranium-Graphite

    spheres, 600’000 contacts on average

    •  Two millions of primal variables, six millions of dual variables.

    •  A new convex subproblem approach which is much faster.

    2( ) ( )( ) ( )1 2

    ( ) ( ) ( ) ( ) ( ) ( )1 1 2 2

    1 2

    ( ) ( )

    ( ) ( ) ( ) ( )1 2 1 1 2 2

    ( ) ( )

    0 ( ) 0 1 2

    argminj jj j

    n

    j j j j j jn c

    j … p

    j jn

    j j T j T j

    c

    dvM c n t t f q v k t q vdt

    dq vdtc q j … p

    v t v tµ β β

    β β

    β β β β⎛ ⎞⎜ ⎟⎝ ⎠

    ⎛ ⎞⎜ ⎟⎝ ⎠

    = , , ,

    ⎛ ⎞ ⎛ ⎞ ⎛ ⎞⎜ ⎟ ⎜ ⎟ ⎜ ⎟⎝ ⎠ ⎝ ⎠ ⎝ ⎠≥ +

    = + + + , + , ,

    =

    ≥ ⊥ Φ ≥ , = , , ,

    ⎡ ⎤, = +⎣ ⎦

  • Microstructure Evolution

    8  

    Evolution of phases: e.g defects in solids

  • Applications: •  Data Assimilation (Estimation), Transmission Planning, Power State Forecasting,

    Generation Control, Buildings Optimization, Markets

    DO Large-Scale NLP

    Discretization Data

    Dynamic Optimization (DO)– Model Predictive Control

    •  It is very expensive … but if I write its optimality conditions and look at it as a differential variational inequality, I can approximate it with limited amount of work per step !

  • Outstanding Research Issues

    •  Many applications are “undermodeled” due to “fear of switches” in dynamics, lots to do in modeling. Ice Sheet Modeling, Microstructure Evolution, Hybrid Systems.

    •  What are proper splittings in time that have best stability/accuracy properties?

    •  Convergence theory for PDE-based DVI. •  Efficient solvers for problems whose active set changes often?

    E.g how do we adapt multigrid? •  How do we reuse information optimally from step to step (hot-

    starting). For example, interior-point is notorious for not being able to do that?

    •  Etc.

    Mihai Anitescu, STAT 343, Autumn 12. Not for use outside UC

  • 2. DECISION UNDER UNCERTAINTY : STOCHASTIC OPTIMIZATION

    Mihai Anitescu, STAT 343, Autumn 12. Not for use outside UC

  • A leading paradigm for optimization under uncertainty paradigm: stochastic programming.

    •  Two-stage stochastic programming with recourse (“here-and-now”)

    • 

    Mihai Anitescu - Optimization under uncertainty

    12

    { }0

    0 0 ( ) ) ( ,

    x xMin f x M xin f ω⎡ ⎤+ ⎣ ⎦Es.t. g0 (x0 ) = b0 s.t. h (x ,ω ) = b(ω ) − g (x0 ,ω )

    x0 ≥ 0 x (ω ) ≥ 0

    Minx0 ,x1,x2 ,…,xS

    f 0 (x0 )+1S

    f k (xk )k =1

    S

    ∑g0 (x0 ) = b0gk (x0 ) + hk (xk ) = bk ,

    x0 ≥ 0, xk ≥ 0, k = 1,...,S .

    subj. to. 1 2, , , Sξ ξ ξ…

    Second-stage random data ( )ξ ω

    continuous discrete

    Sampling

    Inference Analysis

    M samples

    Sample average approximation (SAA)

    or bootstrapping

    x0*

    x0N

    x0N − x0

    * ~ ?

  • 2.1 A PERHAPS UNEXPECTED APPLICATION. SCALABLE MAX LIKELIHOOD FOR GAUSSIAN PROCESSES

    Mihai Anitescu, STAT 343, Autumn 12. Not for use outside UC

  • Maximum Likelihood Estimation (MLE)

    •  A family of covariance functions parameterized by θ: φ(x; θ)

    •  Maximize the log-likelihood to estimate θ:

    •  First order optimality: (also known as score equations)

    14

    maxθL(θ ) = log (2π )−n/2 (detK )−1/2 exp(−yTK −1y / 2){ }

    = −12yTK −1y− 1

    2log(detK )− n

    2log2π

    12yTK −1(∂ jK )K

    −1y− 12tr K −1(∂ jK )#$ %&= 0

  • Maximum Likelihood Estimation (MLE)

    The log-det term poses a significant challenge for large-scale computations •  Cholesky of K: Prohibitively expensive! •  log(det K) = tr(log K): Need some matrix function methods to

    handle the log •  No existing method to evaluate the log-det term in sufficient

    accuracy

    15

    maxθ

    − 12yTK −1y− 1

    2log(detK )− n

    2log2π

  • Sample Average Approximation of Maximum Likelihood Estimation (MLE)

    We (Anitescu et al. 2012) consider approximately solving the first order optimality instead: •  A randomized trace estimator tr(A) = E[uTAu]

    –  u has i.i.d. entries taking ±1 with equal probability •  It becomes a stochastic nonlinear equation •  As N tends to infinity, the solution approaches the true estimate •  Numerically, one must solve linear systems with O(N) right-hand

    sides. 16

    12yTK −1(∂ jK )K

    −1y− 12

    tr K −1(∂ jK )#$ %&

    ≈ 12yTK −1(∂ jK )K

    −1y− 12N

    uiT

    i=1

    N

    ∑ K −1(∂ jK )#$ %&ui = 0

  • Convergence of Stochastic Programming - SAA

    •  Let

    •  First result: where

    17

    θ : truth

    θ̂ : sol of 12yTK −1(∂ jK )K

    −1y− 12

    tr K −1(∂ jK )#$ %&= 0

    θ̂ N : sol of F = 12yTK −1(∂ jK )K

    −1y− 12N

    uiT

    i=1

    N

    ∑ K −1(∂ jK )#$ %&ui = 0

    [V N ]−1/2 (θ̂ N −θ̂ ) D" →" standard normal, V N = [J N ]−T ΣN [J N ]−1

    J N =∇F(θ̂ N ) and ΣN = cov{F(θ̂ N )}

  • Simulation: We scale

    •  Truth θ = [7, 10], Matern ν = 1.5

    18

    104 1066

    7

    8

    9

    10

    11

    matrix dimension n

    1

    2

    104 106101

    102

    103

    104

    105

    matrix dimension n

    time

    (sec

    onds

    )

    64x64 grid 2.56 mins func eval: 7

    128x128 grid6.62 mins func eval: 7

    256x256 grid1.1 hours func eval: 8

    512x512 grid2.74 hours func eval: 8

    1024x1024 grid11.7 hours func eval: 8

  • 2.1 Stochastic Programming for GP

    •  Prof. Stein will have some as well •  How do we precondition? •  Are there classes of processes for which we can prove global

    convergence? •  Can we prove global convergence without function values? •  Can we exploit hierarchical structure? •  Can we approximate the matrix-vector multiplication efficiently

    AND maintain positive definiteness? •  (The ScalaGAUSS project).

    Mihai Anitescu, STAT 343, Autumn 12. Not for use outside UC

  • 2.2: DECISION (CONTROL, DESIGN, PLANNING) OF ENERGY SYSTEMS UNDER UNCERTAINTY

    Mihai Anitescu, STAT 343, Autumn 12. Not for use outside UC

  • Ambient Condition Effects in Energy Systems Operation of Energy Systems is Strongly Affected by Ambient Conditions - Power Grid Management: Predict Spatio-Temporal Demands (Douglas, et.al. 1999)

    - Power Plants: Generation levels affected by air humidity and temperature (General Electric)

    - Petrochemical: Heating and Cooling Utilities (ExxonMobil)

    - Buildings: Heating and Cooling Needs (Braun, et.al. 2004)

    - (Focus) Next Generation Energy Systems assume a major renewable energy penetration: Wind + Solar + Fossil (Beyer, et.al. 1999)

    -  Increased reliance on renewables must account for variability of ambient conditions, which cannot be done deterministically …

    -  We must optimize operational and planning decisions accounting for the uncertainty in ambient conditions (and others, e.g. demand)

    -  Optimization Under Uncertainty.

    Wind Power Profiles Mihai Anitescu - Optimization under uncertainty

    21

  • Multifaceted Mathematics for Complex Energy Systems (M2ACS) Project Director: Mihai Anitescu, Argonne National Lab

    22

    Goals: •  By taking a holistic view, develop deep mathematical

    understanding and effective algorithms to remove current bottlenecks in analysis, simulation, and optimization of complex energy systems.

    •  Address the mathematical and computational complexities of analyzing, designing, planning, maintaining, and operating the nation's electrical energy systems and related infrastructure.

    Integrated Novel Mathematics Research: •  Predictive modeling that accounts for uncertainty and

    errors •  Mathematics of decisions that allow hierarchical, data-

    driven and real-time decision making •  Scalable algorithms for optimization and dynamic

    simulation •  Integrative frameworks leveraging model reduction

    and multiscale analysis

    Long-Term DOE Impact: •  Development of new mathematics at the

    intersection of multiple mathematical sub-domains

    •  Addresses a broad class of applications for complex energy systems, such as :

    •  Planning for power grid and related infrastructure

    •  Analysis and design for renewable energy integration

    Team: Argonne National Lab (Lead), Pacific Northwest National Lab, Sandia National Lab, University of Wisconsin, University of Chicago

    Representative decision-making activities and their time scales in electric power systems. Image courtesy of Chris de Marco (U-Wisconsin).

    Multifaceted Mathematics for Complex Energy Systems

  • Stochastic Predictive Control

    Dynamic System Model

    Stochastic

    Optimization

    Weather Model

    Low-Level

    Control

    Forecast

    Energy System

    Set-Points

    Forecast & Uncertainty

    Measurements

    Stochastic NLMPC

    Min

    x0f0 (x0 ) + E Minx f (x,ω )

    ⎡⎣

    ⎤⎦{ }

    subj. to. g0 x0( ) = b0gi x0 ,xi( ) = bi i =1,2…Sx0 ≥ 0, xi ≥ 0

    Two-stage Stoch Prog

    Mihai Anitescu - Optimization under uncertainty

    23

  • Stochastic Unit Commitment with Wind Power

    •  Wind Forecast – WRF(Weather Research and Forecasting) Model –  Real-time grid-nested 24h simulation –  30 samples require 1h on 500 CPUs (Jazz@Argonne)

    Mihai Anitescu - Optimization under uncertainty

    24

    1min COST

    s.t. , ,

    , ,

    ramping constr., min. up/down constr.

    wind

    wind

    p u dsjk jk jk

    s j ks

    sjk kj

    windsjk

    j

    wik ksj

    ndsk

    jjk

    j

    c c cN

    p D s k

    p D R s k

    p

    p

    ∈ ∈ ∈

    ⎛ ⎞= + +⎜ ⎟

    ⎝ ⎠+ = ∈ ∈

    + ≥ + ∈ ∈

    ∑ ∑∑

    ∑ ∑

    ∑ ∑S N T

    N

    N

    N

    N

    S T

    S T

    Zavala & al 2010.

    Thermal Units Schedule? Minimize Cost Satisfy Demand Have a Reserve Dispatch through network

  • 0 24 48 720

    200

    400

    600

    800

    1000

    1200

    Tota

    l P

    ow

    er

    [MW

    ]

    Time [hr]

      Unit commitment & energy dispatch with uncertain wind power generation for the State of Illinois, assuming 20% wind power penetration, using the same windfarm sites as the one existing today.

     Full integration with 10 thermal units to meet demands. Consider dynamics of start-up, shutdown, set-point changes

      The solution is only 1% more expensive then the one with exact information. Solution on average infeasible at 10%.

    wind power

    Mihai Anitescu - Optimization under uncertainty

    25

    Demand Samples Wind

    Thermal

    Wind power forecast and stochastic programming

  • 2.2 Stochastic Programming Challenges

    •  How do we produce efficient confidence intervals for SAA (as the number of samples may be limited)?

    •  How do I solve the sharply increased problem efficiently (we solve problems with 3B variables, but about 10 times slower than we would like)?

    •  How do we deal with integer variables (millions of them ) •  How do we insert economic actors (economic equilibria) when

    we have integer variables? •  Can we use machine learning concepts to reduce decision space? •  How do we achieve stability, low memory and fast convergence? •  ….

    Mihai Anitescu, STAT 343, Autumn 12. Not for use outside UC