
  • Non-monotone Continuous DR-submodular Maximization: Structure and Algorithms

    An Bian, Kfir Y. Levy, Andreas Krause and Joachim M. Buhmann

    DR-submodular (Diminishing Returns) Maximization & Its Applications

    👉 Softmax extension for determinantal point processes (DPPs) [Gillenwater et al '12]

    👉 Mean-field inference for log-submodular models [Djolonga et al '14]

    👉 DR-submodular quadratic programming

    👉 (Generalized submodularity over conic lattices) e.g., logistic regression with a non-convex separable regularizer [Antoniadis et al '11]

    👉 Etc. (see the paper for more)

    Based on the Local-Global Relation, any solver that finds an approximately stationary point can serve as the subroutine, e.g., the non-convex Frank-Wolfe solver of [Lacoste-Julien '16]

    TWO-PHASE ALGORITHM
    Input: stopping tolerances ε₁, ε₂; #iterations K₁, K₂
    𝒙 ← Non-convex-Frank-Wolfe(f, 𝒫, K₁, ε₁)   // Phase I on 𝒫
    𝒬 ← 𝒫 ∩ {𝒚 | 𝒚 ≤ ū − 𝒙}
    𝒛 ← Non-convex-Frank-Wolfe(f, 𝒬, K₂, ε₂)   // Phase II on 𝒬
    Output: argmax{f(𝒙), f(𝒛)}
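As a minimal illustrative sketch of the two phases in Python: here 𝒫 is specialized to a box so the linear maximization oracle is closed-form, a plain Frank-Wolfe loop stands in for the non-convex solver of [Lacoste-Julien '16], and the toy quadratic is a hypothetical instance, not the paper's benchmark.

```python
import numpy as np

def frank_wolfe_box(grad_f, ub, K):
    """Vanilla Frank-Wolfe ascent on the box [0, ub], used as a stand-in
    for the approximate-stationary-point subroutine of each phase."""
    x = np.zeros_like(ub)
    for k in range(K):
        g = grad_f(x)
        v = np.where(g > 0, ub, 0.0)       # LMO: argmax_{0 <= v <= ub} <v, g>
        x = x + (2.0 / (k + 2)) * (v - x)  # convex-combination step keeps x in the box
    return x

# Toy DR-submodular quadratic: entrywise non-positive Hessian H.
rng = np.random.default_rng(0)
n = 5
H = -rng.uniform(0.5, 1.0, size=(n, n)); H = (H + H.T) / 2
h = rng.uniform(0.0, 1.0, size=n)
f = lambda x: 0.5 * x @ H @ x + h @ x
grad = lambda x: H @ x + h
u = np.ones(n)

x = frank_wolfe_box(grad, u, K=100)       # Phase I on P = [0, u]
z = frank_wolfe_box(grad, u - x, K=100)   # Phase II on Q = P ∩ {y : y <= u - x}
best = max(f(x), f(z))                    # output the better of the two points
```

Phase II restricts the search to directions that remain feasible after 𝒙, which is what protects the better of the two points from a bad local solution in Phase I.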

    Underlying Properties of DR-submodular Maximization

    👉 Concavity Along Non-negative Directions:

    Experimental Results (see the paper for more)

    DR-submodular (DR property) [Bian et al '17]: ∀𝒂 ≤ 𝒃 ∈ 𝒳, ∀i, ∀k ∈ ℝ₊, it holds that

    f(k𝒆ᵢ + 𝒂) − f(𝒂) ≥ f(k𝒆ᵢ + 𝒃) − f(𝒃).

    - If f is differentiable, ∇f(·) is an antitone mapping: ∀𝒂 ≤ 𝒃, it holds that ∇f(𝒂) ≥ ∇f(𝒃)

    - If f is twice differentiable, all entries of the Hessian are non-positive: ∇²ᵢⱼ f(𝒙) ≤ 0, ∀𝒙
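These characterizations can be sanity-checked numerically; the quadratic below is a hypothetical example whose (constant) Hessian H is entrywise non-positive, so the DR inequality and the antitone-gradient property must both hold.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
H = -rng.uniform(0.0, 1.0, size=(n, n)); H = (H + H.T) / 2  # entrywise <= 0
h = rng.uniform(0.0, 1.0, size=n)
f = lambda x: 0.5 * x @ H @ x + h @ x
grad = lambda x: H @ x + h

a = rng.uniform(0.0, 0.5, size=n)
b = a + rng.uniform(0.0, 0.5, size=n)   # a <= b coordinate-wise
k, e_i = 0.3, np.eye(n)[2]              # step size k along coordinate i = 2

# DR property: the marginal gain of the same step shrinks at the larger point.
assert f(k * e_i + a) - f(a) >= f(k * e_i + b) - f(b) - 1e-12
# Antitone gradient: grad f(a) >= grad f(b) coordinate-wise.
assert np.all(grad(a) >= grad(b) - 1e-12)
```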

    maxπ’™βˆˆπ’«

    𝑓(𝒙)𝑓: 𝒳 β†’ ℝ is continuous DR-submodular. 𝒳 is a hypercube. Wlog, let 𝒳 = 𝟎, 𝒖0 . 𝒫 βŠ† 𝒳 is convex and down-closed: 𝒙 ∈ 𝒫 & 𝟎 ≀ π’š ≀ 𝒙 implies π’š ∈ 𝒫.


    References

    Feldman, Naor, and Schwartz. A unified continuous greedy algorithm for submodular maximization. FOCS 2011.

    Gillenwater, Kulesza, and Taskar. Near-optimal MAP inference for determinantal point processes. NIPS 2012.

    Bach. Submodular functions: from discrete to continuous domains. arXiv:1511.00394, 2015.

    Lacoste-Julien. Convergence rate of Frank-Wolfe for non-convex objectives. arXiv:1607.00345, 2016.

    Bian, Mirzasoleiman, Buhmann, and Krause. Guaranteed non-convex optimization: Submodular maximization over continuous domains. AISTATS 2017.

    Quadratic Lower Bound. With an L-Lipschitz gradient, for all 𝒙 and 𝒗 ∈ ±ℝⁿ₊, it holds that f(𝒙 + 𝒗) ≥ f(𝒙) + ⟨∇f(𝒙), 𝒗⟩ − (L/2)‖𝒗‖²

    Strongly DR-submodular & Quadratic Upper Bound. f is μ-strongly DR-submodular if for all 𝒙 and 𝒗 ∈ ±ℝⁿ₊, it holds that

    f(𝒙 + 𝒗) ≤ f(𝒙) + ⟨∇f(𝒙), 𝒗⟩ − (μ/2)‖𝒗‖²

    Two Guaranteed Algorithms

    Guarantee of TWO-PHASE ALGORITHM.

    max{f(𝒙), f(𝒛)} ≥ (1/4)[f(𝒙*) − max{2h₁/√(K₁+1), ε₁} − max{2h₂/√(K₂+1), ε₂}] + (μ/4)(‖𝒙 − 𝒙*‖² + ‖𝒛 − 𝒛*‖²),

    where 𝒛* := 𝒙 ∨ 𝒙* − 𝒙, and h₁, h₂ are the initial-suboptimality constants of the non-convex Frank-Wolfe subroutine [Lacoste-Julien '16] in the two phases

    NON-MONOTONE FRANK-WOLFE VARIANT
    Input: step size γ ∈ (0, 1]
    𝒙⁽⁰⁾ ← 𝟎, k ← 0, t⁽⁰⁾ ← 0   // t: cumulative step size
    While t⁽ᵏ⁾ < 1 do:
      𝒗⁽ᵏ⁾ ← argmax_{𝒗∈𝒫, 𝒗 ≤ ū − 𝒙⁽ᵏ⁾} ⟨𝒗, ∇f(𝒙⁽ᵏ⁾)⟩   // shrunken LMO
      γₖ ← min{γ, 1 − t⁽ᵏ⁾}
      𝒙⁽ᵏ⁺¹⁾ ← 𝒙⁽ᵏ⁾ + γₖ𝒗⁽ᵏ⁾,  t⁽ᵏ⁺¹⁾ ← t⁽ᵏ⁾ + γₖ,  k ← k + 1
    Output: 𝒙⁽ᴷ⁾
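A minimal Python sketch of this variant, again specializing 𝒫 to a box so the shrunken LMO has a closed form — an assumption for illustration, not the paper's general setting.

```python
import numpy as np

def non_monotone_fw(grad_f, ub, gamma=0.05):
    """Non-monotone Frank-Wolfe variant on P = [0, ub]: the LMO is shrunken
    to {v in P : v <= ub - x}, and the cumulative step size t grows to 1."""
    x = np.zeros_like(ub)
    t = 0.0
    while t < 1.0:
        g = grad_f(x)
        cap = ub - x                    # shrunken feasible region for v
        v = np.where(g > 0, cap, 0.0)   # LMO: argmax_{0 <= v <= cap} <v, g>
        step = min(gamma, 1.0 - t)      # never overshoot the cumulative budget of 1
        x = x + step * v
        t += step
    return x

# Run on the same style of toy DR-submodular quadratic as above.
rng = np.random.default_rng(3)
n = 5
H = -rng.uniform(0.5, 1.0, size=(n, n)); H = (H + H.T) / 2
h = rng.uniform(0.0, 1.0, size=n)
x_out = non_monotone_fw(lambda x: H @ x + h, np.ones(n))
```

Because each 𝒗⁽ᵏ⁾ satisfies 𝒗 ≤ ū − 𝒙⁽ᵏ⁾ and the step sizes sum to 1, the iterates can never leave [𝟎, ū], which is exactly what the shrunken LMO buys.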

    Guarantee of NON-MONOTONE FRANK-WOLFE VARIANT.

    f(𝒙⁽ᴷ⁾) ≥ e⁻¹f(𝒙*) − O(1/K²)f(𝒙*) − LD²/(2K)

    Baselines:
    - QUADPROGIP: global solver for non-convex quadratic programming (possibly in exponential time)
    - Projected gradient ascent (PROJGRAD) with diminishing step sizes 1/(k+1)

    DR-submodular Quadratic Programming. Synthetic problem instances f(𝒙) = ½𝒙ᵀ𝐇𝒙 + 𝒉ᵀ𝒙 + c, where 𝒫 = {𝒙 ∈ ℝⁿ₊ | 𝐀𝒙 ≤ 𝒃, 𝒙 ≤ ū}, with 𝐀 ∈ ℝ₊^{m×n} and 𝒃 ∈ ℝᵐ₊, has m linear constraints.

    Instances are randomly generated in two ways: 1) uniform distribution (see figures below); 2) exponential distribution
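A hedged generator for such instances — the distributions and the choices of 𝒉 and ū below are illustrative stand-ins; the paper's exact settings may differ.

```python
import numpy as np

def random_dr_quadratic(n, m, rng):
    """Sample one synthetic DR-submodular QP instance:
    H entrywise non-positive (so f is DR-submodular), A and b non-negative."""
    H = -rng.uniform(0.0, 1.0, size=(n, n)); H = (H + H.T) / 2
    A = rng.uniform(0.0, 1.0, size=(m, n))
    b = np.ones(m)
    # Tightest box upper bound implied by Ax <= b (one common choice).
    ub = np.min(b[:, None] / np.maximum(A, 1e-12), axis=0)
    # Illustrative h: grad f(0) = h >= 0 but grad f(ub) = 0.9 H ub <= 0,
    # so f rises near 0 and falls near ub, i.e. the objective is non-monotone.
    h = -0.1 * H @ ub
    f = lambda x: 0.5 * x @ H @ x + h @ x
    return f, A, b, ub

f, A, b, ub = random_dr_quadratic(n=10, m=5, rng=np.random.default_rng(4))
```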

    Maximizing Softmax Extensions for MAP Inference of DPPs. f(𝒙) = log det(diag(𝒙)(𝐋 − 𝐈) + 𝐈), 𝒙 ∈ [0,1]ⁿ,

    where 𝐋 is the kernel/similarity matrix. 𝒫 is a matching polytope for matched summarization.

    Synthetic problem instances:
    - Softmax objectives: generate 𝐋 with n random eigenvalues
    - Polytope constraints generated similarly to those for quadratic programming
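The softmax extension itself is straightforward to evaluate; a sketch, where the kernel 𝐋 below is a random PSD matrix with prescribed eigenvalues, mirroring the "random eigenvalues" synthetic setup.

```python
import numpy as np

def softmax_extension(x, L):
    """f(x) = log det(diag(x)(L - I) + I). Note diag(x)(L - I) + I equals
    diag(x) L + diag(1 - x), a non-negative mix of principal minors of L,
    so the determinant is positive for PSD L and x in [0, 1]^n."""
    n = L.shape[0]
    M = np.diag(x) @ (L - np.eye(n)) + np.eye(n)
    return np.linalg.slogdet(M)[1]

# Random PSD kernel with random eigenvalues in [0.5, 2].
rng = np.random.default_rng(5)
n = 6
Q, _ = np.linalg.qr(rng.normal(size=(n, n)))
L = Q @ np.diag(rng.uniform(0.5, 2.0, size=n)) @ Q.T

print(softmax_extension(np.zeros(n), L))  # -> 0.0, since det(I) = 1
```

At a binary 𝒙 the extension recovers log det of the corresponding principal submatrix of 𝐋, which is what makes it a continuous surrogate for DPP MAP inference.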

    Real-world results on matched summarization: select a set of document pairs out of a corpus, such that the two documents within a pair are similar and the overall set of pairs is as diverse as possible. In a setting similar to [Gillenwater et al '12], we experimented on the 2012 US Republican debates data.

    [Figures: function value vs. match quality controller (0.2 to 1) on the debates data; function value vs. iteration (0 to 100) for the compared solvers. Diagram: the relation among the submodular, concave, convex, and DR-submodular function classes.]

    👉 Approximately Stationary Points & Global Optimum:

    (Local-Global Relation). Let 𝒙 ∈ 𝒫 with non-stationarity g_𝒫(𝒙). Define 𝒬 := 𝒫 ∩ {𝒚 | 𝒚 ≤ ū − 𝒙}. Let 𝒛 ∈ 𝒬 with non-stationarity g_𝒬(𝒛). Then

    max{f(𝒙), f(𝒛)} ≥ (1/4)[f(𝒙*) − g_𝒫(𝒙) − g_𝒬(𝒛)] + (μ/4)(‖𝒙 − 𝒙*‖² + ‖𝒛 − 𝒛*‖²),

    where 𝒛* := 𝒙 ∨ 𝒙* − 𝒙.

    - Proved using the essential DR property on carefully constructed auxiliary points

    - Explains the good empirical performance of the Two-Phase algorithm: if 𝒙 is far from 𝒙*, the term ‖𝒙 − 𝒙*‖² strengthens the bound; if 𝒙 is close to 𝒙*, then by the smoothness of f, 𝒙 should already be near-optimal.

    DR-submodularity captures a subclass of non-convex/non-concave functions that admits exact minimization and approximate maximization in polynomial time.

    👉 We investigate geometric properties underlying such objectives; e.g., we prove a strong relation between (approximately) stationary points and the global optimum.

    👉 We devise two guaranteed algorithms: i) a "two-phase" algorithm with a 1/4 approximation guarantee; ii) a non-monotone Frank-Wolfe variant with a 1/e approximation guarantee.

    👉 We extend the results to a much broader class of submodular functions over "conic" lattices.


    [Figure: approximation ratio vs. dimensionality (8 to 16) under m = 0.5n, m = n, and m = 1.5n linear constraints.]

    [Figure: function value vs. dimensionality (8 to 16) under m = 0.5n, m = n, and m = 1.5n linear constraints.]

    The shrunken LMO is the key difference from the monotone Frank-Wolfe variant of [Bian et al '17]

    Lemma. For any 𝒙, 𝒚: ⟨𝒚 − 𝒙, ∇f(𝒙)⟩ ≥ f(𝒙 ∨ 𝒚) + f(𝒙 ∧ 𝒚) − 2f(𝒙) + (μ/2)‖𝒙 − 𝒚‖²

    If ∇f(𝒙) = 0, then 2f(𝒙) ≥ f(𝒙 ∨ 𝒚) + f(𝒙 ∧ 𝒚) + (μ/2)‖𝒙 − 𝒚‖² → an implicit relation between 𝒙 and 𝒚. (Finding an exact stationary point is difficult.)

    Non-stationarity Measure [Lacoste-Julien '16]. For any 𝒬 ⊆ 𝒳, the non-stationarity of 𝒙 ∈ 𝒬 is g_𝒬(𝒙) := max_{𝒗∈𝒬} ⟨𝒗 − 𝒙, ∇f(𝒙)⟩
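When 𝒬 is a box [𝟎, ub], this measure has a closed form because the inner maximization is separable; a sketch under that assumption (a general polytope 𝒬 would need an LP instead).

```python
import numpy as np

def non_stationarity_box(x, g, ub):
    """g_Q(x) = max_{v in [0, ub]} <v - x, g>: the maximizing v puts ub_i
    wherever g_i > 0 and 0 elsewhere, so the max is computed coordinate-wise."""
    v = np.where(g > 0, ub, 0.0)
    return float((v - x) @ g)

# The measure is always >= 0 (v = x is itself feasible) and equals 0
# exactly at stationary points of f over the box.
rng = np.random.default_rng(6)
ub = np.ones(4)
x = rng.uniform(0.0, 1.0, size=4)
g = rng.normal(size=4)       # a stand-in for grad f(x)
val = non_stationarity_box(x, g, ub)
```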

    (∨: coordinate-wise maximum; ∧: coordinate-wise minimum)

    D: diameter of 𝒫; L: Lipschitz constant of ∇f

    Softmax (red) and multilinear (blue) extensions, with concave cross-sections. Figure from [Gillenwater et al '12]