
  • Non-monotone Continuous DR-submodular Maximization: Structure and Algorithms

    An Bian, Kfir Y. Levy, Andreas Krause and Joachim M. Buhmann

    DR-submodular (Diminishing Returns) Maximization & Its Applications

    👉 Softmax extension for determinantal point processes (DPPs) [Gillenwater et al '12]

    👉 Mean-field inference for log-submodular models [Djolonga et al '14]

    👉 DR-submodular quadratic programming

    👉 (Generalized submodularity over conic lattices) e.g., logistic regression with a non-convex separable regularizer [Antoniadis et al '11]

    👉 Etc. (see the paper for more)

    Based on the Local-Global Relation, any solver that finds an approximately stationary point can serve as the subroutine, e.g., the non-convex Frank-Wolfe solver of [Lacoste-Julien '16]

    TWO-PHASE ALGORITHM
    Input: stopping tolerances ε₁, ε₂; #iterations K₁, K₂
    𝒙 ← Non-convex-Frank-Wolfe(f, 𝒫, K₁, ε₁)   // Phase I on 𝒫
    𝒬 ← 𝒫 ∩ {𝒚 | 𝒚 ≤ ū − 𝒙}
    𝒛 ← Non-convex-Frank-Wolfe(f, 𝒬, K₂, ε₂)   // Phase II on 𝒬
    Output: argmax{f(𝒙), f(𝒛)}
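As a minimal illustrative sketch of the two phases in Python: here 𝒫 is specialized to a box so the linear maximization oracle is closed-form, a plain Frank-Wolfe loop stands in for the non-convex solver of [Lacoste-Julien '16], and the toy quadratic is a hypothetical instance, not the paper's benchmark.

```python
import numpy as np

def frank_wolfe_box(grad_f, ub, K):
    """Vanilla Frank-Wolfe ascent on the box [0, ub], used as a stand-in
    for the approximate-stationary-point subroutine of each phase."""
    x = np.zeros_like(ub)
    for k in range(K):
        g = grad_f(x)
        v = np.where(g > 0, ub, 0.0)       # LMO: argmax_{0 <= v <= ub} <v, g>
        x = x + (2.0 / (k + 2)) * (v - x)  # convex-combination step keeps x in the box
    return x

# Toy DR-submodular quadratic: entrywise non-positive Hessian H.
rng = np.random.default_rng(0)
n = 5
H = -rng.uniform(0.5, 1.0, size=(n, n)); H = (H + H.T) / 2
h = rng.uniform(0.0, 1.0, size=n)
f = lambda x: 0.5 * x @ H @ x + h @ x
grad = lambda x: H @ x + h
u = np.ones(n)

x = frank_wolfe_box(grad, u, K=100)       # Phase I on P = [0, u]
z = frank_wolfe_box(grad, u - x, K=100)   # Phase II on Q = P ∩ {y : y <= u - x}
best = max(f(x), f(z))                    # output the better of the two points
```

Phase II restricts the search to directions that remain feasible after 𝒙, which is what protects the better of the two points from a bad local solution in Phase I.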

    Underlying Properties of DR-submodular Maximization

    👉 Concavity Along Non-negative Directions:

    Experimental Results (see the paper for more)

    DR-submodular (DR property) [Bian et al '17]: ∀𝒂 ≤ 𝒃 ∈ 𝒳, ∀i, ∀k ∈ ℝ₊, it holds that

    f(k𝒆ᵢ + 𝒂) − f(𝒂) ≥ f(k𝒆ᵢ + 𝒃) − f(𝒃).

    - If f is differentiable, ∇f(·) is an antitone mapping: ∀𝒂 ≤ 𝒃, it holds that ∇f(𝒂) ≥ ∇f(𝒃)

    - If f is twice differentiable, all entries of the Hessian are non-positive: ∇²ᵢⱼ f(𝒙) ≤ 0, ∀𝒙
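These characterizations can be sanity-checked numerically; the quadratic below is a hypothetical example whose (constant) Hessian H is entrywise non-positive, so the DR inequality and the antitone-gradient property must both hold.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
H = -rng.uniform(0.0, 1.0, size=(n, n)); H = (H + H.T) / 2  # entrywise <= 0
h = rng.uniform(0.0, 1.0, size=n)
f = lambda x: 0.5 * x @ H @ x + h @ x
grad = lambda x: H @ x + h

a = rng.uniform(0.0, 0.5, size=n)
b = a + rng.uniform(0.0, 0.5, size=n)   # a <= b coordinate-wise
k, e_i = 0.3, np.eye(n)[2]              # step size k along coordinate i = 2

# DR property: the marginal gain of the same step shrinks at the larger point.
assert f(k * e_i + a) - f(a) >= f(k * e_i + b) - f(b) - 1e-12
# Antitone gradient: grad f(a) >= grad f(b) coordinate-wise.
assert np.all(grad(a) >= grad(b) - 1e-12)
```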

    maxπ’™βˆˆπ’«

    𝑓(𝒙)𝑓: 𝒳 β†’ ℝ is continuous DR-submodular. 𝒳 is a hypercube. Wlog, let 𝒳 = 𝟎, 𝒖0 . 𝒫 βŠ† 𝒳 is convex and down-closed: 𝒙 ∈ 𝒫 & 𝟎 ≀ π’š ≀ 𝒙 implies π’š ∈ 𝒫.


    References

    Feldman, Naor, and Schwartz. A unified continuous greedy algorithm for submodular maximization. FOCS 2011.

    Gillenwater, Kulesza, and Taskar. Near-optimal MAP inference for determinantal point processes. NIPS 2012.

    Bach. Submodular functions: from discrete to continuous domains. arXiv:1511.00394, 2015.

    Lacoste-Julien. Convergence rate of Frank-Wolfe for non-convex objectives. arXiv:1607.00345, 2016.

    Bian, Mirzasoleiman, Buhmann, and Krause. Guaranteed non-convex optimization: Submodular maximization over continuous domains. AISTATS 2017.

    Quadratic Lower Bound. With an L-Lipschitz gradient, for all 𝒙 and 𝒗 ∈ ±ℝⁿ₊, it holds that f(𝒙 + 𝒗) ≥ f(𝒙) + ⟨∇f(𝒙), 𝒗⟩ − (L/2)‖𝒗‖²

    Strongly DR-submodular & Quadratic Upper Bound. f is μ-strongly DR-submodular if for all 𝒙 and 𝒗 ∈ ±ℝⁿ₊, it holds that

    f(𝒙 + 𝒗) ≤ f(𝒙) + ⟨∇f(𝒙), 𝒗⟩ − (μ/2)‖𝒗‖²

    Two Guaranteed Algorithms

    Guarantee of TWO-PHASE ALGORITHM.

    max{f(𝒙), f(𝒛)} ≥ (1/4)[f(𝒙*) − max{2h₁/√(K₁+1), ε₁} − max{2h₂/√(K₂+1), ε₂}] + (μ/4)(‖𝒙 − 𝒙*‖² + ‖𝒛 − 𝒛*‖²),

    where 𝒛* := 𝒙 ∨ 𝒙* − 𝒙, and h₁, h₂ are the initial-suboptimality constants of the non-convex Frank-Wolfe subroutine [Lacoste-Julien '16] in the two phases

    NON-MONOTONE FRANK-WOLFE VARIANT
    Input: step size γ ∈ (0, 1]
    𝒙⁽⁰⁾ ← 𝟎, k ← 0, t⁽⁰⁾ ← 0   // t: cumulative step size
    While t⁽ᵏ⁾ < 1 do:
      𝒗⁽ᵏ⁾ ← argmax_{𝒗∈𝒫, 𝒗 ≤ ū − 𝒙⁽ᵏ⁾} ⟨𝒗, ∇f(𝒙⁽ᵏ⁾)⟩   // shrunken LMO
      γₖ ← min{γ, 1 − t⁽ᵏ⁾}
      𝒙⁽ᵏ⁺¹⁾ ← 𝒙⁽ᵏ⁾ + γₖ𝒗⁽ᵏ⁾,  t⁽ᵏ⁺¹⁾ ← t⁽ᵏ⁾ + γₖ,  k ← k + 1
    Output: 𝒙⁽ᴷ⁾
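A minimal Python sketch of this variant, again specializing 𝒫 to a box so the shrunken LMO has a closed form — an assumption for illustration, not the paper's general setting.

```python
import numpy as np

def non_monotone_fw(grad_f, ub, gamma=0.05):
    """Non-monotone Frank-Wolfe variant on P = [0, ub]: the LMO is shrunken
    to {v in P : v <= ub - x}, and the cumulative step size t grows to 1."""
    x = np.zeros_like(ub)
    t = 0.0
    while t < 1.0:
        g = grad_f(x)
        cap = ub - x                    # shrunken feasible region for v
        v = np.where(g > 0, cap, 0.0)   # LMO: argmax_{0 <= v <= cap} <v, g>
        step = min(gamma, 1.0 - t)      # never overshoot the cumulative budget of 1
        x = x + step * v
        t += step
    return x

# Run on the same style of toy DR-submodular quadratic as above.
rng = np.random.default_rng(3)
n = 5
H = -rng.uniform(0.5, 1.0, size=(n, n)); H = (H + H.T) / 2
h = rng.uniform(0.0, 1.0, size=n)
x_out = non_monotone_fw(lambda x: H @ x + h, np.ones(n))
```

Because each 𝒗⁽ᵏ⁾ satisfies 𝒗 ≤ ū − 𝒙⁽ᵏ⁾ and the step sizes sum to 1, the iterates can never leave [𝟎, ū], which is exactly what the shrunken LMO buys.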

    Guarantee of NON-MONOTONE FRANK-WOLFE VARIANT.

    f(𝒙⁽ᴷ⁾) ≥ e⁻¹f(𝒙*) − O(1/K²)f(𝒙*) − LD²/(2K)

    Baselines:
    - QUADPROGIP: global solver for non-convex quadratic programming (possibly in exponential time)
    - Projected gradient ascent (PROJGRAD) with diminishing step sizes 1/(k+1)

    DR-submodular Quadratic Programming. Synthetic problem instances f(𝒙) = ½𝒙ᵀ𝐇𝒙 + 𝒉ᵀ𝒙 + c, where 𝒫 = {𝒙 ∈ ℝⁿ₊ | 𝐀𝒙 ≤ 𝒃, 𝒙 ≤ ū}, with 𝐀 ∈ ℝ₊^{m×n} and 𝒃 ∈ ℝᵐ₊, has m linear constraints.

    Instances are randomly generated in two ways: 1) uniform distribution (see figures below); 2) exponential distribution
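A hedged generator for such instances — the distributions and the choices of 𝒉 and ū below are illustrative stand-ins; the paper's exact settings may differ.

```python
import numpy as np

def random_dr_quadratic(n, m, rng):
    """Sample one synthetic DR-submodular QP instance:
    H entrywise non-positive (so f is DR-submodular), A and b non-negative."""
    H = -rng.uniform(0.0, 1.0, size=(n, n)); H = (H + H.T) / 2
    A = rng.uniform(0.0, 1.0, size=(m, n))
    b = np.ones(m)
    # Tightest box upper bound implied by Ax <= b (one common choice).
    ub = np.min(b[:, None] / np.maximum(A, 1e-12), axis=0)
    # Illustrative h: grad f(0) = h >= 0 but grad f(ub) = 0.9 H ub <= 0,
    # so f rises near 0 and falls near ub, i.e. the objective is non-monotone.
    h = -0.1 * H @ ub
    f = lambda x: 0.5 * x @ H @ x + h @ x
    return f, A, b, ub

f, A, b, ub = random_dr_quadratic(n=10, m=5, rng=np.random.default_rng(4))
```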

    Maximizing Softmax Extensions for MAP Inference of DPPs. f(𝒙) = log det(diag(𝒙)(𝐋 − 𝐈) + 𝐈), 𝒙 ∈ [0,1]ⁿ,

    where 𝐋 is the kernel/similarity matrix. 𝒫 is a matching polytope for matched summarization.

    Synthetic problem instances:
    - Softmax objectives: generate 𝐋 with n random eigenvalues
    - Polytope constraints generated similarly to those for quadratic programming
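The softmax extension itself is straightforward to evaluate; a sketch, where the kernel 𝐋 below is a random PSD matrix with prescribed eigenvalues, mirroring the "random eigenvalues" synthetic setup.

```python
import numpy as np

def softmax_extension(x, L):
    """f(x) = log det(diag(x)(L - I) + I). Note diag(x)(L - I) + I equals
    diag(x) L + diag(1 - x), a non-negative mix of principal minors of L,
    so the determinant is positive for PSD L and x in [0, 1]^n."""
    n = L.shape[0]
    M = np.diag(x) @ (L - np.eye(n)) + np.eye(n)
    return np.linalg.slogdet(M)[1]

# Random PSD kernel with random eigenvalues in [0.5, 2].
rng = np.random.default_rng(5)
n = 6
Q, _ = np.linalg.qr(rng.normal(size=(n, n)))
L = Q @ np.diag(rng.uniform(0.5, 2.0, size=n)) @ Q.T

print(softmax_extension(np.zeros(n), L))  # -> 0.0, since det(I) = 1
```

At a binary 𝒙 the extension recovers log det of the corresponding principal submatrix of 𝐋, which is what makes it a continuous surrogate for DPP MAP inference.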

    Real-world results on matched summarization: select a set of document pairs out of a corpus, such that the two documents within a pair are similar and the overall set of pairs is as diverse as possible. In a setting similar to [Gillenwater et al '12], we experimented on the 2012 US Republican debates data.

    [Figures: function value vs. match quality controller (0.2 to 1) on the debates data; function value vs. iteration (0 to 100) for the compared solvers. Diagram: the relation among the submodular, concave, convex, and DR-submodular function classes.]

    👉 Approximately Stationary Points & Global Optimum:

    (Local-Global Relation). Let 𝒙 ∈ 𝒫 with non-stationarity g_𝒫(𝒙). Define 𝒬 := 𝒫 ∩ {𝒚 | 𝒚 ≤ ū − 𝒙}. Let 𝒛 ∈ 𝒬 with non-stationarity g_𝒬(𝒛). Then

    max{f(𝒙), f(𝒛)} ≥ (1/4)[f(𝒙*) − g_𝒫(𝒙) − g_𝒬(𝒛)] + (μ/4)(‖𝒙 − 𝒙*‖² + ‖𝒛 − 𝒛*‖²),

    where 𝒛* := 𝒙 ∨ 𝒙* − 𝒙.

    - Proved using the essential DR property on carefully constructed auxiliary points

    - Explains the good empirical performance of the Two-Phase algorithm: if 𝒙 is far from 𝒙*, the term ‖𝒙 − 𝒙*‖² strengthens the bound; if 𝒙 is close to 𝒙*, then by the smoothness of f, 𝒙 should already be near-optimal.

    DR-submodularity captures a subclass of non-convex/non-concave functions that admits exact minimization and approximate maximization in polynomial time.

    👉 We investigate geometric properties underlying such objectives; e.g., we prove a strong relation between (approximately) stationary points and the global optimum.

    👉 We devise two guaranteed algorithms: i) a "two-phase" algorithm with a 1/4 approximation guarantee; ii) a non-monotone Frank-Wolfe variant with a 1/e approximation guarantee.

    👉 We extend the results to a much broader class of submodular functions over "conic" lattices.


    [Figure: approximation ratio vs. dimensionality (8 to 16) under m = 0.5n, m = n, and m = 1.5n linear constraints.]

    [Figure: function value vs. dimensionality (8 to 16) under m = 0.5n, m = n, and m = 1.5n linear constraints.]

    The shrunken LMO is the key difference from the monotone Frank-Wolfe variant of [Bian et al '17]

    Lemma. For any 𝒙, 𝒚: ⟨𝒚 − 𝒙, ∇f(𝒙)⟩ ≥ f(𝒙 ∨ 𝒚) + f(𝒙 ∧ 𝒚) − 2f(𝒙) + (μ/2)‖𝒙 − 𝒚‖²

    If ∇f(𝒙) = 0, then 2f(𝒙) ≥ f(𝒙 ∨ 𝒚) + f(𝒙 ∧ 𝒚) + (μ/2)‖𝒙 − 𝒚‖² → an implicit relation between 𝒙 and 𝒚. (Finding an exact stationary point is difficult.)

    Non-stationarity Measure [Lacoste-Julien '16]. For any 𝒬 ⊆ 𝒳, the non-stationarity of 𝒙 ∈ 𝒬 is g_𝒬(𝒙) := max_{𝒗∈𝒬} ⟨𝒗 − 𝒙, ∇f(𝒙)⟩
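When 𝒬 is a box [𝟎, ub], this measure has a closed form because the inner maximization is separable; a sketch under that assumption (a general polytope 𝒬 would need an LP instead).

```python
import numpy as np

def non_stationarity_box(x, g, ub):
    """g_Q(x) = max_{v in [0, ub]} <v - x, g>: the maximizing v puts ub_i
    wherever g_i > 0 and 0 elsewhere, so the max is computed coordinate-wise."""
    v = np.where(g > 0, ub, 0.0)
    return float((v - x) @ g)

# The measure is always >= 0 (v = x is itself feasible) and equals 0
# exactly at stationary points of f over the box.
rng = np.random.default_rng(6)
ub = np.ones(4)
x = rng.uniform(0.0, 1.0, size=4)
g = rng.normal(size=4)       # a stand-in for grad f(x)
val = non_stationarity_box(x, g, ub)
```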

    (∨: coordinate-wise maximum; ∧: coordinate-wise minimum)

    D: diameter of 𝒫; L: Lipschitz constant of ∇f

    Softmax (red) and multilinear (blue) extensions, with concave cross-sections. Figure from [Gillenwater et al '12]