Page 1:

Learning and Testing Submodular Functions

Grigory Yaroslavtsev (http://grigory.us)

Slides at http://grigory.us/cis625/lecture3.pdf

CIS 625: Computational Learning Theory

Page 2:

Submodularity

• Discrete analog of convexity/concavity, "law of diminishing returns"
• Applications: combinatorial optimization, AGT, etc.

Let $f: 2^X \to \mathbb{R}$.
• Discrete derivative: $\partial_x f(S) = f(S \cup \{x\}) - f(S)$ for $x \in X$, $S \subseteq X$
• Submodular function: $\partial_x f(S) \ge \partial_x f(T)$ for all $S \subseteq T$ and $x \notin T$
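To make the definition concrete, here is a minimal brute-force checker (a sketch added for this transcript, not from the slides); the coverage-function example at the bottom is a standard illustration and its names are hypothetical:

```python
from itertools import combinations

def subsets(xs):
    """All subsets of xs, as frozensets."""
    return [frozenset(c) for r in range(len(xs) + 1) for c in combinations(xs, r)]

def derivative(f, x, S):
    """Discrete derivative: d_x f(S) = f(S + {x}) - f(S)."""
    return f(S | {x}) - f(S)

def is_submodular(f, X):
    """Check diminishing returns: d_x f(S) >= d_x f(T) for all S <= T, x outside T."""
    return all(derivative(f, x, S) >= derivative(f, x, T)
               for S in subsets(X) for T in subsets(X) if S <= T
               for x in set(X) - set(T))

# Example: a coverage function f(S) = |union of the sets indexed by S| is submodular.
sets = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"c"}}
f = lambda S: len(set().union(*(sets[i] for i in S))) if S else 0
print(is_submodular(f, [1, 2, 3]))  # True
```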

Page 3:

Approximating everywhere

• Q1: Approximate a submodular $f$ on all arguments with only poly(|X|) queries?
• A1: Only $\tilde{O}(\sqrt{|X|})$-approximation (multiplicative) possible [Goemans, Harvey, Iwata, Mirrokni, SODA'09]
• Q2: Approximate $f$ on a $1-\epsilon$ fraction of arguments (PAC-style learning with membership queries under uniform distribution)?

$\Pr_{\text{randomness of } \mathbf{A}}\left[\,\Pr_{\mathbf{S} \sim U(2^X)}\left[\mathbf{A}(\mathbf{S}) = \mathbf{f}(\mathbf{S})\right] \ge 1-\epsilon\,\right] \ge \frac{1}{2}$

• A2: Almost as hard [Balcan, Harvey, STOC'11]

Page 4:

Approximate learning

• PMAC-learning (multiplicative), with poly(|X|) queries: [Balcan, Harvey '11]
• PAAC-learning (additive):

$\Pr_{\text{rand. of } \mathbf{A}}\left[\,\Pr_{\mathbf{S} \sim U(2^X)}\left[\,|\mathbf{f}(\mathbf{S}) - \mathbf{A}(\mathbf{S})| \le \boldsymbol{\beta}\,\right] \ge 1-\epsilon\,\right] \ge \frac{1}{2}$

  – Running time: [Gupta, Hardt, Roth, Ullman, STOC'11]
  – Running time: poly [Cheraghchi, Klivans, Kothari, Lee, SODA'12]
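As a sanity check on this definition, the inner probability can be estimated empirically by sampling uniform subsets; a minimal sketch (added here, not from the slides; the toy $f$ and hypothesis $A$ are hypothetical):

```python
import random

def paac_accuracy(f, A, n, beta, samples=10_000, seed=0):
    """Estimate Pr_{S ~ U(2^X)}[ |f(S) - A(S)| <= beta ] by sampling uniform subsets."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        S = frozenset(i for i in range(n) if rng.random() < 0.5)  # uniform S from 2^X
        hits += abs(f(S) - A(S)) <= beta
    return hits / samples

# Toy example: approximating f(S) = |S| on X = {0,...,9} by the constant 5
# succeeds with beta = 5 on every argument, so the estimate is 1.0.
print(paac_accuracy(len, lambda S: 5, n=10, beta=5))
```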

Page 5:

Learning

| Reference                         | Learning                       | Time      | Extra features               |
| Goemans, Harvey, Iwata, Mirrokni  | approximation everywhere       | poly(|X|) |                              |
| Balcan, Harvey                    | PMAC (multiplicative)          | poly(|X|) | under arbitrary distribution |
| Gupta, Hardt, Roth, Ullman        | PAAC (additive)                |           | tolerant queries             |
| Cheraghchi, Klivans, Kothari, Lee | PAAC (additive)                |           | SQ-queries, agnostic         |
| Raskhodnikova, Y.                 | PAC (bounded integral range R) |           | polylog(|X|) queries         |

Page 6:

Learning: Bigger picture

[Diagram of valuation-function classes, from Badanidiyuru, Dobzinski, Fu, Kleinberg, Nisan, Roughgarden, SODA'12: Subadditive ⊇ XOS = Fractionally subadditive ⊇ Submodular ⊇ Gross substitutes ⊇ OXS ⊇ Additive (linear); Coverage (valuations) ⊆ Submodular]

Other positive results:
• Learning valuation functions [Balcan, Constantin, Iwata, Wang, COLT'12]
• PMAC-learning (sketching) coverage functions [BDFKNR'12]
• PMAC-learning Lipschitz submodular functions [BH'10] (concentration around average via Talagrand)

Page 7:

Discrete convexity

• Monotone convex: [plot over {1, 2, 3, …, n}, values 0–8: the function changes only on the first ≤ R points]
• Convex: [plot over {1, 2, 3, …, n}, values 0–8: the function changes only on the first ≤ R and the last ≥ n−R points]

Page 8:

Discrete submodularity

• Monotone submodular: [diagram: Boolean lattice from ∅ to X; values determined by sets with |S| ≤ R]
• Submodular: [diagram: values determined by sets with |S| ≤ R or |S| ≥ |X| − R]
• Case study: R = 1 (Boolean submodular functions). Monotone submodular = disjunction (monomial); submodular = 2-term CNF.
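The monotone half of this case study can be verified exhaustively for small n; a brute-force sketch (added for this transcript) checks that the Boolean monotone submodular functions on 3 ground elements are exactly the disjunctions plus the constant 1:

```python
from itertools import combinations, product

n = 3
ground = list(range(n))
all_sets = [frozenset(c) for r in range(n + 1) for c in combinations(ground, r)]

def monotone(f):
    return all(f[S] <= f[T] for S in all_sets for T in all_sets if S <= T)

def submodular(f):
    return all(f[S] + f[T] >= f[S | T] + f[S & T] for S in all_sets for T in all_sets)

# Enumerate all 2^8 Boolean functions on 2^[3] as value tables.
functions = [dict(zip(all_sets, vals)) for vals in product([0, 1], repeat=len(all_sets))]
mono_sub = [f for f in functions if monotone(f) and submodular(f)]

# Disjunctions OR_{i in I} x_i (I = empty gives the constant 0), plus the constant 1.
disjunctions = [{S: int(bool(S & I)) for S in all_sets} for I in all_sets]
target = disjunctions + [{S: 1 for S in all_sets}]

print(len(mono_sub) == len(target) and all(f in target for f in mono_sub))  # True
```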

Page 9:

Discrete monotone submodularity

• Monotone submodular: $f(S) \ge \max(f(S_1), f(S_2))$ for all $S_1, S_2 \subseteq S$ [diagram: a set $S$ containing subsets $S_1, S_2$ of size ≤ R]

Page 10:

Discrete monotone submodularity

• Theorem: for monotone submodular $f$ with range $\{0, \dots, R\}$: $f(S) = \max_{T \subseteq S, |T| \le R} f(T)$
• $f(S) \ge \max_{T \subseteq S, |T| \le R} f(T)$ (by monotonicity)
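A quick numerical check of the theorem on a standard example (added here, not from the slides): a partition-matroid rank function, which is monotone submodular with integral range {0, …, R}:

```python
from itertools import combinations

# Partition-matroid rank: f(S) = sum_j min(|S ∩ P_j|, c_j) is monotone submodular
# with range {0, ..., R}, where R = sum of the capacities c_j.
parts = [{0, 1, 2}, {3, 4}, {5, 6, 7}]
caps = [1, 1, 2]
R = sum(caps)  # R = 4

def f(S):
    return sum(min(len(S & P), c) for P, c in zip(parts, caps))

def small_subsets(S, r_max):
    """All T ⊆ S with |T| ≤ r_max."""
    return (set(T) for r in range(min(r_max, len(S)) + 1)
            for T in combinations(sorted(S), r))

universe = set(range(8))
ok = all(f(S) == max(f(T) for T in small_subsets(S, R))
         for r in range(len(universe) + 1)
         for S in map(set, combinations(sorted(universe), r)))
print(ok)  # True: f(S) = max over T ⊆ S with |T| ≤ R of f(T)
```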

Page 11:

Discrete monotone submodularity

• Let $S'$ be a smallest subset of $T$ such that $f(S') = f(T)$
• For every $x \in S'$ we have $\partial_x f(S' \setminus \{x\}) > 0$ => the restriction of $f$ to subsets of $S'$ is monotone increasing (by submodularity, $\partial_x f(S'') \ge \partial_x f(S' \setminus \{x\}) > 0$ for every $S'' \subseteq S' \setminus \{x\}$) => $|S'| \le f(S') \le R$

Page 12:

Representation by a formula

• Theorem: for monotone submodular $f$ with range $\{0, \dots, R\}$: $f(S) = \max_{T \subseteq S, |T| \le R} f(T)$
• Alternative notation: identify $S \subseteq X$ with $(x_1, \dots, x_n) \in \{0,1\}^n$
• Pseudo-Boolean DNF: $f(x_1, \dots, x_n) = \max_t \left( c_t \cdot x_{i_1} \wedge \dots \wedge x_{i_{k_t}} \right)$ (monotone, if no negations)
• Theorem (restated): Monotone submodular $f$ with range $\{0, \dots, R\}$ can be represented as a monotone pseudo-Boolean $R$-DNF with constants $c_t \in \{0, \dots, R\}$
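A pseudo-Boolean DNF in this sense is just a max of weighted monotone terms; a tiny evaluator (an illustrative sketch, not from the slides; the convention that an input satisfying no term gets value 0 is an assumption):

```python
def pb_dnf(terms):
    """Build f(x) = max over terms (c, A) of c * AND_{i in A} x_i.
    Each term is (constant, set of variable indices); x is a 0/1 tuple."""
    def f(x):
        return max((c for c, A in terms if all(x[i] for i in A)), default=0)
    return f

# Example: a monotone pB 2-DNF with constants in {0, 1, 2}:
# f = max( 2 * (x0 AND x1), 1 * x2 )
f = pb_dnf([(2, {0, 1}), (1, {2})])
print(f((1, 1, 0)), f((0, 0, 1)), f((0, 0, 0)))  # 2 1 0
```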

Page 13:

Discrete submodularity

• Submodular $f$ with range $\{0, \dots, R\}$ can be represented as a pseudo-Boolean $2R$-DNF with constants $c_t \in \{0, \dots, R\}$
• Hint [Lovasz] (submodular monotonization): Given submodular $f$, define $f^{mon}(S) = \min_{T : S \subseteq T} f(T)$. Then $f^{mon}$ is monotone and submodular.
[Diagram: Boolean lattice from ∅ to X with regions |S| ≤ R and |S| ≥ |X| − R]
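The hint can be checked directly on a small example; a sketch (added for this transcript) uses a graph-cut-plus-modular function, which is submodular but not monotone (the graph and weights are arbitrary choices):

```python
from itertools import combinations

edges = [(0, 1), (1, 2), (2, 3), (0, 2), (1, 3)]
subsets = [frozenset(c) for r in range(5) for c in combinations(range(4), r)]

def cut(S):
    """Number of edges crossing (S, V \\ S): submodular."""
    return sum((u in S) != (v in S) for u, v in edges)

def f(S):
    """Cut plus a modular term: still submodular, but not monotone."""
    return 2 * cut(S) + len(S)

def f_mon(S):
    """Lovasz monotonization: f_mon(S) = min over T ⊇ S of f(T)."""
    return min(f(T) for T in subsets if S <= T)

def is_monotone(g):
    return all(g(S) <= g(T) for S in subsets for T in subsets if S <= T)

def is_submodular(g):
    return all(g(S) + g(T) >= g(S | T) + g(S & T) for S in subsets for T in subsets)

print(is_monotone(f), is_submodular(f))          # False True
print(is_monotone(f_mon), is_submodular(f_mon))  # True  True
```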

Page 14:

Proof

• We're done if we have a coverage $\mathbf{T}$:
  1. All $T \in \mathbf{T}$ have large size: $|T| \ge |X| - R$
  2. For all $S$ there exists $T \in \mathbf{T}$ with $f_T(S) = f(S)$
  3. For every $T \in \mathbf{T}$, the restriction $f_T$ of $f$ to $T$ is monotone
• Every $f_T$ is a monotone pB $R$-DNF (3)
• Add at most $R$ negated variables to every clause to restrict to $T$ (1)
• Take the max over all $T \in \mathbf{T}$ (2)

Page 15:

Proof

• There is no such coverage => relaxation [GHRU'11]:
  – All $T, T'$ have large size
  – For all $S$ there exists a pair $(T, T')$
  – The restriction of $f$ on all such pairs is monotone

Page 16:

Coverage by monotone lower bounds

• Let $f_T^{mon}$ be the monotonization of $f$ relative to $T$:
  – $f_T^{mon}$ is monotone submodular [Lovasz]
  – For all $S$ we have $f_T^{mon}(S) \le f(S)$
  – For all $S'$ covered by $T$ we have $f_T^{mon}(S') = f(S')$
• $f = \max_T f_T^{mon}$ (where each $f_T^{mon}$ is a monotone pB $R$-DNF)

Page 17:

Learning pB-formulas and k-DNF

• $\mathcal{C}$ = class of pB $k$-DNF with constants in $\{0, \dots, R\}$
• $i$-slice of $f$ defined as $f_i(x) = 1$ iff $f(x) \ge i$
• If $f \in \mathcal{C}$, its $i$-slices are $k$-DNF and $f(x) = \sum_{i=1}^{R} f_i(x)$
• PAC-learning:

$\Pr_{\text{rand}(\mathbf{A})}\left[\,\Pr_{\mathbf{S} \sim U(\{0,1\}^n)}\left[\mathbf{A}(\mathbf{S}) = \mathbf{f}(\mathbf{S})\right] \ge 1-\epsilon\,\right] \ge \frac{1}{2}$

• Learn every $i$-slice on a $1 - \epsilon/R$ fraction of arguments => union bound
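The slicing idea in one toy example (a sketch added here; the particular pB 2-DNF is hypothetical): each i-slice is a Boolean function, and summing the slices recovers f:

```python
from itertools import product

R = 2

def f(x):
    """A monotone pB 2-DNF with range {0, 1, 2}: f = max(2*(x0 AND x1), x2)."""
    return max(2 * (x[0] & x[1]), x[2])

def make_slice(i):
    """i-slice of f: the Boolean indicator of f(x) >= i (itself a 2-DNF)."""
    return lambda x: int(f(x) >= i)

slices = [make_slice(i) for i in range(1, R + 1)]

# f(x) = sum_{i=1}^{R} f_i(x) on every input.
print(all(f(x) == sum(fi(x) for fi in slices)
          for x in product([0, 1], repeat=3)))  # True
```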

Page 18:

Learning Fourier coefficients

• Learn every $i$-slice ($k$-DNF) on a $1 - \epsilon/R$ fraction of arguments
• Fourier sparsity = # of largest Fourier coefficients sufficient to PAC-learn every slice
• [Mansour]: the sparsity of a $k$-DNF doesn't depend on n!
  – Kushilevitz-Mansour (Goldreich-Levin): queries/time polynomial in n and the sparsity
  – "Attribute-efficient learning": query complexity only logarithmic in n
  – Lower bound on the number of queries to learn a random $k$-junta (a $k$-DNF) up to constant precision
• Optimizations: do all $R$ iterations of KM/GL in parallel by reusing queries
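For intuition, an individual Fourier coefficient $\hat{f}(T) = \mathbb{E}_x[f(x)\chi_T(x)]$ can be estimated from uniform samples; a minimal sketch (added here; KM/GL itself makes adaptive membership queries, which this does not implement):

```python
import random

def chi(T, x):
    """Character chi_T(x) = (-1)^(sum of x_i over i in T) on {0,1}^n."""
    return -1 if sum(x[i] for i in T) % 2 else 1

def estimate_coeff(f, T, n, samples=100_000, seed=0):
    """Estimate f_hat(T) = E_x[f(x) * chi_T(x)] by uniform sampling."""
    rng = random.Random(seed)
    total = sum(f(x) * chi(T, x)
                for x in (tuple(rng.randint(0, 1) for _ in range(n))
                          for _ in range(samples)))
    return total / samples

# Example: f = x0 OR x1 has f_hat(∅) = 3/4 and f_hat({0}) = -1/4.
f = lambda x: x[0] | x[1]
print(estimate_coeff(f, set(), 2))  # ≈ 0.75
print(estimate_coeff(f, {0}, 2))    # ≈ -0.25
```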

Page 19:

Property testing

• Let $\mathcal{C}$ be the class of submodular functions
• How to (approximately) test whether a given $f$ is in $\mathcal{C}$?
• Property tester: (randomized) algorithm for distinguishing:
  1. $f \in \mathcal{C}$ ($\mathcal{C}$-close)
  2. $f$ is $\epsilon$-far from $\mathcal{C}$: $\Pr_{S}[f(S) \ne g(S)] \ge \epsilon$ for every $g \in \mathcal{C}$
• Key idea: $k$-DNFs have small representations:
  – [Gopalan, Meka, Reingold, CCC'12] (using quasi-sunflowers [Rossman'10]): for every $k$-DNF formula F there exists a $k$-DNF formula F' whose size depends only on $k$ and $\epsilon$ such that F' is $\epsilon$-close to F
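On small domains the distance in this definition can be computed outright; a brute-force sketch (added for this transcript) using the disjunctions from the earlier Boolean case study as a stand-in for $\mathcal{C}$:

```python
from itertools import combinations, product

n = 3
inputs = list(product([0, 1], repeat=n))

# The class C: all disjunctions OR_{i in I} x_i, plus the constant 1.
def disjunction(I):
    return tuple(int(any(x[i] for i in I)) for x in inputs)

C = {disjunction(I) for r in range(n + 1) for I in combinations(range(n), r)}
C.add(tuple(1 for _ in inputs))

def dist_to_C(f_table):
    """Relative Hamming distance from f to the closest function in C."""
    return min(sum(a != b for a, b in zip(f_table, g)) for g in C) / len(inputs)

# Parity of the three bits is 3/8-far from every function in C.
f_table = tuple(x[0] ^ x[1] ^ x[2] for x in inputs)
print(dist_to_C(f_table))  # 0.375
```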

Page 20:

Testing by implicit learning

• Good approximation by juntas => efficient property testing [Diakonikolas, Lee, Matulef, Onak, Rubinfeld, Servedio, Wan]
  – $\epsilon$-approximation by a junta
  – Good dependence on the junta size
• For submodular functions:
  – Query complexity independent of n!
  – Running time exponential in the junta size
  – Lower bound for testing $k$-DNF (reduction from Gap Set Intersection)
• [Blais, Onak, Servedio, Y.]: exact characterization of submodular functions

Page 21:

Previous work on testing submodularity

[Parnas, Ron, Rubinfeld '03; Seshadhri, Vondrak, ICS'11]:
• Upper bound
• Lower bound
• Special case: coverage functions [Chakrabarty, Huang, ICALP'12]
• Gap in query complexity

Page 22:

Directions

• Close gaps between upper and lower bounds, extend to more general learning/testing settings

• Connections to optimization?
• What if we use $L_p$ distance between functions instead of Hamming distance in property testing? [Berman, Raskhodnikova, Y.]