
Problems in Sparse Multivariate Statistics with a Discrete Optimization Lens

Rahul Mazumder

Massachusetts Institute of Technology

(Joint with D. Bertsimas, M. Copenhaver, A. King, P. Radchenko, H. Qin, J. Goetz, K. Khamaru)

August, 2016


Motivation

I Several basic statistical estimation tasks are inherently discrete

I Often dismissed as computationally infeasible

I We often “relax” the hard problems:
– Convex (continuous) optimization plays a key role (e.g. Lasso)
– They work very well in many cases...

I However, this often leads to a compromise in statistical performance

I Question: Can we use advances in discrete optimization to globally solve nonconvex problems?



Motivation

I We seldom know a priori which method will work for a given application

I “...A statistician’s toolkit should have a whole array of methods, to experiment with...”

— Jerome H. Friedman

I Use tools from mathematical optimization (discrete & convex optimization) to devise estimators:

I that are flexible
I that have a disciplined computational framework:

– Obtain almost optimal solutions in seconds/minutes
– Certify optimality in minutes/hours



Outline

I Best Subset Selection in Regression [Mallows ’66, Miller ’90]

— Least Squares Variable Selection
— Discrete Dantzig Selector
— Grouped Variable Selection and Sparse Additive Models

I Robust Linear Regression [Rousseeuw ’83]

— Least Median of Squares Regression

I Low rank Factor Analysis [Spearman ’04]

— Least Squares Factor Analysis
— Maximum Likelihood Factor Analysis




Best Subset Regression: Statement

[Bertsimas, King, M., ’16, Annals of Statistics]

I Usual linear regression model: n samples, p regressors

I Want a sparse β with good data-fidelity:

minβ

1

2‖y −Xβ‖22 s.t. ‖β‖0 ≤ k, (?)

[Miller ’90; Foster & George ’94; George ’00 ]

I Problem (⋆) is NP-hard [Natarajan ’95].

I The R package leaps can handle n ≥ p with p ≤ 31 (the “leaps and bounds” algorithm of Furnival & Wilson ’74).

I Not surprisingly, one is advised to stay away from Problem (⋆).


Best Subset Regression: Current Approaches & Limitations

I The Lasso (ℓ1) [Tibshirani ’96, Chen & Donoho ’98] is a very popular and effective proxy:

min_β (1/2)‖y − Xβ‖₂² + λ‖β‖₁

I Computation: convex optimization, fast & scalable

I ℓ1 ⟹ good models, under assumptions (which are difficult to verify)

I ℓ1 ⇏ reliable sparse solutions, and ℓ1 solutions ≠ ℓ0 solutions.

[Buhlmann, Van de Geer ’11; Cai, Shen ’11; Zhang, Jiang ’08....]


Shortcomings of the Lasso: a simple explanation

I In the presence of correlated variables, to obtain a model with good predictive power, the Lasso brings in a large number of nonzero coefficients.

I The Lasso leads to biased estimates—the ℓ1-norm penalizes large and small coefficients uniformly.

I Upon increasing the degree of regularization, the Lasso sets more coefficients to zero—leaving out true predictors from the active set.


Best Subset Regression: ℓ1 vs ℓ0

I If β denotes the best subset solution, then for any (fixed) X,

sup_{‖β∗‖₀≤k} (1/n) E‖Xβ − Xβ∗‖₂²  ≲  σ² k log p / n,

I If βℓ1 denotes a Lasso-based k-sparse estimator, then ∃ X:

(1/γ²) σ² k^(1−δ) log p / n  ≲  sup_{‖β∗‖₀≤k} (1/n) E‖Xβℓ1 − Xβ∗‖₂²  ≲  (1/γ²) σ² k log p / n,

I There is a significant gap between ℓ0- and ℓ1-type solutions.

[Bunea et al. ’07; Raskutti et al. ’09; Zhang et al. ’14]


Best Subset Regression: Current Approaches & Limitations

I To circumvent shortcomings, alternatives exist

I Non-convex penalties / greedy methods [Fan, Li ’01; Zou ’06; Zou, Li ’08; Zhang ’10; Mazumder et al. ’11; Zhang, Zhang ’12; Loh, Wainwright ’14]

I Problems are non-convex and hard to solve.

I Computational approaches are mostly heuristic: they cannot certify/prove global optimality for an arbitrary dataset. Exception: [Liu, Yao, Li ’16]


Best Subset Regression: Our approach

[Bertsimas, King, M., ’16, Annals of Statistics]

I Certifiably solve: min_β (1/2)‖y − Xβ‖₂² s.t. ‖β‖₀ ≤ k

I Main workhorses:

Tools from different branches of Optimization:

I Modern Technology of Mixed Integer Optimization (MIO)

I Discrete First Order methods (motivated by convex continuous optimization)


Best Subset Regression: Our approach

I Consider min_β (1/2)‖y − Xβ‖₂² s.t. ‖β‖₀ ≤ k

I Express as Mixed Integer Optimization problem (MIO)

I Discrete First Order methods for advanced warm-starts

I Enhancing MIO: Stronger Formulations



Brief Background on MIO


Mixed Integer Optimization (MIO)

I MIO: a particular class of discrete optimization problems

I The general form of a Mixed Integer Quadratic Optimization problem:

min α′Qα + α′a
s.t. Aα ≤ b
αi ∈ {0, 1}, ∀ i ∈ I
αj ∈ ℝ₊, ∀ j ∉ I,

where a ∈ ℝ^m, A ∈ ℝ^{k×m}, b ∈ ℝ^k and Q ∈ ℝ^{m×m} (PSD) are problem parameters.

I Special instances: Mixed Integer Linear Optimization, Quadratic/Linear Programming, ...


Mixed Integer Optimization (MIO)

I MIO methods employ a combination of branch and bound, branch and cut, cutting plane methods, ... (not complete enumeration)

I Foundations deeply rooted in polyhedral theory, combinatorics, discrete geometry/algebra, ...

I Worst case: NP-hard. Our focus is not worst-case analysis. (cf. the Simplex Algorithm, path algorithms like LARS, TSP, ...)

I Modern MIO is tractable (in practice).
Tractability: the ability to solve problems of realistic size in times that are appropriate for the applications we consider.
(Successful applications: production planning, transportation, inventory management, air-traffic control, warehouse location, matching assignments, ...)




Progress of MIO

I Algorithms and software have undergone huge improvements over the past 25+ years (1991–2016).

I Algorithm speed-up: ∼1.4 million times (combined speedup: CPLEX 1.2 to 11 & Gurobi 1.0 to 6.5)

I Hardware speed-up: ∼1.6 million times (peak supercomputer performance)

I Total speed-up: 2.2 trillion times!

I Commercial packages: Xpress, Gurobi, CPLEX, ...

Non-commercial packages: GLPK, lpsolve, CBC, SCIP, ...

Interfaces: Matlab, R, Python, Julia (JuMP)


Back to Formulation


Vanilla MIO formulation

For the problem min_β (1/2)‖y − Xβ‖₂² s.t. ‖β‖₀ ≤ k,

a simple (natural) MIO formulation is given by

min_{β,z} (1/2)‖y − Xβ‖₂²
s.t. |βi| ≤ M · zi, i = 1, . . . , p
∑_{i=1}^p zi ≤ k
zi ∈ {0, 1}, i = 1, . . . , p,

where M (“Big-M”) is a parameter:
— M ≥ ‖β‖∞ for an optimal solution β
— M controls the strength of the MIO formulation

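To make the Big-M formulation concrete, here is a minimal sketch using the gurobipy interface to Gurobi (one of the commercial solvers mentioned later in the talk). The function and parameter names (best_subset_mio, bigM, time_limit) are placeholders, and choosing M large enough to bound an optimal β (e.g., from a warm start) is an assumption of this sketch, not a prescription from the talk.

```python
import numpy as np
import gurobipy as gp
from gurobipy import GRB

def best_subset_mio(X, y, k, bigM, time_limit=300):
    """Big-M MIO sketch of: min 0.5*||y - X beta||^2  s.t.  ||beta||_0 <= k."""
    n, p = X.shape
    m = gp.Model("best_subset")
    m.Params.TimeLimit = time_limit

    beta = m.addVars(p, lb=-bigM, ub=bigM, name="beta")
    z = m.addVars(p, vtype=GRB.BINARY, name="z")

    for i in range(p):
        # |beta_i| <= M * z_i: z_i = 0 forces beta_i = 0
        m.addConstr(beta[i] <= bigM * z[i])
        m.addConstr(-beta[i] <= bigM * z[i])

    # at most k nonzero coefficients
    m.addConstr(gp.quicksum(z[i] for i in range(p)) <= k)

    # objective: 0.5 * ||y - X beta||_2^2
    resid = [y[j] - gp.quicksum(X[j, i] * beta[i] for i in range(p)) for j in range(n)]
    m.setObjective(0.5 * gp.quicksum(res * res for res in resid), GRB.MINIMIZE)

    m.optimize()
    return np.array([beta[i].X for i in range(p)]), m.MIPGap
```

A warm start (for example, from the discrete first-order method described later) can be supplied by setting each variable's Start attribute before calling optimize().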

Diabetes Dataset, n = 350, p = 64, k = 6

Typical behavior of Overall Algorithm

[Figure: two panels for k = 6. Left: objective value vs. time (secs), showing upper bounds, lower bounds, and the global minimum. Right: MIO gap vs. time (secs).]


Our approach

I Consider min_β (1/2)‖y − Xβ‖₂² s.t. ‖β‖₀ ≤ k

I Express best-subset as a Mixed Integer Optimization problem (MIO)

I Discrete First Order methods for advanced warm-starts

I Enhancing MIO: Stronger Formulations


Discrete First Order Method

I Stylized gradient-based method for

min_β g(β) s.t. ‖β‖₀ ≤ k,

I where g(β) is convex and ‖∇g(β) − ∇g(β0)‖ ≤ ℓ · ‖β − β0‖.

I This implies that for all L ≥ ℓ,

g(β) ≤ Q(β) = g(β0) + ⟨∇g(β0), β − β0⟩ + (L/2)‖β − β0‖₂²

I For the purpose of finding feasible solutions, we propose solving

min_β Q(β) s.t. ‖β‖₀ ≤ k

[Related work: Blumensath, Davies ’08; Donoho, Johnstone ’95]


Solution

I Equivalent to

min_β (L/2) ‖β − (β0 − (1/L)∇g(β0))‖₂² s.t. ‖β‖₀ ≤ k

I Reducing to (with u = β0 − (1/L)∇g(β0))

min_β ‖β − u‖₂² s.t. ‖β‖₀ ≤ k

I The optimal solution is β∗ ∈ Hk(u), where Hk(u) is the hard-thresholding operator (it retains the top k entries of u in absolute value). [Donoho & Johnstone ’95]


Discrete First Order Algorithm (DFA)

Algorithm to get feasible solutions for:

minβ

g(β) s.t. ‖β‖0 ≤ k.

1. Initialize with a solution β0; m = 0.

2. m := m + 1.

3. βm+1 ∈ Hk(βm − (1/L)∇g(βm)).

4. Perform a line search to get βm+1.

5. Repeat Steps 2–4 until ‖βm+1 − βm‖ ≤ ε.

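For the least-squares loss g(β) = (1/2)‖y − Xβ‖₂², the gradient is ∇g(β) = −X′(y − Xβ) and one valid Lipschitz constant is ℓ = λmax(X′X). Below is a minimal NumPy sketch of the iteration above; the line-search step is omitted for brevity, and the tolerance and iteration cap are illustrative choices, not values from the talk.

```python
import numpy as np

def hard_threshold(u, k):
    """H_k(u): keep the k largest entries of u in absolute value, zero out the rest."""
    beta = np.zeros_like(u)
    keep = np.argsort(np.abs(u))[-k:]
    beta[keep] = u[keep]
    return beta

def discrete_first_order(X, y, k, max_iter=1000, tol=1e-6):
    """Iterate beta <- H_k(beta - (1/L) grad g(beta)) for g(beta) = 0.5*||y - X beta||^2."""
    L = np.linalg.eigvalsh(X.T @ X).max()      # Lipschitz constant of the gradient
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        grad = -X.T @ (y - X @ beta)
        beta_new = hard_threshold(beta - grad / L, k)
        if np.linalg.norm(beta_new - beta) <= tol:
            return beta_new
        beta = beta_new
    return beta
```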

Convergence properties

Theorem. (Bertsimas, King, M. ’16)

Let βm, m ≥ 1, be generated by the DFA:

(a) For any L ≥ ℓ, the sequence g(βm) is decreasing and converges.

(b) If L > ℓ, then under some minor regularity conditions:

I ‖βm+1 − βm‖₂² ≤ ε in at most O(1/ε) iterations.

I Supp(βm) stabilizes after finitely many iterations and βm converges to a first-order stationary point.


Our approach

I Consider min_β (1/2)‖y − Xβ‖₂² s.t. ‖β‖₀ ≤ k

I Express best-subset as a Mixed Integer Optimization problem (MIO)

I Discrete First Order methods for advanced warm-starts

I Enhancing MIO: Stronger Formulations


Special Ordered Sets-formulation

min_β ‖y − Xβ‖₂² s.t. ‖β‖₀ ≤ k

is equivalent to

min_{β,z} ‖y − Xβ‖₂²
s.t. (βi, 1 − zi) : SOS type-1, i = 1, . . . , p
∑_{i=1}^p zi ≤ k
zi ∈ {0, 1}, i = 1, . . . , p.

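Solvers such as Gurobi accept SOS type-1 constraints directly, which removes the need to pick a Big-M value. A hedged sketch of the change relative to the Big-M model above, reusing the hypothetical names m, beta, z, and p from that sketch:

```python
from gurobipy import GRB

# Replace the Big-M constraints |beta_i| <= M*z_i with SOS-1 constraints on (beta_i, 1 - z_i).
# addSOS takes variables, so introduce w_i = 1 - z_i explicitly; beta_i then needs no Big-M bound.
w = m.addVars(p, lb=0.0, ub=1.0, name="w")
for i in range(p):
    m.addConstr(w[i] == 1 - z[i])
    # SOS type-1: at most one of beta_i and w_i can be nonzero,
    # so w_i > 0 (i.e. z_i = 0) forces beta_i = 0.
    m.addSOS(GRB.SOS_TYPE1, [beta[i], w[i]], [1, 2])
```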

Implied Constraints

min_β ‖y − Xβ‖₂² s.t. ‖β‖₀ ≤ k

is equivalent to

min_β ‖y − Xβ‖₂²
s.t. ‖β‖₀ ≤ k
‖β‖∞ ≤ δ11, ‖β‖₁ ≤ δ21
‖Xβ‖∞ ≤ δ12, ‖Xβ‖₁ ≤ δ22

for constants δ11, δ12, δ21, δ22 (which can be computed from the data).

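These constraints are linear, so they can be appended to either MIO formulation to tighten its continuous relaxation. A short sketch, again reusing the hypothetical gurobipy model m, variables beta, and dimension p from the earlier sketch; delta11 (bound on ‖β‖∞) and delta21 (bound on ‖β‖₁) are assumed to be computed beforehand, and the analogous ‖Xβ‖ constraints are omitted:

```python
import gurobipy as gp

# ||beta||_inf <= delta11 and ||beta||_1 <= delta21 via auxiliary variables t_i >= |beta_i|
t = m.addVars(p, lb=0.0, name="t")
for i in range(p):
    m.addConstr(beta[i] <= delta11)
    m.addConstr(-beta[i] <= delta11)
    m.addConstr(t[i] >= beta[i])
    m.addConstr(t[i] >= -beta[i])
m.addConstr(gp.quicksum(t[i] for i in range(p)) <= delta21)
```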

Behavior with user-guided intelligence

Diabetes data: n = 350, p = 64.

[Figure: log(MIO gap) vs. time (secs) for k = 5 (0–80 secs) and k = 31 (0–3500 secs), comparing warm starts with cold starts.]


Statistical Behavior


Sparsity Detection for n = 500, p = 100

[Figure: number of nonzeros selected vs. signal-to-noise ratio (1.742, 3.484, 6.967) for MIO, Lasso, Step, and Sparsenet.]


Prediction Error = ‖Xβ_alg − Xβ_true‖₂² / ‖Xβ_true‖₂²

[Figure: prediction error vs. signal-to-noise ratio (1.742, 3.484, 6.967) for MIO, Lasso, Step, and Sparsenet.]


Sparsity Detection for n = 50, p = 2000

[Figure: number of nonzeros selected vs. signal-to-noise ratio (3, 7, 10) for Lasso, First Order + MIO, First Order Only, and Sparsenet.]


Prediction Error for n = 50, p = 2000

[Figure: prediction error vs. signal-to-noise ratio (3, 7, 10) for Lasso, First Order + MIO, First Order Only, and Sparsenet.]


What did we learn?

I For the case n > p, MIO + intelligence finds provably optimal solutions for n in the 500s and p in the 100s in minutes.

I For the case n < p, MIO + intelligence finds solutions for n in the 50s and p in the 1000s in minutes, and proves (approximate) optimality in hours.

I MIO solutions have a significant edge in sparsity and improved prediction accuracy.

I Modern optimization (MIO + user-guided intelligence) is capable of tackling large instances.


Outline

I Best Subset Selection in Regression [Mallows ’66, Miller ’90]

— Least Squares Variable Selection
— Discrete Dantzig Selector
— Grouped Variable Selection and Sparse Additive Models

I Robust Linear Regression [Rousseeuw ’83]

— Least Median of Squares Regression

I Low rank Factor Analysis [Spearman ’04]

— Least Squares Factor Analysis
— Maximum Likelihood Factor Analysis


The Discrete Dantzig Selector

[M. & Radchenko ’16+]

I The Dantzig Selector [Candes, Tao ’07]:

β̂DS,ℓ1 ∈ argmin_β ‖β‖₁ s.t. ‖X′(y − Xβ)‖∞ ≤ δ

I Instead, consider its ℓ0 analogue:

β̂DS,ℓ0 ∈ argmin_β ‖β‖₀ s.t. ‖X′(y − Xβ)‖∞ ≤ δ

I Find the sparsest β such that the maximal (absolute) correlation between covariates and residuals is small.

I Why is this important?
– The formulation is a Mixed Integer Linear Optimization problem.
– Mixed Integer Linear Optimization is a more mature technology than Mixed Integer Quadratic Optimization.

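Since the constraint ‖X′(y − Xβ)‖∞ ≤ δ is linear in β and the ℓ0 objective can be modeled with binary indicators, the whole problem can be written as a mixed integer linear program. A minimal Big-M sketch in gurobipy follows; delta and bigM are user-supplied assumptions, and the paper may use a different, stronger formulation.

```python
import numpy as np
import gurobipy as gp
from gurobipy import GRB

def discrete_dantzig(X, y, delta, bigM):
    """MILP sketch of: min ||beta||_0  s.t.  ||X'(y - X beta)||_inf <= delta."""
    n, p = X.shape
    Xty, XtX = X.T @ y, X.T @ X
    m = gp.Model("discrete_dantzig")
    beta = m.addVars(p, lb=-bigM, ub=bigM, name="beta")
    z = m.addVars(p, vtype=GRB.BINARY, name="z")
    for i in range(p):
        # z_i = 0 forces beta_i = 0
        m.addConstr(beta[i] <= bigM * z[i])
        m.addConstr(-beta[i] <= bigM * z[i])
    for j in range(p):
        # -delta <= x_j'(y - X beta) <= delta
        corr = Xty[j] - gp.quicksum(XtX[j, l] * beta[l] for l in range(p))
        m.addConstr(corr <= delta)
        m.addConstr(corr >= -delta)
    m.setObjective(gp.quicksum(z[i] for i in range(p)), GRB.MINIMIZE)
    m.optimize()
    return np.array([beta[i].X for i in range(p)])
```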


The Discrete Dantzig Selector

Under a sparse linear model with Gaussian errors, y = Xβ∗ + ε:

I The errors

– ‖β̂DS,ℓ0 − β∗‖₂²
– ‖β̂DS,ℓ0 − β∗‖₁²
– ‖X(β̂DS,ℓ0 − β∗)‖₂²

are much smaller than those of the convex estimator β̂DS,ℓ1 (when the features are correlated)

I # nonzeros in β̂DS,ℓ0 ≪ # nonzeros in β̂DS,ℓ1

I Statistical properties of β̂DS,ℓ0 are comparable with Least Squares Subset Selection



Some Large Problems

(Synthetic Examples)

n       p       k∗   Upper Bound   Lower Bound   MIO Gap   Time to Prove Opt.
4,000   8,000   20   20            20            0         41.9
3,000   8,000   20   20            20            0         18.3
1,000   10,000  10   10            10            0         14.2
5,000   10,000  10   10            10            0         2.5
10,000  10,000  30   30            27            10%       42.5

(Real Data Examples)

n       p       k∗   Upper Bound   Lower Bound   MIO Gap   Time to Prove Opt.
6,000   4,500   20   20            20            0         5.0
6,000   4,500   40   40            37            10%       12.5

Table: Solutions obtained within 5–10 minutes for all problems. Certifying optimality takes longer (several hours).


Outline

I Best Subset Selection in Regression [Mallows ’66, Miller ’90]

— Least Squares Variable Selection
— Discrete Dantzig Selector
— Grouped Variable Selection and Sparse Additive Models

I Robust Linear Regression [Rousseeuw ’83]

— Least Median of Squares Regression

I Low rank Factor Analysis [Spearman ’04]

— Least Squares Factor Analysis
— Maximum Likelihood Factor Analysis


Effect of Outliers in Regression

[Bertsimas, M., ’14, Annals of Statistics]

I The Least Squares (LS) estimator

β(LS) ∈ argmin_β ∑_{i=1}^n ri², ri = yi − xi′β,

has a breakdown point of zero (Donoho & Huber ’83; Hampel ’75).

I The Least Absolute Deviation (LAD) estimator also has a breakdown point of zero:

β(LAD) ∈ argmin_β ∑_{i=1}^n |ri|

I M-Estimators (Huber ’73), which minimize ∑_{i=1}^n ρ(ri) for a symmetric function ρ(·), slightly improve the breakdown point.


Least Median Regression

I The Least Median of Squares (LMS) estimator [Rousseeuw ’84]

β(LMS) ∈ argmin_β ( median_{i=1,...,n} |ri| ).

I LMS has the highest possible breakdown point of almost 50%.

I More generally, the Least Quantile of Squares (LQS) estimator:

β(LQS) ∈ argmin_β |r(q)|,

where r(q) is the qth ordered absolute residual:

|r(1)| ≤ |r(2)| ≤ . . . ≤ |r(n)|.


Problem we address

I Solve the following problem:

min_β |r(q)|,

where ri = yi − xi′β and q is a quantile.

I Our approach extends to

min_β |r(q)| s.t. Aβ ≤ b (and/or ‖β‖₂² ≤ δ)


LQS and Subset Selection: A surprising link

I LQS and subset selection in regression seem to be completely unrelated concepts...

I However, a curious link emerges...

I Claim: LQS performs an implicit subset search

Theorem [Bertsimas & M. ’14]: The LQS problem is equivalent to the following:

min_β |r(q)| = min_{I∈Ωq} ( min_β ‖yI − XI β‖∞ ),

where Ωq := {I : I ⊂ {1, . . . , n}, |I| = q} and (yI, XI) denotes the subsample {(yi, xi), i ∈ I}.




Overview of our approach

I Write the LMS problem as a MIO.
— Main idea: the MIO formulation models sorting in order to express |r(q)|
— The formulation is very different from best subset selection in regression

I Using Discrete First Order methods we find good feasible solutions.

I Warm-starts and improved behavior with user-guided intelligence



MIO Formulation

Notation: |r(1)| ≤ |r(2)| ≤ . . . ≤ |r(n)|.

Step 1: Introduce binary variables zi, i = 1, . . . , n, such that:

zi = 1 if |ri| ≤ |r(q)|, and zi = 0 otherwise.

Step 2: Use auxiliary continuous variables μ̄i ≥ 0 and μ̲i ≥ 0 such that:

|ri| − μ̲i ≤ |r(q)| ≤ |ri| + μ̄i, i = 1, . . . , n,

with the conditions:

If |ri| ≥ |r(q)|, then μ̄i = 0, μ̲i ≥ 0,

and if |ri| ≤ |r(q)|, then μ̲i = 0, μ̄i ≥ 0.

These conditions are MIO representable.


MIO Formulation

min γ

s.t. |ri| + μ̄i ≥ γ, i = 1, . . . , n
γ ≥ |ri| − μ̲i, i = 1, . . . , n
Mu · zi ≥ μ̄i, i = 1, . . . , n
Mℓ · (1 − zi) ≥ μ̲i, i = 1, . . . , n
∑_{i=1}^n zi = q
μ̄i ≥ 0, μ̲i ≥ 0, i = 1, . . . , n
zi ∈ {0, 1}, i = 1, . . . , n,

where γ, zi, μ̄i, μ̲i, i = 1, . . . , n, are decision variables and Mu, Mℓ are Big-M constants.

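A minimal gurobipy sketch of this formulation follows. The Big-M constants Mu and Ml are assumed to be supplied (they must upper-bound the relevant slacks at an optimal solution), and Gurobi's general absolute-value constraint is used to model |ri|; the original paper may handle the absolute values differently.

```python
import numpy as np
import gurobipy as gp
from gurobipy import GRB

def lqs_mio(X, y, q, Mu, Ml):
    """MIO sketch for min_beta |r_(q)|, with r_i = y_i - x_i' beta."""
    n, p = X.shape
    m = gp.Model("lqs")
    beta = m.addVars(p, lb=-GRB.INFINITY, name="beta")
    r = m.addVars(n, lb=-GRB.INFINITY, name="r")      # residuals r_i
    a = m.addVars(n, lb=0.0, name="abs_r")            # a_i = |r_i|
    mu_bar = m.addVars(n, lb=0.0, name="mu_bar")
    mu_und = m.addVars(n, lb=0.0, name="mu_under")
    z = m.addVars(n, vtype=GRB.BINARY, name="z")
    gamma = m.addVar(lb=0.0, name="gamma")            # models |r_(q)|

    for i in range(n):
        m.addConstr(r[i] == y[i] - gp.quicksum(X[i, j] * beta[j] for j in range(p)))
        m.addGenConstrAbs(a[i], r[i])                 # a_i = |r_i|
        m.addConstr(a[i] + mu_bar[i] >= gamma)
        m.addConstr(gamma >= a[i] - mu_und[i])
        m.addConstr(mu_bar[i] <= Mu * z[i])           # z_i = 0  =>  mu_bar_i = 0
        m.addConstr(mu_und[i] <= Ml * (1 - z[i]))     # z_i = 1  =>  mu_under_i = 0
    m.addConstr(gp.quicksum(z[i] for i in range(n)) == q)

    m.setObjective(gamma, GRB.MINIMIZE)
    m.optimize()
    return np.array([beta[j].X for j in range(p)]), gamma.X
```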

What do we achieve?

I Prior exact algorithms can solve up to n = 50 and p = 5.

I We obtain:

I near-optimal solutions for problems with n in the 200s and p in the 20s in seconds, proving optimality in minutes.

I near-optimal solutions for problems with n ≈ 10,000 and p ≈ 50 in minutes.


Outline

I Best Subset Selection in Regression [Mallows ’66, Miller ’90]

— Least Squares Variable Selection
— Discrete Dantzig Selector
— Grouped Variable Selection and Sparse Additive Models

I Robust Linear Regression [Rousseeuw ’83]

— Least Median of Squares Regression

I Low rank Factor Analysis [Spearman ’04]

— Least Squares Factor Analysis
— Maximum Likelihood Factor Analysis


Background & Formulation

[Bertsimas, Copenhaver, M., ’16+]

Low Rank Factor Analysis (FA) [Spearman 1904]:

I widely used in multivariate statistics, econometrics, psychometrics

I represents the correlation structure with a few common (latent) factors.

Estimation Problem: Σ = L1L1′ + L2L2′, where Θ := L1L1′ is low rank and L2L2′ is small:

− Σ ≈ Θ + Φ

− Φ = diag(Φ1, . . . , Φp) ⪰ 0

− rank(Θ) ≤ r, Θ ⪰ 0

− Σ − Θ ⪰ 0; Σ − Φ ⪰ 0



This yields the estimation problem:

min ‖Σ − (Θ + Φ)‖
s.t. rank(Θ) ≤ r
Σ − Φ ⪰ 0

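For intuition, here is a minimal NumPy sketch of a simple alternating heuristic for the least-squares version of this decomposition: fix Φ and take the best rank-r PSD approximation of Σ − Φ, then refit the nonnegative diagonal. It is only a local heuristic for producing feasible (Θ, Φ) pairs; it does not enforce the Σ − Θ ⪰ 0 / Σ − Φ ⪰ 0 side constraints, and it is not the certifiable approach described on the next slides.

```python
import numpy as np

def factor_analysis_heuristic(Sigma, r, n_iter=100):
    """Alternate between a rank-r PSD Theta and a nonnegative diagonal Phi
    to locally reduce ||Sigma - (Theta + Phi)||_F."""
    p = Sigma.shape[0]
    Phi = np.zeros(p)
    for _ in range(n_iter):
        # best rank-r PSD approximation of Sigma - diag(Phi)
        vals, vecs = np.linalg.eigh(Sigma - np.diag(Phi))   # eigenvalues in ascending order
        V, lam = vecs[:, -r:], np.clip(vals[-r:], 0.0, None)
        Theta = (V * lam) @ V.T
        # given Theta, refit the diagonal (kept nonnegative)
        Phi = np.clip(np.diag(Sigma - Theta), 0.0, None)
    return Theta, Phi
```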



Our Approach

min ‖Σ − (Θ + Φ)‖   ← sum of singular values
s.t. rank(Θ) ≤ r   ← rank constraint
Σ − Θ ⪰ 0   ← semidefinite constraint    (†)

I An SDP with rank constraints

I Key Idea: reformulate (†) equivalently as an SDP (without the rank constraint)

I Nonlinear Optimization techniques for feasible solutions

I Specialized Branch & Bound methods to certify optimality



Reformulation and tailored B&B

min ‖Σ − (Θ + Φ)‖
s.t. rank(Θ) ≤ r
Σ − Θ ⪰ 0

⇕ (variational representation of spectral functions)

min ⟨W, Σ − Θ⟩ − ∑_{i=1}^p wii Φi   ← bilinear form (nonconvex): McCormick hulls / B&B
s.t. I ⪰ W ⪰ 0
Tr(W) = p − r
Σ − Θ ⪰ 0


What do we learn?

I Several experiments on both real and synthetic datasets reveal:

I Upper bounds are obtained within a few seconds (p = 100) to several minutes (p = 4000)

I Certifying optimality takes longer (several hours)

I Global optimality certificates are obtained on datasets where the assumptions required for the convex approach to succeed cannot be verified.

ArXiv link: http://arxiv.org/pdf/1604.06837v1.pdf


Conclusions

I MIO is an advanced, computationally tractable mathematical programming framework

I Provides a powerful modeling tool for statistical problems

I Leads to a significant edge in Sparse Learning problems that are inherently discrete.

I 15.097: PhD class taught at MIT Spring 2016 on related topics.


Thank you!

All papers available at: http://www.mit.edu/~rahulmaz/research.html
