Outlier Pursuit: Robust PCA and Collaborative Filtering
Huan Xu
Dept. of Mechanical Engineering & Dept. of Mathematics
National University of Singapore
Joint w/ Constantine Caramanis, Yudong Chen, Sujay Sanghavi
Outline
- PCA and Outliers
  - Why SVD fails
  - Corrupted features vs. corrupted points
- Our idea + Algorithms
- Results
  - Full observation
  - Missing data
- Framework for Robustness in High Dimensions
Principal Components Analysis
Given points that lie on/near a lower-dimensional subspace, find this subspace.

Classical technique:
1. Organize the data points as the columns of a matrix (low-rank if the points lie on the subspace)
2. Take the SVD
3. The top singular vectors span the subspace
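As a concrete illustration, a minimal NumPy sketch of this recipe (the function name and the centering step are my additions, not from the talk):

```python
import numpy as np

def pca_subspace(X, r):
    """Classical PCA via SVD. X: d x n matrix whose columns are data points.

    Returns a d x r orthonormal basis for the top-r principal subspace.
    """
    Xc = X - X.mean(axis=1, keepdims=True)       # center the points
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :r]                              # top-r left singular vectors
```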
Fragility
Gross errors in even one or a few points can completely throw off PCA.

Reason: classical PCA minimizes squared reconstruction error, which is highly susceptible to gross outliers.
Two types of gross errors
- Individual entries corrupted ("corrupted features")
- Entire columns corrupted ("outliers")

… and missing-data versions of both.
PCA with Outliers

Objective: find the identities of the outliers (and hence the column space of the true matrix).
Outlier Pursuit - Idea

Standard PCA minimizes the squared residual $\|M - L\|_F^2$ over low-rank $L$. Instead, the idea is to decompose the observed matrix into a low-rank part plus a column-sparse part, $M = L_0 + C_0$.
Outlier Pursuit - Method

We propose:

$$\min_{L,\,C}\ \|L\|_* + \lambda \|C\|_{1,2} \quad \text{s.t.}\quad M = L + C$$

where the nuclear norm $\|L\|_*$ is the convex surrogate for the rank constraint, and $\|C\|_{1,2}$ (the sum of the $\ell_2$ norms of the columns of $C$) is the convex surrogate for column-sparsity.
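A minimal sketch of this convex program in cvxpy (illustrative, not the speaker's code; the function name is mine):

```python
import cvxpy as cp

def outlier_pursuit(M, lam):
    """Solve: min ||L||_* + lam * ||C||_{1,2}  s.t.  M = L + C."""
    L = cp.Variable(M.shape)
    C = cp.Variable(M.shape)
    col_norms = cp.norm(C, 2, axis=0)            # l2 norm of each column of C
    obj = cp.Minimize(cp.normNuc(L) + lam * cp.sum(col_norms))
    prob = cp.Problem(obj, [L + C == M])
    prob.solve()                                 # SDP-representable problem
    return L.value, C.value
```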
When does it (not) work?

It can fail when certain directions of the column space are poorly represented, i.e., carried by only a few columns. Equivalently, some right singular vector of $L_0$ has a large inner product with a coordinate axis: with $V$ an orthonormal basis of the row space, $\max_i \|V^\top e_i\|_2^2$ is large.
Results

Assumption: columns of the true $L_0$ are incoherent: $\max_i \|V^\top e_i\|_2^2 \le \mu r / n$, where $V$ spans the row space of $L_0$, $r$ is the rank, and $n$ is the number of columns.

Note: small $\mu$ means no single column is essential to the column space.

First consider: the noiseless case.
Results

Assumption: columns of the true $L_0$ are incoherent, as above.

Theorem (noiseless case): our convex program can identify up to a fraction $\gamma$ of outliers as long as $\frac{\gamma}{1-\gamma} \le \frac{c}{\mu r}$ for a universal constant $c$.

Note (outer bound): a larger fraction of outliers makes the problem un-identifiable, so this scaling is tight up to constants.
Proof Technique

A point is the optimum of a convex function if and only if zero lies in the (sub)gradient of the function at that point.

Steps:
1. Guess a "nice" point via an oracle problem.
2. Show it is the optimum by showing zero is in the subgradient.
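For reference, the standard subdifferentials in play here (not spelled out in the transcript; $L = U\Sigma V^\top$ is the compact SVD, $c_i$ are the columns of $C$):

$$\partial \|L\|_* = \{\, UV^\top + W \;:\; U^\top W = 0,\ WV = 0,\ \|W\|_2 \le 1 \,\}$$

$$G \in \partial \|C\|_{1,2} \;\Longleftrightarrow\; g_i = c_i / \|c_i\|_2 \ \text{whenever } c_i \neq 0, \quad \|g_i\|_2 \le 1 \ \text{whenever } c_i = 0.$$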
Proof Technique

Guessing a "nice" optimum. (Note: because a column-sparse matrix is also low-rank, "nice" points are non-unique. In problems like matrix completion, compressed sensing, or RPCA with sparse corruption, this is not an issue.)

Oracle problem: solve the convex program with the added constraints that $L$ has the true column space and $C$ is supported on the true outlier columns.

Its optimum is, by definition, a nice point. Rest of proof: show it is the optimum of the original program, under our assumption.
Proof Technique

Constructing a dual certificate: need a $Q$ that satisfies certain equalities (pinning down the subgradient on the column space and on the outlier support) and strict inequalities (certifying optimality elsewhere).

A first guess is the certificate built directly from the SVD of the low-rank part and the normalized outlier columns. This works when each corrupted column is orthogonal to the true column space, but fails otherwise. To satisfy the equalities, use a corrected $Q$; then show the inequalities hold under the conditions given in the theorem.
(More detailed) Proof Technique

• One can verify the candidate certificate against the optimality conditions directly.
• With the first-guess $Q_0$, equalities (1) and (2) are satisfied.
• To satisfy (3), we need a correction term such that (1) and (2) continue to hold.
• This correction can be constructed via least squares; the corrected $Q$ is indeed the least-squares solution.
• The required inverse operator equals an infinite series, provided the series converges.
• The corrected $Q$ satisfies all three equalities; it then remains to show the inequalities hold.
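The "inverse operator as a convergent series" is presumably the standard Neumann-series identity:

$$(\mathcal{I} - \mathcal{A})^{-1} = \sum_{k=0}^{\infty} \mathcal{A}^k, \qquad \text{valid whenever } \|\mathcal{A}\| < 1.$$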
Noisy case

• We observe $M = L_0 + C_0 + N$ for unknown noise $N$ with $\|N\|_F \le \varepsilon$.
• Relax the equality constraint to an inequality, i.e., solve

$$\min_{L,\,C}\ \|L\|_* + \lambda \|C\|_{1,2} \quad \text{s.t.}\quad \|M - L - C\|_F \le \varepsilon$$

• Under essentially the same condition as the noiseless case, noisy outlier pursuit succeeds: it finds a solution close to a pair with the correct column space and column support.
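A sketch of this noisy variant in cvxpy (illustrative; eps is the assumed noise bound):

```python
import cvxpy as cp

def noisy_outlier_pursuit(M, lam, eps):
    """The equality M = L + C is relaxed to a Frobenius ball of radius eps."""
    L = cp.Variable(M.shape)
    C = cp.Variable(M.shape)
    obj = cp.Minimize(cp.normNuc(L) + lam * cp.sum(cp.norm(C, 2, axis=0)))
    prob = cp.Problem(obj, [cp.norm(M - L - C, 'fro') <= eps])
    prob.solve()
    return L.value, C.value
```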
Implementation Issue

• Outlier Pursuit is an SDP.
• Interior-point methods may not scale.
• We are interested in the structure – we can sacrifice accuracy for scalability.
• Use proximal gradient algorithms.
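A minimal proximal-gradient sketch (my construction, not the speaker's solver): it minimizes a penalized form $\tfrac12\|M-L-C\|_F^2 + \mu(\|L\|_* + \lambda\|C\|_{1,2})$; the penalty `mu`, step size, and iteration count are illustrative choices.

```python
import numpy as np

def svt(X, tau):
    # Singular value thresholding: prox of tau * nuclear norm.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0)) @ Vt

def col_shrink(X, tau):
    # Column-wise shrinkage: prox of tau * (sum of column l2 norms).
    norms = np.linalg.norm(X, axis=0, keepdims=True)
    return X * np.maximum(1 - tau / np.maximum(norms, 1e-12), 0)

def outlier_pursuit_pg(M, lam, mu=1e-3, n_iter=500):
    L = np.zeros_like(M)
    C = np.zeros_like(M)
    step = 0.5                 # gradient of smooth term is 2-Lipschitz jointly
    for _ in range(n_iter):
        R = L + C - M          # gradient of the smooth term w.r.t. L and C
        L = svt(L - step * R, step * mu)
        C = col_shrink(C - step * R, step * mu * lam)
    return L, C
```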
Performance
[Figure: performance of the L + C formulation vs. the L + S formulation (from [Chandrasekaran et al.], [Candès et al.])]
More experiments
• 220 samples of digit ‘1’ and 11 samples of digit ‘7’, unlabeled.
Another view…
The mean is the solution of $\min_\mu \sum_i (x_i - \mu)^2$; standard PCA of $M$ is the solution of $\min_{\mathrm{rank}(L)\le r} \sum_i \|m_i - \ell_i\|_2^2$ (squared column residuals). Fragile: can be easily skewed by one or a few points.

The median is the solution of $\min_\mu \sum_i |x_i - \mu|$; our method is (a convex relaxation of) $\min_{\mathrm{rank}(L)\le r} \sum_i \|m_i - \ell_i\|_2$. Robust: skewing requires errors in a constant fraction of the points.
Collaborative Filtering w/ Adversaries
“In an incident that highlights the pitfalls of online recommendation systems, Amazon.com on Friday removed a link to a sex manual that appeared next to a listing for a spiritual guide by well-known Christian televangelist Pat Robertson.”
http://news.cnet.com/2100-1023-976435.html
Manipulation in RS
• Recommendation systems utilize ALL users' ratings to predict their future preferences.
• Malicious users try to skew prediction by injecting fake ratings.
• Some popular algorithms (e.g. kNN) are vulnerable to manipulation.
RS via Matrix Completion (conventional wisdom)
• Preference matrix $L_0$
• Partial observations: we see the entries in a subset $\Omega$

Given $\mathcal{P}_\Omega(L_0)$, recover $L_0$?
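For context, the conventional convex approach behind this (nuclear-norm matrix completion, as in Candès and Recht; presumably what the slide alludes to) is:

$$\min_L\ \|L\|_* \quad \text{s.t.}\quad \mathcal{P}_\Omega(L) = \mathcal{P}_\Omega(L_0).$$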
RS via Matrix Completion
• Assumption: user preferences form a low-rank matrix
[Figure: the Users × Products preference matrix factors into user features times product features. E.g., Product = laptop, with features "performance" and "portability"; each user has a preference weight for performance and for portability. User 3's overall preference for Product 2 is the inner product of User 3's feature vector with Product 2's feature vector.]
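A toy numeric illustration of this low-rank preference model (the numbers are made up for illustration):

```python
import numpy as np

# Hypothetical feature matrices: 4 users x 2 features, 3 products x 2 features
user_feats = np.array([[0.9, 0.1],   # cares mostly about performance
                       [0.2, 0.8],   # cares mostly about portability
                       [0.5, 0.5],
                       [0.7, 0.3]])
prod_feats = np.array([[0.8, 0.2],   # gaming laptop
                       [0.3, 0.9],   # ultrabook
                       [0.5, 0.5]])

# Preference matrix has rank <= 2: entry (i, j) = <user i, product j>
L0 = user_feats @ prod_feats.T
print(np.linalg.matrix_rank(L0))  # 2
print(L0[2, 1])                   # User 3's preference for Product 2: 0.6
```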
Robust Matrix Completion (New idea)
$M = L_0 + C_0$, partially observed:
- $L_0$ (low-rank): authentic columns
- $C_0$ (column-sparse): arbitrarily corrupted columns

Problem: given the partial observation $\mathcal{P}_\Omega(M)$, find $L_0$ and identify the columns of $C_0$.
Collaborative Filtering w/ Adversaries
Users (columns): a low-rank matrix that
- is partially observed
- has some corrupted columns
== outliers with missing data!

Our setting:
- Good users == random sampling of an incoherent matrix (as in matrix completion)
- Manipulators == completely arbitrary sampling and values
Outlier Pursuit with Missing Data
$$\min_{L,\,C}\ \|L\|_* + \lambda \|C\|_{1,2} \quad \text{s.t.}\quad (L + C)_{ij} = M_{ij} \ \text{for observed } (i,j) \in \Omega$$

Now we need the row space to be incoherent as well, since we are simultaneously doing matrix completion and manipulator identification.
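A sketch of this masked variant in cvxpy (illustrative, not the authors' code; `mask` is a 0/1 array with 1 at observed entries):

```python
import cvxpy as cp

def robust_completion(M_obs, mask, lam):
    """Outlier pursuit with missing data: constrain only observed entries."""
    L = cp.Variable(M_obs.shape)
    C = cp.Variable(M_obs.shape)
    obj = cp.Minimize(cp.normNuc(L) + lam * cp.sum(cp.norm(C, 2, axis=0)))
    constraints = [cp.multiply(mask, L + C) == mask * M_obs]
    prob = cp.Problem(obj, constraints)
    prob.solve()
    return L.value, C.value
```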
Our Result
Theorem: w.h.p., the optimum $(\hat L, \hat C)$ of the convex program is such that $\hat L$ has the correct column space and the column support of $\hat C$ is exactly the set of manipulators, provided the sampling density is sufficiently large and the fraction of users that are manipulators is sufficiently small.

Note: no assumptions on the manipulators.
Corollaries
• The algorithm succeeds if
  – there is a constant fraction of corrupted columns and the fraction of observed entries is also a constant; or
  – there is a growing number of corrupted columns and the fraction of observed entries goes to zero.
Roadmap of the Proof
• Need a $Q$ satisfying five optimality conditions, (1)–(5).
• Use the least-squares solution to satisfy the equalities (1), (2), and (5)?
• Problem: the least-squares solution involves an infinite series, which breaks independence, while to get sharp inequalities we need to exploit the i.i.d. nature of the sampling.
Roadmap of the Proof - 2

• Rethink the optimality condition: the optimality certificate ensures the objective does not decrease under any feasible perturbation.
• The equality conditions come from the need to show this inequality for every feasible perturbation.
• However, that is not necessary: one can show it suffices to relax the equality condition to an inequality while tightening the other inequalities.
• This makes it possible to avoid the infinite series.
Robust Collaborative Filtering
[Figure: results for two algorithms - "partially observed low-rank + column-sparse" and "partially observed sparse + low-rank"]
More generally …
Several methods in high-dimensional statistics minimize: loss function + regularizer.

Our approach: (same) loss function + weighted sum of regularizers.

This yields robustness + flexibility in several settings. Today: PCA with outliers + missing data.
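Schematically (notation mine: $\mathcal{L}$ the loss, $r_i$ the regularizers, $\lambda_i$ the weights):

$$\min_\theta\ \mathcal{L}(\theta) + \sum_i \lambda_i\, r_i(\theta), \qquad \text{e.g., here } r_1 = \|L\|_*, \ \ r_2 = \|C\|_{1,2}.$$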
Papers can be found on my website: http://guppy.mpe.nus.edu.sg/~mpexuh/