
Storage Fit Learning with Unlabeled Data

Bo-Jian Hou, Lijun Zhang, Zhi-Hua Zhou (LAMDA)

Contact: houbj@lamda.nju.edu.cn; zhanglj@lamda.nju.edu.cn; zhouzh@lamda.nju.edu.cn

Sections: 1 Motivation · 2 Basic Idea · 3 Proposed Methods · 4 Experiment · 5 Conclusion

• A new setting: storage fit learning with unlabeled data.
• Key: given different storage budgets, the behavior of the algorithm should be adjusted accordingly.
• We focus on algorithms relying on spectral analysis, which suffer severely from the storage burden of the kernel matrix.
• We utilize low-rank approximation techniques to adapt these algorithms to a given storage budget.

Conclusion 5

• SoCK and NysCK outperform the other two methods under all budgets.
• The proposed algorithms achieve competitive performance on all data sets.
• NysCK obtains an approximate solution, with a modest AUC, in a short time.
• SoCK refines its solution continuously as the approximation error decreases, and outperforms NysCK after a few seconds.
• SoCK is more effective, while NysCK is more efficient.

Motivation 1

• Smart mobile devices are widely used and can generate large amounts of data.
• We face classification tasks on mobile devices.

Few photos are labeled; most photos are unlabeled. (Poster examples of labels: flower, tree, north building, person.)

• The storage available for the learning process on mobile devices is limited.
• Different mobile devices have different memory limits (poster examples: 1G, 2G, 3G, 4G).

We need to adjust semi-supervised algorithms to fit different storage budgets.

Basic Idea 2

With labeled and unlabeled data, we can perform semi-supervised learning. Here, we aim to obtain “virtual samples”:

• obtaining “virtual samples” transforms the kernel SVM into a linear SVM;
• the storage cost then drops from $O(n^2)$ to $O(nd)$;
• we can fit different storage budgets by adjusting $d$ (see the sketch after this list).
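Below is a minimal sketch of this virtual-sample reduction, assuming the top-$d$ eigensystem of the (cluster) kernel matrix is available. The function name, the RBF kernel, and the toy data are our own illustration, not the authors' code.

```python
# Virtual samples: with K's top-d eigenpairs (lam_i, u_i), the n x d matrix
# X = U_d * diag(sqrt(lam_d)) satisfies X @ X.T ~= K, so a kernel SVM on K
# reduces to a linear SVM on X. Storage: O(n^2) for K vs. O(nd) for X.
import numpy as np
from sklearn.svm import LinearSVC

def virtual_samples(K, d):
    """Top-d eigenpairs of a symmetric PSD kernel matrix -> n x d features."""
    lam, U = np.linalg.eigh(K)                 # eigenvalues in ascending order
    lam, U = lam[::-1][:d], U[:, ::-1][:, :d]  # keep the top-d eigenpairs
    return U * np.sqrt(np.maximum(lam, 0.0))   # scale columns by sqrt(lam)

# Toy usage (illustration only): RBF kernel on random labeled data.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = (X[:, 0] > 0).astype(int)
sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)
K = np.exp(-0.5 * sq)
Z = virtual_samples(K, d=20)                   # d is set by the storage budget
clf = LinearSVC().fit(Z, y)                    # linear SVM on virtual samples
```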

Proposed Methods 3

(Poster diagrams contrast the pipeline without a storage budget with the pipeline under a storage budget, and compare SoCK vs. NysCK.)

Two methods to find the eigensystem (eigenvalues and eigenvectors) of $L$ or $K$:

Stochastic Optimization for Cluster Kernel (SoCK)

We only need to find the eigensystem of the original kernel!

Key tools: Singular Value Thresholding (SVT) and Stochastic Composite Optimization (SCO).

$L_\xi$ is generated by Random Fourier Features; SPGD (stochastic proximal gradient descent) is used to solve the problem, taking the last iterate as the final solution (a Random Fourier Features sketch follows below).
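As a hedged illustration, here is the standard Random Fourier Features construction (Rahimi & Recht) for the RBF kernel, one way such stochastic low-rank estimates $L_\xi$ can be drawn; the variable names and the choice of kernel are our assumptions for the sketch.

```python
# Random Fourier Features: phi(X) is an n x m map with E[phi @ phi.T] equal
# to the RBF kernel exp(-gamma * ||x - x'||^2). Keeping only the n x m factor
# gives a stochastic low-rank kernel estimate in O(nm) memory.
import numpy as np

def random_fourier_features(X, m, gamma, rng):
    n, d = X.shape
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, m))  # w ~ N(0, 2*gamma*I)
    b = rng.uniform(0.0, 2.0 * np.pi, size=m)                # random phases
    return np.sqrt(2.0 / m) * np.cos(X @ W + b)

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
phi = random_fourier_features(X, m=50, gamma=0.5, rng=rng)
K_est = phi @ phi.T   # unbiased rank-50 estimate of the RBF kernel matrix
```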

In SoCK, to obtain the top eigensystem of $L$, we need to find a low-rank matrix $\bar{L}$ to approximate $L$. In each iteration we then perform an SVD on $(1 - \eta_t) Z_t + \eta_t L_t$, where $L_t$ and $Z_t$ can each be split into two low-rank factors; so in each iteration we only need an SVD of the small concatenated factor, never of an $n \times n$ matrix. The space complexity is $O(n(a_t + b_t))$, where $a_t$ and $b_t$ are much smaller than $n$; by adjusting $b_t$, we can fit different storage budgets (see the factored-step sketch below).
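A minimal sketch of this factored step, assuming $Z_t = PP^\top$ and $L_t = QQ^\top$ are held in factored form; the step size `eta` and threshold `tau` stand in for the paper's actual schedule.

```python
# Factored SPGD/SVT step: (1-eta)*P@P.T + eta*Q@Q.T equals C @ C.T for the
# n x (a+b) factor C = [sqrt(1-eta)*P, sqrt(eta)*Q], so one thin SVD of C
# (never of an n x n matrix) yields the eigensystem; singular value
# thresholding then shrinks the eigenvalues and drops the zeroed directions.
import numpy as np

def spgd_svt_step(P, Q, eta, tau):
    C = np.hstack([np.sqrt(1.0 - eta) * P, np.sqrt(eta) * Q])
    U, s, _ = np.linalg.svd(C, full_matrices=False)  # C C^T = U diag(s^2) U^T
    lam = np.maximum(s ** 2 - tau, 0.0)              # threshold the spectrum
    keep = lam > 0.0
    return U[:, keep] * np.sqrt(lam[keep])           # factor of the new Z_{t+1}
```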

Nystrom Cluster Kernel (NysCK)

In NysCK, to obtain the top eigensystem of $K$, we sample $s$ instances and use the Nystrom method to directly calculate an approximate eigensystem of $K$. The space complexity is $O(ns)$, where $s$ is much smaller than $n$; by adjusting $s$, we can fit different storage budgets (see the Nystrom sketch below).
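Below is a hedged sketch of the standard Nystrom eigensystem approximation (Williams & Seeger) that NysCK builds on; `kernel_fn` and the rescaling conventions are our assumptions for the sketch.

```python
# Nystrom approximation of K's top eigensystem from s sampled instances:
# only the n x s block C and the s x s block W are ever formed, so the
# memory footprint is O(ns); s is tuned to the storage budget.
import numpy as np

def nystrom_eigensystem(kernel_fn, X, s, rng):
    n = X.shape[0]
    idx = rng.choice(n, size=s, replace=False)
    C = kernel_fn(X, X[idx])              # n x s columns of K
    W = C[idx]                            # s x s sampled block
    lam_w, U_w = np.linalg.eigh(W)        # small eigendecomposition
    pos = lam_w > 1e-12                   # discard numerically null directions
    lam = (n / s) * lam_w[pos]            # rescaled approximate eigenvalues
    U = np.sqrt(s / n) * (C @ (U_w[:, pos] / lam_w[pos]))  # approx eigenvectors
    return lam, U
```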

(Poster flowchart: an SVD of the original kernel yields its eigenvalues and eigenvectors; the top-$k$ eigenvectors, together with the modified top-$k$ eigenvalues of the original kernel, form the cluster kernel, from which the virtual samples are built. A table on the poster compares the space and time complexities of the two methods; $s$ denotes the number of sampled instances. A cluster-kernel sketch follows below.)
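To make the flowchart concrete, here is a hedged sketch of a cluster-kernel construction in the spirit of Chapelle et al.'s cluster kernels; the transfer function that modifies the top-$k$ eigenvalues is a placeholder, not necessarily the paper's choice.

```python
# Cluster kernel per the flowchart: eigendecompose the original kernel, keep
# the top-k eigenvectors, pass the top-k eigenvalues through a transfer
# function, and rebuild K~ = sum_i transfer(lam_i) u_i u_i^T. The virtual
# samples of the Basic Idea are then computed from this modified kernel.
import numpy as np

def cluster_kernel(K, k, transfer=np.sqrt):    # transfer is a placeholder choice
    lam, U = np.linalg.eigh(K)                 # ascending eigenvalues
    lam, U = lam[::-1][:k], U[:, ::-1][:, :k]  # top-k eigenpairs
    return (U * transfer(np.maximum(lam, 0.0))) @ U.T
```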