Page 1

The Twelfth SIAM International Conference on Data Mining, April 27, 2012

Center for Evolutionary Medicine and Informatics

Multi-Task Learning: Theory, Algorithms, and Applications

Jiayu Zhou (1,2), Jianhui Chen (3), Jieping Ye (1,2)

(1) Computer Science and Engineering, Arizona State University, AZ
(2) The Biodesign Institute, Arizona State University, AZ
(3) GE Global Research, NY

Page 2

Tutorial Goals

• Understand the basic concepts in multi-task learning

• Understand different approaches to model task relatedness

• Get familiar with different types of multi-task learning techniques

• Introduce multi-task learning applications

• Introduce the multi-task learning package: MALSAR


Page 3

Tutorial Road Map

• Part I: Multi-task Learning (MTL) background and motivations

• Part II: MTL formulations

• Part III: Case study of real-world applications

– Incomplete Multi-Source Fusion

– Drosophila Gene Expression Image Analysis

• Part IV: An MTL Package (MALSAR)

• Current and future directions


Page 4

Tutorial Road Map

• Part I: Multi-task Learning (MTL) background and motivations

• Part II: MTL formulations

• Part III: Case study of real-world applications

– Incomplete Multi-Source Fusion

– Drosophila Gene Expression Image Analysis

• Part IV: An MTL Package (MALSAR)

• Current and future directions


Page 5

Multiple Tasks: Examination Score Prediction [1] (Argyriou et al., '08)


School 1 - Alverno High School
School 138 - Jefferson Intermediate School
School 139 - Rosemead High School

One example record per school (features are student-dependent and school-dependent):

  Student id | Birth year | Previous score | ... | School ranking | ... | Exam score
  72981      | 1985       | 95             | ... | 83%            | ... | ?
  31256      | 1986       | 87             | ... | 72%            | ... | ?
  12381      | 1986       | 83             | ... | 77%            | ... | ?

[1] The Inner London Education Authority (ILEA)

Page 6

Learning Multiple Tasks: learning each task independently


[Figure: single-task learning on the school data. One model per school (task 1, ..., task 138, task 139), each trained only on its own school's student records (student id, birth year, previous score, school ranking, ...) to predict that school's exam scores. Schools: School 1 - Alverno High School, School 138 - Jefferson Intermediate School, School 139 - Rosemead High School.]

Page 7

Learning Multiple Tasks: learning multiple tasks simultaneously


[Figure: multi-task learning on the school data. Tasks 1, ..., 138, 139 are learned simultaneously, and the relationships among the tasks are modeled while predicting each school's exam scores.]

Page 8

Performance of MTL: evaluation on the School data

• Predict exam scores for 15362 students from 139 schools

• Describe each student by 27 attributes

• Compare single task learning approaches (Ridge Regression, Lasso) and one multi-task learning approach (trace-norm regularized learning)

Performance measure: normalized mean squared error,

    N-MSE = (mean squared error) / (variance of the target)

[Figure: N-MSE versus the index of the training ratio (1-8) for Ridge Regression, Lasso, and Trace Norm; N-MSE ranges from about 0.7 to 1.05, and the trace-norm regularized approach achieves the lowest N-MSE.]

Page 9

Multi-Task Learning

• Multi-task learning differs from single-task learning in the training (induction) process.

• Inductions of multiple tasks are performed simultaneously to capture intrinsic relatedness.


[Figure: in single-task learning, each task's training data goes through its own training process to produce that task's model, which generalizes independently; in multi-task learning, the training data of tasks 1 through t enter one joint training process, and each trained model generalizes to its own task.]

Page 10

Learning Methods


Page 11

Web Page Categorization (Chen et al., 2009 ICML)

• Classify documents into categories

• The classification of each category is a task

• The tasks of predicting different categories may be latently related


[Figure: category classifiers for Health, Travel, World, Politics, US, ...; the classifiers' parameters (the models of the different categories) are latently related.]

Page 12

MTL for HIV Therapy Screening (Bickel et al., 2008 ICML)

• Hundreds of possible combinations of drugs, some of which use similar biochemical mechanisms

• The samples available for each combination are limited.

• For a patient, predicting the outcome of one drug combination is a task

• Multi-task learning exploits the similarity among drug combinations


[Figure: a patient treatment record over drugs (ETV, ABC, AZT, D4T, DDI, FTC, ...); the response to a new combination, e.g. AZT + ETV + FTC, is to be predicted.]

Page 13

Other Applications

• Portfolio selection [Ghosn and Bengio, NIPS'97]

• Collaborative ordinal regression [Yu et al., NIPS'06]

• Web image and video search [Wang et al., CVPR'09]

• Disease progression modeling [Zhou et al., KDD'11]

• Disease prediction [Zhang et al., NeuroImage'12]


Page 14

Tutorial Road Map

• Part I: Multi-task Learning (MTL) background and motivations

• Part II: MTL formulations

• Part III: Case study of real-world applications

– Incomplete Multi-Source Fusion

– Drosophila Gene Expression Image Analysis

• Part IV: An MTL Package (MALSAR)

• Current and future directions


Page 15

How Tasks Are Related


Assumption: all tasks are related.

Methods:
• Mean-regularized MTL
• Joint feature learning
• Trace-norm regularized MTL
• Alternating structural optimization (ASO)
• Shared-parameter Gaussian process

Page 16

How Tasks Are Related


• Assuming that all tasks are related may be too strong for practical applications.

• There may be irrelevant (outlier) tasks.

Assumption: there are outlier tasks.

Page 17

How Tasks Are Related


Assumption: tasks have group structures.
Assumption: tasks have tree structures.
Assumption: tasks have graph/network structures.

Methods:
• Clustered MTL
• Tree MTL
• Network MTL

Page 18

Multi-Task Learning Methods


• Regularization-based MTL

  – All tasks are related: regularized MTL, joint feature learning, low-rank MTL, ASO

  – Learning with outlier tasks: robust MTL

  – Tasks form groups/graphs/trees: clustered MTL, network MTL, tree MTL

• Other Methods

  – Shared hidden nodes in neural networks

  – Shared-parameter Gaussian process

Page 19

Regularization-based Multi-Task Learning

• All tasks are related

  – Mean-Regularized MTL

  – MTL in high-dimensional feature space
    • Embedded Feature Selection
    • Low-Rank Subspace Learning

• Clustered MTL

• MTL with Tree/Graph structure

Page 20

Notation

• We focus on linear models: Y_i = X_i W_i, where X_i ∈ ℝ^(n_i × d), Y_i ∈ ℝ^(n_i × 1), and W = [W_1, W_2, ..., W_m].

[Figure: the per-task feature matrices X_i (n_i samples by d dimensions), the per-task target vectors Y_i, and the d-by-m model matrix W for tasks 1, ..., m.]
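A minimal synthetic sketch of this notation in NumPy (the sizes n_i, d, and m are illustrative choices, not from the tutorial):

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 5, 3                  # feature dimension, number of tasks
n = [10, 12, 8]              # n_i: sample count of each task

# One feature matrix X_i (n_i x d) and target vector Y_i (n_i x 1) per task.
X = [rng.standard_normal((n_i, d)) for n_i in n]
W = rng.standard_normal((d, m))           # model matrix W = [W_1, ..., W_m]
Y = [X[i] @ W[:, [i]] for i in range(m)]  # linear model: Y_i = X_i W_i

for i in range(m):
    assert Y[i].shape == (n[i], 1)
```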

Page 21

Mean-Regularized Multi-Task Learning (Evgeniou & Pontil, 2004 KDD)

• Assumption: task parameter vectors of all tasks are close to each other.

– Advantage: simple, intuitive, easy to implement

– Disadvantage: may not hold in real applications.

Regularization penalizes the deviation of each task from the mean:

    min_W  (1/2) ||XW − Y||_F² + λ Σ_{i=1}^m || W_i − (1/m) Σ_{s=1}^m W_s ||₂²

[Figure: the task parameter vectors scattered around their mean.]
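As an illustration, the mean-regularized objective can be minimized by plain gradient descent; a sketch under the simplifying assumption (not from the tutorial) that all tasks share one feature matrix X and Y stacks the m targets column-wise:

```python
import numpy as np

def mean_reg_mtl(X, Y, lam=0.1, lr=1e-3, iters=2000):
    """Gradient descent on (1/2)||XW - Y||_F^2 + lam * sum_i ||W_i - mean_s(W_s)||^2.

    Simplification: all tasks share the same feature matrix X (n x d);
    Y is n x m with one column per task."""
    d, m = X.shape[1], Y.shape[1]
    M = np.eye(m) - np.ones((m, m)) / m   # W @ M subtracts the column mean from each W_i
    W = np.zeros((d, m))
    for _ in range(iters):
        # M is symmetric and idempotent, so d/dW ||W M||_F^2 = 2 W M.
        grad = X.T @ (X @ W - Y) + 2 * lam * (W @ M)
        W -= lr * grad
    return W

rng = np.random.default_rng(1)
X = rng.standard_normal((50, 4))
W_true = np.ones((4, 3)) + 0.1 * rng.standard_normal((4, 3))  # tasks close to a common mean
Y = X @ W_true
W_hat = mean_reg_mtl(X, Y)
```

With noiseless targets and a small λ the recovered `W_hat` stays close to `W_true`; a larger λ pulls the task columns harder toward their mean.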

Page 22

Regularization-based Multi-Task Learning

• All tasks are related

  – Mean-Regularized MTL

  – MTL in high-dimensional feature space
    • Embedded Feature Selection
    • Low-Rank Subspace Learning

• Clustered MTL

• MTL with Tree/Graph structure

Page 23

Multi-Task Learning with High Dimensional Data

• In practical applications, we may deal with high dimensional data.

– Gene expression data, biomedical image data

• Curse of Dimensionality

• Dealing with high dimensional data in multi-task learning

– Embedded feature selection: L1/Lq - Group Lasso

– Low-rank subspace learning: low-rank assumption – ASO, Trace-norm regularization


Page 24

Regularization-based Multi-Task Learning

• All tasks are related

  – Mean-Regularized MTL

  – MTL in high-dimensional feature space
    • Embedded Feature Selection
    • Low-Rank Subspace Learning

• Clustered MTL

• MTL with Tree/Graph structure

Page 25

Multi-Task Learning with Joint Feature Learning

• One way to capture the task relatedness from multiple related tasks is to constrain all models to share a common set of features.

• For example, in the school data, the exam scores at different schools may be determined by a similar set of features.

[Figure: a model matrix over features 1-10 and tasks 1-5 in which all tasks share a common subset of selected features (nonzero rows).]

Page 26

Multi-Task Learning with Joint Feature Learning (Obozinski et al., 2009 Stat Comput; Liu et al., 2010 Technical Report)

• Using group sparsity: ℓ1/ℓ𝑞-norm regularization

• When q>1 we have group sparsity.

    min_W  (1/2) ||XW − Y||_F² + λ ||W||_{1,q},   where  ||W||_{1,q} = Σ_{i=1}^d ||wⁱ||_q

(wⁱ denotes the i-th row of W.)

[Figure: Output Y (n × m) ≈ Input X (n × d) × Model W (d × m); the n samples are shared across the m tasks, and the nonzero rows of W are common to all tasks.]
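For q = 2 the regularizer has a closed-form proximal operator (row-wise group soft-thresholding), which is the workhorse of proximal-gradient solvers for this formulation; a minimal sketch:

```python
import numpy as np

def prox_l21(W, tau):
    """Proximal operator of tau * ||W||_{1,2} = tau * sum_i ||w^i||_2,
    where w^i is the i-th row of W: shrink each row toward zero and
    zero it out entirely when its norm falls below tau."""
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - tau / np.maximum(norms, 1e-12))
    return W * scale

W = np.array([[3.0, 4.0],    # row norm 5: shrunk to norm 4
              [0.3, 0.4]])   # row norm 0.5 < tau: the whole row is zeroed
P = prox_l21(W, 1.0)
```

Zeroing a whole row discards that feature for every task at once, which is exactly the joint feature selection effect of the ℓ1/ℓ2 penalty.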

Page 27

Writer-Specific Character Recognition (Obozinski, Taskar, and Jordan, 2006)

• Each task is a classification between two letters for one writer.


Page 28

Dirty Model for Multi-Task Learning (Jalali et al., 2010 NIPS)

• In practical applications, it is too restrictive to constrain all tasks to share a single shared structure.

[Figure: Model W = Group-Sparse Component P + Sparse Component Q.]

    min_{P,Q}  ||Y − X(P + Q)||_F² + λ₁ ||P||_{1,q} + λ₂ ||Q||₁

Page 29

Robust Multi-Task Learning

• Most existing MTL approaches assume all tasks are related (all tasks are relevant).

• Robust MTL approaches assume there are outlier (irrelevant) tasks among the relevant tasks.

Page 30

Robust Multi-Task Feature Learning (Gong et al., 2012, submitted)

• Simultaneously captures a common set of features among relevant tasks and identifies outlier tasks.

    min_{P,Q}  ||Y − X(P + Q)||_F² + λ₁ ||P||_{1,q} + λ₂ ||Qᵀ||_{1,q}

[Figure: Model W = Group-Sparse Component P (jointly selected features) + Group-Sparse Component Q (outlier tasks, via column-wise group sparsity).]

Page 31

Regularization-based Multi-Task Learning

• All tasks are related

  – Mean-Regularized MTL

  – MTL in high-dimensional feature space
    • Embedded Feature Selection
    • Low-Rank Subspace Learning

• Clustered MTL

• MTL with Tree/Graph structure

Page 32

Trace-Norm Regularized MTL

• Capture task relatedness via a shared low-rank structure

[Figure: the training data of tasks 1, ..., n are trained jointly; the trained models share a low-rank subspace, and each model generalizes to its own task.]

Page 33

Low-Rank Structure for MTL

• Assume we have a rank-2 model matrix: each task's weight vector is a combination of two shared basis vectors b₁, b₂,

    Task 1:  training data × w₁ ≈ target,  w₁ = α₁ b₁ + α₂ b₂
    Task 2:  training data × w₂ ≈ target,  w₂ = β₁ b₁ + β₂ b₂
    Task 3:  training data × w₃ ≈ target,  w₃ = γ₁ b₁ + γ₂ b₂

Page 34

Low-Rank Structure for MTL

• In general, a low-rank model matrix factorizes as

    W = [b₁, ..., b_p] × T,   T = (τ_{jk}) ∈ ℝ^(p × m)

  where the b_j are the p basis vectors and T holds the coefficients (m tasks, p basis vectors, m > p).
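This factorization is easy to sanity-check numerically; a sketch with randomly chosen basis vectors and coefficients (the sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
d, m, p = 6, 4, 2                 # d features, m tasks, p basis vectors (m > p)
B = rng.standard_normal((d, p))   # basis vectors b_1, ..., b_p as columns
T = rng.standard_normal((p, m))   # coefficient matrix (tau_jk)
W = B @ T                         # every task model is a combination of the p bases

# The resulting model matrix has rank p, not min(d, m).
assert np.linalg.matrix_rank(W) == p
```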

Page 35

Low-Rank Structure for MTL (Ji et al., 2009 ICML)

• Rank minimization formulation

– min_W Loss(W) + λ · Rank(W)

– Rank minimization is NP-Hard for general loss functions

• Convex relaxation: trace norm minimization

– min_W Loss(W) + λ · ||W||_*

– The trace norm is theoretically shown to be a good approximation of the rank function (Fazel et al., 2001).

||W||_* : the sum of the singular values of W
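The trace norm also has a closed-form proximal operator, singular value thresholding, on which proximal-gradient solvers for this formulation are built; a minimal sketch:

```python
import numpy as np

def prox_trace_norm(W, tau):
    """Proximal operator of tau * ||W||_*: soft-threshold the singular values."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

rng = np.random.default_rng(3)
W = rng.standard_normal((8, 5))
P = prox_trace_norm(W, 1.0)

# Soft-thresholding the spectrum can only lower (or keep) the rank,
# which is how the penalty induces a shared low-rank structure.
assert np.linalg.matrix_rank(P, tol=1e-8) <= np.linalg.matrix_rank(W)
```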

Page 36

Low-Rank Structure for MTL: evaluation on the School data

• Predict exam scores for 15362 students from 139 schools

• Describe each student by 27 attributes

• Compare Ridge Regression, Lasso, and Trace Norm (for inducing a low-rank structure)

Performance measure: normalized mean squared error,

    N-MSE = (mean squared error) / (variance of the target)

[Figure: N-MSE versus the index of the training ratio (1-8) for Ridge Regression, Lasso, and Trace Norm.]

The Low-Rank Structure (induced via Trace Norm) leads to the smallest N-MSE.

Page 37

Alternating Structure Optimization (ASO) (Ando and Zhang, 2005 JMLR)

[Figure: for each task i = 1, ..., m, Input X ≈ the task output via X(Θvᵢ + wᵢ), where Θ is a shared low-dimensional feature map.]

• ASO assumes that each task's model is the sum of two components: a task-specific component and a component lying in a shared low-dimensional subspace.

Page 38

Alternating Structure Optimization (ASO) (Ando and Zhang, 2005 JMLR)

o Model for the i-th task:  uᵢ = Θvᵢ + wᵢ  (Θ : shared low-dimensional feature map)

o Empirical loss function for the i-th task:

    ℒᵢ(Xᵢ(Θvᵢ + wᵢ), yᵢ) = ||Xᵢ(Θvᵢ + wᵢ) − yᵢ||²

o ASO simultaneously learns the models and the shared structure:

    min_{Θ, {vᵢ, wᵢ}}  Σ_{i=1}^m [ ℒᵢ(Xᵢ(Θvᵢ + wᵢ), yᵢ) + α ||wᵢ||² ]
    subject to  ΘᵀΘ = I
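A simplified alternating-minimization sketch of this procedure (an illustration, not the authors' implementation): for fixed Θ the inner variables can be eliminated (vᵢ = Θᵀuᵢ, wᵢ = uᵢ − Θvᵢ), leaving one ridge-like solve per task, and for fixed task models the optimal Θ spans the top-h left singular subspace of the stacked models:

```python
import numpy as np

def aso(Xs, ys, h=2, alpha=1.0, iters=20):
    """Simplified alternating optimization for ASO (illustrative sketch).

    Each task model is u_i = Theta v_i + w_i with Theta^T Theta = I_h.
    Eliminating v_i = Theta^T u_i turns the per-task subproblem into
        (X_i^T X_i + alpha * (I - Theta Theta^T)) u_i = X_i^T y_i,
    and for fixed {u_i} the optimal Theta is the top-h left singular
    subspace of U = [u_1, ..., u_m]."""
    d, m = Xs[0].shape[1], len(Xs)
    Theta = np.linalg.qr(np.random.default_rng(4).standard_normal((d, h)))[0]
    U = np.zeros((d, m))
    for _ in range(iters):
        P = np.eye(d) - Theta @ Theta.T            # projector onto the complement
        for i in range(m):
            A = Xs[i].T @ Xs[i] + alpha * P
            U[:, i] = np.linalg.solve(A, Xs[i].T @ ys[i])
        Theta = np.linalg.svd(U)[0][:, :h]         # shared low-dimensional subspace
    return U, Theta

# Synthetic tasks whose true models share a 2-dimensional subspace.
rng = np.random.default_rng(5)
d, m, h = 6, 5, 2
Theta_true = np.linalg.qr(rng.standard_normal((d, h)))[0]
Xs = [rng.standard_normal((30, d)) for _ in range(m)]
us_true = [Theta_true @ rng.standard_normal(h) for _ in range(m)]
ys = [Xs[i] @ us_true[i] for i in range(m)]
U, Theta = aso(Xs, ys, h=h, alpha=0.1)
```

On this noiseless synthetic data a few iterations recover both the task models and the shared subspace.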

Page 39

iASO Formulation (Chen et al., 2009 ICML)

o iASO formulation:

    min_{Θ, {vᵢ, wᵢ}}  Σ_{i=1}^m [ ℒᵢ(Xᵢ(Θvᵢ + wᵢ), yᵢ) + α ||Θvᵢ + wᵢ||² + β ||wᵢ||² ]
    subject to  ΘᵀΘ = I

• controls both model complexity and task relatedness

• subsumes ASO (Ando et al., '05) as a special case

• naturally leads to a convex relaxation (Chen et al., 2009 ICML)

• the convex relaxed ASO is equivalent to iASO under certain mild conditions

Page 40

Incoherent Low-Rank and Sparse Structures (Chen et al., 2010 KDD)

• ASO uses the L2-norm on the task-specific component; we can also use the L1-norm to learn task-specific features.

[Figure: Model W = Low-Rank Component P (basis × coefficients) + Element-wise Sparse Component Q.]

    min_{P,Q}  Σ_{i=1}^m ℒᵢ(Xᵢ(Pᵢ + Qᵢ), yᵢ) + λ ||Q||₁
    subject to  ||P||_* ≤ η

• A convex formulation: the trace-norm constraint captures task relatedness, and the sparse component captures task-specific features.

Page 41

Robust Low-Rank MTL (Chen et al., 2011 KDD)

• Simultaneously performs low-rank MTL and identifies outlier tasks.

[Figure: Model W = Low-Rank Component P (basis × coefficients) + Group-Sparse Component Q (outlier tasks).]

    min_{P,Q}  Σ_{i=1}^m ℒᵢ(Xᵢ(Pᵢ + Qᵢ), yᵢ) + α ||P||_* + β ||Qᵀ||_{1,q}

• The trace norm captures task relatedness; the column-wise group-sparse term identifies irrelevant tasks.

Page 42

Summary

• All multi-task learning formulations discussed above can fit into the W = P + Q schema.

  – Component P: shared structure

  – Component Q: information not captured by the shared structure

Embedded Feature Selection:

  Method | Shared Structure P             | Component Q
  L1/Lq  | Feature selection (L1/Lq norm) | 0
  Dirty  | Feature selection (L1/Lq norm) | L1-norm
  rMTFL  | Feature selection (L1/Lq norm) | Outlier (column-wise L1/Lq norm)

Low-Rank Subspace Learning:

  Method     | Shared Structure P         | Component Q
  Trace Norm | Low-rank (trace norm)      | 0
  ISLR       | Low-rank (trace norm)      | L1-norm
  ASO        | Low-rank (shared subspace) | L2-norm on independent component
  RMTL       | Low-rank (trace norm)      | Outlier (column-wise L1/Lq norm)

Page 43

Regularization-based Multi-Task Learning

• All tasks are related

  – Mean-Regularized MTL

  – MTL in high-dimensional feature space
    • Embedded Feature Selection
    • Low-Rank Subspace Learning

• Clustered MTL

• MTL with Tree/Graph structure

Page 44

Multi-Task Learning with Clustered Structures

• Most MTL techniques assume all tasks are related, which is not true in many applications.

• Clustered multi-task learning assumes the tasks have a group structure: the models of tasks from the same group are closer to each other than those from a different group.

Assumption: tasks have group structures.

e.g. one group of tasks predicts heart-related diseases and another group predicts brain-related diseases.

Page 45

Clustered Multi-Task Learning (Jacob et al., 2008 NIPS; Zhou et al., 2011 NIPS)

• Use regularization to capture clustered structures.

[Figure: training data X ≈ model for each task; the task models form clusters 1, 2, ..., k-1, k.]

Page 46

Clustered Multi-Task Learning (Zhou et al., 2011 NIPS)

• Capture the structure by minimizing the sum-of-squared-error (SSE) objective of K-means clustering:

    min_I  Σ_{j=1}^k Σ_{v ∈ Iⱼ} || w_v − w̄ⱼ ||₂²

  (Iⱼ : index set of the j-th cluster; w̄ⱼ : mean of the models in cluster j)

  which can be rewritten in matrix form as

    min_F  tr(WᵀW) − tr(Fᵀ WᵀW F)

  where F is the m × k orthogonal cluster indicator matrix with F_{i,j} = 1/√nⱼ if i ∈ Iⱼ and 0 otherwise (m tasks, task number m > cluster number k).
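The identity between the K-means SSE and its trace reformulation can be checked numerically; a small sketch (the model matrix, cluster count, and assignment are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(6)
d, m, k = 4, 6, 2
W = rng.standard_normal((d, m))          # m task models as the columns of W
labels = np.array([0, 0, 0, 1, 1, 1])    # an arbitrary assignment into k clusters

# Sum-of-squared-error of the clustering, computed directly.
sse = sum(np.sum((W[:, labels == j] - W[:, labels == j].mean(axis=1, keepdims=True)) ** 2)
          for j in range(k))

# The same quantity via the orthogonal cluster indicator matrix F.
F = np.zeros((m, k))
for j in range(k):
    idx = labels == j
    F[idx, j] = 1.0 / np.sqrt(idx.sum())
sse_trace = np.trace(W.T @ W) - np.trace(F.T @ W.T @ W @ F)

assert np.isclose(sse, sse_trace)
```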

Page 47

Clustered Multi-Task Learning (Zhou et al., 2011 NIPS)

• Directly minimizing the SSE is hard because of the nonlinear constraint on F:

    min_F  tr(WᵀW) − tr(Fᵀ WᵀW F),   F : m × k orthogonal cluster indicator matrix, F_{i,j} = 1/√nⱼ if i ∈ Iⱼ and 0 otherwise

• Relaxing F to any orthonormal matrix gives a tractable problem (Zha et al., 2001 NIPS):

    min_{F : FᵀF = I_k}  tr(WᵀW) − tr(Fᵀ WᵀW F)

Page 48

[Figure: clustered task models; capturing the cluster structure improves generalization performance]

Clustered Multi-Task Learning (Zhou et al. 2011 NIPS)

• Clustered multi-task learning (CMTL) formulation

• CMTL has been shown to be equivalent to ASO

– when the dimension of the shared low-rank subspace in ASO equals the number of clusters in CMTL

  min_{W, F : F^T F = I_k} Loss(W) + α ( tr(W^T W) - tr(F^T W^T W F) ) + β tr(W^T W)


Convex Clustered Multi-Task Learning (Zhou et al. 2011 NIPS)

[Figure: model matrices recovered on synthetic data by Mean-Regularized MTL, Trace-Norm-Regularized MTL, and Convex Relaxed CMTL versus the ground truth; the relaxation introduces some noise, and the low-rank penalty can also capture the cluster structure well]


Regularization-based Multi-Task Learning

• All tasks are related
  – Mean-Regularized MTL
• MTL in high-dimensional feature space
  – Embedded Feature Selection
  – Low-Rank Subspace Learning
• Clustered MTL
• MTL with Tree/Graph structure


Multi-Task Learning with Tree Structures

• In some applications, the tasks may be equipped with a tree structure:

– The tasks belonging to the same node are similar to each other

– The similarity between two tasks is related to the depth of their deepest common ancestor node in the tree

Assumption: tasks have a tree structure.

[Figure: task models a, b, c as leaves of a tree; task a is more similar to b than to c]


Multi-Task Learning with Tree Structures (Kim and Xing 2010 ICML)

• Tree-Guided Group Lasso

  min_W Loss(W) + λ \sum_{j=1}^{d} \sum_{v \in V} w_v \| W_{G_v}^{j} \|_2

where V is the set of tree nodes, G_v the group of tasks under node v, w_v a node weight, and W_{G_v}^{j} the coefficients of feature j for the tasks in G_v.

[Figure: input features mapped to output tasks W1, W2, W3; the tree induces the groups G_{v1} = {W1}, G_{v2} = {W2}, G_{v3} = {W3}, G_{v4} = {W1, W2}, G_{v5} = {W1, W2, W3}]
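The tree-guided penalty for the three-task example can be sketched as below; the node weights w_v are made-up illustrative values, not the weighting scheme from Kim and Xing:

```python
import numpy as np

rng = np.random.default_rng(3)
d, m = 4, 3                       # features, tasks W1..W3 (columns 0..2)
W = rng.normal(size=(d, m))

# Groups induced by the tree, paired with hypothetical node weights w_v
groups = [
    ([0], 1.0),        # G_v1 = {W1}
    ([1], 1.0),        # G_v2 = {W2}
    ([2], 1.0),        # G_v3 = {W3}
    ([0, 1], 0.5),     # G_v4 = {W1, W2}
    ([0, 1, 2], 0.5),  # G_v5 = {W1, W2, W3}
]

lam = 0.1
penalty = lam * sum(
    w_v * np.linalg.norm(W[j, g])  # L2 norm over the group, per feature j
    for j in range(d)
    for g, w_v in groups
)
assert penalty > 0
```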


Multi-Task Learning with Graph Structures

• In real applications, tasks involved in MTL may have graph structures

– Two tasks are related if they are connected by an edge in the graph

– The similarity of two related tasks can be represented by the weight of the connecting edge.

Assumption: tasks have a graph/network structure.


Multi-Task Learning with Graph Structures

• A simple way to encode graph structure is to penalize the difference of two tasks that have an edge between them

• Given a set of edges E, we thus penalize:

• The graph regularization term can also be written in terms of the graph Laplacian

  \sum_{i=1}^{|E|} \| w_{e_i(1)} - w_{e_i(2)} \|_2^2 = \| W R^T \|_F^2

  \| W R^T \|_F^2 = tr( (W R^T)(W R^T)^T ) = tr( W R^T R W^T ) = tr( W \mathcal{L} W^T )

where R ∈ ℝ^{|E|×m} is the edge incidence matrix and \mathcal{L} = R^T R the graph Laplacian.
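This chain of identities can be verified numerically; a sketch (mine, not from the tutorial) using a small path graph over five tasks:

```python
import numpy as np

rng = np.random.default_rng(2)
d, m = 4, 5
W = rng.normal(size=(d, m))             # column i is the model of task i
E = [(0, 1), (1, 2), (2, 3), (3, 4)]    # example edge set (a path over 5 tasks)

penalty = sum(np.sum((W[:, a] - W[:, b]) ** 2) for a, b in E)

# Edge incidence matrix R (|E| x m): +1 / -1 at the endpoints of each edge
R = np.zeros((len(E), m))
for i, (a, b) in enumerate(E):
    R[i, a], R[i, b] = 1.0, -1.0

L = R.T @ R                             # graph Laplacian
assert np.isclose(penalty, np.linalg.norm(W @ R.T, "fro") ** 2)
assert np.isclose(penalty, np.trace(W @ L @ W.T))
```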


Multi-Task Learning with Graph Structures

• How to obtain graph information
  – External domain knowledge
    • e.g., protein–protein interaction (PPI) networks for microarray data
  – Discover task relatedness from data
    • Pairwise correlation coefficients
    • Sparse inverse covariance estimation (Friedman et al. 2008 Biostatistics)


Multi-Task Learning with Graph Structures (Chen et al. 2011 UAI; Kim et al. 2009 Bioinformatics)

• Graph-guided Fused Lasso

[Figure: gene SNPs (input) mapped to phenotypes (output); the Lasso treats the outputs independently, while the graph-guided fused lasso couples phenotypes that are connected in a graph]

  min_W Loss(W) + λ \| W \|_1 + Ω(W)

with the graph-guided fusion penalty

  Ω(W) = γ \sum_{e=(m,l) \in E} \sum_{j=1}^{J} | W_{jm} - sign(r_{ml}) W_{jl} |

where r_{ml} is the correlation between tasks m and l.
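The fusion penalty Ω(W) can be sketched directly from the formula; the edge correlations below are made-up example numbers:

```python
import numpy as np

rng = np.random.default_rng(4)
J, m = 4, 3                           # features, tasks
W = rng.normal(size=(J, m))
edges = [(0, 1, 0.8), (1, 2, -0.6)]   # (task m, task l, correlation r_ml): example values

lam, gamma = 0.1, 0.1
omega = gamma * sum(
    np.abs(W[:, a] - np.sign(r) * W[:, b]).sum()  # inner sum over the J features
    for a, b, r in edges
)
# The full objective would be Loss(W) + lam * np.abs(W).sum() + omega
assert omega >= 0
```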


Multi-Task Learning with Graph Structures (Kim et al. 2009 Bioinformatics)

• In some applications, we know not only which pairs are related, but also how they are related.

• Graph-Weighted Fused Lasso.

  min_W Loss(W) + λ \| W \|_1 + γ \sum_{e=(m,l) \in E} τ(r_{ml}) \sum_{j=1}^{J} | W_{jm} - sign(r_{ml}) W_{jl} |

The only change from the graph-guided fused lasso is the edge weight τ(r_{ml}), which adds the weight information.

[Figure: gene SNPs (input) mapped to phenotypes (output) under the graph-weighted fused lasso]
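The weighted variant only multiplies each edge term by τ(r_ml). A self-contained sketch, using τ(r) = |r| as one assumed choice (the actual τ in Kim et al. is application-dependent):

```python
import numpy as np

rng = np.random.default_rng(5)
J, m = 4, 3                           # features, tasks
W = rng.normal(size=(J, m))
edges = [(0, 1, 0.8), (1, 2, -0.6)]   # (task m, task l, correlation r_ml): example values

tau = abs                             # edge weight tau(r) = |r|: an assumed choice
gamma = 0.1
omega_w = gamma * sum(
    tau(r) * np.abs(W[:, a] - np.sign(r) * W[:, b]).sum()
    for a, b, r in edges
)
assert omega_w >= 0
```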


Practical Guideline

• MTL versus STL

– MTL is preferred when dealing with multiple related tasks, each with a small number of training samples

• Shared features versus shared subspace

– Identifying shared features is preferred when the data dimensionality is large

– Identifying a shared subspace is preferred when the number of tasks is large


Tutorial Road Map

• Part I: Multi-task Learning (MTL) background and motivations

• Part II: MTL formulations

• Part III: Case study of real-world applications

– Incomplete Multi-Source Fusion

– Drosophila Gene Expression Image Analysis

• Part IV: An MTL Package (MALSAR)

• Current and future directions


Case Study I: Incomplete Multi-Source Data Fusion (Yuan et al. 2012 NeuroImage)

• In many applications, multiple data sources may contain a considerable amount of missing data.

• In ADNI, over half of the subjects lack CSF measurements; an independent half of the subjects do not have FDG-PET.

[Figure: the ADNI data matrix for Subjects 1–245 with PET features P1–P116, MRI features M1–M305, and CSF features C1–C5; entire PET or CSF blocks are missing for large groups of subjects]


Overview of iMSF (Yuan et al. 2012 NeuroImage)

[Figure: subjects are partitioned into Tasks I–IV according to which combination of MRI, PET, and CSF measurements is available, and a model (Model I–IV) is learned for each task jointly]

  min_W \frac{1}{m} \sum_{i=1}^{m} Loss(X_i, Y_i, W_i) + λ \sum_{k=1}^{d} \| W_{G_k} \|_2

where group G_k collects the coefficients of feature k across all tasks whose data sources contain that feature.
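A sketch of the iMSF regularizer: each task has its own model over the features its sources cover, and group G_k ties together the coefficients of feature k across the tasks that observe it. The feature sets below are illustrative, not ADNI's actual source combinations:

```python
import numpy as np

rng = np.random.default_rng(6)
d = 6                                  # total number of features
# Feature indices available to each task (illustrative source combinations)
feat_sets = [[0, 1, 2, 3], [0, 1, 4, 5], [2, 3, 4, 5], [0, 1, 2, 3, 4, 5]]
models = [rng.normal(size=len(s)) for s in feat_sets]

loss_placeholder = 0.0                 # (1/m) sum_i Loss(X_i, Y_i, W_i) would go here

lam, reg = 0.1, 0.0
for k in range(d):
    # Group G_k: coefficients of feature k across all tasks whose sources include it
    g = [w[s.index(k)] for w, s in zip(models, feat_sets) if k in s]
    reg += np.linalg.norm(g)

objective = loss_placeholder + lam * reg
assert objective > 0
```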


iMSF: Performance (Yuan et al. 2012 NeuroImage)

[Bar charts: classification accuracy, sensitivity, and specificity of iMSF compared with Zero, EM, KNN, and SVD imputation at sample ratios of 50.0%, 66.7%, and 75.0%]


Case Study II: Drosophila Gene Expression Image Analysis

• Drosophila (fruit fly) is a favorite model system for geneticists and developmental biologists studying embryogenesis.
  – Its small size and short generation time make it ideal for genetic studies.
• In situ hybridization allows us to generate images showing when and where individual genes are active.
  – The analysis of such images can potentially reveal gene functions and gene–gene interactions.


Drosophila gene expression pattern images
[Tomancak et al. (2002) Genome Biology; Lécuyer et al. (2007) Cell]

• Berkeley Drosophila Genome Project (BDGP), http://www.fruitfly.org/: stage ranges 1-3, 4-6, 7-8, 9-10, 11-12, 13-
• Fly-FISH, http://fly-fish.ccbr.utoronto.ca/: stage ranges 1-3, 4-5, 6-7, 8-9, 10-

[Figure: expression images of the genes bgm, en, hb, and runt across the developmental stage ranges]


Comparative image analysis
[Tomancak et al. (2002) Genome Biology; Sandmann et al. (2007) Genes & Dev.]

[Figure: expression images of twist, heartless, and stumps at stages 4-6 and 7-8, each annotated with controlled-vocabulary terms such as "trunk mesoderm AISN", "head mesoderm PR", and "anterior endoderm anlage"]

We need the spatial and temporal annotations of expressions.


Challenges of manual annotation

[Chart: cumulative number of in situ images per year, 1992–2010, on a scale up to 180,000]


Method outline (Ji et al. 2008 Bioinformatics; Ji et al. 2009 BMC Bioinformatics; Ji et al. 2009 NIPS)

Images → feature extraction (sparse coding) → model construction (low-rank multi-task; graph-based multi-task)


Low-rank multi-task learning model (Ji et al. 2009 BMC Bioinformatics)

  min_W \sum_{i=1}^{k} \sum_{j=1}^{n} L( w_i^T x_{ij}, Y_{ij} ) + λ \| W \|_*

The first term is the loss over the k tasks; the trace norm \| W \|_* (the sum of the singular values of W) encourages a low-rank W.
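Trace-norm problems like this one are commonly solved with proximal gradient methods, whose key step is singular value thresholding, the proximal operator of the trace norm. A generic sketch (not MALSAR's actual code):

```python
import numpy as np

def svt(W, tau):
    """Prox of tau * trace norm: soft-threshold the singular values of W."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U @ np.diag(np.maximum(s - tau, 0.0)) @ Vt

rng = np.random.default_rng(7)
W = rng.normal(size=(5, 4))
W_low = svt(W, tau=1.5)

# Thresholding can only shrink singular values, so the rank cannot increase
assert np.linalg.matrix_rank(W_low) <= np.linalg.matrix_rank(W)
# ... and the trace norm (sum of singular values) cannot increase either
assert np.linalg.svd(W_low, compute_uv=False).sum() <= np.linalg.svd(W, compute_uv=False).sum()
```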


Graph-based multi-task learning model (Ji et al. 2009 SIGKDD)

  min_W \sum_{i=1}^{k} \sum_{j=1}^{n} L( w_i^T x_{ij}, Y_{ij} ) + γ \sum_{(p,q) \in G} | C_{pq} | \| w_p - sgn(C_{pq}) w_q \|_2^2

With the 2-norm (squared) loss, the problem admits a closed-form solution.

[Figure: a graph over annotation terms (sensory system head, ventral sensory complex, dorsal/lateral sensory complexes, embryonic maxillary sensory complex, embryonic antennal sense organ, sensory nervous system) with edge weights ranging from 0.31 to 0.79]


Spatial annotation performance

• 50% of the data is used for training and 50% for testing; 30 random splits are generated

• Multi-task approaches outperform single-task approaches

[Chart: AUC (axis 65–85) at stage ranges 4-6, 7-8, 9-10, 11-12, and 13-16 for SC+LR, SC+Graph, BoW+Graph, and Kernel+SVM]


Tutorial Road Map

• Part I: Multi-task Learning (MTL) background and motivations

• Part II: MTL formulations

• Part III: Case study of real-world applications

– Incomplete Multi-Source Fusion

– Drosophila Gene Expression Image Analysis

• Part IV: An MTL Package (MALSAR)

• Current and future directions


MALSAR

• A multi-task learning package
• Encodes task relationships via structural regularization
• www.public.asu.edu/~jye02/Software/MALSAR/


MTL Algorithms in MALSAR 1.0

• Mean-Regularized Multi-Task Learning
• MTL with Embedded Feature Selection
  – Joint Feature Learning
  – Dirty Multi-Task Learning
  – Robust Multi-Task Feature Learning
• MTL with Low-Rank Subspace Learning
  – Trace Norm Regularized Learning
  – Alternating Structure Optimization
  – Incoherent Sparse and Low Rank Learning
  – Robust Low-Rank Multi-Task Learning
• Clustered Multi-Task Learning
• Graph Regularized Multi-Task Learning


Tutorial Road Map

• Part I: Multi-task Learning (MTL) background and motivations

• Part II: MTL formulations

• Part III: Case study of real-world applications

– Incomplete Multi-Source Fusion

– Drosophila Gene Expression Image Analysis

• Part IV: An MTL Package (MALSAR)

• Current and future directions


Trends in Multi-Task Learning

• Develop efficient algorithms for large-scale multi-task learning
• Semi-supervised and unsupervised MTL
• Learn task structures automatically in MTL
• Asymmetric MTL – the task relationship is not mutual
• Cross-domain MTL – the features may be different across domains


Acknowledgement

• National Science Foundation
• National Institutes of Health
