Random forests with random projections of the output space for high dimensional multi-label...

16
Random forests with random projections of the output space for high dimensional multi-label classification Arnaud Joly, Pierre Geurts, Louis Wehenkel

description

We adapt the idea of random projections applied to the out- put space, so as to enhance tree-based ensemble methods in the context of multi-label classification. We show how learning time complexity can be reduced without affecting computational complexity and accuracy of predictions. We also show that random output space projections may be used in order to reach different bias-variance tradeoffs, over a broad panel of benchmark problems, and that this may lead to improved accuracy while reducing significantly the computational burden of the learning stage. Link to the paper http://orbi.ulg.ac.be/handle/2268/172146 Souce code available at https://github.com/arjoly/random-output-trees

Transcript of Random forests with random projections of the output space for high dimensional multi-label...

Page 1: Random forests with random projections of the output space for high dimensional multi-label classification

Random forests with random projections of theoutput space for high dimensional multi-label

classification

Arnaud Joly, Pierre Geurts, Louis Wehenkel

Page 2: Random forests with random projections of the output space for high dimensional multi-label classification

Multi-label classification tasks

Many supervised learning applications in text, biology or imageprocessing where samples are associated to sets of labels.

Input X 800× 600 pixel Output Y labelsdriver, mountain, road,car, tree, rock, line,human, . . .

If each label correspondsto a wikipedia article, thenwe have around 4 millionlabels.

2 / 15

Page 3: Random forests with random projections of the output space for high dimensional multi-label classification

Random forest

Randomized trees are built on a bootstrap copy of the input-outputpairs ((xi, yi) ∈ (X × Y))ni=1 by recursively maximizing thereduction of impurity, here the variance Var. At each node, thebest split is selected among k randomly selected features.

S

SL

Xk ≤ tk

SR

Xk > tk

Var(S) = 0.24

Var(SL) = 0.014

Var(SR) = 0.1875

∆ Var(S) = Var(S)−12

20Var(SL)− 8

20Var(tR)

≈ 0.16

3 / 15

Page 4: Random forests with random projections of the output space for high dimensional multi-label classification

When Y is very high dimensional, this constitutes the mainbottleneck of the random tree ensemble.

The multi-output single tree algorithm requires the computation ofthe sum of the variance over the label space at each tree node andfor each candidate split.

4 / 15

Page 5: Random forests with random projections of the output space for high dimensional multi-label classification

Multi-output regression trees in randomly projected outputspace

We propose to approximate the computation of the variance byusing random projection of the output space.

TheoremGiven ε > 0, a sample (yi)ni=1 of n points y ∈ Rd, and a projectionmatrix Φ ∈ Rm×d such that for all pairs of points theJonhson-Lindenstrauss lemma holds, we have also

(1− ε) Var((yi)ni=1

)≤ Var

((Φyi)ni=1

)≤ (1 + ε) Var

((yi)ni=1

).

5 / 15

Page 6: Random forests with random projections of the output space for high dimensional multi-label classification

Multi-output regression trees in randomly projected outputspace

We propose to approximate the computation of the variance byusing random projection of the output space.

TheoremGiven ε > 0, a sample (yi)ni=1 of n points y ∈ Rd, and a projectionmatrix Φ ∈ Rm×d such that for all pairs of points theJonhson-Lindenstrauss lemma holds, we have also

(1− ε) Var((yi)ni=1

)≤ Var

((Φyi)ni=1

)≤ (1 + ε) Var

((yi)ni=1

).

5 / 15

Page 7: Random forests with random projections of the output space for high dimensional multi-label classification

Multi-output regression trees in randomly projected outputspace

1. Randomly project the output space

Φyi

=

2. Grow the tree on the projected output space(xi,Φyi)ni=1

3. Label leaves using (yi)ni=1

6 / 15

Page 8: Random forests with random projections of the output space for high dimensional multi-label classification

Ensemble of randomized trees

Shared subspace Individual subspace

(xi,Φyi)ni=1 (xi,Φ1yi)ni=1 (xi,Φ2y

i)ni=1

7 / 15

Page 9: Random forests with random projections of the output space for high dimensional multi-label classification

Bias-variance analysisAveraging over the learning set LS, algorithm randomization ε andoutput subspace randomization Φ, the square error Err of t multioutput tree models can be decomposed into:Single shared subspace (Algo 1)

ELS,Φ,εt{Err(f1(x;LS,Φ, εt))}

= σ2R(x) +B2(x) + VLS(x) +

VAlgo(x)

t+ VProj(x).

Individual subspace (Algo 2)

ELS,Φt,εt{Err(f2(x;LS,Φt, εt))}

= σ2R(x)︸ ︷︷ ︸

residual error

+B2(x)︸ ︷︷ ︸bias

+VLS(x) +VAlgo(x) + VProj(x)

t︸ ︷︷ ︸variance

.

Individual subspace should always be preferred to single sharedsubspace.

8 / 15

Page 10: Random forests with random projections of the output space for high dimensional multi-label classification

Label ranking average precision to assess performance

LRAP(f̂) =1

|TS|∑i∈TS

1

|yi|∑

j∈{k:yik=1}

|Lij(yi)||Lij(1d)|

,

Lij(q) ={k : qk = 1 and f̂(xi)k ≥ f̂(xi)j

}where f̂(xi)j is the probability (or the score) associated to the labelj by the learnt model f̂ applied to xi, 1d is a d-dimensional rowvector of ones.

Higher score if true labels have a higher probability (score) than thefalse labels.

9 / 15

Page 11: Random forests with random projections of the output space for high dimensional multi-label classification

Decision tree performance converges with m = 200Gaussian random output projections

0 200 400 600 800 1000m

0.180

0.185

0.190

0.195

0.200

0.205LR

AP

Decision treeDecision tree on Gaussian subspace (Algo 2)

Delicious dataset (983 labels)

10 / 15

Page 12: Random forests with random projections of the output space for high dimensional multi-label classification

Faster convergence with ensemble of randomized trees

0 200 400 600 800 1000m

0.366

0.369

0.372

0.375

0.378

LRA

P

Random forestRandom forest on Gaussian subspace (Algo 2)Random forest on fix-Gaussian subspace (Algo 1)

Delicious dataset (983 labels, k =√p, t = 100, nmin = 1)

Randomly projecting the output space reduces computing timefrom 3458 seconds (no projection) to 311 seconds (m = 25,individual subspace) without accuracy degradation.

11 / 15

Page 13: Random forests with random projections of the output space for high dimensional multi-label classification

Systematic analysis on 24 datasetsIncreasing m leads to convergence in LRAP

0.8 1.0 1.2LRAP(random forest on Gaussian output subspace) / LRAP(random forest)

bookmarksdeliciousCAL500corel5kdiatoms

protein-interactionEUR-Lex (desc.)drug-interaction

scenegenbase

EUR-Lex (dir.)SCOP-GO

enronmediamillYeast-GO

medicalyeastWIPO

Expression-GO emotions

reuterstmc2017

bibtexEUR-Lex (subj.) m=1

m=log(d)m=d

(k =√p, t = 100, nmin = 1, averaged over 10 repetitions)

12 / 15

Page 14: Random forests with random projections of the output space for high dimensional multi-label classification

Output randomization could be more effective than inputrandomization

0 150 300 450 600k

0.360

0.368

0.376

0.384

0.392

0.400LR

AP

Random forestRandom forest on Gaussian subspace (Algo 2) with m= 1Random forest on Gaussian subspace (Algo 2) with m= 7Random forest on Gaussian subspace (Algo 2) with m=14

Drug-interaction dataset(1554 labels, t = 100, nmin = 1)

13 / 15

Page 15: Random forests with random projections of the output space for high dimensional multi-label classification

Alternative random output subspace

0 30 60 90 120 150m

0.33

0.34

0.35

0.36

0.37

0.38

0.39LR

AP

Random forestRandom forest on random output subspace (Algo 2)

Random forest on sparse Rademacher (s=√d ) subspace (Algo 2)

Random forest on sparse Rademacher (s=3) subspace (Algo 2)

Delicious dataset(981 labels, k =√p, t = 100, nmin = 1)

14 / 15

Page 16: Random forests with random projections of the output space for high dimensional multi-label classification

Random forests with random projections of the output spacefor high dimensional multi-label classification

ConclusionsI Lower computing time, without affecting accuracy.I Optimizing input and output randomization could improve

prediction performance.

Future workEfficient technique to adjust random output space parameters so asto reach the best accuracy and computing time trade-off.

Source code is available @github.com/arjoly/random-output-trees.

15 / 15