Large-Margin Feature Adaptation for Automatic Speech Recognition
Fast Training of Support Vector Machines for Survival Analysis · 2020. 7. 2. · Herbrich et al.:...
Transcript of Fast Training of Support Vector Machines for Survival Analysis · 2020. 7. 2. · Herbrich et al.:...
-
Computer Aided Medical Procedures
Fast Training of Support Vector Machines for Survival AnalysisSebastian Pölsterl1, Nassir Navab1,2, and Amin Katouzian11: Technische Universität München, Munich, Germany2: Johns Hopkins University, Baltimore MD, USA
-
Sebastian Pölsterl, ECML PKDD, 7-11 September 2015, Porto, Portugal 2
Survival Analysis
● Objective: to establish a connection between covariates and the time between the start of the study and an event.
● Possible formulation: Rank subjects according to observed survival time.
● Usually, parts of survival data can only be partially observed – they are censored.
● Survival data consists of n triplets:– a d-dimensional feature vector
– time of event or time of censoring
– event indicator
-
Sebastian Pölsterl, ECML PKDD, 7-11 September 2015, Porto, Portugal 3
Right Censoring
2 4 6 8 10 12
Time in months
End
of Stud
y
A Lost
B †
C Dropped out
D †
E
1 2 3 4 5 6
Time since enrollment in months
A Lost
B †
C Dropped out
D †
E
Pat
ient
s
● Only events that occur while the study is running can be recorded (records are uncensored).
● For individuals that remained event-free during the study period, it is unknown whether an event has or has not occurred after the study ended (records are right censored).
-
Sebastian Pölsterl, ECML PKDD, 7-11 September 2015, Porto, Portugal 4
Right Censoring
● Only events that occur while the study is running can be recorded (records are uncensored).
● For individuals that remained event-free during the study period, it is unknown whether an event has or has not occurred after the study ended (records are right censored).
Incomparable
2 4 6 8 10 12
Time in months
End
of Stud
y
A Lost
B †
C Dropped out
D †
E
1 2 3 4 5 6
Time since enrollment in months
A Lost
B †
C Dropped out
D †
E
Pat
ient
s
-
Sebastian Pölsterl, ECML PKDD, 7-11 September 2015, Porto, Portugal 5
Right Censoring
● Only events that occur while the study is running can be recorded (records are uncensored).
● For individuals that remained event-free during the study period, it is unknown whether an event has or has not occurred after the study ended (records are right censored).
Incomparable
Comparable
2 4 6 8 10 12
Time in months
End
of Stud
y
A Lost
B †
C Dropped out
D †
E
1 2 3 4 5 6
Time since enrollment in months
A Lost
B †
C Dropped out
D †
E
Pat
ient
s
-
Sebastian Pölsterl, ECML PKDD, 7-11 September 2015, Porto, Portugal 6
Overview
● Problem:– Naive training algorithms for linear Survival Support Vector
Machines require O(n4) time and O(n2) space (Van Belle et al., 2007; Evers et al., 2008).
● Proposed Solution:– Perform optimization in the primal using truncated Newton
optimization.
– Use order statistics trees to lower time and space requirements.
– Approach extends to hybrid ranking-regression objective function as well as non-linear Survival SVM.
-
Sebastian Pölsterl, ECML PKDD, 7-11 September 2015, Porto, Portugal 7
Survival SVM
● Objective function depends on a quadratic number of pairwise comparisons
● Closely related to RankSVM (Herbrich et al., 2000), where
● Ties in survival time are not common, i.e., number of relevance levels r for RankSVM is O(n).
-
Sebastian Pölsterl, ECML PKDD, 7-11 September 2015, Porto, Portugal 8
Related Work – Survival SVM
● Van Belle et al., 2007: Explicitly construct all pairwise comparisons of samples to transform ranking problem into classification problem and use standard dual SVM solver.
● Van Belle et al., 2008: Reduces number of samples n by clustering data according to survival times using k-nearest neighbor search.
-
Sebastian Pölsterl, ECML PKDD, 7-11 September 2015, Porto, Portugal 9
Related Work – Rank SVM
● Airola et al., 2011: Combines cutting plane optimization with red-black tree based approach to subgradient calculations.
● Lee et al., 2014: Combines truncated Newton optimization with order statistics trees to compute gradient and Hessian.
-
Sebastian Pölsterl, ECML PKDD, 7-11 September 2015, Porto, Portugal 10
The Objective Function (1)
● is a sparse matrix with each row having one entry that is 1, one entry that is -1, and the remainder all zeros.
● denotes the number of support vectors:
-
Sebastian Pölsterl, ECML PKDD, 7-11 September 2015, Porto, Portugal 11
The Objective Function (2)
● is a sparse matrix with each row having one entry that is 1, one entry that is -1, and the remainder all zeros.
●
● Example:
-
Sebastian Pölsterl, ECML PKDD, 7-11 September 2015, Porto, Portugal 13
Truncated Newton Optimization
● Problem: Explicitly storing the Hessian matrix can be prohibitive for high-dimensional survival data.
● Proposed Solution:– Optimization in primal.
– Avoid constructing Hessian matrix by using truncated Newton optimization, which only requires computation of Hessian-vector product:
-
Sebastian Pölsterl, ECML PKDD, 7-11 September 2015, Porto, Portugal 14
Calculation of Search Direction (1)
● In each iteration of Newton's method, has to be recomputed due to its dependency on
-
Sebastian Pölsterl, ECML PKDD, 7-11 September 2015, Porto, Portugal 15
Calculation of Search Direction (2)
● In each iteration of Newton's method, has to be recomputed due to its dependency on
-
Sebastian Pölsterl, ECML PKDD, 7-11 September 2015, Porto, Portugal 16
Calculation of Search Direction (3)
● can compactly be expressed as:
-
Sebastian Pölsterl, ECML PKDD, 7-11 September 2015, Porto, Portugal 17
Calculation of Search Direction (4)
● Assume that have been computed.
● Hessian-vector product can be computed in instead of
-
Sebastian Pölsterl, ECML PKDD, 7-11 September 2015, Porto, Portugal 18
Order Statistics Trees
● Problem: Order depends on survival times and predicted scores
● Solution:– Sort survival data according to .
– Incrementally add and to an order statistics tree (balanced binary search tree).
-
Sebastian Pölsterl, ECML PKDD, 7-11 September 2015, Porto, Portugal 19
Order Statistics Trees
6
95
1 (censored)
-
Sebastian Pölsterl, ECML PKDD, 7-11 September 2015, Porto, Portugal 20
Order Statistics Trees
6
85
92
1
(censored)
-
Sebastian Pölsterl, ECML PKDD, 7-11 September 2015, Porto, Portugal 21
Order Statistics Trees
6
85
92
1
-
Sebastian Pölsterl, ECML PKDD, 7-11 September 2015, Porto, Portugal 22
Order Statistics Trees
6
85
92
1
(censored)
-
Sebastian Pölsterl, ECML PKDD, 7-11 September 2015, Porto, Portugal 23
Order Statistics Trees
6
85
92
1
-
Sebastian Pölsterl, ECML PKDD, 7-11 September 2015, Porto, Portugal 24
Order Statistics Trees
6
85
92
1 3
7
-
Sebastian Pölsterl, ECML PKDD, 7-11 September 2015, Porto, Portugal 26
Efficient Hessian-vector Product
● Before: Hessian-vector product required
● Now: After sorting according to predicted scores, can be obtained in
● Hessian-vector product does not require constructing matrix of size anymore
● Hessian-vector product requires
-
Sebastian Pölsterl, ECML PKDD, 7-11 September 2015, Porto, Portugal 27
Overall Complexity
● Time complexity:
● Space complexity:
No need to explicitly construct all pairwise differences.
-
Sebastian Pölsterl, ECML PKDD, 7-11 September 2015, Porto, Portugal 28
Training Time (in seconds)
0.00
0.05
0.10
0.15
0.20
0.25
0.30
0.35
Tim
e in
sec
onds
0.30.3
0.2
0.2
dataset size = 1,000
0.0
0.2
0.4
0.6
0.8
1.0
1.2
1.4
0.6 0.7 0.7
1.4dataset size = 2,500
0
1
2
3
4
5
6
7
1.2 1.2
3.2
6.7
dataset size = 5,000
0
5
10
15
20
25
30
35
40
4.4 4.5
15.9
38.5
dataset size = 10,000
0
20
40
60
80
100
120
140
160
Tim
e in
sec
onds
5.5 6.4
68.8
152.1
dataset size = 20,000
0
5
10
15
20
25
30
35
40
34.8 35.8
dataset size = 100,000
0
50
100
150
200
250
300
350
400
450
424.2 441.7
dataset size = 1,000,000
MethodProposed (Red-black tree)Proposed (AVL tree)ImprovedSimple
Insu
ffici
ent
Mem
ory
Insu
ffici
ent
Mem
ory
-
Sebastian Pölsterl, ECML PKDD, 7-11 September 2015, Porto, Portugal 29
Extensions
● Non-linear Survival SVM– Transform data with Kernel PCA before training in primal
(Chapelle & Keerthi, 2009).
● Hybrid ranking-regression– Ranking approach cannot be used to predict exact time of
event.
– Use objective function that combines ranking and regression loss.
-
Sebastian Pölsterl, ECML PKDD, 7-11 September 2015, Porto, Portugal 30
Conclusion
● Time complexity could be lowered from to
● Space complexity reduces from to
● Same optimization scheme can be applied to non-linear
Survival SVM and hybrid ranking-regression.
● Implementation is available online at
https://github.com/tum-camp.
-
Sebastian Pölsterl, ECML PKDD, 7-11 September 2015, Porto, Portugal 32
Bibliography
● Airola et al.: Training linear ranking SVMs in linearithmic time using red–black trees. Pattern Recogn. Lett. 32(9), 1328–36 (2011)
● Chapelle et al.: Efficient algorithms for ranking with SVMs. Information Retrieval 13(3), 201–5 (2009)
● Evers et al.: Sparse kernel methods for high-dimensional survival data. Bioinformatics 24(14), 1632–8 (2008)
● Herbrich et al.: Large Margin Rank Boundaries for Ordinal Regression. In: Advances in Large Margin Classifiers. pp. 115–32. (2000)
● Lee et al.: Large-Scale Linear RankSVM. Neural Comput. 26(4), 781–817 (2014)
● Van Belle et al.: Support Vector Machines for Survival Analysis. In: Proc. 3rd Int. Conf. Comput. Intell. Med. Healthc. pp. 1–8 (2007)
● Van Belle et al.: Survival SVM: a practical scalable algorithm. In: Proc. of 16th European Symposium on Artificial Neural Networks. pp. 89–94 (2008)