
  • Computer Aided Medical Procedures

    Fast Training of Support Vector Machines for Survival Analysis
    Sebastian Pölsterl¹, Nassir Navab¹,², and Amin Katouzian¹
    1: Technische Universität München, Munich, Germany
    2: Johns Hopkins University, Baltimore MD, USA

  • Sebastian Pölsterl, ECML PKDD, 7-11 September 2015, Porto, Portugal 2

    Survival Analysis

    ● Objective: to establish a connection between covariates and the time between the start of the study and an event.

    ● Possible formulation: Rank subjects according to observed survival time.

    ● Usually, parts of survival data can only be partially observed – they are censored.

    ● Survival data consist of n triplets:
    – a d-dimensional feature vector,
    – the time of event or the time of censoring,
    – an event indicator.
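The triplet structure above can be sketched as follows (a minimal illustration; the variable names and sample values are my own, not from the slides):

```python
import numpy as np

# Each record is a triplet (x_i, y_i, delta_i):
#   x_i     - d-dimensional feature vector
#   y_i     - time of event (delta_i = 1) or time of censoring (delta_i = 0)
#   delta_i - event indicator

rng = np.random.default_rng(0)
n, d = 5, 3
X = rng.normal(size=(n, d))                # feature vectors
y = np.array([2.0, 5.0, 3.5, 8.0, 1.0])    # observed times in months
delta = np.array([1, 0, 1, 1, 0])          # 1 = event observed, 0 = censored

for yi, di in zip(y, delta):
    status = "event" if di == 1 else "censored"
    print(f"t = {yi:4.1f} months ({status})")
```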


    Right Censoring

    [Figure: timelines for patients A–E, shown both in study time (months 2–12, with the end of study marked) and in time since enrollment (months 1–6). B and D experience the event (†), A is lost to follow-up, C drops out, and E has no event recorded.]

    ● Only events that occur while the study is running can be recorded (records are uncensored).

    ● For individuals who remained event-free during the study period, it is unknown whether an event occurred after the study ended (records are right censored).
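The comparability rule implied by right censoring (a pair of records can only be ranked if the earlier of the two observed times is an actual event) can be sketched as follows; the patient times here are illustrative, not read off the figure:

```python
def comparable_pairs(y, delta):
    """Yield ordered pairs (i, j) such that subject i experienced an event
    strictly before subject j's observed time, so subject i's survival time
    is known to be shorter: y[i] < y[j] and delta[i] == 1."""
    n = len(y)
    for i in range(n):
        if delta[i] != 1:   # a censored record cannot serve as the earlier event
            continue
        for j in range(n):
            if y[i] < y[j]:
                yield (i, j)

# Patients A..E: observed time (months) and event indicator (illustrative values).
y     = [3.0, 4.0, 5.0, 6.0, 12.0]
delta = [0,   1,   0,   1,   0]      # A lost, B event, C dropped out, D event, E event-free

pairs = list(comparable_pairs(y, delta))
print(pairs)  # → [(1, 2), (1, 3), (1, 4), (3, 4)]
```

Note that pairs such as (A, B) are incomparable: A was lost to follow-up before B's event, so we cannot tell who survived longer.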


    Right Censoring

    (Bullets repeated from the previous slide. Figure repeated, with an incomparable pair of records highlighted: when the earlier of two observed times is censored, the pair cannot be ranked.)


    Right Censoring

    (Figure repeated, now also highlighting a comparable pair: when the earlier of two observed times is an event, the pair can be ranked.)


    Overview

    ● Problem:
    – Naive training algorithms for linear Survival Support Vector Machines require O(n⁴) time and O(n²) space (Van Belle et al., 2007; Evers et al., 2008).

    ● Proposed Solution:
    – Perform optimization in the primal using truncated Newton optimization.
    – Use order statistics trees to lower time and space requirements.
    – The approach extends to a hybrid ranking–regression objective function as well as the non-linear Survival SVM.


    Survival SVM

    ● The objective function depends on a quadratic number of pairwise comparisons of samples.

    ● It is closely related to RankSVM (Herbrich et al., 2000), where samples are ranked by relevance level instead of survival time.

    ● Ties in survival time are uncommon, i.e., the number of relevance levels r for RankSVM is O(n).
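To make the quadratic number of comparisons concrete, here is the ranking objective written out naively (the slide's formula was an image, so the exact notation is assumed: an L2 penalty plus a squared hinge loss over all comparable pairs):

```python
import numpy as np

def naive_ranking_objective(w, X, y, delta, gamma=1.0):
    """Naive Survival SVM ranking objective (notation assumed):
    0.5 * ||w||^2 + gamma/2 * sum over comparable pairs (i, j) of
    max(0, 1 - (w.x_j - w.x_i))^2, where (i, j) is comparable iff
    y[i] < y[j] and delta[i] == 1.  The double loop makes the number
    of hinge terms quadratic in n."""
    scores = X @ w
    loss = 0.5 * (w @ w)
    n = len(y)
    for i in range(n):
        if delta[i] != 1:        # the earlier record must be an observed event
            continue
        for j in range(n):
            if y[i] < y[j]:
                loss += 0.5 * gamma * max(0.0, 1.0 - (scores[j] - scores[i])) ** 2
    return loss

X = np.array([[1.0], [2.0], [3.0]])
y = np.array([1.0, 2.0, 3.0])
delta = np.array([1, 1, 1])
print(naive_ranking_objective(np.zeros(1), X, y, delta))  # 3 pairs at the margin: 3 * 0.5 = 1.5
```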


    Related Work – Survival SVM

    ● Van Belle et al., 2007: Explicitly construct all pairwise comparisons of samples to transform the ranking problem into a classification problem, then use a standard dual SVM solver.

    ● Van Belle et al., 2008: Reduce the number of samples n by clustering the data according to survival times using k-nearest neighbor search.


    Related Work – Rank SVM

    ● Airola et al., 2011: Combine cutting-plane optimization with a red–black-tree-based approach to subgradient calculations.

    ● Lee et al., 2014: Combine truncated Newton optimization with order statistics trees to compute the gradient and Hessian.


    The Objective Function (1)

    ● The pairwise-comparison matrix is sparse: each row corresponds to one comparable pair and has one entry equal to 1, one entry equal to −1, and zeros elsewhere.

    ● The objective also involves the number of support vectors: the comparable pairs whose ranking constraint is violated.


    The Objective Function (2)

    ● Example: [figure: a small instance of the sparse pairwise-comparison matrix].


    Truncated Newton Optimization

    ● Problem: Explicitly storing the Hessian matrix can be prohibitive for high-dimensional survival data.

    ● Proposed Solution:
    – Optimization in the primal.
    – Avoid constructing the Hessian matrix by using truncated Newton optimization, which only requires the computation of Hessian-vector products.
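The truncated Newton scheme (solve each Newton system approximately with conjugate gradients, touching the Hessian only through Hessian-vector products) can be sketched generically. This is not the paper's implementation, just the idea, demonstrated on a simple regularized least-squares objective where the gradient and Hessian-vector product are known in closed form:

```python
import numpy as np

def cg_solve(hvp, g, max_iter=50, tol=1e-10):
    """Conjugate gradients for H s = -g, using only the map v -> H v."""
    s = np.zeros_like(g)
    r = -g - hvp(s)          # initial residual
    p = r.copy()
    rs = r @ r
    for _ in range(max_iter):
        if rs < tol:
            break
        Hp = hvp(p)
        alpha = rs / (p @ Hp)
        s += alpha * p
        r -= alpha * Hp
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return s

# Demo objective: f(w) = 0.5*||X w - t||^2 + 0.5*||w||^2.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))
t = rng.normal(size=20)

w = np.zeros(4)
for _ in range(5):                       # (truncated) Newton iterations
    grad = X.T @ (X @ w - t) + w
    hvp = lambda v: X.T @ (X @ v) + v    # Hessian-vector product; Hessian never formed
    w += cg_solve(hvp, grad)

# For this quadratic objective, Newton's method reaches the exact optimum:
w_exact = np.linalg.solve(X.T @ X + np.eye(4), X.T @ t)
```

The point is that `hvp` computes `H @ v` as two matrix-vector products, so the d × d Hessian is never stored, which is exactly what makes the approach viable for high-dimensional data.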


    Calculation of Search Direction

    ● In each iteration of Newton's method, the terms entering the Hessian-vector product have to be recomputed, because they depend on the current weight vector.

    ● The Hessian-vector product can be expressed compactly, in a form that avoids explicit pairwise differences.

    ● Assuming the required order statistics have been computed, the Hessian-vector product can be obtained in O(n log n) time instead of O(n²).


    Order Statistics Trees

    ● Problem: The order of terms depends on the survival times and on the predicted scores, which change in every iteration.

    ● Solution:
    – Sort the survival data.
    – Incrementally add samples, keyed by predicted score, to an order statistics tree (a balanced binary search tree).
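The kind of rank query the tree answers can be sketched with Python's `bisect` over a sorted list as a stand-in for a balanced order statistics tree (a simplification: list insertion here is O(n), whereas the balanced BST of the slides makes each insert and query O(log n)):

```python
import bisect

def count_higher_scored(scores, order):
    """Process samples in the given order (e.g. sorted by survival time)
    and, for each one, report how many previously inserted samples have a
    strictly larger predicted score -- the kind of order statistic the
    tree provides per query."""
    inserted = []   # kept sorted by score
    counts = []
    for idx in order:
        s = scores[idx]
        pos = bisect.bisect_right(inserted, s)
        counts.append(len(inserted) - pos)   # elements strictly greater than s
        bisect.insort(inserted, s)
    return counts

scores = [0.5, 2.0, 1.0, 3.0]
order = [0, 1, 2, 3]   # assume already sorted by survival time
print(count_higher_scored(scores, order))  # → [0, 0, 1, 0]
```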

    [Figure: animation across several slides showing values inserted one at a time into an order statistics tree; censored records are marked.]


    Efficient Hessian-vector Product

    ● Before: the Hessian-vector product required a quadratic number of pairwise comparisons.

    ● Now: after sorting according to the predicted scores, the required order statistics can be obtained in O(n log n) time.

    ● The Hessian-vector product no longer requires constructing a matrix with one row per comparable pair.

    ● Overall, the Hessian-vector product requires O(n log n) time and O(n) space.
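The flavor of the speed-up can be illustrated with a simplified statistic (a sketch, not the paper's exact formulas): per-sample sums over all lower-scored samples, which naively need a double loop, can be read off from cumulative sums after a single sort.

```python
import numpy as np

def pairwise_stats_naive(scores):
    """For each i: how many j have a smaller score, and the sum of those
    scores -- O(n^2) with an explicit double loop."""
    n = len(scores)
    count = np.zeros(n)
    total = np.zeros(n)
    for i in range(n):
        for j in range(n):
            if scores[j] < scores[i]:
                count[i] += 1
                total[i] += scores[j]
    return count, total

def pairwise_stats_sorted(scores):
    """Same statistics via sort + prefix sums, O(n log n).
    Assumes distinct scores (ties would need the tree's duplicate handling)."""
    order = np.argsort(scores, kind="stable")
    s = scores[order]
    prefix = np.concatenate(([0.0], np.cumsum(s)))
    count = np.empty_like(scores)
    total = np.empty_like(scores)
    for rank, i in enumerate(order):
        count[i] = rank            # elements before position `rank` are smaller
        total[i] = prefix[rank]    # ... and their scores sum to this prefix
    return count, total

scores = np.array([0.5, 2.0, 1.0, 3.0])
c1, t1 = pairwise_stats_naive(scores)
c2, t2 = pairwise_stats_sorted(scores)
```

Both routines return identical results; only the second avoids touching every pair, which is what removes the quadratic factor from each Hessian-vector product.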


    Overall Complexity

    ● Time complexity: dominated by sorting and tree operations, O(n log n) per Hessian-vector product, down from O(n⁴) for naive training.

    ● Space complexity: O(n), down from O(n²).

    ● No need to explicitly construct all pairwise differences.


    Training Time (in seconds)

    [Figure: bar charts of training time in seconds for seven dataset sizes. Methods, in legend order: Proposed (red–black tree), Proposed (AVL tree), Improved, Simple.
    n = 1,000: 0.3 / 0.3 / 0.2 / 0.2 s
    n = 2,500: 0.6 / 0.7 / 0.7 / 1.4 s
    n = 5,000: 1.2 / 1.2 / 3.2 / 6.7 s
    n = 10,000: 4.4 / 4.5 / 15.9 / 38.5 s
    n = 20,000: 5.5 / 6.4 / 68.8 / 152.1 s
    n = 100,000: 34.8 / 35.8 s; Improved and Simple ran out of memory
    n = 1,000,000: 424.2 / 441.7 s; Improved and Simple ran out of memory]


    Extensions

    ● Non-linear Survival SVM:
    – Transform the data with Kernel PCA before training in the primal (Chapelle & Keerthi, 2009).

    ● Hybrid ranking–regression:
    – The ranking approach cannot be used to predict the exact time of an event.
    – Use an objective function that combines a ranking and a regression loss.
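One way such a hybrid objective can look is sketched below (the weighting scheme and the restriction of the regression part to uncensored records are illustrative choices of mine, not necessarily the paper's exact formulation):

```python
import numpy as np

def hybrid_loss(w, X, y, delta, alpha=0.5, gamma=1.0):
    """Illustrative hybrid objective:
    L2 penalty + gamma/2 * (alpha * pairwise squared-hinge ranking loss
                            + (1 - alpha) * squared error on uncensored records).
    alpha = 1 recovers the pure ranking objective."""
    scores = X @ w
    n = len(y)
    rank_loss = 0.0
    for i in range(n):
        if delta[i] != 1:
            continue
        for j in range(n):
            if y[i] < y[j]:
                rank_loss += max(0.0, 1.0 - (scores[j] - scores[i])) ** 2
    # Regression part: only events have an exact observed time to regress on.
    mask = delta == 1
    reg_loss = float(np.sum((scores[mask] - y[mask]) ** 2))
    return 0.5 * (w @ w) + gamma / 2 * (alpha * rank_loss + (1 - alpha) * reg_loss)
```

The regression term gives the model a calibrated time scale, which pure ranking cannot provide; the same truncated Newton machinery applies because the extra term is smooth.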


    Conclusion

    ● The time complexity of training could be lowered from O(n⁴) to O(n log n).

    ● The space complexity reduces from O(n²) to O(n).

    ● The same optimization scheme can be applied to the non-linear Survival SVM and the hybrid ranking–regression model.

    ● An implementation is available online at https://github.com/tum-camp.


    Bibliography

    ● Airola et al.: Training linear ranking SVMs in linearithmic time using red–black trees. Pattern Recogn. Lett. 32(9), 1328–36 (2011)

    ● Chapelle & Keerthi: Efficient algorithms for ranking with SVMs. Information Retrieval 13(3), 201–15 (2009)

    ● Evers et al.: Sparse kernel methods for high-dimensional survival data. Bioinformatics 24(14), 1632–8 (2008)

    ● Herbrich et al.: Large Margin Rank Boundaries for Ordinal Regression. In: Advances in Large Margin Classifiers. pp. 115–32. (2000)

    ● Lee et al.: Large-Scale Linear RankSVM. Neural Comput. 26(4), 781–817 (2014)

    ● Van Belle et al.: Support Vector Machines for Survival Analysis. In: Proc. 3rd Int. Conf. Comput. Intell. Med. Healthc. pp. 1–8 (2007)

    ● Van Belle et al.: Survival SVM: a practical scalable algorithm. In: Proc. of 16th European Symposium on Artificial Neural Networks. pp. 89–94 (2008)