Knowledge-Based Breast Cancer Prognosis Olvi Mangasarian UW Madison & UCSD La Jolla Edward Wild UW...
![Page 1: Knowledge-Based Breast Cancer Prognosis Olvi Mangasarian UW Madison & UCSD La Jolla Edward Wild UW Madison Computation and Informatics in Biology and Medicine.](https://reader031.fdocuments.in/reader031/viewer/2022032607/56649ec65503460f94bd1045/html5/thumbnails/1.jpg)
Knowledge-Based Breast Cancer Prognosis
Olvi Mangasarian, UW Madison & UCSD La Jolla
Edward Wild, UW Madison
Computation and Informatics in Biology and Medicine Training Program Annual Retreat
October 13, 2006
![Page 2: Knowledge-Based Breast Cancer Prognosis Olvi Mangasarian UW Madison & UCSD La Jolla Edward Wild UW Madison Computation and Informatics in Biology and Medicine.](https://reader031.fdocuments.in/reader031/viewer/2022032607/56649ec65503460f94bd1045/html5/thumbnails/2.jpg)
Objectives
Primary objective: incorporate prior knowledge over completely arbitrary sets into function approximation and classification, without transforming (kernelizing) the knowledge
Secondary objective: Achieve transparency of the prior knowledge for practical applications
Use prior knowledge to improve accuracy on two difficult breast cancer prognosis problems
![Page 3: Knowledge-Based Breast Cancer Prognosis Olvi Mangasarian UW Madison & UCSD La Jolla Edward Wild UW Madison Computation and Informatics in Biology and Medicine.](https://reader031.fdocuments.in/reader031/viewer/2022032607/56649ec65503460f94bd1045/html5/thumbnails/3.jpg)
Classification and Function Approximation
Given a set of $m$ points in $n$-dimensional real space $R^n$ with corresponding labels
Labels in $\{+1, -1\}$ for classification problems
Labels in $R$ for approximation problems
Points are represented by rows of a matrix $A \in R^{m \times n}$
Corresponding labels or function values are given by a vector $y$
Classification: $y \in \{+1, -1\}^m$
Approximation: $y \in R^m$
Find a function $f$ with $f(A_i) = y_i$ based on the given data points $A_i$
$f : R^n \to \{+1, -1\}$ for classification
$f : R^n \to R$ for approximation
![Page 4: Knowledge-Based Breast Cancer Prognosis Olvi Mangasarian UW Madison & UCSD La Jolla Edward Wild UW Madison Computation and Informatics in Biology and Medicine.](https://reader031.fdocuments.in/reader031/viewer/2022032607/56649ec65503460f94bd1045/html5/thumbnails/4.jpg)
Graphical Example with no Prior Knowledge Incorporated
[Figure: labeled data points separated by the nonlinear surface $K(x', B')u = \gamma$]
![Page 5: Knowledge-Based Breast Cancer Prognosis Olvi Mangasarian UW Madison & UCSD La Jolla Edward Wild UW Madison Computation and Informatics in Biology and Medicine.](https://reader031.fdocuments.in/reader031/viewer/2022032607/56649ec65503460f94bd1045/html5/thumbnails/5.jpg)
Classification and Function Approximation
Problem: utilizing only given data may result in a poor classifier or approximation
Points may be noisy
Sampling may be costly
Solution: use prior knowledge to improve the classifier or approximation
![Page 6: Knowledge-Based Breast Cancer Prognosis Olvi Mangasarian UW Madison & UCSD La Jolla Edward Wild UW Madison Computation and Informatics in Biology and Medicine.](https://reader031.fdocuments.in/reader031/viewer/2022032607/56649ec65503460f94bd1045/html5/thumbnails/6.jpg)
Graphical Example with Prior Knowledge Incorporated
[Figure: the same data points and separating surface $K(x', B')u = \gamma$, with prior knowledge regions $g(x) \le 0$, $h_1(x) \le 0$, $h_2(x) \le 0$]
Similar approach for approximation
![Page 7: Knowledge-Based Breast Cancer Prognosis Olvi Mangasarian UW Madison & UCSD La Jolla Edward Wild UW Madison Computation and Informatics in Biology and Medicine.](https://reader031.fdocuments.in/reader031/viewer/2022032607/56649ec65503460f94bd1045/html5/thumbnails/7.jpg)
Kernel Machines
Approximate $f$ by a nonlinear kernel function $K$ using parameters $u \in R^k$ and $\gamma$ in $R$
A kernel function is a nonlinear generalization of scalar product
$f(x) \approx K(x', B')u - \gamma$, $x \in R^n$, $K : R^n \times R^{n \times k} \to R^k$
$B \in R^{k \times n}$ is a basis matrix
Usually, $B = A \in R^{m \times n}$ = input data matrix
In Reduced Support Vector Machines, $B$ is a small subset of the rows of $A$
$B$ may be any matrix with $n$ columns
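As a minimal sketch of the kernel machine described on this slide, the snippet below evaluates $f(x) = K(x', B')u - \gamma$ for a single point. The Gaussian kernel, its width `mu`, and the numbers in `B`, `u`, and `gamma` are assumptions made for illustration; the slides do not say which kernel or values were used.

```python
import math


def gaussian_kernel_row(x, B, mu=1.0):
    """Return the row vector K(x', B') with one entry per row of B.

    Entry j is exp(-mu * ||x - B_j||^2). The Gaussian form and the
    width mu are assumptions; the slides do not name a kernel.
    """
    return [
        math.exp(-mu * sum((xi - bi) ** 2 for xi, bi in zip(x, b)))
        for b in B
    ]


def kernel_machine(x, B, u, gamma, mu=1.0):
    """Evaluate f(x) = K(x', B')u - gamma."""
    k_row = gaussian_kernel_row(x, B, mu)
    return sum(kj * uj for kj, uj in zip(k_row, u)) - gamma


# Hypothetical basis matrix B (k = 3 rows, n = 2 columns) and parameters.
B = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
u = [1.0, -0.5, 0.25]
gamma = 0.1
value = kernel_machine([0.0, 0.0], B, u, gamma)
```

Because `B` only needs $n$ columns, the same function works whether `B` is the full data matrix, a reduced subset of its rows, or any other basis.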
![Page 8: Knowledge-Based Breast Cancer Prognosis Olvi Mangasarian UW Madison & UCSD La Jolla Edward Wild UW Madison Computation and Informatics in Biology and Medicine.](https://reader031.fdocuments.in/reader031/viewer/2022032607/56649ec65503460f94bd1045/html5/thumbnails/8.jpg)
Kernel Machines
Introduce slack variable $s$ to measure error in classification or approximation
Error $s$ in kernel approximation of given data:
$-s \le K(A, B')u - e\gamma - y \le s$, where $e$ is a vector of ones in $R^m$
Function approximation: $f(x) \approx K(x', B')u - \gamma$
Error $s$ in kernel classification of given data:
$K(A_+, B')u - e\gamma + s_+ \ge e$, $s_+ \ge 0$
$K(A_-, B')u - e\gamma - s_- \le -e$, $s_- \ge 0$
More succinctly, let $D = \mathrm{diag}(y)$, the $m \times m$ matrix with diagonal $y$ of $\pm 1$'s, then:
$D(K(A, B')u - e\gamma) + s \ge e$, $s \ge 0$
Classifier: $f(x) = \mathrm{sign}(K(x', B')u - \gamma)$
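The succinct classification constraint $D(K(A, B')u - e\gamma) + s \ge e$, $s \ge 0$ can be read point by point: each slack is the amount by which a point misses a margin of 1. The sketch below computes the smallest feasible slacks for given kernel outputs. The input numbers are made up, and `kernel_outputs` is assumed to hold the entries of $K(A, B')u$ computed elsewhere.

```python
def classification_slacks(kernel_outputs, labels, gamma):
    """Smallest s >= 0 with y_i * (k_i - gamma) + s_i >= 1 for every point i.

    kernel_outputs[i] stands for the i-th entry of K(A, B')u, assumed to
    be computed elsewhere; labels[i] is +1 or -1.
    """
    return [
        max(0.0, 1.0 - y * (k - gamma))
        for k, y in zip(kernel_outputs, labels)
    ]


def classify(kernel_output, gamma):
    """sign(K(x', B')u - gamma); a zero margin is mapped to +1 by convention."""
    return 1 if kernel_output - gamma >= 0 else -1


# Hypothetical kernel outputs for two positive and two negative points.
outputs = [2.0, 0.5, -1.5, 0.2]
labels = [1, 1, -1, -1]
slacks = classification_slacks(outputs, labels, gamma=0.0)
predictions = [classify(k, 0.0) for k in outputs]
```

A point can be classified correctly and still carry a positive slack (the second point here), since the slack measures distance from the margin, not just the sign.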
![Page 9: Knowledge-Based Breast Cancer Prognosis Olvi Mangasarian UW Madison & UCSD La Jolla Edward Wild UW Madison Computation and Informatics in Biology and Medicine.](https://reader031.fdocuments.in/reader031/viewer/2022032607/56649ec65503460f94bd1045/html5/thumbnails/9.jpg)
Kernel Machines in Approximation OR Classification
Positive parameter $\nu$ controls the trade-off between
solution complexity: $e'a = \|u\|_1$ at solution
data fitting: $e's = \|s\|_1$ at solution
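The linear program itself did not survive extraction from this slide. The block below is a hedged reconstruction assembled from the constraints on the previous slide and the two objective terms named here; the symbol $\nu$ for the trade-off parameter, $\gamma$ for the offset, and the bound $-a \le u \le a$ (which gives $e'a = \|u\|_1$ at a solution) are assumptions, not a copy of the original formula.

```latex
\begin{aligned}
\min_{u,\,\gamma,\,a,\,s}\quad & \nu\, e's + e'a \\
\text{s.t.}\quad & -s \le K(A, B')u - e\gamma - y \le s
    && \text{(approximation)} \\
\text{OR}\quad & D\bigl(K(A, B')u - e\gamma\bigr) + s \ge e,\quad s \ge 0
    && \text{(classification)} \\
& -a \le u \le a
\end{aligned}
```

Either constraint set is linear in $(u, \gamma, a, s)$, so both variants are ordinary linear programs.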
![Page 10: Knowledge-Based Breast Cancer Prognosis Olvi Mangasarian UW Madison & UCSD La Jolla Edward Wild UW Madison Computation and Informatics in Biology and Medicine.](https://reader031.fdocuments.in/reader031/viewer/2022032607/56649ec65503460f94bd1045/html5/thumbnails/10.jpg)
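Nonlinear Prior Knowledge in Function Approximation
Start with arbitrary nonlinear knowledge implication
$g$, $h$ are arbitrary functions on $\Gamma$: $g : \Gamma \to R^k$, $h : \Gamma \to R$
$g(x) \le 0 \Rightarrow K(x', B')u - \gamma \ge h(x)$, $\forall x \in \Gamma \subset R^n$
The implication holds whenever:
$\exists v \ge 0$: $v'g(x) + K(x', B')u - \gamma - h(x) \ge 0$ $\forall x \in \Gamma$
Linear in $v$, $u$, $\gamma$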
![Page 11: Knowledge-Based Breast Cancer Prognosis Olvi Mangasarian UW Madison & UCSD La Jolla Edward Wild UW Madison Computation and Informatics in Biology and Medicine.](https://reader031.fdocuments.in/reader031/viewer/2022032607/56649ec65503460f94bd1045/html5/thumbnails/11.jpg)
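Theorem of the Alternative for Convex Functions
Assume that $g(x)$, $K(x', B')u - \gamma$, $-h(x)$ are convex functions of $x$, that $\Gamma$ is convex and $\exists x \in \Gamma$: $g(x) < 0$. Then either:
I. $g(x) \le 0$, $K(x', B')u - \gamma - h(x) < 0$ has a solution $x \in \Gamma$, or
II. $\exists v \in R^k$, $v \ge 0$: $K(x', B')u - \gamma - h(x) + v'g(x) \ge 0$ $\forall x \in \Gamma$
But never both.
If we can find $v \ge 0$: $K(x', B')u - \gamma - h(x) + v'g(x) \ge 0$ $\forall x \in \Gamma$, then by the above theorem
$g(x) \le 0$, $K(x', B')u - \gamma - h(x) < 0$ has no solution $x \in \Gamma$, or equivalently:
$g(x) \le 0 \Rightarrow K(x', B')u - \gamma \ge h(x)$, $\forall x \in \Gamma$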
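The direction actually used in the method (a multiplier $v$ certifies the knowledge implication) can be checked directly, without the convexity assumptions. The short derivation below spells out that step; $\gamma$ and $\Gamma$ are the names assumed here for the offset and the knowledge set, whose symbols were lost in extraction.

```latex
\begin{aligned}
& \text{Suppose } v \ge 0 \text{ and }
  K(x', B')u - \gamma - h(x) + v'g(x) \ge 0 \quad \forall x \in \Gamma. \\
& \text{For any } x \in \Gamma \text{ with } g(x) \le 0:\quad v'g(x) \le 0, \\
& \text{so}\quad K(x', B')u - \gamma - h(x) \;\ge\; -v'g(x) \;\ge\; 0, \\
& \text{that is}\quad K(x', B')u - \gamma \;\ge\; h(x).
\end{aligned}
```

Convexity is only needed for the converse, which guarantees that such a $v$ exists whenever the implication holds.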
![Page 12: Knowledge-Based Breast Cancer Prognosis Olvi Mangasarian UW Madison & UCSD La Jolla Edward Wild UW Madison Computation and Informatics in Biology and Medicine.](https://reader031.fdocuments.in/reader031/viewer/2022032607/56649ec65503460f94bd1045/html5/thumbnails/12.jpg)
Incorporating Prior Knowledge
Linear semi-infinite program: infinite number of constraints
Discretize to obtain a finite linear program
$g(x_i) \le 0 \Rightarrow K(x_i', B')u - \gamma \ge h(x_i)$, $i = 1, \dots, k$
Slacks zi allow knowledge to be satisfied inexactly at the point xi
Add term in objective to drive prior knowledge error to zero
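A minimal sketch of how a slack $z_i$ could measure the knowledge error at one discretization point. The constraint form $K(x_i', B')u - \gamma - h(x_i) + v'g(x_i) + z_i \ge 0$, $z_i \ge 0$ is an assumption inferred from the multiplier condition and the description of the slacks; all numbers are made up.

```python
def knowledge_slack(kernel_output, gamma, h_value, g_values, v):
    """Smallest z_i >= 0 satisfying the assumed discretized constraint.

    Constraint (assumed form):
        K(x_i', B')u - gamma - h(x_i) + v'g(x_i) + z_i >= 0
    kernel_output stands for K(x_i', B')u, computed elsewhere.
    """
    multiplier_term = sum(vj * gj for vj, gj in zip(v, g_values))
    lhs = kernel_output - gamma - h_value + multiplier_term
    return max(0.0, -lhs)


# Hypothetical values at two points inside the knowledge region (g <= 0).
z_satisfied = knowledge_slack(2.0, 0.2, 1.0, [-0.5], [1.0])
z_violated = knowledge_slack(0.5, 0.2, 1.0, [-0.5], [1.0])
```

Penalizing the sum of these slacks in the objective is what pushes the knowledge error toward zero while keeping the problem linear.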
![Page 13: Knowledge-Based Breast Cancer Prognosis Olvi Mangasarian UW Madison & UCSD La Jolla Edward Wild UW Madison Computation and Informatics in Biology and Medicine.](https://reader031.fdocuments.in/reader031/viewer/2022032607/56649ec65503460f94bd1045/html5/thumbnails/13.jpg)
Incorporating Prior Knowledge in Classification (Very Similar)
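Implication for positive region:
$g(x) \le 0 \Rightarrow K(x', B')u - \gamma \ge \alpha$, $\forall x \in \Gamma \subset R^n$
$\exists v \ge 0$: $K(x', B')u - \gamma - \alpha + v'g(x) \ge 0$, $\forall x \in \Gamma$
Similar implication for negative regions
Add discretized constraints to linear program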
![Page 14: Knowledge-Based Breast Cancer Prognosis Olvi Mangasarian UW Madison & UCSD La Jolla Edward Wild UW Madison Computation and Informatics in Biology and Medicine.](https://reader031.fdocuments.in/reader031/viewer/2022032607/56649ec65503460f94bd1045/html5/thumbnails/14.jpg)
Incorporating Prior Knowledge in Classification
![Page 15: Knowledge-Based Breast Cancer Prognosis Olvi Mangasarian UW Madison & UCSD La Jolla Edward Wild UW Madison Computation and Informatics in Biology and Medicine.](https://reader031.fdocuments.in/reader031/viewer/2022032607/56649ec65503460f94bd1045/html5/thumbnails/15.jpg)
Checkerboard Dataset: Black and White Points in $R^2$
Classifier based on the 16 points at the center of each square and no prior knowledge
Prior knowledge given at 100 points in the two left-most squares of the bottom row
Perfect classifier based on the same 16 points and the prior knowledge
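For readers who want to reproduce the setup, the 16 training points can be sketched as the centers of a 4 x 4 grid of squares with alternating labels. The unit square size, the coordinate origin, and which color maps to +1 are assumptions; the slides only state that the points sit at the center of each square.

```python
def checkerboard_centers(squares_per_side=4, side=1.0):
    """Centers of a checkerboard's squares with alternating +1 / -1 labels.

    The grid size follows from "16 points at the center of each square";
    the side length and label convention are assumptions.
    """
    points, labels = [], []
    for row in range(squares_per_side):
        for col in range(squares_per_side):
            points.append(((col + 0.5) * side, (row + 0.5) * side))
            labels.append(1 if (row + col) % 2 == 0 else -1)
    return points, labels


points, labels = checkerboard_centers()
```

With only one labeled point per square, the data alone says little about where square boundaries lie, which is the gap the prior knowledge fills.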
![Page 16: Knowledge-Based Breast Cancer Prognosis Olvi Mangasarian UW Madison & UCSD La Jolla Edward Wild UW Madison Computation and Informatics in Biology and Medicine.](https://reader031.fdocuments.in/reader031/viewer/2022032607/56649ec65503460f94bd1045/html5/thumbnails/16.jpg)
Predicting Lymph Node Metastasis as a Function of Tumor Size
Number of metastasized lymph nodes is an important prognostic indicator for breast cancer recurrence
Determined by surgery in addition to the removal of the tumor
Optional procedure especially if tumor size is small
Wisconsin Prognostic Breast Cancer (WPBC) data
Lymph node metastasis and tumor size for 194 patients
Task: predict the number of metastasized lymph nodes given tumor size alone
![Page 17: Knowledge-Based Breast Cancer Prognosis Olvi Mangasarian UW Madison & UCSD La Jolla Edward Wild UW Madison Computation and Informatics in Biology and Medicine.](https://reader031.fdocuments.in/reader031/viewer/2022032607/56649ec65503460f94bd1045/html5/thumbnails/17.jpg)
Predicting Lymph Node Metastasis
Split data into two portions
Past data: 20% used to find prior knowledge
Present data: 80% used to evaluate performance
Simulates acquiring prior knowledge from an expert
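The 20% / 80% split can be sketched as below. The random shuffle, the seed, and the rounding of the past-data count are assumptions; the slides do not say how the split was drawn.

```python
import random


def split_past_present(n_patients, past_fraction=0.2, seed=0):
    """Split patient indices into disjoint 'past' and 'present' portions.

    Random assignment with a fixed seed is an assumption made for
    reproducibility of this sketch.
    """
    indices = list(range(n_patients))
    random.Random(seed).shuffle(indices)
    n_past = round(past_fraction * n_patients)
    return indices[:n_past], indices[n_past:]


# 194 is the number of patients stated for the WPBC lymph node data.
past, present = split_past_present(194)
```

Keeping the two portions disjoint matters here: the knowledge is fit on the past data only, so the present-data evaluation is not contaminated by it.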
![Page 18: Knowledge-Based Breast Cancer Prognosis Olvi Mangasarian UW Madison & UCSD La Jolla Edward Wild UW Madison Computation and Informatics in Biology and Medicine.](https://reader031.fdocuments.in/reader031/viewer/2022032607/56649ec65503460f94bd1045/html5/thumbnails/18.jpg)
Prior Knowledge for Lymph Node Metastasis as a Function of Tumor Size
Generate prior knowledge by fitting past data: $h(x) := K(x', B')u - \gamma$
$B$ is the matrix of the past data points
Use density estimation to decide where to enforce knowledge
$p(x)$ is the empirical density of the past data
Prior knowledge utilized on approximating function $f(x)$:
Number of metastasized lymph nodes is greater than the predicted value on past data, with tolerance of 1%
$p(x) \ge 0.1 \Rightarrow f(x) \ge h(x) - 0.01$
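The density gate $p(x) \ge 0.1$ and the knowledge inequality $f(x) \ge h(x) - 0.01$ can be sketched as follows. A Gaussian kernel density estimate with a hand-picked bandwidth stands in for the empirical density, which is an assumption (the slides do not name the estimator), and the tumor sizes are made up.

```python
import math


def gaussian_kde(x, samples, bandwidth):
    """Average of Gaussian bumps centered at the samples, evaluated at x."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    return norm * sum(
        math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
    )


def knowledge_points(candidates, past_sizes, bandwidth=0.5, threshold=0.1):
    """Keep candidate tumor sizes x whose estimated density p(x) >= threshold."""
    return [
        x for x in candidates
        if gaussian_kde(x, past_sizes, bandwidth) >= threshold
    ]


def knowledge_violation(f_value, h_value, tolerance=0.01):
    """Amount by which f(x) >= h(x) - tolerance fails (0.0 if satisfied)."""
    return max(0.0, h_value - tolerance - f_value)


# Hypothetical past tumor sizes in centimeters.
past_sizes = [1.0, 1.5, 2.0, 2.5, 3.0]
enforced = knowledge_points([2.0, 10.0], past_sizes)
```

Gating by density keeps the knowledge from being enforced at tumor sizes where the past data offers little support for $h(x)$.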
![Page 19: Knowledge-Based Breast Cancer Prognosis Olvi Mangasarian UW Madison & UCSD La Jolla Edward Wild UW Madison Computation and Informatics in Biology and Medicine.](https://reader031.fdocuments.in/reader031/viewer/2022032607/56649ec65503460f94bd1045/html5/thumbnails/19.jpg)
Predicting Lymph Node Metastasis: Results
RMSE: root-mean-squared error
LOO: leave-one-out error

| Approximation | Error |
| --- | --- |
| Prior knowledge $h(x)$ based on past data (20%) | 6.12 RMSE |
| $f(x)$ without knowledge based on present data (80%) | 5.92 LOO |
| $f(x)$ with knowledge based on present data (80%) | 5.04 LOO |

Improvement due to knowledge: 14.9%
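The reported 14.9% follows from the two leave-one-out errors as a relative reduction. A quick check, with the helper name `relative_improvement` chosen here for illustration:

```python
def relative_improvement(error_without, error_with):
    """Fraction of the baseline error removed by adding knowledge."""
    return (error_without - error_with) / error_without


# LOO errors without and with prior knowledge, as reported on the slide.
improvement = relative_improvement(5.92, 5.04)
print(f"{100 * improvement:.1f}%")  # prints 14.9%
```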
![Page 20: Knowledge-Based Breast Cancer Prognosis Olvi Mangasarian UW Madison & UCSD La Jolla Edward Wild UW Madison Computation and Informatics in Biology and Medicine.](https://reader031.fdocuments.in/reader031/viewer/2022032607/56649ec65503460f94bd1045/html5/thumbnails/20.jpg)
Predicting Breast Cancer Recurrence Within 24 Months
Wisconsin Prognostic Breast Cancer (WPBC) dataset
155 patients monitored for recurrence within 24 months
30 cytological features
2 histological features: number of metastasized lymph nodes and tumor size
Predict whether or not a patient remains cancer free after 24 months
82% of patients remain disease free
86% accuracy (Bennett, 1992) best previously attained
Prior knowledge allows us to incorporate additional information to improve accuracy
![Page 21: Knowledge-Based Breast Cancer Prognosis Olvi Mangasarian UW Madison & UCSD La Jolla Edward Wild UW Madison Computation and Informatics in Biology and Medicine.](https://reader031.fdocuments.in/reader031/viewer/2022032607/56649ec65503460f94bd1045/html5/thumbnails/21.jpg)
Generating WPBC Prior Knowledge
Gray regions indicate areas where $g(x) \le 0$
Simulate oncological surgeon's advice about recurrence
Knowledge imposed at dataset points inside given regions
[Figure: Number of Metastasized Lymph Nodes versus Tumor Size in Centimeters; + Recur, • Cancer free]
![Page 22: Knowledge-Based Breast Cancer Prognosis Olvi Mangasarian UW Madison & UCSD La Jolla Edward Wild UW Madison Computation and Informatics in Biology and Medicine.](https://reader031.fdocuments.in/reader031/viewer/2022032607/56649ec65503460f94bd1045/html5/thumbnails/22.jpg)
WPBC Results
| Classifier | Misclassification Rate |
| --- | --- |
| Without Knowledge | 18.1% |
| With Knowledge | 9.0% |

49.7% improvement due to knowledge
35.7% improvement over best previous predictor
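A short worked check of the two percentages, using only the rounded rates shown here and the 14% error implied by the 86% best previous accuracy. Treating both figures as relative reductions in misclassification rate is an assumption about how they were computed.

```latex
\begin{aligned}
\text{vs. best previous predictor:}\quad
  & \frac{14.0 - 9.0}{14.0} \approx 0.357 \\
\text{vs. no knowledge:}\quad
  & \frac{18.1 - 9.0}{18.1} \approx 0.503,
  \qquad \frac{9.0}{18.1} \approx 0.497
\end{aligned}
```

The first value matches the reported 35.7%. For the second, the rounded rates give a reduction of about 50.3%, while the reported 49.7% coincides with the ratio of the two rates; the difference may also come from rounding of the published rates.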
![Page 23: Knowledge-Based Breast Cancer Prognosis Olvi Mangasarian UW Madison & UCSD La Jolla Edward Wild UW Madison Computation and Informatics in Biology and Medicine.](https://reader031.fdocuments.in/reader031/viewer/2022032607/56649ec65503460f94bd1045/html5/thumbnails/23.jpg)
Conclusion
General nonlinear prior knowledge incorporated into kernel classification and approximation
Implemented as linear inequalities in a linear programming problem
Knowledge appears transparently
Demonstrated effectiveness of nonlinear prior knowledge on two real world problems from breast cancer prognosis
Future work
Prior knowledge with more general implications
User-friendly interface for knowledge specification
More information
http://www.cs.wisc.edu/~olvi/
http://www.cs.wisc.edu/~wildt/