June. 15 2014, Taipei
description
Transcript of June. 15 2014, Taipei
Symbolic tree for prognosis of Hepato Cellular Carcinoma
Academia Sinica
June. 15 2014, Taipei
Taerim Lee(1) Hyosuk Lee(2) Edwin Diday(3)
(1) Korea National Open University [email protected]
(2) Department of Internal Medicine, SNU Hospital (3) University of Paris 9 Dauphine France [email protected]
Symbolic Tree for Prognosis of Hepato Cellular Carcinoma
June. 15 2014, Taipei
Symbolic tree for prognosis of Hepato Cellular Carcinoma
Academia Sinica
Outline
1. Review of Literature
2. Motivation
3. Tree structures Classification Model for HCC
4. Symbolic Data Analysis for HCC
5. Remarks
Symbolic tree for prognosis of Hepato Cellular Carcinoma
Academia Sinica
Motivation
1. To develop the powerful modeling
technique for exploring the functional
form of covariate effects for prognosis of
HCC patients
2. To obtain the tree structured prognostic models for HCC with time covariate
3. To extract new knowledge from a HCC data using Symbolic Data Analysis
Symbolic tree for prognosis of Hepato Cellular Carcinoma
Academia Sinica
Purposes
1. To identify the effect of prognostic factors of HCC.
2. To quantify the patient characteristics that related to the high risk clinical factor.
3. To explore the functional form of the relationships of the covariates.
4. To extract new knowledge and fit symbolic tree model
Symbolic tree for prognosis of Hepato Cellular Carcinoma
Academia Sinica
Breiman,L.,Friedman,J.H.,Olshen,R.A.,Stone,C.J.(1984)
developed Classification and regression tree, CART
L. Gorden & R. Olshen (1985) presented tree structured survival analysis in the
CancerTreatment Reports
Ciampi.Thiffault, Nakache & Asselain (1986) proposed a variety of splitting criteria such as
likelihood ratio statistics based on the exponential model or the Cox partial likelihood,
Previous Work
Symbolic tree for prognosis of Hepato Cellular Carcinoma
Academia Sinica
M.LeBlanc & John Crowley (1992) developed a method for obtaining tree-structured
relative risk estimate using the log-rank statistic for splitting and need between node dissimilarity in a puonning algorithm.
H.Ahn & W.Y. Loh (1994) yields a piece wise-linear Cox proportional hazard
model using curvature detection tests rather than exhaustive serach which evaluate all possible splits in finding splits to reduce computing time.
W.Y. Loh & Y.S shin (1997) derived split selection methods for classification tree
in Statistica Sinica.
Previous Work
Symbolic tree for prognosis of Hepato Cellular Carcinoma
Academia Sinica
T. R Lee,H.S Moon(1994) Prediction Model of craniofacial growth-dental arch classification of 6 and 7 year old children-, The Journal of Korea Society of Dental Health, vol21,no.3
T. R Lee(1998) Classification Model for High Risk Dental Caries with RBF Neural Networks,, The Journal of Data Science and Classification, vol.2 (2)
T. R Lee et al (2006) Independent Prognostic factors of 861 cases of oral squamous cell carcinoma in korean adults, Oral Oncology, vol.42, p208-217
Previous Work
Symbolic tree for prognosis of Hepato Cellular Carcinoma
Academia Sinica
Bock, H.H, Diday E (2000) (2000) Analysis of symbolic Data. Analysis of symbolic Data. Exploratory methods for extracting statistical Information Exploratory methods for extracting statistical Information from complex data. Springer Verlag,Heidelbergfrom complex data. Springer Verlag,Heidelberg
Bravo Liatas, M.C C (2000) (2000) Strata decision tree sysmbolic data analysis software , Data analysis, classification and related methods, Springer Verlag, p409-415
T. R Lee(2009) Tree Structured Prognostic Model for Tree Structured Prognostic Model for Hepatocellular Carcinoma, Journal of Korea Health Hepatocellular Carcinoma, Journal of Korea Health Inormation & Statistics, Vol.28 No.1, 2009.Inormation & Statistics, Vol.28 No.1, 2009.
T. R Lee (2011) Survival tree for Hepato Cellular Carcinoma
patient, Journal of Korean Society of Public Health
Information & Statistics
Previous Work
Symbolic tree for prognosis of Hepato Cellular Carcinoma
Academia Sinica
V. Patel, S.Leethanakul (2001) reported new approaches to the understanding of the reported new approaches to the understanding of the
molecular basis of oral cancer.molecular basis of oral cancer.
Billard L, Diday E(2003) looks at the concept of SDA in general, and attempt to looks at the concept of SDA in general, and attempt to
review the methods available to analyze such data.review the methods available to analyze such data. ‘ ‘From the statistics of Data to the Statistics of From the statistics of Data to the Statistics of
knowledge’knowledge’
Mballo C., Diday E.(2005) compare the Kolmogorov Simirnov criterion and Gini compare the Kolmogorov Simirnov criterion and Gini
index for test selection metric for decision tree index for test selection metric for decision tree induction induction
Previous Work
Symbolic tree for prognosis of Hepato Cellular Carcinoma
Academia Sinica
Tree Structured Classification
Symbolic tree for prognosis of Hepato Cellular Carcinoma
Academia Sinica
• The tree structured classification modeling
constructs class classification rules based on the information provided in a learning sample of objects with known identities.
L
X2 >b
total
DL D L
X1 >a X 3>c
X4 >d
Tree Model
Symbolic tree for prognosis of Hepato Cellular Carcinoma
Academia Sinica
By the stepwise Logistic Regression Analysis(LRA), four variables, were used to construct the logistic regression model.
• The Model which involves is as follows ; • Log Likelihood = 611.989, p = 0.0004, • Goodness of fit chi-sq = 569.34, p = 0.02.
Logistic Regression Model
Symbolic tree for prognosis of Hepato Cellular Carcinoma
Academia Sinica
Schematic comparison of a classification tree and Schematic comparison of a classification tree and logistic regression equation for risk assessment0logistic regression equation for risk assessment0Schematic comparison of a classification tree and Schematic comparison of a classification tree and logistic regression equation for risk assessment0logistic regression equation for risk assessment0
Symbolic tree for prognosis of Hepato Cellular Carcinoma
Academia Sinica
CART
tree structured prognostic model with effective covariate
: CART uses a decision tree to display how data may be classified or predicted.
: automatically searches for important relationships
and uncovers hidden structure even in highly complex
data.
total
LHH L L
X1 >aX2 >b
X 3>c
X4 >d
H: High riskL: Low risk
Symbolic tree for prognosis of Hepato Cellular Carcinoma
Academia Sinica
FACT
tree structured prognostic model with effective covariate
: FACT employs statistical hypothesis test to select a
variable for splitting each node and then uses
discriminant analysis to find the split point .
The size of the tree is determined by a set of rules
total
LHL H L
X1 >aX2 >b
X 3>c
X4 >d
H: high riskL: low risk
Symbolic tree for prognosis of Hepato Cellular Carcinoma
Academia Sinica
QUEST
: QUEST is a new classification tree algorithm derived
from the FACT method. It can be used with
univariate splits or linear combination splits.
Unlike FACT, QUEST uses cross-validation pruning.
It distinguishes from other decision tree classifiers is
that when used with univariate splits the classifier
performs approximately unbiased variable selection.
total
LDL D L
X4+2X1 >aX2 >b
X 3>c
X4 >d
D: deathL: live
Symbolic tree for prognosis of Hepato Cellular Carcinoma
Academia Sinica
DATA
Symbolic tree for prognosis of Hepato Cellular Carcinoma
Academia Sinica
H: High Risk groupL: Low Risk group
Classification Tree ModelClassification Tree ModelClassification Tree ModelClassification Tree Model
Symbolic tree for prognosis of Hepato Cellular Carcinoma
Academia Sinica
1. TAENUM 100.02. AFP 87.73. CHILD 72.3 4. SIZE 59.45. INV 59.06. CLIP 45.5
9446(0)48(1)
CHILD≤5.5CHILD≤5.5
SIZE≤3.85SIZE≤3.85
TAENUM≤1.5TAENUM≤1.5
Fig.4 Tree Structured Model for TACE group of HCC data
109(0)1(1)
CART
107(0)3(1)
33(0)0(1)
4612(0)34(1)
3522(0)13(1)
4915(0)34(1)
AFP≤10.4AFP≤10.400
00
INV≤0.5INV≤0.5
Sensitivity 71.7%
Specificity 85.4%
Total 78.7%188(0)10(1)
1714(0)3(1)
8437(0)47(1)
11
1100
0087(0)1(1)
Symbolic tree for prognosis of Hepato Cellular Carcinoma
Academia Sinica
RBF Neural Network ClassificationRBF Neural Network Classification
Block diagram representation of nervous system
Stimulus Neural netReceptors ResponseEffectors
Symbolic tree for prognosis of Hepato Cellular Carcinoma
Academia Sinica
RBF NN ROC curve according to the
Radial Basis Function
Symbolic tree for prognosis of Hepato Cellular Carcinoma
Academia Sinica
Kernel V16 , V17, V19 66.3 64.2
Classification results
Symbolic tree for prognosis of Hepato Cellular Carcinoma
Academia Sinica
Survival Tree
Symbolic tree for prognosis of Hepato Cellular Carcinoma
Academia Sinica
Survival Data
. The response var ; survival time -
The length of time; a patient has
survived after diagnosis
. Censoring is common since the endpoint
may not be observed because of
termination of a study or failure to
follow up
Symbolic tree for prognosis of Hepato Cellular Carcinoma
Academia Sinica
Cox proportional Hazard Model
. Data (Yi, i, xi)
where Yi is the minimum of failure time Zi
and a censoring time Ci
i = I (Zi Ci) is an indicator of the event
that a failure is observed.
Xi=(X1i …Xpi ) is a p dimensional column
vector of covariates.
Symbolic tree for prognosis of Hepato Cellular Carcinoma
Academia Sinica
Cox Proportional Hazard Model
Let (t|x) be the hazard rate at time y for
an individual with risk factor X
Cox proportional hazard model;
Where are unknow parameters
0(y) is the baseline hazard rate at
time y.
)exp()()|(1
0
P
kkkXyxt
p ,, 21
Symbolic tree for prognosis of Hepato Cellular Carcinoma
Academia Sinica
STUDI
Survival Tree with Unbiased Detection of Interaction
: STUDI is a tree-structured regression modeling tool.
It is easy to interpret predict survival value for new case.
Missing values can easily be handled and time dependent
covariates can be incorporated.
total
LSL S L
X1 >aX2 >b
X 3>c
X4 >d
S: short term surviveL: long term survive
Symbolic tree for prognosis of Hepato Cellular Carcinoma
Academia Sinica
29
Split Covariate SelectionSplit Covariate Selection
1. Fit a model to n and f covariates in the node. 2. Obtain the modified Cox-Snell residuals. 3. Perform a curvature test for each of n-s-and c-covariates.
4. Perform a interaction test for each pair of n- s-and c-covariates. 5. Select the covariate which has the smallest p-value.
Symbolic tree for prognosis of Hepato Cellular Carcinoma
Academia Sinica
Survival Tree with Unbiased Detection of
Interaction
Cho & Loh(2001)
- STUDI is tree structured regression modeling tool.
- It is easy to interpret predict survival value for new case.
- Missing value can easily be handled and time dependent covariates can be incorporated.
STUDI
Symbolic tree for prognosis of Hepato Cellular Carcinoma
Academia Sinica
Let the survival function for a covariate Xi
be
where is the cumulative baseline hazard rate.
Then median survival time for an individual i is defined as and the cost at a node t be is defined as
STUDI
)}exp()(exp{)|( 0 ii XyxXyS
)(0 y
}5.0)(|inf{~ ySyy
n
1iii |yy|R(t)
Symbolic tree for prognosis of Hepato Cellular Carcinoma
Academia Sinica
Modified Cox-Snell(MCS) residuals;
for
where is the estimator of the cumulative baseline hazard function.
ni ,,1
0
)1(693.0)ˆexp()(ˆMCS 0 iii XY
Tree Structured Survival Model
STUDI
Symbolic tree for prognosis of Hepato Cellular Carcinoma
Academia Sinica
33
Fig 4. Scatter plot of Box plot of the MCS Residuals
Symbolic tree for prognosis of Hepato Cellular Carcinoma
Academia Sinica
Fig.11 Tree Structured Survival Model with SNP and Clinical Data
of HCC using imputed 252 missing data
Symbolic tree for prognosis of Hepato Cellular Carcinoma
Academia Sinica
Symbolic tree for prognosis of Hepato Cellular Carcinoma
Academia Sinica
36
Fig. 6 Tree structured Survival model for OSCC
28
5
21
88
121110 13
4 6 7
3 141Radio ≤ 0.00E+00
Radio ≤ 5.92E+03
Pstage=1,2,3
Age ≤ 5.20E+01
15
2.42E+02
73
25txmethod=1,2,5
48
t=1,4
size ≤ 1.60E+01
25
109.40E+01
151.80E+01
size ≤ 1.04E+01
Age ≤ 5.80E+01
15
19
181
44Site
=10,2,3,4,5,6,7,9
20 21
40 41 45
9190
180
2322 28
1514
296
6.30E+01
size ≤ 6.77E+00
401.06E+02
96.30E+01
67.30E+01
Site=10,2,3,5,6,7,9
12 62.60E+01
63.30E+01
66.50E+01
18
size ≤ 1.00E+00
24 241.00E+01
131.57E+02
88.70E+01
77.50E+01
Academia Sinica
37
Symbolic tree for prognosis of Hepato Cellular Carcinoma
Academia Sinica
SDA
1. To generalize data mining and statistics to
higher level units described by symbolic
data
2. To extract new knowledge from a database
by using a standard data table 3. Working on higher level units called concepts
necessary described by more complex data
extending data mining to knowledge mining
(Symbolic Data Analysis)
Symbolic tree for prognosis of Hepato Cellular Carcinoma
Academia Sinica
From data mining to knowledge mining
1. A SDA needs two level of units
The first level : individual
The second level : concepts
2. A Concept is described by using the description
of class of individuals of its extent
3. The description of a concept must express the
variation of the individuals of its extent
4. Output of SDA provide new symbolic objects
associated with new categories, categories of
concepts
Symbolic tree for prognosis of Hepato Cellular Carcinoma
Academia Sinica
SDA steps
1. Related database : composed of several more or less
linked data
2. Define a set of categories based on the categorical
variable from a quary to be given related database
3. The class of individuals which defines the extent of
category
4. Generalize process is applied to the subset of
individuals belonging to the extent of each concept
5. Define a symbolic data table
6. Symbolic Data Analysis
Symbolic tree for prognosis of Hepato Cellular Carcinoma
Academia Sinica
The main step for a SDAThe main step for a SDA The main step for a SDAThe main step for a SDA
Put the Data in a relational Data Base
Define a Context by Giving the Units & Classes
Build a Symbolic Data Table
Apply SDA tools: Decision tree, Clustering, Graphical visualization
Symbolic tree for prognosis of Hepato Cellular Carcinoma
Academia Sinica
SDA Advantage
Aggregated data representation Confidentiality preservation Data volume reduction
Symbolic Object
= intention(symbolic description + recognition function of the extension)
+ extension(individuals represented by the concept)
Eg. [ sex~(man(0.8), woman(0.2))]^[region~{city, rural}]^ Salary~[1.2, 3.1]
Symbolic tree for prognosis of Hepato Cellular Carcinoma
Academia Sinica
Symbolic ObjectSymbolic Object Symbolic ObjectSymbolic Object
Symbolic tree for prognosis of Hepato Cellular Carcinoma
Academia Sinica
SDASchematic expression
Symbolic tree for prognosis of Hepato Cellular Carcinoma
Academia Sinica
SDA Input Symbolic DataInput Symbolic Data Input Symbolic DataInput Symbolic Data
concepts
Description of individual
Column symbolic variable
Symbolic tree for prognosis of Hepato Cellular Carcinoma
Academia Sinica
Symbolic Data Table
Symbolic tree for prognosis of Hepato Cellular Carcinoma
Academia Sinica
Symbolic Data variable
Symbolic tree for prognosis of Hepato Cellular Carcinoma
Academia Sinica
Input Symbolic Data 2D Zoom VisualizationInput Symbolic Data 2D Zoom VisualizationInput Symbolic Data 2D Zoom VisualizationInput Symbolic Data 2D Zoom Visualization
Symbolic tree for prognosis of Hepato Cellular Carcinoma
Academia Sinica
3D Zoom Stars3D Zoom Stars3D Zoom Stars3D Zoom Stars
Symbolic tree for prognosis of Hepato Cellular Carcinoma
Academia Sinica
2D and 3D Doom Stars2D and 3D Doom Stars2D and 3D Doom Stars2D and 3D Doom Stars
Symbolic tree for prognosis of Hepato Cellular Carcinoma
Academia Sinica
Fig5. SDA results according to new defined
concept of metastasis & prognosis
Symbolic tree for prognosis of Hepato Cellular Carcinoma
Academia Sinica
Symbolic tree for prognosis of Hepato Cellular Carcinoma
Academia Sinica
Symbolic Tree
Symbolic tree for prognosis of Hepato Cellular Carcinoma
Academia Sinica
Clustering Tree
Symbolic tree for prognosis of Hepato Cellular Carcinoma
Academia Sinica
Table 1. Patients Baseline characteristics of HCC
Symbolic tree for prognosis of Hepato Cellular Carcinoma
Academia Sinica
Symbolic tree for prognosis of Hepato Cellular Carcinoma
Academia Sinica
Symbolic tree for prognosis of Hepato Cellular Carcinoma
Academia Sinica
Table 2 Cox proportional hazards model for metastasis-free survival
Symbolic tree for prognosis of Hepato Cellular Carcinoma
Academia Sinica
Fig1. Survival curve of metastasis free HCC
Symbolic tree for prognosis of Hepato Cellular Carcinoma
Academia Sinica
Fig2. Survival curve of metastasis free HCC patient according to AJCC statge
Symbolic tree for prognosis of Hepato Cellular Carcinoma
Academia Sinica
Fig3. Survival curve of metastasis free HCC patient according to the histologic response
Symbolic tree for prognosis of Hepato Cellular Carcinoma
Academia Sinica
Fig.11 Tree Structured Survival Model with SNP and Clinical Data
of HCC using imputed 252 missing data
Symbolic tree for prognosis of Hepato Cellular Carcinoma
Academia Sinica
individuals free of HCC and liver cirrhosis but bCL (2x0xbCL), individuals free
of HCC and diagnosis3 and liver cirrhosis (3x1xbCL), individuals with
diagnosis4 and liver cirrhosis and acute HCC (4x1xaHCC), and individuals with
diagnosis 6 and acute HCC and free of liver cirrhosis (6x0xaHCC).
Fig.5 Characterization of the classes according to the evolution of HCC.
Symbolic tree for prognosis of Hepato Cellular Carcinoma
Academia Sinica
groups the 12 clinical variables and 3 gene data with the lowest frequency of
degradation (3x1xbCL), against Partition HCC class that contains the larger
variance with the highest frequency of degradation (4x1xaHCC).
Fig.5 Comparison of the Partition free of HCC class.
Symbolic tree for prognosis of Hepato Cellular Carcinoma
Academia Sinica
Fig.6 The most discriminating variables to influence HCC prognosis.
Symbolic tree for prognosis of Hepato Cellular Carcinoma
Academia Sinica
INPUT: the symbolic data table with three concepts OUTPUT: a symbolic data table with three rows associated to
the three concepts:The first column represents the frequencies of missing data "."
the others represent the frequencies of LC = 0 and the last row the frequencies of LC = 1
Fig.6 The most discriminating variables to influence HCC prognosis.
Symbolic tree for prognosis of Hepato Cellular Carcinoma
Academia Sinica
Fig.7 Maps of the distribution in the categories of "Encephalothy",
"Ascites", on the first factorial plane.
Data with the lowest frequency of degradation (3x1xbCL),
against Partition HCC class that contains the larger variance
with the highest frequency of degradation (4x1xaHCC).
Symbolic tree for prognosis of Hepato Cellular Carcinoma
Academia Sinica
Fig.8 Correlation circle between the first more discriminating symbolic
variables and some of their bins
The plane and symbolic variables in the smallest square which contains the first quadrant of the circle of correlation. The Figure 8 shows the distribution of the categories of the weight. From the representation of the weight on the left, it can be seen that the class (3x1xbCL), i. e. individuals with HCC at baseline and at the end of the study, are the lightest individuals, and from the representation of the encephalothy on the right, we can see that the class 3x1xbC of degradation has greater ascites well.
Symbolic tree for prognosis of Hepato Cellular Carcinoma
Academia Sinica
Fig.9 Correlation circle the whole gene variable and bins of clinical variables
Symbolic tree for prognosis of Hepato Cellular Carcinoma
Academia Sinica
INPUT : Patient data table
OUTPUT : Decision tree and rules
Notice that Symbolic TREE is better adapted
working on concepts than on individuals.
Symbolic Tree for HCC
Symbolic tree for prognosis of Hepato Cellular Carcinoma
Academia Sinica
Fig.10 Symbolic Tree for HCC
Symbolic tree for prognosis of Hepato Cellular Carcinoma
Academia Sinica
Table 3 List of the most characteristic bins of 3x1xbCL and 4x1xaHCC
Symbolic tree for prognosis of Hepato Cellular Carcinoma
Academia Sinica
RemarksRemarksRemarksRemarks
1. The application of tree structured classification
gave easy interpretable method with small number
predictor variables.
2. The application of SDA results in more detail
information and symbolic description of classes.
3. SDA gave more practical information with
visualization graph and diagram.
Academia Sinica
Q & A
Academia Sinica
Thank you !谢谢 !