CAN 8-1-06 Raponi

  • 7/31/2019 CAN 8-1-06 Raponi

    Supporting Information

    Patient population. 130 fresh frozen, surgically resected lung SCC samples from 129

    individual patients (LS-71 and LS-136 were duplicate samples from different areas of the

    same tumor) from all stages of squamous cell lung carcinoma were evaluated in this

    study. These samples were collected from patients from the University of Michigan

    Hospital between October 1991 and July 2002 with patient consent and Institutional

    Review Board approval. Portions of the resected lung carcinomas were sectioned and

    evaluated by the study pathologist by routine H&E staining. Samples chosen for analysis

    contained greater than 70% tumor cells. Seventy-three patients had stage I SCC.

    Follow-up data were available for all patients. The mean patient age was 68 ± 10 years (range 42-91)

    with approximately 45% of patients 70 years or older. Supplementary Table 1 lists the

    clinical data associated with all lung SCC samples used in this study. The 86 lung

    adenocarcinoma samples have previously been described (1). Table 1 shows the clinical

    information associated with both adenocarcinoma and SCC samples used for identifying

    and testing the prognostic signatures. Patients were censored from statistical analysis if

    they were alive but had less than three years of clinical follow-up. The independent

    NSCLC dataset comprised 36 lung adenocarcinoma (27 Stage I) and 36 lung SCC

    samples (25 Stage I) with at least 3 years of follow-up (Gene Expression Omnibus dataset: GSE3141)

    (2).

    Microarray Analysis. Total RNA was extracted from fresh frozen tissue using the Trizol

    method (Gibco BRL). RNA quality was checked using the Agilent Bioanalyzer. The

    mean ribosomal ratio (28S/18S) for all samples was 1.5 (range: 1.0-2.1). Four

    micrograms of total RNA were amplified and labeled, and the aRNA was fragmented and

    hybridized to the Affymetrix U133A GeneChip as previously described (3). Microarray

    data was extracted using the Affymetrix MAS 5 software. Global gene expression was

    scaled to an average intensity of 600 units. The data was then normalized using a spline

    quantile normalization method. 128 of the 130 microarrays gave good data (% present >

    40, scaling factor < 10) while the remaining 2 samples (LS78, LS82) gave acceptable

    results (% present > 30, scaling factor < 15). All samples were included in the analyses.

    The CEL files of the external validation datasets were downloaded from

    http://data.cgt.duke.edu/oncogene.php (2) and the data were extracted using Affymetrix

    MAS 5 software and were globally scaled to an average intensity of 600 units. Because the

    validation datasets came from an external source and a different platform (U133Plus 2.0),

    analysis of variance (ANOVA) was used to adjust for batch effects arising from different

    sample preparation methods, RNA extraction methods, hybridization

    protocols, and scanners. Array data from the duplicate samples LS-71 and

    LS-136 were averaged for unsupervised and supervised analyses. The SCC data discussed in

    this publication have been deposited in NCBI's Gene Expression Omnibus (GEO,

    http://www.ncbi.nlm.nih.gov/geo/) and are accessible through GEO Series accession

    number GSE4573.
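
The scaling step and the array quality thresholds described above can be sketched as follows. This is a minimal illustration, not the MAS 5 implementation: the function names are hypothetical, a plain mean is assumed for the scaling (the vendor software may use a trimmed mean), and the "fail" label is an assumption, since the text reports no failing arrays.

```python
def scale_to_target(intensities, target=600.0):
    """Rescale one array so that its average intensity equals `target`.

    Assumption: a plain arithmetic mean is used for the scaling factor.
    """
    factor = target / (sum(intensities) / len(intensities))
    return [value * factor for value in intensities]


def qc_grade(percent_present, scaling_factor):
    """Grade one array using the thresholds quoted in the text."""
    if percent_present > 40 and scaling_factor < 10:
        return "good"
    if percent_present > 30 and scaling_factor < 15:
        return "acceptable"
    return "fail"  # hypothetical label, not used in the source


scaled = scale_to_target([100.0, 200.0, 300.0])
grades = [qc_grade(45, 5), qc_grade(35, 12), qc_grade(20, 5)]
```

Under these thresholds, an array that meets only the looser pair of limits is graded "acceptable", which is how LS78 and LS82 are described in the text.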

    The 50 probe sets from the adenocarcinoma classifier were identified on the

    Affymetrix HU6800 GeneChip (1). For validation of this classifier using the Duke data

    set it was necessary to map the HU6800 probe sets to the U133Plus GeneChip. To this

    end, probe sets on HU6800 were mapped to U133Plus through the U95 GeneChip based

    on the Affymetrix Array Comparison Spreadsheets (Best Match). For those probe sets

    that could not be mapped by the Affymetrix documentation, a BLAST was performed on

    the target sequence from HU6800 to identify the corresponding probe sets on U133Plus.

    Using this approach 47 of the 50 probe sets were mapped to the U133Plus GeneChip.

    Real-Time Quantitative RT-PCR. Total RNA samples were normalized by

    OD260. Quality testing included analysis by capillary electrophoresis using a Bioanalyzer

    (Agilent). For aRNA, the Ribobeast™ 1-Round Aminoallyl-aRNA amplification kit

    (Epicentre) was used. All first-strand cDNA synthesis, second-strand cDNA synthesis, in

    vitro transcription of aRNA, DNase treatment, purification and other steps were

    performed according to the manufacturer's protocol. For each sample, aRNA was reverse

    transcribed into first-strand cDNA and used for real-time quantitative RT-PCR. The

    first-strand cDNA synthesis reaction contained 100 ng of aRNA, 1 µL of 50 ng/µL

    T7-Oligo(dT) primer, 0.25 µL of 10 mM dNTPs, 1 µL of 5X Superscript™ III Reverse

    Transcriptase Buffer, 0.25 µL of 200 U/µL Superscript™ III Reverse Transcriptase

    (Invitrogen Corp), 0.25 µL of 100 mM DTT and 0.25 µL of 0.3 U/µL RNase Inhibitor

    (Epicentre) in a total reaction volume of 5 µL. Real-time quantitative RT-PCR analyses

    were performed on the ABI Prism 7900HT sequence detection system (Applied

    Biosystems). Each reaction contained 10 µL of 2X TaqMan Universal PCR Master Mix

    (Applied Biosystems), 5 µL of cDNA template, and 1 µL of 20X Assays-on-Demand

    Gene Expression Assay Mix (Applied Biosystems) in a total reaction volume of 20 µL.

    The PCR consisted of a UNG activation step at 50 °C for 2 min and an initial enzyme

    activation step at 95 °C for 10 min, followed by 40 cycles of 95 °C for 15 sec and 60 °C for 1 min.
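
As a quick arithmetic check on the PCR reaction composition above, the sketch below confirms that the stated volumes of the 2X and 20X stocks give 1X in a 20 µL reaction. The helper name is hypothetical, and the text does not say what makes up the remaining volume.

```python
def stock_volume(stock_fold, total_volume_ul):
    """Volume of an N-fold concentrated stock that gives 1X in the final reaction."""
    return total_volume_ul / stock_fold


total_ul = 20.0
master_mix_ul = stock_volume(2, total_ul)   # 2X master mix
assay_mix_ul = stock_volume(20, total_ul)   # 20X assay mix
cdna_ul = 5.0                               # cDNA template, as stated in the text

# Volume not accounted for by the three listed components.
# The source does not say what fills it.
remainder_ul = total_ul - (master_mix_ul + assay_mix_ul + cdna_ul)
```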

    Statistical Analyses

    Univariate Cox proportional hazards regression was used to identify gene

    transcripts whose expression levels were correlated to patient disease-free time. To

    reduce the effect of multiple testing and to test the robustness of the selected gene

    transcripts, the Cox model was performed with bootstrapping of the patients in the

    training set. Briefly, 400 bootstrap samples of the training set were constructed, each

    containing the same number of patients as the training set but randomly chosen with

    replacement. Cox modeling was run on each of the bootstrap samples. A bootstrap score

    was created for each gene transcript by averaging the inverses of the bootstrap p-values

    with the top and bottom 5% of p-values removed. This score was used to rank the gene

    transcripts. To determine the minimum number of gene transcripts used to construct the

    signature, combinations of gene expression markers were tested by adding one gene at a

    time according to the rank order. For each signature with increasing number of genes,

    Receiver Operating Characteristic (ROC) analysis using death within 3 years as the

    defining point was performed, in 100 5-fold cross validations, to calculate the average

    area under the curve (AUC). The optimal number of gene transcripts was determined

    when the average AUC reached a plateau.
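
The bootstrap resampling and the bootstrap score described above can be sketched as follows. This is a minimal illustration under stated assumptions: the Cox model fit itself is not shown (the per-bootstrap p-values are taken as given), the function names are hypothetical, and the trimming is implemented by dropping 5% of the sorted p-values at each end.

```python
import random


def bootstrap_sample(patients, rng):
    """Draw as many patients as the training set holds, with replacement."""
    return rng.choices(patients, k=len(patients))


def bootstrap_score(p_values, trim_percent=5):
    """Average of 1/p after removing the top and bottom `trim_percent` of p-values."""
    ordered = sorted(p_values)
    k = len(ordered) * trim_percent // 100  # number dropped at each end
    kept = ordered[k:len(ordered) - k]
    return sum(1.0 / p for p in kept) / len(kept)


def rank_transcripts(p_values_by_transcript):
    """Order transcripts from highest to lowest bootstrap score."""
    scores = {name: bootstrap_score(ps) for name, ps in p_values_by_transcript.items()}
    return sorted(scores, key=scores.get, reverse=True)


sample = bootstrap_sample(list(range(10)), random.Random(0))
score = bootstrap_score([0.001] + [0.01] * 18 + [0.9])
ranking = rank_transcripts({"probe_a": [0.01] * 20, "probe_b": [0.5] * 20})
```

With 400 bootstrap samples, this trimming drops the 20 smallest and 20 largest p-values for each transcript before averaging, so a few extreme resamples do not dominate the score.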

    A maximum follow-up of 3 years was employed since the majority of patients who

    will relapse in this population will do so within 3 years (4). Also many of these patients

    were aged and death due to non-cancer related illnesses would likely increase after 3

    years (4). This rationale was also employed when performing Cox modeling.

    The Cox score was used to estimate each patient's risk. For lung SCC, it

    was defined as a linear combination of log2 expression intensities, each weighted

    by its average standardized Cox regression coefficient from the 400 bootstrap samples.

    For lung adenocarcinoma, the Cox score was defined in the previous study

    (2). The threshold for risk stratification was determined in another 100 5-fold cross

    validations to test a series of cutoffs (percentile of risk index for the patients in the

    training set) and the optimal cutoff was chosen to give the minimum overall error rate.

    Because the Cox scores were from the two tumor types (i.e. adenocarcinoma and SCC),

    they were normalized so that their corresponding thresholds were set at zero and they had

    the same variance, which allowed them to be combined into the final score for each

    patient. Patients whose scores were equal to or greater than 0 were classified into the

    high-risk group, while patients whose scores were less than 0 were classified into the

    low-risk group. The gene expression signature and the cutoff were validated in

    the independent testing set (2). Kaplan-Meier survival plots and log-rank tests were used

    to assess the differences in survival of the predicted high and low risk groups. Patients

    were censored from statistical analysis if they were alive but had less than three years of

    clinical follow-up. Sensitivity was defined as the percent of the patients who died within

    3 years that were predicted correctly by the gene expression signature, and specificity

    was defined as the percent of the patients who survived for at least 3 years that were

    predicted correctly by the classifier. All the statistical analyses were performed using S-

    Plus 6 software (Insightful, VA).
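
The scoring and evaluation steps in this paragraph can be sketched as follows. All function names are hypothetical, the weights are assumed to be the averaged standardized Cox coefficients, and scaling to unit variance is an assumption (the text only says the two score sets were given the same variance). The analyses themselves were run in S-Plus, so this is an illustration of the logic rather than the authors' code.

```python
import math
import statistics


def cox_score(expression, weights):
    """Weighted sum of log2 expression intensities over the signature probe sets."""
    return sum(w * math.log2(expression[probe]) for probe, w in weights.items())


def normalize_scores(scores, threshold):
    """Shift scores so the threshold sits at zero, then scale to unit variance."""
    shifted = [s - threshold for s in scores]
    sd = statistics.pstdev(shifted)
    return [s / sd for s in shifted]


def classify(score):
    """Scores at or above zero are high risk; scores below zero are low risk."""
    return "high" if score >= 0 else "low"


def three_year_outcome(years_followed, died):
    """Return 'event', 'survivor', or None for patients excluded by the censoring rule."""
    if died and years_followed < 3:
        return "event"
    if years_followed >= 3:
        return "survivor"
    return None  # alive with less than three years of follow-up


def sensitivity_specificity(outcomes, predictions):
    """Sensitivity over 3-year deaths and specificity over 3-year survivors."""
    pairs = [(o, p) for o, p in zip(outcomes, predictions) if o is not None]
    events = [p for o, p in pairs if o == "event"]
    survivors = [p for o, p in pairs if o == "survivor"]
    return events.count("high") / len(events), survivors.count("low") / len(survivors)


example_score = cox_score({"a": 4.0, "b": 8.0}, {"a": 1.0, "b": -0.5})
normalized = normalize_scores([1.0, 3.0], threshold=2.0)
sens, spec = sensitivity_specificity(
    ["event", "event", "survivor", "survivor", None],
    ["high", "low", "low", "low", "high"],
)
```

Because the shift is applied before the scaling, the threshold stays at zero after normalization, which is what allows a single cutoff of 0 for both tumor types.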

    Hierarchical clustering was performed with GeneSpring 7.0 (Silicon Genetics) to

    identify major clusters of patients and to investigate their association with patient

    covariates. Prior to clustering, genes that had a coefficient of variation (CV) smaller than

    0.3 (arbitrarily chosen) were removed to reduce the impact of genes that displayed

    minimal change in expression across the dataset. Thus a dataset with 11,101 gene

    transcripts was created for this clustering analysis. The signal intensity of each gene

    transcript was divided by the median expression level of that gene from all patients.

    Samples were clustered using Pearson correlation as the measure of similarity. Genes

    were clustered in the same way.
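
The preprocessing that precedes clustering can be sketched as follows. The clustering itself was done in GeneSpring and is not reproduced here. The function names are hypothetical, and the use of the sample standard deviation in the CV is an assumption.

```python
import math
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (assumed definition of CV)."""
    return statistics.stdev(values) / statistics.mean(values)


def filter_and_scale(matrix, min_cv=0.3):
    """Drop transcripts with CV below `min_cv`; divide the rest by their median.

    `matrix` maps each transcript to its signal intensities across patients.
    """
    kept = {}
    for transcript, values in matrix.items():
        if coefficient_of_variation(values) >= min_cv:
            median = statistics.median(values)
            kept[transcript] = [v / median for v in values]
    return kept


def pearson(x, y):
    """Pearson correlation, the similarity measure named in the text."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


filtered = filter_and_scale({
    "flat": [100.0, 101.0, 99.0, 100.0],
    "variable": [50.0, 100.0, 200.0, 400.0],
})
```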

    Pathway analysis was performed by first mapping the gene transcripts on the

    Affymetrix U133A GeneChip to the Biological Process categories of Gene Ontology

    (GO). The categories that had at least 10 genes on the U133A chip were used for

    subsequent pathway analyses. Genes that were selected from data analysis were mapped

    to the GO Biological Process categories. Then the hypergeometric distribution

    probability of the genes was calculated for each category. A category that had a p-value

    less than 0.05 and contained at least two genes was considered over-represented in the

    selected gene list.
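
The over-representation test described above can be sketched as follows. The text does not state whether a point probability or a tail probability was used, so the upper tail P(X >= hits) is an assumption here. The function names and the population sizes in the example are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(population, category_size, selected, hits):
    """P(X >= hits) when `selected` genes are drawn without replacement from
    `population` genes, of which `category_size` belong to the GO category."""
    max_hits = min(category_size, selected)
    total = comb(population, selected)
    tail = sum(
        comb(category_size, k) * comb(population - category_size, selected - k)
        for k in range(hits, max_hits + 1)
    )
    return tail / total


def over_represented(population, category_size, selected, hits,
                     alpha=0.05, min_genes=2, min_category_size=10):
    """Apply the criteria from the text: category of at least 10 genes on the chip,
    at least two selected genes in the category, and p-value below 0.05."""
    if category_size < min_category_size or hits < min_genes:
        return False
    return hypergeom_upper_tail(population, category_size, selected, hits) < alpha


p_small = hypergeom_upper_tail(population=10, category_size=4, selected=3, hits=2)
```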

    Supplementary Table 1. Patient sample data.

    Supplementary Table 2. 121 probe sets differentially expressed between two major SCC

    clusters.

    Supplementary Table 3. Significantly represented Gene Ontology groups between SCC

    subgroups.

    GO ID   Gene Count   GO Class                            Gene Number on U133A   p-value
    8544    17           epidermal differentiation           56                     7.31E-12
    6325    3            chromatin architecture              12                     2.75E-04
    7586    3            digestion                           15                     7.08E-04
    7156    4            homophilic cell adhesion            39                     0.0049
    7148    3            cell shape and cell size control    28                     0.0079
    7565    3            pregnancy                           28                     0.0079
    165     2            MAPKKK cascade                      15                     0.0082
    6805    2            xenobiotic metabolism               15                     0.0082
    7169    3            receptor tyrosine kinase signaling  41                     0.0293
    6832    2            small molecule transport            29                     0.0493

    Supplementary Table 4. 50 lung SCC prognostic probe sets.

    Supplementary Table 5. 47 of 50 lung adenocarcinoma prognostic probe sets that were

    used in this study.

    Supplementary Figure 1. Overall survival of patients with SCC lung cancer as stratified

    by stage.

    [Figure: survival curves for Stage I, Stage II, and Stage III; x-axis: months (0 to 30); y-axis: 0.0 to 1.0; P = 0.027.]

    Supplementary Figure 2. Validation of Affymetrix gene expression using Taqman RT-

    PCR of 4 genes. NTRK2, FGFR2, KRT13 were randomly chosen among the 121 genes

    that significantly differentiate the unsupervised clusters of lung SCC cancers.

    [Figure: four scatter plots (NTRK2, FGFR2, KRT13, and VEGF; each titled "Affy vs TaqMan") of Affy log expression (y-axis) against Taqman dCT relative to ACTB (x-axis). Correlations shown in the figure: R = 0.96, R = 0.83, R = 0.71, R = 0.92.]

    Supplementary Figure 3. IHC of select proteins using SCC lung tissue microarrays.

    [Figure: IHC panels for FGFR2, KRT19, KRT17, and KRT5, each shown at high and low expression.]

    Supplementary Figure 4. Identification of a cutoff from the percentile of the risk index in

    the training set. Panel A shows the results for the SCC training set and Panel B shows the

    results for the adenocarcinoma training set.

    [Figure panels A and B: mean of 5-fold cross-validation error rate (y-axis; approximately 0.44 to 0.56 in A and 0.45 to 0.55 in B) plotted against percentile of scores (x-axis, 0.0 to 1.0).]

    Supplementary Figure 5. Performance of 50-gene adenocarcinoma classifier in Duke samples. Panel A sh

    adenocarcinoma samples from Duke University. Panel B shows the Kaplan-Meier analysis using a

    training set (1). Panel C shows ROC analysis in 27 stage I samples only and Panel D shows the associate

    [Figure panels: (A) ROC curve, sensitivity vs. 1-specificity, AUC = 0.83; (B) Kaplan-Meier overall survival over 0 to 3 years, Good (n=21) vs. Poor (n=15), log rank P = 0.0008, HR (95% CI): 8.33 (1.89-36.6); (C) ROC curve, sensitivity vs. 1-specificity, AUC = 0.85; (D) Kaplan-Meier overall survival over 0 to 3 years, Good (n=12) vs. Poor (n=15), log rank P = 0.0025, HR (95% CI): 12.2 (1.55-95.7).]

    References

    1. Beer DG, Kardia SL, Huang CC, et al. Gene-expression profiles predict survival of patients with lung adenocarcinoma. Nat Med 2002;8:816-824.

    2. Bild AH, Yao G, Chang JT, et al. Oncogenic pathway signatures in human cancers as a guide to targeted therapies. Nature 2006;439:353-357.

    3. Wang Y, Jatkoe T, Zhang Y, et al. Gene expression profiles and molecular markers to predict recurrence of Dukes' B colon cancer. J Clin Oncol 2004;22:1564-1571.

    4. Kiernan PD, Sheridan MJ, Byrne WD, et al. Stage I non-small cell cancer of the lung: results of surgical resection at Fairfax Hospital. Va Med Q 1993;120:146-149.