Supplementary Information
Supplementary Methods
Transcriptome microarray assay
The transcriptome profiles of 198 samples, including 165 triple-negative
breast cancer (TNBC) tissues (patients diagnosed from January 1, 2011 to
December 31, 2012) and 33 paired adjacent normal breast tissues, were
determined using the Affymetrix Human Transcriptome Array 2.0 (HTA 2.0)
GeneChips (Affymetrix, Santa Clara, CA, USA) (1, 2). According to the
standard Affymetrix protocol, biotinylated cDNA was prepared from 250 ng
total RNA using the Ambion® WT Expression Kit. Subsequently, 5.5 μg of
labeled cDNA was hybridized to the HTA 2.0 microarray. Following
hybridization and washing, the GeneChips were scanned with the GeneChip®
Scanner 3000 7G using Affymetrix® GeneChip Command Console (AGCC)
software. The Affymetrix Expression Console (version 1.2.1) implementation
of the Robust Multichip Analysis (RMA) algorithm was used for quantile
normalization and background correction. ComBat software was applied to
adjust the normalized intensities and remove batch effects.
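To illustrate the quantile normalization step only, the following Python sketch forces every array (column) to share the same intensity distribution. It is not the Expression Console implementation: RMA also performs background correction and probe-set summarization, which are omitted here, and the function name and example intensities are hypothetical.

```python
import numpy as np

def quantile_normalize(x):
    """Quantile-normalize the columns (samples) of a probes x samples matrix.

    Ties are broken arbitrarily by argsort; production code usually averages them.
    """
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, axis=0)                 # per-sample ordering of probes
    ranks = np.argsort(order, axis=0)             # rank of each probe within its sample
    reference = np.sort(x, axis=0).mean(axis=1)   # mean of the k-th smallest values
    return reference[ranks]                       # map each rank to the reference value

# Hypothetical intensities: 4 probes (rows) x 3 arrays (columns)
intensities = np.array([[5.0, 4.0, 3.0],
                        [2.0, 1.0, 4.0],
                        [3.0, 4.5, 6.0],
                        [4.0, 2.0, 8.0]])
normalized = quantile_normalize(intensities)
```

After normalization, each column contains the same set of values, and the rank order of probes within each array is unchanged.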
Quantitative real-time PCR (qRT-PCR)
We measured the expression of candidate mRNAs and lncRNAs using
qRT-PCR in all 275 patients and in another cohort of 82 TNBC patients who
received neoadjuvant chemotherapy. cDNA was synthesized using the
PrimeScript™ RT reagent kit (Takara Bio Inc., Otsu, Japan), and the SYBR®
Premix Ex Taq™ kit (Takara Bio Inc., Otsu, Japan) and ABI PRISM 7900HT
Sequence Detection System (Applied Biosystems, Foster City, CA, USA) were
used for qRT-PCR analysis. All experiments were conducted following the
standard protocol provided by the manufacturer. The results were normalized
to U6 expression.
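As a minimal sketch of what normalization to U6 means numerically, the snippet below computes a ΔCt value and the corresponding relative expression. The Ct values are hypothetical, and the 2^-ΔCt conversion assumes roughly 100% amplification efficiency, which the protocol above does not state.

```python
def delta_ct(ct_target, ct_reference):
    """Ct of the target RNA minus Ct of the reference (here U6)."""
    return ct_target - ct_reference

def relative_expression(ct_target, ct_reference):
    """Relative expression as 2^-dCt (assumes ~100% PCR efficiency)."""
    return 2.0 ** (-delta_ct(ct_target, ct_reference))

# Hypothetical Ct values for one sample
ct_gene, ct_u6 = 24.0, 20.0
d = delta_ct(ct_gene, ct_u6)               # 4.0 cycles later than U6
rel = relative_expression(ct_gene, ct_u6)  # 2**-4 = 0.0625
```

A smaller ΔCt therefore corresponds to higher expression relative to U6.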
Identification of mRNAs and lncRNAs for signature construction
The detailed filtration process is illustrated in Supplementary
Figure S1. The random variance model (RVM) (3) corrected t test was applied
to select RNAs that were differentially expressed between 33 pairs of breast
tumor tissues and adjacent normal tissues. We selected the differentially
expressed mRNAs (fold change >2 or <0.33 with a false discovery rate [FDR]
<0.001). The thresholds for differentially expressed lncRNAs were as follows:
fold change >1.5 and FDR <0.001. All differentially expressed RNAs were
included in pool A (Supplementary Figure S2). By combining the RNA
expression data from microarray and follow-up data obtained from 165 TNBC
patients, we obtained a set of RNAs correlated with recurrence-free survival
(RFS) that we placed in pool B. For the selection of mRNAs, a P value of less
than 0.1 (log rank test) was determined to be significant, and duplicated
mRNAs were excluded. For the selection of lncRNAs, a P value of less than
0.2 (log rank test) was determined to be significant, as the correlations were
less significant than those observed with the mRNAs. Additionally, only
intergenic lncRNAs were included. We selected the mRNAs and lncRNAs that
overlapped between pool A and pool B. These RNAs were both tumor-specific
and correlated with RFS. For each of the selected RNAs, we designed three
different pairs of primers and performed qRT-PCR analysis on the 33 paired
TNBC and normal tissues mentioned previously. If none of the three primer
pairs for a given RNA amplified successfully (i.e., the CT values were
undetermined), we designed three additional primer pairs. The qRT-PCR results
were analyzed using a paired t test to validate the differential expression
patterns, and P<0.1 was set as significant. RNAs that failed to amplify with
all six pairs of primers or that
showed expression patterns that were not concordant between the
transcriptome and qRT-PCR analyses were excluded. After this filtration step,
13 mRNAs and 6 lncRNAs were included. We performed qRT-PCR on all 137
samples in the training set to amplify the 19 RNAs. If there was discordance
between the expression trend of a given RNA and the results of survival
analysis, the RNA was excluded. After this exclusion step, the remaining
seven mRNAs and four lncRNAs were added to the prognostic signature one
by one until the model reached the highest area under curve (AUC) according
to the time-dependent receiver operating characteristic (ROC) curve (4-6).
Finally, three mRNAs and two lncRNAs were included in the final signature.
Adding further RNAs to the signature only diminished its
performance (data not shown).
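The pool A and pool B thresholds described above can be sketched as two simple filters whose intersection gives the candidate RNAs. This is an illustrative reading of the stated cutoffs, not the authors' pipeline: the class and function names are hypothetical, the five real entries are taken from Supplementary Table S1 (an FDR reported as "< 1E-07" is entered as its upper bound), and the last entry is an invented example that fails the fold-change filter.

```python
from dataclasses import dataclass

@dataclass
class RnaStats:
    symbol: str
    category: str       # "mRNA" or "lncRNA"
    fold_change: float  # tumor / normal
    fdr: float
    logrank_p: float

def in_pool_a(r):
    """Differential expression: FDR < 0.001 plus the category-specific fold change."""
    if r.fdr >= 0.001:
        return False
    if r.category == "mRNA":
        return r.fold_change > 2 or r.fold_change < 0.33
    return r.fold_change > 1.5

def in_pool_b(r):
    """Association with RFS: log-rank P < 0.1 for mRNAs, < 0.2 for lncRNAs."""
    return r.logrank_p < (0.1 if r.category == "mRNA" else 0.2)

candidates = [
    RnaStats("CHRDL1", "mRNA", 0.32, 2.24e-05, 7.59e-04),
    RnaStats("FCGR1A", "mRNA", 2.17, 1.73e-06, 0.0926),
    RnaStats("RSAD2", "mRNA", 2.19, 3.93e-04, 0.0456),
    RnaStats("HIF1A-AS2", "lncRNA", 1.65, 1e-07, 0.0783),   # FDR upper bound
    RnaStats("AK124454", "lncRNA", 2.01, 7.17e-05, 0.1681),
    RnaStats("HYPOTHETICAL_1", "mRNA", 1.40, 1e-04, 0.0300),  # invented; fails pool A
]
selected = [r.symbol for r in candidates if in_pool_a(r) and in_pool_b(r)]
```

The subsequent qRT-PCR validation and AUC-based stepwise inclusion are not modeled here.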
Development and validation of the integrated mRNA-lncRNA signature
based on qRT-PCR data
The detailed process of study design, patient selection and analytical
strategy is illustrated in Supplementary Figure S2. In addition to the 165
TNBC patients who provided samples for microarray experiments, we further
recruited another 110 TNBC patients who were diagnosed from January 1,
2010 to December 31, 2010 and randomly classified all 275 TNBC patients
into the training set (137 patients) and validation set (138 patients). All of the
275 TNBC tumor samples were tested using qRT-PCR, as stated before. We
selected an optimum cutoff score for the relative expression of each RNA
using X-tile plots (X-tile software version 3.6.1, Yale University School of
Medicine, New Haven, CT, USA) based on the association with patient RFS in
the training set (5, 7) (Supplementary Table S3). Cox proportional hazard
regression modeling was applied to analyze the correlation between RNA
expression and RFS. The regression coefficients of the RNAs were
used to construct a recurrence score formula (5, 6, 8). The optimum cutoff for
the model was determined from the ROC curve using the Youden index (9,
10). The integrated mRNA-lncRNA signature was validated in a validation set
and a neoadjuvant cohort using the same coefficients derived from the
training cohort.
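The two calculations in this section can be sketched as follows: a recurrence score formed as the sum of each RNA's Cox coefficient multiplied by its dichotomized expression status, and a cutoff chosen to maximize the Youden index (sensitivity + specificity - 1). The coefficients, scores and outcomes below are arbitrary placeholders, since the fitted values are not given in this supplement, and a plain binary recurrence indicator stands in for the time-dependent ROC analysis.

```python
def recurrence_score(status, coefficients):
    """Sum of coefficient x expression status (1 = high, 0 = low) over all RNAs."""
    return sum(coefficients[name] * status[name] for name in coefficients)

def youden_cutoff(scores, events):
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1.

    A patient is called high risk when score >= cutoff.
    """
    positives = sum(events)
    negatives = len(events) - positives
    best_cutoff, best_j = None, -1.0
    for cutoff in sorted(set(scores)):
        tp = sum(1 for s, e in zip(scores, events) if e and s >= cutoff)
        tn = sum(1 for s, e in zip(scores, events) if not e and s < cutoff)
        j = tp / positives + tn / negatives - 1.0
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j

# Placeholder coefficients (NOT the fitted values) and one hypothetical patient
coefficients = {"CHRDL1": -0.5, "FCGR1A": 0.5, "RSAD2": 0.25,
                "HIF1A-AS2": 1.0, "AK124454": 0.75}
patient = {"CHRDL1": 1, "FCGR1A": 0, "RSAD2": 1, "HIF1A-AS2": 1, "AK124454": 0}
score = recurrence_score(patient, coefficients)   # -0.5 + 0.25 + 1.0 = 0.75

# Hypothetical training scores and recurrence indicators (1 = recurrence)
scores = [0.1, 0.4, 0.75, 1.2, 1.5, 2.0]
events = [0, 0, 0, 1, 0, 1]
cutoff, j = youden_cutoff(scores, events)         # cutoff 1.2, J = 0.75
```

For validation, the same coefficients and cutoff would be applied unchanged to the new cohort.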
RNA interference
Each target sequence was designed using BLOCK-iT RNAi Designer (Life
Technologies, Wilmington, DE, USA) and filtered using NCBI BLAST to
reduce off-target effects. All the siRNA oligonucleotides used in the study were
synthesized by GenePharma Co. Ltd. (Shanghai). For cell viability-based siRNA
screening, reverse transfection in 96-well plates was performed as follows.
For each well, 0.3 μl of Lipofectamine RNAiMAX was
diluted in 25 μl of Opti-MEM medium and 7.5 pmol of siRNA duplex in another 25
μl of Opti-MEM medium. The two reagents were then mixed, incubated for 15
min and added to each well, together with 3 × 10³ cells suspended in 100 μl of
antibiotic-free growth medium. The transfection medium was replaced with
fresh growth medium 12 h post-transfection, and the cells were incubated
for another 72 h before cell viability was measured by CCK-8 (Dojindo
Laboratories, Kumamoto, Japan). The sequence of the siRNA negative control
was 5′-UUCUCCGAACGUGUCACGU-3′. All raw data were collected at a
wavelength of 450 nm.
Measurement of cell proliferation
Cells transfected with siRNA (5 × 10³ per well) were seeded in 96-well
plates. The indicated concentrations of paclitaxel were added to the wells 6
h after seeding, and the cells were incubated for a further 48 h; in the control
group, paclitaxel was replaced with PBS. Cell viability was assessed by analyzing the
metabolic reduction of WST-8 (CCK-8 cell proliferation assay), as described
previously (12). A 6-parameter nonlinear regression was used to calculate IC50
values using SigmaPlot 2001 software (Systat Software, Chicago, IL, USA).
Cell invasion assay
The Boyden chamber invasion assay was used to assess the invasion
ability. 800 μl of medium (containing 0.1% bovine serum albumin) and cells
were added to the lower and upper compartment of the chamber, respectively.
After incubating for 24 h, non-migrated cells were removed, and cells that had
migrated through the Matrigel filter (BD Biosciences, Franklin Lakes, NJ,
USA) were counted.
Cell cycle arrest assay
For the cell cycle assay, 2 × 10⁵ cells per well were seeded in 6-well plates and
transfected with siRNA. After 48 h of transfection, cells were treated with 5 nM
paclitaxel for 16 h. For the cell cycle arrest assay, cells were stained with propidium
iodide and tested using flow cytometry according to the standard protocol
(13).
Co-expression analysis of lncRNAs and mRNAs
To identify interactions between mRNAs and lncRNAs, we constructed co-
expression networks. We pre-processed the data using the median
expression value of all transcripts and then screened for differentially
expressed lncRNAs and mRNAs. For each pair of genes analyzed, we
calculated the Pearson correlation.
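For reference, the Pearson correlation between one lncRNA and one mRNA across samples can be computed as in the sketch below. The expression vectors are hypothetical, and the function is a plain restatement of the textbook formula rather than the software actually used for the network construction.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation coefficient between two expression vectors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()   # center each vector
    yc = y - y.mean()
    return float((xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum()))

# Hypothetical log2 expression values across five samples
lncrna = [1.0, 2.0, 3.0, 4.0, 5.0]
mrna_positive = [2.1, 3.9, 6.2, 8.1, 9.8]   # rises with the lncRNA
mrna_negative = [5.0, 4.0, 3.0, 2.0, 1.0]   # falls with the lncRNA

r_pos = pearson(lncrna, mrna_positive)
r_neg = pearson(lncrna, mrna_negative)
```

Coefficients near 1 or -1 indicate strongly co-expressed or inversely expressed pairs, as listed in Supplementary Table S7.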
Gene Ontology (GO) and pathway analysis
GO analysis was applied to analyze the main function of genes co-
expressed with lncRNAs according to the GO database, which is the key
functional classification of the National Center for Biotechnology Information
(NCBI). The analysis can organize genes into hierarchical categories and
uncover the gene regulatory network based on biological process and
molecular function. Meanwhile, pathway analysis was used to determine the
significant pathways of the differentially expressed genes according to the Kyoto
Encyclopedia of Genes and Genomes (KEGG) database. The Pearson chi-square
test and Fisher’s exact test were used to select the significant
pathways.
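One common way to apply Fisher's exact test to pathway or GO enrichment is the one-sided (hypergeometric) form sketched below, which asks how likely it is to see at least the observed number of co-expressed genes in a pathway by chance. Whether a one-sided or two-sided test was used here is not stated, so this is an assumption, and the gene counts are hypothetical.

```python
from math import comb

def enrichment_p(hits, selected, in_term, background):
    """One-sided Fisher's exact P value, P(X >= hits).

    background: genes in the background set
    in_term:    background genes annotated to the pathway or GO term
    selected:   genes in the co-expressed list
    hits:       selected genes that are annotated to the term
    """
    total = comb(background, selected)
    upper = min(selected, in_term)
    tail = sum(comb(in_term, k) * comb(background - in_term, selected - k)
               for k in range(hits, upper + 1))
    return tail / total

# Hypothetical counts: 3 of 5 selected genes fall in a 4-gene pathway, background of 20
p = enrichment_p(hits=3, selected=5, in_term=4, background=20)   # 496 / 15504
```

With these counts the P value is about 0.032, so the pathway would pass a 0.05 threshold.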
Supplementary Tables
Supplementary Table S1. Selected mRNAs and lncRNAs by comparing the expression profiles in 33 paired tumor and adjacent
normal tissues of triple-negative breast cancer.
Symbol       ProbeSet          Category   Tumor/normal      Fold change      Fold change   FDR        Log rank
                                                            (tumor/normal)   P value                  P value
CHRDL1       TC0X001278.hg.1   mRNA       down-regulation   0.32             4.70E-06      2.24E-05   7.59E-04
FCGR1A       TC01001172.hg.1   mRNA       up-regulation     2.17             2.00E-07      1.73E-06   0.0926
RSAD2        TC02000034.hg.1   mRNA       up-regulation     2.19             1.34E-04      3.93E-04   0.0456
HIF1A-AS2    TC14002040.hg.1   lncRNA     up-regulation     1.65             < 1E-07       < 1E-07    0.0783
AK124454ᵃ    TC19002388.hg.1   lncRNA     up-regulation     2.01             1.55E-05      7.17E-05   0.1681
ᵃ A gene symbol is not available for this noncoding RNA; thus, the GenBank accession number was used to identify the RNA.
Abbreviations: FDR, false discovery rate.
Supplementary Table S2. Primers for the real-time RT-PCR analysis of
mRNAs and lncRNAs included in the signature.
Symbol                                  Category           Primer
FCGR1A-Forward/FCGR1A-Reverse           mRNA               TGGTGAATACAGGTGCCAGACCGTGAAGACTCTGCTGGA
RSAD2-Forward/RSAD2-Reverse             mRNA               AGCATCGTGAGCAATGGAAGCGGCCAATAAGGACATTGAC
CHRDL1-Forward/CHRDL1-Reverse           mRNA               ACAAGAAGTACAGAGTGGGTGAGGGCAGCACAGATGAGGAAT
HIF1A-AS2-Forward/HIF1A-AS2-Reverse     lncRNA             CAACATACATTAAGGTGATGGCAGCTTCAACACCTCCAACTCA
AK124454-Forward/AK124454-Reverse       lncRNA             TGTCTCTGCAGTCTCTTAAGCAGGGACAGCATGCACTTTGTT
U6-Forward/U6-Reverse                   Internal control   CTCGCTTCGGCAGCACAAACGCTTCACGAATTTGCGT
Supplementary Table S3. Cutoff value for each RNA.
         CHRDL1   FCGR1A   RSAD2   HIF1A-AS2   AK124454
ΔCtᵃ     3.74     6.93     4.85    3.54        11.62
ᵃ The ΔCt value (with U6 expression as the reference) was used to represent each
RNA’s relative expression level. Cutoff values for the expression of RNAs were
determined by the X-tile software based on the association with the patients’
relapse-free survival. For each patient, the RNA expression level was marked as
high expression if the ΔCt was less than the cutoff value and as low expression
otherwise. In the risk calculating formula, high expression status equals 1 while
low expression status equals 0.
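The dichotomization rule in this footnote can be written out as a short sketch. The cutoffs are those listed in this table, with gene names following Supplementary Table S1; the patient ΔCt values are hypothetical, and treating a ΔCt exactly equal to the cutoff as low expression is an assumption, since the footnote only defines the "less than" case.

```python
# Cutoff dCt values from Supplementary Table S3
CUTOFFS = {"CHRDL1": 3.74, "FCGR1A": 6.93, "RSAD2": 4.85,
           "HIF1A-AS2": 3.54, "AK124454": 11.62}

def expression_status(delta_ct_values, cutoffs=CUTOFFS):
    """Return 1 (high expression) when dCt < cutoff, otherwise 0 (low expression)."""
    return {name: 1 if delta_ct_values[name] < cutoff else 0
            for name, cutoff in cutoffs.items()}

# Hypothetical dCt values (U6 as reference) for one patient
patient = {"CHRDL1": 2.90, "FCGR1A": 7.50, "RSAD2": 4.85,
           "HIF1A-AS2": 3.00, "AK124454": 12.40}
status = expression_status(patient)
```

Because a lower ΔCt means earlier amplification, values below the cutoff map to high expression (1).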
Supplementary Table S4. Univariate Cox proportional hazards regression
analysis of the integrated RNA signature and clinicopathological
characteristics with recurrence-free survival.
                                                    Training Set                  Validation Set
Variable                                            HR (95%CI)           P        HR (95%CI)           P
Age (≤50y vs >50y)                                  0.33 (0.14-0.77)     0.011    0.73 (0.31-1.72)     0.467
Menopause (No vs Yes)                               0.52 (0.22-1.20)     0.125    0.58 (0.24-1.37)     0.213
Tumor grade (≤II vs >II)                            0.88 (0.36-2.16)     0.787    2.53 (0.73-8.76)     0.144
Tumor size (≤2cm vs >2cm)                           1.46 (0.56-3.81)     0.436    3.01 (0.89-10.24)    0.077
Positive LNs (≤3 vs >3)                             5.69 (2.33-13.90)    <0.001   3.61 (1.50-8.71)     0.004
Ki67 (≤20% vs >20%)                                 2.37 (0.76-7.43)     0.139    1.12 (0.445-2.81)    0.812
Radiotherapy (no vs yes)                            5.40 (1.98-14.76)    0.001    1.74 (0.71-4.24)     0.222
Chemotherapy (non-taxane vs taxane)                 1.00 (0.42-2.35)     0.997    0.97 (0.38-2.43)     0.940
Integrated RNA signature (low risk vs high risk)    2.45 (1.29-4.65)     0.006    2.97 (1.52-5.79)     0.001
Abbreviations: CI, confidence interval; HR, hazard ratio; LN, lymph node.
Supplementary Table S5. Clinicopathologic characteristics of patients with
triple-negative breast cancer who received neoadjuvant chemotherapy.
NCT set (n=82)
Characteristics              No.          Low risk (%)   High risk (%)
Age (y)
  Median                     48           50             46
  IQR                        42-52        48-52          41-52
  ≤50                        51 (62.2)    27 (56.2)      24 (70.6)
  >50                        31 (37.8)    21 (43.8)      10 (29.4)
Menopausal status
  Premenopausal              52 (63.4)    25 (52.1)      27 (79.4)
  Postmenopausal             30 (36.6)    23 (47.9)      7 (20.6)
Pre-NCT tumor size (cm)
  ≤2                         2 (2.4)      2 (4.2)        0 (0)
  >2, ≤5                     26 (31.7)    17 (35.4)      9 (26.5)
  >5                         54 (65.9)    29 (60.4)      25 (73.5)
Pre-NCT tumor grade
  I-II                       32 (39.0)    20 (41.7)      12 (35.3)
  III                        50 (61.0)    28 (58.3)      22 (64.7)
Pre-NCT LN status
  Negative                   19 (23.2)    13 (27.1)      6 (17.6)
  Positive                   63 (76.8)    35 (72.9)      28 (82.4)
Pathologic response
  pCR                        29 (35.4)    23 (47.9)      6 (17.6)
  Non-pCR                    53 (64.6)    25 (52.1)      28 (82.4)
Follow-up time (mo)
  Median                     43.5         51.5           32.5
  IQR                        19.0-66.0    24.5-66.0      13.0-73.0
RFS event                    28           11             17
Abbreviations: IQR, interquartile range; LN, lymph node; NCT, neoadjuvant chemotherapy; pCR, pathological complete remission.
Supplementary Table S6. Multivariate logistic regression analysis of the
integrated RNA signature and clinicopathological characteristics with
pathological complete remission.
NCT set
Variableᵃ                                           ORᵇ (95%CI)          Pᶜ
Age (≤50y vs >50y)                                  0.85 (0.31-2.39)     0.764
Menopausal status (pre vs post)                     0.73 (0.26-2.07)     0.555
Pre-NCT tumor size (≤2cm vs >2cm)                   0.54 (0.20-1.48)     0.227
Pre-NCT tumor grade (≤II vs >II)                    0.71 (0.26-1.90)     0.494
Pre-NCT LN status (negative vs positive)            0.76 (0.24-2.34)     0.626
Integrated RNA signature (low risk vs high risk)    0.23 (0.07-0.71)     0.011
Abbreviations: LN, lymph node; NCT, neoadjuvant chemotherapy; OR, odds ratio.
ᵃ Adjusted by multivariate logistic regression models including age, menopausal status, pre-NCT tumor size, pre-NCT tumor grade, pre-NCT LN status and integrated RNA signature.
ᵇ Odds ratio for the likelihood of having pathological complete remission, for an increment in expression of one unit based on logistic regression.
ᶜ P value for likelihood ratio test derived from probit regression.
Supplementary Table S7. Top ten mRNAs co-expressed with lncRNAs
AK124454 and HIF1A-AS2.
AK124454                    HIF1A-AS2
mRNA        Coefficient     mRNA      Coefficient
CHEK1       0.5928561       HIF1A     0.7392446
C11orf82    0.5818136       IL8       0.564712
DEPDC1      0.5562664       ERO1L     0.5548231
SPC25       0.5526421       CLEC5A    0.5244528
XRCC2       0.5495506       PGK1      0.5846322
DNA2        0.5482257       P4HA1     0.5783645
KIF11       0.535331106     PLAUR     0.5484504
FLRT2       -0.536526       PGAM1     0.5479989
GPR124      -0.545691       PLOD2     0.5102739
MAF         -0.555115       HLF       -0.526427
Supplementary Table S8. Top ten gene-ontology terms and pathways in which the co-expressed mRNAs are involved.
AK124454
GO terms                                          P          Pathway                                         P
mitotic prometaphase                              5.87E-8    ECM-receptor interaction                        0.001
mitotic cell cycle                                1.36E-6    Focal adhesion                                  0.045
M phase of mitotic cell cycle                     1.78E-6    Mismatch repair                                 0.071
cell adhesion                                     5.00E-6    Homologous recombination                        0.086
cell division                                     1.26E-5    DNA replication                                 0.110
mitotic anaphase                                  3.20E-5    PI3K-Akt signaling pathway                      0.119
DNA repair                                        2.45E-4    ABC transporters                                0.134
mitotic spindle organization                      3.15E-4    Intestinal immune network for IgA production    0.155
spindle organization                              3.96E-4    p53 signaling pathway                           0.206
meiosis                                           0.003      Complement and coagulation cascades             0.209

HIF1A-AS2
GO terms                                          P          Pathway                                         P
regulation of glycolysis                          4.80E-6    Glycolysis / Gluconeogenesis                    4.86E-4
response to endoplasmic reticulum stress          7.41E-5    Biosynthesis of amino acids                     0.001
cellular response to interleukin-1                9.91E-5    HIF-1 signaling pathway                         0.001
gluconeogenesis                                   2.15E-4    Proteoglycans in cancer                         0.006
glycolysis                                        2.25E-4    Metabolic pathways                              0.011
signal transduction                               3.50E-4    Pathways in cancer                              0.012
endoplasmic reticulum unfolded protein response   7.69E-4    Glycine, serine and threonine metabolism        0.037
cellular protein metabolic process                0.001      Bladder cancer                                  0.038
vascular endothelial growth factor production     0.001      Lysine degradation                              0.049
neural fold elevation formation                   0.001      Malaria                                         0.049
Supplementary Figures
Supplementary Figure S1
Supplementary Figure S1. Flowchart of RNA filtration. Abbreviations: FDR, false discovery
rate; HTA, Human Transcriptome Array; TNBC, triple-negative breast cancer; qRT-PCR,
quantitative real-time PCR.
Supplementary Figure S2
Supplementary Figure S2. Flowchart of study design, patient selection and analytical strategy. Abbreviations: HTA, Human
Transcriptome Array; RFS, recurrence-free survival; TNBC, triple-negative breast cancer.
Supplementary Figure S3
Supplementary Figure S3. Hierarchical clustering of 33 paired tumor and adjacent normal
breast tissues with the 5 differentially expressed RNAs using Euclidean distance and average
linkage clustering. Every row represents an individual mRNA/lncRNA, and each column
represents an individual sample. Pseudocolors indicate transcript levels from low to high on a
log 2 scale from -2 to 2, ranging from a low association strength (dark, black) to high (bright,
red, or green).
Supplementary Figure S4
Supplementary Figure S4. Validation of the expression of each mRNA/lncRNA incorporated in
the integrated signature in the 33 paired tumor and adjacent normal tissues from the training set
using quantitative real-time polymerase chain reaction (qRT-PCR). Expression of these mRNAs
and lncRNAs measured by qRT-PCR differed notably between tumor and non-cancer
breast tissues and was significantly correlated with the microarray data.
Supplementary Figure S5
Supplementary Figure S5. Predictive values of taxane benefit using the integrated signature.
(A) Estimates of recurrence-free survival (RFS) according to the scores calculated by the
integrated mRNA-lncRNA signature in the neoadjuvant chemotherapy cohort (n=82). (B) Time-
dependent ROC curves were plotted to assess the efficacy of the signature in predicting three-
year recurrence-free survival, with area under curve (AUC) reported.
References
1. Wang P, Xue Y, Han Y, Lin L, Wu C, Xu S, et al. The STAT3-binding long noncoding RNA lnc-DC controls human dendritic cell differentiation. Science 2014;344:310-3.
2. Shan J, Balasubramanian MN, Donelan W, Fu L, Hayner J, Lopez MC, et al. A mitogen-activated protein kinase/extracellular signal-regulated kinase kinase (MEK)-dependent transcriptional program controls activation of the early growth response 1 (EGR1) gene during amino acid limitation. J Biol Chem 2014;289:24665-79.
3. Wright GW, Simon RM. A random variance model for detection of differential gene expression in small microarray experiments. Bioinformatics 2003;19:2448-55.
4. Heagerty PJ, Lumley T, Pepe MS. Time-dependent ROC curves for censored survival data and a diagnostic marker. Biometrics 2000;56:337-44.
5. Zhang JX, Song W, Chen ZH, Wei JH, Liao YJ, Lei J, et al. Prognostic and predictive value of a microRNA signature in stage II colon cancer: a microRNA expression analysis. Lancet Oncol 2013;14:1295-306.
6. Liu NQ, Stingl C, Look MP, Smid M, Braakman RB, De Marchi T, et al. Comparative proteome analysis revealing an 11-protein signature for aggressive triple-negative breast cancer. J Natl Cancer Inst 2014;106:djt376.
7. Camp RL, Dolled-Filhart M, Rimm DL. X-tile: a new bio-informatics tool for biomarker assessment and outcome-based cut-point optimization. Clin Cancer Res 2004;10:7252-9.
8. Liu N, Chen NY, Cui RX, Li WF, Li Y, Wei RR, et al. Prognostic value of a microRNA signature in nasopharyngeal carcinoma: a microRNA expression analysis. Lancet Oncol 2012;13:633-41.
9. Nakayama T, Morita S, Takashima T, Kamigaki S, Yoshidome K, Ito T, et al. Phase I study of S-1 in combination with trastuzumab for HER2-positive metastatic breast cancer. Anticancer Res 2011;31:3035-9.
10. Shimizu T, Hirano A, Kamimura M, Ogura K, Kim N, Watanabe O, et al. A phase II study of epirubicin and cyclophosphamide followed by weekly paclitaxel with or without trastuzumab as primary systemic therapy in locally advanced breast cancer. Anticancer Res 2010;30:4665-71.
11. Jiang YZ, Yu KD, Peng WT, Di GH, Wu J, Liu GY, et al. Enriched variations in TEKT4 and breast cancer resistance to paclitaxel. Nat Commun 2014;5:3802.
12. Zhang L, Wu H, Lu D, Li G, Sun C, Song H, et al. The costimulatory molecule B7-H4 promote tumor progression and cell proliferation through translocating into nucleus. Oncogene 2013;32:5347-58.
13. Krishan A. Rapid flow cytofluorometric analysis of mammalian cell cycle by propidium iodide staining. J Cell Biol 1975;66:188-93.