Genomic characteristics of triple-negative breast cancer ... · 11/8/2019 · 1 Original Article ....
Transcript of Genomic characteristics of triple-negative breast cancer ... · 11/8/2019 · 1 Original Article ....
1
Original Article
Genomic characteristics of triple-negative breast cancer nominate
molecular subtypes that predict chemotherapy response
Jihyun Kim1†
, Doyeong Yu1†
, Youngmee Kwon2, Keun Seok Lee
2, Sung Hoon Sim
2,3,
Sun-Young Kong3,4
, Eun Sook Lee2,4
, In Hae Park2,3
*, and Charny Park1*
1 Bioinformatics Analysis Team, Research Institute, National Cancer Center, 323
Ilsan-ro, Ilsandong-gu, Goyang 10408, Republic of Korea
2 Center for Breast Cancer Hospital, National Cancer Center, 323 Ilsan-ro, Ilsandong-
gu, Goyang 10408, Republic of Korea
3 Translational Cancer Research Branch, Division of Translational Science, National
Cancer Center, 323 Ilsan-ro, Ilsandong-gu, Goyang 10408, Republic of Korea
4 Graduate School for Cancer Science and Policy, National Cancer Center, 323 Ilsan-
ro, Ilsandong-gu, Goyang 10408, Republic of Korea
† These authors contributed equally to this work as first authors.
* To whom correspondence should be addressed.
Running Title: TNBC Subtypes and Chemotherapy Response
Keywords: triple-negative breast cancer (TNBC), consensus molecular subtype,
drug-response prediction, cisplatin resistance, prognosis
Corresponding authors:
Charny Park, 323 Ilsanro Ilsandonggu Goyangsi Gyeonggido 10408 Korea.
Research. on November 24, 2020. © 2019 American Association for Cancermcr.aacrjournals.org Downloaded from
Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited. Author Manuscript Published OnlineFirst on November 8, 2019; DOI: 10.1158/1541-7786.MCR-19-0453
2
Phone: +82-031-920-2581; E-mail: [email protected]; and In Hae Park; E-mail:
Disclosure of Potential Conflicts of Interest:
The authors declare no conflicts of interest.
Grant Support:
This research was supported by the National Cancer Center Grant (NCC-1611150,
NCC-1910100); the National Research Foundation of Korea (NRF) (NRF-
2018R1C1B6001475).
Authors’ contributions:
Conception, design and study supervision: IHP, CP
Acquisition of patient data (acquired and managed patients): IHP, YMK, KSL, SHS,
SYG, ESL
Computational analysis: JK, DYY, CP
Writing, review and/or revision of the manuscript: JK, IHP, CP
Word count: 5858
Total number of figures and tables: 5
Research. on November 24, 2020. © 2019 American Association for Cancermcr.aacrjournals.org Downloaded from
Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited. Author Manuscript Published OnlineFirst on November 8, 2019; DOI: 10.1158/1541-7786.MCR-19-0453
3
Abstract
The heterogeneity of triple-negative breast cancer (TNBC) poses difficulties for
suitable treatment and leads to poor outcome. This study aimed to define a consensus
molecular subtype (CMS) of TNBC and thus elucidate genomic characteristics and
relevant therapy. We integrated the expression profiles of 957 TNBC samples from
published datasets. We identified genomic characteristics of subtype by exploring the
pathway activity, microenvironment, and clinical relevance. Additionally, drug
response (DR) scores (n = 181) were computationally investigated using chemical
perturbation gene signatures and validated in our own TNBC patient (n = 38) who
received chemotherapy and organoid biobank data (n = 64). Subsequently,
cooperative functions with drugs were also explored. Finally, we classified TNBC
into four CMSs: stem-like; mesenchymal-like; immunomodulatory; luminal-androgen
receptor (AR). CMSs also elucidated distinct tumor-associated microenvironment and
pathway activities. Furthermore, we discovered metastasis-promoting genes, such as
secreted phosphoprotein 1 by comparing with primary. Computational DR scores
associated with CMS revealed drug candidates (n = 18) and it was successfully
evaluated in cisplatin response of both patients and organoids. Our CMS recapitulated
in-depth functional and cellular heterogeneity encompassing primary and metastatic
TNBC. We suggest DR scores to predict CMS-specific drug responses and to be
successfully validated. Finally, our approach systemically proposes a relevant
therapeutic prediction model as well as prognostic markers for TNBC.
Implications: We delineated the genomic characteristic and computational drug
responses prediction for TNBC CMS from gene expression profile. Our systematic
approach provides diagnostic markers for subtype and metastasis verified by machine-
learning and novel therapeutic candidates for TNBC patients.
Research. on November 24, 2020. © 2019 American Association for Cancermcr.aacrjournals.org Downloaded from
Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited. Author Manuscript Published OnlineFirst on November 8, 2019; DOI: 10.1158/1541-7786.MCR-19-0453
4
Introduction
Triple-negative breast cancer (TNBC) is known to have poor prognosis and the
tendency of early metastasis (1). Although TNBC shows a higher response rate to
standard chemotherapy than hormone receptor positive breast cancer, it has poor
prognosis due to early development of multi-drug resistance (2). Recently, targeted
therapy based on genomic characteristics was introduced with promising results. For
example, DNA-damaging agents like platinum agent or poly (ADP-ribose)
polymerase-1 inhibitor showed excellent efficacy in BRCA1/2 mutation carriers
among heterogeneous TNBC groups (3). In addition, immune-modulating agent, anti-
programmed death-ligand 1 (PD-L1) antibody, and atezolizumab demonstrated
outstanding results in metastatic TNBC patients whose tumors had higher PD-L1
expression (4). To find the appropriate drug for each TNBC patient, a thorough
understanding of the heterogeneity of TNBC and robust subtype identification are
prerequisites.
Previous TNBC subtype identification studies have proposed various classification
and subtypes of size 3 to 6. Lehmann’s classification has advanced to specify the
heterogeneity and clinically evaluated in several independent cohorts (5–7). Further
studies proposed different-size subtypes of 3 or 4 based on next-generation
sequencing using a more robust algorithm (8,9). However, owing to restricted sample
size (n < 200) originating from a single platform or insufficient algorithm
optimization to refer diagnostic gene set, a robust TNBC subtype classification has
not be established in the absence of multiple validations. Additionally, previous
classification was restricted to only primary tumors despite poor outcomes due to
metastatic tumor development.
Research. on November 24, 2020. © 2019 American Association for Cancermcr.aacrjournals.org Downloaded from
Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited. Author Manuscript Published OnlineFirst on November 8, 2019; DOI: 10.1158/1541-7786.MCR-19-0453
5
As an additional important factor of tumor heterogeneity, the tumor-associated
microenvironment (TME) confers genetic diversity to tumor bulk cells. Cellular
characteristics are decided by tumorigenic origin (10). As proven in a single-cell
platform, the TME cell population cooperates with the chemotherapy response (11,12).
Cell abundance investigation to induce transitional diversity is unrestricted to a
single-cell platform. Distinct cell abundance identification could be estimated from
bulk tumor sample gene expression profiles using cell deconvolution technologies
(13,14). Expanded cellular investigation in terms of TME could reveal the origin of
functional heterogeneity and specify therapeutic targets.
Due to the complex interactions between heterogeneous tumor cells and TME, there
have been a few successes in attempts to search novel drug targets computationally in
TNBC. To reveal the mode of action and to identify novel drug-target relationships,
pharmacogenomics approaches utilizing gene expression or mutation profiles have
been suggested (15,16). However, most computational methods were performed using
genetic profiles of cell lines, and validation was insufficient to follow actual patients’
responses. A systematic approach interconnected with molecular subtype diagnosis
and therapeutic strategy is required to find effective treatment.
In the current study, utilizing gene expression profiles, we established various
classification and prediction models. Machine-learning approaches referring related
gene expression signature classified two types: a consensus molecular subtype (CMS)
by SVM and primary-metastasis by random-forest model. In-depth investigation using
known gene signatures recapitulated genomic characteristics and clinical relevance of
CMS and metastasis. We also carried out TME and mammary gland cells by TNBC
bulk cell deconvolution and revealed the cellular heterogeneity of CMS. In additional
study, we established drug response prediction model to compute DR score based on
Research. on November 24, 2020. © 2019 American Association for Cancermcr.aacrjournals.org Downloaded from
Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited. Author Manuscript Published OnlineFirst on November 8, 2019; DOI: 10.1158/1541-7786.MCR-19-0453
6
chemical perturbation gene signatures. We identified subtype-related drug responses
and pathway interventions. The drug response predictions were validated in actual
patient’s cisplatin response and matched gene expression profile as well as TNBC
organoid (17).
Materials and Methods
CMS classification and functional investigation using gene expression profile
We collected 1,525 gene expression profiles from 14 high-throughput gene expression
datasets related to metastasis and triple-negative breast cancer from the Gene
Expression Omnibus (GEO) (Supplementary Table S1). To eliminate the batch effect
from the datasets, meta-analysis of the gene expression profiles was performed using
ComBat (18). Next, we confirmed the removal of the batch effect of the expression
profiles in each dataset using a principal component analysis (PCA) plot
(Supplementary Figure S1A). We removed samples considered as positive for each
estrogen receptor (ER+), progesterone receptor (PR
+) and human epidermal growth
factor receptor 2 (HER2+) followed by Lehmann study’s filtering strategy based on
bimodal distribution (5). Additionally, immunohistochemistry (IHC)-based diagnosis
profiles for ER+, PR
+ and HER2
+ were considered for judicious sample elimination.
We exceptionally considered metastasis cell isolation from primary breast pleural
cells as metastasis sample (Supplementary Table S1). Finally, one aggregated
expression profile was generated including 601 primary and 356 metastatic samples.
We clustered samples using the non-negative matrix factorization method, and the
number of clusters was verified by silhouette width for each subtype (Supplementary
Figure S1B). Next, we established a CMS classifier based on a support vector
machine (SVM) using the expression profile of 1227 genes (information gain ≥ 0.1;
Research. on November 24, 2020. © 2019 American Association for Cancermcr.aacrjournals.org Downloaded from
Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited. Author Manuscript Published OnlineFirst on November 8, 2019; DOI: 10.1158/1541-7786.MCR-19-0453
7
Supplementary Table S2) as a training dataset. The classification performance of the
SVM was investigated by 5-fold cross-validation with 1000 iterations. Accuracy,
sensitivity, and specificity were calculated for each subtype. We compared our TNBC
CMS with the Lehmann and Burstein expression profiles. Additionally, 10-year meta-
free survival (MFS) and overall survival (OS) differences in CMS were investigated
using Cox regression.
Matched clinical information is based on GEO. Sample counts after filtering out ER,
PR, and HER2 positive samples are listed under primary and metastasis. A part of the
samples include long-distance metastasis described in tissue type. In total, 601
primary and 356 metastasis samples were chosen for functional analysis.
Pathway activity scores were derived from gene expression profiles using Z-score
transformation after gene set variation analysis (GSVA) enrichment scoring by
referring to Hallmark gene sets from MSigDB or additional gene signatures (19–21).
Key pathways inducing distinct CMS characteristics were chosen by limma test (P <
1e-05). To verify functionality for each CMS, we collected seven representative gene
signatures to extract subtype characteristics, namely epithelial-to-mesenchymal
transition (EMT), stromal, immune, TME, stemness, chromosomal instability (CIN),
and hormone. Next, enrichment scores were estimated using GSVA. The EMT score
was defined from normalized average expression values using a previously studied
EMT gene set (22). The abundance of stromal and immune cells was estimated by the
ESTIMATE tool (23). To perform cross-validation of immune infiltration, the TME
score was calculated by xCell (14). The stemness score was determined from The
Cancer Genome Atlas (TCGA) pan-cancer stemness index (24). Additionally, we
referred to embryonic stem (ES) cell-like gene expression signatures of Nanog
homeobox (NANOG), octamer-binding transcription factor 4 (OCT4), SRY-related
Research. on November 24, 2020. © 2019 American Association for Cancermcr.aacrjournals.org Downloaded from
Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited. Author Manuscript Published OnlineFirst on November 8, 2019; DOI: 10.1158/1541-7786.MCR-19-0453
8
HMG-box 2 (SOX2), and MYC, which are known to be overexpressed in poorly
differentiated tumors (25). The CIN score was derived from the CIN70 gene
expression signature (26). Hormone response was summarized from the average score
of the ER and androgen receptor (AR) response pathway scores. We compared the
seven representative gene signature scores of our CMS with the Lehmann and
Burstein classification. The difference in scores among subtypes was tested by the
Mann-Whitney U test.
To reveal the TME heterogeneity, we estimated cell abundance, known as the
deconvolution method, of non-immune stromal and immune cells. The abundance of
mammary gland cells was computed using MCPcounter using cell marker gene sets
revealed by a single-cell study (27,28). Additionally, the proportion of immune cells
was estimated using CIBERSORT (13).
To evaluate association between BRCAness and SL type, we classified TCGA BRCA
TNBC into our CMSs and called germline mutations from TCGA breast cancer
TNBC samples (total n=1101, TNBC n=120) and mutation calling pipeline followed
by GATK v3.3 germline short variant discovery pipeline (29). First, the mutations
were called using HaplotypeCaller, and filtered out low-quality variants by
VariantFiltration. To extract pathogenic and functional germline mutations, we
chosen BRCA1/2 variants registered as ‘Pathogenic’ or ‘Likely pathogenic’ in
ClinVar, INDEL or truncated mutations (30).
To infer the genetic characteristics of metastatic transition in our aggregated data, we
analyzed differentially expressed genes (DEGs) between primary tumor and
metastasis samples using limma (adjusted P < 0.001; Benjamini-Hochberg correction).
Next, significant pathways were inferred from the DEGs by gene set enrichment
analysis (GSEA) using ReactomFI (31). Finally, the DEGs based on the GSEA test
Research. on November 24, 2020. © 2019 American Association for Cancermcr.aacrjournals.org Downloaded from
Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited. Author Manuscript Published OnlineFirst on November 8, 2019; DOI: 10.1158/1541-7786.MCR-19-0453
9
(FDR < 0.05) were used to develop a gene-gene interaction network, visualized using
Cytoscape. In addition, we generated a metastasis prediction model based on these
prognostic markers. First, we tested for MFS and concordance index (CI) from DEGs
previously comparing primary with metastasis samples. Tests were performed using
the R package survcomp, and the identified gene set was defined as the metastasis
prediction gene set (32). Next, to test the metastasis prediction availability of gene set,
we established a random-forest machine-learning model to classify primary and
metastasis samples using the metastasis prediction gene set identified from MFS test
(P < 0.05). Finally, the random-forest model was evaluated by area under the curve
(AUC) metric using a 4-fold cross validation procedure. Pathways of metastasis
prediction gene set were also investigated using GSEA.
Computational drug-response prediction applying chemical perturbation gene
signature and cross-validation
To explore the relevant drug response status for each patient, we computed the drug
response (DR) scores in both aggregated data and biobank expression profiles (17).
DR scores were calculated by the GSVA method using the MSigDB chemical and
genetic perturbation signature gene set (17). Next, we selected drug signatures to
examine response differences among CMSs using analysis of variance (ANOVA). To
select reliable drug responses, we investigated drug names registered for clinical trials
(NIH ClinicalTrials.gov database), considering synonyms provided in Drugbank (33).
Meanwhile, paired Pearson correlation coefficients between pathway activities and
DR scores were calculated to explore the functional association for each drug. To
evaluate DR scores of aggregated data, we referred biobank expression profiles. First,
organoid samples were classified into four subtypes using our CMS classifier, and DR
Research. on November 24, 2020. © 2019 American Association for Cancermcr.aacrjournals.org Downloaded from
Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited. Author Manuscript Published OnlineFirst on November 8, 2019; DOI: 10.1158/1541-7786.MCR-19-0453
10
scores were also equally calculated. Subtype-specificity of biobank DR scores was
also tested by ANOVA.
ROADMAP data analysis related to cisplatin response
Patients and sample preparation: In 2016, we introduced the ROADMAP program
to implement genomic sequencing of tumors and blood coupled with clinical data in
the management of advanced refractory breast cancer. As of June 2018, 38 female
patients with metastatic TNBC were enrolled, and their genomic and clinical data
were obtained after receiving written informed consent. RNA from fresh frozen or
paraffin-embedded, formalin-fixed (FFPE) tissue samples were extracted using the
RNeasy mini kit (QIAGEN, Hilden, Germany) according to the manufacturer’s
protocol. Total cellular RNA (at least 100 ng) was purified from 1 g total RNA from
each sample, and cDNA was synthesized using SuperScript II Reverse Transcriptase
(Thermo Fisher Scientific Inc., Waltham, MA, USA). Sequencing libraries were
prepared using the TruSeq RNA Library preparation kit (Illumina, San Diego, CA,
USA) and sequenced using HiSeq 2000.
Gene expression profile processing: After quality trimming using Trimmomatic, we
aligned reads to hg19 using the Mapsplice2-like TCGA RNASeqV2 pipeline (34,35).
Next, we quantified the gene expression profile using RSEM v1.1.13 referring to
UCSC Known Genes (36). Patient subtypes were assigned by utilizing SVM machine-
learning model for CMS TNBC classification described in first method section. The
seven representative gene signature scores and DR scores were computed by equal
method with an aggregated data. Cox-regression survival analysis was performed for
examining 5-year treatment-free survival.
Research. on November 24, 2020. © 2019 American Association for Cancermcr.aacrjournals.org Downloaded from
Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited. Author Manuscript Published OnlineFirst on November 8, 2019; DOI: 10.1158/1541-7786.MCR-19-0453
11
Further investigation of cisplatin drug response: The reliability of DR scores was
confirmed from both ROADMAP patient therapeutic profiles and biobank. Of the
ROADMAP-enrolled patients, 31 patients had undergone cisplatin therapy, and the
response log was available. Biobank cell models derived from TNBC patients were
divided into cisplatin response and non-response according to drug high-throughput
screening (HTS) 50% inhibitory concentration (IC50) values. Cisplatin DR scores of
ROADMAP patients and biobank cells were tested using the Wilcoxon rank-sum test.
We also explored genes involved in cisplatin response by analyzing DEGs between
cisplatin-response and -resistance samples using the Wilcoxon rank-sum test (P <
0.05) in the ROADMAP dataset. To avoid the artifact effect from the FFPE status, we
eliminated genes that showed differences in expression between FFPE and fresh
frozen samples (Wilcoxon rank-sum test, P < 0.1). Finally, pathways involved in
cisplatin response were investigated using ReactomeFI GSEA (31). Additionally,
survival analysis was performed to identify drug-related prognosis markers based on
DEGs. The combined CI and hazard ratio (HR) were computed to get an overall score
from independent datasets including aggregated data and the ROADMAP dataset of
prognostic marker genes.
Results
CMS classification
The summary of our aggregated data composed of primary (n = 601) and metastatic (n
= 356) TNBC is shown in Table 1. We confirmed to eliminate the batch effect via
PCA plot (Supplementary Figure S1A) and eliminated samples that highly expressed
ER, PR, and HER2 using a bimodal filter on the gene expression distribution. The
Lehmann subtype supports the most stringent ER, PR, and HER2 filtering to eliminate
Research. on November 24, 2020. © 2019 American Association for Cancermcr.aacrjournals.org Downloaded from
Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited. Author Manuscript Published OnlineFirst on November 8, 2019; DOI: 10.1158/1541-7786.MCR-19-0453
12
15.3% of our aggregated data samples. On the other hand, 56% of Burstein samples to
pass IHC were eliminated based on mRNA expression.
Performance evaluation using 5-fold cross-validation confirmed the cluster size (n = 4)
via an average silhouette width of 0.57 of our subtypes, indicating remarkably better
performance than Lehmann’s (0.09) and Burstein’s (0.18) (Supplementary Figure
S1B–D). The IM subtype shows the lowest sensitivity than others (Supplementary
Figure S1B, C; 84%). However, its silhouette width (0.3) is relatively much higher
than Burstein’s (0.08) and Lehmann’s (0.06) classification. In independent signature-
based validation, our CMS showed the best performance overall (Supplementary
Figure S1E). IM subtype’s discriminative power by ESTIMATE immune scores
shrunk in both ours and previous studies than others, but our CMS relatively well
classified IM subtype (Supplementary Figure S1E; P-value ours=1.299e-08;
Lehmann=1.096e-16; Burstein=0.513).
In Lehmann’s classification, the mesenchymal stem-like (SL) type showed the highest
stromal infiltration, but the discriminative power using the EMT score was better in
our CMS (t-test test, metaCMS P = 2.86E-69, Lehmann P = 7.69E-31;
Supplementary Figure S1E). The genomic characteristics of mesenchymal type to
associate with stemness were ambiguous (Supplementary Figure S1E). Stemness was
diffused across the basal-like (BL) 1, BL2, and mesenchymal subtypes. Consistent
with a previous study, the average silhouette width of BL1 showed the worst result (5).
Burstein’s classification resemble with our result rather than Lehmann’s six subtypes,
except for the basal-like immune-activation type (P < 2.2e-16, chi-square test). When
investigating Burstein’s dataset (n = 198), the BL immune-activation (BLIA) type
considerably included stemness samples (18%; Supplementary Figure S1D, E). In
immune and TME signatures (Supplementary Figure S1E), our CMS was more
Research. on November 24, 2020. © 2019 American Association for Cancermcr.aacrjournals.org Downloaded from
Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited. Author Manuscript Published OnlineFirst on November 8, 2019; DOI: 10.1158/1541-7786.MCR-19-0453
13
discriminative rather than Burstein’s. Therefore, we finalized the TNBC CMS into the
stem-like (SL, 32.1%), mesenchymal-like (MSL, 24.6%), immunomodulatory (IM,
12.9%), and luminal AR (LAR, 30.5%) subtypes (Fig. 1A).
Genomic characteristics and microenvironment of CMSs
Key pathways activated in each subtype were identified from computational scores (P
< 0.1e-06). Interestingly, IM and LAR-associated pathways were also up-regulated in
the MSL subtype (Fig. 1A). EMT, transforming growth factor beta (TGF-β), and
TP53 signals were constitutively up-regulated restricted in the MSL. Our LAR
showed the highest ER and AR response, particularly involved in adipogenesis and
fatty acid metabolism. IFN-α, IFN-γ response, IL6-JAK-STAT3, IL2-STAT5, and
inflammatory responses are activated in IM (Fig. 1A). In addition, both representative
immune cell infiltration and TME signature show the highest score in IM (Fig. 1B).
This indicates that IL-JAK-STAT signaling is implicated in immune infiltration, as
identified in previous studies on TNBC (37). Tumor necrosis factor alpha (TNF-α)
and apoptosis were activated in both IM and MSL. The characteristics of ES cells
included a SL subtype, in which cell cycle, proliferation, and ES-associated pathways
were activated, such as mitosis, E2F, MYC, WNT-β catenin, and Notch signaling (Fig.
1A,B). As previously known (26,38) , cell cycle deficiency in the SL subtype induced
the highest CIN (Fig. 1B; P = 1.35e-120). Meanwhile, DNA repair and glycolysis
were enhanced across SL and LAR.
The TME for each CMS presented distinct features in both stromal and immune cells.
First, the IM was strongly immune-infiltrated and harbored memory B cells, CD8 T
cells, activated CD4 T cells, gamma delta T cells, and activated natural killer cells
(Supplementary Figure S2A; P < 1.0e-07), while MSL showed a high incidence of
Research. on November 24, 2020. © 2019 American Association for Cancermcr.aacrjournals.org Downloaded from
Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited. Author Manuscript Published OnlineFirst on November 8, 2019; DOI: 10.1158/1541-7786.MCR-19-0453
14
M2 macrophages, which was consistent with another study (9). Next, we explored the
prognostic effect by immune cells to determine MFS or OS. The results showed that
activation of CD8 T cells was associated with good prognosis (Supplementary Figure
S2B; MFS, P = 0.005; OS, P = 0.02). Among the known immunotherapy targets such
as CD20, CD52, PD-L1, CTLA4, CD52, and CTLA4 were significantly enhanced in
IM (Supplementary Figure S2C,D; ANOVA P < 0.001 and Tukey test). Another bulk
cell deconvolution revealed distinct characteristics of seven mammary gland cells
depending on CMS (Fig. 1B; P < 3.66e-17). Myoepithelial cells were enriched in the
SL, consistent with a previous finding (39), and we additionally discovered that
luminal progenitor cells were enhanced in SL. In MSL, adipocytes and matrix
metalloproteinases categorizing as stromal cells were highly enriched, followed by
luminal epithelial and myoepithelial cells. Interestingly, key pathways of adipogenesis
and fatty acid metabolism were activated across the MSL and LAR (Fig. 1A), but
adipocytes were enriched only in the MSL type. Additionally, we confirmed that
tumor-associated stromal cells were infiltrated in MSL from up-regulation of nine
markers (Supplementary Figure S3) (40,41). Luminal epithelial cells showed a strong
presence in the LAR, and blood cells undergoing the mitotic cell cycle process were
mainly activated in the IM (Fig. 1B). In further investigation of stem-like
characteristics for the SL subtype, we referred to previously known target gene sets of
ES cell transcription factors and polycomb-regulated genes (25). As already identified
in a previous study, target genes of transcription factors NANOG, OCT4, SOX2, and
MYC were clearly activated in SL. Additionally, we confirmed that known targets of
polycomb repressive complex 2 (PRC2) down-regulated in SL, PRC2 subunit 2
(SUZ12), PRC subunit (EED), and methylation at histone 3 lysine 27, are up-
regulated in the MSL subtype (Fig. 1B).
Research. on November 24, 2020. © 2019 American Association for Cancermcr.aacrjournals.org Downloaded from
Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited. Author Manuscript Published OnlineFirst on November 8, 2019; DOI: 10.1158/1541-7786.MCR-19-0453
15
To investigate which subtype is dysregulated by BRCA1/2 germline mutations
associated with BRCAness, we classified TCGA BRCA TNBC samples into our
CMS and called germline mutations. 54 cases in TCGA TNBC samples (n=120)
belonged to SL type. Finally, 8 SL type samples in total 16 pathogenic BRCA1/2
germline mutations were identified (Fisher’s test, P < 1.89e-07). SL key pathways,
G2M checkpoint, MYC, mitotic spindle and NOTCH activation in BRCA1/2 mutation
samples resemble those shown in Fig. 1A (Supplementary Figure S4; P < 2.13e-04).
To investigate prognosis, we examined 10-year survival rates based on OS and MFS
rates among CMSs. Survival test details are described in Supplementary Table S3. SL
clearly presented poor MFS (median MFS = 5.9 years) and showed 2.53 times the HR
(95% confidence interval (CI): 1.69 – 3.7; Fisher’s test, P = 5.40e-06) compared with
MSL, which had good prognosis (Fig. 1C). Moreover, high-grade patients were
enriched in the SL subtype (44% of grade 3, chi-squared test P = 5.10e-05). The LAR
subtype showed poor MFS as well as OS, consistent with the primary TNBC of the
Curtis 2012 dataset METABRIC project (Fig. 1C), and the LAR subtype patients also
rapidly relapsed in line with the SL subtype (median MFS = 4.9 years) and had 1.5
times the ratio of risk than the MSL group (95% CI: 1.03 – 2.37; Fisher’s test P =
0.03) (42). The MSL and IM groups presented relatively good prognosis in terms of
both MFS and OS.
Comparison between primary and metastatic TNBC
We investigated the unique features between primary and metastatic TNBC.
Metastasis samples (37 %; n = 356) were originated from primary TNBC and derived
from several tissues, including skin, liver, lung, and brain (Table 1). When clustering
all samples, key pathways shared with primary TNBC also discriminated metastasis
Research. on November 24, 2020. © 2019 American Association for Cancermcr.aacrjournals.org Downloaded from
Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited. Author Manuscript Published OnlineFirst on November 8, 2019; DOI: 10.1158/1541-7786.MCR-19-0453
16
into four subtypes (Fig. 1A). Proportions of SL and LAR subtypes were increased in
metastasis (SL 35%, LAR 35%) compared with primary TNBC (SL 30%, LAR 28%),
and the remaining two subtypes in metastasis were relatively decreased. Meanwhile,
tissue types of metastatic samples were independent with CMS confirmed by Chi-
square test.
To characterize the functional difference between primary cases and metastasis, we
first found 734 DEGs (adjusted P < 0.001) and revealed the biological pathways that
were enriched in the DEGs using ReactomeFI (Methods; Fig. 2). These genes were
associated with complement and coagulation, IL6-mediated signaling, toll-like
receptors, TNF signaling, and JAK-STAT signaling pathways (P < 0.001; Fig. 2A).
Serpin family E member 1 (SERPINE1) and fibroblast growth factor 1 (FGF1),
implicated in metastasis (43,44) were found to be involved in metastasis via EGFR
and FGF signaling pathways and angiogenesis (Fig. 2A,C).
We established metastasis prediction models based on identified prognostic markers
(Supplementary Figure S5A). Metastasis prediction gene set was chosen using MFS
test (P < 0.05; n = 84) from 734 DEGs comparing primary with metastasis samples
and prediction model was established using random-forest machine-learning method
(Supplementary Table S4). The model was validated by a 4-fold cross validation
(AUC 0.8; Supplementary Figure S5B). Importantly, we report CIs from these
metastasis prediction gene sets (Fig. 2B). Among metastasis prediction genes, CIs
were greater than 0.4 in FGF, JAK-STAT, and integrin signaling regulated by SHC
adaptor protein 1 (SHC1), integrin subunit alpha M (ITGRAM), vascular cell adhesion
molecule 1 (VCAM1), integrin subunit alpha 1 (ITGA1), and jagged canonical Notch
ligand 2 (JAG2) (Fig. 2B; Supplementary Figure S5C,D). SHC1 involved in FGF
signaling and cell migration is used as a prognostic marker for cancer treatment (45).
Research. on November 24, 2020. © 2019 American Association for Cancermcr.aacrjournals.org Downloaded from
Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited. Author Manuscript Published OnlineFirst on November 8, 2019; DOI: 10.1158/1541-7786.MCR-19-0453
17
We show high concordance (c = 0.55; 95% CI: 0.47 – 0.62) of this gene, which was
activated in metastasis (Fig. 2C). ITGAM (c = 0.48; 95% CI: 0.44 – 0.53) and ITGA1
(c = 0.52; 95% CI: 0.48 – 0.58) seem to enhance distance metastasis via integrin
signal and coagulation cascading (Fig. 2A; Supplementary Table 5). JAG2 (c = 0.46;
95% CI: 0.43 – 0.52) was identified as a member of NOTCH signaling that promotes
metastasis (Fig. 2C).
Pharmacogenomics landscape investigation by CMS
We established a pharmacogenomics landscape depending on CMS. Computational
DR scores were calculated from 181 response or resistance drug signatures from
MSigDB. In total, 48 drugs of the 181 signatures extracted to overlap with breast
cancer clinical trials. DR scores were estimated in both our aggregated data and
biobank. We extracted final 18 DR scores of 17 drugs to indicate CMS specificity
(ANOVA FDR < 0.05). Next, DR scores and correlated pathways (γ > 0.65) were
extracted as network edges for each subtype (Fig. 3). DR score difference among our
CMSs remarkably resemble with biobank (Supplementary Figure S6A). Chosen 18
drug responses to shown difference among CMSs acquired high CI and partially
associated with clinical outcomes (Supplementary Figure S6B). Adaphostin,
doxorubicine and bexrotene assigned SL type were ranked in top 3 CI (CI ≤ 0.58) and
shown significant clinical outcomes (MFS P-value=3.31E-05). Travectenin response
of SL, ribabirin response and imatinib resistance of MSL were closely relevant to
MFS. Tamoxifen resistance was significantly detected in LAR.
Dasatinib, imatinib and cisplatin were estimated to have the highest resistance scores
in the MSL subtype (Fig. 3A), and the DR scores for these three drugs correlated
strongly with the EMT and KRAS pathways (Fig. 3B). In particular, EMT activation
Research. on November 24, 2020. © 2019 American Association for Cancermcr.aacrjournals.org Downloaded from
Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited. Author Manuscript Published OnlineFirst on November 8, 2019; DOI: 10.1158/1541-7786.MCR-19-0453
18
led to cisplatin resistance by functionally shared genes such as cellular
communication network factor 2 (CCN2), vimentin (VIM), cysteine-rich angiogenic
inducer 61 (CYR61), and 5’-nucleotidase (NT5E) (Fig 3B). The genes involved in
EMT activation are known to be associated with the therapeutic effect of cisplatin
(46,47). Ribavirin, an eIF4E inhibitor, has been suggested as a response drug in MSL
subtype. The target gene is known to promote EMT and metastasis (Fig. 3A) (48).
Interestingly, our MSL drug-pathway network demonstrates that ribavirin inhibits
TNFα via NF-κb and apoptosis pathways activated in MSL (Fig. 1A, 3B).
The LAR subtype appeared strongly high resistant score against tamoxifen and
gefitinib, though androgen blockade was considered as a suitable therapy (Fig. 3A,
3B). As already known, androgen receptor promotes tamoxifen agonist activity. Our
prediction shows a strong intervention in ER early response (Fig. 3B). In LAR
network, the drug with the highest resistance, gefitinib, inhibits insulin-like growth
factor I receptor (IGF1R) signaling. Consequently, the LAR subtype shows significant
fold change in IGF1R expression (fold change=8.38). On the other hand, troglitazone
was identified as the only response drug that inhibits fatty acid oxidation (49). Fatty
acid metabolism and adipogenesis pathways were activated across the MSL and LAR
subtypes (Fig. 1A), but the response to troglitazone was more closely correlated to the
LAR subtype.
The SL shown resistance in doxorubicin, dasatinib, and flurouracil treatment but
adaholstin, cisplatin, and trabectedin were responsible (Fig. 3A). E2F, G2M, and
MYC targets 1 signal were positively correlated with doxorubicin resistance (Fig. 3B,
Supplementary Figure S6A). Cyclin B2 (CCNB2), extra spindle pole bodies like 1,
separase (ESPL1), marker of proliferation Ki-67 (MKI67), and DNA topoisomerase II
alpha (TOP2A) genes, involved in E2F and G2M pathways, induced drug resistance
Research. on November 24, 2020. © 2019 American Association for Cancermcr.aacrjournals.org Downloaded from
Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited. Author Manuscript Published OnlineFirst on November 8, 2019; DOI: 10.1158/1541-7786.MCR-19-0453
19
and showed high expression in SL. Generally, ESPL1 and TOP2A encode for
topoisomerase or its binding partner and the genes’ topoisomerase activity are known
to trigger doxorubicin resistance (50).
Additionally, to examine cisplatin efficacy in genomic unstable samples, we
calculated CIN scores for each of these cases. Genomic unstable cases of high CIN
scores were classified as cisplatin treatment (Supplementary Figure S7; response r =
0.41, P = 0.06). Glycogen synthase kinase 3 (GKS3) inhibitor, activated in IM type,
regulates innate and adaptive immune response mediated by NK-κB and STATs (51).
Interestingly, biobank data show that the GSK3 inhibitor SB216763 responds not only
to IM, but also to partial LAR and SL types (Supplementary Fig. S5; see Discussion
for more details).
ROADMAP data analysis of cisplatin treatment patients
We validated the response of patients to cisplatin with our computational result based
on the pharmacogenomics approach. Of 38 TNBC patients, 31 patients received
cisplatin containing chemotherapy at the metastatic setting (Table 2 and
Supplementary Table S6). Our SVM model classified all patients into SL (55.3%),
MSL (10.5%), LAR (31.6%), and IM (2.6%; Figs. 3c and S7a). Key pathways
inferred from aggregated data were also investigated in ROADMAP data. EMT, ER
response, and MYC pathways clearly resembled subtype-dependent pathway activities
of the aggregated data, except for the IM subtype, which was due to sample size
limitation (Supplementary Figure S8A-B). Approximately 52% of the 31 patients
responded to cisplatin treatment (R group), whereas 48% were non-responders (NR
group). In progression-free survival (PFS) analysis, the NR group showed poor PFS
(P = 0.002; Fig. 3D).
Research. on November 24, 2020. © 2019 American Association for Cancermcr.aacrjournals.org Downloaded from
Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited. Author Manuscript Published OnlineFirst on November 8, 2019; DOI: 10.1158/1541-7786.MCR-19-0453
20
Computational cisplatin DR scores of ROADMAP remarkably associated with actual
patients’ drug response, and the computational scores showed subtype-dependency (P
< 0.1; Fig. 3E). Three MSL patients except one belonged to the NR. In contrast, six of
seven LAR patients showed response to cisplatin treatment. In other words, MSL type
patients appeared to be cisplatin-resistant compared with other subtypes, while LAR
type was most sensitive for this treatment (P = 0.1; Wilcoxon test). Our method
predicted that ribavirin and thiazolidinedione response scores resembled cisplatin
resistance DR scores in ROADMAP patients (Fig. 3C, upper rectangle). Cisplatin was
suitable for the LAR and SL subtypes (Fig. 3C, middle rectangle). LAR showed both
tamoxifen resistance and androgen target treatment response (Fig. 3C, lower
rectangle). In addition, investigation of biobank drug HTS and expression profiles
indicated that cisplatin efficacy (IC50 value) was remarkably consistent with our DR
score (P = 6.0e-04; Fig. 3F).
Genes involved in cisplatin resistance were identified in ROADMAP using the DEG
set and GSEA analysis (Supplementary Table S7). We found SL- and MSL-related
pathways such as mitosis pathways, DNA damage bypass, Wnt signaling pathway,
and Beta1 integrin cell surface pathway to be enriched among the 847 drug response-
related DEGs (P < 0.0018; Supplementary Table S8). We next explored cisplatin
response markers for the prognosis of PFS based on the expression of genes, such as
E2F transcription factor 1 (E2F1), metastasis associated 1 (MTA1), plakophilin 1
(PKP1), and microtubule nucleation factor (TPX2). Up-regulation of these four genes,
characterized by shorter response times, indicated poor prognosis (P < 0.04;
Supplementary Figure S8C) and was also consistent with our aggregated data
(Supplementary Figure S8D).
Research. on November 24, 2020. © 2019 American Association for Cancermcr.aacrjournals.org Downloaded from
Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited. Author Manuscript Published OnlineFirst on November 8, 2019; DOI: 10.1158/1541-7786.MCR-19-0453
21
Our DR scores strongly suggest that patients with cisplatin non-response are mainly
enriched in the MSL subtype, which could be treated with ribavirin and
thiazolidinedione (Fig. 3C). Ribavirin also inhibits histone methyltransferase (EZH2)
and treats leukemia cells (52). Thiazolidinedione (TZD) belongs to a class of drugs
used to improve glucose metabolism in type-2 diabetes, activates PPAR-γ to crosstalk
in mesenchymal stem cells, and inhibits adipocyte differentiation (53). In addition,
these drugs interfere in MSL-associated pathways that activate PRC2, H3K27, EED,
SUZ12 (related with methyltransferase activity) and TZD-related PPAR-γ inhibition
of adipogenesis (Fig. 1B).
Discussion
Through this study, we proposed four TNBC subtypes using an integrative expression
profile. The robustness of the classification was validated using several gene
signatures and a machine-learning method (Supplementary Figure S1). In our
classification, the absence of DNA variant evidence could be a limitation, but
tumorigenic functions by variants could be inferred from transcriptional consensus
change (21,26). For example, BRCAness by BRCA1/2 mutation could be related to
the SL subtype activated in pathways associated with cell cycle (54). Our pathogenic
BRCA1/2 enrichment in SL was consistent with Lehmann’s identification in BL1 and
BL2 (5). Meanwhile, our results also summarized key pathways specific to subtypes
encompassing primary and metastatic TNBC. These consensus key pathways are in
accordance with previous investigations on the ES signature of aggressive breast
cancer from glioma and bladder tumor (25).
We investigated precise stemness using various computational approaches like
machine-learning (Fig. 1C, left) or ES cell signature scoring regulated by transcription
Research. on November 24, 2020. © 2019 American Association for Cancermcr.aacrjournals.org Downloaded from
Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited. Author Manuscript Published OnlineFirst on November 8, 2019; DOI: 10.1158/1541-7786.MCR-19-0453
22
factor target gene set (Fig. 1C, right).. ES cell signatures showed more enriched
results in the SL subtype, indicating that transcription factors SOX2, OCT4, and
NANOG to play a critical role in proliferation (25). However, MYC targets activation
was unrestricted in the SL type differently than in a previous report (25). MYC-
amplified samples except the SL type existed in the IM subtype. In another study,
MYC reduction induced the down-regulation of CD47 and PD-L1 in mouse and
human cells, which was in line with our results (55). Therefore, we infer that MYC up-
regulation is involved across the SL and IM types. Meanwhile, we suggested another
regulator, PRC2, which was known to be exclusively activated in non-stemness
samples (25). We specifically revealed mutual exclusiveness of PRC2 targets between
SL and MSL and this finding narrow down a PRC2 complex inhibitor target (56).
Cellular heterogeneity and tumor progenitors of TNBC could be elucidated from
various cells in the breast tissue microenvironment. Based on our cell abundance
results, the proliferative characteristics of SL seem to be originated from luminal
progenitor cells (Fig. 1C). In a previous study, BRCA1/2 defective tumor samples
originated from luminal progenitors (10). As already described, the SL is the closest
to BRCA1/2 deficiency. Therefore, our hypothesis is consistent with a previous study
that reported that BRCA1/2 mutation frequently recurred in TNBC that originated
from luminal progenitor and not from basal stem cells. Another study has shown that
adipocytes induce EMT in breast cancer cells (57), which was relevant to our results
that adipocytes were enriched in the MSL subtype (Fig. 1C). Thus, our cell abundance
analysis indicates the specific progenitor or interaction with other cells in the
microenvironment, which could be involved in tumorigenic biological functions.
We successfully established metastasis prediction machine-learning models using 84
DEG gene set to compare between primary and metastasis and the genes could be
Research. on November 24, 2020. © 2019 American Association for Cancermcr.aacrjournals.org Downloaded from
Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited. Author Manuscript Published OnlineFirst on November 8, 2019; DOI: 10.1158/1541-7786.MCR-19-0453
23
considered as prognostic markers. The metastasis prediction gene set was enriched in
coagulation, cell migration, and cell-cell interaction required for enhancing metastasis
(45). Especially, one of the identified genes, SHC1, regulates apoptosis and drug
resistance in mammalian cells and is essential for tumor progression and metastatic
spread in breast cancer mouse models (54).
Our computational drug response estimated from the expression profile successfully
recapitulated the comprehensive characteristics of functional intervention and drug
target. Cisplatin response markers of SL subtype, E2F1, MTA1, PKP1, and TPX2 are
discriminative in PSF and MFS (Supplementary Fig. S8). We suggest that these four
genes could emerge as important drug resistance predictors. Despite the limited
patient expression profiles in ROADMAP data, we successfully predicted high
cisplatin resistance score in MSL, while the LAR subtype was associated with good
outcome (Fig. 3E). Further, we identified several candidate drugs that could intervene
with key pathways activated in each subtype: aplidin, thiazolidinediones, and ribavirin
for the MSL subtype by inhibiting EMT and metastasis (58,59); cisplatin and
trabectedin for SL type cases with defective DNA-repair machinery (60). Such
approach is useful to perform pre-clinical prediction. Additional evaluation using
biobank data further confirmed our data on cisplatin resistance and ribavirin response
in MSL (Supplementary Figure S6). Interestingly, the proportion of IM subtype
(4.2 %) in biobank is lesser than the CMS analysis dataset (12.9 %), and it overlaps
with some LAR and SL samples with respect to GSK3 inhibitor SB216763 response
(Supplementary Figure S6). It suggests that partial IM samples are predicted to LAR
and SL. We infer that this limitation originates from artificial reconstruction of TME
for organoid culture system. In spite of TME change in organoid culture, partial
Research. on November 24, 2020. © 2019 American Association for Cancermcr.aacrjournals.org Downloaded from
Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited. Author Manuscript Published OnlineFirst on November 8, 2019; DOI: 10.1158/1541-7786.MCR-19-0453
24
patients show consistent drug response in GSK3 inhibitor consistently with fresh
frozen samples of our aggregated dataset.
In summary, we defined TNBC CMS encompassing primary and metastasis tumors
using large-scale expression profiles. Our CMS recapitulated in-depth genomic
function and microenvironment heterogeneity. Particularly, we propose the abundance
of mammary gland cells as a marker for tumorigenic origin and prognosis. Further,
our computational approach to estimate drug response was remarkably consistent with
patients’ responses to cisplatin and drug screening of biobank organoid models.
Therapeutic candidates for each CMS were suggested with pathway intervention.
Thus, our study identifies molecular subtypes and provides a basis for relevant
diagnostic and prognostic markers. Finally, our systematic computational approach
suggests a potential to complement TNBC and drug response study.
Acknowledgments
We would like to acknowledge the TNBC patients and the clinical help obtained for
the ROADMAP project. TNBC gene expression meta-data were referred from GEO
and ROADMAP next-generation sequencing files have been deposited in NCBI SRA
accession SRP156081. Our TNBC classification approach is provided as an R
package in Bioconductor
(https://bioconductor.org/packages/devel/bioc/html/TNBC.CMS.html).
Research. on November 24, 2020. © 2019 American Association for Cancermcr.aacrjournals.org Downloaded from
Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited. Author Manuscript Published OnlineFirst on November 8, 2019; DOI: 10.1158/1541-7786.MCR-19-0453
25
References
1. TSENG LM, HSU NC, CHEN SC, LU YS, LIN CH, CHANG DY, et al.
Distant metastasis in triple-negative breast cancer. Neoplasma. 2013;60:290–4.
2. Isakoff SJ. Triple-negative breast cancer: role of specific chemotherapy agents.
Cancer J. 2010;16:53–61.
3. Litton JK, Rugo HS, Ettl J, Hurvitz SA, Gonçalves A, Lee K-H, et al.
Talazoparib in Patients with Advanced Breast Cancer and a Germline BRCA
Mutation. N Engl J Med. 2018;379:753–63.
4. Schmid P, Adams S, Rugo HS, Schneeweiss A, Barrios CH, Iwata H, et al.
Atezolizumab and Nab-Paclitaxel in Advanced Triple-Negative Breast Cancer.
N Engl J Med. 2018;379:2108–21.
5. Lehmann BD, Bauer JA, Chen X, Sanders ME, Chakravarthy AB, Shyr Y, et
al. Identification of human triple-negative breast cancer subtypes and
preclinical models for selection of targeted therapies. J Clin Invest.
2011;121:2750–67.
6. Lehmann BD, Jovanović B, Chen X, Estrada M V, Johnson KN, Shyr Y, et al.
Refinement of Triple-Negative Breast Cancer Molecular Subtypes:
Implications for Neoadjuvant Chemotherapy Selection. Sapino A, editor. PLoS
One. 2016;11:e0157368.
7. Ahn SG, Kim SJ, Kim C, Jeong J. Molecular Classification of Triple-Negative
Breast Cancer. J Breast Cancer. 2016;19:223.
8. Burstein MD, Tsimelzon A, Poage GM, Covington KR, Contreras A, Fuqua
SAW, et al. Comprehensive genomic analysis identifies novel subtypes and
targets of triple-negative breast cancer. Clin Cancer Res. 2015;21:1688–98.
Research. on November 24, 2020. © 2019 American Association for Cancermcr.aacrjournals.org Downloaded from
Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited. Author Manuscript Published OnlineFirst on November 8, 2019; DOI: 10.1158/1541-7786.MCR-19-0453
26
9. Jézéquel P, Loussouarn D, Guérin-Charbonnel C, Campion L, Vanier A,
Gouraud W, et al. Gene-expression molecular subtyping of triple-negative
breast cancer tumours: importance of immune response. Breast Cancer Res.
2015;17:43.
10. Molyneux G, Geyer FC, Magnay F-A, McCarthy A, Kendrick H, Natrajan R, et
al. BRCA1 basal-like breast cancers originate from luminal epithelial
progenitors and not from basal stem cells. Cell Stem Cell. 2010;7:403–17.
11. Ungefroren H, Sebens S, Seidl D, Lehnert H, Hass R. Interaction of tumor cells
with the microenvironment. Cell Commun Signal. 2011;9:18.
12. Kim C, Gao R, Sei E, Brandt R, Hartman J, Hatschek T, et al. Chemoresistance
Evolution in Triple-Negative Breast Cancer Delineated by Single-Cell
Sequencing. Cell. 2018;173:879-893.e13.
13. Newman AM, Liu CL, Green MR, Gentles AJ, Feng W, Xu Y, et al. Robust
enumeration of cell subsets from tissue expression profiles. Nat Methods.
2015;12:453–7.
14. Aran D, Hu Z, Butte AJ. xCell: digitally portraying the tissue cellular
heterogeneity landscape. Genome Biol. 2017;18:220.
15. Chen B, Ma L, Paik H, Sirota M, Wei W, Chua M-S, et al. Reversal of cancer
gene expression correlates with drug efficacy and reveals therapeutic targets.
Nat Commun. 2017;8:16022.
16. Sengupta S, Sun SQ, Huang K-L, Oh C, Bailey MH, Varghese R, et al.
Integrative omics analyses broaden treatment targets in human cancer. Genome
Med. 2018;10:60.
Research. on November 24, 2020. © 2019 American Association for Cancermcr.aacrjournals.org Downloaded from
Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited. Author Manuscript Published OnlineFirst on November 8, 2019; DOI: 10.1158/1541-7786.MCR-19-0453
27
17. Bruna A, Rueda OM, Greenwood W, Batra AS, Callari M, Batra RN, et al. A
Biobank of Breast Cancer Explants with Preserved Intra-tumor Heterogeneity
to Screen Anticancer Compounds. Cell. 2016;167:260-274.e22.
18. Chen C, Grennan K, Badner J, Zhang D, Gershon E, Jin L, et al. Removing
batch effects in analysis of expression microarray data: an evaluation of six
batch adjustment methods. Kliebenstein D, editor. PLoS One. 2011;6:e17238.
19. Lenz M, Müller F-J, Zenke M, Schuppert A. Principal components analysis and
the reported low intrinsic dimensionality of gene expression microarray data.
Sci Rep. 2016;6:25696.
20. Hänzelmann S, Castelo R, Guinney J. GSVA: gene set variation analysis for
microarray and RNA-seq data. BMC Bioinformatics. 2013;14:7.
21. Liberzon A, Subramanian A, Pinchback R, Thorvaldsdóttir H, Tamayo P,
Mesirov JP. Molecular signatures database (MSigDB) 3.0. Bioinformatics.
2011;27:1739–40.
22. Tan TZ, Miow QH, Miki Y, Noda T, Mori S, Huang RY-J, et al. Epithelial-
mesenchymal transition spectrum quantification and its efficacy in deciphering
survival and drug responses of cancer patients. EMBO Mol Med. 2014;6:1279–
93.
23. Yoshihara K, Shahmoradgoli M, Martínez E, Vegesna R, Kim H, Torres-
Garcia W, et al. Inferring tumour purity and stromal and immune cell
admixture from expression data. Nat Commun. 2013;4:2612.
24. Malta TM, Sokolov A, Gentles AJ, Burzykowski T, Poisson L, Weinstein JN,
et al. Machine Learning Identifies Stemness Features Associated with
Oncogenic Dedifferentiation. Cell. 2018;173:338-354.e15.
Research. on November 24, 2020. © 2019 American Association for Cancermcr.aacrjournals.org Downloaded from
Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited. Author Manuscript Published OnlineFirst on November 8, 2019; DOI: 10.1158/1541-7786.MCR-19-0453
28
25. Ben-Porath I, Thomson MW, Carey VJ, Ge R, Bell GW, Regev A, et al. An
embryonic stem cell–like gene expression signature in poorly differentiated
aggressive human tumors. Nat Genet. 2008;40:499–507.
26. Buccitelli C, Salgueiro L, Rowald K, Sotillo R, Mardin BR, Korbel JO. Pan-
cancer analysis distinguishes transcriptional changes of aneuploidy from
proliferation. Genome Res. 2017;27:501–11.
27. Han X, Wang R, Zhou Y, Fei L, Sun H, Lai S, et al. Mapping the Mouse Cell
Atlas by Microwell-Seq. Cell. 2018;172:1091-1107.e17.
28. Becht E, Giraldo NA, Lacroix L, Buttard B, Elarouci N, Petitprez F, et al.
Estimating the population abundance of tissue-infiltrating immune and stromal
cell populations using gene expression. Genome Biol. 2016;17:218.
29. Van der Auwera GA, Carneiro MO, Hartl C, Poplin R, del Angel G, Levy-
Moonshine A, et al. From FastQ Data to High-Confidence Variant Calls: The
Genome Analysis Toolkit Best Practices Pipeline. Curr Protoc Bioinforma.
Hoboken, NJ, USA: John Wiley & Sons, Inc.; 2013. page 11.10.1-11.10.33.
30. Landrum MJ, Lee JM, Benson M, Brown GR, Chao C, Chitipiralla S, et al.
ClinVar: improving access to variant interpretations and supporting evidence.
Nucleic Acids Res. 2018;46:D1062–7.
31. Wu G, Dawson E, Duong A, Haw R, Stein L. ReactomeFIViz: a Cytoscape app
for pathway and network-based data analysis. F1000Research. 2014;3:146.
32. Schröder MS, Culhane AC, Quackenbush J, Haibe-Kains B. survcomp: an
R/Bioconductor package for performance assessment and comparison of
survival models. Bioinformatics. Narnia; 2011;27:3206–8.
Research. on November 24, 2020. © 2019 American Association for Cancermcr.aacrjournals.org Downloaded from
Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited. Author Manuscript Published OnlineFirst on November 8, 2019; DOI: 10.1158/1541-7786.MCR-19-0453
29
33. Wishart DS, Knox C, Guo AC, Cheng D, Shrivastava S, Tzur D, et al.
DrugBank: a knowledgebase for drugs, drug actions and drug targets. Nucleic
Acids Res. 2008;36:D901–6.
34. Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina
sequence data. Bioinformatics. 2014;30:2114–20.
35. Wang K, Singh D, Zeng Z, Coleman SJ, Huang Y, Savich GL, et al.
MapSplice: accurate mapping of RNA-seq reads for splice junction discovery.
Nucleic Acids Res. 2010;38:e178.
36. Li B, Dewey CN. RSEM: accurate transcript quantification from RNA-Seq
data with or without a reference genome. BMC Bioinformatics. 2011;12:323.
37. Barbie TU, Alexe G, Aref AR, Li S, Zhu Z, Zhang X, et al. Targeting an
IKBKE cytokine network impairs triple-negative breast cancer growth. J Clin
Invest. 2014;124:5411–23.
38. Curtis C, Shah SP, Chin S-F, Turashvili G, Rueda OM, Dunning MJ, et al. The
genomic and transcriptomic architecture of 2,000 breast tumours reveals novel
subgroups. Nature. 2012;486:346–52.
39. Carter SL, Eklund AC, Kohane IS, Harris LN, Szallasi Z. A signature of
chromosomal instability inferred from gene expression profiles predicts clinical
outcome in multiple human cancers. Nat Genet. 2006;38:1043–8.
40. Nguyen QH, Pervolarakis N, Blake K, Ma D, Davis RT, James N, et al.
Profiling human breast epithelial cells using single cell RNA sequencing
identifies cell diversity. Nat Commun. 2018;9:2028.
41. Aboussekhra A. Role of cancer-associated fibroblasts in breast cancer
development and prognosis. Int J Dev Biol. 2011;55:841–9.
Research. on November 24, 2020. © 2019 American Association for Cancermcr.aacrjournals.org Downloaded from
Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited. Author Manuscript Published OnlineFirst on November 8, 2019; DOI: 10.1158/1541-7786.MCR-19-0453
30
42. Finak G, Bertos N, Pepin F, Sadekova S, Souleimanova M, Zhao H, et al.
Stromal gene expression predicts clinical outcome in breast cancer. Nat Med.
2008;14:518–27.
43. Rangel R, Lee S-C, Hon-Kim Ban K, Guzman-Rojas L, Mann MB, Newberg
JY, et al. Transposon mutagenesis identifies genes that cooperate with mutant
Pten in breast cancer progression. Proc Natl Acad Sci U S A. 2016;113:E7749–
58.
44. Brady N, Chuntova P, Bade LK, Schwertfeger KL. The FGF/FGFR axis as a
therapeutic target in breast cancer. Expert Rev Endocrinol Metab. 2013;8:391–
402.
45. van Roosmalen W, Le Dévédec SE, Golani O, Smid M, Pulyakhina I,
Timmermans AM, et al. Tumor cell migration screen identifies SRPK1 as
breast cancer metastasis determinant. J Clin Invest. American Society for
Clinical Investigation; 2015;125:1648–64.
46. Tsai H-C, Huang C-Y, Su H-L, Tang C-H. CCN2 Enhances Resistance to
Cisplatin-Mediating Cell Apoptosis in Human Osteosarcoma. Trackman PC,
editor. PLoS One. Public Library of Science; 2014;9:e90159.
47. Arumugam T, Ramachandran V, Fournier KF, Wang H, Marquis L,
Abbruzzese JL, et al. Cancer Research. Cancer Res. American Association for
Cancer Research; 2009;63:2649–57.
48. Robichaud N, del Rincon S V, Huor B, Alain T, Petruccelli LA, Hearnden J, et
al. Phosphorylation of eIF4E promotes EMT and metastasis via translational
control of SNAIL and MMP-3. Oncogene. PMC Canada manuscript
submission; 2015;34:2032–42.
Research. on November 24, 2020. © 2019 American Association for Cancermcr.aacrjournals.org Downloaded from
Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited. Author Manuscript Published OnlineFirst on November 8, 2019; DOI: 10.1158/1541-7786.MCR-19-0453
31
49. Fulgencio JP, Kohl C, Girard J, Pégorier JP. Troglitazone inhibits fatty acid
oxidation and esterification, and gluconeogenesis in isolated hepatocytes from
starved rats. Diabetes. 1996;45:1556–62.
50. Wijdeven RH, Pang B, van der Zanden SY, Qiao X, Blomen V, Hoogstraat M,
et al. Genome-Wide Identification and Characterization of Novel Factors
Conferring Resistance to Topoisomerase II Poisons in Cancer. Cancer Res.
American Association for Cancer Research; 2015;75:4176–87.
51. Beurel E, Michalek SM, Jope RS. Innate and adaptive immune responses
regulated by glycogen synthase kinase-3 (GSK3). Trends Immunol.
2010;31:24–31.
52. De la Cruz-Hernandez E, Medina-Franco JL, Trujillo J, Chavez-Blanco A,
Dominguez-Gomez G, Perez-Cardenas E, et al. Ribavirin as a tri-targeted
antitumor repositioned drug. Oncol Rep. 2015;33:2384–92.
53. Blanquicett C, Roman J, Hart CM. Thiazolidinediones as anti-cancer agents.
Cancer Ther. 2008;6:25–34.
54. Ahn R, Sabourin V, Bolt AM, Hébert S, Totten S, De Jay N, et al. The Shc1
adaptor simultaneously balances Stat1 and Stat3 activity to promote breast
cancer immune suppression. Nat Commun. 2017;8:14638.
55. Casey SC, Tong L, Li Y, Do R, Walz S, Fitzgerald KN, et al. MYC regulates
the antitumor immune response through CD47 and PD-L1. Science (80- ).
2016;352:227–31.
56. Kim KH, Roberts CWM. Targeting EZH2 in cancer. Nat Med. 2016;22:128–
34.
57. Lee Y, Jung WH, Koo JS. Adipocytes can induce epithelial-mesenchymal
transition in breast cancer cells. Breast Cancer Res Treat. 2015;153:323–35.
Research. on November 24, 2020. © 2019 American Association for Cancermcr.aacrjournals.org Downloaded from
Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited. Author Manuscript Published OnlineFirst on November 8, 2019; DOI: 10.1158/1541-7786.MCR-19-0453
32
58. Cuadrado A, Garcia-Fernandez LF, Gonzalez L, Suarez Y, Losada A, Alcaide
V, et al. Aplidin induces apoptosis in human cancer cells via glutathione
depletion and sustained activation of the epidermal growth factor receptor, Src,
JNK, and p38 MAPK. J Biol Chem. 2003;278:241–50.
59. Pettersson F, Del Rincon S V, Emond A, Huor B, Ngan E, Ng J, et al. Genetic
and pharmacologic inhibition of eIF4E reduces breast cancer cell migration,
invasion, and metastasis. Cancer Res. 2015;75:1102–12.
60. D’Incalci M, Zambelli A. Trabectedin for the treatment of breast cancer. Expert
Opin Investig Drugs. 2016;25:105–15.
Research. on November 24, 2020. © 2019 American Association for Cancermcr.aacrjournals.org Downloaded from
Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited. Author Manuscript Published OnlineFirst on November 8, 2019; DOI: 10.1158/1541-7786.MCR-19-0453
33
Table legends
Table 1. Summary of gene expression profiles from the aggregated data
Matched clinical information is based on Gene Expression Omnibus (GEO). Sample
counts after filtering out ER, PR, and HER2 positive samples are listed below under
primary and metastasis. A part of the samples include long-distance metastasis
described in tissue type. In total, 601 primary and 356 metastasis samples were
chosen for functional analysis.
Table 2. Molecular subtype and clinical summary of ROADMAP patients
Research. on November 24, 2020. © 2019 American Association for Cancermcr.aacrjournals.org Downloaded from
Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited. Author Manuscript Published OnlineFirst on November 8, 2019; DOI: 10.1158/1541-7786.MCR-19-0453
34
Figure legends
Figure 1. Overview of TNBC molecular subtypes
a Heatmap of hallmark pathway activity derived from the expression profiles clearly
shows the difference among subtypes. Rows indicate the mean selected pathways, and
columns represent the total TNBC samples. b Enrichment status of gene signature,
mammary gland cell abundance, and embryonic stem cell and proliferation signatures
of different CMSs. MMP, matrix metalloproteinase; L. progenitor, luminal progenitor;
L. epithelial, luminal epithelial; blood/mitotic, blood cell mitotic cell cycle process. c
Overall survival and disease-free survival plots using our aggregated data and Curtis
2012 (38) classified into four CMSs. All p-values were calculated by Cox regression
analysis.
Figure 2. Functional investigation and gene interactions of differentially
expressed genes between primary and metastatic TNBC
a Significantly enriched pathways derived from the DEG set between primary and
metastatic TNBC. b CI plot for metastasis prediction markers. Cox regression
analysis was performed between binary groups according to gene expression. c Gene
interaction network of DEGs after GSEA. Red nodes indicate activated genes in
metastatic samples, and blue nodes indicate activated genes in primary samples. Node
sizes are −log(p-values), and p-values were calculated by t-test with primary and
metastatic samples.
Figure 3. Overall status of functional interactions between drug response and
key pathway within each CMS
Research. on November 24, 2020. © 2019 American Association for Cancermcr.aacrjournals.org Downloaded from
Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited. Author Manuscript Published OnlineFirst on November 8, 2019; DOI: 10.1158/1541-7786.MCR-19-0453
35
a Boxplots of DR scores estimated from aggregated data, MSL, LAR, and SL,
respectively. b Interaction networks with drugs and key pathway to highly correlate (r
> 0.65) for each CMS. The subtype of each network is equally ordered within the
boxplots. Square and circle nodes represent key pathways and drugs, respectively.
Node colors of pathways indicate pathway activity scores. Red drug node indicates
resistance signature and blue, the response. The width of edges represents gene degree
overlap in drug and pathway gene sets. Numbers on edge represent the greatest gene
degree for each subtype. c DR score heatmap estimated from the gene expression
profiles of ROADMAP patients. d Progression-free survival plot of ROADMAP
patients that underwent cisplatin therapy. e Two waterfall plots enumerating cisplatin
resistance DR score estimated from the ROADMAP expression profiles colored based
on CMS (left panel) and patients’ cisplatin response profiles (right panel). f Cisplatin
IC50 boxplot of 64 biobank TNBC cells for non-responders (NR) and responders (R)
(left). Right boxplot is cisplatin resistance DR scores of NR and R from the biobank.
Research. on November 24, 2020. © 2019 American Association for Cancermcr.aacrjournals.org Downloaded from
Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited. Author Manuscript Published OnlineFirst on November 8, 2019; DOI: 10.1158/1541-7786.MCR-19-0453
Research. on November 24, 2020. © 2019 American Association for Cancermcr.aacrjournals.org Downloaded from
Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited. Author Manuscript Published OnlineFirst on November 8, 2019; DOI: 10.1158/1541-7786.MCR-19-0453
Research. on November 24, 2020. © 2019 American Association for Cancermcr.aacrjournals.org Downloaded from
Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited. Author Manuscript Published OnlineFirst on November 8, 2019; DOI: 10.1158/1541-7786.MCR-19-0453
Research. on November 24, 2020. © 2019 American Association for Cancermcr.aacrjournals.org Downloaded from
Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited. Author Manuscript Published OnlineFirst on November 8, 2019; DOI: 10.1158/1541-7786.MCR-19-0453
Table 1. Summary of gene expression profiles from the aggregated data
Matched clinical information is based on Gene Expression Omnibus (GEO). Sample counts
after filtering out ER, PR, and HER2 positive samples are listed below under primary and
metastasis. A part of the samples include long-distance metastasis described in tissue type. In
total, 601 primary and 356 metastasis samples were chosen for functional analysis.
ID Total Primary Metastasis Tissue type Platform
GSE103091 107 53 21 74 Breast GPL570
GSE11078 23 14 0 14 Breast GPL570
GSE2034 286 223 0 223 Breast GPL96
GSE46141 91 0 52 4 Breast, 8 Liver, 24 LN,
14 Skin, 1 Ascites, 1 Bone GPL10379
GSE12276 204 16 129 145 Breast GPL570
GSE14017 29 0 16 6 Bone, 8 Brain, 2 Lung GPL570
GSE14018 36 0 27 5 Bone, 5 Brain, 3 Liver,
14 Lung GPL96
GSE46563 94 14 0 14 Breast GPL6884
GSE46928 52 20 6 26 Breast GPL96
GSE5327 58 29 9 38 Breast GPL96
GSE54323 29 0 21 12 Bone, 6 Liver, 3 LN GPL570
GSE56493 120 0 74 11 Breast, 13 Liver, 29 LN,
20 Skin, 1 UNK GPL10379
GSE7390 198 141 0 141 Breast GPL96
GSE76124 198 91 1 92 Breast GPL570
Total 1,525 601 356
Research. on November 24, 2020. © 2019 American Association for Cancermcr.aacrjournals.org Downloaded from
Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited. Author Manuscript Published OnlineFirst on November 8, 2019; DOI: 10.1158/1541-7786.MCR-19-0453
Table 2. Molecular subtype and clinical summary of ROADMAP patients
Characteristics MSL
n (%) LAR
n (%) IM
n (%) SL
n (%) P
Number of samples 4 12 1 21
Age < 50 y 3 (75) 5 (41.7) — 10 (47.6) 0.67
≥ 50 y 1 (25) 7 (58.3) 1 (100) 11 (52.4)
Sample type Fresh frozen 2 (50) 2 (16.7) — 6 (28.6) 0.58
FFPE 2 (50) 10 (83.3) 1 (100) 15 (71.4)
Metastases No metastases — 4 (33.3) — 9 (42.9) 0.42
Metastases 4 (100) 8 (66.7) 1 (100) 12 (57.1)
Number of
chemo before
biopsy
0 - 3 (25) — 5 (23.8) 0.87
1 3 (75) 4 (33.3) 1 (100) 9 (42.9)
≥ 2 1 (25) 5 (41.7) — 7 (33.3)
Cisplatin
response
Response 1 (25) 7 (58.4) 1 (100) 7 (33.3) 0.04
Non-response 3 (75) 1 (8.3) — 11 (52.4)
Unknown — 4 (33.3) — 3 (14.3)
Survival status Alive — 7 1 7 0.15
Dead 3 5 — 13
Unknown 1 — — 1
Research. on November 24, 2020. © 2019 American Association for Cancermcr.aacrjournals.org Downloaded from
Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited. Author Manuscript Published OnlineFirst on November 8, 2019; DOI: 10.1158/1541-7786.MCR-19-0453
Published OnlineFirst November 8, 2019.Mol Cancer Res Jihyun Kim, Doyeong Yu, Youngmee Kwon, et al. responsenominate molecular subtypes that predict chemotherapy Genomic characteristics of triple-negative breast cancer
Updated version
10.1158/1541-7786.MCR-19-0453doi:
Access the most recent version of this article at:
Material
Supplementary
http://mcr.aacrjournals.org/content/suppl/2019/11/08/1541-7786.MCR-19-0453.DC1
Access the most recent supplemental material at:
Manuscript
Authorbeen edited. Author manuscripts have been peer reviewed and accepted for publication but have not yet
E-mail alerts related to this article or journal.Sign up to receive free email-alerts
Subscriptions
Reprints and
To order reprints of this article or to subscribe to the journal, contact the AACR Publications
Permissions
Rightslink site. Click on "Request Permissions" which will take you to the Copyright Clearance Center's (CCC)
.http://mcr.aacrjournals.org/content/early/2019/11/08/1541-7786.MCR-19-0453To request permission to re-use all or part of this article, use this link
Research. on November 24, 2020. © 2019 American Association for Cancermcr.aacrjournals.org Downloaded from
Author manuscripts have been peer reviewed and accepted for publication but have not yet been edited. Author Manuscript Published OnlineFirst on November 8, 2019; DOI: 10.1158/1541-7786.MCR-19-0453