of ovarian cancer transcriptomics data de... · Expression of TFs Scale Scale space classification...
Transcript of of ovarian cancer transcriptomics data de... · Expression of TFs Scale Scale space classification...
Expression of TFs
Scale
Scale space classificationof ovarian cancer transcriptomics dataJeroen de Ridder1,*, Theo Knijnenburg2,$, Ilya Shmulevich3,
Lodewyk Wessels1,2, Marcel Reinders1
1Delft Bioinformatics Lab, Faculty of Electrical Engineering, Mathematics and Computer Science,
Delft University of Technology, Delft, The Netherlands. 2Bioinformatics and Statistics, Division of Molecular
Biology, The Netherlands Cancer Institute, Amsterdam, The Netherlands, 3Institute for Systems Biology, Seattle, United States of America
*[email protected], [email protected]
The notion of scale plays an important role in how high-throughput biomolecular measurements are analyzed and interpreted. At one level of functional scale, for instance, all genes involved in the cell cycle may be considered as a functional unit, whereas a more fine grained scale would capture more specific cell-
cycle related events. Capturing components that manifest themselves as functional units across different scales may reveal key biological mechanisms.
We propose to improve classification of gene expression data by exploiting relevant structure in the data at multiple scales. The method is based on concepts of scale space analysis in machine vision, used for scale independent detection of features in images.
This multiscale framework was applied to a large gene expression dataset of patients diagnosed with ovarian cancer, measured as part of the Cancer Genome Atlas Project (TCGA). Our goal was to build a classifier that accurately discriminates between patients with local recurrence and distant recurrence (metastasis) of the
tumor after the primary tumor was surgically removed.
Lowest scaleuninformative
Different structure at different scales Structure exists across scales
Cross validation performancediffers between scales and TFs
Best cross validation performancefor different scales
Different yet relevant signature TFs are found
using the scale-space
Scale in expression data
TF to TF distance
Constructing scale space trees
Scale space representation of TCGA data
25NN Leave-one-out error per scale and per TF
Indentifying signature transcription factors
Scale in imagesOriginal Small-scale Medium-scale Large-scale
Can we use the distance between TFs to uncover patterns in their expression levels?
Scan for motifs Compute distances Map to 2 dimensions
Picture of a phone
Picture of the samephone but with randompixel to pixel distance
sox14pitx2foxa1sox2sox11hoxa9hmga2irx5meox1pou2af1pgrrunx3foxa2egr3foxc1egr1tcf7l1egr2fosmeis2pou2f3ebf3hoxb5foxa3hoxd3dmrt2irx2pax6tbx18nr2f1tfap2czbtb16fosbtfcp2l1tfap2abhlhe41esr1gli2vdrnfe2etv7znf423pou3f3sox9ebf1gli3hoxa2dmrt3pparggli1lef1onecut2mybalx1runx1hand2cdx2mafsox5raxfoxf2onecut1atf3roranr1h4nr1i2bach2znf219nr2e3rarbtfap2btp53junpou3f2hes1fli1mafbnfiamef2cirf1stat1xbp1junbpax8foxf1gata4myod1dlx2maffzeb1nr2f2nfe2l3ahrhoxd13sox4sox18nfibgfi1lmo2nkx3−2atf5nfatc1mecomsmad9pax3pou4f1stat4klf4bhlhe40ikzf1bcl6mycnznf238irf5tp63stat6sox21zbtb6tgif2otx1irf8irf6prdm1mtf1jundelf5gata2usf1spi1irf4hand1tfebgata3smad4cebpesox12restelf1nr3c1srebf1mitfnkx2−2runx2pou5f1tcf4sox6e2f1pbx1tead1gabpafoxj1sox13mterfhsf2cebpgmxi1arid5bmeis1tcf7smad6ets1tlx2nfyankx6−1six4myf6TCGA Ovarian gene expression data
164
TFs
251 patients
Distances between TFs Grow tree using scale space theory
Slice as scale 35, Class 1 is red, Class 2 is blue
Tran
scrip
tion
fact
ors
Zoom in
5 10 15 20 25 30 35 400.49
0.5
0.51
0.52
0.53
25N
N P
erfo
rman
ce
Scale
0.48
0.49
0.5
0.51
0.52
0.53
−15 −10 −5 0 5 10 15−8
−6
−4
−2
0
2
4
6
8
10
SOX14PITX2FOXA1
SOX2
SOX11
HOXA9
HMGA2
IRX5MEOX1
POU2AF1
PGR
RUNX3
FOXA2
EGR3
FOXC1
EGR1
TCF7L1
EGR2
FOS
MEIS2
POU2F3
EBF3
HOXB5
FOXA3
HOXD3
DMRT2IRX2PAX6
TBX18
NR2F1
TFAP2C
ZBTB16
FOSB
TFCP2L1
TFAP2A
BHLHE41
ESR1
GLI2
VDR
NFE2 ETV7
ZNF423
POU3F3
SOX9EBF1GLI3
HOXA2
DMRT3PPARG
GLI1
LEF1
ONECUT2
MYB
ALX1
RUNX1
HAND2
CDX2
MAF
SOX5
RAX
FOXF2
ONECUT1
ATF3
RORA
NR1H4
NR1I2
BACH2
ZNF219
NR2E3
RARB
TFAP2B
TP53
JUN
POU3F2
HES1
FLI1
MAFBNFIA
MEF2C
IRF1
STAT1
XBP1
JUNB
PAX8
FOXF1
GATA4
MYOD1
DLX2
MAFF
ZEB1
NR2F2NFE2L3
AHR
HOXD13
SOX4SOX18NFIB
GFI1
LMO2
NKX3−2ATF5
NFATC1
MECOM
SMAD9
PAX3
POU4F1STAT4
KLF4
BHLHE40IKZF1
BCL6
MYCN
ZNF238
IRF5
TP63
STAT6
SOX21
ZBTB6TGIF2
OTX1
IRF8
IRF6PRDM1
MTF1
JUND
ELF5
GATA2
USF1
SPI1
IRF4
HAND1
TFEBGATA3
SMAD4
CEBPE
SOX12
REST
ELF1
NR3C1
SREBF1
MITF
NKX2−2
RUNX2
POU5F1
TCF4
SOX6
E2F1PBX1
TEAD1GABPA
FOXJ1SOX13
MTERF
HSF2
CEBPG
MXI1
ARID5B
MEIS1TCF7
SMAD6ETS1
TLX2NFYA
NKX6−1
SIX4
MYF6SMAD4 TFP53 JUND …
MAGEA11 1 0 0 …
GPR64 1 1 0 …
C6orf15 0 1 1 …
TMPRSS2 1 0 0 …
NEB 0 0 1 …
… … … … …
TF SMAD4
gene MAGEA11AATCGACGCAGTCAGCCAGCTCTGACTGTGTGCTATCATGCACTCATACTGACCATCATACTACCC
SMAD4 MAGEA11
75th
per
cent
ile
25N
N P
erfo
rman
ce
−3.5 −3 −2.5 −2 −1.5 −1 −0.5 00.37
0.38
0.39
0.4
0.41
0.42
0.43
0.44
0.45
0.46
TFAP2BTFAP2A
NR1H4
TFAP2C
MTERFJUND
ZNF219GABPA
NR3C1IRF8
FOXF1SOX11SOX4
TP53
SMAD4XBP1
Min
imum
CV
erro
r ac
ross
sca
les
Log10 pvalue of t−score
251 trees, one for each sample
scal
e
TF dimension 1
TF dim
ensio
n 2
Loco regional recurrence Metastatic recurrence