of ovarian cancer transcriptomics data de... · Expression of TFs Scale Scale space classification...

1
Expression of TFs Scale Scale space classification of ovarian cancer transcriptomics data Jeroen de Ridder 1,* , Theo Knijnenburg 2,$ , Ilya Shmulevich 3 , Lodewyk Wessels 1,2 , Marcel Reinders 1 1 Delft Bioinformatics Lab, Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, Delft, The Netherlands. 2 Bioinformatics and Statistics, Division of Molecular Biology, The Netherlands Cancer Institute, Amsterdam, The Netherlands, 3 Institute for Systems Biology, Seattle, United States of America * [email protected], $ [email protected] The notion of scale plays an important role in how high-throughput biomolecular measurements are analyzed and interpreted. At one level of functional scale, for instance, all genes involved in the cell cycle may be considered as a functional unit, whereas a more fine grained scale would capture more specific cell- cycle related events. Capturing components that manifest themselves as functional units across different scales may reveal key biological mechanisms. We propose to improve classification of gene expression data by exploiting relevant structure in the data at multiple scales. The method is based on concepts of scale space analysis in machine vision, used for scale independent detection of features in images. This multiscale framework was applied to a large gene expression dataset of patients diagnosed with ovarian cancer, measured as part of the Cancer Genome Atlas Project (TCGA). Our goal was to build a classifier that accurately discriminates between patients with local recurrence and distant recurrence (metastasis) of the tumor after the primary tumor was surgically removed. Lowest scale uninformative Different structure at different scales Structure exists across scales Cross validation performance differs between scales and TFs Best cross validation performance for different scales Different yet relevant signature TFs are found using the scale-space Scale in expression data TF to TF distance Constructing scale space trees Scale space representation of TCGA data 25NN Leave-one-out error per scale and per TF Indentifying signature transcription factors Scale in images Original Small-scale Medium-scale Large-scale Can we use the distance between TFs to uncover patterns in their expression levels? Scan for motifs Compute distances Map to 2 dimensions Picture of a phone Picture of the same phone but with random pixel to pixel distance sox14 pitx2 foxa1 sox2 sox11 hoxa9 hmga2 irx5 meox1 pou2af1 pgr runx3 foxa2 egr3 foxc1 egr1 tcf7l1 egr2 fos meis2 pou2f3 ebf3 hoxb5 foxa3 hoxd3 dmrt2 irx2 pax6 tbx18 nr2f1 tfap2c zbtb16 fosb tfcp2l1 tfap2a bhlhe41 esr1 gli2 vdr nfe2 etv7 znf423 pou3f3 sox9 ebf1 gli3 hoxa2 dmrt3 pparg gli1 lef1 onecut2 myb alx1 runx1 hand2 cdx2 maf sox5 rax foxf2 onecut1 atf3 rora nr1h4 nr1i2 bach2 znf219 nr2e3 rarb tfap2b tp53 jun pou3f2 hes1 fli1 mafb nfia mef2c irf1 stat1 xbp1 junb pax8 foxf1 gata4 myod1 dlx2 maff zeb1 nr2f2 nfe2l3 ahr hoxd13 sox4 sox18 nfib gfi1 lmo2 nkx3−2 atf5 nfatc1 mecom smad9 pax3 pou4f1 stat4 klf4 bhlhe40 ikzf1 bcl6 mycn znf238 irf5 tp63 stat6 sox21 zbtb6 tgif2 otx1 irf8 irf6 prdm1 mtf1 jund elf5 gata2 usf1 spi1 irf4 hand1 tfeb gata3 smad4 cebpe sox12 rest elf1 nr3c1 srebf1 mitf nkx2−2 runx2 pou5f1 tcf4 sox6 e2f1 pbx1 tead1 gabpa foxj1 sox13 mterf hsf2 cebpg mxi1 arid5b meis1 tcf7 smad6 ets1 tlx2 nfya nkx6−1 six4 myf6 TCGA Ovarian gene expression data 164 TFs 251 patients Distances between TFs Grow tree using scale space theory Slice as scale 35, Class 1 is red, Class 2 is blue Transcription factors Zoom in 5 10 15 20 25 30 35 40 0.49 0.5 0.51 0.52 0.53 25NN Performance Scale 0.48 0.49 0.5 0.51 0.52 0.53 −15 −10 −5 0 5 10 15 −8 −6 −4 −2 0 2 4 6 8 10 SOX14 PITX2 FOXA1 SOX2 SOX11 HOXA9 HMGA2 IRX5 MEOX1 POU2AF1 PGR RUNX3 FOXA2 EGR3 FOXC1 EGR1 TCF7L1 EGR2 FOS MEIS2 POU2F3 EBF3 HOXB5 FOXA3 HOXD3 DMRT2 IRX2 PAX6 TBX18 NR2F1 TFAP2C ZBTB16 FOSB TFCP2L1 TFAP2A BHLHE41 ESR1 GLI2 VDR NFE2 ETV7 ZNF423 POU3F3 SOX9 EBF1 GLI3 HOXA2 DMRT3 PPARG GLI1 LEF1 ONECUT2 MYB ALX1 RUNX1 HAND2 CDX2 MAF SOX5 RAX FOXF2 ONECUT1 ATF3 RORA NR1H4 NR1I2 BACH2 ZNF219 NR2E3 RARB TFAP2B TP53 JUN POU3F2 HES1 FLI1 MAFB NFIA MEF2C IRF1 STAT1 XBP1 JUNB PAX8 FOXF1 GATA4 MYOD1 DLX2 MAFF ZEB1 NR2F2 NFE2L3 AHR HOXD13 SOX4 SOX18 NFIB GFI1 LMO2 NKX3−2 ATF5 NFATC1 MECOM SMAD9 PAX3 POU4F1 STAT4 KLF4 BHLHE40 IKZF1 BCL6 MYCN ZNF238 IRF5 TP63 STAT6 SOX21 ZBTB6 TGIF2 OTX1 IRF8 IRF6 PRDM1 MTF1 JUND ELF5 GATA2 USF1 SPI1 IRF4 HAND1 TFEB GATA3 SMAD4 CEBPE SOX12 REST ELF1 NR3C1 SREBF1 MITF NKX2−2 RUNX2 POU5F1 TCF4 SOX6 E2F1 PBX1 TEAD1 GABPA FOXJ1 SOX13 MTERF HSF2 CEBPG MXI1 ARID5B MEIS1 TCF7 SMAD6 ETS1 TLX2 NFYA NKX6−1 SIX4 MYF6 SMAD4 TFP53 JUND MAGEA11 1 0 0 GPR64 1 1 0 C6orf15 0 1 1 TMPRSS2 1 0 0 NEB 0 0 1 TF SMAD4 gene MAGEA11 AATCGACGCAGTCAGCCAGCTCTGACTGTGTGCTATCATGCACTCATACTGACCATCATACTACCC SMAD4 MAGEA11 75th percentile 25NN Performance −3.5 −3 −2.5 −2 −1.5 −1 −0.5 0 0.37 0.38 0.39 0.4 0.41 0.42 0.43 0.44 0.45 0.46 TFAP2B TFAP2A NR1H4 TFAP2C MTERF JUND ZNF219 GABPA NR3C1 IRF8 FOXF1 SOX11 SOX4 TP53 SMAD4 XBP1 Minimum CV error across scales Log10 pvalue of tscore 251 trees, one for each sample scale TF dimension 1 TF dimension 2 Loco regional recurrence Metastatic recurrence

Transcript of of ovarian cancer transcriptomics data de... · Expression of TFs Scale Scale space classification...

Page 1: of ovarian cancer transcriptomics data de... · Expression of TFs Scale Scale space classification of ovarian cancer transcriptomics data Jeroen de Ridder1,*, Theo Knijnenburg2,$,

Expression of TFs

Scale

Scale space classificationof ovarian cancer transcriptomics dataJeroen de Ridder1,*, Theo Knijnenburg2,$, Ilya Shmulevich3,

Lodewyk Wessels1,2, Marcel Reinders1

1Delft Bioinformatics Lab, Faculty of Electrical Engineering, Mathematics and Computer Science,

Delft University of Technology, Delft, The Netherlands. 2Bioinformatics and Statistics, Division of Molecular

Biology, The Netherlands Cancer Institute, Amsterdam, The Netherlands, 3Institute for Systems Biology, Seattle, United States of America

*[email protected], [email protected]

The notion of scale plays an important role in how high-throughput biomolecular measurements are analyzed and interpreted. At one level of functional scale, for instance, all genes involved in the cell cycle may be considered as a functional unit, whereas a more fine grained scale would capture more specific cell-

cycle related events. Capturing components that manifest themselves as functional units across different scales may reveal key biological mechanisms.

We propose to improve classification of gene expression data by exploiting relevant structure in the data at multiple scales. The method is based on concepts of scale space analysis in machine vision, used for scale independent detection of features in images.

This multiscale framework was applied to a large gene expression dataset of patients diagnosed with ovarian cancer, measured as part of the Cancer Genome Atlas Project (TCGA). Our goal was to build a classifier that accurately discriminates between patients with local recurrence and distant recurrence (metastasis) of the

tumor after the primary tumor was surgically removed.

Lowest scaleuninformative

Different structure at different scales Structure exists across scales

Cross validation performancediffers between scales and TFs

Best cross validation performancefor different scales

Different yet relevant signature TFs are found

using the scale-space

Scale in expression data

TF to TF distance

Constructing scale space trees

Scale space representation of TCGA data

25NN Leave-one-out error per scale and per TF

Indentifying signature transcription factors

Scale in imagesOriginal Small-scale Medium-scale Large-scale

Can we use the distance between TFs to uncover patterns in their expression levels?

Scan for motifs Compute distances Map to 2 dimensions

Picture of a phone

Picture of the samephone but with randompixel to pixel distance

sox14pitx2foxa1sox2sox11hoxa9hmga2irx5meox1pou2af1pgrrunx3foxa2egr3foxc1egr1tcf7l1egr2fosmeis2pou2f3ebf3hoxb5foxa3hoxd3dmrt2irx2pax6tbx18nr2f1tfap2czbtb16fosbtfcp2l1tfap2abhlhe41esr1gli2vdrnfe2etv7znf423pou3f3sox9ebf1gli3hoxa2dmrt3pparggli1lef1onecut2mybalx1runx1hand2cdx2mafsox5raxfoxf2onecut1atf3roranr1h4nr1i2bach2znf219nr2e3rarbtfap2btp53junpou3f2hes1fli1mafbnfiamef2cirf1stat1xbp1junbpax8foxf1gata4myod1dlx2maffzeb1nr2f2nfe2l3ahrhoxd13sox4sox18nfibgfi1lmo2nkx3−2atf5nfatc1mecomsmad9pax3pou4f1stat4klf4bhlhe40ikzf1bcl6mycnznf238irf5tp63stat6sox21zbtb6tgif2otx1irf8irf6prdm1mtf1jundelf5gata2usf1spi1irf4hand1tfebgata3smad4cebpesox12restelf1nr3c1srebf1mitfnkx2−2runx2pou5f1tcf4sox6e2f1pbx1tead1gabpafoxj1sox13mterfhsf2cebpgmxi1arid5bmeis1tcf7smad6ets1tlx2nfyankx6−1six4myf6TCGA Ovarian gene expression data

164

TFs

251 patients

Distances between TFs Grow tree using scale space theory

Slice as scale 35, Class 1 is red, Class 2 is blue

Tran

scrip

tion

fact

ors

Zoom in

5 10 15 20 25 30 35 400.49

0.5

0.51

0.52

0.53

25N

N P

erfo

rman

ce

Scale

0.48

0.49

0.5

0.51

0.52

0.53

−15 −10 −5 0 5 10 15−8

−6

−4

−2

0

2

4

6

8

10

SOX14PITX2FOXA1

SOX2

SOX11

HOXA9

HMGA2

IRX5MEOX1

POU2AF1

PGR

RUNX3

FOXA2

EGR3

FOXC1

EGR1

TCF7L1

EGR2

FOS

MEIS2

POU2F3

EBF3

HOXB5

FOXA3

HOXD3

DMRT2IRX2PAX6

TBX18

NR2F1

TFAP2C

ZBTB16

FOSB

TFCP2L1

TFAP2A

BHLHE41

ESR1

GLI2

VDR

NFE2 ETV7

ZNF423

POU3F3

SOX9EBF1GLI3

HOXA2

DMRT3PPARG

GLI1

LEF1

ONECUT2

MYB

ALX1

RUNX1

HAND2

CDX2

MAF

SOX5

RAX

FOXF2

ONECUT1

ATF3

RORA

NR1H4

NR1I2

BACH2

ZNF219

NR2E3

RARB

TFAP2B

TP53

JUN

POU3F2

HES1

FLI1

MAFBNFIA

MEF2C

IRF1

STAT1

XBP1

JUNB

PAX8

FOXF1

GATA4

MYOD1

DLX2

MAFF

ZEB1

NR2F2NFE2L3

AHR

HOXD13

SOX4SOX18NFIB

GFI1

LMO2

NKX3−2ATF5

NFATC1

MECOM

SMAD9

PAX3

POU4F1STAT4

KLF4

BHLHE40IKZF1

BCL6

MYCN

ZNF238

IRF5

TP63

STAT6

SOX21

ZBTB6TGIF2

OTX1

IRF8

IRF6PRDM1

MTF1

JUND

ELF5

GATA2

USF1

SPI1

IRF4

HAND1

TFEBGATA3

SMAD4

CEBPE

SOX12

REST

ELF1

NR3C1

SREBF1

MITF

NKX2−2

RUNX2

POU5F1

TCF4

SOX6

E2F1PBX1

TEAD1GABPA

FOXJ1SOX13

MTERF

HSF2

CEBPG

MXI1

ARID5B

MEIS1TCF7

SMAD6ETS1

TLX2NFYA

NKX6−1

SIX4

MYF6SMAD4 TFP53 JUND …

MAGEA11 1 0 0 …

GPR64 1 1 0 …

C6orf15 0 1 1 …

TMPRSS2 1 0 0 …

NEB 0 0 1 …

… … … … …

TF SMAD4

gene MAGEA11AATCGACGCAGTCAGCCAGCTCTGACTGTGTGCTATCATGCACTCATACTGACCATCATACTACCC

SMAD4 MAGEA11

75th

per

cent

ile

25N

N P

erfo

rman

ce

−3.5 −3 −2.5 −2 −1.5 −1 −0.5 00.37

0.38

0.39

0.4

0.41

0.42

0.43

0.44

0.45

0.46

TFAP2BTFAP2A

NR1H4

TFAP2C

MTERFJUND

ZNF219GABPA

NR3C1IRF8

FOXF1SOX11SOX4

TP53

SMAD4XBP1

Min

imum

CV

erro

r ac

ross

sca

les

Log10 pvalue of t−score

251 trees, one for each sample

scal

e

TF dimension 1

TF dim

ensio

n 2

Loco regional recurrence Metastatic recurrence