Supplementary Materials for - Rune Linding · Supplementary Materials for Linear Motif Atlas for...

23
www.sciencesignaling.org/cgi/content/full/1/35/ra2/DC1 Supplementary Materials for Linear Motif Atlas for Phosphorylation-Dependent Signaling Martin Lee Miller, Lars Juhl Jensen, Francesca Diella, Claus Jørgensen, Michele Tinti, Lei Li, Marilyn Hsiung, Sirlester A. Parker, Jennifer Bordeaux, Thomas Sicheritz-Ponten, Marina Olhovsky, Adrian Pasculescu, Jes Alexander, Stefan Knapp, Nikolaj Blom, Peer Bork, Shawn Li, Gianni Cesareni, Tony Pawson, Benjamin E. Turk, Michael B. Yaffe,* Søren Brunak,* Rune Linding* *To whom correspondence should be addressed. E-mail: [email protected] (S.B.), [email protected] (M.B.Y.), and [email protected] (R.L.) Published 2 September 2008, Sci. Signal. 1, ra2 (2008) DOI: 10.1126/scisignal.1159433 This PDF file includes: Figure S1: Overview of the NetPhorest pipeline. Figure S2: Phosphorylation data mapped onto domain trees. Figure S3: Coverage of classifiers for targets of kinases, SH2 domains, and PTB domains. Figure S4: Score calibration. Figure S5: Correlation between domain similarity and substrate specificity. Figure S6: Kinase matrices from Positional Scanning Peptide Libraries (PSPL). Figure S7: Sequence logos for kinases and pS/pT-binding domains. Figure S8: Receiver output characteristic (ROC) curves for the NetPhorest classifiers. Table S1: The selected set of NetPhorest classifiers. Table S2: Benchmark of the NetPhorest method.

Transcript of Supplementary Materials for - Rune Linding · Supplementary Materials for Linear Motif Atlas for...

Page 1: Supplementary Materials for - Rune Linding ·  Supplementary Materials for Linear Motif Atlas for Phosphorylation-Dependent Signaling

www.sciencesignaling.org/cgi/content/full/1/35/ra2/DC1

Supplementary Materials for

Linear Motif Atlas for Phosphorylation-Dependent Signaling

Martin Lee Miller, Lars Juhl Jensen, Francesca Diella, Claus Jørgensen, Michele Tinti,

Lei Li, Marilyn Hsiung, Sirlester A. Parker, Jennifer Bordeaux, Thomas Sicheritz-Ponten,

Marina Olhovsky, Adrian Pasculescu, Jes Alexander, Stefan Knapp, Nikolaj Blom, Peer

Bork, Shawn Li, Gianni Cesareni, Tony Pawson, Benjamin E. Turk, Michael B. Yaffe,*

Søren Brunak,* Rune Linding*

*To whom correspondence should be addressed. E-mail: [email protected] (S.B.), [email protected]

(M.B.Y.), and [email protected] (R.L.)

Published 2 September 2008, Sci. Signal. 1, ra2 (2008)

DOI: 10.1126/scisignal.1159433

This PDF file includes:

Figure S1: Overview of the NetPhorest pipeline.

Figure S2: Phosphorylation data mapped onto domain trees.

Figure S3: Coverage of classifiers for targets of kinases, SH2 domains, and PTB

domains.

Figure S4: Score calibration.

Figure S5: Correlation between domain similarity and substrate specificity.

Figure S6: Kinase matrices from Positional Scanning Peptide Libraries (PSPL).

Figure S7: Sequence logos for kinases and pS/pT-binding domains.

Figure S8: Receiver output characteristic (ROC) curves for the NetPhorest

classifiers.

Table S1: The selected set of NetPhorest classifiers.

Table S2: Benchmark of the NetPhorest method.

Page 2: Supplementary Materials for - Rune Linding ·  Supplementary Materials for Linear Motif Atlas for Phosphorylation-Dependent Signaling

Supplementary Information for:Linear motif atlas for phosphorylation-dependent signaling

Martin Lee Miller 1,2∗, Lars Juhl Jensen 2,3∗, Francesca Diella 3, Claus Jørgensen 4, Michele Tinti 5,Lei Li 6, Marilyn Hsiung 4, Sirlester A. Parker 7, Jennifer Bordeaux 7, Thomas Sicheritz-Ponten 1, Ma-rina Olhovsky 4, Adrian Pasculescu 4, Jes Alexander 8, Stefan Knapp 9, Nikolaj Blom 1, Peer Bork 2,10,Shawn Li 6, Gianni Cesareni 5, Tony Pawson 4, Benjamin E. Turk 7, Michael B. Yaffe 8†, Søren Brunak 1,2†

and Rune Linding 4,8,11†

1 Center for Biological Sequence Analysis, Technical University of Denmark, Lyngby, Denmark.2 The Novo Nordisk Foundation Centre for Protein Research, University of Copenhagen, Copenhagen, Denmark3 European Molecular Biology Laboratory, Heidelberg, Germany4 Samuel Lunenfeld Research Institute, Mount Sinai Hospital, Toronto, Canada5 University of Rome, Tor Vergata, Rome, Italy6 University of Western Ontario, London, Ontario, Canada7 Department of Pharmacology, Yale University School of Medicine, New Haven, USA8 Center for Cancer Research, Massachusetts Institute of Technology, Cambridge, USA9 Structural Genomics Consortium, University of Oxford, UK10 Max-Delbruck-Centre for Molecular Medicine, Berlin, Germany11 Cellular & Molecular Logic Team, The Institute of Cancer Research, London, UK∗ These authors contributed equally to this work† To whom correspondence should be addressed; E-mail: [email protected], [email protected] and [email protected]

Page 3: Supplementary Materials for - Rune Linding ·  Supplementary Materials for Linear Motif Atlas for Phosphorylation-Dependent Signaling

Linear motif atlas for phosphorylation-dependent signaling 2

Data organization

In vivo sites and in vitro peptide data are

mapped onto phylogenetic domain trees

Dataset compilation

Positive and negative examples are compiled

for each domain or group of similar domains

Redundancy reduction

Near identical sites and sites located within

similar proteins are removed

Data partitioning

The non-redundant datasets are partitioned

into separate training, test and validation sets

Training of ANNs

A cross-validated ensemble of ANNs is

trained on each of the partitioned datasets

Evaluation of PSSMs

The PSSMs from in vitro experiments are

benchmarked on the non-redundant sites

Score calibration

The benchmark results are used to make the

scores from different classifiers comparable

Selection of classifiers

The subset of the classifiers is selected that

gives the best overall predictive power

Figure S1: Overview of the NetPhorest pipeline. The pipeline consists of a combination of Python, Perl and C-code that takes as inputa set of manually organized data sets and phylogenetic domain trees. Most of the steps in the flow chart are shown in more detail inFigures 1 and 2.

Page 4: Supplementary Materials for - Rune Linding ·  Supplementary Materials for Linear Motif Atlas for Phosphorylation-Dependent Signaling

Linear motif atlas for phosphorylation-dependent signaling 3

B

A

PR

PK

CLIK

1L C

LIK1

SgK

071 KIS

TTK

NE

K10

NE

K6

NE

K7

NE

K2

NE

K11

NE

K4

NE

K3

NE

K5

NE

K1

NE

K8

NE

K9

MP

SK

1

GA

K

AA

K1

BIK

E

HR

I

GC

N2 1

EIF

2A

K2

PE

K MY

T1

Wee1

Wee1B

RN

AseL

IRE2

IRE1

CK2alpha

CK2a2

CDK10

PITS

LRE

CDK9

CHED

CR

K7

CD

K4

CD

K6

CD

K5

CD

K1

CD

K2

CD

K3

PC

TA

IRE

1

PC

TA

IRE

2

PC

TA

IRE

3

PFTA

IRE

1

PFTA

IRE

2

CD

K7

CD

K8

CD

K11

CC

RK

MA

PK

6

MA

PK

4

MA

PK

15

MA

PK

3

MA

PK

1

NLK

MA

PK

7

MA

PK

12

MA

PK

13

MA

PK

14

MA

PK

11

MA

PK

8

MA

PK

10

MA

PK

9

CD

KL5

CD

KL4

CD

KL1

CD

KL3

CD

KL2

MO

K

ICK

M

AK

GS

K3alp

ha

GS

K3beta

PR

P4

DY

RK

1A

D

YR

K1B

D

YR

K4

DY

RK

3

DY

RK

2

HIP

K4

HIP

K2

HIP

K1

HIP

K3

CLK

2

CLK

3

CLK

1

CLK

4

SR

PK

1

MS

SK

1

SR

PK

2

a

h

p

l a

K

K

M

a

C CaM

KK

beta

T

LK

1

TL

K2

a

h

p

l a

K

K

I IK

Kbeta

IK

Kepsilo

n

TB

K1

ULK3 ULK2

ULK1

Fused

ULK4

CHK2

PKD3 PKD1

PKD2

MSK2 2 MSK1 2 RSK4 2

RSK3 2 RSK2 2 RSK1 2

Mnk1 Mnk2

MAPKAPK5 MAPKAPK3 MAPKAPK2

PHKg2 PHKg1 CASK

CaMKIIalpha

CaMKIIdelta

CaMKIIbeta

CaMKIIgamma

DCAMKL3

DCAMKL2

DCAMKL1

PSKH2

PSKH1

VACAMKL

CaMKIV

CaMKIbeta

CaMKIgamma

CaMKIalpha

CaMKIdelta

STK33

DAPK3 DAPK1

DAPK2

DRAK2 DRAK1

TTN

smMLCK skMLCK SgK085

caMLCK

Trad Trio

Obscn 1

SPEG 1

SPEG 2

Obscn 2

HUNK MELK

MARK1 MARK2

MARK3 MARK4

QSK SIK QIK

NuaK1 NuaK2

AMPKa1 AMPKa2 BRSK2 BRSK1

LKB1 CHK1

NIM1 SNRK TSSK4 TSSK2 TSSK1 SSTK TSSK3

PASK Pim2 Pim3 Pim1

SgK495 Trb3 Trb1 Trb2

Auro

raA

A

uro

raC

A

uro

raB

P

LK

4

PLK

1

PLK

3

PLK

2

PD

K1

PK

Abeta

PK

Aalp

ha

PK

Agam

ma

PR

KY

PR

KX

P

KG

1cG

KI

PK

G2cG

KII

RSK1

RSK2

RSK3

RSK4

p70S

6Kb

p70S

6K

MSK2

1

RSK5 P

KBal

pha

PKBbe

ta

PKBga

mm

a SG

K2

SG

K3

SG

K1

PKN3

PKN1

PKN2

PKCiota

PKCze

ta

PKCde

lta

PKCth

eta

PKCet

a

PKCepsi

lon

PKCalpha

PKCbeta

PKCgam

ma

GR

K2

GR

K3

GR

K6

GR

K4

GR

K5

GR

K7

GR

K1

LA

TS

2

LA

TS

1

CR

IK

ND

R1

ND

R2

RO

CK

1

RO

CK

2

DM

PK

1

DM

PK

2

MR

CK

a

MR

CK

b

MA

ST

L

MA

ST

1

MA

ST

4

MA

ST

2

MA

ST

3

SgK

494

YA

NK

3

YA

NK

1

YA

NK

2

RS

KL1

RS

KL2

MAP4K1 MAP4K2 MAP4K5 KHS2

LOK SLK

MYO3A MYO3B NRK

MAP4K4 MAP4K6

TNIK

MST4 MST3

YSK1

MST1 MST2

TAO1 TAO2

TAO3

PAK1 PAK3

PAK2 PAK6

PAK4 PAK5

STLK3 OSR1

STLK6 STLK5

MAP2K5

MAP2K2

MAP2K1

MAP2K7

MAP2K4

MAP2K3

MAP2K6

MAP3K8

MAP3K14

GCN2 2

MAP3K4

MAP3K6

MAP3K7

MAP3K5

MAP3K1

MAP3K

8 2

MAP3K

2

MAP3K

3

TB

CK

C

DC

7

PIN

K1

SgK

269

SgK

223

Wnk3

Wnk1

Wn

k2

W

nk4

2

P

B

R

N

NR

BP

1

SC

YL2

SC

YL3

SC

YL1

PIK

3R

4

Slo

b

SgK

307

SgK

424

MO

S

SgK

496

PB

K

HS

ER

A

NP

a

AN

Pb

CY

GD

C

YG

F

ILK

BM

PR

1A

B

MP

R1B

A

LK

4

TG

FbR

1

ALK

7

ALK

2

ALK

1

TG

FbR

2

AC

TR

2B

A

CT

R2

MIS

R2

BM

PR

2

TE

SK

1

TE

SK

2

LIM

K2

LIM

K1

MAP3K

7 2

HH

498

ZAK

MAP3K

10

MLK

4 M

AP3K

11

MAP3K

9 DLK

M

AP3K

13

AR

AF

RAF1

BR

AF

KS

R1

KS

R2

LR

RK

1

LR

RK

2

RIP

K1

RIP

K3

RIP

K2

AN

KR

D3

SgK

288

MLK

L

IRA

K4

IRA

K2

IRA

K3

IRA

K1

Csk

CTK

Fer

Fes

FRK

BLK

Lck

HCK

Lyn

Fgr

Src

Yes Fyn

Brk

Srm

Abl2

Abl

ITK

Tec

TXK

BMX

BTK

FAK

PYK2

Syk

ZA

P70

EphA1

EphA2

EphA8

EphA5

EphA3 EphA4

EphA6

EphA7 EphB2

EphB1 EphB3

EphB4

Eph

B6

Eph

A10

Tie2

Tie1

PDGFRalpha

PDGFRbeta

FLT3

CSF1R

Kit

Ret FGFR4

FGFR2

FGFR3 FGFR1

FLT4

KDR

FLT1

Ros LTK ALK IRR InsR IGF1R DDR2 DDR1

Musk TRKA TRKC TRKB

ROR2 ROR1

RYK

Sky Axl Mer

Met Ron

TNK2 TNK1 Tyk2 JAK1 JAK2 JAK3 ErbB3 ErbB4 ErbB2 EGFR

CCK4 Lmr3 Lmr2 Lmr1 SuRTK106

Tyk2 2 JAK1 2 JAK3 2 JAK2 2

SB

K

SgK

069

SgK

110

SgK

493

Haspin

S

gK

196

SgK

396

BU

B1

BU

BR

1

CK

1gam

ma3

CK

1gam

ma1

CK

1gam

ma2

CK

1delta

C

K1epsilo

n

CK

1alp

ha2

CK

1alp

ha

TT

BK

2

TT

BK

1

VR

K3

VR

K1

VR

K2

RIOK2

RIOK3

RIOK1

TIF1beta

TIF1alpha

TIF1gamma

BrdT

Brd4

Brd2

Brd3

ADCK2

ADCK4

ADCK3

ADCK5

ADCK1

PDHK2

PDHK4

BCKD

K

PDHK1

PDHK3

DNAPK

SMG

1

TRRAP

mTO

R

ATM

ATR

AlphaK

3

Chak2

Chak1

eEF2K

AlphaK

2

AlphaK

1

Artificial neural network (ANN)

Position-specific scoring matrix (PSSM)

C

APPL2

APPL

SHC2

SHC3

SHC1

SHC4

FRS3

FRS2

DOK2

DOK3

DOK1

DOK6

DOK4

DOK5

DOK7

IRS1

IRS2

IRS4

PTB domains

SH2 domains

Kinase domains

50 sites

25 sites

50 sites

HSH

2D

SH

2D2A

2

SH2D

4A

SH2D

4B

ZAP70

2 Syk

2 Tyk

2 JAK1

JAK2

JAK3

GR

B10

GR

B14

GR

B7

ZA

P70 1

Syk

1

CTK

Csk

CB

LB

CB

L

CB

LC

ST

AT

4

ST

AT

1

ST

AT

3

ST

AT

2

ST

AT

5B

ST

AT

5A

ST

AT

6

SH

2B

2

SH

2B1

SH

2B

3

CH

N2

CHN1

SHC2

SHC3

SHC4

SHC1

Fer

Fes

SH2D2A

SH2D3C

BCAR3

RA

SA

1 2

R

AS

A1 1

P

LC

G1 1

P

LC

G2 1

SHB

SHF

SHD SHE

Tec

TXK

BM

X

BTK

ITK

LCP2

MIS

T

SH

2D6

BLN

K

INP

P5D

IN

PP

L1

SH

2D

1B

S

H2D

1A

P

LC

G1 2

PLC

G2 2

FR

K

SU

PT

6H

BR

DG

1

SH

3B

P2

CR

K C

RK

L

TN

S3

TN

S1

TE

NC

1

TN

S4

RIN

3

RIN

2

RIN

1

Brk

Srm

FRK 2

SH2D5

GRAP

GRB2

GRAP2

Abl2

Abl

SLAP2

SLAP

Src

Yes

Fyn

Fgr

HCK

Lyn

Lck

BLK

VAV1 V

AV3

VAV2

PTP

N11 1

PTP

N6 1

PTP

N6 2

PTP

N11 2

NC

K1

NC

K2

PIK

3R

3 2

PIK

3R

1 2

PIK

3R

2 2

PIK

3R

3 1

PIK

3R

2 1

PIK

3R

1 1

SOCS5 SOCS4

SOCS6 CISH SOCS1 SOCS3 SO

CS7 SOCS2

DAPP1

Figure S2: Phosphorylation data mapped onto domain trees. The number of phosphorylation sites annotated to be substrates ofindividual (orange bars) or groups (orange pies, centered on the group in question) of kinases (A), SH2 domains (B) and PTB domains(C) are shown on the respective phylogenetic domain trees. This data was used to develop artificial neural networks (ANNs). The bluebars show NetPhorest collection of position-specific scoring matrices (PSSMs) based on in vitro assays. NetPhorest also contains ANNsfor the WW domain of PIN1 and PSSMs for 14-3-3 and BRCT domains, all of which bind phosphorylated serines and threonines (notshown).

Page 5: Supplementary Materials for - Rune Linding ·  Supplementary Materials for Linear Motif Atlas for Phosphorylation-Dependent Signaling

Linear motif atlas for phosphorylation-dependent signaling 4

A

PR

PK

CLIK

1L C

LIK1

SgK

071 TTK

KIS

NE

K10

NE

K7

NE

K6

NE

K2

NE

K11

NE

K4

NE

K3

NE

K5

NE

K1

NE

K8

NE

K9

MP

SK

1

GA

K

BIK

E

AA

K1

HR

I

GC

N2 1

EIF

2A

K2

PE

K MY

T1

Wee1

Wee1B

RN

AseL

IRE2

IRE1

CK2alpha

CK2a2

CDK10

PITS

LRE

CDK9

CRK7

CH

ED

CD

K6

CD

K4

CD

K5

CD

K1

CD

K3

CD

K2

PC

TA

IRE

1

PC

TA

IRE

2

PC

TA

IRE

3

PFTA

IRE

1

PFTA

IRE

2

CD

K7

CD

K11

CD

K8

CC

RK

MA

PK

6

MA

PK

4

MA

PK

15

MA

PK

1

MA

PK

3

MA

PK

7

NLK

MA

PK

13

MA

PK

12

MA

PK

14

MA

PK

11

MA

PK

10

MA

PK

8

MA

PK

9

CD

KL5

CD

KL1

CD

KL4

CD

KL2

CD

KL3

MO

K

ICK

M

AK

GS

K3beta

GS

K3alp

ha

PR

P4

DY

RK

1B

D

YR

K1A

D

YR

K4

DY

RK

2

DY

RK

3

HIP

K4

HIP

K2

HIP

K1

HIP

K3

CLK

3

CLK

2

CLK

1

CLK

4

SR

PK

1

SR

PK

2

MS

SK

1

a

h

p

l a

K

K

M

a

C

Ca

MK

Kb

eta

T

LK

2

TL

K1

a

t e

b

K

K

I a

h

p

l a

K

K

I n

o

l i s

p

e

K

K

I T

BK

1

ULK3 ULK2

ULK1

Fused

ULK4

CHK2

PKD1 PKD3

PKD2

MSK1 2 MSK2 2 RSK4 2

RSK2 2 RSK3 2 RSK1 2

Mnk1 Mnk2

MAPKAPK5 MAPKAPK3 MAPKAPK2

PHKg2 PHKg1 CASK

CaMKIIdelta

CaMKIIalpha

CaMKIIbeta

CaMKIIgamma

DCAMKL3

DCAMKL2

DCAMKL1

PSKH1

PSKH2

VACAMKL

CaMKIV

CaMKIbeta

CaMKIgamma

CaMKIalpha

CaMKIdelta

STK33

DAPK3 DAPK1

DAPK2

DRAK1 DRAK2

TTN

smMLCK skMLCK

caMLCK SgK085

Trio

Trad

Obscn 1

SPEG 1

SPEG 2

Obscn 2

HUNK MELK

MARK1 MARK2

MARK3 MARK4

QSK QIK SIK

NuaK2 NuaK1

AMPKa2 AMPKa1 BRSK1 BRSK2

LKB1 CHK1

SNRK NIM1 TSSK4 TSSK2 TSSK1 TSSK3 SSTK

PASK Pim2 Pim3 Pim1

SgK495 Trb3 Trb1 Trb2

Auro

raA

A

uro

raC

A

uro

raB

P

LK

4

PLK

1

PLK

3

PLK

2

PD

K1

PK

Abeta

PK

Aalp

ha

PK

Agam

ma

PR

KY

PR

KX

P

KG

1cG

KI

PK

G2cG

KII

RSK2

RSK1

RSK3

RSK4

p70S

6K

p70S

6Kb

MSK2

1

RSK5 P

KBal

pha

PKBbe

ta

PKBga

mm

a SG

K2

SG

K1

SG

K3

PKN3

PKN2

PKN1

PKCiota

PKCze

ta

PKCde

lta

PKCth

eta

PKCet

a

PKCepsi

lon

PKCalpha

PKCbeta

PKCgam

ma

GR

K3

GR

K2

GR

K6

GR

K4

GR

K5

GR

K1

GR

K7

LA

TS

1

LA

TS

2

CR

IK

ND

R2

ND

R1

RO

CK

1

RO

CK

2

DM

PK

1

DM

PK

2

MR

CK

b

MR

CK

a

MA

ST

L

MA

ST

1

MA

ST

4

MA

ST

2

MA

ST

3

SgK

494

YA

NK

3

YA

NK

2

YA

NK

1

RS

KL1

RS

KL2

MAP4K1 MAP4K2 KHS2 MAP4K5

SLK LOK

MYO3A MYO3B NRK

MAP4K6 MAP4K4

TNIK

MST3 MST4

YSK1

MST1 MST2

TAO1 TAO2

TAO3

PAK1 PAK3

PAK2 PAK6

PAK4 PAK5

OSR1 STLK3

STLK5 STLK6

MAP2K5

MAP2K1

MAP2K2

MAP2K7

MAP2K4

MAP2K6

MAP2K3

MAP3K8

MAP3K14

GCN2 2

MAP3K4

MAP3K6

MAP3K7

MAP3K5

MAP3K1

MAP3K

8 2

MAP3K

3

MAP3K

2

CD

C7

TB

CK

P

INK

1

SgK

269

SgK

223

Wnk3

Wnk1

Wn

k2

W

nk4

2

P

B

R

N

N

RB

P1

SC

YL2

SC

YL1

SC

YL3

PIK

3R

4

Slo

b

SgK

424

SgK

307

SgK

496

MO

S

PB

K

HS

ER

A

NP

b

AN

Pa

CY

GF

C

YG

D

ILK

BM

PR

1B

B

MP

R1A

T

GF

bR

1

ALK

4

ALK

7

ALK

2

ALK

1

TG

FbR

2

AC

TR

2

AC

TR

2B

BM

PR

2

MIS

R2

TE

SK

2

TE

SK

1

LIM

K1

LIM

K2

HH

498

MAP3K

7 2

ZAK

MAP3K

10

MLK

4 M

AP3K

9

MAP3K

11

DLK

M

AP3K

13

AR

AF

BR

AF

RAF1 KS

R2

KS

R1

LR

RK

2

LR

RK

1

RIP

K1

RIP

K3

RIP

K2

AN

KR

D3

SgK

288

MLK

L

IRA

K4

IRA

K2

IRA

K1

IRA

K3

CTK

Csk

Fer

Fes

FRK

BLK

Lck

HCK

Lyn

Fgr

Src

Yes Fyn

Srm

Brk

Abl2

Abl

ITK

Tec

TXK

BMX

BTK

FAK

PYK2

Syk

ZA

P70

EphA1

EphA2

EphA8

EphA3

EphA5 EphA4

EphA6

EphA7 EphB1

EphB2 EphB3

EphB4

Eph

A10

Eph

B6

Tie1

Tie2

PDGFRalpha

PDGFRbeta

FLT3

CSF1R

Kit

Ret FGFR4

FGFR3

FGFR2 FGFR1

FLT4

KDR

FLT1

Ros ALK LTK IRR InsR IGF1R DDR2 DDR1

Musk TRKA TRKB TRKC

ROR1 ROR2

RYK

Sky Axl Mer

Ron Met

TNK2 TNK1 Tyk2 JAK1 JAK3 JAK2 ErbB3 ErbB4 ErbB2 EGFR

CCK4 Lmr3 Lmr1 Lmr2 SuRTK106

Tyk2 2 JAK1 2 JAK2 2 JAK3 2

SB

K

SgK

069

SgK

110

SgK

493

Haspin

S

gK

196

SgK

396

BU

B1

BU

BR

1

CK

1gam

ma3

CK

1gam

ma1

CK

1gam

ma2

CK

1delta

n

o

l i s

p

e

1

K

C

a

h

p

l a

1

K

C

2

a

h

p

l a

1

K

C

TT

BK

1

TT

BK

2

VR

K3

VR

K2

VR

K1

RIOK2

RIOK3

RIOK1

TIF1beta

TIF1alpha

TIF1gamma

BrdT

Brd4

Brd2

Brd3

ADCK2

ADCK4

ADCK3

ADCK5

ADCK1

PDHK2

PDHK4

BCKD

K

PDHK1

PDHK3

DNAPK

SMG

1

TRRAP

mTO

R

ATM

ATR

AlphaK

3

Chak2

Chak1

eEF2K

AlphaK

2

AlphaK

1

Artificial neural network (ANN)

Position-specific scoring matrix (PSSM)

B

HSH

2D

SH

2D2A

2

SH2D

4A

SH2D

4B

ZAP70

2 Syk

2 Tyk

2 JAK1 JA

K2 JAK3

GR

B10

GR

B14

GR

B7 Z

AP

70 1

Syk

1

CTK

C

sk

ST

AT

4

ST

AT

1

ST

AT

3

ST

AT

2

ST

AT

5B

ST

AT

5A

SHB

SHF

SHD SHE

Tec

TXK

BM

X

BTK

ITK

LCP2

MIS

T

SH

2D6

BLN

K

INP

P5D

IN

PP

L1

SH

2D

1B

S

H2D

1A

P

LC

G1 2

P

LC

G2 2

FR

K

SU

PT

6H

BR

DG

1

SH

3B

P2

CR

K

SH

2B

2

SH

2B1 SH

2B

3

CH

N2

CHN1

SHC2

SHC3

SHC4

SHC1

Fer

Fes

SH2D2A

SH2D3C

BCAR3

RA

SA

1 2

R

AS

A1 1

P

LC

G1 1

P

LC

G2 1

CR

KL

TN

S3

TN

S1

TE

NC

1

TN

S4

RIN

3

RIN

2

RIN

1

Brk

Srm

FRK 2 SH2D5

GRAP

GRB2

GRAP2

Abl2

Abl

SLAP2

SLAP

ST

AT

6

CB

LB

CB

L

CB

LC

Src

Yes

Fyn

Fgr

HCK

Lyn

Lck

BLK

VAV1

VAV3

VAV2

PTP

N11 1

PTP

N6 1

PTP

N6 2

PTP

N11 2

NC

K1

NC

K2

PIK

3R

3 2

PIK

3R

1 2

PIK

3R

2 2

PIK

3R

3 1

PIK

3R

2 1

PIK

3R

1 1

SOCS5 SOCS4 SOCS6 CISH SOCS1 SOCS3 SO

CS7 SO

CS2

DAPP1

C

APPL2 APPL

SHC2

SHC3

SHC1

SHC4

FRS3

FRS2

DOK2

DOK3

DOK1

DOK6

DOK4

DOK5

DOK7

IRS1

IRS2

IRS4

Kinase domains

SH2 domains

PTB domains

Figure S3: Coverage of classifiers for targets of kinases, SH2 domains and PTB domains. Bars show the kinases, SH2 domains andPTB domains that are covered by artificial neural networks (orange) and position-specific scoring matrices (blue). Based on current data,the NetPhorest pipeline yields a non-redundant collection of 125 sequence-based classifiers that cover 179 of 518 kinase domains, 93 of118 SH2 domains and 8 of 18 phosphotyrosine-binding PTB domains (the figure is best viewed by zooming in a PDF viewer). NetPhorestalso contains ANNs for the WW domain of PIN1 and PSSMs for 14-3-3 and BRCT domains, all of which bind phosphorylated serines andthreonines (not shown). The trees for kinase and SH2 domains were obtained from Manning et al. 1 and Liu et al. 2, respectively. It shouldbe noted that some of the kinase domains, for example TRRAP, do not appear to have catalytic activity. None of these are covered bythe sequence-based classifiers.

Page 6: Supplementary Materials for - Rune Linding ·  Supplementary Materials for Linear Motif Atlas for Phosphorylation-Dependent Signaling

Linear motif atlas for phosphorylation-dependent signaling 5

0 0.2 0.4 0.6 0.8 1 ANN output score

0

0.2

0.4

0.6

0.8

1

Positiv

e p

redic

tive v

alu

e (

PP

V)

Binned output scores

Fitted sigmoid

Rescaled sigmoid

Score calibration for CK2 family kinases

χ2 = 0.281

Figure S4: Score calibration. To make the output scores from different sequence-based classifiers directly comparable, we calibratedthe scores through benchmarking on our compilation of phosphorylation sites. As an example, we show here the score calibration forthe CK2 group of kinases, which includes the CK2α1 and CK2α2 isoforms. The fraction of correct predictions (positive predictive value,PPV) is calculated within different score windows (running bins) on the validation set (orange curve). Subsequently, a sigmoid functionis fitted to these values, minimizing the sum of squared errors. The resulting calibration curves enables us to estimate the posteriorprobability that a site is phosphorylated by a particular kinase or group of kinases. However, because the fractions of positive exampleswithin the redundancy-reduced data sets do not reflect the corresponding prior probabilities, the calibration curves has to be rescaled(dotted curve, see Methods for details).

B

SH2 domains

A

0 0.2 0.4 0.6 0.8 1 Domain similarity

0

0.2

0.4

0.6

0.8

1

y

t i r a

l i

m

i s

e

t

a

r t s

b

u

S

Kinase domains

ATM/ATR

DNAPK

CDK1

CDK2/CDK3

JNK

p38

0 0.2 0.4 0.6 0.8 1 Domain similarity

0

0.2

0.4

0.6

0.8

1

y

t i r a

l i

m

i s

e

t

a

r t s

b

u

S

Figure S5: Correlation between domain similarity and substrate specificity. Calculating the similarity between two sequence motifsis non-trivial, in particular if the motifs are described by different types of sequence models (ANNs and PSSMs). To assess the similaritybetween motifs, we created 100,000 random phosphopeptides and scored each of them using all classifiers in NetPhorest. We therebyobtained a set of 100,000 dimensional score vectors, one for each sequence model, and defined the similarity between two motifs asthe cosine similarity between their score vectors. The corresponding domain similarities were calculated based on pairwise sequencealignments (self-normalized bit scores, see Methods for details) between all domains in the kinase and SH2 tree. For internal nodesthis was calculated by averaging over the pairwise sequence-similarity of each member in the group across all other nodes. Substratesimilarity was plotted against domain similarity for kinases (A) and SH2 domains (B). Note that, although significant, the correlationbetween domain and substrate similarity is far from perfect (R2 = 0.18, P < 10−12 and R2 = 0.31, P < 10−6 for kinases and SH2domains, respectively; only domain pairs with a self-normalized bitscore of 0.3 or higher are considered). Two groups of kinase domainsthat have diverged greatly in sequence but have retained their substrate specificity are highlighted in orange and blue, respectively.

Page 7: Supplementary Materials for - Rune Linding ·  Supplementary Materials for Linear Motif Atlas for Phosphorylation-Dependent Signaling

Linear motif atlas for phosphorylation-dependent signaling 6

A

BFixed residue

Fix

ed

po

sitio

n G A V I C P L M F Y W H K R Q N D E

– 5 – 4 – 3 – 2 – 1 +1 +2 +3 +4

T S

Y - A - Z - X - X - X - X - S/T - X - X - X - X - A - G - K - K(biotin) – 5 set

Y - A - X - Z - X - X - X - S/T - X - X - X - X - A - G - K - K(biotin) – 4 set Y - A - X - X - Z - X - X - S/T - X - X - X - X - A - G - K - K(biotin) – 3 set Y - A - X - X - X - Z - X - S/T - X - X - X - X - A - G - K - K(biotin) – 2 set Y - A - X - X - X - X - Z - S/T - X - X - X - X - A - G - K - K(biotin) – 1 set

Y - A - X - X - X - X - X - S/T - Z - X - X - X - A - G - K - K(biotin) +1 set Y - A - X - X - X - X - X - S/T - X - Z - X - X - A - G - K - K(biotin) +2 set Y - A - X - X - X - X - X - S/T - X - X - Z - X - A - G - K - K(biotin) +3 set Y - A - X - X - X - X - X - S/T - X - X - X - Z - A - G - K - K(biotin) +4 set

Fixed residue

Fix

ed

po

sitio

n G A V I C P L M F Y W H K R Q N D E

– 5 – 4 – 3 – 2 – 1 +1 +2 +3 +4

T S

Fixed residue

Fix

ed

po

sitio

n G A V I C P L M F Y W H K R Q N D E

– 5 – 4 – 3 – 2 – 1 +1 +2 +3 +4

T S

Fixed residue

Fix

ed

po

sitio

n G A V I C P L M F Y W H K R Q N D E

– 5 – 4 – 3 – 2 – 1 +1 +2 +3 +4

T S

Fixed residue

Fix

ed

po

sitio

n G A V I C P L M F Y W H K R Q N D E

– 5 – 4 – 3 – 2 – 1 +1 +2 +3 +4

T S

Fixed residue

Fix

ed

po

sitio

n G A V I C P L M F Y W H K R Q N D E

– 5 – 4 – 3 – 2 – 1 +1 +2 +3 +4

T S

Fixed residue

Fix

ed

po

sitio

n G A V I C P L M F Y W H K R Q N D E

– 5 – 4 – 3 – 2 – 1 +1 +2 +3 +4

T S

Fixed residue

Fix

ed

po

sitio

n G A V I C P L M F Y W H K R Q N D E

– 5 – 4 – 3 – 2 – 1 +1 +2 +3 +4

T S

p T p Y p T p Y

Clk2

p T p Y p T p Y

Clk3

p T p Y p T p Y

LKB1

p T p Y p T p Y

MST1

p T p Y p T p Y

MST4

p T p Y p T p Y

S6K1

p T p Y p T p Y

TLK1

p T p Y p T p Y

TNIK Figure S6: Kinase matrices from Positional Scanning Peptide Libraries (PSPL). Protein kinase phosphorylation motifs for Clk2, Clk3,LKB1, Mst1, Mst4, S6K1, Tlk1 and TNIK were determined using arrayed positional scanning peptide libraries as previously described 3.Briefly, we used a series of biotinylated peptides in which each of nine positions surrounding a central phosphorylation site were system-atically substituted with each of the twenty proteogenic amino acids. This set of peptides was subjected in parallel to radiolabel kinaseassays followed by capture on streptavidin membranes. Membranes were washed and exposed to a phosphor storage screen.

Page 8: Supplementary Materials for - Rune Linding ·  Supplementary Materials for Linear Motif Atlas for Phosphorylation-Dependent Signaling

Linear motif atlas for phosphorylation-dependent signaling 7

AMPK_group KIN PSSM

-7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 70.0

2.0

4.0

bit

s

N

HR ILM

R TS

RL

C

ATM_ATR_group KIN ANN

-7 -6 -5 -4 -3 -2 -1 0+

1+

2+

3+

4+

5+

6+

7

0.0

1.5

3.0

bit

s

N

ILS

QES

DPQ

TESG

SESLT

SYA

QETSG

QVDS

STLES

C

Abl_group KIN ANN

-7 -6 -5 -4 -3 -2 -1 0+

1+

2+

3+

4+

5+

6+

7

0.0

2.0

4.0

bit

s

N

PGPSEAGN

AQDPEGS

GP

VYASTQ

IVP

SV

ASP

C

AuroraA KIN ANN

-7 -6 -5 -4 -3 -2 -1 0+

1+

2+

3+

4+

5+

6+

7

0.0

1.5

3.0

bit

s

N

LRVPKQ

LEASR

KESGPQN

EKPQLS

FKLQSPR

GQCAVKLR

IRT

SKGVAR

QTVKLP

IHSAEVP

GFENVLP

KQA

KNTEPS

LPQDEAR

C

AuroraC_AuroraB_group KIN ANN

-7 -6 -5 -4 -3 -2 -1 0+

1+

2+

3+

4+

5+

6+

7

0.0

1.5

3.0b

its

N

ELNPRVYISTK

NPQMSTK

GEKQS

QKVGA

YQPHTKAR

TLKR

NSVLEI

KR

TSNLS

YLKGCNSAIV

IGWKPDRVAS

VSNMIGDAREK

IPSTAE

SEPLRPA

C

CDK1 KIN ANN

-7 -6 -5 -4 -3 -2 -1 0+

1+

2+

3+

4+

5+

6+

7

0.0

1.5

3.0

bit

s

N

S PS

PSPGST

SGHKLEP

QSKV

SKR

KASLP

C

CDK2_CDK3_group KIN ANN

-7 -6 -5 -4 -3 -2 -1 0+

1+

2+

3+

4+

5+

6+

7

0.0

1.5

3.0

bit

s

N

LRGNPS

PGVSPP

KGS

RSV

AP

GELT

SHPLGVRS

TVAQHSRK

RQKS

LR

GAKD

C

CDK4_CDK6_group KIN ANN

-7 -6 -5 -4 -3 -2 -1 0+

1+

2+

3+

4+

5+

6+

7

0.0

1.5

3.0

bit

s

N

HKPRSDM

IMPSVR

KHPGSD

KI

MHQFEA

RAQPL

HYP

GFEALS

TSPT

RPLKIFA

IHPSTAR

LIHAGV

KIHREAS

KQREA

YRPLKI

Q

C

CDK5 KIN PSSM

-7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7

0.0

2.0

4.0

bit

s

N

TSPQ

RK

RHK

KHR

C

CDK7 KIN ANN

-7 -6 -5 -4 -3 -2 -1 0+

1+

2+

3+

4+

5+

6+

7

0.0

1.0

2.0

bit

s

N

LI

REDYG

LHVCI

RS

YVSRDAP

INQFVPK

KMGPSR

FQRDTVPA

GPFTLY

ST

IDHPLFNPQVYE

LPRDTVLNFQSAV

IYPTHTLR

IHGVW

C

CHK2 KIN PSSM

-7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7

0.0

2.0

4.0

bit

s

N

YWVFMIL

KR

AMSRQT

LNT

SMLFYI

HWCYRTS

C

CK1_group KIN PSSM -7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7

0.0

2.0

4.0

bit

s

N

YETDS

GDSAT

AGT

S

C

CK2_group KIN ANN

-7 -6 -5 -4 -3 -2 -1 0+

1+

2+

3+

4+

5+

6+

7

0.0

1.5

3.0

bit

s

N

EEESDE

EDT

SGSED

SED

SDE

SDE

DE

DEE

C

CLK_group KIN PSSM

-7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7

0.0

2.0

4.0

bit

s

N

R HKR S

R

C

CaMKIIalpha_CaMKIIdelta_group KIN ANN

-7 -6 -5 -4 -3 -2 -1 0+

1+

2+

3+

4+

5+

6+

7

0.0

1.5

3.0

bit

s

N

GSL

KSRTV

KFYI

RL

VLRSE

TAKLPR

RLASQ

KTS

KQGLV

LERSD

KRDES

QSVEAL

LHVKS

IQAGLD

S

C

CaMKIIbeta_CaMKIIgamma_group KIN PSSM

-7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7

0.0

2.0

4.0

bit

s

N

YFK

KR

NM

QKLFQMT

SYLMVIF

TEISD

LMI

K

YKF

C

DMPK_group KIN PSSM

-7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7

0.0

2.0

4.0

bit

s

N

LRLR

RLT

SLV

C

DNAPK KIN ANN

-7 -6 -5 -4 -3 -2 -1 0+

1+

2+

3+

4+

5+

6+

7

0.0

1.5

3.0

bit

s

N

YNHGEMLVQS

MHALGFQST

PTVGS

QLTEA

NPGFT

QSATG

QDASLT

SLN

QQTGVAEDS

GNVASPQT

GHFKS

LQSGPD

STAFLNPGSTEQ

C

Page 9: Supplementary Materials for - Rune Linding ·  Supplementary Materials for Linear Motif Atlas for Phosphorylation-Dependent Signaling

Linear motif atlas for phosphorylation-dependent signaling 8

EGFR_group KIN PSSM -7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7

0.0

2.0

4.0

bit

s

N

NPDE

PNDE

NDEP

TVI

DEYY

VLIF

FMYILVF

VP

C

EIF2AK2 KIN ANN

-7 -6 -5 -4 -3 -2 -1 0+

1+

2+

3+

4+

5+

6+

7

0.0

0.5

1.0

bit

s

N

DILMPS

IKMVE

GNPSL

ELMPD

FGIVS

EKLTP

EFLR

STEGPREFLRST

KRSTG

EGILR

DGRT

GRSVL

DEI

R

C

EphA7_EphA6_EphA4_EphA3_EphA5_group KIN PSSM

-7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7

0.0

2.0

4.0

bit

s

N

TYFIVDLEYC

YQADE

NIYDVE

FYLIVP

C

EphB4_EphB3_EphB1_EphB2_group KIN PSSM

-7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7

0.0

2.0

4.0

bit

s

N

EDNYD

E

C

FLT3_CSF1R_Kit_group KIN ANN

-7 -6 -5 -4 -3 -2 -1 0+

1+

2+

3+

4+

5+

6+

7

0.0

1.5

3.0b

its

N

IPESTD

LNHGIE

IGDAMS

ESTN

GEAND

HANSY

ILFEVNYK

FSIV

NDVY

IPFVKL

TRLAGD

HPQFDN

YTSRLCA

KPQSER

C

GRK_group KIN ANN

-7 -6 -5 -4 -3 -2 -1 0+

1+

2+

3+

4+

5+

6+

7

0.0

1.5

3.0

bit

s

N

DLQESD

STERD

TES

PEGVADT

SSTE

VDS

TDGVS

SSG

C

GSK3_group KIN ANN

-7 -6 -5 -4 -3 -2 -1 0+

1+

2+

3+

4+

5+

6+

7

0.0

1.5

3.0

bit

s

N

PSTVEG

SPEPS

QSPLSTAP

GLDSVPT

SKEALSVRP

SP

GSP

DRVPTS

HQTRSP

C

RCK_group KIN PSSM

-7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7

0.0

2.0

4.0

bit

s

N

RCKPR

QTSLRCP

ASTYRT

SRGTSAP

DRST

C

InsR_IGF1R_group KIN ANN

-7 -6 -5 -4 -3 -2 -1 0+

1+

2+

3+

4+

5+

6+

7

0.0

1.5

3.0

bit

sN

SGKE

YTKDASE

ER

KETGPND

GPQRSDYMS L

TSEY

SGPKLG

K

C

JAK2_JAK3_group KIN ANN

-7 -6 -5 -4 -3 -2 -1 0+

1+

2+

3+

4+

5+

6+

7

0.0

1.5

3.0

bit

s

N

ILDVAPS

HPSL

HGSALP

NPQEISLAT

HI

QDS

YSFGLKAD

PEDYV

YAL

PTVK

IMPETLV

FNPQRDVKS

KITVHEPFD

ITSP

EDTG

C

JNK_group KIN ANN

-7 -6 -5 -4 -3 -2 -1 0+

1+

2+

3+

4+

5+

6+

7

0.0

1.5

3.0

bit

s

N

PSA

S SDE

SALP

TAPSLT

SQAPEP

EAPTG

ASPS

C

LKB1 KIN PSSM -7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7

0.0

2.0

4.0

bit

s

N

ASPYFW

MWYF

VLIMRT W

PT

C

Src_group KIN PSSM

-7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7

0.0

2.0

4.0

bit

s

N

DE

GDEELTVI YN

DEAQG

EYFVIL

C

MAP2K6_MAP2K3_MAP2K4_MAP2K7_group KIN ANN

-7 -6 -5 -4 -3 -2 -1 0+

1+

2+

3+

4+

5+

6+

7

0.0

1.0

2.0

bit

s

N

QRI

KLSTA

GNQRTCDVS

SRMLKFEAGTD

YVNMDQES

ISVGFTLM

LMEVAT

QAPKDGM

SYT

IGLMRDWPV

TVMDYA

DWKSTVA

MDPSWGTRV

YSRNMEACWT

KLPSVDYR

WEISYPDAR

C

p38_group KIN PSSM

-7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7

0.0

2.0

4.0

bit

s

N

IVLP

QTSP F

C

MAPK3_MAPK1_MAPK7_NLK_group KIN ANN

-7 -6 -5 -4 -3 -2 -1 0+

1+

2+

3+

4+

5+

6+

7

0.0

1.5

3.0

bit

s

N

SP

PS

ALSP

ALP

SLT

SPSP

SPL

SP

S

C

MAPKAPK_group KIN ANN

-7 -6 -5 -4 -3 -2 -1 0+

1+

2+

3+

4+

5+

6+

7

0.0

1.5

3.0

bit

s

N

KLPVRG

RSTVA

SPL

GKLEPARS

KLQSDAPTR

PVGTAQS

RTPELS

RVWPL

GRSFED

GEDPVLS

LEPQSELPS

RSRVASGP

C

MST_group KIN PSSM

-7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7

0.0

2.0

4.0

bit

s

N

SYW

HTQAWY

MRKGNT I

WLM

QNHRK

ATSKR

PKR

C

Page 10: Supplementary Materials for - Rune Linding ·  Supplementary Materials for Linear Motif Atlas for Phosphorylation-Dependent Signaling

Linear motif atlas for phosphorylation-dependent signaling 9

YSK_group KIN PSSM

-7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 70.0

2.0

4.0

bit

s

N

R CNRAHWY

MRNGTY

TWCLVM

TGKR

KR

HGKR

C

Met_group KIN ANN

-7 -6 -5 -4 -3 -2 -1 0+

1+

2+

3+

4+

5+

6+

7

0.0

1.5

3.0

bit

s

N

YRMES

GETDVA

LIHAD

SVL

NPSAQREGD

PSVADE

HLSTNEYT

PMHCLYDV

HNSVIP

INELPMV

QTNPD

YRNLKCQTAP

NTFS

SVYGT

C

NEK1_NEK5_NEK3_NEK4_NEK11_NEK2_group KIN PSSM

-7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7

0.0

2.0

4.0

bit

s

N

FW

YGWMLF

TSLVTSMI

SR

C

PAKA_group KIN PSSM

-7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7

0.0

2.0

4.0

bit

s

N

RRSWHKR

VPSTHR

RTS

WLY

C

PAKB_group KIN PSSM

-7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7

0.0

2.0

4.0b

its

N

WQSKR

HSYR

KRT

SSYMVI

W

VSYCA

YFWS

C

PDGFR_group KIN PSSM

-7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7

0.0

2.0

4.0

bit

s

N

TPSDE

TPSDE

TPSDE

TPSDEYE

YMFIV

PEYD

YLMFIV

E

C

PDK1 KIN ANN

-7 -6 -5 -4 -3 -2 -1 0+

1+

2+

3+

4+

5+

6+

7

0.0

1.5

3.0

bit

s

N

GLFPERAS

GNSYEP

QRSAEG

YVNKFTEASD

NSDAKLPRT

RMKGFQSEAT

QNMHDAKTFS

STPMLSYF

GFEDASTVC

ILNPFTVAG

IHVSPT

KINGEATPLTYDE

WFLNRTSAY

C

PKA_group KIN ANN

-7 -6 -5 -4 -3 -2 -1 0+

1+

2+

3+

4+

5+

6+

7

0.0

1.5

3.0

bit

s

N

PSKR

SKR

TS

C

PKB_group KIN PSSM

-7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7

0.0

2.0

4.0

bit

s

N

KR

KR

ST T

SMVLI

WF

NSGP

C

PKC_group KIN ANN

-7 -6 -5 -4 -3 -2 -1 0+

1+

2+

3+

4+

5+

6+

7

0.0

1.5

3.0

bit

s

N

SKR

R TS

KR

R

C

PKD_group KIN PSSM

-7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7

0.0

2.0

4.0

bit

s

N

AYPAK

TMSIVL

IALV

TSHRK

EQK

RGHLMK

CWTS

YWIFLMV

IAVM

MWYF

MWYVF

YWPMF

C

PKGcGK_group KIN ANN -7 -6 -5 -4 -3 -2 -1 0+

1+

2+

3+

4+

5+

6+

7

0.0

1.5

3.0

bit

s

N

PSVAGR

A IVAKR

ESLRVK

HSTVPQGAR

VGLKR

AI

KRVT

SKQSAR

GS

KPSGAQE

RGQVA

TPSK

PR

GLPDA

C

PLK1 KIN ANN

-7 -6 -5 -4 -3 -2 -1 0+

1+

2+

3+

4+

5+

6+

7

0.0

1.5

3.0

bit

s

N

LSELPES

VNTLES

LP

TALDSE

LESDT

SI

KSLP

KQALEVS

DNLI

TSPL

SDA

LTAKDS

C

Pim3_Pim1_group KIN PSSM

-7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7

0.0

2.0

4.0

bit

s

N

WHKR

RQHKR

QKNYRH T

S

C

Pim2 KIN PSSM

-7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7

0.0

2.0

4.0

bit

s

N

GPKR

KR

QHKR

YKNQRH T

SG

C

ROCK_group KIN ANN

-7 -6 -5 -4 -3 -2 -1 0+

1+

2+

3+

4+

5+

6+

7

0.0

1.0

2.0

bit

s

N

KIPEDLARS

KMNPQDGTRS

QSVIPAER

PFEVDAKGSR

YVTHFAQKR

LHGSVKYR

TPNKALSIYR

TS

KPTVYGLQS

GSEKQRAV

QPMIHGAVRS

NFRVGI

MS

WSMFEDAPGRV

EPQY

YWVSQLFMTGP

C

RSK_group KIN ANN

-7 -6 -5 -4 -3 -2 -1 0+

1+

2+

3+

4+

5+

6+

7

0.0

1.5

3.0

bit

s

N

HTGPRS

GRS

LSDYR

QERSLSWPGR

ILQRS

PMDLGR

T

STDLER

NESGP

GNPSDALR

GCVAIPS

YSQPARG

GEAPFS

RMLFADP

C

SGK_group KIN ANN

-7 -6 -5 -4 -3 -2 -1 0+

1+

2+

3+

4+

5+

6+

7

0.0

1.5

3.0

bit

s

N

ILQEGSDAR

ERTCGFQSPRG

QREDLPASRY

TRPHALS

GNQRWSCVL

TS

IWLNPSMKAV

LSVAGPDT

LEQTDGNRS

GLEQV

HVNPA

ILNPRYESG

C

Page 11: Supplementary Materials for - Rune Linding ·  Supplementary Materials for Linear Motif Atlas for Phosphorylation-Dependent Signaling

Linear motif atlas for phosphorylation-dependent signaling 10

SLK_group KIN PSSM

-7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7

0.0

2.0

4.0

bit

s

N

RKNQKHWR

WTFKHRYNYHRT I

FYLWM

GNHKR

HKR

HKR

C

Syk_group KIN ANN

-7 -6 -5 -4 -3 -2 -1 0+

1+

2+

3+

4+

5+

6+

7

0.0

1.5

3.0

bit

s

N

LSE

SPE

PED

LESTD

NQSED

GNSED

PEDYK

YQLDVE

NTED

MEALPV

NDLPEEDVP

C

ACTR2_ACTR2B_TGFbR2_group KIN PSSM

-7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7

0.0

2.0

4.0

bit

s

N

NSTR

QKRNSDE

SNDEKRT

SS

C

TLK_group KIN PSSM

-7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7

0.0

2.0

4.0

bit

s

N

ESD

GNTSDE

HNEDT

SRMNQ

DW W

C

MSN_group KIN PSSM

-7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7

0.0

2.0

4.0

bit

s

N

STRGYP

WYPHNGTC

WI

ML

TSKR

STNHPKR

TSNKR

C

Tec_group KIN ANN

-7 -6 -5 -4 -3 -2 -1 0+

1+

2+

3+

4+

5+

6+

7

0.0

1.5

3.0

bit

s

N

IKLQRSTE

SR

KNEVHQLS

HLPQRTVE

ILE

LQDA

IHPETVDLYY

TMLKIADV

KFPYINQET

MGL

QVLENPRT

IKQRFDNP

TE

YNLEDPRGA

C

mTOR KIN ANN

-7 -6 -5 -4 -3 -2 -1 0+

1+

2+

3+

4+

5+

6+

7

0.0

1.0

2.0

bit

s

N

LFPRS

GLFNQTP

IHMGPES

HGFETP

KGPDCVL

IFQTVYR

KFRTNS

TS

YTPERVSP

DSTAGR

FSKGD

ILGERT

TRLGAP

IGFSTAP

C

p70S6K_group KIN PSSM

-7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7

0.0

2.0

4.0

bit

s

N

TNHKR

RK

YSWKHR

SHLYRT

YTS

TNCSR

C

any_group 1433 PSSM

-7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7

0.0

2.0

4.0

bit

s

N

HKR T

SYFTSGP

C

any_group BRCT PSSM

-7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7

0.0

2.0

4.0

bit

s

N

TS

VYLVIF

KYLVIF

C

APPL_group PTB ANN

-7 -6 -5 -4 -3 -2 -1 0+

1+

2+

3+

4+

5+

6+

7

0.0

1.5

3.0

bit

s

N

KTYLPDVS

KDCQSVWYRL

HGFRASL

KNPQRVWSCHNH

AP

TQLKIGFASMYY

PNMLEARVF

NHGWALF

IGPFRAVL

C

FRS_group PTB ANN

-7 -6 -5 -4 -3 -2 -1 0+

1+

2+

3+

4+

5+

6+

7

0.0

1.5

3.0

bit

s

N

PTYS

YWVQCRLA

NFAISL

HQSCNK

HEAP

SELMGYY

SVMF

TMDANSLF

YVAL

C

SHC1_SHC2_SHC3_group PTB PSSM

-7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7

0.0

2.0

4.0

bit

s

N

NP YC

SHC4 PTB ANN

-7 -6 -5 -4 -3 -2 -1 0+

1+

2+

3+

4+

5+

6+

7

0.0

2.0

4.0

bit

s

N

VPTYS

KISFL

HSNK

EAP

VTEAMLYM

LRF

TLSF

L

C

any_group WW ANN

-7 -6 -5 -4 -3 -2 -1 0+

1+

2+

3+

4+

5+

6+

7

0.0

2.0

4.0

bit

s

N

S SPSPLT

SPPS

SLSP

C

Figure S7: Sequence logos for kinases and pS/pT-binding domains. Each logo shows the amino acid preferences at differentposition relative to the phosphorylation site. The height of the stack of symbols in each position is calculated to be the relative entropyfor this position, and the height the individual symbol within a stack is proportional to its frequency. The logos were constructed usingenoLOGOS 4. As new data become available, updated logos will be made available from http://netphorest.info.

Page 12: Supplementary Materials for - Rune Linding ·  Supplementary Materials for Linear Motif Atlas for Phosphorylation-Dependent Signaling

Linear motif atlas for phosphorylation-dependent signaling 11

Page 13: Supplementary Materials for - Rune Linding ·  Supplementary Materials for Linear Motif Atlas for Phosphorylation-Dependent Signaling

Linear motif atlas for phosphorylation-dependent signaling 12

Page 14: Supplementary Materials for - Rune Linding ·  Supplementary Materials for Linear Motif Atlas for Phosphorylation-Dependent Signaling

Linear motif atlas for phosphorylation-dependent signaling 13

Page 15: Supplementary Materials for - Rune Linding ·  Supplementary Materials for Linear Motif Atlas for Phosphorylation-Dependent Signaling

Linear motif atlas for phosphorylation-dependent signaling 14

Page 16: Supplementary Materials for - Rune Linding ·  Supplementary Materials for Linear Motif Atlas for Phosphorylation-Dependent Signaling

Linear motif atlas for phosphorylation-dependent signaling 15

Figure S8: Receiver output characteristic (ROC) curves for the NetPhorest classifiers. We benchmarked all classifiers for which atleast 12 positive examples were available after homology reduction. Each ROC curve shows the sensitivity as function of the rate of falsepositivies (1-specificity) for a particular classifier. The performance measures are based on an independent validation set that was notused for training or parameter optimization.

Page 17: Supplementary Materials for - Rune Linding ·  Supplementary Materials for Linear Motif Atlas for Phosphorylation-Dependent Signaling

Linear motif atlas for phosphorylation-dependent signaling 16

Table S1: The selected set of NetPhorest classifiers. The table lists the non-redundant set of classifiers currently incorporated in theNetPhorest atlas. Firstly, we list if the classifier is based on artificial neural networks (ANNs) or a position-specific scoring matrix (PSSM),whether it a classifier for a kinase (KIN) or phospho-binding domain (SH2, PTB, 14-3-3, BRCT or WW) and the types of phosphorylatedresidues it can recognize (AA). Secondly, we show the number of non-redundant positive sites (POS) used for benchmarking. Finally, wereport the performance as area under the receiver operating characteristic curve (AROC); the ROC curves are available in Figure S8 orat http://netphorest.info. Due to lack of data, PSSMs could in some cases not be benchmarked. These PSSMs we assumed tohave comparable performance to other PSSMs obtained from the same type of assay, and we thus estimate the performance based onthe PSSMs that could be benchmarked. For the ANNs, the AROC obtained using zero hidden neurons, that is a linear model, is shownin parenthesis.

Source Class AA POS AROC

14-3-3 PSSM 14-3-3 ST 113 0.894Abl family ANN KIN Y 44 0.652 (0.610)Abl family PSSM SH2 Y 187 0.696ACTR2/ACTR2B/TGFbR2 PSSM KIN SY - 0.830AMPK subfamily PSSM KIN ST 28 0.858APPL subfamily ANN PTB Y 16 0.957 (0.934)ATM/ATR ANN KIN ST 57 0.986 (0.985)AuroraA ANN KIN ST 26 0.809 (0.77)AuroraC/AuroraB ANN KIN ST 21 0.884 (0.881)BCAR3 ANN SH2 Y 92 0.736 (0.737)BLNK PSSM SH2 Y - 0.749BRCT PSSM BRCT ST 6 0.875BRDG1 ANN SH2 Y 85 0.870 (0.854)Brk ANN SH2 Y 126 0.815 (0.809)CaMKIIalpha/CaMKIIdelta ANN KIN ST 33 0.786 (0.770)CaMKIIbeta/CaMKIIgamma PSSM KIN ST - 0.830CBL subfamily ANN SH2 Y 123 0.698 (0.681)CDK1 ANN KIN ST 128 0.839 (0.82)CDK2/CDK3 ANN KIN ST 65 0.959 (0.959)CDK4/CDK6 ANN KIN ST 12 0.671 (0.810)CDK5 PSSM KIN ST 15 0.973CDK7 ANN KIN ST 14 0.664 (0.532)CHK2 PSSM KIN ST 15 0.756CISH ANN SH2 Y 16 0.807 (0.604)CK1 family PSSM KIN ST 63 0.723CK2 family ANN KIN ST 251 0.930 (0.921)CLK family PSSM KIN ST - 0.830CRK ANN SH2 Y 138 0.841 (0.823)CRKL ANN SH2 Y 135 0.822 (0.841)Csk ANN SH2 Y 65 0.866 (0.807)CTK ANN SH2 Y 53 0.691 (0.744)DAPP1 ANN SH2 Y 29 0.683 (0.694)DMPK family PSSM KIN ST - 0.830DNAPK ANN KIN ST 21 0.913 (0.918)EGFR family PSSM KIN Y 40 0.627EIF2AK2 ANN KIN ST 14 0.634 (0.560)

Page 18: Supplementary Materials for - Rune Linding ·  Supplementary Materials for Linear Motif Atlas for Phosphorylation-Dependent Signaling

Linear motif atlas for phosphorylation-dependent signaling 17

Source Class AA POS AROC

EphA7/EphA6/EphA4/EphA3/EphA5 PSSM KIN Y - 0.830EphB4/EphB3/EphB1/EphB2 PSSM KIN Y - 0.830Fes subfamily ANN SH2 Y 55 0.897 (0.846)FLT3/CSF1R/Kit ANN KIN Y 15 0.820 (0.541)FRK ANN SH2 Y 152 0.721 (0.758)FRS family ANN PTB Y 24 0.971 (0.939)GRB2 subfamily ANN SH2 Y 193 0.935 (0.923)GRB family ANN SH2 Y 224 0.683 (0.673)GRK subfamily ANN KIN ST 59 0.581 (0.598)GSK3 subfamily ANN KIN ST 73 0.860 (0.822)HSH2D/SH2D2A ANN SH2 Y 93 0.718 (0.713)INPP5D PSSM SH2 Y 65 0.655INPPL1 ANN SH2 Y 95 0.666 (0.667)InsR family ANN KIN Y 45 0.762JAK2/JAK3 ANN KIN Y 20 0.707 (0.661)JNK subfamily ANN KIN ST 49 0.918 (0.940)LCP2 ANN SH2 Y 59 0.730 (0.726)LKB1 PSSM KIN T 14 0.969MAP2K6/MAP2K3/MAP2K4/MAP2K7 ANN KIN STY 16 0.716 (0.662)MAPK3/MAPK1/MAPK7/NLK ANN KIN ST 143 0.869 (0.838)MAPKAPK family ANN KIN ST 28 0.801 (0.762)Met family ANN KIN Y 16 0.760 (0.601)MIST ANN SH2 Y 56 0.702 (0.683)MSN family PSSM KIN ST - 0.830MST family PSSM KIN ST - 0.830mTOR ANN KIN ST 13 0.667 (0.740)NCK family ANN SH2 Y 117 0.780 (0.785)NEK1/NEK5/NEK3/NEK4/NEK11/NEK2 PSSM KIN ST - 0.830p38 subfamily PSSM KIN ST 54 0.857p70S6K subfamily PSSM KIN ST - 0.830PAKA family PSSM KIN ST 32 0.850PAKB family PSSM KIN ST - 0.830PDGFR family PSSM KIN Y 23 0.673PDK1 ANN KIN ST 27 0.806 (0.773)PIK3R1 1 ANN SH2 Y 235 0.842 (0.818)PIK3R1 2 PSSM SH2 Y - 0.749PIK3R2 2 PSSM SH2 Y - 0.749PIK3R3 1/PIK3R2 1 ANN SH2 Y 140 0.888 (0.881)PIK3R3 2 PSSM SH2 Y - 0.749Pim2 PSSM KIN ST - 0.830Pim3/Pim1 PSSM KIN S - 0.830PKA family ANN KIN ST 282 0.899 (0.878)PKB family PSSM KIN ST 77 0.907PKC family ANN KIN ST 296 0.873 (0.867)PKD family PSSM KIN ST - 0.830PKGcGK family ANN KIN ST 35 0.761 (0.681)PLCG 1 subfamily PSSM SH2 Y 30 0.713PLCG 2 subfamily PSSM SH2 Y - 0.749

Page 19: Supplementary Materials for - Rune Linding ·  Supplementary Materials for Linear Motif Atlas for Phosphorylation-Dependent Signaling

Linear motif atlas for phosphorylation-dependent signaling 18

Source Class AA POS AROC

PLK1 ANN KIN ST 42 0.599 (0.498)PTPN11 1/PTPN6 1 ANN SH2 Y 90 0.809 (0.792)PTPN11 2/PTPN6 2 ANN SH2 Y 82 0.743 (0.703)RASA1 1 PSSM SH2 Y - 0.749RASA1 2 PSSM SH2 Y - 0.749RCK family PSSM KIN Y - 0.830ROCK family ANN KIN ST 21 0.792 (0.787)RSK family ANN KIN ST 32 0.725 (0.612)SGK family ANN KIN ST 18 0.896 (0.841)SH2B family ANN SH2 Y 184 0.727 (0.709)SH2D1A/SH2D1B ANN SH2 Y 190 0.808 (0.765)SH2D3C ANN SH2 Y 76 0.684 (0.652)SH2D6 PSSM SH2 Y - 0.749SH3BP2 PSSM SH2 Y 40 0.690SHB PSSM SH2 Y - 0.749SHC1/SHC2/SHC3 PSSM PTB Y 30 0.981SHC1/SHC4 ANN SH2 Y 112 0.739 (0.678)SHC2 PSSM SH2 Y - 0.749SHC3 ANN SH2 Y 58 0.636 (0.564)SHC4 ANN PTB Y 32 0.935 (0.968)SHD ANN SH2 Y 143 0.738 (0.748)SHF PSSM SH2 Y - 0.749SLK family PSSM KIN ST - 0.830SOCS2/SOCS7 ANN SH2 Y 96 0.675 (0.686)SOCS5/SOCS4 ANN SH2 Y 64 0.801 (0.708)Src family ANN SH2 Y 339 0.763Src family PSSM KIN Y 229 0.626 (0.743)Srm ANN SH2 Y 92 0.680 (0.64)SUPT6H ANN SH2 Y 101 0.698 (0.653)Syk 1 subfamily ANN SH2 Y 55 0.596 (0.581)Syk 2 subfamily ANN SH2 Y 54 0.829 (0.862)Syk family ANN KIN Y 48 0.705 (0.720)Tec family ANN KIN Y 25 0.700 (0.542)Tec family ANN SH2 Y 171 0.750 (0.716)TENC1/TNS3/TNS1 ANN SH2 Y 197 0.732 (0.689)TLK family PSSM KIN ST - 0.830TNS4 PSSM SH2 Y 47 0.794VAV1/VAV3 ANN SH2 Y 59 0.879 (0.869)VAV2 ANN SH2 Y 40 0.821 (0.816)WW ANN WW ST 119 0.884 (0.893)YSK family PSSM KIN ST - 0.830

Page 20: Supplementary Materials for - Rune Linding ·  Supplementary Materials for Linear Motif Atlas for Phosphorylation-Dependent Signaling

Linear motif atlas for phosphorylation-dependent signaling 19

Table S2: Benchmark of the NetPhorest method. NetPhorest was compared to four published methods for kinase-specific predictionof phosphorylation sites (Scansite 5, NetPhosK 6, GPS 7 and KinasePhos 8) and to the simple sequence patterns collected by the ELM 9,PROSITE 10 and HPRD 11 databases. GPS and KinasePhos were benchmarked exclusively on the phosphorylation sites that are dissimilarin sequence to those used for developing the method, whereas the pattern based methods were benchmarked on the full data set (seeMethods for details). First, we tested if the method performs significantly (P < 0.05) better than random; those which do not are markedwith ‘R’. Methods performing better than random were subsequently tested to see if they perform significantly poorer than NetPhorest (‘P’)or comparable to it (‘C’). No predictor from any of the tested methods performed significantly better than the corresponding NetPhorestpredictor. For each predictor from the other methods, the table lists the area under the receiver operating characteristic curve (AROC),the AROC of the corresponding NetPhorest classifier and the p-value for the comparison of the two. As Scansite 5 and NetPhosK 6 arepart of NetPhorest, some comparisons of classifiers from these methods will be self-comparisons; these are marked by an asterisk. Itwas impossible to compare with other published methods such as PREDIKIN 12 and PredPhospho 13, since these methods do not supportbatch predictions.

Name Class Method Result AROC AROC P-valueMethod NetPhorest

Abl KIN Scansite C 0.70 0.70 *AMPK family KIN Scansite C 0.79 0.79 *ATM KIN NetPhosK W 0.84 0.99 0ATM KIN Scansite W 0.95 0.99 0BLK/Lck/HCK/Lyn SH2 Scansite W 0.59 0.73 0CAMK family KIN Scansite C 0.79 0.81 0.23CaMKII family KIN NetPhosK W 0.54 0.79 0CaMKII family KIN Scansite C 0.81 0.81 *CDK1 KIN NetPhosK W 0.75 0.84 0CDK1 KIN Scansite C 0.85 0.85 *CDK5 KIN Scansite C 0.90 0.90 *CDK family KIN NetPhosK W 0.73 0.86 0CK1 family KIN Scansite C 0.83 0.83 *CK2 family KIN NetPhosK W 0.61 0.93 0CK2 family KIN Scansite C 0.92 0.93 0.32CRK family SH2 Scansite C 0.82 0.83 0.31DNAPK KIN NetPhosK C 0.88 0.91 0.24DNAPK KIN Scansite W 0.80 0.91 0.05EGFR family KIN NetPhosK R - 0.64 -EGFR KIN Scansite C 0.63 0.63 *Fgr/Fyn/Src/Yes SH2 Scansite W 0.66 0.73 0FLT3/CSF1R/Kit/PDGFR KIN Scansite C 0.65 0.65 *GRB2 family SH2 Scansite W 0.89 0.93 0.01GSK3 subfamily KIN NetPhosK W 0.65 0.86 0GSK3 subfamily KIN Scansite W 0.73 0.86 0INPPL1/INPP5D SH2 Scansite W 0.56 0.64 0.03InsR/IGF1R KIN NetPhosK C 0.56 0.76 1ITK SH2 Scansite W 0.76 0.85 0.01Lck/HCK/Lyn KIN Scansite C 0.59 0.59 0.53MAPK14 KIN Scansite C 0.75 0.79 0.2MAPK3 KIN Scansite W 0.79 0.91 0NCK family SH2 Scansite C 0.74 0.78 0.09

Page 21: Supplementary Materials for - Rune Linding ·  Supplementary Materials for Linear Motif Atlas for Phosphorylation-Dependent Signaling

Linear motif atlas for phosphorylation-dependent signaling 20

Name Class Method Result AROC AROC P-valueMethod NetPhorest

p38 subfamily KIN NetPhosK C 0.84 0.84 *p38 subfamily KIN Scansite C 0.78 0.84 0.11PIK3R 1 subfamily SH2 Scansite W 0.75 0.80 0.03PKA family KIN NetPhosK W 0.83 0.90 0PKA family KIN Scansite W 0.86 0.90 0PKB family KIN NetPhosK W 0.85 0.95 0PKB family KIN Scansite C 0.95 0.95 *PKCdelta/PKCtheta KIN Scansite C 0.61 0.64 0.39PKC family KIN NetPhosK W 0.76 0.87 0PKC family KIN Scansite W 0.74 0.87 0PKGcGK family KIN NetPhosK C 0.71 0.76 0.08RSK family KIN NetPhosK C 0.74 0.74 *SHC1/SHC4 SH2 Scansite R - 0.74 -Src family KIN NetPhosK C 0.59 0.62 0.37Src KIN Scansite C 0.56 0.60 0.19Tec family SH2 Scansite W 0.61 0.75 0

14-3-3 14-3-3 HPRD W 0.73 0.89 0Abl KIN GPS C 0.74 0.67 0.75Abl KIN HPRD R - 0.67 -Abl SH2 HPRD R - 0.70 -AGC group KIN HPRD R - 0.86 -AMPK subfamily KIN GPS C 0.77 0.86 0.2AMPK subfamily KIN HPRD C 0.66 0.78 0.07ATM KIN GPS W 0.85 0.99 0ATM KIN HPRD W 0.95 0.99 0.03ATM KIN KinasePhos C 0.91 0.99 0.06AuroraA KIN HPRD R - 0.81 -BRCT BRCT HPRD R - 0.88 -CaMKII family KIN GPS R - 0.79 -CaMKII family KIN HPRD W 0.74 0.79 0.05CDK1 KIN GPS W 0.7 0.84 0CDK1 KIN HPRD W 0.65 0.84 0CDK1 KIN KinasePhos W 0.7 0.84 0.02CDK4 KIN HPRD R - 0.82 -CDK5 KIN HPRD W 0.53 0.97 0CDK family KIN ELM W 0.67 0.86 0CDK family KIN HPRD W 0.67 0.86 0CDK family KIN KinasePhos C 0.76 0.87 0.12CDK family KIN Prosite W 0.67 0.86 0CK1 family KIN ELM W 0.58 0.72 0.01CK1 family KIN HPRD W 0.64 0.72 0.05CK1 family KIN Prosite W 0.58 0.72 0CK2 family KIN ELM W 0.71 0.93 0CK2 family KIN GPS W 0.72 0.93 0CK2 family KIN HPRD W 0.88 0.93 0CK2 family KIN KinasePhos R - 0.93 -CK2 family KIN Prosite W 0.82 0.93 0CRK family SH2 HPRD C 0.78 0.83 0.13

Page 22: Supplementary Materials for - Rune Linding ·  Supplementary Materials for Linear Motif Atlas for Phosphorylation-Dependent Signaling

Linear motif atlas for phosphorylation-dependent signaling 21

Name Class Method Result AROC AROC P-valueMethod NetPhorest

DNAPK KIN HPRD C 0.78 0.91 0.11EGFR family KIN HPRD R - 0.63 -EGFR KIN HPRD R - 0.59 -Fes SH2 HPRD C 0.77 0.83 0.09Fgr SH2 HPRD R - 0.65 -Fyn SH2 HPRD W 0.51 0.83 0GRB2 SH2 ELM C 0.87 0.88 0.45GRB family SH2 HPRD W 0.59 0.68 0GRK family KIN HPRD R - 0.58 -GSK3 subfamily KIN ELM W 0.73 0.86 0GSK3 subfamily KIN HPRD W 0.77 0.86 0GSK3 subfamily KIN Prosite W 0.73 0.86 0INPPL1 SH2 HPRD R - 0.67 -ITK SH2 HPRD R - 0.85 -JAK2/JAK3 KIN HPRD C 0.71 0.71 0.53JNK subfamily KIN HPRD R - 0.92 -LKB1 KIN HPRD R - 0.97 -MAP2K family KIN HPRD R - 0.81 -MAPK3 KIN HPRD R - 0.91 -MAPKAPK2 KIN HPRD R - 0.86 -MAPKAPK family KIN HPRD R - 0.80 -mTOR KIN HPRD R - 0.67 -NCK family SH2 HPRD R - 0.78 -PAK family KIN HPRD R - 0.78 -PDGFR family KIN HPRD R - 0.67 -PDK1 KIN HPRD R - 0.81 -PIKK family KIN ELM C 0.91 0.91 0.32PIKK family KIN Prosite C 0.91 0.91 0.35PKA family KIN ELM W 0.80 0.90 0PKA family KIN GPS W 0.77 0.9 0PKA family KIN HPRD W 0.86 0.90 0.01PKA family KIN KinasePhos W 0.72 0.9 0PKA family KIN Prosite W 0.66 0.90 0PKB family KIN ELM C 0.85 0.91 0.25PKB family KIN GPS R - 0.91 -PKB family KIN HPRD W 0.86 0.91 0.03PKB family KIN Prosite C 0.85 0.91 0.14PKCalpha KIN HPRD R - 0.82 -PKC family KIN GPS W 0.67 0.87 0PKC family KIN HPRD W 0.78 0.87 0PKC family KIN KinasePhos W 0.66 0.87 0PKC family KIN Prosite W 0.71 0.87 0PKGcGK family KIN HPRD R - 0.76 -PLCG 1 subfamily SH2 HPRD R - 0.71 -PLK1 KIN HPRD R - 0.60 -PTPN11 1 SH2 ELM W 0.54 0.78 0PTPN family SH2 HPRD W 0.63 0.76 0SHC1 SH2 HPRD R - 0.65 -

Page 23: Supplementary Materials for - Rune Linding ·  Supplementary Materials for Linear Motif Atlas for Phosphorylation-Dependent Signaling

Linear motif atlas for phosphorylation-dependent signaling 22

Name Class Method Result AROC AROC P-valueMethod NetPhorest

SHC family PTB HPRD W 0.90 0.97 0.02Src family KIN HPRD R - 0.62 -Src family SH2 HPRD W 0.56 0.76 0Src KIN HPRD C 0.58 0.60 0.37Src SH2 ELM W 0.61 0.74 0Syk 1 SH2 HPRD R - 0.62 -Syk 2 SH2 HPRD R - 0.83 -Syk KIN HPRD R - 0.67 -TK group KIN HPRD R - 0.79 -TNS family SH2 HPRD R - 0.70 -VAV family SH2 HPRD R - 0.84 -WW WW HPRD C 0.88 0.88 0.24

References

[1] G. Manning, D. B. Whyte, R. Martinez, T. Hunter, S. Sudarsanam, The protein kinase complement of thehuman genome. Science 298, 1912–1934 (2002).

[2] B. A. Liu, K. Jablonowski, M. Raina, M. Arce, T. Pawson, P. D. Nash, The human and mouse complementof SH2 domain proteins—establishing the boundaries of phosphotyrosine signaling. Mol. Cell 22, 851–868 (2006).

[3] J. E. Hutti, E. T. Rarrell, J. D. Chang, D. W. Abbott, P. Storz, A. Toker, L. C. Cantley, B. E. Turk, A rapidmethod for determining protein kinase phosphorylation specificity. Nat. Methods 1, 27–29 (2004).

[4] C. T. Workman, Y. Yin, D. L. Corcoran, T. Ideker, G. D. Stormo, P. V. Benos, enoLOGOS: a versatile webtool for energy normalized sequence logos. Nucleic Acids Res. 33, W389–W392 (2005).

[5] J. C. Obenauer, L. C. Cantley, M. B. Yaffe, Scansite 2.0: Proteome-wide prediction of cell signalinginteractions using short sequence motifs. Nucleic Acids Res. 31, 3635–3641 (2003).

[6] N. Blom, T. Sicheritz-Ponten, R. Gupta, S. Gammeltoft, S. Brunak, Prediction of post-translational gly-cosylation and phosphorylation of proteins from the amino acid sequence. Proteomics 4, 1633–1649(2004).

[7] Y. Xue, F. Zhou, M. Zhu, K. Ahmed, G. Chen, X. Yao, GPS: a comprehensive www server for phospho-rylation sites prediction. Nucleic Acids Res. 33, W184–W187 (2005).

[8] Y. H. Wong, T. Y. Lee, H. K. Liang, C. M. Huang, T. Y. Wang, Y. H. Yang, C. H. Chu, H. D. Huang, M. T.Ko, J. K. Hwang, KinasePhos 2.0: a web server for identifying protein kinase-specific phosphorylationsites based on sequences and coupling patterns. Nucleic Acids Res. 35, W588–W594 (2007).

[9] P. Puntervoll, R. Linding, C. Gemund, S. Chabanis-Davidson, M. Mattingsdal, S. Cameron, D. M. Martin,G. Ausiello, B. Brannetti, A. Costantini, F. Ferre, V. Maselli, A. Via, G. Cesareni, F. Diella, G. Superti-Furga, L. Wyrwicz, C. Ramu, C. McGuigan, R. Gudavalli, I. Letunic, P. Bork, L. Rychlewski, B. Kuster,M. Helmer-Citterich, W. N. Hunter, R. Aasland, T. J. Gibson, ELM server: a new resource for investigatingshort functional sites in modular eukaryotic proteins. Nucleic Acids Res. 31, 3625–3630 (2003).

[10] N. Hulo, A. Bairoch, V. Bulliard, L. Cerutti, E. De Castro, P. S. Langendijk-Genevaux, M. Pagni, C. J. A.Sigrist, The PROSITE database. Nucleic Acids Res. 34, D227–D230 (2006).

[11] R. Amanchy, B. Periaswamy, S. Mathivanan, R. Reddy, S. G. Tattikota, A. Pandey, A curated compendiumof phosphorylation motifs. Nat. Biotechnol. 25, 285–286 (2007).

[12] R. I. Brinkworth, R. A. Breinl, B. Kobe, Structural basis and prediction of substrate specificity in proteinserine/threonine kinases. Proc. Natl. Acad. Sci. U.S.A. 100, 74–79 (2003).

[13] J. H. Kim, J. Lee, B. Oh, K. Kimm, I. Koh, Prediction of phosphorylation sites using SVMs. Bioinformatics20, 3179–3184 (2004).