1
Analysis and visualization of protein-protein interactions
Olga Vitek, Assistant Professor
Statistics and Computer Science
2
Outline
1. Protein-protein interactions
2. Using graph structures to study protein-protein interactions
3. Clustering of graphs
4. Evaluation of clusters
3
An Introduction to Bioinformatics Algorithms www.bioalgorithms.info
Life Begins with Cell
• A cell is the smallest structural unit of an
organism that is capable of independent
functioning
• All cells have some common features
4
An Introduction to Bioinformatics Algorithms www.bioalgorithms.info
DNA, RNA, and the Flow of Information
[Diagram labels: Replication, Transcription, Translation]
This model is known as the “central dogma”
5
Why should we study proteins?
● Proteins: large molecules made up of amino acids
◆ accomplish most of the function of the living cells
■ by interacting (i.e. entering in physical contact) with other molecules
◆ linear structures fold into 3-dimensional shapes
■ the structure is used to accomplish the function
Ubiquitin-conjugating enzyme E2 G1
(PDB entry 2AWF)
6
Proteins accomplish function by forming complexes
● A protein complex is a group of tightly interacting proteins
◆ also called functional module
◆ protein interactions within the complex help accomplish its function
● Example: exosome
◆ a complex of 11 proteins
◆ degrades RNA molecules
◆ ring structure ensures the function
http://en.wikipedia.org/wiki/Exosome_complex
7
Discovery of the complex helps to understand its function
● Example: exosome
◆ first discovered in yeast
◆ helped discover an equivalent complex in humans
◆ has clinical implications
■ target of autoimmune disease
■ chemotherapies for cancer block its activity
● Knowledge of protein complexes speeds up biological and clinical research
8
Complexity of a bacterial cell
Often study simpler “model” organisms to gain insight into the function of the cell
9
Modern technologies determine protein interactions on a large scale
● New terms
◆ Proteome: all proteins that exist in an organism
◆ Interactome: all protein-protein interactions
● New questions
◆ Interactions of individual proteins
◆ Network-wide patterns of interactions
● New challenges
◆ Large, complex, noisy datasets
◆ Computational approaches are key
10
[Figure: network diagram of yeast protein-protein interactions from the BIND database. Nodes are labeled with protein names (for example Kss1, Ste7, Ste12, Cdc28, Fkh1, Fkh2, Dun1, Rad7) and annotated by functional category.]
Functional categories (legend): DNA damage response and repair; DNA metabolism; DNA recombination; DNA replication; RNA localization and processing; autophagy; budding; carbohydrate metabolism; cell cycle; cell growth and/or maintenance; cell organization and biogenesis; cell shape and cell size control; general metabolism; mating; nucleolar and ribosome biogenesis; protein amino acid phosphorylation/dephosphorylation; protein biosynthesis; protein degradation; protein metabolism and modification; protein transport; signal transduction; sporulation; stress response; transcription; transport
Figure S3: View of the entire HMS-PCI Dataset. Thick blue lines represent literature-derived interactions from PreBIND+MIPS in the HMS-PCI dataset. Thin orange lines represent potential novel interactions. Arrows point from bait to associated protein. Functional annotation derived from Gene Ontology (www.geneontology.org). http://www.bind.ca
Ho et al., Nature, 2002
New technologies determine protein-protein interactions on a large scale
Such datasets are being increasingly produced, and are publicly available
11
Outline
1. Protein-protein interactions
2. Using graph structures to study protein-protein interactions
3. Clustering of graphs
4. Evaluation of clusters
Representing protein-protein interactions using graphs
12
protein A, protein B: experimentally determined interaction
Protein attributes (for each protein):
• name
• function
• quantitative data
Interaction attributes:
• type
• confidence
• direction
Representing protein-protein interactions using graphs
13
protein A, protein B: experimentally determined interaction
Protein attributes (for each protein):
• name
• function
• quantitative data
Interaction attributes:
• type
• confidence
• direction (experimental artifact)
Representing protein-protein interactions using graphs
14
● Use graphs to represent the large-scale information on proteins, interactions and their attributes
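One way to hold proteins, interactions and their attributes in a single structure is sketched below. This is a minimal illustration, not the representation used in the talk: the class names, field names and default values are all assumptions chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Protein:
    """A node: a protein and its attributes (field names are illustrative)."""
    name: str
    function: str = "unknown"
    quantitative: dict = field(default_factory=dict)


@dataclass
class Interaction:
    """An edge: an experimentally determined interaction and its attributes."""
    source: str
    target: str
    kind: str = "physical"
    confidence: float = 1.0
    directed: bool = True


class InteractionGraph:
    """Proteins are nodes, interactions are edges."""

    def __init__(self):
        self.proteins = {}
        self.interactions = []

    def add_protein(self, name, **attrs):
        self.proteins[name] = Protein(name, **attrs)

    def add_interaction(self, source, target, **attrs):
        # Create missing endpoints with default attributes.
        for name in (source, target):
            if name not in self.proteins:
                self.add_protein(name)
        self.interactions.append(Interaction(source, target, **attrs))

    def neighbors(self, name):
        """Interaction partners of a protein, ignoring edge direction."""
        partners = set()
        for edge in self.interactions:
            if edge.source == name and edge.target != name:
                partners.add(edge.target)
            elif edge.target == name and edge.source != name:
                partners.add(edge.source)
        return partners


if __name__ == "__main__":
    graph = InteractionGraph()
    graph.add_protein("A", function="mating")  # hypothetical annotation
    graph.add_interaction("A", "B", confidence=0.9)
    print(graph.neighbors("A"))
```

Keeping direction as an edge attribute, rather than building it into the structure, makes it easy to ignore later, which matters because the direction is an experimental artifact.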
Graph-based representation of protein-protein interactions
15
[Figure: view of the entire HMS-PCI dataset (Figure S3, Ho et al., Nature, 2002), shown again as a graph. Arrows point from bait to associated protein. Functional annotation derived from Gene Ontology.]
● View data as a graph
◆ Proteins are nodes and interactions are edges
◆ Nodes have attributes
■ e.g. known function
◆ Directed edges
■ experimental artifact
The interactions are determined by tag-affinity purification (TAP)
● A protein (“bait”) is labeled with a chemical tag
● The bait forms its interactions (collects “prey”)
● The bait, and all other proteins in the complex are isolated
● All components of the complex are identified by mass spectrometry
16
news and views
Protein complexes take the bait
Anuj Kumar and Michael Snyder
Many cellular functions are carried out by proteins that are bound together in complexes. In two new large-scale studies, labelled proteins are used as ‘bait’ to capture and identify those complexes.
To appropriate a quote from John Donne, “no protein is an island entire of itself”, or at least, very few proteins are. Most seem to function within complicated cellular pathways, interacting with other proteins either in pairs or as components of larger complexes. A comprehensive understanding of these interactions will be needed before we can appreciate the mechanisms by which cellular pathways function and interlink. On pages 141 and 180 of this issue, Gavin et al.¹ and Ho et al.² describe significant advances towards this goal. Each group has characterized hundreds of distinct multiprotein complexes in the budding yeast Saccharomyces cerevisiae, using approaches in which individual proteins are tagged and used to pull down associated proteins, which are then analysed by mass spectrometry.
These studies¹,² exemplify an emerging paradigm in protein biology: the systematic analysis of an organism’s complete complement of proteins (its ‘proteome’). Protein interactions on a proteome-wide scale have already been analysed in several ways. In a pair of landmark papers, Uetz et al.³ and Ito et al.⁴ adapted the yeast ‘two-hybrid’ assay (a means of assessing whether two single proteins interact) into a high-throughput method of mapping pair-wise protein interactions on a large scale. The authors collectively identified over 4,000 protein–protein interactions in S. cerevisiae. Our own group⁵ has developed a microarray technology in which purified, active proteins from almost the entire yeast proteome are printed onto a microscope slide at high density, such that thousands of protein interactions (and other protein functions) can be assayed simultaneously.
Gavin et al.¹ and Ho et al.² take a different approach, one that is particularly effective at identifying protein complexes that contain three or more components. Large-scale efforts to characterize protein complexes are generally rate-limited by the need for a nearly pure preparation of each complex. In the new studies¹,², protein complexes were purified as follows (Fig. 1). First, the authors attached tags to hundreds of different proteins (to create ‘bait’ proteins). They then introduced DNA encoding these bait proteins into yeast cells, allowing the modified proteins to be expressed in the cells and to form physiological complexes with other proteins. Then, using the tag, each bait protein was pulled out, often fishing out the entire complex with it (hence the term ‘bait’). The proteins extracted with the tagged bait were identified using standard mass-spectrometry methods.
Applying this approach on a proteome-wide scale, Gavin et al.¹ have identified 1,440 distinct proteins within 232 multiprotein complexes in yeast. As 91% of these complexes contain at least one protein of previously unknown function, the study provides a wealth of new information on 231 previously uncharacterized yeast proteins, and on a further 113 proteins to which the authors ascribe a previously unknown cellular role. Furthermore, Gavin et al. find that most of these complexes have a component in common with at least one other multiprotein assembly, suggesting a means of coordinating cellular functions into a higher-order network of interacting protein complexes.
An understanding of this high-order organization will undoubtedly offer insight into corresponding networks in other organisms, as most yeast complexes have counterparts in more complex species (one reason why researchers are interested in this unicellular organism). Gavin and colleagues illustrate this point by purifying and analysing three equivalent multiprotein complexes from yeast and human cells: the Arp2/3 complex, a component of the cellular ‘skeleton’; the Ccr4–Not1 complex, which is found in the nucleus; and the TRAPP complex, which is involved in transport from one intracellular compartment (the endoplasmic reticulum) to another (the Golgi). In each case, the authors retrieved human and yeast complexes that were similar, if not identical, in composition.
Using the same general approach, Ho et al.² constructed an initial set of 725 yeast bait proteins, from which they identified 3,617 interactions involving 1,578 different proteins. They describe interaction networks assembled around the protein kinase Kss1 (a known component of pathways involved in mating and filamentous growth) and complexes associated with the cyclin-dependent kinase Cdc28 and the gene-transcription factors Fkh1 and Fkh2. In addition, Ho and colleagues used 86 bait proteins that are implicated in the DNA-damage response, allowing them to delineate much of the yeast damage-response network. In particular, they reveal many regulators and targets of the protein kinase Dun1, and a possible role for the DNA-repair protein Rad7 in processes of targeted protein degradation.
The approach taken by Gavin et al. and Ho et al. is clearly powerful, but it does have
Figure 1 Analysing protein interactions. In the ‘co-precipitation/mass spectrometry’ approach used by Gavin et al.¹ and Ho et al.², an ‘affinity tag’ is first attached to a target protein (the ‘bait’; a). b, Bait proteins are systematically precipitated, along with any associated proteins, on an ‘affinity column’. c, Purified protein complexes are resolved by one-dimensional SDS–PAGE, a technique that involves running an electric charge through the complexes on a gel, so that proteins become separated according to mass. d, Proteins are excised from the gel, digested with the enzyme trypsin, and analysed by mass spectrometry. Database-search algorithms (bioinformatics) are then used to identify specific proteins from their mass spectra.
[Figure 1 panel labels: a, tagged bait with associated proteins 1 to 5; b, isolate protein complex on affinity column; c, SDS–PAGE; d, excise bands, digest with trypsin, analyse by mass spectrometry and bioinformatics.]
NATURE | VOL 415 | 10 JANUARY 2002 | www.nature.com 123
Kumar & Snyder, Nature, 2002
The interactions are determined by tag-affinity purification (TAP)
17
[Diagram: a cell containing a protein complex made of a tagged bait and its prey. Steps: 1. Tagging, 2. Purification, 3. Identification. Example peptide sequences: KLNFMTP..., PNGFLKK..., SRKNFSL..., KFWQTY..., KKRLMTP...]
18
The technology yields false positive and false negative interactions
● Cannot distinguish between various types of complexes
[Diagram: three possible topologies of a complex: chain, star, complete graph]
● Use the “spoke” model to represent results of experiments
✦ directed edges from “bait” to “prey”
✦ multiple proteins in a complex can be used as a bait
• direction of edges reflects experimental design, but not the underlying biology
[Diagram: spoke model, with edges from the tagged bait to each prey]
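The spoke model can be sketched in a few lines. This is an illustration only: the input format (a mapping from each bait to the prey it pulled down) and the function names are assumptions, and the protein names are placeholders.

```python
def spoke_edges(purifications):
    """Build directed bait -> prey edges under the spoke model.

    `purifications` maps each bait to the prey identified with it
    (a hypothetical input format chosen for this sketch).
    """
    edges = set()
    for bait, preys in purifications.items():
        for prey in preys:
            if prey != bait:  # a bait is not its own prey
                edges.add((bait, prey))
    return edges


def reciprocated_pairs(edges):
    """Pairs seen in both directions, i.e. each protein pulled down the other."""
    return {tuple(sorted(edge)) for edge in edges if (edge[1], edge[0]) in edges}


if __name__ == "__main__":
    # Two baits from the same complex; protein names are placeholders.
    results = {"A": ["B", "C", "D"], "B": ["A", "C"]}
    edges = spoke_edges(results)
    print(sorted(edges))
    print(reciprocated_pairs(edges))
```

An edge seen in both directions was observed with two different baits. The direction itself says only which protein was tagged, not anything about the biology.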
Global graph-based summaries: degree of a node
19
● Degree of a node: the number of edges that the node has to other nodes
◆ degree distribution: for each degree k, the fraction of nodes in the network that have degree k
◆ mean degree: average degree over all nodes
Each node is labeled with its degree
http://en.wikipedia.org/wiki/Degree_(graph_theory)
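The three degree summaries can be computed from an edge list as in the sketch below. It assumes edge directions are ignored and that every protein appears in at least one edge; the function names are illustrative.

```python
from collections import Counter


def degrees(edges):
    """Degree of each node, ignoring edge direction and self-loops."""
    neighbors = {}
    for u, v in edges:
        if u == v:
            continue
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    return {node: len(partners) for node, partners in neighbors.items()}


def degree_distribution(degree_of):
    """For each degree k, the fraction of nodes that have degree k."""
    counts = Counter(degree_of.values())
    total = len(degree_of)
    return {k: count / total for k, count in sorted(counts.items())}


def mean_degree(degree_of):
    """Average degree over all nodes."""
    return sum(degree_of.values()) / len(degree_of)


if __name__ == "__main__":
    edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
    deg = degrees(edges)
    print(deg)
    print(degree_distribution(deg))
    print(mean_degree(deg))
```

Proteins with no interactions never appear in an edge list, so they would have to be added with degree 0 before computing the distribution.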
Degree distribution: Gavin et al., 2002
20
[Plot "Degree Distribution: Gavin2002": x-axis Degree (0 to 50), y-axis Node Count (0 to 500)]
● Only a few nodes have a large number of edges
Global graph-based summaries: clustering coefficient
21
● Clustering coefficient of a node: the fraction of pairs of the node's neighbors that are themselves neighbors (i.e. connected by an edge)
● Clustering coefficient of a network: average clustering coefficient over all nodes
http://en.wikipedia.org/wiki/Clustering_coefficient
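For a node with k neighbors and e edges among those neighbors, the local clustering coefficient is 2e / (k(k - 1)). The sketch below computes it for an undirected graph. Setting the coefficient to 0 for nodes with fewer than two neighbors is a convention assumed here, and the function names are illustrative.

```python
from itertools import combinations


def build_neighbors(edges):
    """Undirected neighbor sets from an edge list (self-loops dropped)."""
    neighbors = {}
    for u, v in edges:
        if u == v:
            continue
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    return neighbors


def local_clustering(neighbors, node):
    """Fraction of pairs of a node's neighbors that are themselves connected."""
    partners = neighbors[node]
    k = len(partners)
    if k < 2:
        return 0.0  # assumed convention: no pairs of neighbors to compare
    linked = sum(1 for u, v in combinations(partners, 2) if v in neighbors[u])
    return 2.0 * linked / (k * (k - 1))


def average_clustering(neighbors):
    """Network clustering coefficient: mean of the local values over all nodes."""
    return sum(local_clustering(neighbors, n) for n in neighbors) / len(neighbors)


if __name__ == "__main__":
    nb = build_neighbors([("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")])
    for node in sorted(nb):
        print(node, local_clustering(nb, node))
    print(average_clustering(nb))
```

In the example, A has three neighbors of which only B and C are linked, so its coefficient is 1/3, while B and C each score 1.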
Mean degree vs clustering coefficient of experimental networks
22
[Scatter plot "Experimental networks": x-axis Mean degree (0 to 25), y-axis Clustering coefficients (0.0 to 0.4). Networks shown: Gavin2002, Gavin2006, Hazbun2003, Ho2002, Ito2001, Krogan2004, Krogan2006, Tong2002, Uetz2000, Li2004, Stelzl2005. Legend, two technologies: AP-MS (tag affinity purification) and Y2H (yeast-two-hybrid).]
Conclusion from these summaries for protein interaction networks:
23
● Most nodes have a low degree (i.e. few neighbors)
● Some nodes have a high clustering coefficient (i.e. their neighbors are also neighbors)
● Of interest are protein clusters (i.e. groups of proteins that interact with each other more closely than with proteins outside the group)
◆ close interactions can help infer biological function
◆ challenge: large and noisy datasets
24
Outline
1. Protein-protein interactions
2. Using graph structures to study protein-protein interactions
3. Clustering of graphs
4. Evaluation of clusters
25
Our goal: find protein clusters in the large and noisy interaction graph
Gavin et al., Nature, 2002
26
Step 1: “de-noise” the interaction graph
● We are more confident in protein interactions if they are determined using multiple baits
◆ remove isolated subgraphs
◆ determine connected components
■ subgraphs where there is a directed path from each protein to every other protein
Gavin et al., Nature, 2002
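The definition above (a directed path from each protein to every other protein) is what graph theory calls a strongly connected component. A minimal sketch follows, using Kosaraju's algorithm as an assumed choice; the slides do not say which algorithm was used. Treating single-protein components as the "isolated" parts to drop is also an assumption made for illustration.

```python
def strongly_connected_components(edges):
    """Kosaraju's algorithm on a directed graph of (source, target) pairs."""
    fwd, rev = {}, {}
    for u, v in edges:
        fwd.setdefault(u, set()).add(v)
        fwd.setdefault(v, set())
        rev.setdefault(v, set()).add(u)
        rev.setdefault(u, set())

    # Pass 1: record nodes in order of DFS completion on the forward graph.
    order, seen = [], set()
    for start in fwd:
        if start in seen:
            continue
        seen.add(start)
        stack = [(start, iter(fwd[start]))]
        while stack:
            node, successors = stack[-1]
            for nxt in successors:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, iter(fwd[nxt])))
                    break
            else:  # all successors explored: the node is finished
                order.append(node)
                stack.pop()

    # Pass 2: sweep the reversed graph in decreasing completion order.
    components, assigned = [], set()
    for start in reversed(order):
        if start in assigned:
            continue
        assigned.add(start)
        comp, todo = set(), [start]
        while todo:
            node = todo.pop()
            comp.add(node)
            for nxt in rev[node]:
                if nxt not in assigned:
                    assigned.add(nxt)
                    todo.append(nxt)
        components.append(comp)
    return components


# Hypothetical bait -> prey edges: A, B, C confirm each other in a cycle,
# D is only ever a prey, and E -> F is a one-way observation.
edges = {("A", "B"), ("B", "C"), ("C", "A"), ("C", "D"), ("E", "F")}
components = strongly_connected_components(edges)
kept = [c for c in components if len(c) > 1]  # assumed "de-noising" rule
print(kept)
```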
27
Step 1: “de-noise” the interaction graph
Isolated component
Gavin et al., Nature, 2002
28
Step 2: based on the graph topology, find protein clusters in the connected components
● Finding clusters
■ ignore directions of edges
■ use Markov Cluster (MCL) algorithm for clustering
● The output consists of sets of closely interacting proteins
● Not every protein is expected to cluster
Gavin et al., Nature, 2002
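A rough sketch of the idea behind MCL, written in plain Python for illustration. It is not the MCL program itself. It alternates expansion (squaring a column-stochastic matrix, which spreads random-walk flow) with inflation (raising entries to a power and renormalizing, which strengthens strong flows). The parameter names `inflation`, `iterations` and `tol`, their default values, and the 1e-3 read-out threshold are assumptions.

```python
def mcl(edges, inflation=2.0, iterations=50, tol=1e-6):
    """Cluster an undirected graph with a basic expansion/inflation loop."""
    nodes = sorted({n for e in edges for n in e})
    idx = {name: i for i, name in enumerate(nodes)}
    n = len(nodes)

    # Symmetric adjacency matrix with self-loops (edge directions ignored).
    m = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for u, v in edges:
        m[idx[u]][idx[v]] = m[idx[v]][idx[u]] = 1.0

    def normalize(mat):
        """Rescale every column so that it sums to 1."""
        for j in range(n):
            total = sum(mat[i][j] for i in range(n))
            for i in range(n):
                mat[i][j] /= total
        return mat

    m = normalize(m)
    for _ in range(iterations):
        # Expansion: matrix squaring.
        expanded = [[sum(m[i][k] * m[k][j] for k in range(n))
                     for j in range(n)] for i in range(n)]
        # Inflation: elementwise power, then column renormalization.
        inflated = normalize([[x ** inflation for x in row] for row in expanded])
        change = max(abs(inflated[i][j] - m[i][j])
                     for i in range(n) for j in range(n))
        m = inflated
        if change < tol:
            break

    # Rows that still carry flow are attractors; the columns they attract
    # form one cluster each.
    clusters = set()
    for i in range(n):
        members = frozenset(nodes[j] for j in range(n) if m[i][j] > 1e-3)
        if members:
            clusters.add(members)
    return clusters


# Hypothetical graph: two triangles joined by the single edge C - D.
edges = [("A", "B"), ("B", "C"), ("A", "C"),
         ("D", "E"), ("E", "F"), ("D", "F"), ("C", "D")]
clusters = mcl(edges)
print(sorted(sorted(c) for c in clusters))
```

On this toy graph the flow across the bridge C - D dies out and each triangle ends up as its own cluster. A larger `inflation` value generally yields smaller, tighter clusters.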
29
Output of a clustering procedure
Gavin et al., Nature, 2002
Step 2: based on the graph topology, find protein clusters in the connected components
Exosome example: additional proteins were found by clustering the network
© 2006 Nature Publishing Group
components could be found under clustering conditions with slightly poorer accuracy or coverage. Therefore, we grouped similar complexes from conditions with coverage and accuracy above 70%. The resulting 5,488 different protein-complex variations were termed ‘complex isoforms’ (Fig. 1). This procedure increased the overall coverage to 90%. The inclusion of parameters resulting in accuracy/coverage below 70% did not increase the coverage, but significantly decreased accuracy (data not shown).
Comparison with the complete collection of known complexes (279 from MIPS and the literature) showed that 257 of 491 complexes were entirely novel, and just 20 of those previously known lacked novel components (Supplementary Table S2). Of the known complexes not recovered by the procedure above, 36 were partially found in single purifications (Supplementary Table S4) but produced a signal too weak to be recovered automatically.
Modular organization of the cell machinery
The above procedure partitions proteins in complexes into two types: core components that are present in most isoforms, and attachments present in only some of them (Fig. 1). This is reminiscent of an organization structure proposed previously that was based on a small-scale analysis27. Complex cores ranged from 1–23 proteins in size (average 3.1 ± 2.5). Among the attachments, we noticed several instances where two or more proteins were always together and present in multiple complexes, which we call ‘modules’ (Supplementary Table S3; on average, associated with 3.3 ± 1.6 cores).
We tested whether this organization was a reflection of biological phenomena by first looking at transcriptional control of the complex components. A quality controlled set of 975 differentially expressed genes derived from microarray analyses15 showed that a large percentage of pairs of proteins within cores were coexpressed at the
Figure 2 | Evidence supporting complex organization. Proteins in each organization level (cores, and so on) are referred to as groups. a, Percentage of cell cycle co-regulated genes found in the same group. b, Percentage of co-regulated proteins in the same group expressed at the same time during the cell cycle. c, d, are as for a, b, but for sporulation genes. e, Average dispersion ranges for protein abundance within each group. f–h, Percentage of groups having exactly the same subcellular localizations, cellular functions or phylogenetic conservation, respectively. i, j, Percentage of pairs for which a direct interaction is known from three-dimensional structures or yeast two-hybrid experiments, respectively. Values on each bar show the total number of counts; n.d., not determined. See Supplementary Information for further details.
Figure 3 | Architecture and modularity of complexes. Proteins are coloured according to their localization20. The line attribute corresponds to socio-affinity indices: dotted lines, 5–10; dashed lines, 10–15; plain lines, >15. Bait proteins are shown in bold and shaded circles around groups of proteins indicate cores and modules. a, The exosome and the Ski module. b, Stages in de-adenylation-dependent mRNA degradation; arrows show the order of events. c, Two distinct families of cap-binding proteins: the nuclear CBC (cap-binding complex) and the cytoplasmic eIF4F.
NATURE|Vol 440|30 March 2006 ARTICLES
633
Gavin et al., Nature, 2006