A Comprehensive Resource for Integrating and Displaying Protein PTMs
-
Upload
amit007thechamp -
Category
Documents
-
view
224 -
download
0
Transcript of A Comprehensive Resource for Integrating and Displaying Protein PTMs
-
8/14/2019 A Comprehensive Resource for Integrating and Displaying Protein PTMs
1/20
This Provisional PDF corresponds to the article as it appeared upon acceptance. Fully formattedPDF and full text (HTML) versions will be made available soon.
A comprehensive resource for integrating and displaying proteinpost-translational modifications
BMC Research Notes 2009, 2:111 doi:10.1186/1756-0500-2-111
Tzong-Yi Lee ([email protected])Justin Bo-Kai Hsu ([email protected])
Wen-Chi Chang ([email protected])Ting-Yuan Wang ([email protected])
Po-Chiang Hsu ([email protected])Hsien-Da Huang ([email protected])
ISSN 1756-0500
Article type Data Note
Submission date 18 November 2008
Acceptance date 23 June 2009
Publication date 23 June 2009
Article URL http://www.biomedcentral.com/1756-0500/2/111
This peer-reviewed article was published immediately upon acceptance. It can be downloaded,printed and distributed freely for any purposes (see copyright notice below).
Articles in BMC Research Notes are listed in PubMed and archived at PubMed Central.
For information about publishing your research in BMC Research Notes or any BioMed Centraljournal, go to
http://www.biomedcentral.com/info/instructions/
BMC Research Notes
2009 Lee et al. , licensee BioMed Central Ltd.This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0),
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
mailto:[email protected]:[email protected]:[email protected]:[email protected]:[email protected]:[email protected]://www.biomedcentral.com/1756-0500/2/111http://www.biomedcentral.com/info/instructions/http://creativecommons.org/licenses/by/2.0http://creativecommons.org/licenses/by/2.0http://www.biomedcentral.com/info/instructions/http://www.biomedcentral.com/1756-0500/2/111mailto:[email protected]:[email protected]:[email protected]:[email protected]:[email protected]:[email protected] -
8/14/2019 A Comprehensive Resource for Integrating and Displaying Protein PTMs
2/20
pp.1
Acomprehensiveresourceforintegratinganddisplayingprotein
post-translationalmodifications
Tzong-YiLee1,JustinBo-KaiHsu1,Wen-ChiChang1,Ting-YuanWang1,Po-ChiangHsu2,andHsien-DaHuang
1,3,4,
1DepartmentofBiologicalScienceandTechnology,InstituteofBioinformaticsandSystems
Biology,NationalChiaoTungUniversity,Hsin-Chu300,Taiwan2DepartmentofBiologicalScienceandTechnology,InstituteofBiochemicalEngineering,
NationalChiaoTungUniversity,Hsin-Chu300,Taiwan3DepartmentofBiologicalScienceandTechnology,NationalChiaoTungUniversity,
Hsin-Chu300,Taiwan4CoreFacilityforStructuralBioinformatics,NationalChiaoTungUniversity,Hsin-Chu300,
Taiwan
Correspondingauthor
Emailaddresses:
TYL: [email protected]
JBKH:[email protected]
WCC: [email protected]
TYW: [email protected]
PCH: [email protected]
HDH: [email protected]
-
8/14/2019 A Comprehensive Resource for Integrating and Displaying Protein PTMs
3/20
pp.2
Abstract
Background
ProteinPost-TranslationalModification (PTM)playsanessentialrole incellularcontrolmechanisms thatadjust
protein physicalandchemicalproperties,folding, conformation, stabilityandactivity, thusalsoaltering protein
function.
Findings
dbPTM (version 1.0), which was developed previously, aimed on a comprehensive collection of protein
post-translational modifications. In thisupdate version (dbPTM2.0),wedevelopeda PTMdatabase towards an
expertsystemofproteinpost-translationalmodifications.Thedatabasecomprehensivelycollectsexperimentaland
predictiveproteinPTMsites.Inaddition,dbPTM2.0wasextendedtoa knowledgebasecomprisingthemodified
sites,solventaccessibilityofsubstrate,proteinsecondaryandtertiarystructures,proteindomains,proteinintrinsic
disorderregion,andproteinvariations.Moreover,thisworkcompilesabenchmarktoconstructevaluationdatasets
forcomputationalstudytoidentifyingPTMsites,suchasphosphorylatedsites,glycosylatedsites,acetylatedsites
andmethylatedsites.
Conclusions
The current release not only provides the sequence-based information, but also annotates the structure-based
informationforproteinpost-translationalmodification.Theinterfaceisalsodesignedtofacilitatetheaccesstothe
resource. Thiseffectivedatabaseisnowfreelyaccessibleathttp://dbPTM.mbc.nctu.edu.tw/.
-
8/14/2019 A Comprehensive Resource for Integrating and Displaying Protein PTMs
4/20
pp.3
Findings
Background
Protein Post-Translational Modification (PTM) plays a critical role in cellular control mechanism, including
phosphorylation for signal transduction, attachment of fatty acids for membrane anchoring and association,
glycosylation for changing protein half-life, targeting substrates, and promoting cell-cell and cell-matrix
interactions, and acetylation and methylation of histone for gene regulation [1]. Several databases collecting
information about protein modifications have been established through high-throughput mass spectrometry in
proteomics. UniProtKB/Swiss-Prot [2] collects many protein modification information with annotation and
structure. Phospho.ELM [3], PhosphoSite [4] and Phosphorylation Site Database [5] were developed for
accumulatingexperimentallyverifiedphosphorylationsites.PHOSIDA[6]integratesthousandsofhigh-confidence
invivophosphorylationsitesidentifiedbymassspectrometry-basedproteomicsinvariousspecies.Phospho3D[7]
isadatabaseof3Dstructuresofphosphorylationsites,whichstoresinformationretrievedfromthephospho.ELM
databaseandisenrichedwithstructuralinformationandannotationsattheresiduelevel.O-GLYCBASE[8]isa
databaseofglycoproteins,mostofwhichincludeexperimentallyverifiedO-linkedglycosylationsites.UbiProt[9]
stores experimental ubiquitylated proteins and ubiquitylation sites, which are implicated in protein degradation
throughanintracellularATP-dependentproteolyticsystem.Moreover,theRESIDproteinmodificationdatabaseisa
comprehensivecollectionofannotationsandstructuresforproteinmodificationsandcross-links,includingpre-,co-,
andpost-translationalmodifications[10].
dbPTM [11] was developed previously to integrate several databases to accumulate known protein
modifications,aswellastheputativeproteinmodificationspredictedbyaseriesofaccuratelycomputationaltools
[12,13].ThisupdatedversionofdbPTMwasenhancedtobecomeaknowledgebaseforproteinpost-translational
modifications,whichcomprises a variety of new features including the modified sites, solvent accessibility of
substrate, protein secondary and tertiary structures, protein domains and protein variations. We also collected
literature related to PTM, protein conservations and the specificity of substrate site. Especially for protein
phosphorylation,thesite-specificinteractionsbetweencatalytickinasesandsubstratesareprovided.Furthermore,a
variety of prediction tools have been developed for more than ten PTM types [14], such as phosphorylation,
glycosylation,acetylation,methylation,sulfationandsumoylation.Thisworkconstructedabenchmarkdatasetfor
computationalstudiesofproteinpost-translationalmodification.Thebenchmarkdatasetcanprovideastandardfor
-
8/14/2019 A Comprehensive Resource for Integrating and Displaying Protein PTMs
5/20
pp.4
measuring the performance of prediction tools that have been presented for identifying post-translational
modificationsitesofproteins.ThewebinterfaceofdbPTMisalsoredesignedandenhancedtofacilitatetheaccess
totheproposedresource.
Dataconstructionandcontent
As shown in Figure 1, the system architecture of dbPTM2.0 database comprises threemajor components: the
integrationofexternalPTMdatabases,thecomputationalidentificationofPTMs,andthestructuralandfunctional
annotations of PTMs. We integrated five PTM databases, including UniProtKB/Swiss-Prot (release 55.0) [1],
Phospho.ELM (version 7.0) [15], O-GLYCBASE (version 6.0) [8], UbiProt (version 1.0) [9] and PHOSIDA
(version 1.0) [6] for obtaining experimental protein modifications. The description and data statistics of these
databasesarebrieflygiveninTableS1(seeAdditionalfile1 -TableS1).Additionally,HumanProteinReference
Database(HPRD)[16],whichcompilesinvaluableinformationrelevanttofunctionsandPTMsofhumanproteins
inhealthanddisease,wasalsointegrated.
In the part ofcomputational identificationofPTMs,KinasePhos-like method [11-13,17]wasapplied for
identifying20typesofPTM,whichcontainatleast30experimentallyverifiedPTMsites.Thedetailedprocessing
flowofKinasePhos-likemethodsisdisplayedinFigureS1(SeeAdditionalfile1-FigureS1).Thelearnedmodels
were evaluated using k-fold cross validation. Table S2 (See Additional file 1 - Table S2) lists the predictive
performanceofthesemodels.Toreducethenumberoffalsepositivepredictions,thepredictiveparameterswereset
toensureamaximalofpredictivespecificity.
ThestatisticsoftheexperimentalPTMsitesandputativePTMsitesinthisintegralPTMdatabaseisgivenin
Table1.AfterremovingtheredundantPTMsitesamongsixdatabases,therearetotally45833experimentalPTM
sitesin thisupdateversion.AllexperimentalPTMsitesarefurthercategorizedbyPTMtypes.For instance,there
are31,363experimentalphosphorylationsitesand2,080experimentalacetylationsitesinthedatabase.Inaddition
totheexperimentalPTMsites,UniProtKB/Swiss-ProtprovidesputativePTMsitesbyusingsequencesimilarityor
evolutionary potential. Moreover, KinasePhos-like methods [11-13, 17] were adopted to construct the profile
hiddenMarkovmodels(HMMs) for twentytypes ofPTMs.Thesemodelswereappliedto identifythe potential
PTMsitesagainstproteinsequencesobtainedfromUniProtKB/Swiss-Prot.AsgiveninTable1,2,560,047sitesfor
allPTMtypeswere identified.The structural andfunctionalannotations ofproteinmodificationswereobtained
fromUniProtKB/Swiss-Prot[18],InterPro[19],ProteinDataBank[20]andRESID[10](SeeAdditionalfile1-
TableS3).
-
8/14/2019 A Comprehensive Resource for Integrating and Displaying Protein PTMs
6/20
pp.5
Utilityandmajorimprovements
Inorder toprovidemore effective informationaboutprotein modifications inthisupdate version,weextended
dbPTMto aknowledge base containing structuralproperties forPTM sites,PTMrelated literature,evolutionary
conservationofPTMsites,subcellularlocalizationofmodifiedproteinsand thebenchmarksetfor computational
studies.Table2showstheenhancementandnewfeaturessupportedinthisstudy.Firstofall,theintegratedPTM
resourceismorecomprehensivethanpreviousdbPTM,whichenrichesthePTMtypes,varyingfrom373to431
PTMtypes.TodetectthepotentialPTMsitesinUniProtKB/Swiss-ProtproteinswithoutanyPTMannotations,the
KinasePhos-like method was applied to 20 PTM types. Especially in protein phosphorylation, more than 60
kinase-specificpredictionmodelswereconstructedandappliedtoidentifythephosphorylationsiteswithcatalytic
kinases.
StructuralpropertiesofPTMsites
In order to facilitate the investigation of structural characteristics surrounding the PTM sites, protein tertiary
structure obtained from ProteinDataBank [20] was graphicallypresentedbyJmolprogram.Forproteinswith
tertiary structures (5% of UniProtKB/Swiss-Prot proteins), the protein structural properties, such as solvent
accessibility and secondary structure of residues, were calculated by DSSP [21]. The solvent accessibility of
residuesandsecondarystructureofresiduesforproteinswithouttertiarystructureswerepredictedbyRVP-net[22]
andPSIPRED[23],respectively.TheintrinsicdisorderregionswereprovidedusingDisopred2[24].
Figure 2 depicts an illustrative example that Insulin Receptor Substrate 1 (IRS1) of human
(UniProtKB/Swiss-ProtID: IRS1_HUMAN)can interactwithInsulinReceptor (INSR)andinvolveintheinsulin
signaling pathway [25]. Three fragments of ISR1 protein have tertiary structures inPDB. Structure 1K3A the
protein region from 891AA to902AA.Two experimental phosphorylation sites S892 andY896 locate inthe
region, and their solvent accessibility and secondary structure can be derived from the tertiary structures.The
solventaccessibilityandsecondarystructureinotherproteinregionswithouttertiarystructureswerecalculatedby
theintegratedprograms,RVP-netandPSIPRED,respectively.
Annotationofcatalytickinasesofproteinphosphorylationsites
In addition to the experimental annotations of catalytic kinases of protein phosphorylation, we applied
KinasePhos-likepredictionmethod[11-13,17]foridentifying20typesofPTM.Figure2givesanexamplethatthe
experimentalphosphorylationsiteS892of IRS1waspredictedtobecatalyzedbyproteinkinaseMAPKandCDK
withthe preferenceof prolineoccurredon position-2and+1 surrounding thephosphorylation site(position0).
Besides,Y896is predictedtobe catalyzed bykinaseIGF1R, theresultis consistentwithprevious investigation
-
8/14/2019 A Comprehensive Resource for Integrating and Displaying Protein PTMs
7/20
pp.6
[26]. Moreover, S892 is a protein variation site, which was mapped to a non-synonymous single nucleotide
polymorphism(SNP),basedontheannotationobtainedfromdbSNP[27].
EvolutionaryconservationofPTMsites
InordertodeterminewhetheraPTMsitesisconservedamongorthologousproteinsequences,weintegratedthe
databaseofClustersofOrthologousGroups(COGs)[28],whichcollected4873COGsin66unicellulargenomes
and4852clustersofeukaryoticorthologousgroups(KOGs)in7eukaryoticgenomes.ClustalW[29]programwas
adopted to implement the alignment of multiple protein sequences in each cluster, and the aligned profile is
providedintheresource.Anexperimentallyverifiedacetyllysinelocatedinaprotein-conservedregionindicatesan
evolutionaryinfluenceinwhichorthologoussitesinotherspeciescouldbeinvolvedinthesametypeofPTM(See
Additional file 1 - Figure S2). Furthermore, as the example shown in Figure 2, two experimentally verified
phosphorylationsitesareconserved.
PTMbenchmarkdatasetforbioinformaticsstudy
Duetothehigh-throughputofmassspectrometryinproteomics,theexperimentalsubstratesequencesofmorethan
tenPTMtypes,suchasphosphorylation,glycosylation,acetylation,methylation,sulfationandsumoylation,were
investigatedandusedfordevelopingthepredictiontools[14].Tounderstandthepredictiveperformanceof these
toolspreviously developed, it iscrucial to have a common standard for evaluating thepredictive performance
amongvariouspredictiontools.Therefore,weconstructedabenchmark,whichcomprisetheexperimentalsubstrate
sequencesforeachPTMtype.
TheprocesstocompiletheevaluationsetsisdescribedinFigureS3(SeeAdditionalfile1-FigureS3),based
oncriteriadevelopedbyChenetal.[30].Toremovetheredundancy,theproteinsequencescontainingthesame
typeofPTMsitesaregroupedbyathresholdof30%identitybyBLASTCLUST[31].Iftheidentityoftwoprotein
sequencesisgreaterthan30%,were-alignedthefragmentsequencesofthesubstratesbyBL2SEQ.Ifthefragment
sequencesoftwo substrates with the same locationare identical, only one ofthe substratewas includedin the
benchmarkdataset.Therefore,twentyPTMtypescontainingmorethan30experimentalsiteswerecompliedinthe
benchmarkdataset.
Enhancedwebinterface
Auser-friendlywebinterfaceisprovidedforsimplesearching,browsing,anddownloadingofproteinPTMdata.In
additiontothedatabasequerybytheproteinname,genename,UniProtKB/Swiss-ProtIDoraccession,itallows
the input of protein sequences for similarity search against UniProtKB/Swiss-Prot protein sequences (See
Additionalfile1-FigureS4).ToprovideanoverviewofPTMtypesandtheirmodifiedresidues,asummarytable
-
8/14/2019 A Comprehensive Resource for Integrating and Displaying Protein PTMs
8/20
pp.7
isprovidedforbrowsingtheinformationandtheannotationsaboutthepost-translationalmodificationtypes,which
are referred to theUniProtKB/Swiss-Prot PTM list (http://www.expasy.org/cgi-bin/lists?ptmlist.txt) and RESID
[10].
Figure 3 shows an example that users can choose the acetylation of lysine (K) to obtain more detailed
informationsuchasthepositionofmodifiedaminoacid,thelocationof themodificationinproteinsequence,the
modifiedchemicalformula,themassdifference,andthesubstratesitespecificity,whichisthepreferenceofamino
acidssurroundingthemodificationsites.Furthermore,thestructuralinformation,suchassolventaccessibilityand
secondarystructuresurroundingthemodifiedsites,areprovided.AlltheexperimentalPTMsitesandputativePTM
sitescanbedownloadedfromthewebinterface.
Conclusions The proposed server enables both wet-lab biologists and bioinformatics researchers to easily explore the
informationabout protein post-translational modifications. This study not only accumulates the experimentally
verifiedPTMsiteswithrelevantliteraturereferences,butalsocomputationallyannotatestwentytypesofPTMsites
against UniProtKB/Swiss-Prot proteins. As given in Table 2, the proposed knowledge base provides effective
informationof proteinPTMs,including sequenceconservation, subcellular localization andsubstratespecificity,
theaveragesolventaccessibilityandthesecondarystructuresurroundingthemodifiedsite.Moreover,weconstruct
aPTMbenchmarkdatasetthatcanbeadoptedforcomputationalstudiesinevaluatingthepredictiveperformance
of various tools about determining PTM sites. Previous investigations have indicated that many protein
modificationscausebindingdomainsforspecificprotein-proteininteractiontoregulatecellularbehavior[32].All
the experimental PTM sites and putative PTM sites are available and downloadable in the web interface.
ProspectiveworkofdbPTMistointegrateprotein-proteininteractiondata.
Availabilityandrequirements
Projectname:dbPTM2.0:AKnowledgeBaseforProteinPost-TranslationalModifications
ASMDprojecthomepage:http://dbPTM.mbc.nctu.edu.tw/
Operatingsystem(s):Platform-independent
ProgrammingLanguage:PHPPerl
Otherrequirements:amodernwebbrowser(withCSSandJavaScriptsupport)
Restrictionstousebynon-academics:None
-
8/14/2019 A Comprehensive Resource for Integrating and Displaying Protein PTMs
9/20
pp.8
Listofabbreviations
PTM:Post-TranslationalModification;HMMs:hiddenMarkovmodels;PDB:ProteinDataBank;SNP:single
nucleotidepolymorphism.
Competinginterests
Theauthorsdeclarethattheyhavenocompetinginterests.
Authors'contributions
HDHconceptualizedtheproject.TYLandHDHdesignedandbuiltthedatabase.TYL,PCHandWCCperformed
dataanalysis.TYLandJBKHdesignedandbuilttheinterfaces.TYL,JBKHandTYWcompiledapreviousversion
ofthedatabase.HDH,TYLandWCCwrotethedraft.Allauthorstestedthedatabaseandinterfaces.Allauthors
readandapprovedthefinalmanuscript.
Acknowledgements
TheauthorswouldliketothanktheNationalScienceCounciloftheRepublicofChinaforfinanciallysupporting
thisresearchundercontractNo.NSC95-2311-B-009-004-MY3andNSC97-2627-B-009-007.
Special thanks for
financialsupportfromtheNationalResearchProgramforGenomicMedicine(NRPGM),Taiwan.Thisworkwas
alsopartiallysupportedbyMOEATU.Funding topaytheOpenAccesspublicationcharges forthis articlewas
providedbyNationalScienceCounciloftheRepublicofChinaandMOEATU.
-
8/14/2019 A Comprehensive Resource for Integrating and Displaying Protein PTMs
10/20
pp.9
References
1. Farriol-MathisN,GaravelliJS,BoeckmannB,DuvaudS,GasteigerE,GateauA,VeutheyAL,BairochA:
Annotationofpost-translationalmodifications intheSwiss-Protknowledgebase.Proteomics 2004,
4(6):1537-1550.
2. BoeckmannB,BairochA,ApweilerR,BlatterMC,EstreicherA,GasteigerE,MartinMJ,MichoudK,
O'DonovanC,PhanIetal:TheSWISS-PROTproteinknowledgebaseanditssupplementTrEMBLin
2003.NucleicAcidsRes2003,31(1):365-370.
3. Diella F, Gould CM, Chica C, Via A, Gibson TJ: Phospho.ELM: a database of phosphorylation
sites--update2008.NucleicAcidsRes2008,36(Databaseissue):D240-244.
4. HornbeckPV,ChabraI,KornhauserJM,SkrzypekE,ZhangB:PhosphoSite:Abioinformaticsresourcededicatedtophysiologicalproteinphosphorylation.Proteomics2004,4(6):1551-1561.
5. Wurgler-Murphy SM,King DM, Kennelly PJ: The Phosphorylation Site Database: A guide to the
serine-, threonine-, and/or tyrosine-phosphorylatedproteins inprokaryotic organisms.Proteomics
2004,4(6):1562-1570.
6. Gnad F, Ren S, Cox J, Olsen JV, Macek B, Oroshi M, Mann M: PHOSIDA (phosphorylation site
database):management,structuralandevolutionaryinvestigation,andpredictionofphosphosites .
GenomeBiol2007,8(11):R250.
7. Zanzoni A, Ausiello G, Via A, Gherardini PF, Helmer-Citterich M: Phospho3D: a database of
three-dimensional structures of proteinphosphorylation sites.NucleicAcids Res2007,35(Database
issue):D229-231.
8. GuptaR,BirchH,RapackiK,BrunakS,HansenJE:O-GLYCBASEversion4.0:areviseddatabaseof
O-glycosylatedproteins.NucleicAcidsRes1999,27(1):370-372.
9. Chernorudskiy AL, Garcia A, Eremin EV, Shorina AS, Kondratieva EV, Gainullin MR:UbiProt: a
databaseofubiquitylatedproteins.BMCBioinformatics2007,8:126.
10. Garavelli JS: The RESID Database of ProteinModifications as a resource and annotation tool.
Proteomics2004,4(6):1527-1533.
11. LeeTY,HuangHD,HungJH,HuangHY,YangYS,WangTH:dbPTM:aninformationrepositoryof
proteinpost-translationalmodification.NucleicAcidsRes2006,34(Databaseissue):D622-627.
12. HuangHD,LeeTY,TzengSW,WuLC,HorngJT,TsouAP,HuangKT:IncorporatinghiddenMarkov
models for identifying protein kinase-specific phosphorylation sites. J Comput Chem 2005,
26(10):1032-1041.
13. Huang HD, Lee TY, Tzeng SW, Horng JT: KinasePhos: a web tool for identifying protein
kinase-specificphosphorylationsites.NucleicAcidsRes2005,33(WebServerissue):W226-229.
14. Zhou F, Xue Y, Yao X, Xu Y: A general user interface for prediction servers of proteins'
post-translationalmodificationsites.NatProtoc2006,1(3):1318-1321.
15. DiellaF,CameronS,GemundC,LindingR,ViaA,KusterB,Sicheritz-PontenT,BlomN,GibsonTJ:
-
8/14/2019 A Comprehensive Resource for Integrating and Displaying Protein PTMs
11/20
pp.10
Phospho.ELM:adatabaseof experimentallyverifiedphosphorylationsites ineukaryoticproteins.
BMCBioinformatics2004,5(1):79.
16. MishraGR,SureshM,KumaranK,KannabiranN,SureshS,BalaP,ShivakumarK,AnuradhaN,Reddy
R,Raghavan TM et al:Human protein reference database--2006 update.Nucleic Acids Res2006,
34(Databaseissue):D411-414.
17. WongYH,LeeTY,LiangHK,HuangCM,WangTY,YangYH,ChuCH,HuangHD,KoMT,HwangJK:
KinasePhos2.0:awebserverforidentifyingproteinkinase-specificphosphorylationsitesbasedon
sequencesandcouplingpatterns.NucleicAcidsRes2007,35(WebServerissue):W588-594.
18. YipYL,ScheibH,DiemandAV,GattikerA, FamigliettiLM,GasteigerE,BairochA:TheSwiss-Prot
variant page and theModSNP database: a resource for sequence and structure information on
humanproteinvariants.HumMutat2004,23(5):464-470.
19. MulderNJ,ApweilerR,AttwoodTK,BairochA,BatemanA,BinnsD,BiswasM,BradleyP,BorkP,
BucherPetal: InterPro:anintegrateddocumentation resource for proteinfamilies,domainsandfunctionalsites.BriefBioinform2002,3(3):225-235.
20. DeshpandeN,AddessKJ,BluhmWF,Merino-OttJC,Townsend-MerinoW,ZhangQ,KnezevichC,XieL,
Chen L, Feng Z et al:The RCSB Protein Data Bank: a redesigned query system and relational
databasebasedonthemmCIFschema .NucleicAcidsRes2005,33(Databaseissue):D233-237.
21. Kabsch W, Sander C: Dictionary of protein secondary structure: pattern recognition of
hydrogen-bondedandgeometricalfeatures.Biopolymers1983,22(12):2577-2637.
22. AhmadS,GromihaMM,SaraiA:RVP-net:onlinepredictionofrealvaluedaccessiblesurfaceareaof
proteinsfromsinglesequences.Bioinformatics2003,19(14):1849-1851.
23. McGuffinLJ,BrysonK,JonesDT:ThePSIPREDproteinstructurepredictionserver.Bioinformatics
2000,16(4):404-405.
24. WardJJ,SodhiJS,McGuffinLJ,BuxtonBF,JonesDT:Predictionandfunctionalanalysisofnative
disorderinproteinsfromthethreekingdomsoflife.JMolBiol2004,337(3):635-645.
25. GustafsonTA,HeW,CraparoA, SchaubCD,O'Neill TJ:Phosphotyrosine-dependent interactionof
SHC and insulin receptor substrate 1 with the NPEYmotif of the insulin receptor via a novel
non-SH2domain.MolCellBiol1995,15(5):2500-2508.
26. HersI,BellCJ,PooleAW,JiangD,DentonRM,SchaeferE,TavareJM:Reciprocalfeedbackregulation
of insulin receptor and insulin receptor substrate tyrosine phosphorylation by phosphoinositide
3-kinaseinprimaryadipocytes.BiochemJ2002,368(Pt3):875-884.
27. Sherry ST,WardMH,KholodovM,Baker J, Phan L, Smigielski EM,SirotkinK:dbSNP: theNCBI
databaseofgeneticvariation.NucleicAcidsRes2001,29(1):308-311.
28. TatusovRL,FedorovaND,JacksonJD,JacobsAR,KiryutinB,KooninEV,KrylovDM,MazumderR,
MekhedovSL,Nikolskaya ANetal:TheCOG database: anupdatedversion includeseukaryotes.
BMCBioinformatics2003,4:41.
29. Thompson JD, Higgins DG, Gibson TJ: CLUSTAL W: improving the sensitivity of progressive
multiplesequencealignmentthroughsequenceweighting,position-specificgappenaltiesandweight
matrixchoice.NucleicAcidsRes1994,22(22):4673-4680.
-
8/14/2019 A Comprehensive Resource for Integrating and Displaying Protein PTMs
12/20
pp.11
30. ChenH,XueY,HuangN,YaoX,SunZ:MeMo: awebtool forprediction ofproteinmethylation
modifications.NucleicAcidsRes2006,34(WebServerissue):W249-253.
31. AltschulSF,MaddenTL,SchafferAA,ZhangJ,ZhangZ,MillerW,LipmanDJ:GappedBLASTand
PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 1997,
25(17):3389-3402.
32. SeetBT,DikicI,ZhouMM,PawsonT:Readingproteinmodificationswithinteractiondomains .Nat
RevMolCellBiol2006,7(7):473-483.
-
8/14/2019 A Comprehensive Resource for Integrating and Displaying Protein PTMs
13/20
pp.12
Figures
Figure1-Thesystemarchitectureoftheknowledgebaseforproteintranslational
modification.
Itcomprisesthethreemajorcomponents:integrationofexternalexperimentalPTMdatabases,learningand
predictionof20typesofPTM,andannotationsofPTMknowledge(moredetailsinthetext).
Figure2-Apartofresultpageonthewebinterface.
AnexampleofgraphicalpresentationofPTMsitesandthestructuralcharacteristicsofhumanproteinIRS1.
Figure3-Anillustrativeexampletoshowthecatalyticspecificityofacetyllysine.
-
8/14/2019 A Comprehensive Resource for Integrating and Displaying Protein PTMs
14/20
pp.13
Tables
Table1-ThestatisticsofexperimentalPTMsitesandputativePTMsitesinthisstudy.
PTMtypes Modifiedresidues
No.of
experimentalsites
No.ofputative
sitesfrom
UniProtKB/Swis
s-Prot
No.of
HMM-predictedsitesindbPTM
Phosphorylation Serine,threonine,tyrosine,andhistidine 31,363 36,080 1,815,472
N-linked
GlycosylationAsparagineandlysine 3,264 77,571 179,955
O-linked
Glycosylation
Lysine,praline,serine,threonine,and
tyrosine1,896 2,558 386,545
C-linked
GlycosylationTryptophan 53 52 4,015
AcetylationN-terminalofsomeresiduesandsidechain
oflysineorcysteine 2,080 5,143 1,206
Amidation
GenerallyattheC-terminalofamature
activepeptideafteroxidativecleavageof
lastglycine
2,150 1,117 24,352
Hydroxylation Generallyofasparagine,aspartate,proline
orlysine 1,033 1,074 9,743
Methylation
GenerallyofN-terminalphenylalanine,
sidechainoflysine,arginine,histidine,
asparagineorglutamate,andC-terminal
cysteine
746 2,846 18,716
Pyrrolidone
CarboxylicAcid
N-terminalglutaminewhichhasformedan
internalcycliclactam. 598 584 12,322
Gamma-Carboxygl
utamicAcid Glutamate 371 361 1,924
Farnesylation Cysteine 61 216 5,349
Myristoylation Glycine 108 765 10,998
N-Palmitoylation Cysteine 33 1,279 6,554
S-Palmitoylation Cysteine 177 2,303 21,287
Geranyl-geranylati
onCysteine 47 819 14,317
S-diacylglycerol
cysteineCysteine 36 1,529 8,977
GPIanchoring C-terminalasparagine,asparate,andserine 27 681 -
DeamidationAmidatedasparagineandglutamine(needs
tobefollowedbyaG)38 26 2,022
Sulfation Serine,threonine,andtyrosine 196 626 15,654
Sumoylation Lysine 77 259 10,342
Ubiquitylation Lysine 286 516 8,865
ADP-ribosylation Arginine 3 203 -
Formylation OftheN-terminalmethionine 28 35 -
Citrullination Arginine 27 91 -
Nitration Tyrosine 47 5 1,432Bromination Tryptophan 18 3 -
FAD
O-8alpha-FADtyrosine,Pros-8alpha-FAD
histidine,S-8alpha-FADcysteine,and
Tele-8alpha-FADhistidine
12 116 -
S-nitrosylation Cysteine 9 93 -
Others 1049 2,958 -
Total 45,833 139,909 2,560,047
-
8/14/2019 A Comprehensive Resource for Integrating and Displaying Protein PTMs
15/20
pp.14
Table2-TheenhancedfeaturesinthisexpandingPTMdatabase(dbPTM2.0).
Features PreviousPTMdatabase[11] dbPTM2.0
Proteinentry UniProtKB/Swiss-Prot(release46) UniProtKB/Swiss-Prot(release55)
ExperimentalPTMresource
UniProtKB/Swiss-Prot,Phospho.ELM,andO-GLYCBASE
UniProtKB/Swiss-Prot,Phospho.ELM,
PHOSIDA,HPRD, O-GLYCBASE,and
UbiProt
Computationally
predictedPTMs
Phosphorylation,glycosylation,and
sulfation
About25typesofPTM(phosphorylation,
glycosylation,sulfation,acetylation,
methylation,sumoylation,hydroxylation,
etc.)
Proteinstructure ProteinDataBank(PDB) ProteinDataBank(PDB)
PTMannotation RESID(373PTMannotations) RESID(431PTMannotations)
Structuralinvestigation
ofPTMsites-
Solventaccessibility, secondarystructure
andintrinsicdisorderregion
Kinasefamily
annotation- KinBase
Proteindomain InterPro InterPro
Proteinvariation Swiss-ProtandEnsembl Swiss-ProtandEnsembl
Site-specificPTM
literature-
ExtractingthePTM-relatedliteraturesfrom
UniProtKB/Swiss-Prot,Phospho.ELM,
HPRD, O-GLYCBASE,andUbiProt
Substratespecificity -Aminoacidfrequency,solventaccessibility,
secondarystructureanddisorderregion
surroundingmodifiedsites
Evolutionary
conservationofPTM
sites
- COGandClustalW
PTMbenchmarksetfor
computationalstudies-
Providingthebenchmarkforconstructing
PTMtestsettocomparethepredictive
performanceofpredictiontools
Relationshipbetween
PTMandsubcellular
localization
- Analyzing the relationship between PTMandsubcellularlocalization
Graphicalvisualization
PTM, solvent accessibility, secondary
structure, protein variation, protein
domain,andtertiarystructure
PTM,solventaccessibility,secondary
structure,proteinvariation,proteindomain,
tertiarystructure,orthologousconserved
regions,substratesitespecificityandprotein
interactionnetwork
-
8/14/2019 A Comprehensive Resource for Integrating and Displaying Protein PTMs
16/20
pp.15
Additionalfiles
Additionalfile1
Fileformat:DOC
Title:Supplementaryfigures(S1,S2,S3,andS4)andtables(S1,S2,andS3)
Description: The data provided 4 figures and 3 tables. The description of each figures and
tablesaregivenbelow.FigureS1.ThedetailedprocessingflowofKinasePhos-likemethods.
FigureS2.Themultiplesequencealignmentoforthologousconservedregions.FigureS3.The
flowcharttoremovedataredundance.FigureS4.Exampleofsearchwebpages.TableS1.Data
statisticsoftheintegratedresources.TableS2.Theparametersandpredictiveperformanceof
thetrainedmodelswithbestaccuracyforeachPTMtype.TableS3.Thelistofintegrated
databasesandprograms.
-
8/14/2019 A Comprehensive Resource for Integrating and Displaying Protein PTMs
17/20
-
8/14/2019 A Comprehensive Resource for Integrating and Displaying Protein PTMs
18/20
-
8/14/2019 A Comprehensive Resource for Integrating and Displaying Protein PTMs
19/20
-
8/14/2019 A Comprehensive Resource for Integrating and Displaying Protein PTMs
20/20
Additional files provided with this submission:
Additional file 1: dbptm2_bmc rn_additional filet_20090226_wenchi.doc, 1023Khttp://www.biomedcentral.com/imedia/1323127087257761/supp1.doc
http://www.biomedcentral.com/imedia/1323127087257761/supp1.dochttp://www.biomedcentral.com/imedia/1323127087257761/supp1.doc