Post on 14-Jul-2020
BioinfRes SoSe 16
Bioinformatics Resources - Swissprot -
Lecture & Exercises: Prof. B. Rost, Dr. L. Richter, J. Reeb
Institut für Informatik I12

Putative Schedule
Apr. 22nd   Intro, General Overview (1. sh.)
Apr. 29th   Sequence Databases (2. sh.)
May 6th     No lecture
May 13th    Sequence Databases (3. sh.)
May 20th    Structure Databases (4. sh.)*
May 27th    SQL (5. sh.)
Jun 3rd     SQL (6. sh.)
Jun 10th    No-SQL (7. sh.)
Jun 17th    No-SQL (8. sh.)*
Jun 24th    JavaScript / UI (9. sh.)
Jul 1st     Web Services (10. sh.)
Jul 8th     Bioinformatics Suites / Forums
Jul 15th    Wrap Up, Q&A
Jul 28th    Exam, 10:30-12:00 MW1050
* These exercises can earn you a bonus

XML Infusion (in 10 sec)
● compilation from http://www.w3schools.com/xml/default.asp
● XML is a software- and hardware-independent tool to store and to transport data
● XML stands for eXtensible Markup Language
● designed to store and transport data
● designed to be self-descriptive
● W3C recommendation
● it does NOT DO anything

About Tags
● XML tags are not predefined like HTML tags
● everybody can/has to invent their own tags
● new tags can be added any time
● the author has to define content and structure of the document
● everything is plain text

Document Structure
  <?xml version="1.0" encoding="UTF-8"?>
  <bookstore>
    <book category="cooking">
      <title lang="en">Everyday Italian</title>
      <author>Giada De Laurentiis</author>
      <year>2005</year>
      <price>30.00</price>
    </book>
    <book category="children">
      <title lang="en">Harry Potter</title>
      <author>J K. Rowling</author>
      <year>2005</year>
      <price>29.99</price>
    </book>
    ....
  </bookstore>
taken from http://www.w3schools.com/xml/xml_usedfor.asp
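
The tree structure of the bookstore document can be explored with a few lines of Python. This is only a minimal sketch using the standard library module xml.etree.ElementTree; the constant BOOKSTORE_XML and the helper list_books are names made up for this illustration, and the "...." placeholder of the slide is left out.

```python
import xml.etree.ElementTree as ET

# The bookstore document from the slide (without the "...." placeholder).
BOOKSTORE_XML = """<?xml version="1.0" encoding="UTF-8"?>
<bookstore>
  <book category="cooking">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
  </book>
  <book category="children">
    <title lang="en">Harry Potter</title>
    <author>J K. Rowling</author>
    <year>2005</year>
    <price>29.99</price>
  </book>
</bookstore>"""


def list_books(xml_text):
    """Return (category, title, price) for every <book> child of the root."""
    # Parse from bytes so that the encoding declaration in the prolog is honoured.
    root = ET.fromstring(xml_text.encode("utf-8"))
    books = []
    for book in root.findall("book"):          # child elements of the root
        category = book.get("category")        # attribute value
        title = book.findtext("title")         # text content of a child element
        price = float(book.findtext("price"))
        books.append((category, title, price))
    return books


if __name__ == "__main__":
    for entry in list_books(BOOKSTORE_XML):
        print(entry)
```

The root element <bookstore> is the parent of both <book> elements, which are siblings of each other; attributes are read with get(), element text with findtext().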

Syntax Rules
● elements are defined using tags: <tagName> ... </tagName> or <tagName/>
● elements can be nested (contain other elements - parent and child nodes, sibling nodes)
● elements can have text content
● each document must contain ONE root element that is the parent of all other elements

Syntax Refined
● the prolog line <?xml ...> is optional
● tags must be (self-)closed
● tags are case sensitive
● tags must be properly nested:
  <a><b>....</a></b>  Wrong!
  <a><b>....</b></a>  Right!

Syntax Refined
● tags may have attributes
● attribute values must always be quoted
● some special characters cannot be used directly
● -> coded by entity references:
  &lt;    <   less than
  &gt;    >   greater than
  &amp;   &   ampersand
  &apos;  '   apostrophe
  &quot;  "   quotation mark
● comments: <!-- .... -->
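
The nesting rule and the entity references can be tried out directly. The following is a small sketch, assuming Python's standard library parser (xml.etree.ElementTree) and xml.sax.saxutils.escape; the helper names is_well_formed and make_note are invented for this example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def is_well_formed(xml_text):
    """Return True if the parser accepts the text, False on a syntax error."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


def make_note(text):
    """Wrap arbitrary text in a <note> element, escaping special characters."""
    # escape() handles &, < and > by default; the two quote characters
    # are added through the optional mapping.
    return "<note>" + escape(text, {"'": "&apos;", '"': "&quot;"}) + "</note>"


if __name__ == "__main__":
    print(is_well_formed("<a><b>....</b></a>"))   # properly nested
    print(is_well_formed("<a><b>....</a></b>"))   # wrongly nested
    print(make_note('1 < 2 & "ok"'))
```

A raw "<" or "&" inside element text makes the document ill-formed, which is why the entity references are needed.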

Tag Names
● case sensitive
● must start with a letter or underscore
● must not start with the letters xml (in any combination of upper and lower case)
● can contain: letters, digits, hyphens, underscores and periods
● cannot contain spaces
● apply common sense and a consistent style
● avoid: minus (-), period (.), colon (:) and non-English characters for compatibility reasons

XML Element
● everything between the start and the end tag
● tags are included
● can contain:
  - text
  - attributes
  - other elements
  - a mix of all
● are extensible

XML Attributes
● values must be quoted: single or double quotes
● the quote character that is not used as delimiter can appear inside the value
● there is no fixed rule for choosing between attribute and element, but:
  - attributes cannot contain multiple values
  - attributes cannot contain tree structures
  - attributes are not easily expandable
● useful to store metadata, like an element id, etc.

A Glimpse of Namespaces
● prevent tag name collisions between different authors/applications/domains
● implemented by the introduction of prefixes
● defined as an attribute: xmlns:prefix="URI"
● usage: <prefix:tagName>
● the URI only needs to be unique
● used to integrate other specifications, e.g. XSLT
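
How prefixes resolve a name collision can be sketched as follows. The document, the two example.org URIs and the helper names are hypothetical and only serve the illustration; the parser is again Python's xml.etree.ElementTree, which expands every prefixed tag to the form {URI}tagName.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: two different <table> elements, told apart by prefix.
DOC = """<root xmlns:h="http://example.org/html"
               xmlns:f="http://example.org/furniture">
  <h:table><h:td>Apples</h:td></h:table>
  <f:table><f:name>Coffee table</f:name></f:table>
</root>"""

# Prefix-to-URI mapping used for searching; the URIs only have to be unique.
NS = {"h": "http://example.org/html", "f": "http://example.org/furniture"}


def child_tags(xml_text):
    """Return the expanded tag names of the root's children."""
    root = ET.fromstring(xml_text)
    return [child.tag for child in root]


def furniture_name(xml_text):
    """Find the <f:name> element below <f:table> via the prefix mapping."""
    root = ET.fromstring(xml_text)
    return root.find("f:table/f:name", NS).text


if __name__ == "__main__":
    print(child_tags(DOC))
    print(furniture_name(DOC))
```

Both children are called "table", but their expanded names differ because the prefixes are bound to different URIs.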

Levels of Correctness
● well formed: a document that obeys the syntax rules:
  - root element
  - closing tags
  - case sensitive
  - properly nested
  - attribute values quoted
● valid documents: in addition to being well formed, they also conform to a document type definition (format specification)

Document Type Definitions
● two ways to specify a document structure:
● DTD: Document Type Definition
● XML Schema: XML-based alternative to DTD

Example
  <?xml version="1.0" encoding="UTF-8"?>
  <!DOCTYPE note SYSTEM "Note.dtd">
  <note>
    <to>Tove</to>
    <from>Jani</from>
    <heading>Reminder</heading>
    <body>Don't forget me this weekend! &copyright;</body>
  </note>

Example
  <!DOCTYPE note [
    <!ELEMENT note (to,from,heading,body)>
    <!ELEMENT to (#PCDATA)>
    <!ELEMENT from (#PCDATA)>
    <!ELEMENT heading (#PCDATA)>
    <!ELEMENT body (#PCDATA)>
    <!ENTITY copyright "Copyright by ..">
  ]>

XML DTD
● referenced from a document with: <!DOCTYPE note SYSTEM "Note.dtd">
● !DOCTYPE defines the root element
● !ELEMENT defines the structure of the elements
● #PCDATA means parsed character data (parseable text)
● !ENTITY defines special characters or strings

XML Schema
● alternative to DTD
  <xs:element name="note">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="to" type="xs:string"/>
        <xs:element name="from" type="xs:string"/>
        <xs:element name="heading" type="xs:string"/>
        <xs:element name="body" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
● support of data types and namespaces
● written in XML and extensible!

Names and Other Complications
Amos Bairoch
taken from http://web.expasy.org/images/people/Amos_Bairoch.jpg
Ioannis Xenarios
taken from http://www.isb-sib.ch/people/Ioannis.Xenarios

History
1986  A. Bairoch created Swiss-Prot at the University of Geneva, since 1988 in collaboration with EMBL/EBI
1993  Launch of ExPASy, together with Ron Appel
1998  Foundation of SIB (Swiss Institute of Bioinformatics)
2002  Foundation of the UniProt consortium by EBI, SIB and PIR

UniProt Components:
● UniProtKB:
  - UniProtKB/Swiss-Prot
  - UniProtKB/TrEMBL
● UniParc: pure sequence archive, no annotations
● UniRef: consists of three databases of clustered sets of protein sequences (UniRef100, UniRef90, UniRef50) using the CD-HIT algorithm
● UniMES: data from metagenomic and environmental samples, not in UniProtKB

ExPASy
● http://www.expasy.org
● Expert Protein Analysis System (1993)
● now: SIB ExPASy Bioinformatics Resources Portal
● Artimo P, Jonnalagedda M, Arnold K, Baratin D, Csardi G, de Castro E, Duvaud S, Flegel V, Fortier A, Gasteiger E, Grosdidier A, Hernandez C, Ioannidis V, Kuznetsov D, Liechti R, Moretti S, Mostaguir K, Redaschi N, Rossier G, Xenarios I, and Stockinger H. ExPASy: SIB bioinformatics resource portal, Nucleic Acids Res, 40(W1):W597-W603, 2012.

Expasy Categories
● Proteomics
● Genomics
● Structural Bioinformatics
● Systems biology
● Phylogeny/evolution
● Population genetics
● Transcriptomics
● Biophysics
● Imaging
● Drug Design

Resource Description
1. Resource name and description
2. Maintaining SIB group
3. Scientific category
4. Keywords: a controlled vocabulary is used to tag the resource
5. URL for the web interface and for the download, if available
6. Software type: website, command line interface, GUI, etc.
7. Status: green checkbox if currently available

UniProt/SwissProt Statistics
● Release 2016_05, May 11th
● taken from http://web.expasy.org/docs/relnotes/relstat.html
● 551,193 sequence entries (548,454 in 2015_05) / 196,822,649 amino acids (195,409,447 in 2015_05)

UniProt/SwissProt Statistics
● Growth over one year: 2016_5 vs 2015_5 (2015_5 values in parentheses)

Protein existence (PE)            Entries              %
1. Evidence at protein level      92,536 (85,419)      16.8 (15.6)
2. Evidence at transcript level   57,757 (61,814)      10.5 (11.3)
3. Inferred from homology         387,589 (387,733)    70.3 (70.7)
4. Predicted                      11,358 (11,526)      2.1 (2.1)
5. Uncertain                      1,953 (1,962)        0.4 (0.4)
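
The percentage column follows directly from the entry counts. As a quick cross-check, a minimal sketch (the dictionary and function names are made up here; the counts are the 2016_5 values of the table above):

```python
# Entry counts per protein existence level, release 2016_5 (from the table).
PE_COUNTS_2016_5 = {
    "Evidence at protein level": 92536,
    "Evidence at transcript level": 57757,
    "Inferred from homology": 387589,
    "Predicted": 11358,
    "Uncertain": 1953,
}


def percentages(counts):
    """Return each count as a percentage of the total, rounded to one decimal."""
    total = sum(counts.values())
    return {level: round(100 * n / total, 1) for level, n in counts.items()}


if __name__ == "__main__":
    print(sum(PE_COUNTS_2016_5.values()))
    for level, pct in percentages(PE_COUNTS_2016_5).items():
        print(f"{level}: {pct}%")
```

The five counts add up to the 551,193 sequence entries reported for release 2016_05.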

Development
figure taken from http://web.expasy.org/docs/relnotes/relstat1.png for release 2015_5

More Numbers (rel. 2015_5)
● Represented species: 13,209
● Top 20 species: 116,206 sequences, i.e. 21.3% of the total number of sequences

Entries   No of Species
1         5,495
2         1,899
3         1,023
4         657
5         487
6         399
7         289
8         228
9         214
10        122
11-20     711
21-50     426
51-100    213
>100      1,046

Species Representation (rel. 2015_5)

Top   Frequency   Species
1     20,198      Homo sapiens (Human)
2     16,711      Mus musculus (Mouse)
3     13,888      Arabidopsis thaliana (Mouse-ear cress)
4     7,921       Rattus norvegicus (Rat)
5     6,718       Saccharomyces cerevisiae (Baker's yeast)
6     5,993       Bos taurus (Bovine)
7     5,103       Schizosaccharomyces pombe (Fission yeast)
8     4,433       Escherichia coli K12
9     4,185       Bacillus subtilis
10    4,131       Dictyostelium discoideum (Slime mold)
...   ...         ...

Representation of the Divisions (rel. 2015_5)
● Archaea (4%), 19340
● Bacteria (61%), 332110
● Eukaryota (33%), 180411
● Viruses (3%), 16593

Distribution of Eukaryota (rel. 2015_5)
● Human (11%), 20199
● Other Mammalia (26%), 46146
● Other Vertebrata (10%), 17823
● Viridiplantae (20%), 36480
● Fungi (17%), 31527
● Insecta (5%), 8781
● Nematoda (2%), 4417
● Other (8%), 15038

Length Distribution (rel. 2015_5)
[Figure: distribution of sequence lengths; axis scale from 0 to 70000]

Amino Acid Composition (rel. 2015_5)
figure taken from http://web.expasy.org/docs/relnotes/relstat.html gray=aliphatic, red=acidic, green=small hydroxy, blue=basic, black=aromatic, white=amide, yellow=sulfur

SwissProt Annotation Process
● defined in http://www.uniprot.org/docs/sop_manual_curation.pdf
● explained in http://www.uniprot.org/help/manual_curation

Annotation Phases
1. Sequence curation
2. Sequence analysis
3. Literature curation
4. Family-based curation
5. Evidence attribution
6. Quality assurance, integration and update

Sequence Curation
● more than 95% are translated CDS from INSDC
● other sources: PDB, direct protein sequencing, projects not submitting to INSDC
● sequences are selected according to curation priorities (http://www.uniprot.org/program/)
● results in the "canonical sequence" for a gene/species pair

Steps toward the canonical sequence
● Entry selection
● Run BLAST similarity searches to identify additional sequences for the same gene
● Identify homologs by reciprocal BLAST and phylogeny-based resources
● Lock selected entries for other curators to prevent duplication
● Prepare sequence alignments with T-Coffee, Muscle, ClustalW
● Merge into the canonical sequence, chosen as:
  - the most prevalent sequence
  - the sequence most similar to orthologous sequences found in other species
  - the sequence that, based on length and amino acid composition, allows the clearest description
  - default: the longest sequence
● Record conflicts and variations

Sequence Analysis
● Several analysis programs are applied to the sequences for:
  - topological features
  - post-translational modifications
  - domains
● all results are manually checked and included in or excluded from the annotation

Topological Analysis

Tools      Prediction
SignalP    Presence and location of signal peptides
TargetP    Presence and location of transit peptides
Predotar   Mitochondrial, plastid or ER targeting sequences
ESKW       Transmembrane domains
MEMSAT     Transmembrane domains
TMHMM      Transmembrane domains
Phobius    Discriminates transmembrane and signal regions

Post-translational Modification Analysis

Tools           Prediction
GPI-predictor   GPI lipid anchor sites
NetNGlyc        N-glycosylation sites
NetOGlyc        O-glycosylation sites
NMT Predictor   N-terminal myristoylation sites
Sulfinator      Tyrosine sulfation sites

Domain Analysis

Tools      Prediction
ps_scan    internal PROSITE profile, pattern and rule scanning
InterPro   retrieves non-PROSITE motif matches using InterPro database or InterProScan
Coils      Coiled-coils regions
polyAA     internal program which identifies homopolymeric stretches of amino acids
REPEAT     identifies the following repeats: Ankyrin, Armadillo, HAT, HEAT, Kelch, Leucine-rich, PFTA, PFTB, RCC1, TPR, WD40
Automatically selected results are returned in a graphical interface which allows visualisation of the predictions (Figure 1). Selected features are shown in green and unselected features are shown in red. The selected/unselected state of a feature can be toggled by clicking on it.
Figure 1. UniProtKB sequence analysis results displayed in graphical interface
All predictions are manually reviewed and relevant results are selected for inclusion in the entry. The sequence analysis platform then transforms the selected features into UniProtKB annotation by applying a set of automatic annotation rules (Figure 2).
taken from http://www.uniprot.org/docs/sop_manual_curation.pdf

Literature Curation
● Identification of relevant scientific literature from
  - literature and text mining resources (PubMed, Europe PMC, iHOP, Textpresso)
  - additions from other sources made by the curator
● Information is extracted from the full text:
  - general annotations (not position specific)
  - position-specific annotations

General Annotations
● http://www.uniprot.org/help/general_annotation
● position-independent
● contains mostly general biological information like: functions, catalytic activity, cofactor, enzyme regulation, subunit structure, pathway, ...

Sequence Annotations
● position-dependent
● http://www.uniprot.org/help/sequence_annotation
● regions or sites of interest like post-translational modifications, binding sites, active sites, etc.
● contains several subsections: molecule processing, regions, sites, amino acid modifications, natural variants, experimental info, secondary structure

Family-based Curation
● Evaluation and curation of homologs as described above
● Standardization of annotation of homologs
● Propagation of annotation across the homologs to ensure consistency

Evidence Attribution
● Every annotation is attributed to its original source
● Every annotation can be traced back and evaluated
● To distinguish the kinds of evidence, 7 codes from the Evidence Code Ontology (ECO) are used for manually curated entries
● http://www.uniprot.org/help/evidences
● Additional GO term annotation
done through the use of a subset of evidence codes from the Evidence Code Ontology (ECO) (24). There are seven ECO evidence codes used in manually curated entries as shown in Table 2.
Table 2. Evidence Code Ontology (ECO) codes used during the UniProt manual curation process

ECO code / Term name
    Usage

ECO:0000269 / experimental evidence used in manual assertion
    Information for which there is published experimental evidence
ECO:0000303 / non-traceable author statement used in manual assertion
    Information based on author statements in scientific articles for which there is no experimental support
ECO:0000250 / sequence similarity evidence used in manual assertion
    Information which has been propagated from a related experimentally characterised protein
ECO:0000312 / imported information used in manual assertion
    Information which has been imported from another database and manually verified
ECO:0000305 / curator inference used in manual assertion
    Information which has been inferred by a curator based on his/her scientific knowledge or on the scientific content of an article
ECO:0000255 / match to sequence model evidence used in manual assertion
    Information originating from the UniProt automatic annotation systems or any of the sequence analysis programs used during the manual curation process and which has been manually verified
ECO:0000244 / combinatorial evidence used in manual assertion
    Information which is manually curated based on a combination of experimental and computational evidence

Full details of the evidences used in UniProtKB are available at http://www.uniprot.org/manual/evidences.

4.11 GO annotation
Gene Ontology (GO) terms are assigned based on experimental data from the literature. Relevant terms are identified using the QuickGO (25) browser and are assigned to entries using the Protein2GO curation tool. This tool has been developed within the UniProt group and is used both by UniProt and by other members of the GO Consortium. GO terms are also propagated to homologous proteins where appropriate. The procedure is described in more detail at http://www.ebi.ac.uk/GOA/ManualAnnotationEfforts.

4.12 Quality control and integration
All finished entries are run through a series of automated checks which verify a large number of biological rules such as the positions and relevance of amino acids cited in the entry. Any reported errors are corrected. Once an entry has passed the automated checks, it undergoes manual review by a senior curator to ensure that all relevant sequences have been merged, that all relevant literature has been added, that the annotation has been added correctly, and that all relevant sequence analysis results have been included. Once an entry has passed the automated and manual quality control checks, it is integrated into the database.

4.13 Unlock finished entries
Integrated entries are unlocked so that they are available for further curation.
taken from http://www.uniprot.org/docs/sop_manual_curation.pdf

Quality Control and Integration
● Finished entries run through a series of rule-based checks, concerning especially positions and regions
● All errors are corrected
● The entry is manually reviewed by a senior curator
● Finally it is integrated into the database
● The finished entries are unlocked for further curation

Demonstration
● http://www.uniprot.org/uniprot/P62756#section_features

The Swiss-Prot Flat File
● http://web.expasy.org/docs/userman.html
● An entry is composed of different line types
● Line types have their own format
● Follows the EMBL Nucleotide Sequence Database format as closely as possible
● 2 sections:
  - core data (sequence data, citation info, taxonomy)
  - annotations (function, modification, domains, secondary and quaternary structure, disease associations, conflicts, etc.)

The following table lists the available two-letter line codes. Each code is followed by three blanks.

Line code   Content                        Occurrence in an entry
ID          Identification                 Once; starts the entry
AC          Accession number(s)            Once or more
DT          Date                           Three times
DE          Description                    Once or more
GN          Gene name(s)                   Optional
OS          Organism species               Once or more
OG          Organelle                      Optional
OC          Organism classification        Once or more
OX          Taxonomy cross-reference       Once
OH          Organism host                  Optional
RN          Reference number               Once or more
RP          Reference position             Once or more
RC          Reference comment(s)           Optional
RX          Reference cross-reference(s)   Optional
RG          Reference group                Once or more (Optional if RA line)
RA          Reference authors              Once or more (Optional if RG line)
RT          Reference title                Optional
RL          Reference location             Once or more
CC          Comments or notes              Optional
DR          Database cross-references      Optional
PE          Protein existence              Once
KW          Keywords                       Optional
FT          Feature table data             Once or more in Swiss-Prot, optional in TrEMBL
SQ          Sequence header                Once
(blanks)    Sequence data                  Once or more
//          Termination line               Once; ends the entry
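
Because every line starts with a two-letter code followed by three blanks, an entry can be split into its line types with simple string slicing. The following is only a sketch of that idea, not an official parser: TOY_ENTRY is a made-up record stitched together from example lines on these slides (it is not a real Swiss-Prot entry, and the sequence line is invented), and group_by_line_code is a hypothetical helper name.

```python
from collections import defaultdict

# Toy record assembled from example lines of these slides.
# NOT a real Swiss-Prot entry; the sequence line is invented.
TOY_ENTRY = """\
ID   CYC_BOVIN               Reviewed;         104 AA.
AC   P12345;
DT   01-FEB-1999, integrated into UniProtKB/TrEMBL.
DT   15-OCT-2000, sequence version 2.
DT   15-DEC-2004, entry version 5.
OS   Bos taurus (Bovine).
KW   Membrane; Zinc.
     MGDVEKGKKI FVQKCAQCHT
//
"""


def group_by_line_code(entry_text):
    """Group the lines of one flat-file entry by their two-letter line code."""
    groups = defaultdict(list)
    for line in entry_text.splitlines():
        if line.startswith("//"):      # termination line: the entry ends here
            break
        if not line.strip():           # ignore empty lines
            continue
        code = line[:2]                # two letters, or two blanks for sequence data
        content = line[5:].rstrip()    # skip the code and the three blanks
        groups[code].append(content)
    return dict(groups)


if __name__ == "__main__":
    for code, lines in group_by_line_code(TOY_ENTRY).items():
        print(repr(code), lines)
```

Sequence data lines have a blank line code and therefore end up under the key of two blanks.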

Fields in More Detail
● ID line: ID   EntryName Status; SequenceLength.
● EntryName: up to 11 uppercase alphanumeric characters X_Y
  - X is a mnemonic code of at most 5 alphanumeric characters
  - Y is a mnemonic species identification code of at most 5 alphanumeric characters
● ID   CYC_BOVIN               Reviewed;         104 AA.
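
The ID line format can be taken apart with a regular expression. The pattern below is a sketch written for this handout and covers only the X_Y entry names described above (at most 5 characters on each side); the function name parse_id_line and the dictionary keys are invented. The two status values "Reviewed" and "Unreviewed" are assumed to be the only ones.

```python
import re

# ID   EntryName Status; SequenceLength.
ID_LINE_RE = re.compile(
    r"^ID   "
    r"(?P<protein>[A-Z0-9]{1,5})_(?P<species>[A-Z0-9]{1,5})\s+"
    r"(?P<status>Reviewed|Unreviewed);\s+"
    r"(?P<length>\d+) AA\.$"
)


def parse_id_line(line):
    """Split an ID line into entry name, mnemonic codes, status and length."""
    match = ID_LINE_RE.match(line)
    if match is None:
        raise ValueError(f"not a recognised ID line: {line!r}")
    return {
        "entry_name": f"{match['protein']}_{match['species']}",
        "protein": match["protein"],        # mnemonic code X
        "species": match["species"],        # species identification code Y
        "status": match["status"],
        "length": int(match["length"]),     # sequence length in amino acids
    }


if __name__ == "__main__":
    print(parse_id_line("ID   CYC_BOVIN               Reviewed;         104 AA."))
```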

● AC line: AC   AC_number_1;[ AC_number_2;]...[ AC_number_N;]
● Accession number: 6 or 10 characters

  1          2      3          4          5          6      7      8          9          10
  [A-N,R-Z]  [0-9]  [A-Z]      [A-Z,0-9]  [A-Z,0-9]  [0-9]
  [O,P,Q]    [0-9]  [A-Z,0-9]  [A-Z,0-9]  [A-Z,0-9]  [0-9]
  [A-N,R-Z]  [0-9]  [A-Z]      [A-Z,0-9]  [A-Z,0-9]  [0-9]  [A-Z]  [A-Z,0-9]  [A-Z,0-9]  [0-9]

● RegEx: [OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}
● Examples: P12345, Q1AAA9, A0A022YWF9
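
The regular expression from the slide can be used as is to validate accession numbers. A minimal sketch in Python (the names ACCESSION_RE and is_valid_accession are chosen for this example); fullmatch is used so that the whole string has to fit the pattern:

```python
import re

# Pattern copied from the slide.
ACCESSION_RE = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def is_valid_accession(text):
    """Return True if the complete string is a 6- or 10-character accession."""
    return ACCESSION_RE.fullmatch(text) is not None


if __name__ == "__main__":
    for candidate in ["P12345", "Q1AAA9", "A0A022YWF9", "p12345", "P1234"]:
        print(candidate, is_valid_accession(candidate))
```

Lower-case input and truncated accessions are rejected, since the pattern only allows uppercase letters and the fixed lengths 6 and 10.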

● DT line: date, DD-MMM-YYYY
● always one of the biweekly release dates
● always three lines:
  - date of integration
  - date of sequence version, sequence version X
  - date of entry version, entry version X
● Example:
  DT   01-FEB-1999, integrated into UniProtKB/TrEMBL.
  DT   15-OCT-2000, sequence version 2.
  DT   15-DEC-2004, entry version 5.
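
The DD-MMM-YYYY dates of the DT lines can be converted into proper date objects. The sketch below uses an explicit month table instead of locale-dependent parsing; the names MONTHS, DT_LINE_RE and parse_dt_line are made up for this example.

```python
import re
from datetime import date

# Explicit month table, independent of the system locale.
MONTHS = {
    "JAN": 1, "FEB": 2, "MAR": 3, "APR": 4, "MAY": 5, "JUN": 6,
    "JUL": 7, "AUG": 8, "SEP": 9, "OCT": 10, "NOV": 11, "DEC": 12,
}

DT_LINE_RE = re.compile(r"^DT   (\d{2})-([A-Z]{3})-(\d{4}), (.+)\.$")


def parse_dt_line(line):
    """Return (date, event text) for one DT line."""
    match = DT_LINE_RE.match(line)
    if match is None:
        raise ValueError(f"not a recognised DT line: {line!r}")
    day, month, year, event = match.groups()
    return date(int(year), MONTHS[month], int(day)), event


if __name__ == "__main__":
    example = [
        "DT   01-FEB-1999, integrated into UniProtKB/TrEMBL.",
        "DT   15-OCT-2000, sequence version 2.",
        "DT   15-DEC-2004, entry version 5.",
    ]
    for dt_line in example:
        print(parse_dt_line(dt_line))
```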

● DE lines:
  - three categories and additional subcategories
  - contains a recommended name
  - besides: full name, short name, EC number
  - alternative names: e.g. as an allergen or in biotechnology, ...

Examples:

DE   RecName: Full=Annexin A5;
DE            Short=Annexin-5;
DE   AltName: Full=Annexin V;
DE   AltName: Full=Lipocortin V;
DE   AltName: Full=Endonexin II;
DE   AltName: Full=Calphobindin I;
DE   AltName: Full=CBP-I;
DE   AltName: Full=Placental anticoagulant protein I;
DE            Short=PAP-I;
DE   AltName: Full=PP4;
DE   AltName: Full=Thromboplastin inhibitor;
DE   AltName: Full=Vascular anticoagulant-alpha;
DE            Short=VAC-alpha;
DE   AltName: Full=Anchorin CII;

DE   RecName: Full=Granulocyte colony-stimulating factor;
DE            Short=G-CSF;
DE   AltName: Full=Pluripoietin;
DE   AltName: Full=Filgrastim;
DE   AltName: Full=Lenograstim;
DE   Flags: Precursor;

● OS line: originating organism
  OS   Homo sapiens (Human).
  OS   Rous sarcoma virus (strain Schmidt-Ruppin A) (RSV-SRA) (Avian leukosis
  OS   virus-RSA).
● OC lines: contain the taxonomic classification of the source organism according to http://www.ncbi.nlm.nih.gov/Taxonomy/
  OC   Node[; Node...].
  OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
  OC   Mammalia; Eutheria; Euarchontoglires; Primates; Catarrhini; Hominidae;
  OC   Homo.
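
The OC lines form one semicolon-separated list that is terminated by a period and may wrap over several lines. A small sketch of how the lineage could be recovered (parse_oc_lines is a hypothetical helper written for this handout):

```python
def parse_oc_lines(lines):
    """Join wrapped OC lines and return the taxonomic nodes as a list."""
    # Drop the line code and the three blanks, then glue the pieces together.
    joined = " ".join(line[5:].strip() for line in lines if line.startswith("OC"))
    joined = joined.rstrip(".")                      # the list ends with a period
    return [node.strip() for node in joined.split(";") if node.strip()]


if __name__ == "__main__":
    oc_lines = [
        "OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;",
        "OC   Mammalia; Eutheria; Euarchontoglires; Primates; Catarrhini; Hominidae;",
        "OC   Homo.",
    ]
    print(parse_oc_lines(oc_lines))
```

The first node is the most general taxon and the last one the most specific.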

RN, RP, RC, RX, RG, RA, RT, RL
● can occur multiple times
● order within a block is fixed
● e.g.:
  RN   [1]
  RP   NUCLEOTIDE SEQUENCE [MRNA] (ISOFORMS A AND C), FUNCTION, INTERACTION
  RP   WITH PKC-3, SUBCELLULAR LOCATION, TISSUE SPECIFICITY, DEVELOPMENTAL
  RP   STAGE, AND MUTAGENESIS OF PHE-175 AND PHE-221.
  RC   STRAIN=Bristol N2;
  RX   PubMed=11134024; DOI=10.1074/jbc.M008990200;
  RA   Zhang L., Wu S.-L., Rubin C.S.;
  RT   "A novel adapter protein employs a phosphotyrosine binding domain and
  RT   exceptionally basic N-terminal domains to capture and localize an
  RT   atypical protein kinase C: characterization of Caenorhabditis elegans
  RT   C kinase adapter 1, a protein that avidly binds protein kinase C3.";
  RL   J. Biol. Chem. 276:10463-10475(2001).

CC lines
● free text
● contains most of the annotated information
● format:
  CC   -!- TOPIC: First line of a comment block;
  CC       second and subsequent lines of a comment block.
● structured by predefined topics like: Allergen, Alternative Products, ..., Cofactor, ..., Disease, ..., Domain, ..., Function, Interaction, ...

CC   -!- ALLERGEN: Causes an allergic reaction in human. Minor allergen of
CC       bovine dander.
CC   -!- ALTERNATIVE PRODUCTS:
CC       Event=Alternative initiation; Named isoforms=2;
CC       Name=Alpha;
CC         IsoId=P51636-1; Sequence=Displayed;
CC       Name=Beta;
CC         IsoId=P51636-2; Sequence=VSP_018696;
CC   -!- SUBCELLULAR LOCATION: Cell membrane {ECO:0000250}; Peripheral
CC       membrane protein {ECO:0000250}. Secreted {ECO:0000250}. Note=The
CC       last 22 C-terminal amino acids may participate in cell membrane
CC       attachment.
CC   -!- SUBCELLULAR LOCATION: Isoform 2: Cytoplasm {ECO:0000305}.
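
Comment blocks like the ones above can be regrouped by topic: a line with "-!-" opens a new block, every other CC line continues the current one. This is a simplified sketch (parse_cc_blocks is an invented helper; it flattens the sub-structure of topics such as ALTERNATIVE PRODUCTS into plain text):

```python
def parse_cc_blocks(lines):
    """Return a list of (topic, text) tuples, one per comment block."""
    blocks = []
    for line in lines:
        body = line[5:]                       # drop "CC" and the three blanks
        if body.startswith("-!- "):
            # New block: the topic ends at the first colon.
            topic, _, text = body[4:].partition(":")
            blocks.append([topic.strip(), text.strip()])
        elif blocks:
            # Continuation line: append to the text of the current block.
            blocks[-1][1] = (blocks[-1][1] + " " + body.strip()).strip()
    return [tuple(block) for block in blocks]


if __name__ == "__main__":
    cc_lines = [
        "CC   -!- ALLERGEN: Causes an allergic reaction in human. Minor allergen of",
        "CC       bovine dander.",
        "CC   -!- SUBCELLULAR LOCATION: Isoform 2: Cytoplasm {ECO:0000305}.",
    ]
    for topic, text in parse_cc_blocks(cc_lines):
        print(topic, "->", text)
```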

Cross References
● too many to enumerate
● extensive references with nucleotide databases, e.g.:

in EMBL:
FT   CDS             302..2674
FT                   /protein_id="CAA03857.1"
FT                   /db_xref="SWISS-PROT:P26345"
FT                   /gene="recA"
FT                   /product="RecA protein"

in Swiss-Prot:
DR   EMBL; AJ297977; CAC17465.1; -; Genomic_DNA.
DR   EMBL; X56491; CAA39846.1; ALT_FRAME; mRNA.

Key Words / Feature Table
● KW   Keyword[; Keyword...].
● helps to search and index the database
● no limit on the number of keywords:
  KW   3D-structure; Alternative splicing; Alzheimer disease; Amyloid;
  KW   Apoptosis; Cell adhesion; Coated pits; Copper;
  KW   Direct protein sequencing; Disease mutation; Endocytosis;
  KW   Glycoprotein; Heparin-binding; Iron; Membrane; Metal-binding;
  KW   Notch signaling pathway; Phosphorylation; Polymorphism;
  KW   Protease inhibitor; Proteoglycan; Serine protease inhibitor; Signal;
  KW   Transmembrane; Zinc.
● Feature table like GenBank/EMBL/DDBJ

Programmatic Access
● http://www.uniprot.org/help/programmatic_access (remember this link!)
● several use cases are documented, but not as an API
● best way: use the web interface to construct/refine your query first, before you try to automate the process

Retrieving an Individual Entry
● uses a simple URL which can be bookmarked
● for individual entries: http://www.uniprot.org/uniprot/P12345
● default result is a web page
● alternative formats: txt, xml, rdf, fasta, gff
● specified via a suffix appended to the accession, e.g. P12345.txt
● structured formats like xml or rdf can include referenced entries
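
The URL scheme described above is easy to generate programmatically. The helper below is only a sketch that builds the URL string and does not contact the server; entry_url is a made-up name, and the base URL is the one shown on this slide, which may have changed since the lecture.

```python
# Base URL as given on the slide (state of the 2016 lecture).
BASE_URL = "http://www.uniprot.org/uniprot/"
# Alternative formats listed on the slide.
FORMATS = ("txt", "xml", "rdf", "fasta", "gff")


def entry_url(accession, fmt=None):
    """Build the URL of one entry; without fmt the default web page is addressed."""
    if fmt is None:
        return BASE_URL + accession
    if fmt not in FORMATS:
        raise ValueError(f"unknown format {fmt!r}, expected one of {FORMATS}")
    return f"{BASE_URL}{accession}.{fmt}"      # format is chosen via the suffix


if __name__ == "__main__":
    print(entry_url("P12345"))
    print(entry_url("P12345", "fasta"))
```

The resulting string could then be passed to any HTTP client, e.g. urllib.request.urlopen from the standard library.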

Using the ID mapping service
● http://www.uniprot.org/help/programmatic_access#batch_retrieval_perl_example
● uses the HTTP POST method
● converts between different database IDs
● you have to know the specific abbreviation for the respective databases
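
What such a POST request could look like is sketched below with Python's standard library. Everything specific in it is an assumption made for illustration and must be checked against the help page linked above: the service address MAPPING_URL, the parameter names (from, to, format, query) and the placeholder database abbreviations SRC_DB and DST_DB. The request object is only constructed, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# ASSUMPTION: address of the mapping service; verify on the help page.
MAPPING_URL = "http://www.uniprot.org/mapping/"


def build_mapping_request(ids, source_db, target_db, url=MAPPING_URL):
    """Prepare (but do not send) a POST request for the ID mapping service."""
    # ASSUMPTION: parameter names; the database abbreviations have to be
    # looked up in the UniProt documentation.
    params = {
        "from": source_db,
        "to": target_db,
        "format": "tab",
        "query": " ".join(ids),
    }
    # Supplying a data payload turns the request into a POST.
    return Request(url, data=urlencode(params).encode("utf-8"))


if __name__ == "__main__":
    # SRC_DB and DST_DB are placeholders, not real abbreviations.
    request = build_mapping_request(["P12345", "Q1AAA9"], "SRC_DB", "DST_DB")
    print(request.get_method(), request.full_url)
    print(request.data)
```

Sending the request would be done with urllib.request.urlopen(request); this step is left out here on purpose.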

Retrieving Entries via Queries
● uses the HTTP GET method, i.e. the query string is part of the URL
● structure might be quite complex
● use the browser to configure the query string
● more settings are available via the query builder http://www.uniprot.org/help/advanced_search
● the URL length might be limited to 1000 characters
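
Since the query travels inside the URL, it has to be URL-encoded and should stay below the length limit. A minimal sketch, assuming the parameter names query and format and the base URL from the previous slides (all three should be checked against the UniProt help pages; build_query_url is an invented helper):

```python
from urllib.parse import urlencode

# ASSUMPTION: base URL and parameter names, see lead-in.
QUERY_BASE_URL = "http://www.uniprot.org/uniprot/"
MAX_URL_LENGTH = 1000     # possible limit mentioned on the slide


def build_query_url(query, fmt="tab", base=QUERY_BASE_URL, max_len=MAX_URL_LENGTH):
    """Build a GET URL for a query and refuse URLs that are too long."""
    url = base + "?" + urlencode({"query": query, "format": fmt})
    if len(url) > max_len:
        raise ValueError(f"URL has {len(url)} characters, limit is {max_len}")
    return url


if __name__ == "__main__":
    print(build_query_url("insulin"))
    print(build_query_url("insulin AND reviewed:yes", fmt="fasta"))
```

The query text in the second call is just an example string; it is best composed in the browser first and then copied, as recommended above.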

Examples
● http://www.uniprot.org/uniprot/P12345.txt
● http://www.uniprot.org/uniprot/P12345.xml
● http://www.uniprot.org/uniref/UniRef90_P04259.xml
● http://www.uniprot.org/uniref/UniRef90_P04259.rdf
● http://www.uniprot.org/uniref/UniRef90_P04259.fasta
● http://www.uniprot.org/uniref/UniRef90_P04259.tab