Protein functions prediction
![Page 1: Protein functions prediction](https://reader030.fdocuments.in/reader030/viewer/2022020417/56814457550346895db0f357/html5/thumbnails/1.jpg)
Swiss Institute of Bioinformatics / Institut Suisse de Bioinformatique
LF-2004.08
Protein functions prediction
![Page 2: Protein functions prediction](https://reader030.fdocuments.in/reader030/viewer/2022020417/56814457550346895db0f357/html5/thumbnails/2.jpg)
Introduction
  Signal peptides
  Transmembrane regions and topology
  PTM (post-translational modifications)
  Low complexity and biased regions
  Repeats
  Coils
  Secondary structure
  Antigenic peptides
  Domain/Motifs
  Tools
  The EMBOSS package
![Page 3: Protein functions prediction](https://reader030.fdocuments.in/reader030/viewer/2022020417/56814457550346895db0f357/html5/thumbnails/3.jpg)
Different techniques
  Algorithms
    Sliding window, Nearest Neighbor
  Patterns, regular expression
  Weight matrices
  HMM, profiles
  Neural Networks
  Rules
![Page 4: Protein functions prediction](https://reader030.fdocuments.in/reader030/viewer/2022020417/56814457550346895db0f357/html5/thumbnails/4.jpg)
Sliding window
THISISATESTSEQVENCETHATDISPLAYSTHESLIDINGWINDQW
Score 1, Score 2, ..., Score n
Width or Size=11, Step=5
Results are usually displayed as a graph, see example ->
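The sliding-window idea above can be sketched in a few lines of Python. This is only an illustration: the slide does not say which score is computed per window, so the Kyte-Doolittle hydropathy scale and the function names `mean_hydropathy` and `sliding_window` are assumptions chosen for the example. The width (11) and step (5) are the values given on the slide.

```python
from typing import List, Tuple

# Kyte-Doolittle hydropathy scale (an assumed scoring scheme for this sketch;
# any per-residue scale could be substituted).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(window: str) -> float:
    """Average hydropathy of the residues in one window."""
    return sum(KYTE_DOOLITTLE[aa] for aa in window) / len(window)


def sliding_window(sequence: str, width: int = 11, step: int = 5) -> List[Tuple[int, str, float]]:
    """Return (1-based start, window, score) for every full-width window."""
    results = []
    for start in range(0, len(sequence) - width + 1, step):
        window = sequence[start:start + width]
        results.append((start + 1, window, mean_hydropathy(window)))
    return results


if __name__ == "__main__":
    seq = "THISISATESTSEQVENCETHATDISPLAYSTHESLIDINGWINDQW"
    for start, window, score in sliding_window(seq):
        print(f"{start:3d}  {window}  {score:6.2f}")
```

Plotting the score against the window start position gives the kind of graph the slide refers to. Incomplete windows at the end of the sequence are simply dropped here; other tools may pad or shrink the last window instead.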
![Page 5: Protein functions prediction](https://reader030.fdocuments.in/reader030/viewer/2022020417/56814457550346895db0f357/html5/thumbnails/5.jpg)
Patterns / regular expression
  Pattern: <A-x-[ST](2)-x(0,1)-{V}
  Regexp: ^A.[ST]{2}.?[^V]
  Text: The sequence must start with an alanine, followed by any amino acid, followed by a serine or a threonine, two times, followed by any amino acid or nothing, followed by any amino acid except a valine.
Only the syntax differs…
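The correspondence between the PROSITE-style pattern and the regular expression can be sketched as a small converter. This is a minimal illustration, not the PROSITE reference implementation: the function name `prosite_to_regex` is hypothetical, and only the syntax elements that appear in the example on this slide are handled (`<`, `>`, `x`, `[...]`, `{...}`, and repeat counts in parentheses).

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE-style pattern into a Python regex string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):   # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):     # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        # Split the residue specification from an optional repeat such as (2) or (0,1).
        match = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        core, repeat = match.group(1), match.group(2)
        if core == "x":               # any amino acid
            core = "."
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"   # any amino acid except those listed
        parts.append(prefix + core + ("{" + repeat + "}" if repeat else "") + suffix)
    return "".join(parts)


if __name__ == "__main__":
    regex = prosite_to_regex("<A-x-[ST](2)-x(0,1)-{V}")
    print(regex)
    print(bool(re.match(regex, "AGSTKL")), bool(re.match(regex, "AGSTV")))
```

The converter writes the optional position as `.{0,1}` rather than `.?`; the two forms are equivalent.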
![Page 6: Protein functions prediction](https://reader030.fdocuments.in/reader030/viewer/2022020417/56814457550346895db0f357/html5/thumbnails/6.jpg)
Weight matrices (PSSM)
![Page 7: Protein functions prediction](https://reader030.fdocuments.in/reader030/viewer/2022020417/56814457550346895db0f357/html5/thumbnails/7.jpg)
HMM / profiles
![Page 8: Protein functions prediction](https://reader030.fdocuments.in/reader030/viewer/2022020417/56814457550346895db0f357/html5/thumbnails/8.jpg)
Neural Networks
General principle: Example:
![Page 9: Protein functions prediction](https://reader030.fdocuments.in/reader030/viewer/2022020417/56814457550346895db0f357/html5/thumbnails/9.jpg)
Signals found in proteins
  N-ter
    exportation - secretion
    mitochondria
    chloroplast
  internal
    NLS (nuclear localization signal)
  C-ter
    GPI-anchor (Glycosyl Phosphatidyl Inositol)
    other membrane anchors (see PTM)
  other unknown ?
![Page 10: Protein functions prediction](https://reader030.fdocuments.in/reader030/viewer/2022020417/56814457550346895db0f357/html5/thumbnails/10.jpg)
Signals detection tools
  SignalP
  MitoProt
  ChloroP
  Predotar
  PSort
  TargetP
  Sigcleave (EMBOSS)
  Phobius
  Big-PI
  DGPI
![Page 11: Protein functions prediction](https://reader030.fdocuments.in/reader030/viewer/2022020417/56814457550346895db0f357/html5/thumbnails/11.jpg)
Transmembrane regions
  Detection (signal peptide, hydropathy, helices)
  Organisation (topology)
![Page 12: Protein functions prediction](https://reader030.fdocuments.in/reader030/viewer/2022020417/56814457550346895db0f357/html5/thumbnails/12.jpg)
Transmembrane detection tools
  TMHMM
  TMPred
  TopPred2
  DAS
  HMMTop
  Tmap (EMBOSS)
  Mixture of tools
    Phobius
    ConPred II
![Page 13: Protein functions prediction](https://reader030.fdocuments.in/reader030/viewer/2022020417/56814457550346895db0f357/html5/thumbnails/13.jpg)
Post translational modifications
Phosphorylation S - T - Y
N-glycosylation N
O-glycosylation S - T - (HO)K
Acetylation, methylation D - E - K
Sulfation Y
Farnesylation, myristylation, palmitoylation, geranylgeranylation, GPI-anchor C - Nter - Cter
Ubiquitination and family K - Nter
Inteins (protein splicing)
Pre-translational
  Selenoprotein C
![Page 14: Protein functions prediction](https://reader030.fdocuments.in/reader030/viewer/2022020417/56814457550346895db0f357/html5/thumbnails/14.jpg)
PTM detection
Pattern prediction (PROSITE)
  Short or weak signal
  Frequent hit producer
Best method is experimental MS/MS detection
Most methods use "rules" combining pattern detection and knowledge to predict sites.
NetOGlyc - Prediction of mucin-type O-glycosylation sites in mammalian proteins
DictyOGlyc - Prediction of GlcNAc O-glycosylation sites in Dictyostelium
YinOYang - O-beta-GlcNAc attachment sites in eukaryotic protein sequences
NetPhos - Prediction of Ser, Thr and Tyr phosphorylation sites in eukaryotic proteins
NMT - Prediction of N-terminal N-myristoylation
Sulfinator - Prediction of tyrosine sulfation sites
![Page 15: Protein functions prediction](https://reader030.fdocuments.in/reader030/viewer/2022020417/56814457550346895db0f357/html5/thumbnails/15.jpg)
Low complexity regions
  repeats
  compositional bias
  PEST
![Page 16: Protein functions prediction](https://reader030.fdocuments.in/reader030/viewer/2022020417/56814457550346895db0f357/html5/thumbnails/16.jpg)
Low complexity / Repeats
DUST (DNA) / SEG de novo detection
RepeatMasker (DNA) search collection
REP search collection
REPRO, Radar de novo detection
PEST, PESTFind de novo detection
EMBOSS (DNA) einverted equicktandem etandem palindrome
EMBOSS (protein) oddcomp
![Page 17: Protein functions prediction](https://reader030.fdocuments.in/reader030/viewer/2022020417/56814457550346895db0f357/html5/thumbnails/17.jpg)
Coils
Helix of helix coiled-coil
Leu-zipper
![Page 18: Protein functions prediction](https://reader030.fdocuments.in/reader030/viewer/2022020417/56814457550346895db0f357/html5/thumbnails/18.jpg)
Coils detection
COILS Weight matrices
Paircoil, Multicoil Pairwise correlation
Marcoil HMM
Pepcoil (EMBOSS) Weight matrices
![Page 19: Protein functions prediction](https://reader030.fdocuments.in/reader030/viewer/2022020417/56814457550346895db0f357/html5/thumbnails/19.jpg)
Secondary structure
Structure to predict
  Alpha-helices
  Beta-sheets
  Turns
  Random coil
Garnier (EMBOSS), PHD, DSC, PREDATOR, NNSSP, Jpred, Jnet, many others
![Page 20: Protein functions prediction](https://reader030.fdocuments.in/reader030/viewer/2022020417/56814457550346895db0f357/html5/thumbnails/20.jpg)
Antigenic peptide
Peptides binding to MHC
  class I: 8, 9, 10 mers
  class II: 15 mers (3+9+3)
Depend highly on MHC type
Use of experimental knowledge
  Databases of known peptides
SYFPEITHI, HLA_Bind (BIMAS), MAPPP combined expert, Antigenic (EMBOSS), many more
Prediction of proteasome cleavage sites
  NetChop, PaProc
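As a rough illustration of the class I peptide lengths listed above, the candidate peptides that a predictor would have to score can be enumerated as follows. The function name `candidate_peptides` and the test sequence are hypothetical; the sketch only lists overlapping 8-, 9- and 10-mers and does not predict MHC binding, which depends on the MHC type and on the experimental data held by the tools named on this slide.

```python
from typing import Iterator, Sequence, Tuple


def candidate_peptides(protein: str, lengths: Sequence[int] = (8, 9, 10)) -> Iterator[Tuple[int, str]]:
    """Yield (1-based start, peptide) for every overlapping peptide of the given lengths."""
    for k in lengths:
        for start in range(len(protein) - k + 1):
            yield start + 1, protein[start:start + k]


if __name__ == "__main__":
    # Arbitrary 10-residue sequence used only to show the enumeration.
    for start, peptide in candidate_peptides("MKTAYIAKQR"):
        print(start, peptide)
```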
![Page 21: Protein functions prediction](https://reader030.fdocuments.in/reader030/viewer/2022020417/56814457550346895db0f357/html5/thumbnails/21.jpg)
Domain / Motif
All the protein domain descriptors
  PROSITE, PFAM, SMART, PRODOM, BLOCKS, PRINTS, TIGRfam …
Federation: InterPro
Many techniques
  Patterns, Regexp
  PSSM (PSI-BLAST)
  Profiles
  HMM
![Page 22: Protein functions prediction](https://reader030.fdocuments.in/reader030/viewer/2022020417/56814457550346895db0f357/html5/thumbnails/22.jpg)
Other Tools
You can find some of them on our servers www.ch.embnet.org
Or on ExPASy server www.expasy.org/tools
Or ask Google!! www.google.com
![Page 23: Protein functions prediction](https://reader030.fdocuments.in/reader030/viewer/2022020417/56814457550346895db0f357/html5/thumbnails/23.jpg)
European Molecular Biology Open Software Suite
![Page 24: Protein functions prediction](https://reader030.fdocuments.in/reader030/viewer/2022020417/56814457550346895db0f357/html5/thumbnails/24.jpg)
How to use EMBOSS/Jemboss at SIB
![Page 25: Protein functions prediction](https://reader030.fdocuments.in/reader030/viewer/2022020417/56814457550346895db0f357/html5/thumbnails/25.jpg)
Free Open Source (for most Unix platforms)
GCG successor (compatible with GCG file format)
More than 150 programs (ver. 2.9.0)
Easy to install locally
  but no interface, requires local databases
Unix command-line only
Interfaces
  Jemboss, www2gcg, w2h, wemboss… (with account)
  Pise, EMBOSS-GUI, SRSWWW (no account)
  Staden, Kaptain, CoLiMate, Jemboss (local)
Access: www.emboss.org or emboss.sourceforge.net
![Page 26: Protein functions prediction](https://reader030.fdocuments.in/reader030/viewer/2022020417/56814457550346895db0f357/html5/thumbnails/26.jpg)
Some details
Format USA (Uniform Sequence Address)
  'asis' :: Sequence [start : end : reverse]
  Format :: '@' ListFile [start : end : reverse]
  Format :: 'list' : ListFile [start : end : reverse]
  Format :: Database : Entry [start : end : reverse]
  Format :: Database - SearchField : Word [start : end : reverse]
  Format :: File : Entry [start : end : reverse]
  Format :: File : SearchField : Word [start : end : reverse]
  Format :: Program Program-parameters '|' [start : end : reverse]
  Example: fasta::Swissprot:UBP5_HUMAN[200:300]
Databases
  Any can be added, use showdb to display the available databases
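To make the structure of a USA such as fasta::Swissprot:UBP5_HUMAN[200:300] concrete, the sketch below splits one into its parts. It is not EMBOSS code: the names `Usa` and `parse_usa` are hypothetical, only the "Format :: Database : Entry [start : end : reverse]" and "Format :: File : Entry" forms are covered, and the `:r` suffix for the reverse flag is an assumption about how the third range field is written.

```python
import re
from typing import NamedTuple, Optional


class Usa(NamedTuple):
    fmt: Optional[str]      # sequence format, e.g. "fasta"
    source: str             # database or file name
    entry: Optional[str]    # entry name or accession
    start: Optional[int]
    end: Optional[int]
    reverse: bool


# Assumption: the reverse flag is written as a trailing ":r" inside the brackets.
_USA_RE = re.compile(
    r"^(?:(?P<fmt>\w+)::)?"
    r"(?P<source>[^:\[\]]+)"
    r"(?::(?P<entry>[^:\[\]]+))?"
    r"(?:\[(?P<start>-?\d+):(?P<end>-?\d+)(?P<rev>:r)?\])?$"
)


def parse_usa(usa: str) -> Usa:
    """Split a simple USA into format, source, entry and optional range."""
    match = _USA_RE.match(usa.strip())
    if match is None:
        raise ValueError(f"unsupported USA: {usa!r}")
    start, end = match.group("start"), match.group("end")
    return Usa(
        fmt=match.group("fmt"),
        source=match.group("source"),
        entry=match.group("entry"),
        start=int(start) if start is not None else None,
        end=int(end) if end is not None else None,
        reverse=match.group("rev") is not None,
    )


if __name__ == "__main__":
    print(parse_usa("fasta::Swissprot:UBP5_HUMAN[200:300]"))
```

The other forms listed on the slide ('asis', list files, search fields, program pipes) are deliberately left out to keep the sketch short.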
![Page 27: Protein functions prediction](https://reader030.fdocuments.in/reader030/viewer/2022020417/56814457550346895db0f357/html5/thumbnails/27.jpg)
Databases
showdb
Displays information on the currently available databases
# Name         Type ID  Qry All Comment
# ====         ==== ==  === === =======
ipr_fetch      P    OK  OK  OK  InterPro current by fetch
ipi_fetch      P    OK  OK  OK  IPI current by fetch
refseq_fetch   P    OK  OK  OK  refseq current by fetch
repbase_fetch  P    OK  OK  OK  repbase current by fetch
swiss_fetch    P    OK  OK  OK  SwissProt current by fetch
swissprot      P    OK  OK  OK  SWISSPROT sequences
trembl         P    OK  OK  OK  TREMBL sequences
trembl_fetch   P    OK  OK  OK  trembl current by fetch
tremblnew      P    OK  OK  OK  TREMBL New sequences
ug_fetch       P    OK  OK  OK  Unigene by fetch
embl           N    OK  OK  OK  EMBL release
emhum          N    OK  OK  OK  EMBL release, Human section by emboss index
emrod          N    OK  OK  OK  EMBL release, Rodent section by emboss index
emvrt          N    OK  OK  OK  EMBL release, Vertebrate (nonhuman, nonrodent)
seqret (seqretall, seqretset, seqretsplit)
entret (for complete untouched entry, e.g., for unigene, interpro, swissprot…)
Possible to define your own ".embossrc" file
![Page 28: Protein functions prediction](https://reader030.fdocuments.in/reader030/viewer/2022020417/56814457550346895db0f357/html5/thumbnails/28.jpg)
Some tools for DNA
  redata    Search REBASE for enzyme name, references, suppliers etc
  remap     Display a sequence with restriction cut sites, translation etc
  restover  Finds restriction enzymes that produce a specific overhang
  restrict  Finds restriction enzyme cleavage sites
  showseq   Display a sequence with features, translation etc
  silent    Silent mutation restriction enzyme scan
  cirdna    Draws circular maps of DNA constructs
  lindna    Draws linear maps of DNA constructs
  revseq    Reverse and complement a sequence
  …
![Page 29: Protein functions prediction](https://reader030.fdocuments.in/reader030/viewer/2022020417/56814457550346895db0f357/html5/thumbnails/29.jpg)
Example: remap
ECLAC E.coli lactose operon with lacI, lacZ, lacY and lacA genes.

Cut sites marked above and below the sequence: Hin6I, TaqI, HhaI, Bsc4I, Bsu6I, BssKI, AciI, BsiSI

GACACCATCGAATGGCGCAAAACCTTTCGCGGTATGGCATGATAGCGCCCGGAAGAGAGT
        10        20        30        40        50        60
----:----|----:----|----:----|----:----|----:----|----:----|
CTGTGGTAGCTTACCGCGTTTTGGAAAGCGCCATACCGTACTATCGCGGGCCTTCTCTCA

# Enzymes that cut  Frequency  Isoschizomers
AciI                1
Bsc4I               1
BsiSI               1
BssKI               1
Bsu6I               1
HhaI                2
Hin6I               2          HinP1I,HspAI
TaqI                1

# Enzymes that do not cut
AclI BamHI BceAI Bse1I BshI ClaI EcoRI EcoRII Hin4I HindII HindIII HpyCH4IV KpnI NotI
![Page 30: Protein functions prediction](https://reader030.fdocuments.in/reader030/viewer/2022020417/56814457550346895db0f357/html5/thumbnails/30.jpg)
Example: cirdna
File: ../../data/data.cirp
Start 1001
End 4270
group
label
Block 1011 1362 3
ex1
endlabel
label
Tick 1610 8
EcoR1
endlabel
label
Block 1647 1815 1
endlabel
label
Tick 2459 8
BamH1
endlabel
label
Block 4139 4258 3
ex2
endlabel
endgroup
group
label
Range 2541 2812 [ ] 5
Alu
endlabel
label
Range 3322 3497 > < 5
MER13
endlabel
endgroup
![Page 31: Protein functions prediction](https://reader030.fdocuments.in/reader030/viewer/2022020417/56814457550346895db0f357/html5/thumbnails/31.jpg)
Example: plotorf
![Page 32: Protein functions prediction](https://reader030.fdocuments.in/reader030/viewer/2022020417/56814457550346895db0f357/html5/thumbnails/32.jpg)
EMBOSS format input/output
UFO (Universal Feature Object)
  gff, swissprot, embl, pir, nbrf (with or without sequence)
Alignments
  Multiple and pairwise, many flavors (FASTA, MSF, SRS…)
Reports
  Feature (UFO), SRS, motif, seqtable, excel, diffseq, listfile (USA), etc…
Sequences (compatible with USA)
  Many!!! E.g., fasta, clustal, gcg, paup, gff, embl, swissprot, acedb, abi, etc…
![Page 33: Protein functions prediction](https://reader030.fdocuments.in/reader030/viewer/2022020417/56814457550346895db0f357/html5/thumbnails/33.jpg)
Web interfaces
PISE (Pasteur Institute Software Environment)
  http://www-alt.pasteur.fr/~letondal/Pise/
wEMBOSS (Belgium & Argentina) (not yet at SIB)
  http://www.wemboss.org
![Page 34: Protein functions prediction](https://reader030.fdocuments.in/reader030/viewer/2022020417/56814457550346895db0f357/html5/thumbnails/34.jpg)
Pise: a tool to generate Web interfaces for Molecular Biology programs
http://emboss.ch.embnet.org/Pise
![Page 35: Protein functions prediction](https://reader030.fdocuments.in/reader030/viewer/2022020417/56814457550346895db0f357/html5/thumbnails/35.jpg)
http://www.wemboss.org
![Page 36: Protein functions prediction](https://reader030.fdocuments.in/reader030/viewer/2022020417/56814457550346895db0f357/html5/thumbnails/36.jpg)
![Page 37: Protein functions prediction](https://reader030.fdocuments.in/reader030/viewer/2022020417/56814457550346895db0f357/html5/thumbnails/37.jpg)
Launch Jemboss
http://emboss.ch.embnet.org/Jemboss
![Page 38: Protein functions prediction](https://reader030.fdocuments.in/reader030/viewer/2022020417/56814457550346895db0f357/html5/thumbnails/38.jpg)
Launch Jemboss
First time only…
Each time…
![Page 39: Protein functions prediction](https://reader030.fdocuments.in/reader030/viewer/2022020417/56814457550346895db0f357/html5/thumbnails/39.jpg)
Jemboss windows
![Page 40: Protein functions prediction](https://reader030.fdocuments.in/reader030/viewer/2022020417/56814457550346895db0f357/html5/thumbnails/40.jpg)
Jemboss windows other systems
![Page 41: Protein functions prediction](https://reader030.fdocuments.in/reader030/viewer/2022020417/56814457550346895db0f357/html5/thumbnails/41.jpg)
Summary
  Anonymous web access through Pise
  Registered access through Jemboss
  Registered access through command-line (requires UNIX skills)
Please report problems!
![Page 42: Protein functions prediction](https://reader030.fdocuments.in/reader030/viewer/2022020417/56814457550346895db0f357/html5/thumbnails/42.jpg)
Exercises
DEA Exercises: web-based sequence analysis
The goal of this exercise is to use web-based tools for protein sequence analysis.
a) Take this TrEMBL sequence (Q9X252) and try a BLAST against swissprot with the complete protein or with the first 70 residues. Explain the difference. Use TMPred, SignalP, and COILS to help you.
b) Pass this sequence through PFSCAN and search all databases. Compare with this command on ludwig-sun1/2: hits -b "prf pat pfam" tr:Q9X252
c) Use the different profile, motif, and pattern databases to get more information about the domain(s) you found.
d) How do you evaluate the PRINTS tropomyosin annotation in this TrEMBL entry (Q9WZH0)?
List of useful links:
  basic BLAST or advanced BLAST or PSI-BLAST
  TMPred prediction tool for transmembrane regions (or TMHMM)
  COILS prediction tool for coiled-coil regions
  SignalP prediction tool for signal-peptide cleavage site
Profile, domain, motifs databases and search sites:
  PFSCAN
  InterPro (Pfam, PRINTS, PROSITE, SMART)
  HITS