Pattern databases
![Page 1: P a t t e r n d a t a b a s e s](https://reader035.fdocuments.in/reader035/viewer/2022062423/568146ba550346895db3e951/html5/thumbnails/1.jpg)
Pattern databases
Gopalan Vivek
![Page 2: P a t t e r n d a t a b a s e s](https://reader035.fdocuments.in/reader035/viewer/2022062423/568146ba550346895db3e951/html5/thumbnails/2.jpg)
Pattern databases - topics
Definition
Applications
Classifications
Common Databases
Conclusions
![Page 4: P a t t e r n d a t a b a s e s](https://reader035.fdocuments.in/reader035/viewer/2022062423/568146ba550346895db3e951/html5/thumbnails/4.jpg)
Pattern databases – definition
Secondary databases derived from conserved regions obtained by multiple sequence alignment of sequences in primary databases such as GenBank, EMBL, DDBJ, SWISS-PROT/TrEMBL, PIR, etc.
![Page 5: P a t t e r n d a t a b a s e s](https://reader035.fdocuments.in/reader035/viewer/2022062423/568146ba550346895db3e951/html5/thumbnails/5.jpg)
Primary databases (SWISS-PROT - protein; GenBank - DNA): millions of sequences
Pattern extraction by multiple sequence alignment
Pattern databases: thousands of patterns
![Page 7: P a t t e r n d a t a b a s e s](https://reader035.fdocuments.in/reader035/viewer/2022062423/568146ba550346895db3e951/html5/thumbnails/7.jpg)
Pattern Databases - Applications
Function prediction for protein/nucleotide sequences, even when sequence similarity is low (<25%).
Classification of protein sequences into families.
Searching a pattern database takes less time than searching a primary database.
– A pattern is a compact representation of features shared by many sequences.
![Page 9: P a t t e r n d a t a b a s e s](https://reader035.fdocuments.in/reader035/viewer/2022062423/568146ba550346895db3e951/html5/thumbnails/9.jpg)
Multiple Sequence Alignment (MSA)
Family based databases – consider the full MSA
Motif based databases – consider local regions in the MSA
[Figure: MSA with local regions labelled Motif-1 and Motif-3]
![Page 10: P a t t e r n d a t a b a s e s](https://reader035.fdocuments.in/reader035/viewer/2022062423/568146ba550346895db3e951/html5/thumbnails/10.jpg)
Pattern Databases – Protein
Motif based: PROSITE, PRINTS, BLOCKS
Family based: ProDom, PIR-ALN, ProtoMap, DOMO, ProClass, Pfam, SMART, TIGRFAMs, SBASE, SYSTERS
![Page 11: P a t t e r n d a t a b a s e s](https://reader035.fdocuments.in/reader035/viewer/2022062423/568146ba550346895db3e951/html5/thumbnails/11.jpg)
InterPro - Integrated resource of protein families and sites
Source databases shown: PROSITE, PRINTS, BLOCKS, Pfam, ProDom
![Page 12: P a t t e r n d a t a b a s e s](https://reader035.fdocuments.in/reader035/viewer/2022062423/568146ba550346895db3e951/html5/thumbnails/12.jpg)
Pattern databases
Definition
Applications
Classifications
Common Databases
– PROSITE, PRINTS, BLOCKS & SMART (motif based)
– MetaFam, InterPro (Integrated databases)
Conclusions
![Page 13: P a t t e r n d a t a b a s e s](https://reader035.fdocuments.in/reader035/viewer/2022062423/568146ba550346895db3e951/html5/thumbnails/13.jpg)
Databases – General Tips
1. Source
2. Input formats & parameters
3. Output formats
4. Quality of the data
5. Other details – updates, coverage, speed, download, reference, methods etc.
![Page 14: P a t t e r n d a t a b a s e s](https://reader035.fdocuments.in/reader035/viewer/2022062423/568146ba550346895db3e951/html5/thumbnails/14.jpg)
Focus
Search the pattern databases for the enzyme “alkaline phosphatase” using their text or keyword search options.
Analyze the quality of the results from each database.
– Sensitivity, specificity.
Sequence and pattern searches are covered in the afternoon’s practical.
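The two quality measures named on this slide can be written down as a minimal sketch. It uses the standard definitions (sensitivity = TP / (TP + FN), specificity = TN / (TN + FP)); the function names and the counts in the usage lines are hypothetical and are not taken from any database.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of real family members that the pattern finds."""
    total = true_pos + false_neg
    if total == 0:
        raise ValueError("no positive cases to evaluate")
    return true_pos / total


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of non-members that the pattern correctly rejects."""
    total = true_neg + false_pos
    if total == 0:
        raise ValueError("no negative cases to evaluate")
    return true_neg / total


# Hypothetical counts for one pattern scanned against a sequence database.
print(sensitivity(true_pos=90, false_neg=10))
print(specificity(true_neg=895, false_pos=5))
```

A pattern that misses few family members has high sensitivity; one that rarely matches unrelated sequences has high specificity.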
![Page 15: P a t t e r n d a t a b a s e s](https://reader035.fdocuments.in/reader035/viewer/2022062423/568146ba550346895db3e951/html5/thumbnails/15.jpg)
PROSITE http://www.expasy.org/prosite/
PROSITE consists of biologically significant protein sites, patterns and profiles that help to reliably identify which known protein family (if any) a new sequence belongs to.
Based on SWISS-PROT/TrEMBL
![Page 16: P a t t e r n d a t a b a s e s](https://reader035.fdocuments.in/reader035/viewer/2022062423/568146ba550346895db3e951/html5/thumbnails/16.jpg)
Text Search
Sequence Scanner
ID and text Search
http://www.expasy.org/prosite/
![Page 17: P a t t e r n d a t a b a s e s](https://reader035.fdocuments.in/reader035/viewer/2022062423/568146ba550346895db3e951/html5/thumbnails/17.jpg)
![Page 18: P a t t e r n d a t a b a s e s](https://reader035.fdocuments.in/reader035/viewer/2022062423/568146ba550346895db3e951/html5/thumbnails/18.jpg)
Result: PROSITE documentation page
Details about the pattern/profile
PROSITE ID
PROSITE Pattern
[IV]-x-D-S-[GAS]-[GASC]-[GAST]-[GA]-T [S is the active site residue]
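A pattern written this way can be read as a regular expression: `[..]` lists allowed residues, `{..}` lists forbidden ones, `x` matches any residue, and `x(n)` or `x(n,m)` repeats it. The sketch below is one possible translation, not PROSITE's own scanner; the function name `prosite_to_regex` and the test sequence are made up for illustration.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        prefix = suffix = ""
        # Optional anchors: '<' is the N-terminus, '>' is the C-terminus.
        if element.startswith("<"):
            prefix, element = "^", element[1:]
        if element.endswith(">"):
            suffix, element = "$", element[:-1]
        # Optional repeat count such as x(2) or x(2,4).
        repeat = ""
        counted = re.fullmatch(r"(.+?)\((\d+)(?:,(\d+))?\)", element)
        if counted:
            element = counted.group(1)
            upper = "," + counted.group(3) if counted.group(3) else ""
            repeat = "{" + counted.group(2) + upper + "}"
        if element == "x":
            core = "."                         # any residue
        elif element.startswith("{") and element.endswith("}"):
            core = "[^" + element[1:-1] + "]"  # any residue except these
        else:
            core = element                     # single residue or [ABC]
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


regex = prosite_to_regex("[IV]-x-D-S-[GAS]-[GASC]-[GAST]-[GA]-T")
print(regex)  # [IV].DS[GAS][GASC][GAST][GA]T

# Made-up sequence, used only to show a scan.
hit = re.search(regex, "MKVIADSGAGATLL")
print(hit.group(), hit.start())
```

Plain regular-expression search reports non-overlapping matches only, so a real scanner would need extra handling for overlapping hits.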
![Page 19: P a t t e r n d a t a b a s e s](https://reader035.fdocuments.in/reader035/viewer/2022062423/568146ba550346895db3e951/html5/thumbnails/19.jpg)
Detailed View - page 1
PROSITE Pattern
Numerical Results
![Page 20: P a t t e r n d a t a b a s e s](https://reader035.fdocuments.in/reader035/viewer/2022062423/568146ba550346895db3e951/html5/thumbnails/20.jpg)
Detailed View - page 2
True Positives
False Positives
View entry in raw text format (no links)
![Page 21: P a t t e r n d a t a b a s e s](https://reader035.fdocuments.in/reader035/viewer/2022062423/568146ba550346895db3e951/html5/thumbnails/21.jpg)
Raw Text Format – PROSITE Format
![Page 22: P a t t e r n d a t a b a s e s](https://reader035.fdocuments.in/reader035/viewer/2022062423/568146ba550346895db3e951/html5/thumbnails/22.jpg)
ID  Identification
AC  Accession number
DT  Date
DE  Short description
PA  Pattern
MA  Matrix/profile
RU  Rule
NR  Numerical results
CC  Comments
DR  Cross-references to SWISS-PROT
3D  Cross-references to PDB
DO  Pointer to the documentation file
//  Termination line
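Because every line starts with a two-character code, an entry in this format can be read with a few lines of code. The sketch below assumes each line is the code followed by whitespace and then the content, and that `//` ends the entry; `SAMPLE_ENTRY` is an invented record built around the pattern shown earlier, not a copy of a real PROSITE entry.

```python
from collections import defaultdict

# Invented entry for illustration only; accession and text are placeholders.
SAMPLE_ENTRY = """\
ID   EXAMPLE_SITE; PATTERN.
AC   PS99999;
DE   Illustrative entry, not a real PROSITE record.
PA   [IV]-x-D-S-[GAS]-[GASC]-[GAST]-[GA]-T.
CC   First comment line.
CC   Second comment line.
//
"""


def parse_entry(text: str) -> dict:
    """Group the lines of one entry by their two-character line code."""
    fields = defaultdict(list)
    for line in text.splitlines():
        if line.startswith("//"):      # termination line: stop reading
            break
        if len(line) < 2:
            continue                   # skip blank or malformed lines
        code, value = line[:2], line[2:].strip()
        fields[code].append(value)
    # Lines that repeat a code (e.g. CC) are joined into one string.
    return {code: " ".join(values) for code, values in fields.items()}


entry = parse_entry(SAMPLE_ENTRY)
print(entry["ID"])
print(entry["PA"])
```

Keeping the raw two-character codes as dictionary keys mirrors the table above, so a field such as `NR` or `DR` can be looked up directly.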
![Page 23: P a t t e r n d a t a b a s e s](https://reader035.fdocuments.in/reader035/viewer/2022062423/568146ba550346895db3e951/html5/thumbnails/23.jpg)
PROSITE Profiles
![Page 24: P a t t e r n d a t a b a s e s](https://reader035.fdocuments.in/reader035/viewer/2022062423/568146ba550346895db3e951/html5/thumbnails/24.jpg)
Highly degenerate protein structural and functional domains
– immunoglobulin domains, SH2 and SH3 domains.
Consensus sequences of repetitive DNA elements
– SINEs, LINEs.
Basic gene expression signals
– promoter elements, RNA processing signals, translational initiation sites.
DNA-binding protein motifs.
Protein and nucleic acid compositional domains
– glutamine-rich activation domains, CpG islands.
![Page 25: P a t t e r n d a t a b a s e s](https://reader035.fdocuments.in/reader035/viewer/2022062423/568146ba550346895db3e951/html5/thumbnails/25.jpg)
PROSITE - features
Completeness
High specificity
Documentation
Periodic reviewing
Parallel update with SWISS-PROT (primary database)
![Page 26: P a t t e r n d a t a b a s e s](https://reader035.fdocuments.in/reader035/viewer/2022062423/568146ba550346895db3e951/html5/thumbnails/26.jpg)
Multiple Sequence Alignment
cydeggis
cyedggis
cyeeggit
cyhgdggs
cyrgdgnt
Find 4-5 functionally conserved residues (motif)
Core pattern: C-Y-x(2)-[DG]-G-x-[ST]
Scan the core pattern against SWISS-PROT
Too many false positives?
– YES: increase the sequence length of the pattern and scan again
– NO: add the pattern to the PROSITE DB
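The step from aligned sequences to a core pattern can be sketched column by column: a fully conserved column becomes that residue, a column with a small number of alternatives becomes a bracketed set, and anything more variable becomes `x`. This is an illustrative simplification, not the curated procedure PROSITE uses; the function name and the `max_alternatives` cutoff are assumptions.

```python
def derive_pattern(aligned, max_alternatives=2):
    """Build a PROSITE-style pattern from equal-length aligned sequences."""
    elements = []
    for column in zip(*aligned):
        residues = sorted({residue.upper() for residue in column})
        if len(residues) == 1:
            elements.append(residues[0])                  # conserved residue
        elif len(residues) <= max_alternatives:
            elements.append("[" + "".join(residues) + "]")  # allowed set
        else:
            elements.append("x")                          # variable position
    # Merge runs of consecutive x positions into x(n).
    merged, run = [], 0
    for element in elements + [None]:   # None flushes a trailing run
        if element == "x":
            run += 1
            continue
        if run:
            merged.append("x" if run == 1 else f"x({run})")
            run = 0
        if element is not None:
            merged.append(element)
    return "-".join(merged)


alignment = ["cydeggis", "cyedggis", "cyeeggit", "cyhgdggs", "cyrgdgnt"]
print(derive_pattern(alignment))  # C-Y-x(2)-[DG]-G-x-[ST]
```

For the five aligned sequences on the slide this reproduces the core pattern, with the two variable positions written in the `x(2)` repeat notation.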
![Page 27: P a t t e r n d a t a b a s e s](https://reader035.fdocuments.in/reader035/viewer/2022062423/568146ba550346895db3e951/html5/thumbnails/27.jpg)
PRINTS http://bioinf.man.ac.uk/dbbrowser/PRINTS/
Protein fingerprint database.
Fingerprint - a set of motifs that represent the most conserved regions of a multiple sequence alignment.
Better diagnostic reliability than single-motif methods.
Source – SWISS-PROT/TrEMBL
![Page 28: P a t t e r n d a t a b a s e s](https://reader035.fdocuments.in/reader035/viewer/2022062423/568146ba550346895db3e951/html5/thumbnails/28.jpg)
Multiple Sequence Alignment
cydeggis
cyedggis
cyeeggit
cyhgdggs
Identification of ALL the conserved regions (motifs)
Creation of frequency matrices for the motifs
The set of motifs forms the fingerprint
Iterative scanning of the protein databases (SWISS-PROT/TrEMBL) with the frequency matrices until convergence
PRINTS DB
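The loop described here (build a frequency matrix, scan, add new matches, rebuild, stop when nothing new appears) can be sketched for a single motif as follows. This is a toy illustration under stated assumptions, not the PRINTS implementation: the score is a plain sum of column counts, the acceptance cutoff `min_fraction` is invented, and the two "database" sequences are made up.

```python
from collections import Counter


def frequency_matrix(motif_seqs):
    """One Counter per alignment column: residue -> number of occurrences."""
    return [Counter(column) for column in zip(*motif_seqs)]


def score(window, matrix):
    """Sum the observed counts of each residue at its column."""
    return sum(column[residue] for residue, column in zip(window, matrix))


def best_hit(sequence, matrix):
    """Return (score, start, window) of the best-scoring window, or None."""
    width, best = len(matrix), None
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        value = score(window, matrix)
        if best is None or value > best[0]:
            best = (value, start, window)
    return best


def iterate_motif(seed, database, min_fraction=0.5, max_rounds=10):
    """Rescan the database until no new motif instances are added."""
    motif = sorted(set(seed))
    for _ in range(max_rounds):
        matrix = frequency_matrix(motif)
        max_score = sum(max(column.values()) for column in matrix)
        hits = set(motif)
        for sequence in database:
            hit = best_hit(sequence, matrix)
            if hit and hit[0] >= min_fraction * max_score:
                hits.add(hit[2])
        if hits == set(motif):   # convergence: nothing new was found
            break
        motif = sorted(hits)
    return motif


seed = ["cydeggis", "cyedggis", "cyeeggit", "cyhgdggs"]
toy_database = ["aaacyrgdgntaaa", "mmmmmmmmmmmm"]   # made-up sequences
print(iterate_motif(seed, toy_database))
```

A full fingerprint would repeat this for every motif and also check that the motifs occur in the expected order and spacing.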
![Page 29: P a t t e r n d a t a b a s e s](https://reader035.fdocuments.in/reader035/viewer/2022062423/568146ba550346895db3e951/html5/thumbnails/29.jpg)
http://bioinf.man.ac.uk/dbbrowser/PRINTS/
Database ID, number of motifs and text search
Motif scanner (for searching a sequence or pattern against the PRINTS database)
![Page 30: P a t t e r n d a t a b a s e s](https://reader035.fdocuments.in/reader035/viewer/2022062423/568146ba550346895db3e951/html5/thumbnails/30.jpg)
![Page 31: P a t t e r n d a t a b a s e s](https://reader035.fdocuments.in/reader035/viewer/2022062423/568146ba550346895db3e951/html5/thumbnails/31.jpg)
Page 1 for ‘alkaline phosphatase’ entry in PRINTS
Documentation, links & references
![Page 32: P a t t e r n d a t a b a s e s](https://reader035.fdocuments.in/reader035/viewer/2022062423/568146ba550346895db3e951/html5/thumbnails/32.jpg)
Page 2
Fingerprint details
Sequence Summary
![Page 33: P a t t e r n d a t a b a s e s](https://reader035.fdocuments.in/reader035/viewer/2022062423/568146ba550346895db3e951/html5/thumbnails/33.jpg)
Page 3
Motif no. 1
Motif no. 2
“Raw” motif
SWISSPROT IDs
Start and interval between motifs in the fingerprint
![Page 34: P a t t e r n d a t a b a s e s](https://reader035.fdocuments.in/reader035/viewer/2022062423/568146ba550346895db3e951/html5/thumbnails/34.jpg)
BLOCKS http://blocks.fhcrc.org/blocks/
Blocks are multiply aligned ungapped segments corresponding to the most highly conserved regions of proteins.
The BLOCKS database is a collection of blocks representing known protein families that can be used to compare a protein or DNA sequence with documented families of proteins.
![Page 35: P a t t e r n d a t a b a s e s](https://reader035.fdocuments.in/reader035/viewer/2022062423/568146ba550346895db3e951/html5/thumbnails/35.jpg)
Blocks Making
Blocks are produced by the automated PROTOMAT system (Henikoff and Henikoff, 1991), which applies a robust motif-finder to a set of related protein sequences.
![Page 36: P a t t e r n d a t a b a s e s](https://reader035.fdocuments.in/reader035/viewer/2022062423/568146ba550346895db3e951/html5/thumbnails/36.jpg)
http://blocks.fhcrc.org/blocks/blocksdiag.jpg
![Page 37: P a t t e r n d a t a b a s e s](https://reader035.fdocuments.in/reader035/viewer/2022062423/568146ba550346895db3e951/html5/thumbnails/37.jpg)
http://blocks.fhcrc.org/blocks/
Sequence, number of blocks and text searches
Blocks Maker
![Page 38: P a t t e r n d a t a b a s e s](https://reader035.fdocuments.in/reader035/viewer/2022062423/568146ba550346895db3e951/html5/thumbnails/38.jpg)
Page 1
Summary
Search methods using blocks
![Page 39: P a t t e r n d a t a b a s e s](https://reader035.fdocuments.in/reader035/viewer/2022062423/568146ba550346895db3e951/html5/thumbnails/39.jpg)
Page 2
BLOCK - 1
Start position of the block
SWISSPROT ID
Weak Blocks - Strength < 1100
Strong Blocks - Strength >= 1100
![Page 40: P a t t e r n d a t a b a s e s](https://reader035.fdocuments.in/reader035/viewer/2022062423/568146ba550346895db3e951/html5/thumbnails/40.jpg)
SMART http://smart.embl-heidelberg.de/
Contains >500 domain families found in signaling, extracellular and chromatin-associated proteins.
Each domain is extensively annotated with phyletic distributions, functional class, tertiary structures and functionally important residues.
![Page 41: P a t t e r n d a t a b a s e s](https://reader035.fdocuments.in/reader035/viewer/2022062423/568146ba550346895db3e951/html5/thumbnails/41.jpg)
ID and text Search
ID & sequence Search
Domain & GO search
Alkaline Phosphatase
![Page 42: P a t t e r n d a t a b a s e s](https://reader035.fdocuments.in/reader035/viewer/2022062423/568146ba550346895db3e951/html5/thumbnails/42.jpg)
![Page 43: P a t t e r n d a t a b a s e s](https://reader035.fdocuments.in/reader035/viewer/2022062423/568146ba550346895db3e951/html5/thumbnails/43.jpg)
![Page 44: P a t t e r n d a t a b a s e s](https://reader035.fdocuments.in/reader035/viewer/2022062423/568146ba550346895db3e951/html5/thumbnails/44.jpg)
Results – Alkaline phosphatase “Signatures”
PROSITE
– Represented as a single motif.
PRINTS
– Represented as 5 motif regions.
BLOCKS
– Represented as 6 block regions.
SMART
– Represented as a single profile.
![Page 45: P a t t e r n d a t a b a s e s](https://reader035.fdocuments.in/reader035/viewer/2022062423/568146ba550346895db3e951/html5/thumbnails/45.jpg)
Composite Pattern Databases
MetaFam
InterPro
CDD (Conserved Domain Database)
iProClass
![Page 46: P a t t e r n d a t a b a s e s](https://reader035.fdocuments.in/reader035/viewer/2022062423/568146ba550346895db3e951/html5/thumbnails/46.jpg)
Metafam & PANAL
Metafam - http://metafam.ahc.umn.edu/
PANAL – Protein ANALysis tool page of Metafam http://mgd.ahc.umn.edu/panal/
Protein family classification built with Blocks+, DOMO, Pfam, PIR-ALN, PRINTS, PROSITE, ProDom, SBASE and SYSTERS.
![Page 47: P a t t e r n d a t a b a s e s](https://reader035.fdocuments.in/reader035/viewer/2022062423/568146ba550346895db3e951/html5/thumbnails/47.jpg)
PANAL
![Page 48: P a t t e r n d a t a b a s e s](https://reader035.fdocuments.in/reader035/viewer/2022062423/568146ba550346895db3e951/html5/thumbnails/48.jpg)
InterPro http://www.ebi.ac.uk/interpro
Built from PROSITE, PRINTS, Pfam, ProDom, SMART, TIGRFAMs, SWISS-PROT and TrEMBL.
Text- and sequence-based searches.
![Page 49: P a t t e r n d a t a b a s e s](https://reader035.fdocuments.in/reader035/viewer/2022062423/568146ba550346895db3e951/html5/thumbnails/49.jpg)
![Page 50: P a t t e r n d a t a b a s e s](https://reader035.fdocuments.in/reader035/viewer/2022062423/568146ba550346895db3e951/html5/thumbnails/50.jpg)
http://www.ebi.ac.uk/interpro/
![Page 51: P a t t e r n d a t a b a s e s](https://reader035.fdocuments.in/reader035/viewer/2022062423/568146ba550346895db3e951/html5/thumbnails/51.jpg)
Detailed View - page 1
PRINTS, PROSITE, Pfam, ProDom, SMART
![Page 52: P a t t e r n d a t a b a s e s](https://reader035.fdocuments.in/reader035/viewer/2022062423/568146ba550346895db3e951/html5/thumbnails/52.jpg)
Detailed View - page 2
BLOCKS database link
![Page 53: P a t t e r n d a t a b a s e s](https://reader035.fdocuments.in/reader035/viewer/2022062423/568146ba550346895db3e951/html5/thumbnails/53.jpg)
PR – PRINTS
PS – PROSITE
PF – Pfam
PD – ProDom
SM – SMART
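The two-letter prefixes in this key are enough to tell which member database a signature comes from. A minimal lookup, assuming accessions begin with the prefix as listed on the slide; the accessions in the usage lines are placeholders, not real entries.

```python
# Prefix key as given on the slide.
PREFIX_TO_DATABASE = {
    "PR": "PRINTS",
    "PS": "PROSITE",
    "PF": "Pfam",
    "PD": "ProDom",
    "SM": "SMART",
}


def source_database(accession: str) -> str:
    """Name the member database from the first two letters of an accession."""
    return PREFIX_TO_DATABASE.get(accession[:2].upper(), "unknown")


# Placeholder accessions, used only to show the lookup.
for accession in ["PS00000", "PR00000", "XX00000"]:
    print(accession, source_database(accession))
```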
![Page 54: P a t t e r n d a t a b a s e s](https://reader035.fdocuments.in/reader035/viewer/2022062423/568146ba550346895db3e951/html5/thumbnails/54.jpg)
Detailed View - page 2
![Page 55: P a t t e r n d a t a b a s e s](https://reader035.fdocuments.in/reader035/viewer/2022062423/568146ba550346895db3e951/html5/thumbnails/55.jpg)
T – True Positive
F – False Positive
Range of the motif
![Page 56: P a t t e r n d a t a b a s e s](https://reader035.fdocuments.in/reader035/viewer/2022062423/568146ba550346895db3e951/html5/thumbnails/56.jpg)
Pattern databases
Definition
Applications
Classifications
Common Databases
– PROSITE, PRINTS & BLOCKS (motif based)
– MetaFam, InterPro (Integrated databases)
Conclusions
![Page 57: P a t t e r n d a t a b a s e s](https://reader035.fdocuments.in/reader035/viewer/2022062423/568146ba550346895db3e951/html5/thumbnails/57.jpg)
CONCLUSION
Pattern databases are diverse, ranging from short patterns to profiles to complex hidden Markov models (HMMs).
Each has different strengths and weaknesses.
Database formats differ.
It is best to combine and analyze results from several pattern databases.