BTN323: INTRODUCTION TO BIOLOGICAL DATABASES
![Page 1: BTN323: INTRODUCTION TO BIOLOGICAL DATABASES](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814839550346895db556c5/html5/thumbnails/1.jpg)
BTN323: INTRODUCTION TO BIOLOGICAL DATABASES
Day 2: Specialized Databases
Lecturer: Junaid Gamieldien, PhD
http://www.sanbi.ac.za/training-2/undergraduate-training/
![Page 2: BTN323: INTRODUCTION TO BIOLOGICAL DATABASES](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814839550346895db556c5/html5/thumbnails/2.jpg)
WHAT YOU NEED TO LEARN:
What are protein pattern/fingerprint/motif databases and why are they important?
What are the benefits of using ontologies in database design?
How do model organism databases support human health research?
![Page 3: BTN323: INTRODUCTION TO BIOLOGICAL DATABASES](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814839550346895db556c5/html5/thumbnails/3.jpg)
PATTERN DATABASES
Sometimes alignment-based methods find no hits to provide us with clues about a novel gene/protein’s function
Then we turn to finding MOTIFS - common conserved sequence elements in protein families
In many cases a motif consists of distinct subparts that are highly conserved in the sequences, while the regions between these subparts have little in common.
If we have a database of these patterns, we can assign potential function to a novel protein by finding one or more known motifs…
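The idea of scanning a novel protein against a database of motifs can be sketched in a few lines of Python. This is only an illustration: `MOTIF_DB`, `scan_motifs`, the two simplified patterns and the example sequence are all made up for this sketch and are not entries from any real pattern database. Note how the first pattern has conserved positions (C, C, H, H) separated by stretches where any residue is allowed.

```python
import re

# Hypothetical mini "pattern database": motif name -> regular expression.
# The patterns are simplified illustrations, not curated database entries.
MOTIF_DB = {
    # Conserved C...C...H...H positions separated by variable regions.
    "zinc_finger_like": re.compile(r"C.{2,4}C.{12}H.{3,5}H"),
    # Short glycine-rich loop followed by G-K-[S or T].
    "p_loop_like": re.compile(r"[AG].{4}GK[ST]"),
}


def scan_motifs(sequence, motif_db=MOTIF_DB):
    """Return (motif_name, start, end, matched_text) for every hit.

    Positions are 1-based and inclusive, as is usual for protein sequences.
    """
    hits = []
    for name, pattern in motif_db.items():
        for match in pattern.finditer(sequence):
            hits.append((name, match.start() + 1, match.end(), match.group()))
    return hits


# Made-up "novel protein" used only to exercise the scan.
novel_protein = "MKTAYCPECGKSFSQKSNLQKHQRTHTGEKP"
for hit in scan_motifs(novel_protein):
    print(hit)
```

A hit does not prove function; it only suggests a hypothesis (here, a possible zinc-finger-like region) that can be followed up.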
![Page 4: BTN323: INTRODUCTION TO BIOLOGICAL DATABASES](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814839550346895db556c5/html5/thumbnails/4.jpg)
PROTEIN
Similar sequence => similar function
Also true for subsections of a protein
Motifs or signature sequences e.g. DNA binding motifs
Sequence A, Sequence B
EVOLUTIONARY CONSTRAINT!
![Page 5: BTN323: INTRODUCTION TO BIOLOGICAL DATABASES](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814839550346895db556c5/html5/thumbnails/5.jpg)
INTERPRO: INTEGRATED PATTERN DATABASE
Integrated resource for protein families, domains, regions and sites
Combines several databases that use different methodologies, and varying degrees of biological information on well-characterised proteins, to derive protein signatures.
Capitalises on their individual strengths => powerful integrated database and diagnostic tool (InterProScan)
![Page 6: BTN323: INTRODUCTION TO BIOLOGICAL DATABASES](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814839550346895db556c5/html5/thumbnails/6.jpg)
MEMBER DATABASES
ProDom: provider of sequence clusters.
PROSITE patterns: provider of regular expressions.
PRINTS: provider of protein ‘fingerprints’.
PANTHER, PIRSF, Pfam, SMART, TIGRFAMs, Gene3D and SUPERFAMILY: providers of hidden Markov models (HMMs).
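Because PROSITE patterns are essentially regular expressions written in their own notation, they can be translated into ordinary regex syntax. The sketch below handles only the common parts of the notation (`x` for any residue, `[..]` for allowed residues, `{..}` for forbidden residues, `(n)` or `(n,m)` for repeats, `<` and `>` for the sequence ends). The function name `prosite_to_regex` and the test sequence are assumptions made for illustration, and the N-glycosylation pattern shown should be checked against PROSITE itself before being relied on.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into Python regex syntax.

    Covers the common subset of the notation only; rarer constructs
    (for example an anchor placed inside square brackets) are not handled.
    """
    pattern = pattern.rstrip(".")  # patterns are often written with a final '.'
    parts = []
    for element in pattern.split("-"):
        element = element.replace("{", "[^").replace("}", "]")  # {P}    -> [^P]
        element = element.replace("(", "{").replace(")", "}")   # x(2,4) -> x{2,4}
        element = element.replace("x", ".")                      # x      -> any residue
        element = element.replace("<", "^").replace(">", "$")    # sequence ends
        parts.append(element)
    return "".join(parts)


# N-glycosylation-style pattern: N, then anything but P, then S or T, then anything but P.
glyco = re.compile(prosite_to_regex("N-{P}-[ST]-{P}."))

sequence = "MNGTANPSQNLS"  # made-up sequence for illustration
sites = [(m.start() + 1, m.group()) for m in glyco.finditer(sequence)]
print(prosite_to_regex("[AG]-x(4)-G-K-[ST]"))
print(sites)
```

Fingerprints and HMMs cannot be reduced to a single regular expression in this way, which is one reason the member databases complement each other.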
![Page 7: BTN323: INTRODUCTION TO BIOLOGICAL DATABASES](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814839550346895db556c5/html5/thumbnails/7.jpg)
INTERPRO PROTEIN ‘SITES’
Conserved sites - any short sequence pattern that may contain one or more unique residues
Active sites - one or more signatures cover all the active site residues
Binding sites - bind chemical compounds
Post-translational modification sites - where the primary protein structure is modified, e.g. glycosylation, phosphorylation, etc.
![Page 8: BTN323: INTRODUCTION TO BIOLOGICAL DATABASES](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814839550346895db556c5/html5/thumbnails/8.jpg)
INTERPRO SEQUENCE ANALYSIS: INTERPROSCAN
Searching against different functional site databases has become vital for the prediction of protein function (where e.g. BLAST fails).
Different DBs have different strengths and weaknesses, which stem from their underlying analysis methods.
Ideally, all of the secondary databases should be searched to ensure the best results.
This is exactly what InterProScan does (part of today’s practical topic).
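One way to picture what "searching everything and integrating the results" means is to group hits from several member databases under shared entries. The sketch below is not how InterProScan is implemented and does not use its real identifiers or output format: the mapping `SIGNATURE_TO_ENTRY`, the signature and entry names, and the function `integrate_hits` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical mapping from (member database, signature) to an integrated entry.
SIGNATURE_TO_ENTRY = {
    ("PROSITE", "SIG_P1"): "ENTRY_KINASE",
    ("Pfam", "SIG_F1"): "ENTRY_KINASE",
    ("PRINTS", "SIG_R1"): "ENTRY_KINASE",
    ("SMART", "SIG_S1"): "ENTRY_SH2",
}


def integrate_hits(hits):
    """Group per-database hits by integrated entry.

    hits: iterable of (database, signature_id, start, end) tuples.
    Returns {entry: {"databases": [...], "span": (min_start, max_end)}}.
    Signatures without a mapping are collected under "UNINTEGRATED".
    """
    grouped = defaultdict(list)
    for database, signature, start, end in hits:
        entry = SIGNATURE_TO_ENTRY.get((database, signature), "UNINTEGRATED")
        grouped[entry].append((database, start, end))

    summary = {}
    for entry, members in grouped.items():
        summary[entry] = {
            "databases": sorted({db for db, _, _ in members}),
            "span": (
                min(start for _, start, _ in members),
                max(end for _, _, end in members),
            ),
        }
    return summary


# Made-up hits for one query protein, as if reported by three different methods.
example_hits = [
    ("PROSITE", "SIG_P1", 120, 132),
    ("Pfam", "SIG_F1", 95, 350),
    ("SMART", "SIG_S1", 10, 90),
]
for entry, info in integrate_hits(example_hits).items():
    print(entry, info)
```

An entry supported by several independent methods is usually a more convincing prediction than a hit from a single method.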
![Page 9: BTN323: INTRODUCTION TO BIOLOGICAL DATABASES](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814839550346895db556c5/html5/thumbnails/9.jpg)
![Page 10: BTN323: INTRODUCTION TO BIOLOGICAL DATABASES](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814839550346895db556c5/html5/thumbnails/10.jpg)
![Page 11: BTN323: INTRODUCTION TO BIOLOGICAL DATABASES](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814839550346895db556c5/html5/thumbnails/11.jpg)
![Page 12: BTN323: INTRODUCTION TO BIOLOGICAL DATABASES](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814839550346895db556c5/html5/thumbnails/12.jpg)
BIO-ONTOLOGIES
Community-developed agreements on terms/concepts describing a topic and also the relationships between them
The Gene Ontology (GO) is the most widely used
The GO provides a common language to describe a gene product's biology in terms of: Molecular Function, Biological Process and Cellular Component (cellular location)
Several others e.g. anatomy, cell types, disease, phenotype, pathway, …
![Page 13: BTN323: INTRODUCTION TO BIOLOGICAL DATABASES](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814839550346895db556c5/html5/thumbnails/13.jpg)
[Figure: ontology graph in which GENE-X is linked to a term by an ‘involves’ relationship]
![Page 14: BTN323: INTRODUCTION TO BIOLOGICAL DATABASES](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814839550346895db556c5/html5/thumbnails/14.jpg)
ADVANTAGES OF GO (AND MANY OTHER BIO-ONTOLOGIES) IN DB DESIGN
A common language applicable to any organism
Represents and organises information in a way that both humans and machines can understand
GO terms can be used to annotate gene products from any species
Enables easy comparison of information across species
![Page 15: BTN323: INTRODUCTION TO BIOLOGICAL DATABASES](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814839550346895db556c5/html5/thumbnails/15.jpg)
ADVANTAGES OF GO (AND MANY OTHER BIO-ONTOLOGIES) IN DB DESIGN (2)
Terms make good entry points for database searches
Researchers can search for what they really mean (and meaning is more consistent between individuals)
Transitive links from biological objects to a query term via its child terms ensure that ALL relevant results are returned automatically
‘Reverse’ queries can easily be done to return terms when biological objects are used as queries
![Page 16: BTN323: INTRODUCTION TO BIOLOGICAL DATABASES](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814839550346895db556c5/html5/thumbnails/16.jpg)
[Figure: ontology graph in which GENE-X ‘involves’ the term ‘cytokinesis’]
GENE-X will be returned even if the query is done at this level (a parent term in the figure)
Using GENE-X as the query can return ‘cytokinesis’ and even all its parent terms
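Both query directions can be sketched with a toy ontology. The hierarchy below is a deliberately simplified chain of terms and is not the real GO graph; `GENE-Y`, the dictionaries `PARENTS` and `ANNOTATIONS`, and the function names are assumptions made for this illustration.

```python
# Toy ontology: child term -> list of parent terms (simplified, NOT the real GO graph).
PARENTS = {
    "cytokinesis": ["cell division"],
    "cell division": ["cellular process"],
    "cellular process": ["biological process"],
    "biological process": [],
}

# Gene -> terms it is directly annotated with (GENE-Y is hypothetical).
ANNOTATIONS = {
    "GENE-X": ["cytokinesis"],
    "GENE-Y": ["cell division"],
}


def ancestors(term):
    """Return every term reachable by following parent links from `term`."""
    seen = set()
    stack = list(PARENTS.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(PARENTS.get(parent, []))
    return seen


def genes_for_term(query_term):
    """Forward query: genes annotated to the term or to any of its child terms."""
    result = set()
    for gene, terms in ANNOTATIONS.items():
        for term in terms:
            if term == query_term or query_term in ancestors(term):
                result.add(gene)
    return sorted(result)


def terms_for_gene(gene):
    """Reverse query: the gene's direct terms plus all of their parent terms."""
    terms = set()
    for term in ANNOTATIONS.get(gene, []):
        terms.add(term)
        terms |= ancestors(term)
    return sorted(terms)


print(genes_for_term("cell division"))  # query at a parent level still finds GENE-X
print(terms_for_gene("GENE-X"))         # 'cytokinesis' and all its parent terms
```

The user never has to list the child terms explicitly; the relationships stored in the ontology do that work.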
![Page 17: BTN323: INTRODUCTION TO BIOLOGICAL DATABASES](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814839550346895db556c5/html5/thumbnails/17.jpg)
MODEL ORGANISM GENETIC DATABASES
Very useful for collecting results from genetic (and other) experiments that cannot be done on humans: disease models, gene knockouts, drug testing, environmental manipulation
In terms of genomics, model organism data is invaluable to unravel: gene and protein functions, gene to phenotype relationships, gene to disease associations
The aim of these databases is to integrate all relevant information in one place
Easier to mine the database for novel associations
Enables linking between databases
![Page 18: BTN323: INTRODUCTION TO BIOLOGICAL DATABASES](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814839550346895db556c5/html5/thumbnails/18.jpg)
RAT AND MOUSE GENOME DBs – DATA TYPES
Genes, proteins and their annotations including Gene Ontology links and expression information
Phenotypes – described by terms in the Mammalian Phenotype Ontology
From gene knockout models produced by the project and their partners
From evidence mined from the literature
Disease, Pathway and Behaviour ontologies and relevant gene associations also present in RGD
![Page 19: BTN323: INTRODUCTION TO BIOLOGICAL DATABASES](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814839550346895db556c5/html5/thumbnails/19.jpg)
DESIGNED FOR EASE OF USE
Web query interfaces are intuitive
Several traditional ways to query – gene names, symbols, chromosomal location
Query interfaces for ontologies (Disease, Phenotype, Pathway, Behaviour)
Ontology annotations can easily be retrieved for any gene or protein
Both databases have links to human genes, which simplifies mouse and rat evidence-driven in-silico exploration into human diseases and phenotypes
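The value of the links to human genes can be illustrated with a small lookup: start from a phenotype seen in a model organism, find the annotated rat or mouse genes, and follow the ortholog link to a human candidate. Everything in this sketch is hypothetical, including the gene symbols, the phenotype annotations, the ortholog table and the function `human_candidates`; real data would come from the databases themselves.

```python
# Hypothetical phenotype annotations: (species, model gene) -> phenotype terms.
PHENOTYPE_ANNOTATIONS = {
    ("mouse", "Gena"): ["abnormal heart morphology"],
    ("rat", "Genb"): ["increased blood pressure"],
    ("mouse", "Genc"): ["abnormal heart morphology", "decreased body weight"],
}

# Hypothetical ortholog links: (species, model gene) -> human gene symbol.
# "Genc" deliberately has no human link in this toy data.
HUMAN_ORTHOLOGS = {
    ("mouse", "Gena"): "GENA",
    ("rat", "Genb"): "GENB",
}


def human_candidates(phenotype):
    """Return sorted (human_gene, species, model_gene) tuples for a phenotype term."""
    candidates = []
    for (species, gene), phenotypes in PHENOTYPE_ANNOTATIONS.items():
        if phenotype in phenotypes:
            human_gene = HUMAN_ORTHOLOGS.get((species, gene))
            if human_gene is not None:
                candidates.append((human_gene, species, gene))
    return sorted(candidates)


print(human_candidates("abnormal heart morphology"))
```

The output is a list of candidate human genes to investigate, supported by model organism evidence rather than by direct human data.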
![Page 20: BTN323: INTRODUCTION TO BIOLOGICAL DATABASES](https://reader036.fdocuments.in/reader036/viewer/2022062409/56814839550346895db556c5/html5/thumbnails/20.jpg)