Bioinformatics for Genomic and Proteomic data analysis
![Page 1: Bioinformatics for Genomic and Proteomic data analysis](https://reader031.fdocuments.in/reader031/viewer/2022020208/5681376f550346895d9f0a16/html5/thumbnails/1.jpg)
Bioinformatics for Genomic and Proteomic data analysis
• Sequence Analysis
-- Predicting function, domains, etc.
-- Predicting physico-chemical properties of proteins (ProtParam).
-- Predicting signal peptides and transmembrane proteins (SignalP).
-- Finding homology between sequences, identifying repeats, etc. (DOTPLOT).
-- Major databases and retrieval techniques.
• Structure analysis
-- Gene Prediction
-- Phylogenetic analysis
-- Alignment techniques (BLAST, PSI-BLAST)
-- Analysis of protein structure and conformation (RasMol, SwissPDBViewer, VMD, etc.).
-- Protein structure prediction: homology modeling (SwissModel, Modeller).
• Some practical applications
-- Sequence analysis
-- Structure analysis
![Page 2: Bioinformatics for Genomic and Proteomic data analysis](https://reader031.fdocuments.in/reader031/viewer/2022020208/5681376f550346895d9f0a16/html5/thumbnails/2.jpg)
Major bioinformatics databases, search engines and data formats
By: Sachin Pundhir, Bioinformatics sub-centre, DAVV, Indore
![Page 3: Bioinformatics for Genomic and Proteomic data analysis](https://reader031.fdocuments.in/reader031/viewer/2022020208/5681376f550346895d9f0a16/html5/thumbnails/3.jpg)
Database
• Collection of records and files
• Organized for a particular purpose
• Tables
• Tuples (records)
  – Attributes
    » Values
![Page 4: Bioinformatics for Genomic and Proteomic data analysis](https://reader031.fdocuments.in/reader031/viewer/2022020208/5681376f550346895d9f0a16/html5/thumbnails/4.jpg)
BIO520 Student Database
Table: 1998

Name  ID   Grade
Amy   123  A
Joe   456  B
Sue   789  C

Figure labels: Table (the whole grid), Tuple (one row), Attribute (one column), Value (one cell).
![Page 5: Bioinformatics for Genomic and Proteomic data analysis](https://reader031.fdocuments.in/reader031/viewer/2022020208/5681376f550346895d9f0a16/html5/thumbnails/5.jpg)
Database Operations
• Tables
  – Create, delete
• Tuples (records)
  – Read, write, delete
• Search, sort, modify, print…
1998
Name ID Grade
Amy 123 A
Joe 456 B
Sue 789 C
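The operations listed on this slide can be tried on the same student table. The following is only a minimal sketch using Python's built-in `sqlite3` module; the table name `students` and the column names are assumptions chosen for illustration and are not taken from the slides.

```python
import sqlite3

# In-memory database, so the sketch leaves no file behind.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Create a table; each column is an attribute.
cur.execute("CREATE TABLE students (name TEXT, id INTEGER PRIMARY KEY, grade TEXT)")

# Write three tuples (records), using the values shown on the slide.
cur.executemany(
    "INSERT INTO students (name, id, grade) VALUES (?, ?, ?)",
    [("Amy", 123, "A"), ("Joe", 456, "B"), ("Sue", 789, "C")],
)

# Search: read one tuple by its ID.
cur.execute("SELECT name, grade FROM students WHERE id = ?", (456,))
joe = cur.fetchone()

# Modify: change a single value in one tuple.
cur.execute("UPDATE students SET grade = ? WHERE name = ?", ("B", "Sue"))

# Sort: order the tuples by grade, then by name.
cur.execute("SELECT name FROM students ORDER BY grade, name")
ranked = [row[0] for row in cur.fetchall()]

# Delete one tuple, then count what is left.
cur.execute("DELETE FROM students WHERE name = ?", ("Joe",))
cur.execute("SELECT COUNT(*) FROM students")
remaining = cur.fetchone()[0]

# Delete the table itself.
cur.execute("DROP TABLE students")
conn.close()
```

Biological sequence databases are far larger and are usually queried through web interfaces, but the same vocabulary (table, record, field, value) carries over.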
![Page 6: Bioinformatics for Genomic and Proteomic data analysis](https://reader031.fdocuments.in/reader031/viewer/2022020208/5681376f550346895d9f0a16/html5/thumbnails/6.jpg)
International Nucleotide Sequence Database Collaboration (INSDC)
• Consists of:
  – DDBJ (Japan)
  – GenBank (USA)
  – EMBL Nucleotide Sequence Database (Europe)
• The three databases exchange new and updated data on a daily basis to achieve optimal synchronisation.
![Page 7: Bioinformatics for Genomic and Proteomic data analysis](https://reader031.fdocuments.in/reader031/viewer/2022020208/5681376f550346895d9f0a16/html5/thumbnails/7.jpg)
Bioinformatics databases
• Nucleotide sequence databases:
  – GenBank: nucleotide sequence database; highly redundant.
  – DDBJ: DNA Data Bank of Japan.
  – EMBL: nucleotide sequence database.
  – RefSeq: integrated, non-redundant set of sequences, including genomic DNA, transcript (RNA), and protein products, for major research organisms.
Primary databases
• Protein sequence databases:
  – GenPept: protein sequence database.
  – UniProtKB/Swiss-Prot: curated protein sequence database with a minimal level of redundancy and a high level of integration with other databases.
  – UniProtKB/TrEMBL: computer-annotated supplement of Swiss-Prot that contains all the translations of EMBL nucleotide sequence entries not yet integrated in Swiss-Prot.
  – RefSeq: well-curated, non-redundant database.
• Structure databases:
  – PDB: Protein Data Bank
  – MMDB: Molecular Modeling Database
Secondary database
![Page 8: Bioinformatics for Genomic and Proteomic data analysis](https://reader031.fdocuments.in/reader031/viewer/2022020208/5681376f550346895d9f0a16/html5/thumbnails/8.jpg)
GenBank Record
• Header: information that applies to the whole record
• Features: annotations on the record
• Sequence
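The three-part layout can be seen by splitting a flat-file record at its section keywords. This is a rough sketch, not a full parser: it relies only on the `FEATURES`, `ORIGIN` and `//` markers of the GenBank flat-file format, and the record below is entirely made up for illustration (it is not a real database entry).

```python
def split_genbank_record(text):
    """Split one GenBank flat-file record into header lines,
    feature-table lines and the bare sequence string."""
    header, features, sequence_lines = [], [], []
    section = header
    for line in text.splitlines():
        if line.startswith("//"):          # end-of-record marker
            break
        if line.startswith("FEATURES"):    # feature table begins here
            section = features
        elif line.startswith("ORIGIN"):    # sequence begins on the next line
            section = sequence_lines
            continue
        section.append(line)
    # Each sequence line is a position number followed by blocks of bases.
    bases = "".join("".join(line.split()[1:]) for line in sequence_lines)
    return header, features, bases


# Made-up toy record, for illustration only.
toy_record = "\n".join([
    "LOCUS       TOY0001                   20 bp    DNA     linear   SYN 01-JAN-2000",
    "DEFINITION  Made-up record for illustration only.",
    "ACCESSION   TOY0001",
    "FEATURES             Location/Qualifiers",
    "     source          1..20",
    "ORIGIN",
    "        1 acgtacgtac gtacgtacgt",
    "//",
])

header, features, bases = split_genbank_record(toy_record)
```

For real work an established parser (for example the one in Biopython) is a safer choice than hand-written splitting.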
![Page 9: Bioinformatics for Genomic and Proteomic data analysis](https://reader031.fdocuments.in/reader031/viewer/2022020208/5681376f550346895d9f0a16/html5/thumbnails/9.jpg)
GenBank Record: Header
• Locus Name
• Sequence Length
• Molecule Type
• GenBank Division
• Modification Date
• Accession Number
• Version Number
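Most of these header fields sit on the first line of the record, the `LOCUS` line, while the accession and version numbers have their own `ACCESSION` and `VERSION` lines. The sketch below pulls the fields out of a `LOCUS` line by splitting on whitespace. That is a simplification (the official format is column-based, and protein records use `aa` rather than `bp`), and the example line is made up.

```python
def parse_locus_line(line):
    """Extract header fields from a nucleotide LOCUS line.

    Simplified: splits on whitespace instead of using fixed columns."""
    tokens = line.split()
    if len(tokens) < 7 or tokens[0] != "LOCUS" or tokens[3] != "bp":
        raise ValueError("not a recognisable nucleotide LOCUS line")
    return {
        "locus_name": tokens[1],
        "sequence_length": int(tokens[2]),
        "molecule_type": tokens[4],
        # Division and date are the last two tokens; a topology word
        # (linear/circular) may or may not appear before them.
        "division": tokens[-2],
        "modification_date": tokens[-1],
    }


# Made-up LOCUS line, for illustration only.
toy_locus = "LOCUS       TOY0001                   20 bp    DNA     linear   SYN 01-JAN-2000"
fields = parse_locus_line(toy_locus)
```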
![Page 10: Bioinformatics for Genomic and Proteomic data analysis](https://reader031.fdocuments.in/reader031/viewer/2022020208/5681376f550346895d9f0a16/html5/thumbnails/10.jpg)
GenBank Record
Link to Seq
FEATURE
![Page 11: Bioinformatics for Genomic and Proteomic data analysis](https://reader031.fdocuments.in/reader031/viewer/2022020208/5681376f550346895d9f0a16/html5/thumbnails/11.jpg)
GenBank Record: Sequence
![Page 12: Bioinformatics for Genomic and Proteomic data analysis](https://reader031.fdocuments.in/reader031/viewer/2022020208/5681376f550346895d9f0a16/html5/thumbnails/12.jpg)
Using Entrez
An integrated database search and retrieval system
![Page 13: Bioinformatics for Genomic and Proteomic data analysis](https://reader031.fdocuments.in/reader031/viewer/2022020208/5681376f550346895d9f0a16/html5/thumbnails/13.jpg)
WWW Access
Entrez & BLAST
![Page 14: Bioinformatics for Genomic and Proteomic data analysis](https://reader031.fdocuments.in/reader031/viewer/2022020208/5681376f550346895d9f0a16/html5/thumbnails/14.jpg)
Entrez: Database Integration
[Figure: linked Entrez databases: PubMed abstracts, Nucleotide sequences, Protein sequences, 3-D Structure, Genomes, Taxonomy. Link labels: Word weight, VAST, BLAST, Phylogeny.]
![Page 15: Bioinformatics for Genomic and Proteomic data analysis](https://reader031.fdocuments.in/reader031/viewer/2022020208/5681376f550346895d9f0a16/html5/thumbnails/15.jpg)
Database Searching with Entrez
• Using limits and field restriction to find the human MutL homolog
• Linking and neighboring with MutL
![Page 16: Bioinformatics for Genomic and Proteomic data analysis](https://reader031.fdocuments.in/reader031/viewer/2022020208/5681376f550346895d9f0a16/html5/thumbnails/16.jpg)
Global Entrez Search
![Page 17: Bioinformatics for Genomic and Proteomic data analysis](https://reader031.fdocuments.in/reader031/viewer/2022020208/5681376f550346895d9f0a16/html5/thumbnails/17.jpg)
Document Summaries: MutL[All Fields]
![Page 18: Bioinformatics for Genomic and Proteomic data analysis](https://reader031.fdocuments.in/reader031/viewer/2022020208/5681376f550346895d9f0a16/html5/thumbnails/18.jpg)
Entrez Nucleotides: Limits & Preview/Index Tabs
![Page 19: Bioinformatics for Genomic and Proteomic data analysis](https://reader031.fdocuments.in/reader031/viewer/2022020208/5681376f550346895d9f0a16/html5/thumbnails/19.jpg)
MutL
Entrez Nucleotides: Limits
Field Restriction: Accession, All Fields, Author Name, EC/RN Number, Feature key, Filter, Gene Name, Issue, Journal Name, Keyword, Modification Date, Organism, Page Number, Primary Accession, Properties, Protein Name, Publication Date, SeqID String, Sequence Length, Substance Name, Text Word, Title, Uid, Volume
Exclude bulk sequences
![Page 20: Bioinformatics for Genomic and Proteomic data analysis](https://reader031.fdocuments.in/reader031/viewer/2022020208/5681376f550346895d9f0a16/html5/thumbnails/20.jpg)
MutL
Entrez Nucleotides: Limits
Title field = Definition line of the record
Exclude Bulk Sequences
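Field restriction on the Limits tab corresponds to the `term[Field]` syntax of an Entrez query. As a hedged sketch, the snippet below assembles such a query and a matching E-utilities `esearch` URL without sending any request. The helper names are hypothetical, the choice of the `nuccore` database and of the `Title` and `Organism` fields is an assumption for this example, and the bulk-sequence exclusion from the slide is not reproduced.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities search service.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def field_term(term, field):
    """Restrict a search term to one field, quoting multi-word terms."""
    if " " in term:
        term = f'"{term}"'
    return f"{term}[{field}]"


def build_esearch_url(db, terms):
    """Join field-restricted terms with AND and build an esearch URL."""
    query = " AND ".join(field_term(term, field) for term, field in terms)
    return query, ESEARCH + "?" + urlencode({"db": db, "term": query})


# Example: MutL in the title, restricted to human sequences.
query, url = build_esearch_url(
    "nuccore",
    [("MutL", "Title"), ("Homo sapiens", "Organism")],
)
```

The same `query` string can also be pasted directly into the Entrez search box.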
![Page 21: Bioinformatics for Genomic and Proteomic data analysis](https://reader031.fdocuments.in/reader031/viewer/2022020208/5681376f550346895d9f0a16/html5/thumbnails/21.jpg)
Document Summaries: Limits
![Page 22: Bioinformatics for Genomic and Proteomic data analysis](https://reader031.fdocuments.in/reader031/viewer/2022020208/5681376f550346895d9f0a16/html5/thumbnails/22.jpg)
Adding Terms: Preview/Index
Fields: Accession, All Fields, Author Name, EC/RN Number, Feature key, Filter, Gene Name, Issue, Journal Name, Keyword, Modification Date, Organism, Page Number, Primary Accession, Properties, Protein Name, Publication Date, SeqID String, Sequence Length, Substance Name, Text Word, Title, Uid, Volume
![Page 23: Bioinformatics for Genomic and Proteomic data analysis](https://reader031.fdocuments.in/reader031/viewer/2022020208/5681376f550346895d9f0a16/html5/thumbnails/23.jpg)
Human MutL Search Results
![Page 24: Bioinformatics for Genomic and Proteomic data analysis](https://reader031.fdocuments.in/reader031/viewer/2022020208/5681376f550346895d9f0a16/html5/thumbnails/24.jpg)
Human MutL RefSeq
GenBank Records
![Page 25: Bioinformatics for Genomic and Proteomic data analysis](https://reader031.fdocuments.in/reader031/viewer/2022020208/5681376f550346895d9f0a16/html5/thumbnails/25.jpg)
NM_000249: Links
![Page 26: Bioinformatics for Genomic and Proteomic data analysis](https://reader031.fdocuments.in/reader031/viewer/2022020208/5681376f550346895d9f0a16/html5/thumbnails/26.jpg)
Literature Links
PubMed
OMIM
![Page 27: Bioinformatics for Genomic and Proteomic data analysis](https://reader031.fdocuments.in/reader031/viewer/2022020208/5681376f550346895d9f0a16/html5/thumbnails/27.jpg)
NM_000249: PubMed
Books
![Page 28: Bioinformatics for Genomic and Proteomic data analysis](https://reader031.fdocuments.in/reader031/viewer/2022020208/5681376f550346895d9f0a16/html5/thumbnails/28.jpg)
Books Link
![Page 29: Bioinformatics for Genomic and Proteomic data analysis](https://reader031.fdocuments.in/reader031/viewer/2022020208/5681376f550346895d9f0a16/html5/thumbnails/29.jpg)
OMIM: Human Disease Genes
Conserved Domain
![Page 30: Bioinformatics for Genomic and Proteomic data analysis](https://reader031.fdocuments.in/reader031/viewer/2022020208/5681376f550346895d9f0a16/html5/thumbnails/30.jpg)
Sequence Links
Nucleotide Protein
![Page 31: Bioinformatics for Genomic and Proteomic data analysis](https://reader031.fdocuments.in/reader031/viewer/2022020208/5681376f550346895d9f0a16/html5/thumbnails/31.jpg)
NM_000249: Related Sequences
similarity
Original GenBank mRNAs
Original GenBank genomic
Genome Project BAC
![Page 32: Bioinformatics for Genomic and Proteomic data analysis](https://reader031.fdocuments.in/reader031/viewer/2022020208/5681376f550346895d9f0a16/html5/thumbnails/32.jpg)
Taxonomy Link
The Tax Browser
NCBI’s Taxonomy
![Page 33: Bioinformatics for Genomic and Proteomic data analysis](https://reader031.fdocuments.in/reader031/viewer/2022020208/5681376f550346895d9f0a16/html5/thumbnails/33.jpg)
Taxonomy Link
![Page 34: Bioinformatics for Genomic and Proteomic data analysis](https://reader031.fdocuments.in/reader031/viewer/2022020208/5681376f550346895d9f0a16/html5/thumbnails/34.jpg)
NCBI Protein Databases
• GenPept: GenBank, EMBL, DDBJ CDS translations
• RefSeq: mRNA based (NP_) and genome based (XP_)
• Swiss-Prot: curated, high-quality protein reviews
• PIR: Protein Information Resource, Georgetown University
• PRF: Protein Research Foundation
• PDB: Protein Data Bank, sequences from structures
![Page 35: Bioinformatics for Genomic and Proteomic data analysis](https://reader031.fdocuments.in/reader031/viewer/2022020208/5681376f550346895d9f0a16/html5/thumbnails/35.jpg)
Protein Link
BLAST Link
Conserved Domains
![Page 36: Bioinformatics for Genomic and Proteomic data analysis](https://reader031.fdocuments.in/reader031/viewer/2022020208/5681376f550346895d9f0a16/html5/thumbnails/36.jpg)
Related Proteins: Redundancy
Redundant Sequences
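Redundancy here means that the same sequence appears under several identifiers. One simple way to picture a non-redundant view is to group identifiers by identical sequence, as in the sketch below. The identifiers and sequences are made up, and exact string identity is an assumption for illustration; the slides do not describe how NCBI actually collapses redundant entries.

```python
def collapse_redundant(records):
    """Group (identifier, sequence) pairs that share an identical sequence."""
    groups = {}
    for identifier, sequence in records:
        key = sequence.upper()  # ignore case differences
        groups.setdefault(key, []).append(identifier)
    return groups


# Made-up identifiers and sequences, for illustration only.
records = [
    ("toy|1", "MKVL"),
    ("toy|2", "mkvl"),   # same sequence as toy|1, different case
    ("toy|3", "MKIL"),
]

groups = collapse_redundant(records)
# Keep the first identifier of each group as its representative.
non_redundant = [ids[0] for ids in groups.values()]
```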
![Page 37: Bioinformatics for Genomic and Proteomic data analysis](https://reader031.fdocuments.in/reader031/viewer/2022020208/5681376f550346895d9f0a16/html5/thumbnails/37.jpg)
Sequence from MutL structure
Related Proteins: Links
![Page 38: Bioinformatics for Genomic and Proteomic data analysis](https://reader031.fdocuments.in/reader031/viewer/2022020208/5681376f550346895d9f0a16/html5/thumbnails/38.jpg)
BLink: non-redundant relatives
Arabidopsis homolog
Conserved Domain
![Page 39: Bioinformatics for Genomic and Proteomic data analysis](https://reader031.fdocuments.in/reader031/viewer/2022020208/5681376f550346895d9f0a16/html5/thumbnails/39.jpg)
MLH1 Domain Structure: CDD
ATPase Domain; Mismatch Repair Domain
![Page 40: Bioinformatics for Genomic and Proteomic data analysis](https://reader031.fdocuments.in/reader031/viewer/2022020208/5681376f550346895d9f0a16/html5/thumbnails/40.jpg)
MLH1: ATPase Domain
![Page 41: Bioinformatics for Genomic and Proteomic data analysis](https://reader031.fdocuments.in/reader031/viewer/2022020208/5681376f550346895d9f0a16/html5/thumbnails/41.jpg)
ATPase structural alignment
ATP Binding site helix
![Page 42: Bioinformatics for Genomic and Proteomic data analysis](https://reader031.fdocuments.in/reader031/viewer/2022020208/5681376f550346895d9f0a16/html5/thumbnails/42.jpg)
Genome Resources
![Page 43: Bioinformatics for Genomic and Proteomic data analysis](https://reader031.fdocuments.in/reader031/viewer/2022020208/5681376f550346895d9f0a16/html5/thumbnails/43.jpg)
NM_000249: Genome Links
![Page 44: Bioinformatics for Genomic and Proteomic data analysis](https://reader031.fdocuments.in/reader031/viewer/2022020208/5681376f550346895d9f0a16/html5/thumbnails/44.jpg)
Higher Genome Resources
![Page 45: Bioinformatics for Genomic and Proteomic data analysis](https://reader031.fdocuments.in/reader031/viewer/2022020208/5681376f550346895d9f0a16/html5/thumbnails/45.jpg)
MLH1: UniGene Cluster
![Page 46: Bioinformatics for Genomic and Proteomic data analysis](https://reader031.fdocuments.in/reader031/viewer/2022020208/5681376f550346895d9f0a16/html5/thumbnails/46.jpg)
ESTs in UniGene
![Page 47: Bioinformatics for Genomic and Proteomic data analysis](https://reader031.fdocuments.in/reader031/viewer/2022020208/5681376f550346895d9f0a16/html5/thumbnails/47.jpg)
The New Homologene
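[Figure: globin gene tree. An early globin gene undergoes gene duplication, giving an A-chain gene and a B-chain gene. Leaves: frog A, chick A, mouse A, mouse B, chick B, frog B. The A genes are orthologs of one another, as are the B genes; A and B genes are paralogs.]
• No longer UniGene based
• Protein similarities first
• Guided by taxonomic tree
• Includes orthologs and paralogs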
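The ortholog/paralog labels in the globin tree on this slide follow a simple rule: genes separated by a speciation event are orthologs, and genes separated by the duplication are paralogs. The sketch below encodes that rule for this tree only. It assumes, as drawn, that the single duplication happened before frog, chick and mouse diverged; the function name and the tuple representation are hypothetical.

```python
def relationship(gene_a, gene_b):
    """Classify two globin genes from the slide's tree.

    Each gene is a (species, chain) pair such as ("mouse", "A").
    Valid only when one duplication precedes all speciation events."""
    if gene_a == gene_b:
        return "same gene"
    _, chain_a = gene_a
    _, chain_b = gene_b
    if chain_a == chain_b:
        return "orthologs"   # same chain, split by speciation
    return "paralogs"        # different chains, split by the duplication


frog_vs_mouse_a = relationship(("frog", "A"), ("mouse", "A"))
mouse_a_vs_b = relationship(("mouse", "A"), ("mouse", "B"))
frog_a_vs_chick_b = relationship(("frog", "A"), ("chick", "B"))
```

Real gene families often have several duplications at different depths, which is why HomoloGene is described here as being guided by the taxonomic tree.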
![Page 48: Bioinformatics for Genomic and Proteomic data analysis](https://reader031.fdocuments.in/reader031/viewer/2022020208/5681376f550346895d9f0a16/html5/thumbnails/48.jpg)
The New Homologene
![Page 49: Bioinformatics for Genomic and Proteomic data analysis](https://reader031.fdocuments.in/reader031/viewer/2022020208/5681376f550346895d9f0a16/html5/thumbnails/49.jpg)
Entrez Genes: integrated gene-based access
• LocusLink
• Complete Genomes
  – eukaryotic
  – microbial
  – organelle
![Page 50: Bioinformatics for Genomic and Proteomic data analysis](https://reader031.fdocuments.in/reader031/viewer/2022020208/5681376f550346895d9f0a16/html5/thumbnails/50.jpg)
Genes MLH1: Central Resource
![Page 51: Bioinformatics for Genomic and Proteomic data analysis](https://reader031.fdocuments.in/reader031/viewer/2022020208/5681376f550346895d9f0a16/html5/thumbnails/51.jpg)
QUESTIONS!!!