1
Bio-Trac 25 (Proteomics: Principles and Methods)
April 4, 2008
Zhang-Zhi Hu, M.D.
Research Associate Professor
Protein Information Resource, Department of Biochemistry and Molecular & Cellular Biology
Georgetown University Medical Center
Tutorial: Bioinformatics Resources (http://pir.georgetown.edu/pirwww/workshop/bioinfo_resource.html)
2
What is Bioinformatics?
computer (information) + mouse (biology) = bioinformatics
• NIH Biomedical Information Science and Technology Initiative (BISTI) Working Definition (2000): Research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze, or visualize such data.
3
Molecular Biology Database Collection
(http://nar.oxfordjournals.org/cgi/content/full/36/suppl_1/D2)
1,078 key databases in 14 categories
4
Database Collection in Nucleic Acids Res.
5
http://pir.georgetown.edu/pirwww/workshop/2005_database_update.html
Online Access to Database Collection
http://www.oxfordjournals.org/nar/database/cap/
2008
6
Overview
I. Text search / Information retrieval
II. Sequence & genomics databases
III. Protein family databases
IV. Database of protein functions
V. Databases of protein structures
VI. Proteomics databases
Database Contents, Search and Retrieval
Lab session
7
Entrez Text Searches
(http://www.ncbi.nlm.nih.gov/Entrez/) Lab
Integrated one-stop search
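An Entrez text search can also be issued programmatically. The sketch below only builds a query URL for NCBI's E-utilities ESearch service; it does not contact the server. The endpoint and the `db`, `term` and `retmax` parameters belong to E-utilities and are not part of the slide; the function name and the query string are made up for illustration.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities ESearch endpoint (not shown on the slide).
ESEARCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Return an ESearch URL for a text query against one Entrez database."""
    params = {"db": db, "term": term, "retmax": retmax}
    # urlencode percent-escapes the query and turns spaces into '+'.
    return f"{ESEARCH_BASE}?{urlencode(params)}"


# Hypothetical query: PubMed records mentioning both terms.
url = build_esearch_url("pubmed", "alpha crystallin AND phosphorylation")
print(url)
```

Changing `db` (for example to `gene` or `protein`) points the same query at a different Entrez database.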
8
PubMed Literature Database (http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?CMD=Search&DB=PubMed)
Literature mining
Lab
PMID: 14640721
9
iProLINK: Protein Literature Mining Resource
http://pir.georgetown.edu/iprolink/
RLIMS-P: Text mining for protein phosphorylation
BioThesaurus: Gene/protein name thesaurus: synonyms, ambiguous names…
Lab
10
BioThesaurus: Gene/protein name searches - synonyms, ambiguous names…
http://pir.georgetown.edu/iprolink/biothesaurus
Synonyms: CRYAA; crystallin, alpha A; CRYA1; HSPB4…
Lab
11
RLIMS-P: Text mining for protein phosphorylation
http://pir.georgetown.edu/iprolink/rlimsp/ Lab
12
UniProt Text Search (http://www.pir.uniprot.org/cgi-bin/textSearch)
Google-type search vs. Boolean searches: AND, OR, NOT
Lab
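To make the AND / OR / NOT operators concrete, here is a toy Boolean search over three in-memory records. It is only a sketch of the logic, not how the UniProt search engine works. The entry names come from this tutorial, but the keyword sets and the `search` function are invented for illustration.

```python
# Toy records: entry names appear in this tutorial; the keyword sets are
# illustrative and are NOT copied from the database.
RECORDS = {
    "CRYAA_RABIT": {"crystallin", "alpha", "lens", "chaperone"},
    "ARLY2_ANAPL": {"crystallin", "delta", "lens", "lyase", "enzyme"},
    "CRGE_RAT": {"crystallin", "gamma", "lens"},
}


def search(all_of=(), any_of=(), none_of=()):
    """Return entry names matching AND (all_of), OR (any_of), NOT (none_of)."""
    hits = []
    for entry, words in RECORDS.items():
        if not all(term in words for term in all_of):
            continue  # AND: every term must be present
        if any_of and not any(term in words for term in any_of):
            continue  # OR: at least one term must be present
        if any(term in words for term in none_of):
            continue  # NOT: no excluded term may be present
        hits.append(entry)
    return sorted(hits)


print(search(all_of=["crystallin", "enzyme"]))               # crystallin AND enzyme
print(search(all_of=["crystallin"], none_of=["enzyme"]))     # crystallin NOT enzyme
print(search(any_of=["alpha", "gamma"]))                     # alpha OR gamma
```

A Google-type search usually ranks records by relevance, whereas the Boolean form above returns exactly the records that satisfy the stated conditions.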
13
PIR Text Search (I)
(http://pir.georgetown.edu/pirwww/search/textsearch.html)
Search: which alpha crystallin A chain entries are classified in protein families?
Search for synonyms
Lab
14
PIR Text Search (II)
Search: which crystallins are enzymes, and which families do they belong to?
Can you find which crystallins have a determined 3D structure?
Lab
Argininosuccinate lyase (EC 4.3.2.1)
15
I. Sequence & Genomics Databases
• NCBI Resources
– GenBank: An annotated collection of all publicly available nucleotide and protein sequences.
– RefSeq: NCBI non-redundant set of reference sequences, including genomic DNA, transcript (RNA), and protein products
– Entrez Gene: Gene-centered information at NCBI.
– UniGene: Unified clusters of ESTs and full-length mRNA sequences.
– OMIM: Online Mendelian Inheritance in Man: a catalog of human genetic and genomic disorders.
• UniProt Consortium Database: Universal protein resource, a central repository of protein sequence and function.
• Model Organism Genome Databases: MGD, RGD, SGD, FlyBase…
• GeneCards: Integrated database of human genes, maps, proteins and diseases.
• SNP Consortium Database (dbSNP); International HapMap Project: Genes associated with human diseases
(http://www.oxfordjournals.org/nar/database/cap/)
16
UniProt Consortium Databases
(http://www.uniprot.org)
Universal Protein Resource
New!
http://beta.uniprot.org/
5.8 million
Since October 2002
17
UniProt Sequence Report (I)
(http://www.pir.uniprot.org/cgi-bin/unipEntry?id=CRYAA_RABIT)
What’s the difference between CRYAA_RABIT & CYRBAA?
UniProtKB
Lab
18
UniProt Report (II): UniRef100 & 90
(http://www.pir.uniprot.org/cgi-bin/unipEntry?id=UniRef90_P02489)
(http://www.pir.uniprot.org/cgi-bin/unipEntry?id=UniRef100_P02489)
UniRef100
UniRef90
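UniRef cluster identifiers such as the two in the URLs above encode the clustering level and a representative member. UniRef100 merges identical sequences, and UniRef90 groups sequences at 90% identity. The sketch below splits such an identifier into its parts; the regular expression and function name are assumptions made for illustration, not an official parser.

```python
import re

# Assumed shape: "UniRef" + identity level + "_" + representative identifier.
UNIREF_ID = re.compile(r"^UniRef(100|90|50)_([A-Z0-9-]+)$")


def parse_uniref_id(cluster_id: str):
    """Split a UniRef cluster ID into (identity level in %, representative)."""
    match = UNIREF_ID.match(cluster_id)
    if match is None:
        raise ValueError(f"not a UniRef cluster ID: {cluster_id!r}")
    return int(match.group(1)), match.group(2)


for cluster in ("UniRef100_P02489", "UniRef90_P02489"):
    level, representative = parse_uniref_id(cluster)
    print(f"{cluster}: {level}% identity level, representative {representative}")
```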
19
Entrez Gene – Gene centric information
http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=gene&cmd=Retrieve&dopt=Graphics&list_uids=12954#ubor0_RefSeq
20
OMIM: Online Mendelian inheritance in man
(http://www.ncbi.nlm.nih.gov/entrez/dispomim.cgi?id=123580)
Juvenile cataract of Down syndrome
Autosomal recessive congenital progressive cataract
21
II. Protein Family Databases
• Whole Proteins
– PIRSF: Nonoverlapping Classification of Full Length Proteins Based on Evolutionary Relationship
– COG (Clusters of Orthologous Groups) of Complete Genomes
– PANTHER: Proteins Classified into Families/Subfamilies of Shared Function
– ProtoNet: Automatic Hierarchical Classification of Proteins
• Protein Domains
– Pfam: Alignments and HMM Models of Protein Domains
– SMART: Protein Domain Identification and Annotation
– CDD: Conserved Domain Database
• Protein Motifs
– PROSITE: Protein Patterns and Profiles
– BLOCKS: Protein Sequence Motifs and Alignments
– PRINTS: Compendium of Protein Fingerprints (a group of conserved motifs)
• Integrated Family Databases
– InterPro: Integrates Pfam, PRINTS, PROSITE, ProDom, SMART, PIRSF, SuperFamily…
22
Protein Clustering
COGs: (http://www.ncbi.nlm.nih.gov/COG/)
Initial version
New version: Includes Eukaryotic Clusters - KOGs
23
PIRSF: Full Length Classification
iProClass Family Report
(http://pir.georgetown.edu/cgi-bin/ipcSF?id=SF002280)
Lab
24
Domain Classification – Pfam Domain
(http://pir.georgetown.edu/cgi-bin/ipcEntry?id=P02493)
(http://www.sanger.ac.uk/cgi-bin/Pfam/swisspfamget.pl?name=CRYAA_RABIT)
25
Pfam Domain (http://www.sanger.ac.uk/cgi-bin/Pfam/getacc?PF00525)
26
Protein Motifs: PROSITE – A database of protein families and domains. It consists of biologically significant sites, patterns and profiles.
(http://us.expasy.org/prosite/)
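PROSITE patterns use a compact syntax: elements are separated by "-", "x" matches any residue, "[...]" lists allowed residues, "{...}" lists forbidden residues, "(n)" or "(n,m)" gives a repeat count, and "<" / ">" anchor the pattern to the N- or C-terminus. The sketch below converts this syntax into a Python regular expression. The function name, the example pattern and the test sequence are illustrative, and rarer syntax variants (such as ">" inside brackets) are not handled.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):      # N-terminal anchor
            prefix, element = "^", element[1:]
        if element.endswith(">"):        # C-terminal anchor
            suffix, element = "$", element[:-1]
        # Separate the residue specification from an optional repeat count.
        match = re.fullmatch(r"([^()]+)(?:\((\d+(?:,\d+)?)\))?", element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, repeat = match.groups()
        if core == "x":                  # any residue
            core = "."
        elif core.startswith("{"):       # forbidden residues
            core = "[^" + core[1:-1] + "]"
        count = "{" + repeat + "}" if repeat else ""
        parts.append(prefix + core + count + suffix)
    return "".join(parts)


# Illustrative pattern written in PROSITE syntax.
regex = prosite_to_regex("C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H")
print(regex)

# Made-up test sequence that contains one match.
sequence = "MK" + "CAAC" + "AAA" + "L" + "A" * 8 + "HAAAH" + "GG"
print(re.search(regex, sequence) is not None)
```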
27
Integrated Family Classification
InterPro: An integrated resource unifying PROSITE, PRINTS, ProDom, Pfam, SMART, TIGRFAMs, and PIRSF. (http://www.ebi.ac.uk/interpro/search.html)
Mapping of families
28
III. Databases of Protein Functions
• Metabolic Pathways, Enzymes, and Compounds
– Enzyme Classification: Classification and Nomenclature of Enzyme-Catalysed Reactions (EC-IUBMB)
– KEGG (Kyoto Encyclopedia of Genes and Genomes): Metabolic Pathways
– LIGAND (at KEGG): Chemical Compounds, Reactions and Enzymes
– EcoCyc: Encyclopedia of E. coli Genes and Metabolism
– MetaCyc: Metabolic Encyclopedia (Metabolic Pathways)
– BRENDA: Enzyme Database
– UM-BBD: Microbial Biocatalytic Reactions and Biodegradation Pathways
• Inter-Molecular Interactions and Regulatory Pathways
– IntAct: Protein interaction data from literature and user submission
– BIND: Descriptions of interactions, molecular complexes and pathways
– DIP: Catalogs experimentally determined interactions between proteins
– Reactome: A curated knowledgebase of biological pathways
– BioCarta: Biological pathways of human and mouse
– GO: Gene Ontology Consortium Database
• Pathway Resources - Pathguide
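Enzyme Classification numbers such as EC 4.3.2.1 (argininosuccinate lyase, one of the tutorial's example proteins) are hierarchical: the first field is the top-level enzyme class, and each following field narrows the reaction type. A minimal sketch of reading such a number follows; the function name and the returned dictionary layout are made up for illustration.

```python
# Top-level EC classes. Class 7 was added by IUBMB in 2018, after this
# tutorial was given.
EC_CLASSES = {
    1: "Oxidoreductases",
    2: "Transferases",
    3: "Hydrolases",
    4: "Lyases",
    5: "Isomerases",
    6: "Ligases",
    7: "Translocases",
}


def parse_ec(ec: str) -> dict:
    """Split an EC number into its four fields and name its top-level class."""
    text = ec.strip()
    if text.upper().startswith("EC"):
        text = text[2:].strip()
    fields = text.split(".")
    if len(fields) != 4:
        raise ValueError(f"expected four dot-separated fields: {ec!r}")
    # Lower fields are kept as strings because partial numbers use "-".
    return {"class": EC_CLASSES[int(fields[0])], "fields": fields}


print(parse_ec("EC 4.3.2.1"))
```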
29
Biological Pathway Resource Collection
http://www.pathguide.org/
• Protein-protein interactions
• Metabolic pathways
• Signaling pathways
• Pathway diagrams
• Transcription factors / gene regulatory networks
• Protein-compound interactions
• Genetic interaction networks
30
http://www.pathwaycommons.org/pc/home.do
31
KEGG Metabolic & Regulatory Pathways
(http://www.genome.ad.jp/dbget-bin/show_pathway?hsa00220+4.3.2.1)
KEGG is a suite of databases and associated software, integrating our current knowledge on molecular interaction networks, the information of genes and proteins, and of chemical compounds and reactions. (http://www.genome.ad.jp/kegg/kegg2.html)
Lab
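The show_pathway URL above packs two pieces of information into its query string: a pathway identifier (an organism code followed by a five-digit map number) and, after the "+", an object to look up on the map, here the EC number 4.3.2.1. The sketch below takes the query apart. Reading the "+"-separated terms as objects to highlight is an assumption about this URL form, and the function name is hypothetical.

```python
import re
from urllib.parse import urlparse


def parse_show_pathway(url: str) -> dict:
    """Split a KEGG show_pathway URL into organism, map number and extra terms."""
    query = urlparse(url).query
    pathway_id, *terms = query.split("+")
    # Assumed ID shape: lowercase prefix (e.g. "hsa", "map") + five digits.
    match = re.fullmatch(r"([a-z]{2,4})(\d{5})", pathway_id)
    if match is None:
        raise ValueError(f"unexpected pathway identifier: {pathway_id!r}")
    return {
        "organism": match.group(1),
        "map_number": match.group(2),
        "highlight": terms,  # assumption: terms after '+' are map objects
    }


info = parse_show_pathway(
    "http://www.genome.ad.jp/dbget-bin/show_pathway?hsa00220+4.3.2.1"
)
print(info)
```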
32
BioCyc: EcoCyc/MetaCyc Metabolic Pathways
The BioCyc Knowledge Library is a collection of Pathway/Genome Databases (http://biocyc.org/)
34
Reactome: http://www.reactome.org/
• Collaboration of CSHL, EBI and GO Consortium
• Curated resource of core pathways and reactions in human biology
• Authored by biological researchers who are experts in their fields
• Cross-referenced with NCBI, Ensembl, UniProt, HapMap, KEGG…
• Inferred orthologous events in 22 non-human species (mouse, rat…)
35
Transforming Growth Factor (TGF) beta signaling [Homo sapiens]
Event -> REACT_6879.1: Activated type I receptor phosphorylates R-SMAD directly [Homo sapiens]
Object -> REACT_7364.1: Phospho-R-SMAD [cytosol]
Event -> REACT_6760.1: Phospho-R-SMAD forms a complex with CO-SMAD [Homo sapiens]
Object -> REACT_7344.1: Phospho-R-SMAD:CO-SMAD complex [cytosol]
Event -> REACT_6726.1: The phospho-R-SMAD:CO-SMAD transfers to the nucleus
Object -> REACT_7382.2: Phospho-R-SMAD:CO-SMAD complex [nucleoplasm]
……
(http://reactome.org/cgi-bin/eventbrowser?DB=gk_current&FOCUS_SPECIES=Homo%20sapiens&ID=170834&)
Reactome: events and objects (including modified forms and complex)
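The alternating Event / Object listing on this slide can be held in a small data structure. The sketch below pairs each event with the object listed directly after it and splits each stable identifier into an ID and a version number. Treating that object as the event's product is an assumption drawn from the slide's ordering, and the class names are invented here rather than taken from Reactome's data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StableId:
    identifier: str
    version: int

    @classmethod
    def parse(cls, text: str) -> "StableId":
        """Split e.g. 'REACT_7382.2' into identifier and version."""
        identifier, _, version = text.partition(".")
        return cls(identifier, int(version))


@dataclass(frozen=True)
class Step:
    event: StableId
    event_name: str
    product: StableId   # assumption: the object listed after the event
    product_name: str


# Identifiers and names as listed on the slide.
RAW_STEPS = [
    ("REACT_6879.1", "Activated type I receptor phosphorylates R-SMAD directly",
     "REACT_7364.1", "Phospho-R-SMAD [cytosol]"),
    ("REACT_6760.1", "Phospho-R-SMAD forms a complex with CO-SMAD",
     "REACT_7344.1", "Phospho-R-SMAD:CO-SMAD complex [cytosol]"),
    ("REACT_6726.1", "The phospho-R-SMAD:CO-SMAD transfers to the nucleus",
     "REACT_7382.2", "Phospho-R-SMAD:CO-SMAD complex [nucleoplasm]"),
]

steps = [
    Step(StableId.parse(event), event_name, StableId.parse(product), product_name)
    for event, event_name, product, product_name in RAW_STEPS
]

for step in steps:
    print(f"{step.event.identifier}: {step.event_name} -> {step.product_name}")
```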
36
Protein-Protein Interaction Database - IntAct (http://www.ebi.ac.uk/intact/)
37
Gene Ontology (GO)
- Molecular Function
- Biological Process
- Cellular Component
(http://www.geneontology.org/)
38
IV. Databases of Protein Structures
• Protein Structure
– PDB: Structure Determined by X-ray Crystallography and NMR
– PDBsum: Summaries and analyses of PDB structures
– MMDB: NCBI’s database of 3D structures, part of NCBI Entrez
– SWISS-MODEL Repository: Database of annotated protein 3D models
– ModBase: Annotated comparative protein structure models
• Structure Classification
– CATH: Hierarchical Classification of Protein Domain Structures
– SCOP: Familial and Structural Protein Relationships
– FSSP: Protein Fold Classification Based on Structure-Structure Alignment
39
PDB: Experimental 3D Structure Repository
(http://www.rcsb.org/pdb/)
Rat gamma-crystallin (chain A, B)
Can you do a text search at PIR to find this (CRGE_RAT)?
Lab
40
PDBsum: Pictorial database providing summaries and analyses of PDB entries
Search 3-D structure summary
2-D structure summary
(http://www.ebi.ac.uk/thornton-srv/databases/pdbsum/)
41
Protein Structural Classification (1)
CATH: Hierarchical domain classification of protein structures (http://www.cathdb.info/latest/index.html)
42
Protein Structural Classification (2)
(http://scop.mrc-lmb.cam.ac.uk/scop/data/scop.b.html)
SCOP: comprehensive description of structural and evolutionary relationships between all proteins whose structure is known.
43
SWISS-MODEL Repository
A database of annotated three-dimensional comparative protein structure models
(http://swissmodel.expasy.org/repository/smr.php?sptr_ac=CRBA1_MOUSE&job=2)
http://swissmodel.expasy.org/repository/
http://swissmodel.expasy.org/
44
VI. Proteomic Resources
• GELBANK (http://gelbank.anl.gov): 2D-gel patterns of species with completed genomes.
• SWISS-2DPAGE (http://www.expasy.org/ch2d/): index of 2D-gels
• PEP (http://cubic.bioc.columbia.edu/pep/): Predictions for Entire Proteomes: summarized analyses of protein sequences
• Integr8 (http://www.ebi.ac.uk/integr8/): A browser for information relating to completed genomes and proteomes, based on data contained in Genome Reviews and the UniProt proteome sets
• PRIDE (http://www.ebi.ac.uk/pride/): PRoteomics IDEntifications database; Expression Profiling databases
• GPMdb (http://gpmdb.thegpm.org/): Mass spec proteomics databases
• PeptideAtlas (http://www.peptideatlas.org/): compendium of peptides identified in a large set of tandem mass spectrometry proteomic experiments
• HUPO (http://www.hupo.org/): Human Proteome Organization, to foster international proteomics initiatives.
45
2D-Gel Image Databases
(http://us.expasy.org/swiss-2dpage/ac=P02489) Part of WORLD-2DPAGE: index to 2-D PAGE databases and services
(http://us.expasy.org/ch2d/)
Lab
46
GPMdb: MS Data Search (http://gpmdb.thegpm.org/)
Craig, et al., J Proteome Res. 2004, 3:1234-42.
47
PRIDE: centralized, standards compliant, public data repository for proteomics data
http://www.ebi.ac.uk/pride/
HUPO Plasma
Proteome Project
48
Protein Examples
• Rabbit alpha crystallin A (UniProtKB: CRYAA_RABIT/P02493)
• Delta crystallin II (Argininosuccinate lyase) (UniProtKB: ARLY2_ANAPL/P24058)
• Any additional proteins of your interest for search and retrieval
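The notation "CRYAA_RABIT/P02493" above pairs two kinds of UniProtKB identifier: a readable entry name (protein mnemonic, underscore, species code) and an accession number. The sketch below separates and checks the two parts. The accession pattern follows the format published in the UniProt documentation; the entry-name pattern and the function name are simplifying assumptions made for this example.

```python
import re

# Accession format as published in the UniProt documentation.
ACCESSION = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)
# Simplified entry-name shape: mnemonic + "_" + species code (assumption).
ENTRY_NAME = re.compile(r"([A-Z0-9]{1,10})_([A-Z0-9]{1,5})")


def parse_uniprot_ref(ref: str) -> dict:
    """Split 'ENTRYNAME/ACCESSION' into its labelled components."""
    entry_name, _, accession = ref.partition("/")
    name_match = ENTRY_NAME.fullmatch(entry_name)
    if name_match is None or ACCESSION.fullmatch(accession) is None:
        raise ValueError(f"unrecognised UniProtKB reference: {ref!r}")
    return {
        "entry_name": entry_name,
        "protein": name_match.group(1),
        "species": name_match.group(2),
        "accession": accession,
    }


for ref in ("CRYAA_RABIT/P02493", "ARLY2_ANAPL/P24058"):
    print(parse_uniprot_ref(ref))
```

Entry names can change when a protein is renamed, so the accession is generally the safer key for cross-referencing between databases.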
Lab:
I. Text search / Information retrieval
1. Literature search and text mining
– Finding synonyms (BioThesaurus)
– Information extraction (e.g., protein phosphorylation sites)
2. Find the sequence for the rabbit alpha crystallin A chain
3. Find all alpha crystallin A chains classified in protein families
4. Search for crystallins that have enzyme activity
5. Find crystallins that have determined 3D structures
II. Database contents (reports)
1. Sequence & genomics databases (UniProt)
2. Protein family databases (PIRSF)
3. Database of protein functions (KEGG)
4. Databases of protein structures (PDB)
5. Proteomics databases (Swiss-2D)