Proteins and Protein Function Charles Yan Spring 2006.
2
Amino Acids
General structure of an amino acid
20 standard amino acids each with a different R group
3
Amino Acids
Amino Acid      3-letter code   1-letter code
Alanine         Ala             A
Arginine        Arg             R
Asparagine      Asn             N
Aspartate       Asp             D
Cysteine        Cys             C
Glutamine       Gln             Q
Glutamate       Glu             E
Glycine         Gly             G
Histidine       His             H
Isoleucine      Ile             I
Table 1. 20 standard amino acids
4
Amino Acids
Amino Acid      3-letter code   1-letter code
Leucine         Leu             L
Lysine          Lys             K
Methionine      Met             M
Phenylalanine   Phe             F
Proline         Pro             P
Serine          Ser             S
Threonine       Thr             T
Tryptophan      Trp             W
Tyrosine        Tyr             Y
Valine          Val             V
Table 1. 20 standard amino acids (Cont.)
5
Amino Acids
Amino Acid                        3-letter code   1-letter code
Asparagine (N) or aspartate (D)   Asx             B
Glutamine (Q) or glutamate (E)    Glx             Z
Any amino acid                    Xaa             X
Authority: IUPAC-IUB Joint Commission on Biochemical Nomenclature.
Reference: IUPAC-IUB Joint Commission on Biochemical Nomenclature. Nomenclature and Symbolism for Amino Acids and Peptides. Eur. J. Biochem. 138:9-37 (1984).
Amino Acid Abbreviations (IUPAC)
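The codes in Table 1 and the ambiguity codes above map naturally onto a lookup table. The following is a minimal Python sketch, assuming the input is a list of three-letter codes. The names THREE_TO_ONE and to_one_letter are illustrative and not part of the course material.

```python
# Three-letter to one-letter codes for the 20 standard amino acids (Table 1),
# plus the IUPAC ambiguity codes Asx, Glx and Xaa.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
    "Asx": "B", "Glx": "Z", "Xaa": "X",
}


def to_one_letter(residues):
    """Convert a list of three-letter codes into a one-letter string.

    Input codes are case-insensitive. Raises KeyError for an unknown code.
    """
    return "".join(THREE_TO_ONE[code.capitalize()] for code in residues)


print(to_one_letter(["Val", "Ser", "Gln", "Leu", "Leu", "Lys"]))  # prints VSQLLK
```

Raising KeyError on an unknown code, rather than silently substituting X, is a design choice made here so that typos in the input are noticed.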
6
Proteins
Two separate amino acids can be linked together by a peptide bond.
A chain of amino acids linked by peptide bonds is called a polypeptide.
A protein is made up of one or more polypeptide chains.
For simplicity, in this course, a protein is a chain of amino acids linked by peptide bonds, e.g.
VSQLLKQRVRYAPYLSKVRRAEELLPLFKHGQYIGWSGFTGVGAPKVI
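Under this simplification a protein can be handled as a plain string of one-letter codes. The Python sketch below checks that a string uses only the 20 standard letters and counts its residues. The function names are made up for illustration, and the ambiguity codes B, Z and X are deliberately treated as non-standard here.

```python
from collections import Counter

# One-letter codes of the 20 standard amino acids.
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def is_standard_protein(sequence):
    """Return True if the string is non-empty and uses only standard codes."""
    return len(sequence) > 0 and set(sequence.upper()) <= STANDARD_AMINO_ACIDS


def composition(sequence):
    """Count how many times each one-letter code occurs in the sequence."""
    return Counter(sequence.upper())


seq = "VSQLLKQRVRYAPYLSKVRRAEELLPLFKHGQYIGWSGFTGVGAPKVI"
print(is_standard_protein(seq))
print(len(seq))
print(composition(seq).most_common(3))
```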
7
Protein Database
UniProt (Universal Protein Resource) (http://www.pir.uniprot.org/) is the world's most comprehensive catalog of information on proteins. It is a collaboration between:
Swiss Institute of Bioinformatics (SIB)
Department of Bioinformatics and Structural Biology of the Geneva University
European Bioinformatics Institute (EBI)
Georgetown University Medical Center's Protein Information Resource (PIR)
It includes three components:
8
Protein Database
UniProt Knowledgebase (UniProtKB): the central access point for extensive curated protein information.
  UniProtKB/Swiss-Prot: a manually annotated protein sequence database that provides a high level of annotation, a minimal level of redundancy and a high level of integration with other databases. UniProtKB/Swiss-Prot Release 48.7 of 20-Dec-2005: 204,086 entries.
  UniProtKB/TrEMBL: a computer-annotated supplement of Swiss-Prot that contains all the translations of EMBL nucleotide sequence entries not yet integrated in Swiss-Prot. UniProtKB/TrEMBL Release 31.7 of 20-Dec-2005: 2,506,886 entries.
UniProt Reference Clusters (UniRef): databases that combine closely related sequences into a single record to speed searches.
UniProt Archive (UniParc): a comprehensive repository reflecting the history of all protein sequences.
15
Gene Ontology
Protein synthesis
Translation
Goal: find all the proteins that are involved in protein synthesis
17
Gene Ontology
Ontology n. the branch of metaphysics dealing with the nature of being.
(The New Oxford American Dictionary, edited by Elizabeth J. Jewell and Frank Abate, Oxford University Press, 2001, p. 1197.)
Metaphysics n. the branch of philosophy that deals with the first principles of things, including abstract concepts such as being, knowing, substance, cause, identity, time, and space.
(The New Oxford American Dictionary, edited by Elizabeth J. Jewell and Frank Abate, Oxford University Press, 2001, p. 1074.)
18
Gene Ontology
The Gene Ontology (GO) (http://www.geneontology.org/) project is a collaborative effort to address the need for consistent descriptions of gene products in different databases. The project began in 1998 as a collaboration between three model organism databases: FlyBase (Drosophila), the Saccharomyces Genome Database (SGD) and the Mouse Genome Database (MGD). Since then, the GO Consortium has grown to include many databases, including several of the world's major repositories for plant, animal and microbial genomes.
19
Gene Ontology
Develop structured, controlled vocabularies (ontologies) that describe gene products
Make associations between the ontologies and the genes and gene products in the collaborating databases
Develop tools that facilitate the creation, maintenance and use of ontologies
The use of GO terms facilitates uniform queries across databases
20
Gene Ontology
The three components of GO are molecular function, biological process and cellular component
GO terms are organized in structures called directed acyclic graphs (DAGs), which differ from hierarchies in that a child, or more specialized, term can have many parent, or less specialized, terms
Example: hexose biosynthesis has two parent terms, monosaccharide biosynthesis and hexose metabolism.
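The difference from a strict hierarchy can be made concrete with the hexose example: a term stores a set of parents rather than a single parent. The following is a minimal Python sketch, assuming a toy ontology containing only these three terms. The PARENTS dictionary and the ancestors function are illustrative and do not reflect how GO itself is stored.

```python
# Each term maps to the set of its direct parents (less specialized terms).
# Only the three terms from the example are included.
PARENTS = {
    "hexose biosynthesis": {"monosaccharide biosynthesis", "hexose metabolism"},
    "monosaccharide biosynthesis": set(),
    "hexose metabolism": set(),
}


def ancestors(term, parents=PARENTS):
    """Return every term reachable from `term` by following parent links."""
    found = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, set()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


print(sorted(ancestors("hexose biosynthesis")))
# prints ['hexose metabolism', 'monosaccharide biosynthesis']
```

Because the graph is acyclic, the traversal always terminates. The `found` set also prevents a shared ancestor from being visited twice when two paths lead to it.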
21
Gene Ontology
The controlled vocabularies are structured so that you can query them at different levels
GO browser AmiGO (http://www.godatabase.org/cgi-bin/amigo/go.cgi)
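Querying at different levels means that a search on a general term also returns gene products annotated to its more specialized descendants. The following is a rough Python sketch, assuming a toy child map and made-up gene product names. geneA, geneB and geneC are hypothetical, and this is not a description of how AmiGO is implemented.

```python
# Direct children (more specialized terms) of each term in a toy ontology.
CHILDREN = {
    "monosaccharide biosynthesis": {"hexose biosynthesis"},
    "hexose metabolism": {"hexose biosynthesis"},
    "hexose biosynthesis": set(),
}

# Hypothetical annotations: term -> gene products annotated directly to it.
ANNOTATIONS = {
    "monosaccharide biosynthesis": {"geneA"},
    "hexose metabolism": {"geneB"},
    "hexose biosynthesis": {"geneC"},
}


def descendants(term):
    """Return all terms below `term`, following child links transitively."""
    found = set()
    stack = [term]
    while stack:
        for child in CHILDREN.get(stack.pop(), set()):
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def query(term):
    """Gene products annotated to `term` or to any of its descendants."""
    products = set()
    for t in {term} | descendants(term):
        products |= ANNOTATIONS.get(t, set())
    return products


print(sorted(query("hexose metabolism")))    # prints ['geneB', 'geneC']
print(sorted(query("hexose biosynthesis")))  # prints ['geneC']
```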
23
Protein function
Three steps to get a set of proteins that have a certain function:
1. Search for the GO term (http://www.godatabase.org/cgi-bin/amigo/go.cgi)
2. Search for the proteins that belong to that GO term (http://www.pir.uniprot.org/search/textSearch.shtml)
3. Save the sequences in FASTA format
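For the last step, FASTA is a plain-text format in which each record is a header line starting with ">" followed by the sequence, usually wrapped at a fixed width. The following is a minimal Python sketch, assuming the sequences have already been retrieved as (identifier, sequence) pairs. The identifier example_protein, the default width of 60 and the function names are illustrative choices, not something the search pages prescribe.

```python
def to_fasta(records, width=60):
    """Format (identifier, sequence) pairs as FASTA text."""
    lines = []
    for identifier, sequence in records:
        lines.append(">" + identifier)
        # Wrap the sequence into chunks of at most `width` characters.
        for start in range(0, len(sequence), width):
            lines.append(sequence[start:start + width])
    return "".join(line + "\n" for line in lines)


def save_fasta(records, path, width=60):
    """Write the records to `path` in FASTA format."""
    with open(path, "w") as handle:
        handle.write(to_fasta(records, width))


records = [("example_protein",
            "VSQLLKQRVRYAPYLSKVRRAEELLPLFKHGQYIGWSGFTGVGAPKVI")]
print(to_fasta(records, width=30), end="")
```

Calling save_fasta(records, "proteins.fasta") would write the same text to a file. The file name is only an example.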