Proteins and Protein Function Charles Yan Spring 2006.

Amino Acids

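General structure of an amino acid: a central carbon (the α-carbon) bonded to an amino group (-NH2), a carboxyl group (-COOH), a hydrogen atom and a side chain (R group)

There are 20 standard amino acids, each with a different R group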

Amino acid      3-letter code   1-letter code
Alanine         Ala             A
Arginine        Arg             R
Asparagine      Asn             N
Aspartate       Asp             D
Cysteine        Cys             C
Glutamine       Gln             Q
Glutamate       Glu             E
Glycine         Gly             G
Histidine       His             H
Isoleucine      Ile             I
Leucine         Leu             L
Lysine          Lys             K
Methionine      Met             M
Phenylalanine   Phe             F
Proline         Pro             P
Serine          Ser             S
Threonine       Thr             T
Tryptophan      Trp             W
Tyrosine        Tyr             Y
Valine          Val             V

Table 1. The 20 standard amino acids

Amino acid                        3-letter code   1-letter code
Asparagine (N) or aspartate (D)   Asx             B
Glutamine (Q) or glutamate (E)    Glx             Z
Any amino acid                    Xaa             X

Authority: IUPAC-IUB Joint Commission on Biochemical Nomenclature.
Reference: IUPAC-IUB Joint Commission on Biochemical Nomenclature. Nomenclature and Symbolism for Amino Acids and Peptides. Eur. J. Biochem. 138:9-37 (1984).

Amino Acid Abbreviations (IUPAC)
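The abbreviation tables above map directly onto a lookup table. The following is a minimal Python sketch, not a standard library API: the helper names (`to_one_letter`, `to_three_letter`, `could_match`) are hypothetical, and it assumes three-letter sequences are written with hyphens between residues, as in Met-Ala-Gly.

```python
# Three-letter -> one-letter codes, transcribed from Table 1 and the
# IUPAC ambiguity codes above.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
    # Ambiguity codes
    "Asx": "B", "Glx": "Z", "Xaa": "X",
}
# Reverse lookup; every one-letter code is unique, so this is lossless.
ONE_TO_THREE = {one: three for three, one in THREE_TO_ONE.items()}

# Standard residues that each ambiguity code may stand for.
AMBIGUOUS = {
    "B": {"N", "D"},
    "Z": {"Q", "E"},
    "X": {code for code in ONE_TO_THREE if code not in "BZX"},
}


def to_one_letter(three_letter_seq: str, sep: str = "-") -> str:
    """Convert e.g. 'Met-Ala-Asx' to 'MAB' (assumes `sep` between residues)."""
    return "".join(THREE_TO_ONE[res.capitalize()] for res in three_letter_seq.split(sep))


def to_three_letter(one_letter_seq: str, sep: str = "-") -> str:
    """Convert e.g. 'MAB' to 'Met-Ala-Asx'."""
    return sep.join(ONE_TO_THREE[code] for code in one_letter_seq.upper())


def could_match(code: str, residue: str) -> bool:
    """True if a (possibly ambiguous) one-letter code is compatible with a standard residue."""
    return code == residue or residue in AMBIGUOUS.get(code, set())


print(to_one_letter("Met-Ala-Asx"))              # MAB
print(to_three_letter("mab"))                    # Met-Ala-Asx
print(could_match("B", "D"), could_match("Z", "D"))  # True False
```

Keeping one dictionary as the single source of truth and deriving the reverse mapping from it avoids the two tables drifting out of sync.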


Proteins

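Two amino acids can be linked together by a peptide bond, which forms between the carboxyl group of one amino acid and the amino group of the other, releasing a molecule of water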

A chain of amino acids linked by peptide bonds is called a polypeptide.

A protein is made up of one or more polypeptide chains. For simplicity, in this course a protein is treated as a single chain of amino acids linked by peptide bonds and written as a string of one-letter codes, e.g.

VSQLLKQRVRYAPYLSKVRRAEELLPLFKHGQYIGWSGFTGVGAPKVI
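Treating a protein as a string of one-letter codes makes simple properties easy to compute. A minimal Python sketch using the example sequence above; the helper names are hypothetical, and the validity check assumes only the 20 standard codes are allowed, so ambiguity codes such as B, Z and X are rejected.

```python
from collections import Counter

# Example sequence from the slide above
seq = "VSQLLKQRVRYAPYLSKVRRAEELLPLFKHGQYIGWSGFTGVGAPKVI"

STANDARD = set("ACDEFGHIKLMNPQRSTVWY")  # the 20 standard one-letter codes


def check_sequence(s: str) -> None:
    """Raise ValueError if s contains anything other than the 20 standard codes."""
    bad = set(s) - STANDARD
    if bad:
        raise ValueError(f"non-standard codes: {sorted(bad)}")


def peptide_bonds(s: str) -> int:
    """A linear chain of n residues is joined by n - 1 peptide bonds."""
    return max(len(s) - 1, 0)


def composition(s: str) -> dict:
    """Fraction of the chain made up of each residue type, sorted by code."""
    counts = Counter(s)
    return {aa: n / len(s) for aa, n in sorted(counts.items())}


check_sequence(seq)
print(len(seq), "residues,", peptide_bonds(seq), "peptide bonds")  # 48 residues, 47 peptide bonds
print(composition(seq))
```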


Protein Database

UniProt (Universal Protein Resource) (http://www.pir.uniprot.org/) is the world's most comprehensive catalog of information on proteins. It is a collaboration between:

Swiss Institute of Bioinformatics (SIB)

Department of Bioinformatics and Structural Biology of the Geneva University

European Bioinformatics Institute (EBI)

Georgetown University Medical Center's Protein Information Resource (PIR)

It includes three components:


UniProt Knowledgebase (UniProtKB): the central access point for extensive curated protein information. It has two sections:

UniProtKB/Swiss-Prot: a manually annotated protein sequence database that provides a high level of annotation, a minimal level of redundancy and a high level of integration with other databases. UniProtKB/Swiss-Prot Release 48.7 of 20-Dec-2005: 204,086 entries

UniProtKB/TrEMBL: a computer-annotated supplement to Swiss-Prot that contains all the translations of EMBL nucleotide sequence entries not yet integrated into Swiss-Prot. UniProtKB/TrEMBL Release 31.7 of 20-Dec-2005: 2,506,886 entries

UniProt Reference Clusters (UniRef): databases that combine closely related sequences into a single record to speed up searches.

UniProt Archive (UniParc): a comprehensive repository that reflects the history of all protein sequences.
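The idea behind UniRef, merging closely related sequences into one record, can be illustrated with its simplest case: exact duplicates. The sketch below is a simplified illustration under stated assumptions, not UniRef's actual pipeline: only identical sequences are merged, the representative is simply the alphabetically first accession, and the accessions and sequences are hypothetical.

```python
from collections import defaultdict


def merge_identical(records: dict) -> list:
    """Collapse accession -> sequence records so each distinct sequence appears once.

    Simplification (assumption): only exact duplicates are merged, and the
    representative is the alphabetically first accession. The real UniRef
    clusters use their own similarity thresholds and selection criteria.
    """
    by_sequence = defaultdict(list)
    for accession, sequence in records.items():
        by_sequence[sequence].append(accession)
    merged = []
    for sequence, members in by_sequence.items():
        members.sort()
        merged.append({"representative": members[0], "members": members, "sequence": sequence})
    return merged


# Hypothetical accessions and sequences, for illustration only
records = {
    "P00001": "MKTAYIAKQR",
    "Q00002": "MKTAYIAKQR",  # identical to P00001, so it joins the same record
    "P00003": "MSEQNNTEMT",
}
for cluster in merge_identical(records):
    print(cluster["representative"], cluster["members"])
```

Searching the merged records instead of every entry is what makes the search faster: identical sequences are compared against the query only once.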


Gene Ontology

Protein synthesis

Translation

Goal: find all the proteins that are involved in protein synthesis


Gene Ontology

Volkswagen Golf vs. golf (the sport)

I like golf.

Me too!


Gene Ontology

Ontology n. the branch of metaphysics dealing with the nature of being.

(The New Oxford American Dictionary, edited by Elizabeth J. Jewell and Frank Abate, Oxford University Press, 2001, p. 1197.)

Metaphysics n. the branch of philosophy that deals with the first principles of things, including abstract concepts such as being, knowing, substance, cause, identity, time, and space. (The New Oxford American Dictionary, edited by Elizabeth J. Jewell and Frank Abate, Oxford University Press, 2001, p. 1074.)


Gene Ontology

The Gene Ontology (GO) (http://www.geneontology.org/) project is a collaborative effort to address the need for consistent descriptions of gene products in different databases. The project began in 1998 as a collaboration between three model organism databases: FlyBase (Drosophila), the Saccharomyces Genome Database (SGD) and the Mouse Genome Database (MGD). Since then, the GO Consortium has grown to include many databases, including several of the world's major repositories for plant, animal and microbial genomes.


Gene Ontology

Develop structured, controlled vocabularies (ontologies) that describe gene products

Make associations between the ontologies and the genes and gene products in the collaborating databases

Develop tools that facilitate the creation, maintenance and use of ontologies

The use of GO terms facilitates uniform queries across databases
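Why shared terms make queries uniform can be shown with the "protein synthesis" / "translation" example from earlier: a keyword search for one wording misses entries that use the other, while mapping both wordings to one term ID finds them all. A minimal Python sketch; the two databases, protein IDs and `TERM:000n` identifiers are all hypothetical placeholders (real GO identifiers have the form GO: followed by seven digits).

```python
from typing import Optional

# Hypothetical free-text annotations from two databases that word
# the same process differently
DB_A = {"protA1": "protein synthesis", "protA2": "glycolysis"}
DB_B = {"protB1": "translation", "protB2": "DNA replication"}

# Hypothetical controlled vocabulary: term ID -> accepted names and synonyms,
# stored in lower case so lookups can be case-insensitive
VOCAB = {
    "TERM:0001": {"translation", "protein synthesis"},
    "TERM:0002": {"glycolysis"},
    "TERM:0003": {"dna replication"},
}


def to_term_id(label: str) -> Optional[str]:
    """Map a free-text label to a term ID via the vocabulary, or None if unknown."""
    for term_id, names in VOCAB.items():
        if label.lower() in names:
            return term_id
    return None


def keyword_search(keyword: str, *dbs: dict) -> list:
    """Naive substring search over the free-text labels."""
    return sorted(p for db in dbs for p, label in db.items() if keyword.lower() in label.lower())


def term_search(term_id: str, *dbs: dict) -> list:
    """Search by controlled-vocabulary term instead of by wording."""
    return sorted(p for db in dbs for p, label in db.items() if to_term_id(label) == term_id)


print(keyword_search("protein synthesis", DB_A, DB_B))  # ['protA1'], misses protB1
print(term_search("TERM:0001", DB_A, DB_B))             # ['protA1', 'protB1']
```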


Gene Ontology

The three components of GO are molecular function, biological process and cellular component

GO terms are organized in structures called directed acyclic graphs (DAGs), which differ from simple hierarchies in that a child (more specialized) term can have more than one parent (less specialized) term

For example, hexose biosynthesis has two parents: monosaccharide biosynthesis and hexose metabolism
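A DAG like this can be stored as a map from each term to its list of parents. The Python sketch below is illustrative only: "hexose biosynthesis" and its two parents come from the example above, while the shared grandparent "monosaccharide metabolism" is a simplified assumption added to show a term reachable along two paths. Collecting ancestors with a visited set counts such a term once.

```python
# Toy GO-like DAG: child term -> list of parent terms.
# Only "hexose biosynthesis" and its two parents come from the slide;
# "monosaccharide metabolism" is a simplified, assumed upper term.
PARENTS = {
    "hexose biosynthesis": ["monosaccharide biosynthesis", "hexose metabolism"],
    "monosaccharide biosynthesis": ["monosaccharide metabolism"],
    "hexose metabolism": ["monosaccharide metabolism"],
    "monosaccharide metabolism": [],
}


def ancestors(term: str) -> set:
    """All terms reachable by following parent links upward (excluding term itself)."""
    seen = set()
    stack = list(PARENTS.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:  # a term reachable by two paths is visited once
            seen.add(parent)
            stack.extend(PARENTS.get(parent, []))
    return seen


print(sorted(ancestors("hexose biosynthesis")))
# ['hexose metabolism', 'monosaccharide biosynthesis', 'monosaccharide metabolism']
```

In a strict tree each term would have exactly one parent list entry; here the first entry has two, which is exactly what distinguishes a DAG from a hierarchy.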


Gene Ontology

The controlled vocabularies are structured so that you can query them at different levels
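Querying at different levels works because an annotation to a specific term also counts for every more general term above it (GO's convention often called the true path rule). A minimal Python sketch: the DAG reuses the hexose example with an assumed upper term, and the protein names and their annotations are hypothetical.

```python
# Toy DAG (child -> parents); "monosaccharide metabolism" is an assumed upper term
PARENTS = {
    "hexose biosynthesis": ["monosaccharide biosynthesis", "hexose metabolism"],
    "monosaccharide biosynthesis": ["monosaccharide metabolism"],
    "hexose metabolism": ["monosaccharide metabolism"],
    "monosaccharide metabolism": [],
}

# Hypothetical annotations: each protein is annotated to its most specific known term
ANNOTATIONS = {
    "protein1": {"hexose biosynthesis"},
    "protein2": {"hexose metabolism"},
    "protein3": {"monosaccharide biosynthesis"},
}


def ancestors_or_self(term: str) -> set:
    """The term plus every term above it in the DAG."""
    result = {term}
    stack = [term]
    while stack:
        for parent in PARENTS.get(stack.pop(), []):
            if parent not in result:
                result.add(parent)
                stack.append(parent)
    return result


def proteins_for(term: str) -> list:
    """Proteins annotated to `term` or to any more specific term below it."""
    return sorted(
        protein
        for protein, terms in ANNOTATIONS.items()
        if any(term in ancestors_or_self(t) for t in terms)
    )


print(proteins_for("hexose biosynthesis"))        # ['protein1']  (specific level)
print(proteins_for("hexose metabolism"))          # ['protein1', 'protein2']
print(proteins_for("monosaccharide metabolism"))  # ['protein1', 'protein2', 'protein3']  (general level)
```

The same annotation data answers both a narrow and a broad question; only the query term changes.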

GO browser AmiGO (http://www.godatabase.org/cgi-bin/amigo/go.cgi)


Protein function

Three steps to get a set of proteins that have a certain function:

1. Search for the GO term (http://www.godatabase.org/cgi-bin/amigo/go.cgi)

2. Search for the proteins that belong to that GO term (http://www.pir.uniprot.org/search/textSearch.shtml)

3. Save the sequences in FASTA format
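Step 3 produces a FASTA file: each record is a header line starting with ">" followed by one or more sequence lines. A minimal Python sketch for writing and reading such files; wrapping at 60 characters is an assumed (common) convention, and `uniprot_section` assumes headers in the format UniProt currently distributes, where "sp|" marks Swiss-Prot and "tr|" marks TrEMBL entries. The example header is hypothetical.

```python
def write_fasta(records, width=60):
    """Format (header, sequence) pairs as FASTA text, wrapping sequence lines at `width`."""
    lines = []
    for header, sequence in records:
        lines.append(">" + header)
        lines.extend(sequence[i:i + width] for i in range(0, len(sequence), width))
    return "\n".join(lines) + "\n"


def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def uniprot_section(header):
    """Guess the UniProtKB section from a header such as 'sp|P00001|NAME ...' (assumed format)."""
    prefix = header.split("|", 1)[0]
    return {"sp": "Swiss-Prot", "tr": "TrEMBL"}.get(prefix, "unknown")


# Hypothetical entry using the example sequence from the Proteins slide
example = [("sp|P00001|EXAMPLE_1 Hypothetical entry",
            "VSQLLKQRVRYAPYLSKVRRAEELLPLFKHGQYIGWSGFTGVGAPKVI")]
text = write_fasta(example, width=20)
print(text)
for header, sequence in read_fasta(text):
    print(uniprot_section(header), len(sequence))  # Swiss-Prot 48
```

To save the result, write the returned string to a file, for example with `open("proteins.fasta", "w").write(text)`.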

Search for the GO term

Search for the proteins that belong to a certain GO term

Save the sequences in FASTA format