Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner...

109
Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner [email protected]

Transcript of Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner...

Page 1: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

Protein analysis and proteomics

July 29, 2009August 5, 2009

BioinformaticsM.E:800.707J. Pevsner

[email protected]

Page 2: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

Many of the images in this powerpoint presentationare from Bioinformatics and Functional Genomicsby J Pevsner (copyright © 2009 by Wiley-Blackwell).

These images and materials may not be usedwithout permission from the publisher.

Visit http://www.bioinfbook.org

Copyright notice

Page 3: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

Outline for tonight: Protein analysis and proteomics

Individual proteins Perspective 1: Protein families (domains and motifs) Perspective 2: Physical properties (3D structure) Perspective 3: Localization Perspective 4: Function

Page 4: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

protein

Page 389

RNADNA

Page 5: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

protein

[1] Protein families

[4] Protein function

[2] Physical properties

Page 389

[3] Protein localization

Gene ontology (GO):--cellular component--biological process--molecular function

Page 6: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

The Human Proteome Organisation (HUPO) Proteomics Standards Initiative (PSI)

Goals: defining standards for proteomic data representation to facilitate the comparison, exchange, and verification of data

Page 7: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

The Human Proteome Organisation (HUPO) Proteomics Standards Initiative (PSI)

Work groups

# Gel Electrophoresis# Mass Spectrometry# Molecular Interactions# Protein Modifications# Proteomics Informatics# Sample Processing

Themes

# Controlled vocabularies# MIAPE: Minimum information about a proteomics experiment

Page 8: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

The Human Proteome Organisation (HUPO) Proteomics Standards Initiative (PSI)

http://www.psidev.info/

Page 9: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

Perspective 1: Protein domains and motifs

Page 389

Page 10: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

Definitions

Signature: • a protein category such as a domain or motif

Page 390

Page 11: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

Definitions

Signature: • a protein category such as a domain or motif

Domain: • a region of a protein that can adopt a 3D structure• a fold• a family is a group of proteins that share a domain• examples: zinc finger domain immunoglobulin domain

Motif (or fingerprint):• a short, conserved region of a protein• typically 10 to 20 contiguous amino acid residues

Page 390

Page 12: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

15 most common domains (human)

Zn finger, C2H2 type 1093 proteinsImmunoglobulin 1032EGF-like 471Zn-finger, RING 458Homeobox 417Pleckstrin-like 405RNA-binding region RNP-1400SH3 394Calcium-binding EF-hand 392Fibronectin, type III 300PDZ/DHR/GLGF 280Small GTP-binding protein 261BTB/POZ 236bHLH 226Cadherin 226

Page 391Source: Integr8 at EBI website

Page 13: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

15 most common domains (various species)

The European Bioinformatics Institute (EBI) offers many key proteomics resources at the Integr8 site:

http://www.ebi.ac.uk/proteome/

Page 391

Page 14: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

1. Go to the Integr8 site: http://www.ebi.ac.uk/proteome/

2. Browse species; choose Homo sapiens.

3. Click “Proteome analysis”

4. Obtain a variety of statistics, such as common repeats, domains, average protein length

Page 15: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

Source: Integr8 at EBI website (updated 7/09)

amino acid

fre

que

ncy

Amino acid composition

Page 16: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

Source: Integr8 at EBI website (updated 7/09)

Re

lativ

e fr

eq

uen

cy

protein length

Average protein length : 468+/- 522 amino acid residues Size range: 4 - 34350 amino acid residues

Page 17: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

Definition of a domain

According to InterPro at EBI (http://www.ebi.ac.uk/interpro/):

A domain is an independent structural unit, found aloneor in conjunction with other domains or repeats.Domains are evolutionarily related.

According to SMART (http://smart.embl-heidelberg.de):

A domain is a conserved structural entity with distinctivesecondary structure content and a hydrophobic core.Homologous domains with common functions usuallyshow sequence similarities.

Page 390

Page 18: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

Varieties of protein domains

Page 393

Extending along the length of a protein

Occupying a subset of a protein sequence

Occurring one or more times

Page 19: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

Example of a protein with domains: Methyl CpG binding protein 2 (MeCP2)

MBD

Page 393

TRD

The protein includes a methylated DNA binding domain(MBD) and a transcriptional repression domain (TRD).MeCP2 is a transcriptional repressor.

Mutations in the gene encoding MeCP2 cause RettSyndrome, a neurological disorder affecting girlsprimarily.

Page 20: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

Page 393

Result of an MeCP2 blastp search:A methyl-binding domain shared by several proteins

domain

Page 21: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

Page 393

Are proteins that share only a domain homologous?

Page 22: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

Example of a multidomain protein: HIV-1 pol

Pol (NP_789740), 995 amino acids longGag-Pol (NP_057849), 1435 amino acids

• cleaved into three proteins with distinct activities:-- aspartyl protease-- reverse transcriptase-- integrase

We will explore HIV-1 pol and other proteins at theExpert Protein Analysis System (ExPASy) server.

Visit www.expasy.org/

Page 394

Page 23: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

Page 394

http://www.expasy.ch/

Page 24: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

1. Go to ExPASy (http://www.expasy.ch/)2. If you know the SwissProt accession of your protein, enter it at top.3. Otherwise go into Swiss-Prot/TrEMBL, click SRS (Sequence Retrieval System), click Start, then click continue, then search for your protein of interest.

Page 25: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

SwissProt: which HIV-1 pol is “correct”?

Use P04587

Page 26: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

The SwissProt entry for any protein provideshighly useful information…

Page 27: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

Page 395

SwissProt entry for HIV-1 pol links to many databases

Page 28: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

Page 395

ProDom entry for HIV-1 pol shows many related proteins

Page 29: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.
Page 30: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

www.uniprot.org

Three protein databases recently merged to form UniProt:

• SwissProt

• TrEMBL (translated European Molecular Biology Lab)

• Protein Information Resource (PIR)

You can search for information on your favorite protein there; a BLAST server is provided.

Page 31: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

Proteins can have both domains and motifs (patterns)

Domain(aspartylprotease)

Domain(reversetranscriptase)

Motif(severalresidues)

Motif(severalresidues)

Page 32: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

Page 396

Page 33: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

Definition of a motif

A motif (or fingerprint) is a short, conserved region of a protein. Its size is often 10 to 20 amino acids.

Simple motifs include transmembrane domains andphosphorylation sites. These do not imply homologywhen found in a group of proteins.

PROSITE (www.expasy.org/prosite) is a dictionary of motifs (there are currently 1600 entries). In PROSITE,a pattern is a qualitative motif description (a proteineither matches a pattern, or not). In contrast, a profileis a quantitative motif description. We will encounterprofiles in Pfam, ProDom, SMART, and other databases.

Page 394

Page 34: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

Summary of Perspective 1: Protein domains and motifs

A signature is a protein category such as a domain or motif.

You can learn about domains at Integr8, and at databases such as InterPro and Pfam.

A motif (or fingerprint) is a short, conserved sequence. You can study motifs at Prosite at ExPASy.

Page 35: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

Perspective 2: Physical properties of proteins

Page 397

Page 36: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

Page 398

Page 37: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

Physical properties of proteins

Many websites are available for the analysis ofindividual proteins. ExPASy and ISREC are twoexcellent resources.

The accuracy of these programs is variable. Predictions based on primary amino acid sequence (such as molecular weight prediction) are likely to be more trustworthy. For many other properties (such asposttranslational modification of proteins by specific sugars), experimental evidence may be required rather than prediction algorithms.

Page 399

Page 38: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

Page 399

Access a variety of protein analysis programsfrom the top right of the ExPASy home page

Page 39: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

Page 400

Page 40: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

Page 400

Page 41: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

Protein secondary structure

Protein secondary structure is determined by the amino acid side chains.

Myoglobin is an example of a protein having many-helices. These are formed by amino acid stretches4-40 residues in length.

Thioredoxin from E. coli is an example of a proteinwith many sheets, formed from strands composedof 5-10 residues. They are arranged in parallel orantiparallel orientations.

Page 425

Page 42: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

Fig. 11.3Page 427

Myoglobin(John Kendrew, 1958)

Page 43: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

Thioredoxin

Fig. ~11.3Page 427

Page 44: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

Secondary structure prediction

Chou and Fasman (1974) developed an algorithmbased on the frequencies of amino acids found in helices, -sheets, and turns.

Proline: occurs at turns, but not in helices.

GOR (Garnier, Osguthorpe, Robson): related algorithm

Modern algorithms: use multiple sequence alignmentsand achieve higher success rate (about 70-75%)

Page 427

Page 45: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

Secondary structure prediction

Web servers:

GOR4JpredNNPREDICTPHDPredatorPredictProteinPSIPREDSAM-T99sec

Table 11-3Page 429

Page 46: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.
Page 47: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.
Page 48: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

Go to http://pbil.univ-lyon1.fr/,click “Secondary structure prediction”to access this prediction tool

Page 49: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

Tertiary protein structure: protein folding

Main approaches:

[1] Experimental determination (X-ray crystallography, NMR)

[2] Prediction

► Comparative modeling (based on homology)

► Threading

► Ab initio (de novo) prediction (Ingo Ruczinski at JHSPH)

Page 430

Page 50: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

Experimental approaches to protein structure

[1] X-ray crystallography-- Used to determine 80% of structures-- Requires high protein concentration-- Requires crystals-- Able to trace amino acid side chains-- Earliest structure solved was myoglobin

[2] NMR-- Magnetic field applied to proteins in solution-- Largest structures: 350 amino acids (40 kD)-- Does not require crystallization

Page 430

Page 51: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

Steps in obtaining a protein structure

Target selection

Obtain, characterize protein

Determine, refine, model the structure

Deposit in repositoryFig 11.5page 431

Page 52: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

Priorities for target selection for protein structures

Page 432

Historically, small, soluble, abundant proteins werestudied (e.g. hemoglobin, cytochromes c, insulin).

Modern criteria:• Represent all branches of life• Represent previously uncharacterized families• Identify medically relevant targets• Some are attempting to solve all structures within an individual organism (Methanococcus jannaschii, Mycobacterium tuberculosis)

Page 53: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

The Protein Data Bank (PDB)

Page 434

• PDB is the principal repository for protein structures• Established in 1971• Accessed at http://www.rcsb.org/pdb or simply http://www.pdb.org• Currently contains 59,000 structure entities

Updated 7/28/09

Page 54: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

Fig. 11.7Page 435

Page 55: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

stru

ctur

es

Fig. 9.6Page 281

yearUpdated 12/08

PDB content growth (www.pdb.org)

1972198019802000

yearly

total

10,000

2008

50,000

30,000

Page 56: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

PDB holdings (12/08)

50,621 proteins, peptides2,225 protein/nucl. complexes1,946 nucleic acids33 other; carbohydrates54,825 total

Table 11-4Page 435

Page 57: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

Fig. ~11.9Page 436

Page 58: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

Fig. ~11.10Page 436

Page 59: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

Viewing hemoglobin (accession 2H35) at PDB

Page 60: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

Viewing structures at PDB: WebMol

Fig. 11.11Page 437

Page 61: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

Protein Data Bank

Swiss-Prot, NCBI, EMBL

CATH, Dali, SCOP, FSSP

gateways to access PDB files

databases that interpret PDB files

Page 62: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

The CATH Hierarchy

Fig. 11.18Page 444

Page 63: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

Access to PDB through NCBI

Page 437

You can access PDB data at the NCBI several ways.

• Go to the Structure site, from the NCBI homepage• Use Entrez• Perform a BLAST search, restricting the output to the PDB database

Page 64: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

Fig. ~11.12Page 438

Page 65: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

Do a blastp search;set the database to pdb (Protein Data Bank)

Page 66: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.
Page 67: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

Structure links

Structure accession(e.g. 2JTZ)

Page 68: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

Fig. 9.13 Page 288

Page 69: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

Fig. 9.14 Page 289

Page 70: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.
Page 71: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.
Page 72: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

Access to PDB structures through NCBI

Page 291

Molecular Modeling DataBase (MMDB)

Cn3D (“see in 3D” or three dimensions):structure visualization software

Vector Alignment Search Tool (VAST):view multiple structures

Page 73: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

Fig. 9.15 Page 290

Page 74: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

Fig. 9.15 Page 290

Page 75: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

Fig. 9.16 Page 291

Page 76: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

Fig. 9.17 Page 292

Page 77: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

Access to structure data at NCBI: VAST

Page 294

Vector Alignment Search Tool (VAST) offers a varietyof data on protein structures, including

-- PDB identifiers-- root-mean-square deviation (RMSD) values to describe structural similarities-- NRES: the number of equivalent pairs of alpha carbon atoms superimposed-- percent identity

Page 78: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

Fig. 9.18Page 293

Page 79: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

Predicting protein structure: CASP8 competition

8th Community Wide Experiment on theCritical Assessment of Techniques for Protein Structure Prediction

http://predictioncenter.gc.ucdavis.edu/http://predictioncenter.org/casp8/index.cgi

Page 80: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

Percent of residues (C)

Dis

tan

ce c

uto

ff (

Å)most groups had the right prediction for this structure (but not those at arrow 2)

most groups had the wrong prediction for this structure(but arrow 3 did better)

half the groups got it wrong(arrow 4), half got it right (arrow 5); the key difference is the multiple sequence alignment

2

1

3

4

5

Page 81: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.
Page 82: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

Protein structure and human disease

In some cases, a single amino acid substitutioncan induce a dramatic change in protein structure.For example, the F508 mutation of CFTR altersthe helical content of the protein, and disruptsintracellular trafficking.

Other changes are subtle. The E6V mutation in thegene encoding hemoglobin beta causes sickle-cell anemia. The substitution introduces a hydrophobic patch on the protein surface,leading to clumping of hemoglobin molecules.

Page 311

Page 83: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

Protein structure and human disease

Disease ProteinCystic fibrosis CFTRSickle-cell anemia hemoglobin beta“mad cow” disease prion proteinAlzheimer disease amyloid precursor protein

Table 9.5Page 312

Page 84: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

Introduction to Perspectives 3 and 4: Gene Ontology (GO) Consortium

Page 237

Page 85: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

The Gene Ontology Consortium

An ontology is a description of concepts. The GOConsortium compiles a dynamic, controlled vocabularyof terms related to gene products.

There are three organizing principles: Molecular functionBiological processCellular compartment

You can visit GO at http://www.geneontology.org.There is no centralized GO database. Instead, curatorsof organism-specific databases assign GO termsto gene products for each organism.

Page 237

Page 86: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

Page 241

GO terms are assigned to Entrez Gene entries

Page 87: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

Page 241

Page 88: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

Page 241

Page 89: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

The Gene Ontology Consortium: Evidence Codes

IC Inferred by curatorIDA Inferred from direct assayIEA Inferred from electronic annotationIEP Inferred from expression patternIGI Inferred from genetic interactionIMP Inferred from mutant phenotypeIPI Inferred from physical interactionISS Inferred from sequence or structural similarityNAS Non-traceable author statementND No biological dataTAS Traceable author statement

Page 240

Page 90: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

Perspective 3: Protein localization

Page 242

Page 91: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

protein

Protein localization

Page 242

Page 92: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

Protein localization

Proteins may be localized to intracellular compartments,cytosol, the plasma membrane, or they may be secreted. Many proteins shuttle between multiple compartments.

A variety of algorithms predict localization, but thisis essentially a cell biological question.

Page 240

Page 93: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.
Page 94: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.
Page 95: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

Page 242

Page 96: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

Page 244

Page 97: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

Page 244

Page 98: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

Localization of 2,900 yeast proteins

Michael Snyder and colleagues incorporated epitopetags into thousands of S. cerevisiae cDNAs,and systematically localized proteins (Kumar et al., 2002).

See http://ygac.med.yale.edu for the TRIPLES database including ~3,000 fluorescence micrographs.

Page 243

Page 99: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

Perspective 4: Protein function

Page 243

Page 100: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

Protein function

Function refers to the role of a protein in the cell.We can consider protein function from a varietyof perspectives.

Page 243

Page 101: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

1. Biochemical function(molecular function)

RBP binds retinol,could be a carrier

Page 245

Page 102: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

2. Functional assignmentbased on homology

RBPcould bea carrier

too

Othercarrier proteins

Page 245

Page 103: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

3. Functionbased on structure

RBP forms a calyx

Page 245

Page 104: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

4. Function based onligand binding specificity

RBP binds vitamin A

Page 245

Page 105: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

5. Function based oncellular process

DNA RNA

RBP is abundant,soluble, secreted

Page 245

Page 106: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

6. Function basedon biological process

RBP is essential for vision

Page 245

Page 107: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

7. Function based on “proteomics”or high throughput “functional genomics”

High throughput analyses show...

RBP levels elevated in renal failureRBP levels decreased in liver disease

Page 245

Page 108: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

Functional assignment of enzymes:the EC (Enzyme Commission) system

Oxidoreductases 1,003Transferases 1,076Hydrolases 1,125Lyases 356Isomerases 156Ligases 126

Page 246

Page 109: Protein analysis and proteomics July 29, 2009 August 5, 2009 Bioinformatics M.E:800.707 J. Pevsner pevsner@kennedykrieger.org.

Functional assignment of proteins:Clusters of Orthologous Groups (COGs)

Information storage and processing

Cellular processes

Metabolism

Poorly characterized

Page 247