
Introduction

• Genomics involves study of mRNA expression; the full set of genetic information in an organism contains the recipes for making proteins

• Proteins constitute the “bricks and mortar” of cells and do most of the work

• Proteins distinguish the various types of cells: since all cells have essentially the same genome, their differences are dictated by which genes are active and which corresponding proteins are made

• Similarly, diseased cells may produce dissimilar proteins to healthy cells

• However, the task of studying proteins is often more difficult than that of studying genes (e.g. post-translational modifications can dramatically alter protein function)

Proteome

• The term was first proposed by an Australian post-doc, Marc Wilkins, in 1994

• “Proteome”: the protein complement encoded by a genome

The taxonomy of genomics biology

Proteomics

• Identification of all the proteins made in a given cell, tissue or organism

• Identification of the intracellular networks associated with these proteins

• Identification of the precise 3D structure of relevant proteins, enabling researchers to identify potential drug targets that turn proteins “on or off”

• Proteomics very much requires a coordinated focus involving physicists, chemists, biologists and computer scientists

Why Proteomics

• Major challenge: how do we go from the treasure chest of information yielded by genomics to an understanding of cellular function?

• Genomics based approaches initially use computer-based similarity searches against proteins of known function

• Results may allow some broad inferences to be made about possible function

• However, a significant percentage (>30%) of the sequences thus far ascertained seem to code for proteins that are unrelated at this level to proteins of known function

Why Proteomics

• Beyond the genetic make-up of an individual or organism, many other factors determine gene and ultimately protein expression and therefore affect proteins directly

• These include environmental factors such as pH, hypoxia, and drug treatment, to name a few

• Examination of the genome alone cannot take into account complex multigenic processes such as ageing, stress, or disease, or the fact that the cellular phenotype is influenced by the networks created by interaction between pathways that are regulated in a coordinated way or that overlap

Why Proteomics

• Genomic analysis has certainly provided us with much insight into the possible role of particular genes in disease

• However proteins are the functional output of the cell and their dynamic nature in specific biological contexts is critical

• The expression or function of proteins is modulated at many diverse points from transcription to post-translation and very little of this can be predicted from a simple analysis of nucleic acids alone

• There is generally poor correlation between the abundance of mRNA transcribed from the DNA and the respective proteins translated from that mRNA

• Furthermore, transcript splicing can yield different protein forms

• Proteins can undergo extensive modifications such as glycosylation, acetylation, and phosphorylation, which can lead to multiple protein products from the same gene

Proteomics Tools

• The core methodologies for displaying the proteome are a combination of advanced separation techniques principally involving two-dimensional electrophoresis (2D-GE) and mass spectrometry

2D-GE: basic methodology

• Sample (tissue, serum, cell extract) is solubilized and the proteins are denatured into polypeptide components

• This mixture is separated by isoelectric focusing (IEF); on the application of a current, the charged polypeptide subunits migrate in a polyacrylamide gel strip that contains an immobilized pH gradient (IPG) until they reach the pH at which their overall charge is neutral (isoelectric point or pI), hence producing a gel strip with distinct protein bands along its length

• This strip is applied to the edge of a rectangular slab of polyacrylamide gel containing SDS. The focused polypeptides migrate in an electric current into the second gel and undergo separation on the basis of their molecular size

• The resultant gel is stained (Coomassie, silver, fluorescent stains) and spots are visualized by eye or an imager. Typically 1000-3000 spots can be visualized with silver. Complementary techniques, e.g. immunoblotting, allow greater sensitivity for specific molecules.

• Multiple forms of individual proteins can be visualized and the particular subset of proteins examined from the proteome is determined by factors such as initial solubilization conditions, pH range of the IPG and gel gradient
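The first-dimension separation rests on each polypeptide having a pH at which its net charge is zero. A minimal sketch of how a pI might be estimated from sequence is given here, assuming a simple Henderson-Hasselbalch charge model. The pKa values are illustrative assumptions (published tables differ slightly), and the function names are hypothetical, not part of any package mentioned in these slides.

```python
# Illustrative side-chain and terminal pKa values (an assumption;
# published pKa sets differ and give slightly different pI estimates).
POSITIVE_PKA = {"K": 10.8, "R": 12.5, "H": 6.5}
NEGATIVE_PKA = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
N_TERM_PKA = 8.6
C_TERM_PKA = 3.6


def net_charge(sequence, ph):
    """Net charge of a denatured polypeptide at a given pH."""
    charge = 1.0 / (1.0 + 10 ** (ph - N_TERM_PKA))    # free N-terminus
    charge -= 1.0 / (1.0 + 10 ** (C_TERM_PKA - ph))   # free C-terminus
    for residue in sequence:
        if residue in POSITIVE_PKA:
            charge += 1.0 / (1.0 + 10 ** (ph - POSITIVE_PKA[residue]))
        elif residue in NEGATIVE_PKA:
            charge -= 1.0 / (1.0 + 10 ** (NEGATIVE_PKA[residue] - ph))
    return charge


def isoelectric_point(sequence, low=0.0, high=14.0, tolerance=1e-4):
    """Bisect for the pH at which the net charge crosses zero.

    Net charge falls monotonically as pH rises, so bisection converges.
    """
    while high - low > tolerance:
        mid = (low + high) / 2.0
        if net_charge(sequence, mid) > 0:
            low = mid    # still positively charged: pI lies at higher pH
        else:
            high = mid   # negatively charged: pI lies at lower pH
    return (low + high) / 2.0


if __name__ == "__main__":
    for peptide in ("DDDD", "KKKK", "ACDEFGHIK"):
        print(peptide, round(isoelectric_point(peptide), 2))
```

Under this model an acidic peptide focuses toward the low-pH end of the strip and a basic one toward the high-pH end. Post-translational modifications that add or remove charge (e.g. phosphorylation) are not modelled here, although they shift the observed pI.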

General schematic of 2D-PAGE for protein identification in Toxicology

Sample growth

Sample solubilization

Isoelectric focusing (IPG)

2D-PAGE

Image analysis

Immunoblot (Western)

Isolation of spots of interest

Trypsin digestion of proteins

MS analysis of tryptic fragments

Identification of proteins

General strategy for proteomic analysis

Nature of IPG determines spot location on 2D-PAGE

Limitations of 2D-GE

• In large-scale proteomic analysis, 2D-GE has been the major workhorse over the last 20 years: it is uniquely able to distinguish post-translational modifications and is analytically quantitative

• However despite the significant improvements (e.g. immobilized pH gradients) to the technique and its coupling with MS analysis it is still difficult to automate

• Although at first glance the resolution of 2D seems very impressive, it still lags behind the enormous diversity of proteins and thus comigrating protein spots are not uncommon

• This is especially of concern when trying to distinguish between highly abundant proteins, e.g. actin (10^8 molecules/cell), and low-abundance proteins such as transcription factors (100-1000 molecules/cell); this span is beyond the dynamic range of 2D-GE

• Enrichment or prefractionation can often overcome such discrepancies

• Chemical heterogeneity of proteins also presents a major limitation

• Thus the full range of pIs and MWs of proteins exceeds what can routinely be analyzed on 2D-GE. However, improvements to IPGs are expected to overcome some of these constraints and greatly improve the coverage of the entire proteome of the cell

• Problems linked with extraction and solubilization of proteins prior to 2D-GE present an even greater challenge, especially for extremely hydrophobic proteins such as membrane and nuclear proteins. Again, recent advances in buffer composition have diminished the scale of this problem
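The abundance gap quoted above can be put in numbers. The short sketch below takes the actin figure as 10^8 molecules per cell and the transcription-factor range as 100 to 1000 molecules per cell, and expresses the gap in orders of magnitude. The function name is hypothetical.

```python
import math


def orders_of_magnitude(high_copies, low_copies):
    """Abundance gap between two proteins, in powers of ten."""
    return math.log10(high_copies / low_copies)


ACTIN_COPIES = 1e8                        # highly abundant protein
TF_COPIES_LOW, TF_COPIES_HIGH = 100, 1000  # low-abundance range

gap_min = orders_of_magnitude(ACTIN_COPIES, TF_COPIES_HIGH)
gap_max = orders_of_magnitude(ACTIN_COPIES, TF_COPIES_LOW)
print(f"Gap spans roughly {gap_min:.0f} to {gap_max:.0f} orders of magnitude")
```

A single gel would therefore need to resolve spots differing by about five to six orders of magnitude in amount, which is why enrichment or prefractionation is used to bring low-abundance proteins into range.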


Protein identification and characterization

• Specialized imaging software allows more detailed spot identification and comparison between gels and between treatments

• By a process of subtraction, differences (e.g. presence, absence, or intensity of proteins or different forms) between healthy and diseased samples can be revealed

• Cross-references to protein databases allow assignment by known pIs and apparent molecular size. Ultimate protein identification requires spot digestion (enzymatic) and analysis of charge and mass by mass spectrometry (MS)

• Spot cutter tools can be coupled to image analysis tools, and in-gel tryptic digestion techniques in 96- or 384-well format can greatly reduce the bottleneck in sample identification by MS
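The "process of subtraction" between healthy and diseased gels can be sketched as a comparison of matched spot intensities. This is a minimal illustration, not how any particular imaging package works. It assumes spots have already been matched across gels and normalized, and the spot IDs, intensities, and two-fold threshold are invented for the example.

```python
def compare_spots(healthy, diseased, fold_threshold=2.0):
    """Classify matched spots by presence, absence, or intensity change.

    healthy, diseased: dicts mapping a spot ID to a normalized intensity.
    Spots that change by less than fold_threshold are left out of the report.
    """
    report = {}
    for spot_id in sorted(set(healthy) | set(diseased)):
        h = healthy.get(spot_id, 0.0)
        d = diseased.get(spot_id, 0.0)
        if h <= 0.0 and d <= 0.0:
            continue  # not detected on either gel
        if h <= 0.0:
            report[spot_id] = "present only in diseased"
        elif d <= 0.0:
            report[spot_id] = "absent in diseased"
        elif d / h >= fold_threshold:
            report[spot_id] = "increased in diseased"
        elif h / d >= fold_threshold:
            report[spot_id] = "decreased in diseased"
    return report


if __name__ == "__main__":
    # Hypothetical normalized spot volumes from two gels.
    healthy_gel = {"spot01": 1.0, "spot02": 4.0, "spot03": 2.0, "spot04": 0.5}
    diseased_gel = {"spot01": 1.1, "spot02": 1.0, "spot03": 6.5, "spot05": 0.8}
    for spot, change in compare_spots(healthy_gel, diseased_gel).items():
        print(spot, change)
```

Spots flagged in such a report would be the natural candidates for excision, digestion, and MS identification.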

Protein analysis by MS

• Compared to sequencing, MS is more sensitive (femtomole to attomole quantities) and has higher throughput

• Digestion of the excised spot with trypsin results in a mixture of peptides. These are ionized by electrospray ionization from the liquid state or by matrix-assisted laser desorption/ionization (MALDI) from the solid state, and the mass of the ions is measured by various coupled analyzers (e.g. time of flight (TOF), which measures the time for ions to travel from the source to the detector), resulting in a peptide fingerprint

• The resultant signature is compared with the peptide masses predicted from theoretical digestion of protein sequences found in databases, leading to identification of the protein

• Tandem MS allows one to obtain actual protein sequence information: discrete peptide ions can be selected and further fragmented, and complex algorithms are employed to correlate experimental data with database-derived peptide sequences
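The fingerprint comparison described above can be sketched as follows: digest each database sequence in silico, compute the singly protonated monoisotopic mass of each peptide, and count how many observed masses match. This is a simplified illustration that assumes no missed cleavages and no modifications. The function names, the two database entries, and the 0.2 Da tolerance are hypothetical; real search engines use far more elaborate scoring.

```python
import re

# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide (free termini)
PROTON = 1.00728   # charge carrier for the [M+H]+ ion


def tryptic_digest(sequence):
    """Cleave C-terminal to K or R, except before P; no missed cleavages."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]


def peptide_mh(peptide):
    """Monoisotopic mass of the singly protonated peptide, [M+H]+."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def count_matches(observed_masses, sequence, tolerance=0.2):
    """Number of observed masses explained by the protein's tryptic peptides."""
    theoretical = [peptide_mh(p) for p in tryptic_digest(sequence)]
    return sum(
        any(abs(obs - theo) <= tolerance for theo in theoretical)
        for obs in observed_masses
    )


def rank_database(observed_masses, database, tolerance=0.2):
    """Rank database entries by number of matched peptide masses."""
    scores = {
        name: count_matches(observed_masses, seq, tolerance)
        for name, seq in database.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Invented sequences standing in for database entries.
    database = {"protA": "MAGKSTRPLEK", "protB": "GGGKAAAR"}
    observed = [peptide_mh(p) for p in tryptic_digest(database["protA"])]
    print(rank_database(observed, database))
```

Note that the `STRPLEK` peptide stays intact because trypsin does not normally cleave before proline. Leucine and isoleucine have identical masses, which is one reason a mass fingerprint alone cannot always settle a sequence and tandem MS is needed.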

Schematic of MALDI process and instrument

Schematic of a Q-TOF instrument

MALDI peptide identification of a protein

MS detection/ sensitivity limits

Assessment of post-translational modification by proteomics

Nature Biotech. 2001 (19) 379-382

General strategy for MS-based Id of proteins and post-translational modifications

Proteomic bioinformatics

• Proteomic analysis requires highly sophisticated bioinformatic tools, not only for electrophoretic and MS separation but also for the assignment of physicochemical properties and the prediction of potential post-translational modifications and 3D structures

• Databases exist for the protein maps of a broad range of organisms, tissues, and disease states

• Ultimately, given the dynamic nature of the proteome, complex experimental details and related results need to be interpreted in the context of the relevant biochemical pathways or disease implications

Initiate database interrogations

Coordinate independent retrieval

Interpretation: Co-occurrence and rank

Re-ranking

Decision: “Novel or previously studied”

Comparison with data generated by genomic analysis

Published access tools for protein ID and databases on the web

Proteomics applications

• Pharmaceutical development: functional genomics and proteomics have generated a plethora of new potential drug targets

• Has increased efficiency in lead optimization and preclinical phases of drug development

• Signature patterns of drug toxicity (on/off, dose response, temporal effects)

• Evaluation of drug toxicity and drug-drug interactions is further enhanced by both procedures, e.g. the nephrotoxicity of cyclosporine and the liver toxicity of etomoxir, a potential anti-diabetic (2D-GE patterns revealed aberrant protein expression profiles on drug treatment)

• Neurological disorders

• Heart disease

• Screening of microbial protein profiles conferring drug resistance

Assessment of acetaminophen toxicity in mouse liver

“Our Approach”

Comparison of sensitivity of silver (A) and fluorescence-based SYPRO Ruby (B) stains for protein detection by 2D-PAGE

Overlay comparing two separate 2D gels (Serum vs SF) demonstrating versatility of PDQuest software in spot assignment

2D-PAGE examination of ubiquitination status of proteins isolated from serum-starved p53 (+/+) MEF (panels: Ruby stain and Ub immunoblot): triangles identify spots that are common to both native gel and immunoblot

2D-PAGE analysis of 24h treatment in serum-free (SF) or serum controls in p53 (+/+) MEF: ubiquitination as monitored by immunoblotting

2D-PAGE analysis of 24h treatment with 2.5 µM MeHg or lactacystin in p53 (+/+) MEF (poly-Ub regions marked in both panels; axes: pH 10 to 3, MW)

MS analysis

“Our Approach”

2D-GE modifications

Prediction of protein Expression via virtual gel

Future developments

• Towards a gel-free approach

• Automation

• More prediction based approaches

• Combinatorial functional genomics