Proteomics: principles and applications · between the principles and applications By Aaser...
Proteomics between the principles
and applications
By
Aaser Abdelazim, PhD Assistant professor of Biochemistry
Zagazig University
2018
What does proteomics mean?
100,000
proteins 20235 genes
The term “proteomics” was coined
in the 1990s by merging “protein” and
“genomics”
DNA (Gene)
RNA
PROTEIN
METABOLITE
Transcription
Translation
Enzymatic
reaction
Genomics/proteomics
Genomics
Metabolomics
Proteomics
Transcriptomics
Database statistics
21262
42108
Metagenomic studies Complete sequencing projects
Genome analysis projects
Non-metagenomic studies
ORGANISMS According to Genomes OnLine Database (https://gold.jgi.doe.gov/)
[Bar chart: number of proteomics studies (y-axis) per year (x-axis)]
Proteomics studies statistics
According to pubmed.gov /2016
Proteomics search statistics 29/10/2015
1
• Gene Expression and gene sequence
2
• Protein sequence data base
3
• Bioinformatics tools and their applications
4
• Microarray era and its applications
Stages of new Biology
What is the difference between proteomics and protein
chemistry?
Differences between protein chemistry and
proteomics
Protein chemistry | Proteomics
Individual proteins | Complex mixtures of proteins
Complete sequence analysis | Partial sequence analysis
Emphasis on structure and function | Emphasis on identification by database matching
Structural biology | Systems biology
1
• The level of an mRNA does not necessarily predict the level of the
corresponding protein.
2
• Once formed, proteins differ in stability and turnover.
3
• mRNA tells us nothing about the regulatory status of the
corresponding protein.
If we can measure gene expression, why do we use
proteomics?
What is the challenge in proteomic techniques?
1 • Proteins cannot be amplified the way genes can.
2
• Proteins cannot be detected by hybridization.
• Genes can be detected with complementary oligonucleotide sequences.
3
• One gene can give rise to more than one protein.
• This is due to post-translational modifications, which vary from one cellular state to another.
Proteomics in terms
1 2 3 4 5 6
Proteomics strategies
Bottom-up
Single Protein
Peptides
Top-down
Proteins
Proteins
Middle-down
Large peptides
Shotgun proteomics
Target
Depend on
Useful for studying protein modifications
Targets for shotgun proteomics
Database
(data on all expressed proteins)
Mass spectrometry
(protein mass, peptide mass, and protein
sequence)
Software
(matches mass data with specific protein
sequences)
Analytical protein separation
(allows the investigator to target specific proteins for
analysis)
1
2
3
4
Tools of proteomics
What is proteomics based on?
Peptidome
sample
Tissue
Fluids
Protein
extraction
Filtration
Organic solvents
Protein
separation
Chromatographic
separation
Electrophoresis
Protein
detection
ESI-MS/MS
MALDI-TOF
Data mining
SEQUEST
Mascot
Principles of proteomics steps
Separation Protein mixture Individual Protein
Separation
Digestion Digestion
MS analysis
MS data search
Peptides mixture Peptides
PROTEIN IDENTIFICATION
General flow of proteomics techniques
To answer this, imagine copying a whole book onto a single sheet of paper. What would the
result be?!
So we increase the accuracy of the analysis by decreasing the complexity of the protein mixture.
Protein separation
Why do we separate proteins rather than analyze them as a mixture?
1 D-SDS-PAGE
Reduction /denaturation
SDS-detergent binding
Apply to gel
2D SDS-PAGE
Load the strip
Focusing
Wash, add SDS
reductant, join to
PAGE
Protein digestion
1
• The large molecular mass of intact proteins increases the error in mass measurement on most current mass instruments.
2 • Not all protein masses can be measured.
3
• The sensitivity of measuring the masses of intact proteins is lower than that for peptides.
Why don't we simply perform MS on intact proteins?
Why do we digest proteins?
Protease | Cleavage specificity | Common proteomic usage
Trypsin | -K,R-↑-Z- not -K,R-↑-P- | General protein digestion.
Endoproteinase Lys-C | -K-↑-Z- | Alternative to trypsin for increased peptide length; multiple protease digestion; 18O labeling.
Chymotrypsin | -W,F,Y-↑-Z- and -L,M,A,D,E-↑-Z- at a slower rate | Multiple protease digestion
Subtilisin | Broad specificity to native and denatured proteins | Multiple protease digestion
Elastase | -B-↑-Z- | Multiple protease digestion
Endoproteinase Lys-N | -Z-↑-K- | Increase peptide length; create higher charge state for ETD.
Endoproteinase Glu-C | -E-↑-Z- and 3000 times slower at -D-↑-Z- | Multiple protease digestion; 18O labeling
Endoproteinase Arg-C | -R-↑-Z- | Multiple protease digestion
Endoproteinase Asp-N | -Z-↑-D- and -Z-↑-cysteic acid- but not -Z-↑-C- | Multiple protease digestion
Proteinase K | -X-↑-Y- | Nonspecific digestion of membrane-bound proteins
Omp T | -K,R-↑-K,R- | Increased peptide length for middle-down proteomics
B = uncharged, non-aromatic amino acid; X = aliphatic, aromatic, or hydrophobic amino acid; Z = any amino acid
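The trypsin rule in the table (cleave after K or R, but not when the next residue is P) can be sketched as a small "virtual digestion" routine. This is a minimal illustration, not a production tool: the function name `tryptic_digest`, the `missed_cleavages` parameter, and the toy sequence are assumptions made for this example.

```python
import re

def tryptic_digest(sequence, missed_cleavages=0):
    """Virtual trypsin digestion: cut after K or R unless followed by P."""
    # Zero-width split points: preceded by K/R and not followed by P.
    pieces = [p for p in re.split(r"(?<=[KR])(?!P)", sequence) if p]
    peptides = []
    for i in range(len(pieces)):
        # Join up to `missed_cleavages` neighbouring pieces to mimic
        # incomplete digestion.
        last = min(i + missed_cleavages, len(pieces) - 1)
        for j in range(i, last + 1):
            peptides.append("".join(pieces[i:j + 1]))
    return peptides

# Toy sequence (hypothetical, not a real protein).
print(tryptic_digest("AAKRPGGRCCK"))
print(tryptic_digest("AAKRPGGRCCK", missed_cleavages=1))
```

The R-P bond in the toy sequence is left intact, which is the "not -K,R-↑-P-" exception from the table.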
What do we use to digest proteins?
Success of digestion
Staining techniques and acetic acid retard the penetration of trypsin into
the gel and increase protein fixation in the gel. Residual components such as SDS and acrylamide inhibit trypsin.
Advantages
Displays good activity in solution or in gel. Mass instruments and labs are now well adapted to tryptic peptides.
Action
Cleaves the peptide bond on the C-terminal side of arginine or lysine (basic residues), except when followed by proline.
Modifications
Treated with TPCK (tosyl phenylalanyl chloromethyl ketone) to inhibit residual chymotrypsin.
Sources
Porcine pancreas; bovine pancreas
Trypsin digestion
In-gel trypsin digestion
• SDS remains one of the best protein solubilizers.
• It is detrimental to LC-MS peptide sensitivity.
• It is a strong chaotropic agent.
(1) Organic solvents
• Commercially available surfactants such as ProteaseMAX, Invitrosol, RapiGest, and PPS Silent Surfactant.
• Volatile surfactants such as perfluorooctanoic acid and 1-butyl-3-methylimidazolium tetrafluoroborate.
(2) MS-compatible surfactants
• Selectively cleaves proteins at aspartic acid residues.
• Creates complementary peptides of similar length to those from trypsin digestion.
(3) Microwave heating of proteins under acidic conditions
• Raising the pressure during proteolytic digestion also improves digestion efficiency.
(4) Raising the pressure
Improvement of protein digestion
Protein identification
Proteins are identified using advanced mass spectrometers
MALDI-TOF
• Matrix-assisted laser desorption ionization - time of flight
ESI-Tandem MS
• Electrospray ionization
Types of mass instruments
The three main components of mass spectrometers
MALDI-TOF
1. The sample/matrix mixture is placed on a
plate or slide.
2. A laser then fires a beam of light at the
target.
3. Energy from the laser is transferred to the
peptides or proteins, ejecting them
from the source into the gas phase.
4. Positive ions are usually formed by
accepting a proton as they are ejected
from the matrix; these are the ions
of interest.
1. TOF simply measures the time that ions
take to strike the detector.
2. This depends on the ion's m/z: the lower
the m/z, the faster the ion flies.
3. Ions flying freely through the TOF tube in this
way give poor resolution.
1. The reflectron focuses ions of the same
m/z so that they reach the detector
at the same time.
2. This improves the resolution of TOF.
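The dependence of flight time on m/z follows from energy conservation: an ion of mass m and charge z accelerated through a voltage V gains kinetic energy zeV = ½mv², so the time to cross a field-free tube of length L is t = L·sqrt(m / (2zeV)). The sketch below is a minimal illustration under idealized assumptions (no initial energy spread, no reflectron); the tube length and accelerating voltage defaults are arbitrary illustrative values, not instrument settings from the slides.

```python
import math

AMU_KG = 1.66053907e-27      # kilograms per atomic mass unit
E_CHARGE = 1.602176634e-19   # elementary charge in coulombs

def flight_time_us(mass_da, charge, tube_length_m=1.0, accel_voltage=20000.0):
    """Idealized linear-TOF flight time in microseconds."""
    mass_kg = mass_da * AMU_KG
    # z*e*V = 1/2 * m * v^2  ->  v = sqrt(2*z*e*V / m)
    velocity = math.sqrt(2 * charge * E_CHARGE * accel_voltage / mass_kg)
    return tube_length_m / velocity * 1e6

for mass in (1000.0, 2000.0, 4000.0):
    print(mass, round(flight_time_us(mass, 1), 2))
```

Because flight time scales with the square root of m/z, quadrupling the mass of a singly charged ion doubles its flight time.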
Pros and cons of MALDI
In general there is no PERFECT MS instrument for analytical proteomics but MALDI-TOF deserves very high marks in four important categories.
It is very easy to use.
Compatible with robotic sample preparation, which increases the speed and reproducibility of analyses.
High accuracy and resolution
High sensitivity: it can work with low quantities of peptides (femtomole
to attomole, 10^-18 mole)
1 2 3 4
Cons of MALDI
ESI-Tandem MS source
1. Sample in aqueous
solution from HPLC.
2. All peptides/proteins
in acidic medium (pH < 3.5)
will carry (+) charges.
3. ESI is always done
in acidic medium.
Usually stainless steel needle
Fine mist of droplets contains
ions and HPLC mobile phase
(H2O, acetonitrile, acetic acid)
Separated peptide ions
Heated capillary Curtain of nitrogen gas
Triple quadrupole
Ion trap
quadrupole –time
of flight (Q-TOF)
ESI-MS analysis of bovine apomyoglobin. The “multicharge envelope” of signals from differently charged
forms of the protein.
ESI-MS spectrum
Proteins in general may be singly or multiply protonated. For example:
1. A 20 kDa protein may accept 10-30
protons in solution.
2. If it has 20 protons, its m/z will be 20,020/20 =
1001.
3. If it has 19 protons, its m/z will be 20,019/19 ≈
1053.6, and so on.
4. So an ESI spectrum appears as a so-called
"multicharge envelope" in which all the different
charge states of the protein in solution are
represented.
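The arithmetic in the example above generalizes to m/z = (M + z·m_p) / z, where M is the neutral protein mass, z the number of protons, and m_p the proton mass. The following is a minimal sketch; the function names are made up for illustration, and the default proton mass of 1.00728 Da is the standard value, whereas the slide rounds it to 1.

```python
PROTON = 1.00728  # proton mass in Da; the slide example rounds this to 1

def mz(neutral_mass, charge, proton=PROTON):
    """m/z of a species carrying `charge` protons."""
    return (neutral_mass + charge * proton) / charge

def charge_envelope(neutral_mass, charges, proton=PROTON):
    """m/z for each charge state, i.e. the peaks of a multicharge envelope."""
    return {z: mz(neutral_mass, z, proton) for z in charges}

# Reproduce the slide's 20 kDa example using a nominal proton mass of 1.
for z, value in charge_envelope(20000.0, range(18, 22), proton=1.0).items():
    print(z, round(value, 1))
```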
ESI-MS analysis of bovine apomyoglobin. The deconvoluted spectrum indicating a single signal.
Charge-deconvolution algorithms and software
can convert the spectrum to one that represents the
actual protein mass. 16,949.0
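One simple way to see how charge deconvolution can work is to take two adjacent peaks of the envelope, which differ by exactly one proton. If the lower-m/z peak carries z+1 protons and the higher one z protons, then z = (mz_low - m_p) / (mz_high - mz_low) and M = z·(mz_high - m_p). This is a sketch of that textbook two-peak calculation, not the algorithm used by any particular software package; the function name is hypothetical, and the 16,949.0 figure from the slide is assumed here to be the neutral mass.

```python
PROTON = 1.00728  # proton mass in Da

def deconvolute_pair(mz_low, mz_high, proton=PROTON):
    """Estimate charge and neutral mass from two adjacent envelope peaks.

    mz_low is the peak carrying one more proton than mz_high.
    Returns (charge of the mz_high peak, neutral mass).
    """
    if mz_high <= mz_low:
        raise ValueError("mz_high must be greater than mz_low")
    charge = round((mz_low - proton) / (mz_high - mz_low))
    mass = charge * (mz_high - proton)
    return charge, mass

# Simulate two neighbouring peaks for an assumed neutral mass of 16,949.0 Da.
neutral = 16949.0
peak_20 = neutral / 20 + PROTON
peak_19 = neutral / 19 + PROTON
print(deconvolute_pair(peak_20, peak_19))
```

Real software combines many peaks and handles noise, but the single pair already recovers both the charge state and the mass.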
ESI –MS spectrum
* amu = atomic mass unit = 1/12 the mass of 12C; one amu = 1.66 × 10^-24 gram
Four metal rods carry an oscillating electric field
that allows ions to flow between the rods.
According to the voltage applied, ions with a
specific m/z pass through and others do not.
Q1, full scan: yields signals for all ions.
(Q1) Q1 acts as a mass filter.
(q2) Ions collide with argon gas in q2 and undergo fragmentation.
(Q3) Ions are analyzed according to their m/z by Q3.
Triple quadrupole
Ion trap
Sequential ejection of ions based on their
m/z values.
Here a full scan of all ions occurs.
Collision with helium gas atoms induces the
fragmentation of ions.
MS analysis here is described as "rocks in a can".
Automated data acquisition/MS-MS analysis in ESI
Points | Triple quadrupole MS analyzer | Ion trap MS analyzer
Analysis technique | On the fly | On trapped ions
Collision gas | Argon gas | Helium gas
Ion flow strategy | Continuous, according to the voltage used | Sequential, according to their m/z
Fragmentation | Less complete | More complete
Precursor ion signal | Prominent | Not seen
Mass resolution | High | Very high
Triple quadrupole Vs Ion trap analyzers
Other Mass Analyzers: Q-TOF MS Instruments.
Q3 of the triple quadrupole is replaced by a TOF analyzer
Higher mass resolution
Library
building
Virtual
digestion
Database of
sequences
(i.e. SwissProt)
Protein spot
from gel
Trypsin
digestion
MS analysis
and spectrum
generation
Matching
Library
Peptide identification by peptide mass
fingerprinting
This depends on
How can we do peptide mass fingerprinting?!
To imagine this:
Trypsin Digestion
Total proteome
Peptides of specific
1. Length
2. Sequence
3. Most important Mass
Measured
peptide
mass by MS
Compared to
Matching of the measured peptide mass against the peptide mass list generated by virtual digestion
We do this for all measured masses of tryptic peptides, one by one; the matches indicate the identity of the unknown protein.
Although this is an idealized example, the success of PMF depends on:
1. Accurate measurement of peptide masses.
2. An accurate database of protein sequences.
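The matching step can be sketched as a lookup of one measured mass in a sorted list of theoretical peptide masses, accepting anything within a mass tolerance. This is a minimal illustration: the function name, the tolerance values, and the two "toy" library entries are hypothetical. Only the hemoglobin alpha peptide and its mass are taken from the slides.

```python
from bisect import bisect_left, bisect_right

def find_matches(measured, library, tol_da=0.5):
    """Return library entries whose mass lies within tol_da of `measured`.

    `library` is a list of (mass, protein, peptide) tuples sorted by mass.
    """
    masses = [entry[0] for entry in library]
    lo = bisect_left(masses, measured - tol_da)
    hi = bisect_right(masses, measured + tol_da)
    return library[lo:hi]

# Toy library standing in for a virtual digest of a sequence database.
library = sorted([
    (1528.7348, "Hb-alpha (human)", "VGAHAGEYGAEALER"),  # value from the slides
    (1529.40, "toy_protein_1", "peptide_x"),             # made-up entry
    (1071.55, "toy_protein_2", "peptide_y"),             # made-up entry
])

print(find_matches(1528.73, library, tol_da=1.0))   # loose tolerance
print(find_matches(1528.73, library, tol_da=0.01))  # stringent tolerance
```

With the loose tolerance two entries match, while the stringent tolerance leaves only one, which is why accurate mass measurement matters so much for PMF.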
Success of peptide mass fingerprinting depends upon
Accurate measurement of peptide mass
(1) Skill in running SDS-PAGE
(2) Skill in trypsin digestion
(3) High-accuracy mass instrument
(4) High-accuracy HPLC
Accurate database
Good, robust software to
provide accurate protein sequences
Success of peptide mass fingerprinting
Peptide mass matches for human Hb-α
Tryptic digestion of human hemoglobin alpha chain yields 14 tryptic peptides, of which the
peptide [VGAHAGEYGAEALER] has an exact monoisotopic mass of 1528.7348 Da.
Thus, the singly charged ion of this peptide has an m/z value of 1529.7348.
The results of searching this peptide against all mouse and human proteins in the SWISS-PROT
database are illustrated in the table.
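The peptide mass quoted above can be reproduced approximately by summing monoisotopic residue masses and adding one water for the termini. The sketch below uses a standard residue mass table; the function names are made up for illustration. With these values the result is about 1528.73 Da, within roughly 0.01 Da of the figure on the slide; the small difference depends on the mass table used and on whether a full proton mass (1.00728) or a nominal 1 is added for the charged ion.

```python
# Standard monoisotopic residue masses in Da.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # H2O added for the free N- and C-termini
PROTON = 1.00728   # proton mass in Da

def monoisotopic_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(MONO[residue] for residue in peptide) + WATER

def peptide_mz(peptide, charge=1):
    """m/z of the peptide carrying `charge` protons."""
    return (monoisotopic_mass(peptide) + charge * PROTON) / charge

peptide = "VGAHAGEYGAEALER"
print(round(monoisotopic_mass(peptide), 2), round(peptide_mz(peptide), 2))
```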
# hits= # peptide matches
When we use a more stringent mass tolerance, only two proteins are identified.
BUT THE QUESTION IS: how can we identify the right protein from these very similar matches?!
The answer is to increase the number of peptides in the search (use multiple peptide matches).
Instead of searching with ONE peptide, we can search with TWO or THREE peptides.
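The benefit of searching with several peptides can be sketched by counting, for each candidate protein, how many measured masses find a partner among its theoretical peptide masses. This is a simplified hit count, not the scoring scheme of any real search engine; the protein names and all masses below are made-up toy values.

```python
from collections import Counter

def rank_proteins(measured_masses, theoretical, tol_da=0.2):
    """Rank proteins by the number of measured masses they can explain.

    `theoretical` maps a protein name to its list of peptide masses
    from a virtual digestion.
    """
    hits = Counter()
    for protein, masses in theoretical.items():
        for measured in measured_masses:
            if any(abs(measured - m) <= tol_da for m in masses):
                hits[protein] += 1
    return hits.most_common()

# Toy data: both proteins share a peptide of nearly identical mass.
theoretical = {
    "protein_A": [1528.73, 1071.55, 818.44],
    "protein_B": [1528.75, 932.50, 1250.60],
}

print(rank_proteins([1528.74], theoretical))                    # one peptide: a tie
print(rank_proteins([1528.74, 1071.60, 818.40], theoretical))   # three peptides
```

With a single peptide the two candidates cannot be told apart; with three peptides one candidate clearly accumulates more hits.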
•The number of peptides in a sample is far too large to process by hand, so software is
always needed.
•Several databases are used for identification, e.g. SWISS-PROT, OWL, NCBInr.
Software for peptide mass fingerprinting
Data feeding
Peptide sequence analysis
Let us study the tandem mass analysis of the peptide AVAGCAGAR
Peptide bond
(C) (N)
Residue 72+99+ 71+57+103+71+57+71+168 =769
Average mass of each amino acid
1) Many bonds in the peptide are subjected to collision with a neutral gas, but the most significant cleavages occur along the
peptide backbone.
2) Each peptide is cleaved into y ions (in which the + charge is retained on the C-terminus) and b ions (in which the + charge is
retained on the N-terminus).
3) Many other cleavages also appear (such as a, c, x, and z ions), but they are less common because they require much higher energy.
Peptide ion fragmentation:
Peptide cleavage
MS can determine the sequence by generating a series of b and y ions. Subtracting
the masses of two successive ions in a series gives the mass of the amino acid between them; for
example, the gap between y7 and y6 is 71 amu, which is the mass of alanine, and
so on.
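The ladder idea can be sketched for the example peptide AVAGCAGAR by computing its b and y ion series and reading residues back from the gaps between successive ions. This is a minimal illustration using standard monoisotopic residue masses for only the five residues that occur in this peptide; the function names are hypothetical, and real spectra are noisier and usually incomplete.

```python
# Monoisotopic residue masses (Da) for the residues in AVAGCAGAR only.
RESIDUE = {"A": 71.03711, "V": 99.06841, "G": 57.02146,
           "C": 103.00919, "R": 156.10111}
WATER = 18.01056
PROTON = 1.00728

def fragment_ions(peptide):
    """Singly charged b and y ion m/z values (b1..b(n-1), y1..y(n-1))."""
    masses = [RESIDUE[residue] for residue in peptide]
    n = len(masses)
    b = [sum(masses[:i]) + PROTON for i in range(1, n)]
    y = [sum(masses[n - i:]) + WATER + PROTON for i in range(1, n)]
    return b, y

def read_ladder(ions):
    """Translate each gap between successive ions into the closest residue."""
    residues = []
    for lighter, heavier in zip(ions, ions[1:]):
        gap = heavier - lighter
        residues.append(min(RESIDUE, key=lambda aa: abs(RESIDUE[aa] - gap)))
    return "".join(residues)

b, y = fragment_ions("AVAGCAGAR")
print(round(y[6] - y[5], 2))   # gap between y7 and y6
print(read_ladder(b))          # read from the N-terminal side
print(read_ladder(y))          # read from the C-terminal side
```

The b series reads the interior of the peptide from the N-terminal side and the y series reads it in reverse from the C-terminal side, so the two ladders cross-check each other.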
Bioinformatics tools in proteomics
BIOINFORMATICS Many software programs are used to analyze the data obtained by MS
http://integratedproteomics.com/
ProteoIQ (http://www.bioinquire.com/)
Scaffold (http://www.proteomesoftware.com/)
MaxQuant (http://maxquant.org/)
Trans-Proteomic Pipeline (http://tools.proteomecenter.org)
Proteome-Discoverer (www.thermoscientific.com)
pFind Studio (http://pfind.ict.ac.cn/)
Applications of proteomics
Applications of proteomics
Protein mining: simply means identification of all proteins in a sample.
Protein expression profiling: identification of proteins in a sample in a particular state of the organism or cell (disease, drug treatment, exposure to chemical or physical agents, etc.).
Protein network mapping: determining how proteins interact with each other in living systems, such as signal transduction cascades, complex biosynthetic pathways, and degradation pathways.
Mapping of protein modifications: determining how and where proteins are modified; post-translational modifications govern targeting, structure, function, and turnover.
1- Protein mining (means discovering of proteins)
1. No cell in any organism contains all the proteins encoded by its genes at one time.
2. In the tissues of higher organisms, some genes and proteins are expressed in common (those that perform
vital functions), whereas others are expressed in a tissue-specific manner, e.g. contractile proteins
are expressed in muscle, and photoreceptors express proteins in response to light stimulation.
3. However, many proteomes are altered (post-translational modification), or proteins may appear extracellularly (CSF,
blood, urine, ...) as a result of many diseases.
4. Detection of these proteins can be used as a marker for diagnosis.
MS detections
Body parts
Protein markers Mining
Tissue
Abnormal proteins
detections Extra
expression
Fluids
Protein detection
2- Protein expression profiling
1. Biochemical studies have revealed that the proteome of the cell
changes continually; this change is essential for life.
2. Other changes are induced by chemicals, environment, drugs, growth,
and disease processes.
3. These changes are used to study complicated pathologies like cancer.
4. So the task of protein profiling is to measure the proteome in two or more
samples and then compare them.
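Comparing two samples is often summarized as a log2 fold change per protein. The sketch below is one simple way to express that comparison, not a method prescribed by the slides; the function name, the pseudocount parameter, and the intensity values are all hypothetical.

```python
import math

def log2_fold_changes(sample_a, sample_b, pseudocount=1.0):
    """log2(B/A) for every protein quantified in both samples.

    The pseudocount avoids division by zero for undetected proteins.
    """
    shared = sample_a.keys() & sample_b.keys()
    return {
        protein: math.log2((sample_b[protein] + pseudocount)
                           / (sample_a[protein] + pseudocount))
        for protein in shared
    }

# Made-up intensities for two conditions.
control = {"P1": 100.0, "P2": 50.0, "P3": 10.0}
treated = {"P1": 400.0, "P2": 50.0}

changes = log2_fold_changes(control, treated, pseudocount=0.0)
for protein in sorted(changes):
    print(protein, changes[protein])
```

Proteins seen in only one sample (P3 here) are skipped by this sketch; a real analysis would need to decide how to handle such missing values.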
A B
C
Advantages of Protein profiling
Collection of cell for proteome analysis
1. Each selected cell type should be representative of the proteome we hope to study (the kidney contains many cell types,
but we should select one). Many techniques are used to collect different cell types, e.g.
laser-capture microdissection, which enables collection of a single cell type.
2. The problems of proteome analysis during cell collection include:
A. Proteome changes due to stress conditions induced during cell manipulation, which may produce
ROS that lead to protein oxidation.
B. Activation of internal proteases during cell homogenization, which leads to protein fragmentation.
C. The manner of preparation, fixation, and preservation of tissue samples, e.g. formaldehyde can prevent
protein digestion.
Mining approach
Use of proteomics as a tool for detection of new biomarkers
"Diabetes Associated Proteins Database (DAPD)" has been developed to link the diabetes
associated genes, pathways and proteins. The current version of DAPD has been built with
proteins associated with different types of diabetes. DAPD is openly accessible via the following URL:
www.mkarthikeyan.bioinfoau.org/dapd.
(1) Diabetes mellitus
References | Diabetes type | Proteomics platform
(Alkhalaf et al., 2010) | 2 | CE-MS
(Hsu et al., 2015) | DN | MS-based proteomics
(Sung et al., 2015) | Diabetes complications | MS-based proteomics
(Vujosevic et al., 2015) | Diabetic retinopathy | MS-based proteomics
(Preil et al., 2015) | Cardiovascular diseases associated with type 2 diabetes | LC/MS/MS
(Bhatt et al., 2015) | 1 | LC/MS/MS
Proteomics/diabetes studies
Malignancies
Therapy, Diagnosis and Prognosis
Genetic
instability
Impact of
progression
Impact of
prognosis
impact of
genesis
PROTEOMICS/ONCOPROTEOMICS
Proteomics/malignancies studies
Data search for proteomics/ malignancies
Oncoproteomics
(1) Leukemias
References | Proteomics/Leukemias
(Baczek, 2014) | Early diagnosis of leukemias
(Roboz and Roboz, 2015) | Identification of therapeutic targets; identification of biomarkers; drug discovery for leukemias
(Lopez Villar et al., 2015) | Early diagnosis, management, and development of personalized treatment of acute lymphoblastic leukaemia (ALL)
(Aasebo et al., 2015) | Identification of protein biomarkers in acute myeloid leukemia (AML)
(Priola et al., 2015) | Studying the changes in the CSF proteome in patients with ALL
(Singh et al., 2015) | Diagnosis of chronic lymphocytic leukemia (CLL)
(2) Gastrointestinal cancers
References | Gastrointestinal cancers/proteomics
(Subbannayya et al., 2015) | Many proteins show alterations in gastric cancers.
(Shevchenko et al., 2014) | Low molecular weight proteins detected as strong markers for gastric cancer.
(Abramowicz et al., 2015) | Differentiation between locally advanced and metastatic cancer types.
(Subbannayya et al., 2015) | Protein markers for gastric cancer.
(Lee et al., 2014) | Complement factor B (CFB) as a candidate serologic biomarker for pancreatic cancer diagnosis.
References | Data obtained
(Dun et al., 2015) | Identification of potential biomarkers and therapeutic targets for brain metastasis.
(Thomas et al., 2015) | Detection of proteomes with specific functional significance in breast cancer.
(Sjostrom et al., 2015) | Identification of proteins as potential biomarkers for breast cancer recurrence.
(Lawrence et al., 2015) | The genome and proteome provide complementary information as a powerful engine for therapeutic discovery.
(Edwards et al., 2015) | Proteomic investigations of breast cancer.
(Milioli et al., 2015) | Comparative proteomics of primary breast carcinomas and lymph node metastases.
(Kern et al., 2015) | Identification of proteins in normal and malignant mammary epithelial cells.
(3) Breast cancer
(4) Hepatocellular carcinoma
References | Data obtained
(Reis et al., 2015) | Novel and reliable protein biomarker in panel-based differential diagnostics of liver tumors.
(Goncalves Lda et al., 2014) | Comparison of the salivary proteome to detect HCV.
(Tit-Oon et al., 2014) | Comparative secretome analysis of cholangiocarcinoma.
(Yin et al., 2015) | Screening of changes in site-specific proteins in cirrhosis patients.
(Borlak et al., 2015) | Detection of novel molecular pathogenesis of liver cancer.
(Shen et al., 2015; Zhang et al., 2015a) | Detection of transcribed proteins in metastatic and non-metastatic liver cancers.
(Shao et al., 2015) | Investigation of cirrhosis (CIR) and HCC based on urinary proteomics techniques.
3- Protein network mapping
1. Proteins “work together” by actually binding to form multicomponent
complexes that carry out specific functions (protein-protein interaction)
3. Biochemists have come to appreciate that essentially all proteins bind to
or interact with at least one other protein.
Advantages of Protein network mapping
4- Mapping of protein modifications
1. Finding all modifications on a single protein, identified by examining the
measured mass and fragmentation spectra.
2. Proteome wide scanning of modifications.
The Future of Proteomics
Great potential for medical and
biological advances.
Rapid analysis of thousands of unique
proteins.
Complement to genome data.
1. Disease diagnostics.
2. Rationally designed drugs.
Aaser Abdelazim