Development of Sensitive High Performance Analytical Methods for the
Comprehensive Characterization of Proteins and Glycoproteins from Samples of
Clinical and Biopharmaceutical Importance
A dissertation presented
by
Dipak A. Thakur
to
The Department of Chemistry and Chemical Biology
In partial fulfillment of the requirements
for the degree of
Doctor of Philosophy
in the field of
Chemistry
Northeastern University
Boston, Massachusetts
June 2011
Development of Sensitive High Performance Analytical Methods for the
Comprehensive Characterization of Proteins and Glycoproteins from Samples of
Clinical and Biopharmaceutical Importance
by
Dipak A. Thakur
ABSTRACT OF DISSERTATION
Submitted in partial fulfillment of the requirements for the degree
of Doctor of Philosophy in Chemistry in the Graduate School of
Arts and Sciences of Northeastern University, June 2011
ABSTRACT
This thesis focuses on the development of ultrasensitive, high resolution
analytical methods for the characterization of proteins and glycoproteins from samples of
clinical and biopharmaceutical origin. The first portion of the thesis describes the
combination of laser capture microdissection (LCM), for the selective enrichment of
homogeneous but low number cell populations, with downstream porous layer open tubular
(PLOT) column liquid chromatography-mass spectrometry (LC-MS) using both one- and
two-dimensional separations. The second portion of the thesis describes the
ultra high performance analysis of intact recombinant α-human chorionic gonadotrophin
glycoforms using capillary electrophoresis coupled with accurate mass, high resolution
Fourier transform ion cyclotron resonance mass spectrometry (CE-FTMS).
Chapter 1 presents an overview of current analytical methods and technologies applied
in the field of proteomics. These technologies are also critiqued, laying the
foundations for the developments and improvements to the current state of the art
presented in the subsequent chapters.
Chapter 2 describes the development of a micro-proteomic workflow for the
comprehensive analysis of just 10,000 cells, collected by LCM, from invasive and
metastatic epithelial cell types from a breast cancer patient. To minimize
sample loss, an efficient sample handling approach was developed: protein level
separation of the cell lysate was performed by short distance SDS-PAGE on
tricine-PAGE gels, followed by enzymatic digestion.
By combining this sample clean-up and fractionation approach with ultrasensitive 1D
PLOT LC-MS, in excess of 1,000 proteins were identified following injection of just
1/10th of the digested lysate, or approximately 1,000 cells. The micro-proteomic workflow
is highly suited to the comparative analysis of such small but highly informative LCM
collected cell populations. More than 100 proteins were found to be differentially
expressed, facilitating a deeper understanding of the biological changes
associated with the invasive to metastatic transition.
Chapter 3 presents the application of an online 2D-RP/SCX/SPE/PLOT LC-FT-MS
micro-proteomics platform to the comparative proteomic analysis of LCM
collected normal and triple negative breast cancer cell populations. Using the
sample handling approach described in Chapter 2, followed by fractionation and
ultrasensitive analysis of the tryptic digest corresponding to 4,000 cells on the
2D-RP/SCX/SPE/PLOT LC-FT-MS platform, in excess of 15,000 unique peptides
corresponding to 4,259 proteins were identified. This deep proteome coverage further
emphasizes the utility of the developed micro-proteomic platform for the analysis of trace
quantities of proteins generated from small but highly biologically important LCM
enriched cell populations.
Chapter 4 describes the development and application of a high resolution CE-FTMS method for
intact glycoform profiling of recombinant α-human chorionic gonadotrophin (r-αhCG).
The CE separation parameters used allowed for the rapid analysis of 60 different
glycoforms bearing up to nine sialic acids, in addition to other
glycoforms differing in the number and extent of uncharged monosaccharides. A low
volume pressurized liquid junction, which preserves the high resolution of the CE
separation, was used to interface the CE system with high resolution FTMS, thereby
allowing accurate determination of the charge state and accurate mass of each intact
glycoform following deconvolution. In addition to intact glycoform profiling, analysis
of glycopeptides and glycans was also performed to determine and assign the population
of oligosaccharides present at each individual glycosite, thereby facilitating complete and
comprehensive characterization of r-αhCG. The methodology developed in Chapter 4 was
further applied to the analysis of r-αhCG from different expression systems, CHO and
murine cell based. The CE-FTMS method is readily applicable to the characterization of
drug substance and drug product, as well as to in-process monitoring of these complex glycoforms.
ACKNOWLEDGEMENT
I want to express my sincere and heartfelt gratitude to many people, teachers, colleagues
and friends, who have helped me in reaching this milestone.
First, I would like to acknowledge my thesis advisor, Professor Barry L. Karger, for
accepting me as his student and giving me an opportunity to work in his research group.
His guidance was constructive and aimed at bringing out the best in me as a scientist and a
person. Importantly, I was inspired and motivated by his wisdom, enthusiasm and
commitment to the highest standards.
I would like to thank Dr. Tomas Rejtar for devoting his time and energy to guiding
me on various projects. I would also like to thank Dr. Marina Hincapie, Dr. Andras
Guttman, Dr. Billy Wu, Dr. Shujia Dai, Dr. Sanwon Cha and Dr. Jonathan Bones for
sharing their knowledge and expertise.
I would like to thank my dissertation committee members, Prof. Paul Vouros, Prof.
Graham Jones and Prof. Roger Giese for their time, suggestions and guidance.
Many thanks to Dr. Buffie Clodfelder-Miller (Cellular and Molecular Neuropathology
Core, University of Alabama), Elizabeth Richardson, Shemeica Binns, Sonika Dahiya
and Dennis Sgroi (Massachusetts General Hospital) for providing precious LCM
samples. I would like to thank our collaborators N. Washburn, C.J. Bosques, N.S. Gunay,
Z. Shriver and G. Venkataraman (Momenta Pharmaceuticals) for supporting the glycoform
profiling project and for their full contribution towards the glycan analysis.
I would like to acknowledge the support and friendship of current and former researchers
of the Barnett Institute, Dr. E. Moskovets, Dr. Vickor Andreev, Dr. Quanzhou Luo, Dr.
Guihua Yue, Mr. Laxmi Manohar Akella, Dr. Claudia Donnet, Dr. Enrique Avarelo, Dr.
Zoltan Sabo, Dr. Jim Glick and Somak Ray, and of previous and current graduate students Lingyun
Li, Ye Gu, Dongdong Wang, Majlinda Kulloli, Agnes Rafalko, Jonna Linholm-Ventola,
Jack Liu, Chen Li, Peter Li, Chris Morgan, Vaneet Sharma, Rose Gathungu, Joshua
Klaene and Fateme Tousi.
I would like to express my gratitude to Jeffrey Kesilman, Felicia Hopkins, Richard
Pumphrey, Andrew Bean, Jana Volf and Bill O'Neil for their support.
I would like to acknowledge my wife, Vaishali, daughter Radhika, and son Hrishikesh for
their love, support, sacrifice and compromise during 5 long years. Many many thanks to
my parents, Sudha and Arjun Thakur, for their support, encouragement and care. I would
like to thank my brother, Ganesh and his family, for supporting, guiding and encouraging
me during my graduate studies. I would like to express my gratitude to my sister Jyoti
and her family for their support and encouragement.
TABLE OF CONTENTS
ABSTRACT
ACKNOWLEDGEMENT
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
LIST OF ABBREVIATIONS AND CONVENTIONS
Chapter 1: Overview of Technologies and Methodologies for Proteomics Analysis
1.1 Introduction
1.1.1 Proteomics: An Overview
1.2 Shotgun Proteomics Methodologies
1.2.1 Samples
1.2.1.1 In Vitro Sample Source: Cell lines
1.2.1.2 In Vivo Sample Sources
1.2.2 Tissue Microdissection
1.2.2.1 Laser Capture Microdissection
1.2.2.2 Laser Microbeam Microdissection (LMM)
1.2.2.3 Comparison of LCM and LMM
1.2.3 Sample Preparation
1.2.3.1 SDS-Polyacrylamide Gel Electrophoresis (SDS-PAGE)
1.2.4 Separation Techniques
1.2.4.1 High Pressure Liquid Chromatography
1.2.5 Mass Spectrometry
1.2.5.1 Ionization Methods
1.2.5.2 Mass Analyzers
1.2.5.3 Database Searching Tools for Proteomics
1.3 Microproteomics
1.3.1 Alternative strategies for protein digestion
1.3.1.1 Solvents based approach
1.3.1.2 Cleavable surfactant
1.3.1.3 Filter-Aided Sample Preparation (FASP)
1.3.2 High Performance Liquid Chromatography for Microproteomics
1.3.2.1 Peak Capacity
1.3.2.2 Narrow-bore column and ESI-MS
1.3.2.3 Porous Layer Open Tubular (PLOT) Columns
1.4 Protein Glycosylation Analysis
1.4.1 Intact Glycoprotein Analysis
1.4.1.2 Capillary Electrophoresis
1.4.1.3 Capillary Electrophoresis Coupled to Mass Spectrometry
1.4.1.4 Application of CE-MS for Analysis of Intact Glycoforms
1.4.2 Glycan analysis
1.4.2.1 Glycan release methods
1.4.2.2 Enzymatic Sequencing of Oligosaccharides
1.4.2.3 HPLC analysis of glycans
1.5 References
Chapter 2: Proteomic Analysis of 10,000 Laser Captured Microdissected Breast
Tumor Cells Using Short Migration on SDS-PAGE and Porous Layer Open
Tubular (PLOT) LC-MS
ABSTRACT
2.1 Introduction
2.2 Experimental Section
2.2.1 Chemicals
2.2.2 Clinical Specimens
2.2.3 Laser Capture Microdissection
2.2.4 Cell Lysis, SDS-PAGE and In-Gel Digestion
2.2.5 Nano LC-ESI-MS with 10 µm i.d. PLOT Column
2.2.6 Protein Identification
2.2.7 Identification of Differentially Abundant Proteins by Spectral Counts
2.2.8 Reproducibility of Replicate Analyses of Metastatic and Invasive Breast
Cancer Samples
2.2.9 Gene Ontology Annotation with DAVID (Database for Annotation,
Visualization and Integrated Discovery)
2.3 Results and discussion
2.3.1 Overview of Proteomic Workflow
2.3.2 Cell Lysis and Protein Extraction from the LCM Cap
2.3.3 Short SDS-PAGE Run for In-Gel Digestion
2.3.4 Online PLOT/LC-ESI-MS
2.3.5 Proteomic Analysis of Three Replicates of 10,000 Breast Cancer Cells
2.3.6 Identification of Differentially Expressed Proteins
2.3.7 Gene Ontology Analysis
2.4 Conclusions
Addendum to Chapter 2
Evaluation of Short SDS-PAGE Separation Distance for Sample
Preparation of Small Protein Amounts Prior to LC/MS Proteomic Analysis
2.1A Methods and Materials
2.1.1 Chemicals
2.1.2 SDS-PAGE Separation and In-Gel Digestion
2.1.3 LC-MS/MS Analysis
2.1.4 Protein Identification
2.2A Results
2.3 Reference
Chapter 3: Comparative Proteomic Analysis of 10,000 Triple Negative
Breast Cancer and Normal Mammary Epithelial Laser Microdissected
Cells Using On-line 2D RP-SCX/Porous Layer Open Tubular Column
(PLOT) LC-MS
Abstract
Introduction
2. Materials and Methods
2.1. Chemicals and Materials
2.2. Laser Capture Microdissection
2.3. Protein Extraction and Digestion
2.4. Column Preparation and Two-Dimensional Separation
2.5. MS Analysis and Data Analysis
2.6. Spectral Index (SpI) for Identification of Differentially Abundant Proteins
2.7. Gene Ontology by DAVID (Database for Annotation, Visualization
and Integrated Discovery) a Functional Annotation Clustering Tool
2.8 Gene Set Enrichment Analyses (GSEA) for Functional Significance
of Differentially Abundant Proteins
3. Results and Discussion
3.1 Experimental and Bioinformatics Workflow for Proteomic Analysis of
10,000 LCM Collected Normal and Cancer Breast Epithelial Cells
3.2. Peptide and Proteins Identification
3.3. Spectral Index Analysis for Determination of Differentially Abundant
Proteins
3.4 DAVID Functional Annotation Analysis of Differentially Abundant Proteins
3.5 Gene Set Enrichment Analyses (GSEA) for Canonical Pathway Analysis
Conclusions
References
Chapter 4: Characterization of the Intact α-Subunit of Recombinant Human
Chorionic Gonadotropin Glycoforms by High Resolution CE-FT-MS*
Abstract
4.1 Introduction
4.2 Experimental
4.2.1 Recombinant r-αhCG
4.2.2 Chemicals
4.2.3 CE-MS System
4.2.4 Deglycosylation and Analysis of Released Glycans
4.2.5 Trypsin Digestion of r-αhCG Expressed in a Murine Cell Line
4.2.6 LC-MS Analysis of r-αhCG Tryptic Digest
4.2.7 Data Analysis
4.3 Results and Discussion
4.3.1 Intact Protein Analysis
4.3.2 Repeatability of the Intact Protein Separation
4.3.3 Analysis of the Released Glycans
4.3.4 Glycopeptide Analysis
4.3.5 Analysis of Combined Data
4.3.6 Analysis of r-αhCG Expressed in CHO Cell Culture
4.4 Conclusions
4.5 References
Chapter 5: Summary and Future Directions
LIST OF FIGURES
Chapter 1
Figure 1.1 Conceptual organization of proteomic experiments
Figure 1.2 Human islet protein reference map
Figure 1.3 The principles of laser capture microdissection (LCM)
Figure 1.4 Common matrices used in MALDI mass spectrometry
Figure 1.5 Operational principle of the FTICR
Figure 1.6 Cutaway view of the Orbitrap mass analyzer
Figure 1.7 Low energy collision induced dissociation of peptide
Figure 1.8 Mobile Proton Theory
Figure 1.9 Illustration of effect of concentration of analytes and flow
rate on ESI processes
Figure 1.10 Comparison of normal flow rate electrospray vs. a lower
flow rate electrospray
Figure 1.11 Schematic diagram of the low dead volume connections
used to design 1D and 2D SPE-PLOT system
Figure 1.12 Diagram of the advanced on-line 2-D SCX/PLOT/MS system using
a 3.2 m × 10 µm i.d. PLOT column and an online triphasic trapping column
Figure 1.13 Chemical diversity of glycans
Figure 1.14 Electric double layer at the capillary wall and creation of EOF
Figure 1.15 Different types of CE/MS interfaces
Figure 1.16 CZE-ESI-MS analysis of a recombinant human EPO
Figure 1.17 Exoglycosidases commonly used to determine the structure
of the N-glycans
Chapter 2
Figure 1. Shotgun proteomic workflow for the analysis of 10,000 LCM collected
breast cancer cells collected from breast tumor and lymph node tumor
Figure 2. Optimization of LC-MS parameters
Figure 3. Assessment of the variability in proteomic profiles associated
with three replicate runs each of invasive and metastatic breast cancer
samples (three samples of 10,000 cells each)
Figure S1. Selection of gel type and SDS-PAGE separation distance for
proteomic analysis of small sample amounts
Chapter 3
Figure 1. Shotgun proteomics workflow to analyze breast epithelial
cells collected from normal and triple negative breast tumor epithelium
Figure 2. Peptide and protein identifications from 6 salt steps
Figure 3. Peptide and protein identifications in the six samples
Figure 4. Participants of cell cycle (G1-S Phases) were significantly
enriched in triple negative breast cancer (TNBC) cells
Figure 5. Structural molecular organization was significantly deficient
in triple negative breast cancer (TNBC)
Chapter 4
Figure 1A. Diagram of CE-MS system for analysis of intact glycoproteins
Figure 1B. Photograph of CE system coupled to LTQ-FTMS for
analysis of intact glycoproteins
Figure 2. Illustration of the separation resolution of CE-MS analysis
of intact α-hCG derived from a murine cell line
Figure 3A. CE-MS separation of r-αhCG produced in a murine cell line
Figure 3B. CE-MS separation of r-αhCG produced in a murine cell line
Figure 4. Chromatograms and fragmentation spectra of glycan analysis
Figure 5. LC/MS/MS analysis of sulfated and α-galactose containing N-glycans
Figure 6. Exoglycosidase characterization of
galactose-α-galactose-containing species
Figure 7. CE-MS separation of r-αhCG produced in a CHO cell line
LIST OF TABLES
Chapter 2
Table 1. Number of proteins identified per gel section per sample from
three technical replicates of 10,000 mouse liver cells
Table 2. Number of proteins identified per gel section per sample
from three replicates of 10,000 invasive breast cancer cells
Table 3. Enriched Gene Ontology (GO) terms; terms with FDR less than
5% and P value less than 0.05 are shown in bold
Table S1. Peptides and proteins identified using three SDS-PAGE
separation conditions
Chapter 3
Table 1. Details about normal breast specimens and triple negative breast
cancer specimens
Table 2. List of differentially abundant proteins between TNBE and NBE
Table 3. Representative enriched, functional clusters with corresponding
GO terms for differentially expressed proteins identified by DAVID
Table 4. List of the canonical pathways found to be overrepresented in
TNBE samples
Table 5. List of the canonical pathways found to be overrepresented
in NBE samples
Chapter 4
Table 1. Repeatability of peak area measurements for 20 glycoforms on r-αhCG
Table 2. Summary table N-linked glycans in r-αhCG
Table 3. Abundance of individual glycopeptides
Table 4. List of theoretical and observed glycoforms
Table 5. Abundance of r-αhCG glycoforms produced in CHO cells
LIST OF ABBREVIATIONS AND CONVENTIONS
2D GE Two-dimensional gel electrophoresis
2-AB 2-amino benzamide
CE Capillary electrophoresis
CID Collision Induced Dissociation
CPAS Computational proteomics analysis system
CTC Circulating tumor cells
CZE Capillary zone electrophoresis
DAVID Database for annotation, visualization and integrated discovery
DTA Sequest data files
DTT dithiothreitol
EIE Extracted ion electropherograms
EOF Electroosmotic flow
ESI Electrospray ionization
FASP Filter-aided sample preparation
FDR False discovery rate
FFPE Formalin-fixed paraffin-embedded
FTICR Fourier Transform Ion Cyclotron Resonance
GO Gene ontology
GSEA Gene set enrichment analyses
HILIC Hydrophilic interaction liquid chromatography
IAA Iodoacetamide
ICAT Isotope-Coded Affinity Tag
INV Invasive
IPG Immobilized pH gradient
IPI International Protein Index
IR Infrared
IT Ion Trap
iTRAQ Isobaric tags for relative and absolute quantitation
LCM Laser capture microdissection
LMM Laser Microbeam microdissection
LTQ Linear Ion Trap
MALDI Matrix-assisted laser desorption/ionization
MBE Invasive malignant breast epithelial
MCM Minichromosome maintenance
MET Metastatic
MS Mass spectrometry
NBE Normal breast epithelial
NBE Non-cancerous breast epithelial
NCBI National Center for Biotechnology Information
PALM Pressure assisted Laser microdissection
PGC Porous graphitic carbon
PLOT Porous-layer open-tubular
ppb Parts per billion
ppm Parts per million
RPLC Reverse-phase liquid chromatography
PS-DVB Poly Styrene- Divinyl benzene
r-αhCG Recombinant α-subunit of human chorionic gonadotrophin
SCX Strong Cation Exchange
SDS-PAGE Sodium dodecyl sulfate polyacrylamide gel electrophoresis
SILAC Stable isotope labelling by amino acids in cell culture
SPE Solid phase extraction
SpI Spectral index
TNBC Triple negative breast cancer
TNBE Triple negative malignant breast epithelial
TOF Time-of-flight
UV Ultraviolet
Xcorr Cross-correlation score
Chapter 1: Overview of Technologies and Methodologies for Proteomics Analysis
1.1 Introduction
1.1.1 Proteomics: An Overview
Proteomics [1] offers a complementary approach to genomic technologies by
investigating biological phenomena at the global protein level. The emergence of
mass spectrometry-based proteomic technologies has advanced our understanding of
the complexity and dynamic nature of proteomes, at the same time revealing that no
"one-size-fits-all" proteomic strategy can be used to solve all biological problems. Two
technologies have been responsible for the recent, rapid advance of proteomics: first,
the development of new strategies for peptide sequencing using mass spectrometry,
including soft ionization techniques such as electrospray ionization (ESI) and
matrix-assisted laser desorption/ionization (MALDI); and second, the miniaturization
and automation of liquid chromatography. However, the high expectations for
proteomics have been tempered by the discovery of the huge molecular
complexity and dynamic nature of the proteome, which introduce difficulties greater than
those encountered in either genome or transcriptome studies. In particular,
complexities related to splice variants, post-translational modifications (PTMs),
dynamic ranges covering ten orders of magnitude or more of protein abundance in
plasma, protein stability, and dependence on cell type or physiological state have
challenged our ability to characterize proteomes comprehensively in a reasonable time
[2,3,4].
Despite the above challenges, proteomic technologies have already
contributed significantly to the life sciences and are today an integral part of biological
research efforts. Currently, the field of proteomics covers diverse research topics such
as protein expression profiling, analysis of signaling pathways, and protein biomarker
discovery, among others [4]. It is important to be aware that each area requires its own
proteomic approaches; these approaches differ widely in the skills they
require, their difficulty and their expense. Based on their objectives, proteomic
experiments are categorized as either discovery or assay experiments. Proteomic assay
experiments investigate a quantitative change in a small, predefined set of proteins or
peptides, whereas discovery experiments focus on the analysis of large, unbiased sets
of proteins. The measurement of cardiac troponins in human plasma samples is one
example of an assay experiment [3,4]. An example of a discovery proteomic
experiment is the Human Proteome Organization Plasma Project, which aims to
catalog all proteins and peptides in human plasma.
Discovery proteomics experiments are divided into comprehensive, broad-scale and
focused approaches because these distinctions determine how a biological question is
approached technically. Comprehensive approaches aim to enumerate as many
components of a biological system as possible [5]. Broad-scale experiments
target a selected fraction of the expressed proteome, for example the
phosphoproteome or the glycoproteome. Comprehensive and broad-scale
experiments are used to profile qualitative and quantitative changes
that result from perturbation of a biological system or from differences in genetic
background [6,7]. Focused approaches, such as identification of the components
of a protein complex, involve co-purification and analysis of relatively few
interacting proteins. Here, the aim is to identify the components of multiprotein complexes
and their interaction mechanisms in order to understand physiological and pathogenic
processes. Once the components of multiprotein complexes are determined, they are
further monitored using assay methods to develop therapies [8].
Characterization of a single protein that is isolated from natural or recombinant
sources involves determination of its mass, identity, post-translational modifications
and purity. The comprehensive characterization task draws on decades of experience
in protein chemistry [4]. Figure 1.1 presents a diagram of the various components of
proteomics discovery and assay.
Figure 1.1 Conceptual organization of proteomic experiments. Reprinted from
reference [4].
1.2 Shotgun Proteomics Methodologies
Figure 1.2 Human islet protein reference map. The proteins were loaded onto an IPG
strip (pH 3-10) and subsequently separated by mass on a gradient (8-12%) SDS-PAGE
gel. Reprinted from reference [9].
The combination of two-dimensional gel electrophoresis and mass
spectrometry (2DE-MS) has traditionally been used to determine changes in protein
identity and protein abundance in a complex protein mixture [10]. Using this
combination, a protein mixture is first separated by isoelectric point and then by
molecular weight into almost single-protein spots; this strategy is therefore sometimes
called the "single protein" method [11]. To identify individual proteins separated by
2DE, the excised gel pieces are subjected to in-gel digestion and subsequent analysis
by tandem mass spectrometry. As this method provides very high resolution, the
visible image of a stained 2D gel is used to observe changes in protein abundance,
protein isoforms and protein modifications [9]. Figure 1.2 shows an example of a
complex 2D gel pattern from a proteome. While powerful, the method is difficult to
automate, is slow to operate and does not work well with highly hydrophobic proteins
[12].
In the past few years, shotgun proteomics, introduced by Yates et al. [10], has
replaced conventional 2DE-MS due to its inherent high-throughput capability and its
ability to detect and quantitate more proteins than 2D gel electrophoresis. Shotgun
proteomics is a method in which the total proteome is digested to peptides, and the
resulting highly complex peptide mixture is separated by one- or two-dimensional liquid
chromatography coupled to mass spectrometry (MS). The method consists of four
steps: sample preparation, liquid chromatography, MS and data processing. The results
are interpreted using rapidly developing bioinformatics tools [13]. Sample
preparation for proteomic analysis involves multiple steps, such as protein extraction,
enrichment, digestion and peptide clean-up. Proteins are first extracted
from the biological specimen, such as blood, cell lines or tissues. The
extracted protein mixture may be further fractionated to reduce its complexity
using chromatographic, electrophoretic or affinity purification procedures. To facilitate
their identification, the proteins are digested with highly specific proteolytic enzymes,
such as trypsin, to generate fragments of suitable mass for MS detection. The digested
peptides are subsequently separated using high performance liquid chromatography
coupled to ESI or MALDI mass spectrometry. Both precursor mass and MS/MS
fragmentation spectra can be used to identify and quantitate the peptides. Generally,
the tandem mass spectra, which provide peptide sequence data based on MS/MS
fragmentation patterns, are searched against a specific protein database (e.g., NCBI or
Swiss-Prot [14]) using various algorithms (e.g., Mascot [15] or SEQUEST [16]) to
determine protein identity. The advantage of shotgun proteomics over the 2DE
approach is that the former can analyze hydrophobic membrane proteins as well as
proteins with a broad range of pI or size. In addition, the protein dynamic range
covered by the shotgun method can be higher than that covered by the 2DE method [17].
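The digestion and precursor-mass steps described above can be illustrated with a short in silico sketch. The Python code below is a minimal illustration only and is not the search procedure used in this work: it assumes the commonly applied trypsin rule (cleavage C-terminal to K or R, except when the next residue is P), uses standard monoisotopic residue masses, and operates on a hypothetical toy sequence. The function names tryptic_digest, peptide_mass and mz are introduced here for illustration.

```python
# Minimal in silico tryptic digest and peptide mass calculation (illustrative sketch).

# Standard monoisotopic residue masses (Da) for the 20 common amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056    # added once per peptide (N-terminal H + C-terminal OH)
PROTON = 1.007276   # mass of the charge carrier in positive-ion ESI


def tryptic_digest(sequence, missed_cleavages=0):
    """Split a protein sequence after K or R, unless the next residue is P."""
    sites = [0]
    for i in range(len(sequence) - 1):
        if sequence[i] in "KR" and sequence[i + 1] != "P":
            sites.append(i + 1)
    sites.append(len(sequence))

    peptides = []
    last = len(sites) - 1
    for start in range(last):
        # Join up to `missed_cleavages` additional neighbouring fragments.
        for end in range(start + 1, min(start + 1 + missed_cleavages, last) + 1):
            peptides.append(sequence[sites[start]:sites[end]])
    return peptides


def peptide_mass(peptide):
    """Monoisotopic neutral mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def mz(neutral_mass, charge):
    """m/z of the [M + zH]z+ ion."""
    return (neutral_mass + charge * PROTON) / charge


if __name__ == "__main__":
    toy_protein = "MAGKTLRPEVKSSR"  # hypothetical sequence, for illustration only
    for pep in tryptic_digest(toy_protein):
        mass = peptide_mass(pep)
        print(f"{pep:10s} M = {mass:9.4f}  [M+2H]2+ = {mz(mass, 2):9.4f}")
```

In an actual database search, the measured precursor m/z values would be compared, within a mass tolerance, against theoretical values of this kind computed for every protein in the database, before the MS/MS fragment spectra are scored.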
1.2.1 Samples
Cancer is one of the leading causes of death worldwide. In order to develop
treatments for cancer, protein biomarkers that can indicate (1) the presence of
disease, (2) disease reduction or progression, and (3) response to treatment are
highly desired. In biomarker discovery proteomic experiments, a variety of
sample sources can be used, such as cell lines, tissues and body fluids.
1.2.1.1 In Vitro Sample Source: Cell lines
Cell lines are routinely used in proteomic studies as they may be easily
manipulated with different chemical additives or physical conditions. Because the
population of cells can be large (as many as 100,000,000 cells), there are no
limitations with respect to the amount of sample available. Cancer cell lines are
extensively studied using quantitative proteomics for:
1) identification of differentially abundant proteins between diseased and normal cells of
the same type,
2) identification of pathways associated with a specific phenotype, e.g., cancer progression,
3) drug resistance studies, and
4) analysis of proteins secreted by cancer cell lines for potential
biomarker discovery [18].
One must, however, always keep in mind that a cell line is a model system that may
or may not represent the in vivo condition [19].
1.2.1.2 In Vivo Sample Sources
Biofluids
In contrast to cell lines, body fluids such as serum [20], plasma [21], saliva [22],
urine, nipple aspirate, cervical-vaginal fluid [23] and exhaled breath condensate [24]
closely represent in vivo biological events. Compared to biopsied samples,
biofluids are easy to collect at low cost using less invasive methods [25,26]. Among
the body fluids, blood, the most common human sample used in diagnosis, is often the
focus for the discovery of protein biomarkers for disease [26,27]. However, the
challenges in analyzing serum or plasma are the high complexity of the proteome, with a
wide dynamic range (at least 10 orders of magnitude [28]), and the anticipated low relative
abundance of many disease-specific biomarkers.
Compared to blood, proximal fluids, i.e., body fluids that are close to or in direct
contact with the site of disease, can be an attractive alternative sample type for
biomarker discovery. Proteins or peptides secreted, shed or leaked from diseased
tissue are likely to be enriched in proximal fluids with respect to both blood and
disease-free control fluid of the same type [29]. Examples of proximal fluids are
urine for bladder and kidney disease, nipple aspirate or ductal lavage for breast cancer,
and cerebrospinal fluid for intracranial processes[30]. Evidence of marker enrichment
in proximal fluids was demonstrated with a study of ovarian cancer, where both
ovarian cyst fluid and ascites fluid constituted proximal fluid[31].
Tissue Samples
Compared to blood and proximal fluids, analysis of tissue offers several
important advantages. 1) During biomarker discovery on tissue samples, the
proteins are studied in their native surroundings. 2) The possibility of identifying
potential biomarkers is highest in damaged or diseased tissues, as the markers are
likely to be concentrated in those tissues. Therefore, it makes sense to look for markers
in tissue samples due to their higher concentration and the relatively narrower dynamic
range of proteins. To
perform the discovery studies, tissue samples can be used either from animal models
or from human biopsied samples. Mouse [32-34] and rat [35,36] are two of the most
widely used animal models for proteomic research, though human biopsies are the
most appropriate samples to study human diseases. However, human biopsied samples
are not as easily available as tissue samples from animal models, and controlled
experiments are clearly much easier to perform on animal models. Biopsied
samples require extra care during processing and storage: the tissue
specimens are frozen immediately after excision and stored at -80 °C.
Conventionally, in order to preserve the biopsied samples and to maintain their
morphology, the samples are fixed in formalin and embedded in paraffin [37]. The
formalin fixation causes cross-linking of the proteins, and the paraffin limits water
contact.
Large collections of formalin-fixed, paraffin-embedded (FFPE) biopsied samples are
preserved and last for many years [38]. Such samples have a well documented clinical
history of individual patients and are available for retrospective analysis. To perform
proteomic analysis of FFPE samples, decross-linking and efficient extraction of
proteins are necessary. To remove paraffin from FFPE tissue blocks, the blocks are
treated with xylene. Further, formalin-fixed tissue blocks are boiled in a solution
containing metal ions. This procedure is termed heat-induced antigen retrieval [39,40].
High temperature, above 90 °C, is found to be essential to decross-link the methylene
bridges between the proteins. Studies performed on FFPE samples to
obtain a comprehensive proteome use two different approaches. The first is
extraction of intact proteins with SDS at high temperature [41-43]. A
commercial product, the Qproteome FFPE tissue kit (QIAGEN, Germantown,
MD), uses proprietary chemistry to extract full-length proteins for subsequent analysis.
The second, a novel approach, is to perform in-solution enzymatic digestion
on FFPE samples, after heat-induced decross-linking, to directly obtain peptides for
shotgun proteomic analysis [25]. A commercial product based on the latter
extraction principle is the Liquid Tissue-MS protein prep kit (Expression
Pathology, Inc., Rockville, MD).
1.2.2 Tissue Microdissection
The microenvironment of a tumor tissue sample is highly heterogeneous[44].
The pathologist identifies malignant cells based on their differential staining and
morphology. The malignant cells are surrounded by normal-related and other types of
cells in the tissue matrix. In order to perform a detailed study of biopsied samples to
gain information on proteomic changes between malignant and normal cells, the
malignant cells need to be separated into a homogeneous population. Tissue
microdissection is an indispensable tool to enrich distinct cell types from a
heterogeneous tissue matrix in an efficient and accurate manner. Before the advent of
microdissection, fluorescence-activated cell sorting (flow cytometry) [45] and
magnetic-bead based cell sorting [46] were the methods of choice for cell separation.
However, these methods employ enzymes to break down the tissue structure, which
may alter or modify the cellular constituents in a number of ways. Microdissection
techniques have the advantage over such enzymatic dissociation that they allow
selection of individual cells under microscopic inspection of the intact tissue.
Microdissection techniques can be classified into two major classes: manual
microdissection and laser-assisted microdissection. Early efforts to dissect specific
cell types from tissue sections used sharp tools such as scalpel blades and needles [47].
Another manual dissection technique, called “negative ablation”, as the name
suggests, destroys the unwanted cells surrounding the cells of interest and collects the
non-ablated cells using a needle [48].
Though manual microdissection techniques were useful in obtaining a
homogeneous cell population, these methods were slow, tedious and required
considerable expertise to perform. In addition, the manual microdissection techniques
suffered from issues such as sample handling and contamination. To address these
issues and to perform fast, clean and accurate microdissection, laser-based
microdissection technology, which includes laser capture microdissection and laser
microbeam microdissection, was developed [49]. Over the years, this technique has
proved to be effective, as more than one thousand research articles have been
published on samples procured using this technique [50].
1.2.2.1 Laser Capture Microdissection
Laser Capture Microdissection (LCM) is a laser-based cell procurement
method that was developed in the mid-1990s by Emmert-Buck, Liotta and colleagues at
the National Institutes of Health (NIH) and designed to perform fast and accurate
microdissection of tissue samples [49,51]. The earliest design of the LCM system was
commercialized by Arcturus Biosciences Inc. (now part of Applied Biosystems), and
later Leica and PALM introduced non-contact LCM systems based on a
technique called laser microbeam microdissection.
Figure 1.3. The principles of laser capture microdissection (LCM). (a) The scheme of
LCM. (b) Comparison of properly melted polymer spots and poor spots. Only cells
lying within the dark ring of melted polymer will be targeted for LCM. (c) Physical
forces involved in LCM. (d) A single cell bound to the thermolabile polymer.
Reprinted from reference [52].
The principles of contact based LCM technology are shown in Figure 1.3. In
brief, the tissue specimens are first processed by sectioning and staining, and then
examined to identify cells of interest based on their staining and morphology. To
selectively capture cells of interest from the tissue sections, an LCM cap with a
thermolabile polymer membrane is placed on the tissue section. An infrared (IR) laser
is focused through the transparent cap material, heating and melting the membrane,
and thus causing the targeted cells to adhere to the membrane. The cells of interest are
then dissected by lifting the LCM cap away from the tissue section. The thickness of the
tissue sections (5-15 µm) used for microdissection is critical from an operational point
of view. The tissue thickness
Pathology introduced “DIRECTOR” Microdissection slides, which are based on Laser
Induced Forward Transfer (LIFT) Technology utilizing a thin layer energy transfer
coating. Laser energy is transferred to the coating, resulting in evaporation of
the coating. The evaporation of the transfer coating causes the selected feature of the
tissue section to fall into the collection tube.
1.2.2.3 Comparison of LCM and LMM
Using the older version of the Arcturus LCM instrument, any material adhering to
the LCM cap was collected. This type of nonspecific collection of loose material from
the tissue specimen is a potential source of contamination. To overcome this issue,
Arcturus introduced a newer design of LCM caps in which the cap remains slightly
away from the tissue specimen, allowing collection of only cells which are in contact
with the melted thermolabile membrane. In addition, to avoid contamination, such as
keratin and loose tissue material, sticky “prep strips” can be used [12]. In contrast to
LCM, the primary source of contamination in LMM is fine tissue material resulting
from laser ablation of the edges of targeted cells/tissue areas [58].
In the case of LCM, the tissue preparation procedure, which includes slide
selection, tissue staining and dehydration, and microscopic evaluation of the tissue
specimen, has to be strictly followed in order to obtain effective microdissection.
In the case of LMM, by contrast, tissue preparation is less complicated than for LCM,
and parameters such as tissue thickness are more flexible.
LCM, as a contact-based microdissection technique, is advantageous compared
to LMM, since the cells collected on the thermolabile membrane can be easily viewed
under the microscope for their homogeneity. In LMM, a thin polyethylene naphthalate
(PEN) membrane is required between the glass slide and the tissue section; otherwise,
the catapulted cells might be pulverized to debris in the collection tube. With the
membrane in place, the collected cells remain relatively intact and can be visualized.
One example of an application of LCM is the investigation of the molecular basis of breast
tumor formation. This disease is not clearly understood due to difficulties encountered
in studying the early stages of disease progression. Breast cancer progression is
a multistep process, involving the premalignant stage of atypical ductal hyperplasia
(ADH), the preinvasive stage of ductal carcinoma in situ (DCIS), and the potentially
lethal stage of invasive ductal carcinoma (IDC) [59]. The obstacles in studying breast
cancer lesions are the complexity and heterogeneity of the tissue and the microscopic size
(
a single protein, this approach is suitable as it requires minimal sample preparation.
Chapter 4 of this thesis describes glycoform profiling of recombinant α-human
chorionic gonadotropin (α-hCG) using high resolution capillary electrophoresis
coupled with high mass resolution FT-MS [61].
In the bottom-up approach, there are two ways to convert proteins extracted
from biological specimens to peptides which are suitable for mass-spectrometry (MS)
based proteome analysis. The first solubilizes the proteins with detergents and
separates the proteins by sodium dodecyl sulfate (SDS) polyacrylamide gel
electrophoresis. The proteins trapped by the gel are subjected to enzymatic digestion,
i.e., “in-gel” digestion. The second sample preparation method is detergent-free, as it
uses strong chaotropic reagents such as urea and thiourea for cell lysis, protein extraction
and solubilization. The enzymatic digestion of the proteins in the presence of
denaturing reagents is termed “in-solution” digestion.
The in-gel digestion method is advantageous over in-solution digestion due to
the absence of most impurities which could interfere with digestion; however, the gel
may limit peptide recovery. On the other hand, in-solution digestion can be more
readily automated and can minimize losses associated with sample handling.
However, the use of chaotropes may result in incomplete solubilization of the
proteome, and digestion may be impeded by interfering substances.
1.2.3.1 SDS-Polyacrylamide Gel Electrophoresis (SDS-PAGE)
The mobility of proteins during gel-electrophoresis depends upon the following
factors:
1) electric field strength,
2) total charge on the molecule,
3) size and shape of the molecule and
4) ionic strength of the buffer and properties of the gel matrix through which the
molecules are migrating.
The polyacrylamide gel matrix is in extensive use for protein prefractionation
[62]. Gel matrices act like a molecular sieve, and their sieving function depends on the
mesh size of the gel. Polyacrylamide gels are synthesized by the polymerization of
acrylamide monomers into long chains and the reaction of these chains with
bifunctional compounds such as N,N'-methylenebisacrylamide (bis) to form a sieve-like
structure. The mesh size of the gel is determined by the concentration of
acrylamide and bisacrylamide (%T and %C).
%T = concentration of total monomer (acrylamide plus cross-linker)
%C = concentration of cross-linker (as a percentage of the total monomer)
The higher the concentration of monomer (%T), the smaller the mesh size of the gel
[63].
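As a worked illustration of the two definitions above, the following minimal sketch computes %T and %C for a gel recipe. It assumes the common convention that %T is expressed as grams of total monomer per 100 mL of gel solution; the function name and the example recipe are hypothetical and are not taken from this work.

```python
def gel_composition(acrylamide_g, bis_g, volume_ml):
    """Return (%T, %C) for a polyacrylamide gel recipe.

    Assumption: %T is a weight/volume percentage (g per 100 mL).
    """
    total_monomer_g = acrylamide_g + bis_g
    percent_t = 100.0 * total_monomer_g / volume_ml   # total monomer, % w/v
    percent_c = 100.0 * bis_g / total_monomer_g       # cross-linker share of monomer
    return percent_t, percent_c


# Hypothetical recipe: 11.7 g acrylamide and 0.3 g bis in 100 mL.
t, c = gel_composition(acrylamide_g=11.7, bis_g=0.3, volume_ml=100.0)
print(f"%T = {t:.1f}, %C = {c:.1f}")
```

Raising the acrylamide amount at fixed volume raises %T, which according to the text corresponds to a smaller mesh size.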
Gel electrophoresis is performed under either continuous or discontinuous buffer
conditions. The running buffer and gel buffer are the same in the continuous buffer
system, whereas the discontinuous buffer system has different gel and running buffers.
The discontinuous gel system contains two gel layers, the stacking and separating layers.
Electrophoresis with a discontinuous buffer system provides sample concentration and
higher resolution. SDS-PAGE is performed under denaturing conditions, where the
detergent denatures and unfolds the protein by wrapping around the peptide backbone of
the protein. SDS binds to the protein at a ratio of approximately 1.4 g of SDS per gram
of protein. The highly negatively charged SDS-protein complexes are separated on the
gel based on their molecular weight rather than their intrinsic charge, as each protein
acquires a net negative charge that is proportional to its length. The electrophoretic
mobility of the proteins through the gel decreases approximately linearly with the
logarithm of the protein molecular weight [64].
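The relationship between mobility and the logarithm of molecular weight is what allows a gel to be calibrated with marker proteins. The sketch below fits log10(MW) against relative mobility (Rf) by ordinary least squares and uses the fit to estimate the size of an unknown band. The marker values are invented for illustration, and the linear model is an approximation that holds only over the resolving range of a given gel.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical marker proteins: (relative mobility Rf, molecular weight in Da).
STANDARDS = [(0.20, 100_000), (0.40, 50_000), (0.60, 25_000), (0.80, 12_500)]

# Calibrate log10(MW) against Rf; the slope is negative because larger
# proteins migrate more slowly through the gel.
SLOPE, INTERCEPT = fit_line(
    [rf for rf, _ in STANDARDS],
    [math.log10(mw) for _, mw in STANDARDS],
)


def estimate_mw(rf):
    """Estimate the molecular weight (Da) of a band from its relative mobility."""
    return 10 ** (SLOPE * rf + INTERCEPT)


print(f"Estimated MW at Rf 0.50: {estimate_mw(0.50):.0f} Da")
```

A glycoprotein that binds less SDS than the markers would migrate anomalously, so an estimate from such a calibration should be treated with caution.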
Prefractionation of samples is required in proteomics, and gel electrophoresis is
a versatile and reliable method to achieve such prefractionation. The discontinuous
buffer system is frequently used as it provides higher protein resolution compared to the
continuous buffer system. The discontinuous buffer system offers the ability to
manipulate buffer systems to achieve “steady-state stacking” or “isotachophoresis”,
which is responsible for focusing of the proteins before their separation by PAGE.
Though the separation of proteins in SDS-PAGE is primarily based on molecular
weight, the molecular weight range that can be preferentially resolved depends upon
the gel composition, the buffer system used and the pH of the buffer system. The presence
of a post-translational modification, such as glycosylation, on the protein results in
anomalous migration of the glycoprotein on SDS-PAGE. This anomalous
electrophoretic migration of glycoproteins, resulting in inaccurate molecular weight
determination, is due to little or no binding of SDS to the sugar moieties.
1.2.4 Separation Techniques
Peptide mass spectrometry (shotgun proteomics) identifies proteins by
measuring mass-to-charge ratios of peptides and their fragments in the MS spectra. In
order to perform unambiguous identification of proteins and to achieve deep proteome
coverage, mass spectrometry is highly dependent on separation to reduce the complexity
of the samples prior to their analysis. This facilitates the identification of
low-abundance species that would otherwise be overshadowed by the high-abundance species,
i.e., it increases the dynamic range.
1.2.4.1 High Pressure Liquid Chromatography
High-pressure liquid chromatography (HPLC) is often directly coupled to mass
spectrometric instruments with an electrospray ionization (ESI) source. The continuous
separation of analytes using HPLC is physically compatible with an electrospray
ionization source. Due to the efficient coupling of HPLC and the ESI source, the
combination has become a standard sample introduction setup for peptide analysis. The most
commonly used chromatographic materials for separation of analytes are: ion
exchange (IEX), reverse phase, hydrophilic interaction chromatography (HILIC),
affinity, and hybrid materials.
Reverse phase liquid chromatography (RPLC or RP) separates analytes based
on their hydrophobicity, and a significant advantage of RPLC, when coupled with a
mass spectrometer, is that the buffers used are generally compatible with ESI. The use
of acidic pH and organic solvents (acetonitrile and methanol) is conducive to the
analysis of peptides by ESI-MS. Due to its high resolution, efficiency, reproducibility,
and mobile phase compatibility with ESI-MS, RPLC has emerged as a preferred
separation phase for the analysis of proteins and peptides. Over the years, significant
efforts have been made to increase peak capacity, sensitivity, reproducibility, and
analysis speed of reverse phase chromatography. It has been observed that packing
long, narrow capillary RP columns results in significant improvement in loading
capacity, sensitivity, and dynamic range of RPLC. Shen et al. have reported the use of
50 µm i.d., 40-200 cm long, small-particle-size (1.4 µm) RPLC columns with high
peak capacity (1000-1500, compared with an average of 400) operated in an ultrahigh
pressure regime (20 kpsi) for proteomic and metabolomic analysis [65]. The use of a
small diameter particle stationary phase (1.7 µm diameter) contributes significantly
towards the efficiency of the separation. The efficiency is inversely proportional to
the size of the particles used for packing the column. However, columns packed with
small diameter particles exhibit high back pressure, so high-pressure pumps (up to
15,000 psi) are required for their operation [66].
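The trade-off between efficiency and back pressure can be made concrete with two textbook scaling relations. The sketch below is an approximation under stated assumptions, not a description of the columns used in this work: the plate number is taken as N = L / (h · dp) with an assumed reduced plate height h of 2, and the pressure drop at fixed column length and linear velocity is taken to scale as 1 / dp². The column length and particle sizes are hypothetical.

```python
def plate_number(column_length_m, particle_diameter_m, reduced_plate_height=2.0):
    """Approximate plate count N = L / (h * dp) for a well-packed column."""
    return column_length_m / (reduced_plate_height * particle_diameter_m)


def relative_backpressure(particle_diameter_m, reference_diameter_m):
    """Pressure drop relative to a reference particle size (dP scales as 1/dp**2)."""
    return (reference_diameter_m / particle_diameter_m) ** 2


LENGTH_M = 0.5  # hypothetical 50 cm column

for dp_um in (5.0, 3.0, 1.7):
    dp_m = dp_um * 1e-6
    n = plate_number(LENGTH_M, dp_m)
    pressure_factor = relative_backpressure(dp_m, 5.0e-6)
    print(f"dp = {dp_um} um: N ~ {n:,.0f}, back pressure x{pressure_factor:.1f} vs 5 um")
```

Under these assumptions, the plate count grows linearly as the particle size shrinks while the pressure requirement grows quadratically, which is why small particles call for ultrahigh pressure pumps.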
Multidimensional separation is a common way to increase the peak capacity of
chromatographic analysis. This approach combines several separation techniques, such
as ion exchange, high pH reverse phase separation, low pH reverse phase separation
and so forth, to improve the resolving power. For effective performance of
multidimensional separation, the individual separation methods should be as
orthogonal as possible to one another, i.e., each dimension should utilize different
molecular properties as the basis of separation. One of the first and most practiced
two-dimensional setups is the combination of strong cation exchange (SCX) chromatography
with reverse phase chromatography, known as multidimensional protein identification
technology (MudPIT [67]). In this multidimensional separation, a highly complex
peptide mixture is loaded onto an SCX column and eluted in a series of steps with
increasing salt concentration. Each fraction is transferred onto an RP column either
off-line or directly, and peptides are further separated and eluted into the MS.
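The benefit of combining orthogonal dimensions can be sketched with the product rule, under which the ideal peak capacity of a two-dimensional separation is the product of the peak capacities of the individual dimensions. This is an upper bound that assumes fully orthogonal dimensions and no losses between them; real SCX-RP systems fall below it. The number of salt steps is a hypothetical value, and treating each salt step as one unit of first-dimension capacity is a simplification; the RP value of 400 is the average quoted earlier in this section.

```python
def ideal_2d_peak_capacity(first_dimension, second_dimension):
    """Upper-bound peak capacity for two fully orthogonal dimensions."""
    return first_dimension * second_dimension


scx_salt_steps = 10      # hypothetical number of SCX elution steps
rp_peak_capacity = 400   # average RPLC peak capacity quoted in the text

total = ideal_2d_peak_capacity(scx_salt_steps, rp_peak_capacity)
print(f"Ideal 2D peak capacity: {total}")
```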
1.2.5 Mass Spectrometry
A mass spectrometer usually comprises three parts: an ion source and optics, a mass
analyzer, and data processing software.
1.2.5.1 Ionization Methods
A rapid growth in mass spectrometry based proteomic analysis can be
attributed to major contributions of experimental methods, instrumentation and data
analysis. Among the most important developments in mass spectrometry related
instrumentation is the invention of soft ionization methods, i.e., matrix-assisted laser
desorption ionization (MALDI) and electrospray ionization (ESI), which allow peptides
and proteins to be directly analyzed by MS.
MALDI
MALDI functions just as its name suggests: the matrix assists in desorption
and ionization of the analyte. In this type of ionization technique, the incident laser
energy is absorbed by the matrix and transferred to the acidified analyte. The rapid laser
heating results in desorption of matrix and positively charged analyte into the gas phase.
Singly charged ions are predominantly generated by MALDI, which makes it
applicable for top-down analysis of high-molecular weight proteins [68]. However,
low shot-to-shot reproducibility and strong dependence on sample preparation are the
drawbacks of this technique. MALDI-TOFMS is suitable for high-throughput analysis.
However, the high ionization energy can be detrimental in the analysis of
compounds with labile modifications [69].
Figure 1.4 Common matrices used in MALDI mass spectrometry. Reprinted from
reference [69].
ESI
ESI, unlike MALDI, generates ions from solution. Electrospray is
created by application of a high voltage between the emitter end of the separation
column and the inlet of the mass spectrometer [68]. The physicochemical processes of ESI
involve formation of a Taylor cone, from which an electrically charged spray of the liquid
eluting from the separation column is emitted, followed by generation and desolvation of
eluent droplets.
The unique feature of ESI compared to other ionization methods is its ability to
produce multiply charged ions from high molecular weight biological molecules like
proteins, which enables the analysis of these molecules with instruments having a small
mass-to-charge range (400-2000 m/z). An important development in the ESI
technique, which has led to sensitive proteomic analysis, is known as nano-ESI. In
Chapters 2 and 3 of this dissertation, nano-ESI, operated at 20 nL/min, is the primary
technique used for the analysis of 10,000 laser capture microdissected breast cancer
cells. A diagram of the ESI process is discussed in the PLOT-related section.
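To illustrate why multiple charging brings large molecules into a limited m/z window, the sketch below computes the m/z of protonated ions, (M + z·1.007276) / z, and lists the charge states that fall within 400-2000 m/z. The neutral mass of 15,000 Da is a hypothetical value chosen for illustration, and only protonation (no other adducts) is assumed.

```python
PROTON_MASS = 1.007276  # Da


def mz(neutral_mass, charge):
    """m/z of an [M + zH] ion carrying `charge` protons."""
    return (neutral_mass + charge * PROTON_MASS) / charge


def charge_states_in_window(neutral_mass, low=400.0, high=2000.0, max_charge=100):
    """Charge states whose m/z falls inside the instrument window."""
    return [z for z in range(1, max_charge + 1) if low <= mz(neutral_mass, z) <= high]


protein_mass = 15_000.0  # hypothetical intact protein mass in Da
states = charge_states_in_window(protein_mass)
print(f"Charge states within 400-2000 m/z: {states[0]}+ to {states[-1]}+")
print(f"m/z at 9+: {mz(protein_mass, 9):.2f}")
```

A singly charged ion of the same protein would appear near m/z 15,001, far outside such a window.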
1.2.5.2 Mass Analyzers
Ion Trap
As the name suggests, an ion-trap mass spectrometer works by trapping ions
in a vacuum. The ion trap functions by repeating the steps of ion collection, ion
storage and ejection of ions from the ion trap as flow from the LC column occurs. The
unique feature of the ion trap lies in its ability to isolate and fragment peptide ions from
complex mixtures; this operation is called tandem MS. Due to their fast scan rates,
MSn scans, high sensitivity, high duty cycle, high ion storage capacity (compared to
3D traps), reasonable resolution and mass accuracy, linear ion traps (e.g. LTQ,
Thermo Fisher) are considered the high-throughput workhorses in proteomic
research. Therefore, for our initial development work, as described in Chapters 2 and
3, we employed LTQ-MS for bottom-up 10 µm i.d. Porous Layer Open Tubular
(PLOT) LC-MS analysis of 10,000 LCM cells. Furthermore, the LTQ is coupled with
Orbitrap and FTICR analyzers as the front end of hybrid MS instruments to perform ion
trapping, ion selection and high resolution ion analysis.
Mass spectrometry has been extensively used for determination of the molecular masses
of intact proteins. Among the mass spectrometric techniques, ESI coupled with high mass
accuracy MS is preferred, as ESI-generated multiply charged ions fall in the m/z range
of most mass spectrometers. A variety of mass spectrometers can be used for this
purpose, including ion trap (IT), orthogonal time-of-flight, time-of-flight, Fourier
transform ion cyclotron resonance (FTICR) and Orbitrap instruments. However, mass
spectrometers such as ion traps are not well suited for this purpose due to their low
resolving power at full scan speed. In contrast, mass spectrometers such as TOF,
FTICR and Orbitrap, due to their high mass resolution and high mass accuracy, have
become the preferred instruments for accurate mass determination of intact proteins.
Quadrupole-Time of Flight Mass Spectrometer
Time-of-flight mass spectrometry (TOFMS) determines the mass-to-charge
ratio of the ions using a time measurement. Ions are accelerated in the flight tube by an
electric field. This acceleration provides the same kinetic energy to all the ions bearing
the same charge. The velocity gained by the ion due to acceleration depends on the
mass-to-charge ratio. Then, the time that an ion takes to travel to the detector is
measured. Heavier ions take a longer time to reach the detector than
lighter ones. Based on the flight time of the ion and the known experimental
parameters, the mass-to-charge ratio of the ion can be determined.
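The relationship between flight time and mass-to-charge ratio follows from equating the energy gained in the accelerating field with the kinetic energy of the ion, z·e·U = m·v²/2, so that t = L·sqrt(m / (2·z·e·U)). The sketch below is an idealized linear TOF calculation with a single acceleration step and a field-free drift tube; the acceleration voltage and flight length are hypothetical values, and effects such as reflectrons or initial energy spread are ignored.

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19   # C
ATOMIC_MASS_UNIT = 1.66053906660e-27  # kg


def flight_time_s(mz, accel_voltage_v, flight_length_m):
    """Flight time of an ion through a field-free drift region.

    From z*e*U = m*v**2 / 2 it follows that v = sqrt(2*z*e*U / m) and t = L / v.
    """
    mass_per_charge = mz * ATOMIC_MASS_UNIT / ELEMENTARY_CHARGE  # kg per coulomb
    return flight_length_m * math.sqrt(mass_per_charge / (2.0 * accel_voltage_v))


def mz_from_flight_time(time_s, accel_voltage_v, flight_length_m):
    """Invert the relation above to recover m/z from a measured flight time."""
    mass_per_charge = 2.0 * accel_voltage_v * (time_s / flight_length_m) ** 2
    return mass_per_charge * ELEMENTARY_CHARGE / ATOMIC_MASS_UNIT


VOLTAGE_V = 20_000.0  # hypothetical acceleration voltage
LENGTH_M = 1.0        # hypothetical flight tube length

for value in (500.0, 1000.0, 2000.0):
    t_us = flight_time_s(value, VOLTAGE_V, LENGTH_M) * 1e6
    print(f"m/z {value:.0f}: flight time {t_us:.2f} microseconds")
```

Because the flight time scales with the square root of m/z, quadrupling m/z doubles the flight time.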
Fourier Transform Ion Cyclotron Resonance (FTICR)
The FTICR mass analyzer determines the mass-to-charge ratio of ions based on
their cyclotron frequency under the influence of a constant magnetic field. In the ICR
mass analyzer, the ions are stored in a Penning trap under the influence of constant
magnetic and electric fields. The ions are excited to a larger cyclotron radius by an
oscillating electric field perpendicular to the magnetic field. The energy applied to the
ions in the ICR cell can be tuned to excite, dissociate and eject ions. The detector plates
on opposite sides of the trap measure the cyclotron frequencies of all the ions
simultaneously, and a Fourier transform converts these frequencies into
m/z values (Figure 1.5). FTICR is a very high mass resolution technique contributing
to accurate mass measurement [70]. The high mass resolution and high mass accuracy
of FTICR are due to the following reasons. 1) The mass of the ion is calculated from the
measurement of cyclotron frequency, a parameter that is more precisely measurable
than any other parameter. 2) The ion cyclotron frequency is defined by the magnetic
field. The better time stability of the magnetic field (1 ppb/hour) compared to the time
stability of an rf voltage (100 ppb/hour) results in superior mass precision. 3) In a
spatially uniform magnetic field, the cyclotron frequency of an ion is independent of
the ion speed. 4) In order to attain high mass precision, ICR, unlike ion-beam-based
mass measurement, does not require the use of narrow slits [71].
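The dependence of cyclotron frequency on mass-to-charge ratio can be illustrated with the unperturbed cyclotron equation, f = z·e·B / (2π·m). The sketch below ignores the small frequency shift caused by the trapping electric field, and the 7 T field strength is a hypothetical value rather than the magnet used in this work.

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19   # C
ATOMIC_MASS_UNIT = 1.66053906660e-27  # kg


def cyclotron_frequency_hz(mz, magnetic_field_t):
    """Unperturbed cyclotron frequency f = e*B / (2*pi * (m/z) * u)."""
    return ELEMENTARY_CHARGE * magnetic_field_t / (
        2.0 * math.pi * mz * ATOMIC_MASS_UNIT
    )


def mz_from_frequency(frequency_hz, magnetic_field_t):
    """Recover m/z from a measured cyclotron frequency."""
    return ELEMENTARY_CHARGE * magnetic_field_t / (
        2.0 * math.pi * frequency_hz * ATOMIC_MASS_UNIT
    )


FIELD_T = 7.0  # hypothetical magnet strength in tesla

for value in (500.0, 1000.0, 2000.0):
    f_khz = cyclotron_frequency_hz(value, FIELD_T) / 1e3
    print(f"m/z {value:.0f}: cyclotron frequency {f_khz:.1f} kHz")
```

The ion velocity does not appear in the expression, which reflects point 3 above: in a uniform magnetic field the frequency is independent of ion speed.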
Figure 1.5 Operational principle of the FTICR. Reprinted from reference [69].
Among the many applications of FTICR, its high resolving power
is useful for the study of large macromolecules such as proteins carrying
multiple charges generated by electrospray ionization. The FTICR instrument provides
mass resolution in the range of 50,000-750,000 and mass accuracy of less than 2 ppm.
However, FTICR suffers from relatively slow acquisition speed and low sensitivity
of analysis. In order to obtain high sensitivity and improved acquisition time, we
acquired MS scans over a limited mass window, corresponding to the m/z values of the 9+
charge state of intact alpha-human chorionic gonadotropin (Chapter 4).
Orbitrap
In 1999, Makarov invented a new type of mass analyzer called the Orbitrap [72],
which was applied to proteomic research in 2005 [73]. Among the high mass
resolution FTMS instruments, the Orbitrap superseded the FT-ICR due to its low cost of
operation, while providing equivalent high mass accuracy. The Orbitrap consists of two
electrodes, an outer barrel-like electrode and a coaxial inner spindle-like electrode, with
an electrostatic field formed between them (Figure 1.6). The ions are tangentially injected
into the gap between the two electrodes and made to rotate around the inner electrode
by the electrostatic attraction of the inner electrode and the balancing centrifugal
forces. While cycling around the central axis, the ions move back and forth along the
central axis. The signal from these harmonic oscillations is Fourier transformed to
obtain their frequencies, which determine the mass-to-charge ratio of the ions. The
Orbitrap offers a high resolving power of roughly 50,000 and a mass accuracy of less
than 2 ppm, with proper standards. With an average acquisition speed of at least 6 MS/MS
spectra per second in parallel with a single high-resolution spectrum (60,000 resolution),
significantly improved protein coverage can be achieved.
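The figures of merit quoted for the FTICR and Orbitrap analyzers can be related to measured values with two simple definitions: mass accuracy in parts per million, (measured - theoretical) / theoretical × 10^6, and resolving power R = m / Δm, where Δm is taken here as the peak width at half maximum. The m/z values in this sketch are hypothetical.

```python
def ppm_error(measured_mz, theoretical_mz):
    """Mass measurement error in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def peak_width_at(mz, resolving_power):
    """Peak width (delta m) implied by a resolving power R = m / delta_m."""
    return mz / resolving_power


# Hypothetical measurement of an ion with a theoretical m/z of 1000.0000.
error = ppm_error(1000.0015, 1000.0000)
print(f"Mass error: {error:.2f} ppm")

# Peak width at m/z 1200 for the 60,000 resolution quoted in the text.
width = peak_width_at(1200.0, 60_000)
print(f"Peak width at m/z 1200: {width:.3f}")
```

For a 9+ ion the isotope peaks are spaced roughly 1/9 ≈ 0.11 m/z apart, so a peak width of this order is narrow enough to separate them.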
Figure 1.6 Cutaway view of the Orbitrap mass analyzer. Ions are injected into the
Orbitrap at a point (arrow) offset from its equator and perpendicular to the z-axis,
where they begin coherent axial oscillations without the need for any further
excitation. Reprinted from reference [69].
1.2.5.3 Database Searching Tools for Proteomics
Database searching plays an important role in large-scale proteomics. Database
searching tools enable the use of mass spectrometric data of peptides to identify
proteins in sequence databases. Two mass spectrometry-based database search
principles are mainly used for identification of proteins. The first method uses the
molecular weight fingerprint of the protein digest (peptides) obtained by a site-specific
protease [74,75], and the second method uses the tandem mass spectra obtained on the
individual peptides of a digested protein[16,76]. Since each tandem mass spectrum
stands as a unique and verifiable piece of data, the second method has the ability to
identify a wide range of proteins and thus provide a comprehensive approach to
handle complex protein mixtures[77].
Tandem Mass-Spectrometry and Data Processing
Figure 1.7 Low energy collision induced dissociation of peptide. Reprinted from
reference [78]
In tandem mass spectrometry (MS/MS), the gas-phase peptide ions undergo
fragmentation due to processes such as collision-induced dissociation (CID). Gas-phase
CID is the most widely used technique in tandem mass spectrometry. The
dissociation pathways are exclusively dependent on the collision energy. The low
energy collisions (
Figure 1.8 Mobile Proton Theory. Reprinted from reference [84].
To explain the intensity patterns observed in the tandem mass spectra, a mobile proton
model has been proposed[83]. The mobile proton model states that to initiate backbone
cleavages for production of b and y ions, the protons are transferred intramolecularly
from basic side-chains to the heteroatoms along the backbone. Figure 1.8A shows that
the proton exists in equilibrium between all possible basic sites. The energy required
to mobilize the proton from a basic side-chain or from the amino terminus to the
peptide backbone depends on the amino acid composition of the peptide. Therefore,
the dissociation or the fragmentation energy for the peptides containing amino acids
having greater gas-phase basicity is higher compared to the peptides with amino acids
having lower gas-phase basicity. An example of a lysine-terminated peptide is shown
in Figure 1.8B.
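The b and y ions produced by backbone cleavage can be predicted from the peptide sequence alone. The sketch below computes singly charged, monoisotopic b- and y-ion m/z values for an unmodified peptide: a b ion is the sum of the N-terminal residue masses plus one proton, and a y ion is the sum of the C-terminal residue masses plus water and one proton. The peptide "SAMPLEK" is a hypothetical lysine-terminated example, and modifications, neutral losses and higher charge states are not considered.

```python
# Monoisotopic residue masses of the 20 common amino acids (Da).
MONOISOTOPIC_RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565  # Da
PROTON = 1.007276  # Da


def b_and_y_ions(peptide):
    """Return ([b1..b(n-1)], [y1..y(n-1)]) as singly charged m/z values."""
    masses = [MONOISOTOPIC_RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    running = 0.0
    for mass in masses[:-1]:            # N-terminal fragments
        running += mass
        b_ions.append(running + PROTON)
    running = 0.0
    for mass in reversed(masses[1:]):   # C-terminal fragments
        running += mass
        y_ions.append(running + WATER + PROTON)
    return b_ions, y_ions


b_series, y_series = b_and_y_ions("SAMPLEK")  # hypothetical peptide
for index, (b, y) in enumerate(zip(b_series, y_series), start=1):
    print(f"b{index} = {b:9.4f}   y{index} = {y:9.4f}")
```

A useful consistency check is that each complementary pair, b(i) and y(n-i), sums to the mass of the intact peptide plus two protons.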
SEQUEST- Database Search Algorithm
Given the mass of the precursor ion (m/z of the peptide ion) and its fragment
ions, the goal of the database search algorithm is to determine peptide sequence and
protein identity. SEQUEST [16] is a database search program which uses a descriptive
model for peptide fragmentation and correlative matching to a tandem mass
spectrum. To assess the quality of the match between the experimental spectrum
and an amino acid sequence from the database, SEQUEST applies a two-tiered scoring
scheme. It first calculates the empirically derived preliminary score (Sp) that restricts
the number of sequences to be analyzed in the correlation analysis. Sp is calculated by
summing the peak intensities of fragment ions as well as accounting for continuity of
the fragment ion series and the length of the amino acid sequence. The second and
decisive score is a cross-correlation score, referred to as XCorr, which correlates the
experimental and theoretical spectra. The theoretical spectrum is generated from the
predicted fragmentations, i.e. b- and y-ions, for each of the sequences in the database.
The similarity between the theoretical and experimental spectra is evaluated based on
the cross-correlation of the two spectra. Apart from the preliminary and cross-correlation
scores, SEQUEST calculates another important value, ΔCn, the normalized
difference of XCorr values between the best matched sequence and each of the other
sequences. ΔCn is a useful indicator of the uniqueness of the match. If the value of
ΔCn is greater than 0.1, then the match is considered reasonably unique to a
sequence. XCorr, which is not dependent on the database size, indicates the quality of
the match between the spectrum and sequence, whereas ΔCn, which is dependent on
the size of the database, indicates the quality of the match relative to near misses.
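The ΔCn value described above can be sketched as the difference between the best XCorr and each other candidate's XCorr, divided by the best XCorr. The code below is an illustration of that definition only and is not the SEQUEST implementation; the XCorr values are invented, and the 0.1 threshold is the one quoted in the text.

```python
def delta_cn(xcorr_values):
    """Normalized XCorr difference of each candidate relative to the best match.

    Candidates are ranked by XCorr, so the first entry is always 0.0.
    """
    ranked = sorted(xcorr_values, reverse=True)
    best = ranked[0]
    return [(best - score) / best for score in ranked]


# Hypothetical XCorr values for candidate sequences matched to one spectrum.
scores = [2.4, 3.2, 1.6]
deltas = delta_cn(scores)

# Uniqueness of the top hit is judged against the second-ranked candidate.
is_unique = deltas[1] > 0.1
print(f"Delta Cn values: {[round(d, 3) for d in deltas]}, unique: {is_unique}")
```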
Label Free Quantitative Microproteomics
Currently, a number of stable isotope labeling approaches are in use for
“shotgun” quantitative proteomic analysis. The stable isotope labeling approaches
include Isotope-Coded Affinity Tag (ICAT), Stable Isotope Labeling by Amino Acids
in Cell Culture (SILAC), ¹⁵N/¹⁴N metabolic labeling, ¹⁸O/¹⁶O enzymatic labeling,
Isotope Coded Protein Labeling (ICPL), Tandem Mass Tags (TMT), Isobaric Tags for
Relative and Absolute Quantification (iTRAQ) and other chemical labeling [85,86].
These stable isotope labeling methods have offered valuable flexibility while using
quantitative proteomic methods to study protein abundance changes in complex
samples. However, most labeling based quantification methods are limited in their
application due to increased time and complexity of sample preparation, the
requirement of higher sample concentration, high cost of the reagents and incomplete
labeling. Therefore, for relative quantitation of small sample amounts, there is
increased interest in label-free approaches in order to achieve more sensitive and
simpler quantification results.
Label-free protein quantitation is generally based on two approaches. The first
involves the measurement of ion intensity changes such as peptide peak areas or peak
heights in chromatography (i.e. total or single ion analysis). The second approach is
based on spectral counting of the identified peptides after MS/MS analysis. Peptide
peak intensity and spectral counting are measured for individual LC-MS/MS runs, and
changes in protein abundance are determined by direct comparison between different
analyses.
Relative Quantitation by Peak Intensity
In this approach, relative quantitation of the peptides is achieved by direct
comparison of the peak area of each peptide ion across multiple LC-MS datasets. However,
application of this method to determine protein abundance changes in complex
biological samples has some practical limitations. Differences in sample
preparation and sample injection, in addition to experimental drift in retention time
and m/z value, significantly affect the direct and accurate comparison of multiple
LC-MS datasets. Therefore, highly reproducible LC-MS performance and careful
chromatographic peak alignment are critical for the quantitation approach[87].
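The dependence of peak-intensity quantitation on retention time and m/z reproducibility can be made concrete with a small sketch. It is only an illustration, not the alignment procedure of any particular software: features are assumed to be (m/z, retention time in minutes, peak area) tuples, and the 10 ppm and 0.5 min tolerances are assumed values chosen for the example.

```python
def match_features(run_a, run_b, mz_tol_ppm=10.0, rt_tol_min=0.5):
    """Pair each feature of run_a with the unused run_b feature that is
    closest in retention time while inside both tolerance windows."""
    pairs, used = [], set()
    for fa in run_a:
        best = None  # (index in run_b, retention time difference)
        for j, fb in enumerate(run_b):
            if j in used:
                continue
            ppm = abs(fa[0] - fb[0]) / fa[0] * 1e6
            drt = abs(fa[1] - fb[1])
            if ppm <= mz_tol_ppm and drt <= rt_tol_min:
                if best is None or drt < best[1]:
                    best = (j, drt)
        if best is not None:
            used.add(best[0])
            pairs.append((fa, run_b[best[0]]))
    return pairs


def area_ratios(pairs):
    """Relative abundance (run_b / run_a) for each matched peptide ion."""
    return [fb[2] / fa[2] for fa, fb in pairs]


# Hypothetical features: (m/z, retention time in min, peak area)
run_a = [(500.25, 20.0, 1000.0), (650.30, 35.0, 400.0)]
run_b = [(500.251, 20.2, 2000.0), (700.00, 35.0, 100.0)]
pairs = match_features(run_a, run_b)
ratios = area_ratios(pairs)
```

Only the first peptide ion is matched in this example; the second falls outside the m/z window and is left unquantified, which shows how drift between runs directly reduces the number of comparable features.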
Relative Quantitation by Spectral Count
In the spectral counting approach, the numbers of identified
MS/MS spectra from the same protein (spectral counts) are compared between
multiple LC-MS/MS datasets. The protein sequence coverage, the number
of identified unique peptides and the number of identified total MS/MS spectra
(spectral count) all increase with increasing protein abundance. However, of
these three factors of identification, only spectral count showed strong linear
correlation with relative protein abundance with a dynamic range over 2 orders of
magnitude. Therefore spectral counting is considered a simple and reliable index
for relative protein quantification[88]. In comparison to peak intensity, which uses
computer algorithms for automatic LC-MS peak selection, alignment and comparison,
the spectral counting approach is much easier to implement.
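The simplicity of spectral counting can be seen in a minimal sketch. It assumes that each identified MS/MS spectrum is available as a (protein, peptide) pair; the protein and peptide labels are hypothetical, and the pseudocount used to avoid division by zero is an assumption of this example rather than part of the method described above.

```python
from collections import Counter


def spectral_counts(psms):
    """Count identified MS/MS spectra per protein for one LC-MS/MS run."""
    return Counter(protein for protein, _peptide in psms)


def count_ratio(counts_a, counts_b, protein, pseudocount=0.5):
    """Ratio of spectral counts (run B / run A) for one protein.
    The pseudocount is an assumed guard against zero counts."""
    return (counts_b[protein] + pseudocount) / (counts_a[protein] + pseudocount)


# Hypothetical identifications from two runs
run_a = [("P1", "AAK"), ("P1", "CCR"), ("P2", "DDK")]
run_b = [("P1", "AAK"), ("P2", "DDK"), ("P2", "EEK"), ("P2", "DDK")]
counts_a = spectral_counts(run_a)
counts_b = spectral_counts(run_b)
ratio_p2 = count_ratio(counts_a, counts_b, "P2")
```

Repeated identifications of the same peptide are counted each time, since the spectral count is the total number of MS/MS spectra assigned to the protein.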
However, for accurate and reliable detection of protein changes in complex
mixtures, normalization and statistical analysis of spectral counting datasets is
necessary. One of the simple normalization methods, which accounts for the run-to-run
variability, uses total spectral counts[89]. Another approach to normalization
involving calculation of a normalized spectral abundance factor (NSAF) was
suggested to account for the effect of protein length on spectral count[90].
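Both normalization schemes can be sketched briefly. The example assumes the commonly used NSAF form, in which the spectral count of each protein is divided by its length and the result is divided by the sum of these length-corrected values over all proteins in the run; the protein names, counts and lengths are hypothetical.

```python
def normalize_by_total(counts):
    """Divide each spectral count by the total count of the run."""
    total = sum(counts.values())
    return {protein: c / total for protein, c in counts.items()}


def nsaf(counts, lengths):
    """Normalized spectral abundance factor (assumed form):
    (SpC / L) for one protein divided by the sum of (SpC / L) over the run."""
    saf = {protein: counts[protein] / lengths[protein] for protein in counts}
    total = sum(saf.values())
    return {protein: value / total for protein, value in saf.items()}


# Hypothetical run: equal spectral counts, different protein lengths
counts = {"P1": 10, "P2": 10}
lengths = {"P1": 100, "P2": 400}  # number of amino acids
by_total = normalize_by_total(counts)
by_nsaf = nsaf(counts, lengths)
```

With equal counts, total-count normalization treats the two proteins as equally abundant, whereas NSAF assigns the shorter protein the larger share because the same number of spectra arises from fewer residues.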
Zhang et al. compared five different statistical tests on spectral count data
collected by analysis of yeast digests to evaluate the significance of comparative
quantification by spectral counts[91]. These statistical tests were 1) Fisher's exact test,
2) goodness-of-fit test (G-test), 3) AC test, 4) Student's t-test and 5) Local-Pooled-error
(LPE) test. For datasets with three or more replicates, the Student's t-test was found to
be the best, whereas, in case of datasets with one or two replicates, the Fisher's exact
test, G-test and AC test can be used.
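For the single-replicate case, Fisher's exact test can be sketched with the standard library alone. The sketch assumes a 2 x 2 table whose rows are the two samples and whose columns are the spectral count of the protein of interest and the spectral count of all other proteins; this table layout and the function name are assumptions of the example and are not taken from [91]. The two-sided p-value sums the probabilities of all tables that are no more likely than the observed one.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2 x 2 table [[a, b], [c, d]].
    a, c: spectral counts of the protein in samples 1 and 2
    b, d: spectral counts of all other proteins in samples 1 and 2"""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of observing x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating point ties
    return sum(p for p in (prob(x) for x in range(lo, hi + 1))
               if p <= p_obs * (1 + 1e-9))


# Small check table with a known result of 34/70
p_small = fisher_exact_two_sided(3, 1, 1, 3)
# Hypothetical protein: 30 of 5,000 spectra versus 10 of 5,000 spectra
p_protein = fisher_exact_two_sided(30, 4970, 10, 4990)
```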
Relative quantitation by spectral count has been successfully applied for
different clinical applications[92], including analysis of normal and acute
inflammation, biomarker discovery in human saliva proteome in type-2 diabetes[93],
comparison of protein expression in mammalian and yeast cells under different culture
conditions, distinguishing normal and diseased lung cancer samples[94,95], discovery
of phosphotyrosine-binding proteins in mammalian cells and identification of
differential plasma membrane proteins in terminally differentiated mouse cell
lines[95].
Another label-free method, the spectral index, is used to analyze relative protein
abundances in large-scale data sets obtained from biological samples by shotgun
proteomics. The spectral index combines two
biochemically plausible features, i.e. 1) spectral counts (indicative of relative protein
abundance) and 2) the number of samples within a group with detectable peptides [96].
We used this method to assess differentially abundant proteins between 9 non-
cancerous, normal breast epithelial (NBE) samples and 9 estrogen receptor (ER)-
positive (luminal subtype), invasive malignant breast epithelial (MBE) samples [97].
However, for a low number of replicates of breast cancer samples (n=3), we used
spectral counting (PatternLab software [98]) for determination of differentially
abundant proteins between invasive breast cancer cells and metastatic breast cancer
cells (Chapter 2).
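One way in which the two features of the spectral index could be combined is sketched below. The formula is an assumption made for illustration: each group is weighted by its mean spectral count multiplied by the fraction of samples in which the protein was detected, and the index is the normalized difference between the two groups, so that it ranges from -1 to +1. The exact definition should be taken from [96]; the group data shown are hypothetical.

```python
def spectral_index(group_a, group_b):
    """Assumed spectral index for one protein.
    group_a, group_b: spectral counts of the protein in each sample of a group.
    Returns a value from -1 (only in group_b) to +1 (only in group_a)."""
    def weighted(counts):
        detected = sum(1 for c in counts if c > 0)
        mean_count = sum(counts) / len(counts)
        return mean_count * detected / len(counts)

    wa, wb = weighted(group_a), weighted(group_b)
    if wa + wb == 0:
        return 0.0  # protein not observed in either group
    return (wa - wb) / (wa + wb)


# Hypothetical counts for one protein in two groups of four samples
index = spectral_index([5, 4, 6, 0], [1, 0, 0, 0])
```

A protein seen at high counts in most samples of one group and rarely in the other scores close to +1 or -1, whereas one detected equally in both groups scores near zero.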
1.3 Microproteomics
Mass spectrometry-based proteomic methods are extensively used to study
global changes in protein expression caused by pathological stimuli in an
organism. Current methods use sample total protein amounts in the range of
micrograms or milligrams [99] and extensive protein/peptide level separations in order
to achieve comprehensive proteomic analysis. However, in many cases, obtaining
these sample amounts can be practically impossible or challenging. There are several
reasons for low sample availability, e.g. the rarity of the sample itself, the
several hours or days needed to collect many thousands of cells using a technique such
as laser capture microdissection, the need to run multiple experiments on a
homogeneous sample, and so forth.
One example of such a rare/limited sample type is a brain tissue specimen
related to neurodegenerative diseases such as Parkinson's, Alzheimer's, and Huntington's
disease. These neurodegenerative diseases are characterized by selective degeneration
of particular types of neurons, while the rest of the brain tissue remains in a normal
state[100]. Researchers, trying to understand the causes behind these
diseases, are using laser capture microdissection (LCM) to selectively collect
degenerative neurons. However, obtaining even 10,000 to 50,000 neurons is
impractical because the degenerative neurons are limited in numbers[101]. Another
similar example of limited sample amount is malignant cells collected from a solid
tumor. Solid tumors are heterogeneous in composition, i.e. they are made up of
subpopulations of cancer cells, along with stromal elements that collectively form a
microenvironment[41]. The subtypes of malignant cells differ among themselves in
many properties, such as production and expression of cell surface markers, sensitivity
to therapeutics, growth rate, etc. The studies aimed at determining the proteomic
changes in these individual cell types are limited due to the time and cost required to
collect large cell numbers using the LCM procedure. The proteomic analysis of
circulating tumor cells (CTCs), which can be an indicator of potential metastasis, is
thought to provide a noninvasive way of determining tumor metastasis or the impact of
treatment on the number of CTCs[102]. As the number of CTCs circulating in the
blood is very low, advances in proteomics are required to analyze them.
To accomplish microproteomics of clinically relevant and limited amounts of
sample, one must use a minimum number of steps in the proteomic platform, and each
of these steps must limit sample losses[99]. Considerable sample losses during sample
preparation and the limited dynamic range of the LC-MS/MS system are two main obstacles
in analyzing small protein amounts. In order to improve sample preparation, low
protein binding tubes and the use of MS-friendly acid labile detergents are
suggested[99]. The use of MS-friendly detergents results in shorter extraction and
digestion procedures.
One of the recent examples of low sample proteomics was the analysis of 500-
5,000 CTCs, generating proteomic profiles of ~150-650 proteins[103]. The cells were
lysed using NP-40 detergent, and the detergent was separated by precipitating the
proteins from cell lysate. The in-solution digests of these samples were subjected to
nanoflow LC/Q-TOF analysis. In another approach to small sample amounts, a
quantitative comparison of the proteomes of LCM-collected single pancreatic islets,
containing 2,000-4,000 cells, treated with high and low levels of glucose, was carried
out. The cells were lysed with an acid labile detergent followed by in-solution digestion.
Sensitive LC-MS/MS analysis was performed using a low column flow rate and long
chromatographic separation time. In Chapters 2 and 3, we present a short-run
SDS-PAGE-based sample handling step, followed by sensitive LC-MS analysis using
a PLOT column.
1.3.1 Alternative strategies for protein digestion
1.3.1.1 Solvents based approach
In 2007, Veenstra et al. introduced a membrane protein digestion method with
60% methanol in place of chaotropes, as a membrane protein solubilizing solvent
during trypsin digestion[44]. In this approach, the plasma membrane protein
population was isolated from the human epidermis and dispersed in 50 mM
ammonium bicarbonate, pH 7.9. The proteins were reduced and alkylated using TCEP
and iodoacetamide (IAA), respectively. The membrane proteins were separated using
ultracentrifugation at 100,000 g. The protein pellet was further solubilized in 60% v/v
methanol in 50 mM ammonium bicarbonate. The proteins were digested by trypsin
(trypsin/protein ratio: 1/20) at 37 °C for 5 hours in the same solubilizing buffer. The
acidified digest was analyzed using two dimensional (SCX/RP) LC-MS.
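The reagent amounts implied by this protocol follow from simple arithmetic, sketched below. The sketch assumes that the 1/20 trypsin/protein ratio is by mass and that the 60% v/v methanol buffer is prepared by simple volume addition, ignoring any volume change on mixing; the 100 µg protein amount and 100 µL buffer volume are hypothetical.

```python
def trypsin_mass_ug(protein_ug, ratio=1 / 20):
    """Mass of trypsin (ug) for a given protein mass at an assumed w/w ratio."""
    return protein_ug * ratio


def methanol_buffer_volumes(total_ul, methanol_fraction=0.60):
    """Volumes (uL) of methanol and of aqueous 50 mM ammonium bicarbonate
    needed for a buffer of the given total volume (simple v/v addition)."""
    methanol_ul = total_ul * methanol_fraction
    return methanol_ul, total_ul - methanol_ul


# Hypothetical digest: 100 ug of membrane protein in 100 uL of buffer
trypsin = trypsin_mass_ug(100.0)
methanol_ul, aqueous_ul = methanol_buffer_volumes(100.0)
```

For this hypothetical digest the calculation gives 5 µg of trypsin, 60 µL of methanol and 40 µL of aqueous buffer.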
This strategy was found to have advantages compared to detergent- and
chaotrope-based solubilization: 1) the same methanol-based buffer conditions were
used for solubilization, denaturation and proteolysis, 2) sample dilution and dialysis
steps, which typically decrease solubilizing capacity and subsequent proteolytic
efficiency, were completely eliminated, and 3) methanol and ammonium
bicarbonate, both volatile water-soluble compounds, are removable by lyophilization after
digestion, making the methanol-based buffer approach MS-friendly. Other solvents
such as acetonitrile and trifluoroethanol are also used for solubilization and digestion of
membrane proteins.
1.3.1.2 Cleavable surfactant
Surfactants are stable and strong solubilizing agents. Environmental
concerns, such as the low biodegradability of surfactants, have become one of the
main driving forces for the development of cleavable surfactants. Although cleavable
surfactants were first synthesized many years ago, Norris et al. applied nonacid
cleavable detergents for MALDI mass spectrometry profiling of whole cells [104].
They showed that cleavable surfactants increase the number of proteins
analyzed by increasing protein solubility. Cl