Development of Sensitive High Performance Analytical Methods for the

Comprehensive Characterization of Proteins and Glycoproteins from Samples of

Clinical and Biopharmaceutical Importance

A dissertation presented

by

Dipak A. Thakur

to

The Department of Chemistry and Chemical Biology

In partial fulfillment of the requirements

for the degree of

Doctor of Philosophy

in the field of

Chemistry

Northeastern University

Boston, Massachusetts

June 2011

Development of Sensitive High Performance Analytical Methods for the

Comprehensive Characterization of Proteins and Glycoproteins from Samples of

Clinical and Biopharmaceutical Importance

by

Dipak A. Thakur

ABSTRACT OF DISSERTATION

Submitted in partial fulfillment of the requirements for the degree

of Doctor of Philosophy in Chemistry in the Graduate School of

Arts and Sciences of Northeastern University, June 2011

ABSTRACT

This thesis focuses on the development of ultrasensitive, high-resolution analytical methods for the characterization of proteins and glycoproteins from samples of clinical and biopharmaceutical origin. The first portion describes the use of laser capture microdissection (LCM) for the selective enrichment of homogeneous but low-number cell populations, combined with downstream porous layer open tubular (PLOT) column liquid chromatography-mass spectrometry (LC-MS) using both one- and two-dimensional separations. The second portion of the thesis describes the ultra-high-performance analysis of intact recombinant α-human chorionic gonadotrophin glycoforms using capillary electrophoresis with accurate-mass, high-resolution Fourier transform ion cyclotron resonance mass spectrometry (CE-FTMS).

Chapter 1 presents an overview of current analytical methods and technologies applied in the field of proteomics. A critique of these technologies is also provided, laying the foundation for the developments and improvements to the current state of the art presented in the subsequent chapters.

Chapter 2 describes the development of a micro-proteomic workflow for the comprehensive analysis of just 10,000 cells, collected by LCM, from invasive and metastatic epithelial cell types from a breast cancer patient. To minimize sample loss, the development of an efficient sample handling approach was necessary. To achieve this, protein-level separation and subsequent enzymatic digestion of the cell lysate were performed using short-distance SDS-PAGE separation on tricine-PAGE gels. By combining this sample clean-up and fractionation approach with ultrasensitive 1D PLOT LC-MS, in excess of 1,000 proteins were identified following injection of just 1/10th of the digested lysate, or approximately 1,000 cells. The micro-proteomic workflow is highly suited for the comparative analysis of such small but highly informative LCM-collected cell populations: more than 100 proteins were found to be differentially expressed, thereby facilitating a deeper understanding of the biological changes associated with the invasive-to-metastatic transition.

Chapter 3 presents the application of an online 2D-RP/SCX/SPE/PLOT LC-FT-MS micro-proteomics platform to the comparative proteomic analysis of LCM-collected normal and triple negative breast cancer cell populations. Using the effective sample handling approach described in Chapter 2, followed by fractionation and ultrasensitive analysis of the tryptic digest corresponding to 4,000 cells on the 2D-RP/SCX/SPE/PLOT LC-FT-MS platform, in excess of 15,000 unique peptides corresponding to 4,259 proteins were identified. This deep proteome coverage further emphasizes the utility of the developed micro-proteomic platform for the analysis of trace quantities of proteins generated from small but biologically important LCM-enriched cell populations.

Chapter 4 describes the development and application of a high-resolution CE-FTMS method for intact glycoform profiling of recombinant α-human chorionic gonadotrophin (r-αhCG). The CE separation parameters used allowed for the rapid analysis of 60 different glycoforms bearing up to nine sialic acids, in addition to other glycoforms differing by the number and extent of uncharged monosaccharides. A low-volume pressurized liquid junction, which preserves the high resolution of the CE separation, was used to interface the CE system with high-resolution FTMS, thereby allowing accurate determination of the charge state and accurate mass of each intact glycoform following deconvolution. In addition to intact glycoform profiling, analysis of glycopeptides and glycans was also performed to determine and assign the population of oligosaccharides present at each individual glycosite, thereby facilitating complete and comprehensive characterization of r-αhCG. The methodology developed in Chapter 4 was further applied to the analysis of r-αhCG from different expression systems, CHO and murine cell based. The CE-FTMS method is readily applicable to the characterization of drug substance and drug product, as well as to in-process monitoring of these complex glycoforms.

ACKNOWLEDGEMENT

I want to express my sincere and heartfelt gratitude to many people, teachers, colleagues

and friends, who have helped me in reaching this milestone.

First, I would like to acknowledge my thesis advisor, Professor Barry L. Karger, for accepting me as his student and giving me the opportunity to work in his research group. His guidance was constructive and aimed at bringing out the best in me as a scientist and a person. Importantly, I was inspired and motivated by his wisdom, enthusiasm and commitment to the highest standards.

I would like to thank Dr. Tomas Rejtar for devoting his time and energy to guiding me on various projects. I would also like to thank Dr. Marina Hincapie, Dr. Andras Guttman, Dr. Billy Wu, Dr. Shujia Dai, Dr. Sanwon Cha and Dr. Jonathan Bones for sharing their knowledge and expertise.

I would like to thank my dissertation committee members, Prof. Paul Vouros, Prof.

Graham Jones and Prof. Roger Giese for their time, suggestions and guidance.

Many thanks to Dr. Buffie Clodfelder-Miller (Cellular and Molecular Neuropathology Core, University of Alabama), Elizabeth Richardson, Shemeica Binns, Sonika Dahiya and Dennis Sgroi (Massachusetts General Hospital) for providing precious LCM samples. I would like to thank our collaborators N. Washburn, C.J. Bosques, N.S. Gunay, Z. Shriver, and G. Venkataraman (Momenta Pharmaceuticals) for supporting the glycoform profiling project and for their full contribution to the glycan analysis.

I would like to acknowledge the support and friendship of current and former researchers of the Barnett Institute, Dr. E. Moskovets, Dr. Vickor Andreev, Dr. Quanzhou Luo, Dr. Guihua Yue, Mr. Laxmi Manohar Akella, Dr. Claudia Donnet, Dr. Enrique Avarelo, Dr. Zoltan Sabo, Dr. Jim Glick, Somak Ray; and previous and current graduate students Lingyun Li, Ye Gu, Dongdong Wang, Majlinda Kulloli, Agnes Rafalko, Jonna Linholm-Ventola, Jack Liu, Chen Li, Peter Li, Chris Morgan, Vaneet Sharma, Rose Gathungu, Joshua Klaene and Fateme Tousi.

I would like to express my gratitude to Jeffrey Kesilman, Felicia Hopkins, Richard Pumphrey, Andrew Bean, Jana Volf and Bill O'Neil for their support.

I would like to acknowledge my wife, Vaishali, daughter, Radhika, and son, Hrishikesh, for their love, support, sacrifice and compromise during five long years. Many, many thanks to my parents, Sudha and Arjun Thakur, for their support, encouragement and care. I would like to thank my brother, Ganesh, and his family for supporting, guiding and encouraging me during my graduate studies. I would like to express my gratitude to my sister, Jyoti, and her family for their support and encouragement.

TABLE OF CONTENTS

ABSTRACT
ACKNOWLEDGEMENT
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
LIST OF ABBREVIATIONS AND CONVENTIONS

Chapter 1: Overview of Technologies and Methodologies for Proteomics Analysis
1.1 Introduction
1.1.1 Proteomics: An Overview
1.2 Shotgun Proteomics Methodologies
1.2.1 Samples
1.2.1.1 In Vitro Sample Source: Cell lines
1.2.1.2 In Vivo Sample Sources
1.2.2 Tissue Microdissection
1.2.2.1 Laser Capture Microdissection
1.2.2.2 Laser Microbeam Microdissection (LMM)
1.2.2.3 Comparison of LCM and LMM
1.2.3 Sample Preparation
1.2.3.1 SDS-Polyacrylamide Gel Electrophoresis (SDS-PAGE)
1.2.4 Separation Techniques
1.2.4.1 High Pressure Liquid Chromatography
1.2.5 Mass Spectrometry
1.2.5.1 Ionization Methods
1.2.5.2 Mass Analyzers
1.2.5.3 Database Searching Tools for Proteomics
1.3 Microproteomics
1.3.1 Alternative strategies for protein digestion
1.3.1.1 Solvents based approach
1.3.1.2 Cleavable surfactant
1.3.1.3 Filter-Aided Sample Preparation (FASP)
1.3.2 High Performance Liquid Chromatography for Microproteomics
1.3.2.1 Peak Capacity
1.3.2.2 Narrow-bore column and ESI-MS
1.3.2.3 Porous Layer Open Tubular (PLOT) Columns
1.4 Protein Glycosylation Analysis
1.4.1 Intact Glycoprotein Analysis
1.4.1.2 Capillary Electrophoresis
1.4.1.3 Capillary Electrophoresis Coupled to Mass Spectrometry
1.4.1.4 Application of CE-MS for Analysis of Intact Glycoforms
1.4.2 Glycan analysis
1.4.2.1 Glycan release methods
1.4.2.2 Enzymatic Sequencing of Oligosaccharides
1.4.2.3 HPLC analysis of glycans
1.5 References

Chapter 2: Proteomic Analysis of 10,000 Laser Captured Microdissected Breast Tumor Cells Using Short Migration on SDS-PAGE and Porous Layer Open Tubular (PLOT) LC-MS
ABSTRACT
2.1 Introduction
2.2 Experimental Section
2.2.1 Chemicals
2.2.2 Clinical Specimens
2.2.3 Laser Capture Microdissection
2.2.4 Cell Lysis, SDS-PAGE and In-Gel Digestion
2.2.5 Nano LC-ESI-MS with 10 µm i.d. PLOT Column
2.2.6 Protein Identification
2.2.7 Identification of Differentially Abundant Proteins by Spectral Counts
2.2.8 Reproducibility of Replicate Analyses of Metastatic and Invasive Breast Cancer Samples
2.2.9 Gene Ontology Annotation with DAVID (Database for Annotation, Visualization and Integrated Discovery)
2.3 Results and discussion
2.3.1 Overview of Proteomic Workflow
2.3.2 Cell Lysis and Protein Extraction from the LCM Cap
2.3.3 Short SDS-PAGE Run for In-Gel Digestion
2.3.4 Online PLOT/LC-ESI-MS
2.3.5 Proteomic Analysis of Three Replicates of 10,000 Breast Cancer Cells
2.3.6 Identification of Differentially Expressed Proteins
2.3.7 Gene Ontology Analysis
2.4 Conclusions
Addendum to Chapter 2
Evaluation of Short SDS-PAGE Separation Distance for Sample Preparation of Small Protein Amounts Prior to LC/MS Proteomic Analysis
2.1A Methods and Materials
2.1.1 Chemicals
2.1.2 SDS-PAGE Separation and In-Gel Digestion
2.1.3 LC-MS/MS Analysis
2.1.4 Protein Identification
2.2A Results
2.3 Reference

Chapter 3: Comparative Proteomic Analysis of 10,000 Triple Negative Breast Cancer and Normal Mammary Epithelial Laser Microdissected Cells Using On-line 2D RP-SCX/Porous Layer Open Tubular Column (PLOT) LC-MS
Abstract
Introduction
2. Materials and Methods
2.1. Chemicals and Materials
2.2. Laser Capture Microdissection
2.3. Protein Extraction and Digestion
2.4. Column Preparation and Two-Dimensional Separation
2.5. MS Analysis and Data Analysis
2.6. Spectral Index (SpI) for Identification of Differentially Abundant Proteins
2.7. Gene Ontology by DAVID (Database for Annotation, Visualization and Integrated Discovery), a Functional Annotation Clustering Tool
2.8 Gene Set Enrichment Analyses (GSEA) for Functional Significance of Differentially Abundant Proteins
3. Results and Discussion
3.1 Experimental and Bioinformatics Workflow for Proteomic Analysis of 10,000 LCM Collected Normal and Cancer Breast Epithelial Cells
3.2. Peptide and Proteins Identification
3.3. Spectral Index Analysis for Determination of Differentially Abundant Proteins
3.4 DAVID Functional Annotation Analysis of Differentially Abundant Proteins
3.5 Gene Set Enrichment Analyses (GSEA) for Canonical Pathway Analysis
Conclusions
References

Chapter 4: Characterization of the Intact α-Subunit of Recombinant Human Chorionic Gonadotropin Glycoforms by High Resolution CE-FT-MS*
Abstract
4.1 Introduction
4.2 Experimental
4.2.1 Recombinant r-αhCG
4.2.2 Chemicals
4.2.3 CE-MS System
4.2.4 Deglycosylation and Analysis of Released Glycans
4.2.5 Trypsin Digestion of r-αhCG Expressed in a Murine Cell Line
4.2.6 LC-MS Analysis of r-αhCG Tryptic Digest
4.2.7 Data Analysis
4.3 Results and Discussion
4.3.1 Intact Protein Analysis
4.3.2 Repeatability of the Intact Protein Separation
4.3.3 Analysis of the Released Glycans
4.3.4 Glycopeptide Analysis
4.3.5 Analysis of Combined Data
4.3.6 Analysis of r-αhCG Expressed in CHO Cell Culture
4.4 Conclusions
4.5 References

Chapter 5: Summary and Future Directions

LIST OF FIGURES

Chapter 1
Figure 1.1 Conceptual organization of proteomic experiments
Figure 1.2 Human islet protein reference map
Figure 1.3 The principles of laser capture microdissection (LCM)
Figure 1.4 Common matrices used in MALDI mass spectrometry
Figure 1.5 Operational principle of the FTICR
Figure 1.6 Cutaway view of the Orbitrap mass analyzer
Figure 1.7 Low energy collision induced dissociation of peptide
Figure 1.8 Mobile Proton Theory
Figure 1.9 Illustration of effect of concentration of analytes and flow rate on ESI processes
Figure 1.10 Comparison of normal flow rate electrospray vs. a lower flow rate electrospray
Figure 1.11 Schematic diagram of the low dead volume connections used to design 1D and 2D SPE-PLOT system
Figure 1.12 Diagram of the advanced on-line 2-D SCX/PLOT/MS system using a 3.2 m × 10 µm i.d. PLOT column and an online triphasic trapping column
Figure 1.13 Chemical diversity of glycans
Figure 1.14 Electric double layer at the capillary wall and creation of EOF
Figure 1.15 Different types of CE/MS interfaces
Figure 1.16 CZE-ESI-MS analysis of a recombinant human EPO
Figure 1.17 Exoglycosidases commonly used to determine the structure of the N-glycans

Chapter 2
Figure 1. Shotgun proteomic workflow for the analysis of 10,000 LCM collected breast cancer cells collected from breast tumor and lymph node tumor
Figure 2. Optimization of LC-MS parameters
Figure 3. Assessment of the variability in proteomic profiles associated with three replicate runs each of invasive and metastatic breast cancer samples (three samples of 10,000 cells each)
Figure S1. Selection of gel type and SDS-PAGE separation distance for proteomic analysis of small sample amounts

Chapter 3
Figure 1. Shotgun proteomics workflow to analyze breast epithelial cells collected from normal and triple negative breast tumor epithelium
Figure 2. Peptide and protein identifications from 6 salt steps
Figure 3. Peptide and protein identifications in the six samples
Figure 4. Participants of cell cycle (G1-S Phases) were significantly enriched in triple negative breast cancer (TNBC) cells
Figure 5. Structural molecular organization was significantly deficient in triple negative breast cancer (TNBE)

Chapter 4
Figure 1A. Diagram of CE-MS system for analysis of intact glycoproteins
Figure 1B. Photograph of CE system coupled to LTQ-FTMS for analysis of intact glycoproteins
Figure 2. Illustration of the separation resolution of CE-MS analysis of intact α-hCG derived from a murine cell line
Figure 3A. CE-MS separation of r-αhCG produced in a murine cell line
Figure 3B. CE-MS separation of r-αhCG produced in a murine cell line
Figure 4. Chromatograms and fragmentation spectra of glycan analysis
Figure 5. LC/MS/MS analysis of sulfated and α-galactose containing N-glycans
Figure 6. Exoglycosidase characterization of galactose-α-galactose-containing species
Figure 7. CE-MS separation of r-αhCG produced in a CHO cell line

LIST OF TABLES

Chapter 2
Table 1. Number of proteins identified per gel section per sample from three technical replicates of 10,000 mouse liver cells
Table 2. Number of proteins identified per gel section per sample from three replicates of 10,000 invasive breast cancer cells
Table 3. Enriched Gene-Ontology (GO) terms; those with FDR less than 5% and P value less than 0.05 are shown in bold
Table S1. Peptides and proteins identified using three SDS-PAGE separation conditions

Chapter 3
Table 1. Details about normal breast specimens and triple negative breast cancer specimens
Table 2. List of differentially abundant proteins between TNBE and BNE
Table 3. Representative enriched, functional clusters with corresponding GO terms for differentially expressed proteins identified by DAVID
Table 4. List of the canonical pathways found to be overrepresented in TNBE samples
Table 5. List of the canonical pathways found to be overrepresented in NBE samples

Chapter 4
Table 1. Repeatability of peak area measurements for 20 glycoforms on r-αhCG
Table 2. Summary table N-linked glycans in r-αhCG
Table 3. Abundance of individual glycopeptides
Table 4. List of theoretical and observed glycoforms
Table 5. Abundance of r-αhCG glycoforms produced in CHO cells

LIST OF ABBREVIATIONS AND CONVENTIONS

2D GE Two-dimensional gel electrophoresis

2-AB 2-amino benzamide

CE Capillary electrophoresis

CID Collision Induced Dissociation

CPAS Computational proteomics analysis system

CTC Circulating tumor cells

CZE Capillary zone electrophoresis

DAVID Database for annotation, visualization and integrated discovery

DTA Sequest data files

DTT dithiothreitol

EIE Extracted ion electropherograms

EOF Electroosmotic flow

ESI Electrospray ionization

FASP Filter-aided sample preparation

FDR False discovery rate

FFPE Formalin-fixed paraffin-embedded

FTICR Fourier Transform Ion Cyclotron Resonance

GO Gene ontology

GSEA Gene set enrichment analyses

HILIC Hydrophilic interaction liquid chromatography

IAA Iodoacetamide

ICAT Isotope-Coded Affinity Tag

INV Invasive

IPG Immobilized pH gradient

IPI International Protein Index

IR Infrared

IT Ion Trap

iTRAQ Isobaric tags for relative and absolute quantitation

LCM Laser capture microdissection

LMM Laser Microbeam microdissection

LTQ Linear Ion Trap

MALDI Matrix-assisted laser desorption/ionization

MBE Invasive malignant breast epithelial

MCM Minichromosomal maintenance

MET Metastatic

MS Mass spectrometry

NBE Normal breast epithelial

NBE Non-cancerous breast epithelial

NCBI National Center for Biotechnology Information

PALM Pressure assisted Laser microdissection

PGC Porous graphitic carbon

PLOT Porous-layer open-tubular

ppb Parts per billion

ppm Parts per million

RPLC Reversed-phase liquid chromatography

PS-DVB Poly Styrene- Divinyl benzene

r-αhCG Recombinant human chorionic gonadotrophin

SCX Strong Cation Exchange

SDS-PAGE Sodium dodecyl sulfate polyacrylamide gel electrophoresis

SILAC Stable isotope labelling by amino acids in cell culture

SPE Solid phase extraction

SpI Spectral index

TNBC Triple negative breast cancer

TNBE Triple negative malignant breast epithelial

TOF Time-of-flight

UV Ultraviolet

Xcorr Cross-correlation score

Chapter 1: Overview of Technologies and Methodologies for Proteomics Analysis

1.1 Introduction

1.1.1 Proteomics: An Overview

Proteomics [1] offers a complementary approach to genomic technologies by investigating biological phenomena at the global protein level. The emergence of mass spectrometry-based proteomic technologies has advanced our understanding of the complexity and dynamic nature of proteomes, while at the same time revealing that no "one-size-fits-all" proteomic strategy can solve all biological problems. Two technologies have been responsible for the recent, rapid advance of proteomics: first, the development of new strategies for peptide sequencing using mass spectrometry, including soft ionization techniques such as electrospray ionization (ESI) and matrix-assisted laser desorption/ionization (MALDI); and second, the miniaturization and automation of liquid chromatography. However, the high expectations for proteomics have been tempered by the discovery of the huge molecular complexity and dynamic nature of the proteome, which introduce difficulties greater than those encountered in either genome or transcriptome studies. In particular, complexities related to splice variants, post-translational modifications (PTMs), dynamic ranges covering ten or more orders of magnitude of protein abundance in plasma, protein stability, and dependence on cell type or physiological state have challenged our ability to characterize proteomes comprehensively in a reasonable time [2,3,4].

Despite the above challenges, proteomic technologies have already contributed significantly to the life sciences and are today an integral part of biological research efforts. Currently, the field of proteomics covers diverse research topics such as protein expression profiling, analysis of signaling pathways, and protein biomarker discovery, among others [4]. It is important to be aware that within each area, unique proteomic approaches need to be applied; these approaches differ widely in the skills they require, their difficulty and their expense. Based on their objectives, proteomic experiments are categorized as either discovery or assay experiments. Proteomic assay experiments investigate a quantitative change in a small, predefined set of proteins or peptides, whereas discovery experiments focus on the analysis of large, unbiased sets of proteins. The measurement of cardiac troponins in human plasma samples is one example of an assay experiment [3,4]. An example of a discovery proteomic experiment is the Human Proteome Organization Plasma Project, which aims to catalog all proteins and peptides in human plasma.

Discovery proteomics experiments are divided into comprehensive, broad-scale or focused approaches because these distinctions determine how a biological question is approached technically. Comprehensive approaches aim at enumerating as many components of a biological system as possible [5]. Broad-scale experiments target a selected fraction of the expressed proteome, for example, the phosphoproteome or glycoproteome. Comprehensive and broad-scale experiments are used to profile qualitative and quantitative changes that take place as a result of a perturbation to a biological system or differences in genetic background [6,7]. Focused approaches, such as identification of the components of a protein complex, involve co-purification and analysis of relatively few interacting proteins; here, the aim is to identify the components of multiprotein complexes and their interaction mechanisms in order to understand physiological and pathogenic processes. Once the components of multiprotein complexes are determined, they are further monitored using assay methods to develop therapies [8].

Characterization of a single protein that is isolated from natural or recombinant

sources involves determination of its mass, identity, post-translational modifications

and purity. The comprehensive characterization task draws on decades of experience

in protein chemistry [4]. Figure 1.1 presents a diagram of the various components of

proteomics discovery and assay.

Figure 1.1 Conceptual organization of proteomic experiments. Reprinted from

reference [4].

1.2 Shotgun Proteomics Methodologies

Figure 1.2 Human islet protein reference map. The proteins were loaded onto an IPG

strip (pH 3-10) and subsequently separated by mass on a gradient (8-12%) SDS-PAGE

gel. Reprinted from reference [9].

The combination of two-dimensional gel electrophoresis and mass spectrometry (2DE-MS) has traditionally been used to determine changes in protein identity and protein abundance in a complex protein mixture [10]. Using this combination, a protein mixture is first separated based on isoelectric point and then by molecular weight into almost single-protein spots; this strategy is therefore sometimes called the "single protein" method [11]. To identify individual proteins separated by 2DE, the excised gel pieces are subjected to in-gel digestion and subsequent analysis using tandem mass spectrometry. As this method provides very high resolution, the visible image of a stained 2D gel is used to observe changes in protein abundance, protein isoforms and protein modifications [9]. Figure 1.2 shows an example of a complex 2D gel pattern from a proteome. While powerful, the method is difficult to automate, is slow to operate and does not work well with highly hydrophobic proteins [12].

In the past few years, shotgun proteomics, introduced by Yates et al. (10), has replaced conventional 2DE-MS due to its inherent high-throughput capability and its ability to detect and quantitate more proteins than 2D gel electrophoresis. Shotgun proteomics is a method in which the total proteome is digested to peptides, and the resulting highly complex peptide mixture is separated by one-dimensional or two-dimensional liquid chromatography coupled to mass spectrometry (MS). The method consists of four steps: sample preparation, liquid chromatography, MS and data processing. The results are interpreted using rapidly developing bioinformatics tools [13]. Sample preparation for proteomic analysis involves multiple steps such as protein extraction, enrichment, digestion and peptide clean-up. The proteins are first extracted from the biological specimen, such as blood, cell lines or tissues. The extracted protein mixture may be further fractionated to reduce its complexity using chromatographic, electrophoretic or affinity purification procedures. To facilitate their identification, the proteins are digested with highly specific proteolytic enzymes, such as trypsin, to generate fragments of suitable mass for MS detection. The resulting peptides are subsequently separated using high performance liquid chromatography coupled to ESI or MALDI mass spectrometry. Both the precursor mass and the MS/MS fragmentation spectra can be used to identify and quantitate the peptides. Generally, the tandem mass spectra, which provide peptide sequence data based on MS/MS fragmentation patterns, are searched against a specific protein database (e.g., NCBI and Swiss-Prot [14]) using various algorithms (e.g., Mascot [15] or SEQUEST [16]) to determine protein identity. The advantage of shotgun proteomics over the 2DE approach is that the former can analyze hydrophobic membrane proteins as well as proteins with a broad range of pI or size. In addition, the protein dynamic range covered by the shotgun method can be higher than that covered by the 2DE method [17].
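
The digestion and precursor-mass steps of the shotgun workflow can be illustrated in silico with a short Python sketch. This is a minimal illustration under stated assumptions and not the search pipeline used in this work: the function names (`tryptic_digest`, `peptide_mass`, `mz`) and the example sequence are hypothetical, the cleavage rule is the commonly used simplification that trypsin cleaves C-terminal to K or R except when the next residue is P, and modifications are ignored. Database search algorithms build on the same idea, comparing observed precursor and fragment masses with those predicted for peptides from a protein database.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # mass of H2O added for the free N- and C-termini
PROTON = 1.00728   # mass of a proton, used for [M + zH]z+ ions


def tryptic_digest(sequence, missed_cleavages=0):
    """Return peptides from a simplified tryptic digest.

    Assumed rule: cleave after K or R unless the next residue is P.
    """
    if not sequence:
        return []
    # Collect cleavage positions, including both ends of the protein.
    sites = [0]
    for i, aa in enumerate(sequence[:-1]):
        if aa in "KR" and sequence[i + 1] != "P":
            sites.append(i + 1)
    sites.append(len(sequence))
    peptides = []
    for start in range(len(sites) - 1):
        # Allow up to `missed_cleavages` skipped sites per peptide.
        for extra in range(missed_cleavages + 1):
            end = start + 1 + extra
            if end < len(sites):
                peptides.append(sequence[sites[start]:sites[end]])
    return peptides


def peptide_mass(peptide):
    """Monoisotopic neutral mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def mz(mass, charge):
    """m/z of the protonated ion [M + zH]z+ for a neutral mass."""
    return (mass + charge * PROTON) / charge


if __name__ == "__main__":
    protein = "MKWVTFISLLLLFSSAYSRGVFRR"  # hypothetical example sequence
    for pep in tryptic_digest(protein):
        m = peptide_mass(pep)
        print(pep, round(m, 4), round(mz(m, 2), 4))
```

In practice, the predicted m/z values would be compared with observed precursor ions within a mass tolerance (for example, a few ppm on an FT instrument) before MS/MS fragment matching is attempted.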

1.2.1 Samples

Cancer is one of the leading causes of death worldwide. In order to develop treatments for cancer, protein biomarkers that can indicate (1) the presence of disease, (2) disease reduction or progression, and (3) the response to treatment are highly desired. In biomarker discovery proteomic experiments, a variety of sample sources can be used, such as cell lines, tissues and body fluids.

1.2.1.1 In Vitro Sample Source: Cell lines

Cell lines are routinely used in proteomic studies, as they may be easily manipulated with different chemical additives or physical conditions. Because the population of cells can be large (as many as 100,000,000 cells), there are no limitations with respect to the amount of sample available. Cancer cell lines are extensively studied using quantitative proteomics for:

1) identification of differentially abundant proteins between diseased and normal cells of the same type,

2) identification of pathways associated with a specific phenotype, e.g., cancer progression, and

3) drug resistance studies and analysis of proteins secreted by cancer cell lines for potential biomarker discovery [18].

One must, however, always keep in mind that a cell line is a model system that may

or may not represent the in vivo condition [19].

1.2.1.2 In Vivo Sample Sources

Biofluids

In contrast to cell lines, body fluids such as serum[20], plasma[21], saliva[22],

urine, nipple aspirate, cervical-vaginal fluid[23] and exhaled breath condensate[24]

closely represent the in-vivo biological events. Compared to biopsied samples, the

biofluids are easy to collect at low cost using less invasive methods [25,26]. Among

the body fluids, blood, the most common human sample used in diagnosis, is often the

focus for the discovery of protein biomarkers for disease [26,27]. However, the

challenges with analysis of serum or plasma are the high complexity of the proteome, with a

wide dynamic range (at least 10 orders of magnitude[28]), and the anticipated low relative

abundance of many disease-specific biomarkers.

Compared to blood, a proximal fluid, i.e., a body fluid which is close to or in direct

contact with the site of disease, can be an attractive alternative sample type for

biomarker discovery. The proteins or peptides secreted, shed or leaked from diseased

tissue, are likely to be enriched in proximal fluids with respect to both blood and

disease-free control fluid of the same type[29]. Examples of proximal fluids are

urine for bladder and kidney disease, nipple aspirate or ductal lavage for breast cancer,

and cerebrospinal fluid for intracranial processes[30]. Evidence of marker enrichment

in proximal fluids was demonstrated with a study of ovarian cancer, where both

ovarian cyst fluid and ascites fluid constituted proximal fluid[31].

Tissue Samples

Compared to blood and proximal fluids, analysis of tissue offers several

important advantages. 1) During biomarker discovery on tissue samples, the

proteins are studied in their native surroundings. 2) The possibility of identifying potential

biomarkers is highest in damaged/diseased tissues as they are likely to be concentrated

in those tissues. Therefore, it makes sense to look for markers in tissue samples due to

their higher concentration and relatively narrower dynamic range of proteins. To

perform the discovery studies, tissue samples can be used either from animal models

or from human biopsied samples. Mouse [32-34] and rat [35,36] are two of the most

widely used animal models for proteomic research, though human biopsies are the

most appropriate samples to study human diseases. However, human biopsied samples

are not as easily available as tissue samples from animal models, and controlled

experiments are clearly much easier to perform on animal models. The biopsied

samples require extra care during their processing and storage. That is, the tissue

specimens are frozen immediately after their excision and stored at -80 °C.

Conventionally, in order to preserve all the biopsied samples and to maintain their

morphology, the samples are fixed in formalin and embedded in paraffin[37]. The

formalin fixation causes cross-linking of the proteins, and the paraffin limits water

contact.

Large collections of formalin-fixed, paraffin-embedded (FFPE) biopsied samples are

preserved and last many years [38]. Such samples have a well documented clinical

history of individual patients and are available for retrospective analysis. To perform

proteomic analysis of FFPE samples, decross-linking and efficient extraction of

proteins are necessary. To remove paraffin from FFPE tissue blocks, the blocks are

treated with xylene. Further, formalin fixed tissue blocks are boiled in a solution

containing metal ions. This procedure is termed heat induced antigen retrieval [39,40].

High temperature, above 90 °C, is found to be essential to decross-link methylene

bridges between the proteins. The studies performed on FFPE samples, in order to

obtain the comprehensive proteome, use two different approaches. The first is

extraction of intact proteins with SDS and high temperature [41-43]. The

commercialized product, called Qproteome FFPE tissue kit (QIAGEN, Germantown,

MD), uses proprietary chemistry to extract full length proteins for subsequent analysis.

The second approach, a novel approach, is to perform in-solution enzymatic digestion

on FFPE samples, after heat induced decross-linking, to directly obtain peptides for

shotgun proteomic analysis[25]. The commercialized product, based on the latter

extraction principle, is called Liquid Tissue-MS protein prep kit (Expression

Pathology, Inc. Rockville, MD).

1.2.2 Tissue Microdissection

The microenvironment of a tumor tissue sample is highly heterogeneous[44].

The pathologist identifies malignant cells based on their differential staining and

morphology. The malignant cells are surrounded by normal-related and other types of

cells in the tissue matrix. In order to perform a detailed study of biopsied samples to

gain information on proteomic changes between malignant and normal cells, the

malignant cells need to be separated into a homogeneous population. Tissue

microdissection is an indispensable tool to enrich distinct cell types from

a heterogeneous tissue matrix in an efficient and accurate manner. Before the advent of

microdissection, fluorescence-activated cell sorting (flow cytometry)[45] and

magnetic-bead based cell sorting[46] were the methods of choice for cell separation.

However, these methods employed enzymes for breakage of tissue structure, which

may alter or modify the cellular constituents in a number of ways. Microdissection

techniques have the advantage over enzymatic dissociation that they allow selection of individual

cells under the microscopic inspection of the intact tissue.

Microdissection techniques can be classified into two major classes, manual

microdissection and laser assisted microdissection. Early efforts to dissect specific

cell types from tissue sections used sharp tools such as scalpel blades and needles [47].

Another manual dissection technique, called “negative ablation”, as the name

suggests, destroys the unwanted cells surrounding the cells of interest and collects the non-ablated

cells using a needle [48].

Though manual microdissection techniques were useful in obtaining a

homogeneous cell population, these methods were slow, tedious and required

considerable expertise to perform. In addition, the manual microdissection techniques

suffered from issues such as sample handling and contamination. To address these

issues and to perform fast, clean and accurate microdissection, laser-based

microdissection technology, which includes laser capture microdissection and laser

microbeam microdissection, was developed [49]. Over the years, this technique has

proved to be effective, as more than one thousand research articles have been

published on samples procured using this technique [50].

1.2.2.1 Laser Capture Microdissection

Laser Capture Microdissection (LCM) is a laser based cell procurement

method that was developed in the mid-1990s by Emmert-Buck, Liotta and colleagues at

the National Institutes of Health (NIH) and designed to perform fast and accurate

microdissection of tissue samples [49,51]. The earliest design of the LCM system was

commercialized by Arcturus Biosciences Inc. (now part of Applied Biosystems), and

later on Leica and PALM introduced a non-contact based LCM system based on a

technique called laser microbeam microdissection.

Figure 1.3. The principles of laser capture microdissection (LCM). (a) The scheme of

LCM. (b) Comparison of properly melted polymer spots and poor spots. Only cells

lying within the dark ring of melted polymer will be targeted for LCM. (c) Physical

forces involved in LCM. (d) A single cell bound to the thermolabile polymer.

Reprinted from reference [52].

The principles of contact based LCM technology are shown in Figure 1.3. In

brief, the tissue specimens are first processed by sectioning and staining, and then

examined to identify cells of interest based on their staining and morphology. To

selectively capture cells of interest from the tissue sections, an LCM cap with a

thermolabile polymer membrane is placed on the tissue section. An infrared (IR) laser

is focused through the transparent cap material, heating and melting the membrane,

and thus causing the targeted cells to adhere to the membrane. The cells of interest are

then dissected by lifting the LCM cap away from tissue section. The thickness of the

tissue sections (5-15 µm) used for microdissection is critical from an operational point

of view. The tissue thickness

33

Pathology introduced “DIRECTOR” Microdissection slides, which are based on Laser

Induced Forward Transfer (LIFT) Technology utilizing a thin layer energy transfer

coating. Laser energy is transferred to the coating and thus results in evaporation of

the coating. The evaporation of the transfer coating causes the selected feature of the

tissue section to fall into the collection tube.

1.2.2.3 Comparison of LCM and LMM

With the older version of the Arcturus LCM instrument, any material adhering to

the LCM cap was collected. This type of nonspecific collection of loose material from

the tissue specimen is a potential source of contamination. To overcome this issue,

Arcturus introduced a newer design of LCM caps in which the cap remains slightly

away from the tissue specimen, allowing collection of only cells which are in contact

with the melted thermolabile membrane. In addition, to avoid contamination, such as

keratin and loose tissue material, sticky “prep strips” can be used [12]. In contrast to

LCM, the primary source of contamination in LMM is fine tissue material resulting

from laser ablation of the edges of targeted cells/tissue areas [58].

In the case of LCM, the tissue preparation procedure, which includes slide

selection, tissue staining and dehydration, and microscopic evaluation of the tissue

specimen, has to be strictly followed in order to obtain effective microdissection.

In the case of LMM, by contrast, tissue preparation is less complicated than for LCM, and

parameters such as tissue thickness are more flexible.

LCM, a contact-based microdissection technique, is advantageous compared

to LMM, since the cells collected on the thermolabile membrane can be easily viewed

under the microscope for their homogeneity. In LMM, a thin polyethylene naphthalate

(PEN) membrane is required between the glass slide and the tissue section; otherwise,

the catapulted cells might be pulverized to debris in the collection tube. With the membrane, the collected

cells remain relatively intact and can be visualized.

One example of an application of LCM is to investigate the molecular basis of breast

tumor formation. This disease is not clearly understood due to difficulties encountered

while studying the early stages of disease progression. Breast cancer progression is

a multistep process, involving the premalignant stage of atypical ductal hyperplasia

(ADH), the preinvasive stage of ductal carcinoma in situ (DCIS), and the potentially

lethal stage of invasive ductal carcinoma (IDC)[59]. The obstacles in studying breast

cancer lesions are the complexity and heterogeneity of the tissue and the microscopic size

(

35

a single protein, this approach is suitable as it requires minimal sample preparation.

Chapter 4 of this thesis describes glycoform profiling of recombinant α-human

chorionic gonadotropin (α-hCG) using high resolution capillary electrophoresis

coupled with high mass resolution FT-MS [61].

In the bottom-up approach, there are two ways to convert proteins extracted

from biological specimens to peptides which are suitable for mass-spectrometry (MS)

based proteome analysis. The first solubilizes the proteins with detergents and

separates the proteins by sodium dodecyl sulfate (SDS) polyacrylamide gel

electrophoresis. The proteins trapped by the gel are subjected to enzymatic digestion,

i.e., “in-gel” digestion. The second sample preparation method is detergent-free, as it

uses strong chaotropic reagents such as urea and thiourea for cell lysis, protein extraction

and solubilization. The enzymatic digestion of the proteins in the presence of

denaturing reagents is termed “in-solution” digestion.

The in-gel digestion method is advantageous over in-solution digestion due to

the absence of most impurities which could interfere with digestion; however, the gel

may limit peptide recovery. On the other hand, in-solution digestion can be more

readily automated and can minimize losses associated with sample handling.

However, the use of chaotropes may result in incomplete solubilization of the

proteome, and digestion may be impeded by interfering substances.

1.2.3.1 SDS-Polyacrylamide Gel Electrophoresis (SDS-PAGE)

The mobility of proteins during gel-electrophoresis depends upon the following

factors:

1) electric field strength,

2) total charge on the molecule,

3) size and shape of the molecule and

4) ionic strength of the buffer and properties of the gel matrix through which the

molecules are migrating.

The polyacrylamide gel matrix is in extensive use for protein prefractionation

[62]. Gel matrices act like a molecular sieve, and their sieving function depends on the

mesh size of the gel. The polyacrylamide gels are synthesized by the polymerization of

acrylamide monomers into long chains and the reaction of these chains with

bifunctional compounds such as N,N'-methylenebisacrylamide (bis) to form a sieve-like

structure. The mesh size of the gel is determined by the concentration of

acrylamide and bisacrylamide (%T and %C).

%T=concentration of total monomer

%C=concentration of cross linker (as a percentage of the total monomer)

The higher the concentration of monomer (%T), the smaller the mesh size of the gel

[63].
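
As a small worked illustration of the two definitions above, the sketch below computes %T and %C for a gel recipe. It assumes the common weight/volume convention (grams of total monomer per 100 mL of gel solution), which the text does not state explicitly; the function name and the recipe values are hypothetical.

```python
def gel_composition(acrylamide_g: float, bis_g: float, volume_ml: float) -> tuple[float, float]:
    """Return (%T, %C) for a polyacrylamide gel recipe.

    Assumed convention (not stated in the text):
      %T = grams of total monomer (acrylamide + bis) per 100 mL of solution
      %C = cross-linker (bis) as a percentage of the total monomer mass
    """
    total_monomer_g = acrylamide_g + bis_g
    percent_t = 100.0 * total_monomer_g / volume_ml
    percent_c = 100.0 * bis_g / total_monomer_g
    return percent_t, percent_c


# Hypothetical recipe: 11.6 g acrylamide and 0.4 g bis in 100 mL
percent_t, percent_c = gel_composition(11.6, 0.4, 100.0)
print(f"%T = {percent_t:.1f}, %C = {percent_c:.1f}")
```

Raising the total monomer mass at fixed volume raises %T, which corresponds to the smaller mesh size described above.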

Gel electrophoresis is performed under either continuous or discontinuous buffer

conditions. The running buffer and gel buffer are the same in the continuous buffer

system; whereas the discontinuous buffer system has different gel and running buffers.

The discontinuous gel system contains two gel layers, the stacking layer and the separating layer.

Electrophoresis with a discontinuous buffer system provides sample concentration and

higher resolution. SDS-PAGE is performed under denaturing conditions, where the

detergent denatures and unfolds the protein by wrapping around the peptide backbone of

the protein. SDS binds to protein at a ratio of approximately 1.4 g SDS per gram of protein. The highly

negative SDS-protein complexes are separated on the gel based on their molecular

weight rather than their charge, as each protein acquires a net negative charge which is

proportional to the length of the protein. The electrophoretic mobility of the proteins

through the gel decreases approximately linearly with the logarithm of the protein molecular

weight[64].
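
The relationship between mobility and the logarithm of molecular weight is what allows a marker ladder to be used for molecular weight estimation. The following is a minimal sketch, assuming a straight-line fit of log10(MW) against relative mobility (Rf) over the ladder range; the marker values and function names are hypothetical and are not taken from this work.

```python
import math


def fit_log_mw_vs_rf(markers):
    """Least-squares line log10(MW) = intercept + slope * Rf.

    `markers` is a list of (Rf, MW in Da) pairs from a marker ladder.
    """
    xs = [rf for rf, _ in markers]
    ys = [math.log10(mw) for _, mw in markers]
    n = len(markers)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def estimate_mw(rf, intercept, slope):
    """Estimate the molecular weight (Da) of a band from its Rf."""
    return 10 ** (intercept + slope * rf)


# Hypothetical ladder: (relative mobility, molecular weight in Da)
ladder = [(0.10, 200_000), (0.30, 97_000), (0.50, 45_000),
          (0.70, 21_000), (0.90, 10_000)]
intercept, slope = fit_log_mw_vs_rf(ladder)
unknown_mw = estimate_mw(0.60, intercept, slope)
print(f"Estimated MW at Rf 0.60: {unknown_mw:.0f} Da")
```

Such an estimate is only as good as the linear assumption; proteins that bind SDS atypically (for example, heavily glycosylated proteins) will deviate from the calibration line.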

Prefractionation of samples is required in proteomics, and gel electrophoresis is

a versatile and reliable method to achieve such prefractionation. The discontinuous

buffer system is frequently used as it provides higher protein resolution compared to

continuous buffer system. The discontinuous buffer system offers the ability to

manipulate buffer systems to achieve “steady-state-stacking” or “isotachophoresis”

which is responsible for focusing of the proteins before their separation by PAGE.

Though the separation of proteins in SDS-PAGE is primarily based on the molecular

weight, the molecular weight range that can be preferentially resolved depends upon

the gel composition, buffer system used and the pH of the buffer system. The presence

of a post-translational modification on the protein, such as glycosylation, results in

anomalous migration of the glycoprotein on SDS-PAGE. This anomalous

electrophoretic migration of glycoproteins, resulting in inaccurate molecular weight

determination, is due to little or no SDS binding to the sugar moieties.

1.2.4 Separation Techniques

Peptide mass spectrometry (shotgun proteomics) identifies proteins by

measuring mass-to-charge ratios of peptides and their fragments in the MS spectra. In

order to perform unambiguous identification of proteins and to achieve deep proteome

coverage, mass spectrometry is highly dependent on separation to reduce the complexity of

samples prior to their analysis. This facilitates the identification of low-abundance

species that would otherwise be overshadowed by the high-abundance species,

i.e., it increases the dynamic range.

1.2.4.1 High Pressure Liquid Chromatography

High-pressure liquid chromatography (HPLC) is often directly coupled to mass

spectrometric instruments with an electrospray ionization (ESI) source. The continuous

separation of analytes using HPLC is physically compatible with an electrospray

ionization source. Due to the efficient coupling of HPLC with the ESI source, the combination

has become a standard sample introduction setup for peptide analysis. The most

commonly used chromatographic materials for separation of analytes are: ion

exchange (IEX), reverse phase, hydrophilic interaction chromatography (HILIC),

affinity, and hybrid materials.

Reverse phase liquid chromatography (RPLC or RP) separates analytes based

on their hydrophobicity, and a significant advantage of RPLC, when coupled with

mass spectrometer, is that the buffers used are generally compatible with ESI. The use

of acidic pH and organic solvents (acetonitrile and methanol) is conducive to

analysis of peptides by ESI-MS. Due to its high resolution, efficiency, reproducibility,

and mobile phase compatibility with ESI-MS, RPLC has emerged as a preferred

separation mode for the analysis of proteins and peptides. Over the years, significant

efforts have been made to increase peak capacity, sensitivity, reproducibility, and

analysis speed of reverse phase chromatography. It has been observed that packing

long, narrow capillary RP columns results in significant improvement in loading

capacity, sensitivity, and dynamic range of the RPLC. Shen et al. have reported use of

50 µm i.d., 40-200 cm long, small-particle-size (1.4 μm) RPLC columns with high

peak capacity (1000-1500, compared with an average of 400) operated in an ultrahigh

pressure regime (20 kpsi) for proteomic and metabolomics analysis[65]. The use of a

small diameter particle stationary phase (1.7 μm diameter) contributes significantly

towards the efficiency of the separation. The efficiency is inversely proportional to

the size of the particles used for packing the column. However, because columns packed with

small diameter particles exhibit high back pressure, high-pressure pumps (up to 15,000

psi) are required for their operation [66].

Multidimensional separation is a common way to increase the peak capacity of

chromatographic analysis. This approach combines several separation techniques, such

as ion exchange, high pH reverse phase separation, low pH reverse phase separation

and so forth, to improve the resolving power. For effective performance of

multidimensional separation, the individual separation methods should be as

orthogonal as possible to one another, with each dimension utilizing different

molecular properties as the basis of separation. One of the first and most practiced two-dimensional

setups is the combination of strong cation exchange (SCX) chromatography

with reverse phase chromatography, known as multidimensional protein identification

technology (MudPIT [67]). In this multidimensional separation, a highly complex

peptide mixture is loaded onto an SCX column and eluted in a series of steps with

increasing salt concentration. Each fraction is transferred onto an RP column either

off-line or directly, and peptides are further separated and eluted into the MS.
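
To illustrate why a second dimension raises peak capacity, the sketch below applies the idealized product rule, in which the total peak capacity of a fully orthogonal two-dimensional separation is the product of the peak capacities of the individual dimensions. This is an upper bound only: SCX and RP are not perfectly orthogonal, and peptides can elute in more than one salt step. The number of salt steps is a hypothetical value; the RP peak capacity of 400 is the typical figure quoted above.

```python
def ideal_2d_peak_capacity(n_first: int, n_second: int) -> int:
    """Idealized peak capacity of a fully orthogonal 2D separation.

    Assumes the product rule; real SCX-RP systems achieve less.
    """
    return n_first * n_second


# Hypothetical MudPIT-style run: 12 SCX salt steps, each followed by an
# RP gradient with a peak capacity of about 400.
scx_steps = 12
rp_peak_capacity = 400
print(ideal_2d_peak_capacity(scx_steps, rp_peak_capacity))
```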

1.2.5 Mass Spectrometry

A mass spectrometer usually comprises three parts: an ion source and optics, a mass analyzer,

and data processing software.

1.2.5.1 Ionization Methods

The rapid growth in mass spectrometry-based proteomic analysis can be

attributed to major advances in experimental methods, instrumentation and data

analysis. Among the most important developments in mass spectrometry related

instrumentation is the invention of soft ionization methods, i.e., matrix-assisted laser

desorption ionization (MALDI) and electrospray ionization (ESI), allowing peptides

and proteins to be directly analyzed by MS.

MALDI

MALDI functions just as its name suggests: the matrix assists in desorption

and ionization of the analyte. In this type of ionization technique, the incident laser energy is

absorbed by the matrix and transferred to the acidified analyte. The rapid laser heating

results in desorption of matrix and positively charged analyte into the gas phase.

Singly charged ions are predominantly generated by MALDI, which makes it

applicable for top-down analysis of high-molecular weight proteins [68]. However,

low shot-to-shot reproducibility and strong dependence on sample preparation are the

drawbacks of this technique. MALDI-TOFMS is suitable for high throughput analysis.

However, the high ionization energy can be detrimental in the analysis of

compounds with labile modifications [69].

Figure 1.4 Common matrices used in MALDI mass spectrometry. Reprinted from

reference [69].

ESI

ESI, unlike MALDI, generates ions from solution. Electrospray is

generated by applying a high voltage between the emitter end of the separation

column and the inlet of the mass spectrometer [68]. Physicochemical processes of ESI

involve formation of a Taylor cone, from which an electrically charged spray of the liquid eluting

from the separation column is emitted, followed by generation and desolvation of eluent droplets.

The unique feature of ESI compared to other ionization methods is its ability to

produce multiply charged ions from high molecular weight biological molecules like

proteins, which enables the analysis of these molecules with instruments having a small

mass-to-charge range (400-2000 m/z). An important development in the ESI

technique, which led to sensitive proteomic analysis, is known as nano-ESI. In

Chapters 2 and 3 of this dissertation, nano-ESI, operated at 20 nL/min, is a primary

technique used for the analysis of 10,000 laser capture microdissected breast cancer

cells. A diagram of the ESI process is presented in the PLOT-related section.
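
The benefit of multiple charging can be made concrete with a short calculation. The sketch below lists the charge states of a protonated molecule that fall inside the 400-2000 m/z window mentioned above, using m/z = (M + z × 1.007276)/z for an [M + zH]z+ ion. The 17,000 Da protein mass is a hypothetical value chosen for illustration, and the function names are not from any particular software package.

```python
PROTON_MASS = 1.007276  # Da, mass of a proton


def mz(neutral_mass: float, charge: int) -> float:
    """m/z of an [M + zH]z+ ion for a given neutral mass (Da)."""
    return (neutral_mass + charge * PROTON_MASS) / charge


def charge_states_in_window(neutral_mass: float, mz_min: float = 400.0,
                            mz_max: float = 2000.0, max_charge: int = 100):
    """Charge states whose m/z falls inside the analyzer window."""
    return [z for z in range(1, max_charge + 1)
            if mz_min <= mz(neutral_mass, z) <= mz_max]


# Hypothetical 17,000 Da protein: the singly charged ion lies far outside
# the window, but higher charge states fall within it.
states = charge_states_in_window(17_000.0)
print(states[0], states[-1])
```

For this hypothetical mass, only charge states from 9+ to 42+ lie within the window, which is why a singly charged ion of the same protein would not be observable on such an instrument.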

1.2.5.2 Mass Analyzers

Ion Trap

As the name suggests an ion-trap mass spectrometer works by trapping the ions

in a vacuum. The ion trap functions by repeating the steps of ion collection, ion

storage and ejection of ions from the ion trap as flow from the LC column occurs. The

unique feature of ion-trap lies in its ability to isolate and fragment peptide ions from

complex mixtures; this operation is called tandem MS. Due to their fast scan rates,

MSn scans, high sensitivity, high duty cycle, high ion storage capacity (compared to

3D traps), reasonable resolution and mass accuracy, linear ion traps (e.g., LTQ,

Thermo Fisher) are considered the high-throughput workhorses in proteomic

research. Therefore, for our initial development work, as mentioned in Chapters 2 and

3, we employed LTQ-MS for bottom-up 10 µm i.d. Porous Layer Open Tubular

(PLOT) LC-MS analysis of 10,000 LCM cells. Furthermore, the LTQ is coupled with

Orbitrap and FTICR as the front end of hybrid MS instruments to perform ion

trapping, ion selection and high resolution ion analysis.

Mass spectrometry has been extensively used for determination of molecular masses

of intact proteins. Among the mass spectrometric techniques, ESI with high mass

accuracy MS is preferred, as ESI-generated multiply charged ions fall in the m/z range

of most mass spectrometers. A variety of mass spectrometers can be used for this

purpose, including ion trap (IT), orthogonal time-of-flight, time-of-flight, Fourier

transform ion cyclotron resonance (FTICR) and Orbitrap instruments. However, mass

spectrometers such as ion traps are less suitable for this purpose due to their low

resolving power at full scan speed. In contrast, mass spectrometers such as TOF,

FTICR and Orbitrap, due to their high mass resolution and high mass accuracy, have

become the preferred instruments for accurate mass determination of intact proteins.

Quadrupole-Time-of-Flight Mass Spectrometer

Time-of-flight mass spectrometry (TOFMS) determines the mass-to-charge

ratio of the ions using a time measurement. Ions are accelerated in the flight tube by an

electric field. This acceleration provides the same kinetic energy to all the ions bearing

the same charge. The velocity gained by the ion due to acceleration depends on the

mass-to-charge ratio. Then, the time that an ion takes to travel to the detector is

measured. Heavier ions of the same charge take a longer time to reach the detector compared to

lighter ones. Based on the flight time of the ion and the known experimental

parameters, the mass-to-charge ratio of the ion can be determined.
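
The time-to-mass conversion described above can be sketched for an idealized linear TOF analyzer. An ion of charge z accelerated through a potential U acquires kinetic energy zeU = mv²/2, so the flight time over a field-free length L is t = L·sqrt(m/(2zeU)). The tube length and acceleration voltage below are hypothetical, and the sketch ignores the reflectron, the initial energy spread and the time spent in the acceleration region.

```python
import math

E_CHARGE = 1.602176634e-19   # C, elementary charge
DALTON = 1.66053906660e-27   # kg, atomic mass constant


def flight_time(mass_da: float, charge: int,
                tube_length_m: float = 1.0,
                accel_voltage_v: float = 20_000.0) -> float:
    """Flight time (s) of an ion in an idealized linear TOF analyzer."""
    kinetic_energy = charge * E_CHARGE * accel_voltage_v      # z * e * U
    velocity = math.sqrt(2.0 * kinetic_energy / (mass_da * DALTON))
    return tube_length_m / velocity


def mz_from_flight_time(time_s: float,
                        tube_length_m: float = 1.0,
                        accel_voltage_v: float = 20_000.0) -> float:
    """Invert the flight-time relation to recover m/z (Da per charge)."""
    return (2.0 * E_CHARGE * accel_voltage_v
            * (time_s / tube_length_m) ** 2 / DALTON)


t = flight_time(1000.0, 1)
print(f"m/z 1000 arrives after {t * 1e6:.1f} microseconds")
```

Because the flight time scales with the square root of m/z, an ion four times heavier at the same charge takes twice as long to reach the detector.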

Fourier Transform Ion Cyclotron Resonance (FTICR)

The FTICR mass analyzer determines the mass-to-charge ratio of the ions based on

their cyclotron frequency under the influence of a constant magnetic field. In the ICR

mass analyzer, the ions are stored in a Penning trap under the influence of constant

magnetic and electric fields. The ions are excited to a larger cyclotron radius by an

oscillating electric field perpendicular to the magnetic field. The energy applied to the

ions in the ICR cell can be tuned to excite, dissociate and eject ions. The detector plates on

opposite sides of the trap measure the cyclotron frequencies of all the ions

simultaneously, and a Fourier transform is used to convert these frequencies into

m/z values (Figure 1.5). FTICR is a very high mass resolution technique contributing

to accurate mass measurement[70]. The high mass resolution and high mass accuracy

of the FTICR are due to the following reasons. 1) The mass of the ion is calculated from the

measurement of cyclotron frequency, a parameter that is more precisely measurable

than any other parameter. 2) The ion cyclotron frequency is defined by the magnetic

field. The better time stability of the magnetic field (1 ppb/hour) compared to the time

stability of an rf voltage (100 ppb/hour) results in superior mass precision. 3) In the

spatially uniform magnetic field, the cyclotron frequency of an ion is independent of

the ion speed. 4) In order to attain high mass precision, ICR, unlike ion-beam-based

mass measurement, does not require the use of narrow slits [71].
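
The dependence of cyclotron frequency on m/z and magnetic field can be illustrated with the unperturbed cyclotron relation f = zeB/(2πm). The sketch below is a simplified illustration: it ignores the shift caused by the trapping electric field, and the 7 T field strength is a hypothetical value rather than the magnet used in this work.

```python
import math

E_CHARGE = 1.602176634e-19   # C, elementary charge
DALTON = 1.66053906660e-27   # kg, atomic mass constant


def cyclotron_frequency_hz(mz: float, field_tesla: float) -> float:
    """Unperturbed cyclotron frequency (Hz) for an ion of given m/z (Da per charge)."""
    return E_CHARGE * field_tesla / (2.0 * math.pi * mz * DALTON)


def mz_from_frequency(frequency_hz: float, field_tesla: float) -> float:
    """Invert the cyclotron relation to recover m/z from a measured frequency."""
    return E_CHARGE * field_tesla / (2.0 * math.pi * frequency_hz * DALTON)


f = cyclotron_frequency_hz(1000.0, 7.0)
print(f"m/z 1000 at 7 T: {f / 1e3:.1f} kHz")
```

Note that the ion velocity does not appear in the expression, consistent with point 3 above, and that the frequency is inversely proportional to m/z.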

Figure 1.5. Operational principle of the FTICR. Reprinted from reference [69].

Among the many applications of the FTICR, the high resolving power of

FTICR is useful for the study of large macromolecules such as proteins carrying

multiple charges generated by electrospray ionization. The FTICR instrument provides

mass resolution in the range of 50,000-750,000 and mass accuracy of less than 2 ppm.

However, FTICR suffers from relatively slow acquisition speed and low sensitivity

of analysis. In order to obtain high sensitivity and improved acquisition time, we

acquired MS scans over a limited mass window, corresponding to m/z values of the 9+

charge state of intact alpha-human chorionic gonadotropin (Chapter 4).
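
The two figures of merit quoted above, mass resolution and mass accuracy in ppm, can be expressed as simple ratios. The sketch below assumes the full-width-at-half-maximum (FWHM) definition of resolving power, m/Δm, which the text does not specify; the peak values are hypothetical.

```python
def ppm_error(measured_mz: float, theoretical_mz: float) -> float:
    """Mass measurement error in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def resolving_power(mz: float, fwhm: float) -> float:
    """Resolving power m/dm, assuming dm is the peak width at half maximum."""
    return mz / fwhm


# Hypothetical peak at m/z 1000 measured 0.002 too high, with a width of 0.01
print(f"{ppm_error(1000.002, 1000.0):.1f} ppm")
print(f"resolving power {resolving_power(1000.0, 0.01):.0f}")
```

At m/z 1000, a mass accuracy of 2 ppm therefore corresponds to an error of 0.002 m/z units.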

Orbitrap

In 1999, Makarov invented a new type of mass analyzer called the Orbitrap [72]

which was applied for proteomic research in 2005[73]. Among the high mass

resolution FTMS instruments, the Orbitrap has largely superseded the FT-ICR due to its lower cost of

operation, while providing comparable mass accuracy. The Orbitrap consists of two

electrodes, an outer barrel-like electrode and a coaxial inner spindle-like electrode with

an electrostatic field formed between them (Figure 1.6). The ions are tangentially injected

into the gap between the two electrodes and made to rotate around the inner electrode

due to the electrostatic attraction by the inner electrode and the balancing centrifugal

forces. While cycling around the central axis, the ions move back and forth along the

central axis. The signal from these harmonic oscillations is Fourier transformed to obtain their frequencies, which

determine the mass-to-charge ratio of the ions. The Orbitrap offers a high resolving

power of roughly 50,000 and a mass accuracy of less than 2 ppm, with proper

standards. With an average acquisition speed of at least 6 MS/MS spectra per second

in parallel with a single high-resolution spectrum (60,000 resolution), significantly

improved protein coverage can be achieved.

Figure 1.6. Cutaway view of the Orbitrap mass analyzer. Ions are injected into the

Orbitrap at a point (arrow) offset from its equator and perpendicular to the z-axis,

where they begin coherent axial oscillations without the need for any further

excitation. Reprinted from reference [69].

1.2.5.3 Database Searching Tools for Proteomics

Database searching plays an important role in large-scale proteomics. Database

searching tools enable the use of mass spectrometric data of peptides to identify

proteins in sequence databases. Two mass spectrometry-based database search

principles are mainly used for identification of proteins. The first method uses the

molecular weight fingerprint of the protein digest (peptides) obtained by a site-specific

protease [74,75], and the second method uses the tandem mass spectra obtained on the

individual peptides of a digested protein[16,76]. Since each tandem mass spectrum

stands as a unique and verifiable piece of data, the second method has the ability to

identify a wide range of proteins and thus provide a comprehensive approach to

handle complex protein mixtures[77].

Tandem Mass-Spectrometry and Data Processing

Figure 1.7. Low energy collision-induced dissociation of a peptide. Reprinted from

reference [78].

In tandem mass-spectrometry (MS/MS), the gas phase peptide ions undergo

fragmentation due to processes such as collision-induced dissociation (CID). Gas-phase

CID is the most widely used technique in tandem mass spectrometry. The

dissociation pathways are exclusively dependent on the collision energy. The low

energy collisions (

49

Figure 1.8 Mobile Proton Theory. Reprinted from reference [84].

To explain the intensity patterns observed in the tandem mass spectra, a mobile proton

model has been proposed[83]. The mobile proton model states that to initiate backbone

cleavages for production of b and y ions, the protons are transferred intramolecularly

from basic side-chains to the heteroatoms along the backbone. Figure 1.8A shows that

the proton exists in equilibrium between all possible basic sites. The energy required

to mobilize the proton from a basic side-chain or from the amino terminus to the

peptide backbone depends on the amino acid composition of the peptide. Therefore,

the dissociation or the fragmentation energy for the peptides containing amino acids

having greater gas-phase basicity is higher compared to the peptides with amino acids

having lower gas-phase basicity. An example of a lysine-terminated peptide is shown in

Figure 1.8B.
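
The b and y ions produced by backbone cleavage have predictable masses, which is what makes tandem mass spectra searchable. The following is a minimal sketch that computes singly charged b- and y-ion m/z values for an unmodified peptide from standard monoisotopic residue masses. The peptide "GAK" is a hypothetical example, and the function is an illustration rather than the implementation used by any search engine.

```python
# Monoisotopic residue masses (Da) of the 20 common amino acids
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # Da
PROTON = 1.007276   # Da


def b_y_ions(peptide: str):
    """Return (b_ions, y_ions) as lists of singly charged m/z values.

    b_i holds the first i residues; y_i holds the last i residues plus water.
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    for i in range(1, len(masses)):
        b_ions.append(sum(masses[:i]) + PROTON)
        y_ions.append(sum(masses[-i:]) + WATER + PROTON)
    return b_ions, y_ions


b, y = b_y_ions("GAK")   # hypothetical lysine-terminated peptide
print([round(m, 4) for m in b])
print([round(m, 4) for m in y])
```

For a lysine-terminated peptide such as this one, the y1 ion appears at m/z 147.1128, a value commonly seen in spectra of tryptic peptides ending in lysine.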

SEQUEST- Database Search Algorithm

Given the mass of the precursor ion (m/z of the peptide ion) and its fragment

ions, the goal of the database search algorithm is to determine peptide sequence and

protein identity. SEQUEST [16] is a database search program which uses a descriptive

model for peptide fragmentation and correlative matching to a tandem mass

spectrum[16]. To assess the quality of the match between the experimental spectrum

and an amino acid sequence from the database, SEQUEST applies a two-tiered scoring

scheme. It first calculates the empirically derived preliminary score (Sp) that restricts

the number of sequences to be analyzed in the correlation analysis. Sp is calculated by

summing the peak intensities of fragment ions as well as accounting for continuity of

the fragment ion series and the length of the amino acid sequence. The second and

decisive score is a cross-correlation score, referred to as XCorr, which correlates the

experimental and theoretical spectra. The theoretical spectrum is generated from the

predicted fragmentations, i.e. b- and y-ions, for each of the sequences in the database.

The similarity between the theoretical and experimental spectra is evaluated based on

the cross-correlation of the two spectra. Apart from preliminary and cross-correlation

scores, SEQUEST calculates another important difference, ΔCn, the normalized

difference of XCorr values between the best matched sequence and each of the other

sequences. ΔCn is a useful indicator of the uniqueness of the match. If the value of

ΔCn is greater than 0.1, then the match is considered as reasonably unique to a

sequence. XCorr, which is independent of the database size, reflects the quality of

the match between the spectrum and sequence, whereas ΔCn, which is dependent on

the size of the database, indicates the quality of the match relative to near misses.
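As a rough illustration of the two ideas above, the following Python sketch generates singly charged b- and y-ion m/z values for a peptide and computes ΔCn from a list of XCorr values. It is not the SEQUEST implementation: the function names are hypothetical, the residue masses are standard monoisotopic values, and dividing the XCorr difference by the top score is an assumption about how the "normalized difference" is formed.

```python
# Monoisotopic residue masses (Da) for the 20 common amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276   # mass of a proton (Da)
WATER = 18.010565   # monoisotopic mass of H2O (Da)


def by_ions(peptide):
    """Return (b_ions, y_ions) as lists of singly charged m/z values."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    running = 0.0
    for m in masses[:-1]:            # b1 .. b(n-1), from the N-terminus
        running += m
        b_ions.append(running + PROTON)
    running = 0.0
    for m in reversed(masses[1:]):   # y1 .. y(n-1), from the C-terminus
        running += m
        y_ions.append(running + WATER + PROTON)
    return b_ions, y_ions


def delta_cn(xcorrs):
    """Normalized XCorr difference between the top hit and each other hit.

    Assumption: normalization is by the top XCorr, which must be positive.
    """
    ranked = sorted(xcorrs, reverse=True)
    top = ranked[0]
    if top <= 0:
        raise ValueError("top XCorr must be positive")
    return [(top - x) / top for x in ranked[1:]]
```

For example, XCorr values of 4.0, 3.0 and 2.0 give ΔCn values of 0.25 and 0.5, so the best match would pass the 0.1 uniqueness threshold described above.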

Label Free Quantitative Microproteomics

Currently, a number of stable isotope labeling approaches are in use for

"shotgun" quantitative proteomic analysis. The stable isotope labeling approaches

include Isotope-Coded Affinity Tag (ICAT), Stable Isotope Labeling by Amino Acids

in cell culture (SILAC), ¹⁵N/¹⁴N metabolic labeling, ¹⁸O/¹⁶O enzymatic labeling,

Isotope Coded Protein Labeling (ICPL), Tandem Mass Tags (TMT), Isobaric Tags for

Relative and Absolute Quantification (iTRAQ) and other chemical labeling[85,86].

These stable isotope labeling methods have offered valuable flexibility while using

quantitative proteomic methods to study protein abundance changes in complex

samples. However, most labeling based quantification methods are limited in their

application due to increased time and complexity of sample preparation, the

requirement of higher sample concentration, high cost of the reagents and incomplete

labeling. Therefore, for relative quantitation of small sample amounts, there is

increased interest in label-free approaches in order to achieve more sensitive and

simpler quantification results.

Label-free protein quantitation is generally based on two approaches. The first

involves the measurement of ion intensity changes such as peptide peak areas or peak

heights in chromatography (i.e. total or single ion analysis). The second approach is

based on spectral counting of the identified peptides after MS/MS analysis. Peptide

peak intensity and spectral counting are measured for individual LC-MS/MS runs, and

changes in protein abundance are determined by direct comparison between different

analyses.

Relative Quantitation by Peak Intensity

In this approach, relative quantitation of the peptides is achieved by direct

comparison of the peak area of each peptide ion across multiple LC-MS datasets. However,

application of this method to determine protein abundance changes in complex

biological samples has some practical limitations. Differences in sample

preparation and sample injection, in addition to experimental shifts in retention time

and m/z value, significantly affect the direct and accurate comparison of multiple

LC-MS datasets. Therefore, highly reproducible LC-MS performance and careful

chromatographic peak alignment are critical for this quantitation approach[87].
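The comparison step can be sketched in Python as follows. This is a minimal illustration, not the alignment procedure of reference [87]: the function name, the feature representation as (m/z, retention time, peak area) tuples, and the tolerance defaults (10 ppm, 0.5 min) are all assumptions chosen for the example.

```python
from math import log2


def match_and_compare(run_a, run_b, ppm_tol=10.0, rt_tol=0.5):
    """Pair peptide features from two LC-MS runs and compare peak areas.

    Each feature is a (mz, rt_minutes, peak_area) tuple. A feature in
    run_b matches one in run_a when both the m/z difference (in ppm)
    and the retention time difference fall within the tolerances.
    Returns a list of (mz_a, log2(area_b / area_a)) for matched pairs.
    """
    ratios = []
    used = set()  # indices in run_b that are already paired
    for mz_a, rt_a, area_a in run_a:
        best = None  # (rt difference, index in run_b, area in run_b)
        for j, (mz_b, rt_b, area_b) in enumerate(run_b):
            if j in used:
                continue
            ppm = abs(mz_b - mz_a) / mz_a * 1e6
            rt_diff = abs(rt_b - rt_a)
            if ppm <= ppm_tol and rt_diff <= rt_tol:
                # Among candidates, keep the closest in retention time.
                if best is None or rt_diff < best[0]:
                    best = (rt_diff, j, area_b)
        if best is not None:
            used.add(best[1])
            ratios.append((mz_a, log2(best[2] / area_a)))
    return ratios
```

Features that drift outside either tolerance are simply left unmatched here, which is why reproducible chromatography matters so much for this kind of comparison.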

Relative Quantitation by Spectral Count

In the spectral counting approach, the number of identified

MS/MS spectra from the same protein (the spectral count) is compared between

multiple LC-MS/MS datasets. The increase in protein sequence coverage, the number

of identified unique peptides and the number of identified total MS/MS spectra

(spectral count) correlates with the increase in protein abundance. However, of

these three identification measures, only spectral count showed a strong linear

correlation with relative protein abundance, with a dynamic range of over 2 orders of

magnitude. Therefore, spectral counting is considered a simple and reliable index

for relative protein quantification[88]. In comparison to peak intensity, which uses

computer algorithms for automatic LC-MS peak selection, alignment and comparison,

the spectral counting approach is much easier to implement.

However, for accurate and reliable detection of protein changes in complex

mixtures, normalization and statistical analysis of spectral counting datasets are

necessary. One simple normalization method, which accounts for run-to-run

variability, uses total spectral counts[89]. Another approach to normalization,

involving calculation of a normalized spectral abundance factor (NSAF), was

suggested to account for the effect of protein length on spectral count[90].
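Both normalizations can be sketched in a few lines of Python. The function names and the dictionary-based inputs are hypothetical. The NSAF form used here, each protein's spectral count divided by its length and then divided by the sum of those ratios over all proteins in the run, is the commonly cited definition and is assumed to match reference [90].

```python
def normalize_by_total(counts):
    """Divide each protein's spectral count by the run's total count."""
    total = sum(counts.values())
    return {protein: c / total for protein, c in counts.items()}


def nsaf(counts, lengths):
    """Normalized spectral abundance factor for each protein in one run.

    counts:  protein -> spectral count
    lengths: protein -> number of amino acids
    """
    # Spectral abundance factor: count corrected for protein length.
    saf = {protein: counts[protein] / lengths[protein] for protein in counts}
    total = sum(saf.values())
    return {protein: value / total for protein, value in saf.items()}
```

For two proteins with 10 spectra each but lengths of 100 and 400 residues, total-count normalization gives both a value of 0.5, whereas NSAF gives 0.8 and 0.2, reflecting that the shorter protein produced the same number of spectra from fewer possible peptides.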

Zhang et al. compared five different statistical tests on spectral count data

collected by analysis of yeast digests to evaluate the significance of comparative

quantification by spectral counts[91]. These statistical tests were 1) Fisher's exact test,

2) goodness-of-fit test (G-test), 3) AC test, 4) Student's t-test and 5) Local-Pooled-Error

(LPE) test. For datasets with three or more replicates, the Student's t-test was found to

be the best, whereas for datasets with one or two replicates, the Fisher's exact

test, G-test and AC test can be used.
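To make the single-replicate case concrete, here is a minimal pure-Python sketch of a two-sided Fisher's exact test on a 2x2 table. The function name and the table layout (one protein's spectral counts versus all remaining spectral counts, in two samples) are assumptions for illustration and are not taken from reference [91].

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table

                     sample 1   sample 2
    this protein        a          b
    other proteins      c          d

    The p-value sums the hypergeometric probabilities of all tables
    with the same margins that are no more likely than the observed one.
    """
    row1 = a + b          # spectra assigned to this protein
    col1 = a + c          # total spectra in sample 1
    n = a + b + c + d     # total spectra in both samples

    def prob(k):
        # Probability that k of the row1 spectra fall in sample 1.
        return comb(col1, k) * comb(n - col1, row1 - k) / comb(n, row1)

    p_obs = prob(a)
    lo = max(0, row1 - (n - col1))
    hi = min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(prob(k) for k in range(lo, hi + 1)
               if prob(k) <= p_obs * (1 + 1e-9))
```

A call such as `fisher_exact_two_sided(30, 10, 9970, 9990)` would test whether a protein with 30 spectra in one run and 10 in another, out of 10,000 total spectra each, differs between the two runs.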

Relative quantitation by spectral count has been successfully applied for

different clinical applications[92], including analysis of normal and acute

inflammation, biomarker discovery in human saliva proteome in type-2 diabetes[93],

comparison of protein expression in mammalian and yeast cells under different culture

conditions, distinguishing normal and diseased lung cancer samples[94,95], discovery

of phosphotyrosine-binding proteins in mammalian cells and identification of

differential plasma membrane proteins in terminally differentiated mouse cell

lines[95].

Another label-free method, the spectral index, is used to analyze relative protein

abundances in large-scale data sets obtained from biological samples by shotgun

proteomics. The spectral index method combines two

biochemically plausible features, i.e. 1) spectral counts (indicative of relative protein

abundance) and 2) the number of samples within a group with detectable peptides [96].

We used this method to assess differentially abundant proteins between 9 non-

cancerous, normal breast epithelial (NBE) samples and 9 estrogen receptor (ER)-

positive (luminal subtype), invasive malignant breast epithelial (MBE) samples [97].

However, for a low number of replicates of breast cancer samples (n=3), we used

spectral counting (PatternLab software [98]) for determination of differentially

abundant proteins between invasive breast cancer cells and metastatic breast cancer

cells (Chapter 2).
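One way the two features of the spectral index could be combined is sketched below in Python. The specific formula, which weights each group's share of the mean spectral count by the fraction of samples in that group where the protein was detected and then takes the difference, is an assumption for illustration and should be checked against reference [96]. The function name is hypothetical.

```python
def spectral_index(counts_g1, counts_g2):
    """Illustrative spectral index for one protein across two groups.

    counts_g1, counts_g2: spectral counts of the protein in each sample
    of group 1 and group 2. The result lies between -1 (detected only
    in group 2, in every sample) and +1 (detected only in group 1, in
    every sample).
    """
    mean1 = sum(counts_g1) / len(counts_g1)
    mean2 = sum(counts_g2) / len(counts_g2)
    if mean1 + mean2 == 0:
        return 0.0  # protein not observed in either group
    # Fraction of samples in each group with at least one spectrum.
    detected1 = sum(1 for c in counts_g1 if c > 0) / len(counts_g1)
    detected2 = sum(1 for c in counts_g2 if c > 0) / len(counts_g2)
    share1 = mean1 / (mean1 + mean2)
    share2 = mean2 / (mean1 + mean2)
    return share1 * detected1 - share2 * detected2
```

Under this form, a protein seen in every sample of one group and never in the other scores +1 or -1, while a protein with identical counts in both groups scores 0.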

1.3 Microproteomics

Mass spectrometry-based proteomic methods are extensively used to study

global changes in protein expression caused by pathological stimuli in an

organism. Current methods use total protein amounts in the range of

micrograms or milligrams [99] and extensive protein/peptide level separations in order

to achieve comprehensive proteomic analysis. However, in many cases, obtaining

these sample amounts can be challenging or practically impossible. There are several

reasons for low sample availability, e.g. the rarity of the sample itself,

the several hours or days needed to collect many thousands of cells using a technique such

as laser capture microdissection, the need for multiple experiments on a homogeneous sample and

so forth.

One example of such a rare/limited sample type is a brain tissue specimen

related to neurodegenerative diseases such as Parkinson's, Alzheimer's, and Huntington's

disease. These neurodegenerative diseases are characterized by selective degeneration

of particular types of neurons, while the tissue of the rest of the brain remains

pathologically normal[100]. Researchers trying to understand the causes behind these

diseases, are using laser capture microdissection (LCM) to selectively collect

degenerative neurons. However, obtaining even 10,000 to 50,000 neurons is

impractical because the degenerative neurons are limited in numbers[101]. Another

similar example of limited sample amount is malignant cells collected from a solid

tumor. Solid tumors are heterogeneous in composition, i.e. they are made up of

subpopulations of cancer cells, along with stromal elements that collectively form a

microenvironment[41]. The subtypes of malignant cells differ among themselves in

many properties, such as production and expression of cell surface markers, sensitivity

to therapeutics, growth rate, etc. The studies aimed at determining the proteomic

changes in these individual cell types are limited due to the time and cost required to

collect large cell numbers using the LCM procedure. The proteomic analysis of

circulating tumor cells (CTCs), which can be an indicator of potential metastasis, is

thought to provide a noninvasive way of determining tumor metastasis or the impact of

treatment on the number of CTCs[102]. As the number of CTCs circulating in the

blood is very low, advances in proteomics are required to analyze them.

To accomplish microproteomics of clinically relevant and limited amounts of

sample, one must use a minimum number of steps in the proteomic platform, and each

of these steps must limit sample losses[99]. Considerable sample losses during sample

preparation and limited dynamic range of LC-MS/MS system are two main obstacles

in analyzing small protein amounts. In order to improve sample preparation, low

protein binding tubes and the use of MS-friendly acid labile detergents are

suggested[99]. The use of MS-friendly detergents results in shorter extraction and

digestion procedures.

One of the recent examples of low sample proteomics was the analysis of 500-

5,000 CTCs, generating proteomic profiles of ~150-650 proteins[103]. The cells were

lysed using NP-40 detergent, and the detergent was separated by precipitating the

proteins from cell lysate. The in-solution digests of these samples were subjected to

nanoflow LC/Q-TOF analysis. In another approach to a small sample amount,

quantitative comparison of a proteome of LCM collected single pancreatic islets,

containing 2,000-4,000 cells, treated with high and low levels of glucose, was carried

out. The cells were lysed with acid labile detergent followed by in-solution digestion.

Sensitive LC-MS/MS analysis was performed using a low column flow rate and long

chromatographic separation time. In Chapters 2 and 3, we present a sample handling step

based on a short SDS-PAGE run, followed by sensitive LC-MS analysis using

a PLOT column.

1.3.1 Alternative strategies for protein digestion

1.3.1.1 Solvent-based approach

In 2007, Veenstra et al. introduced a membrane protein digestion method with

60% methanol in place of chaotropes, as a membrane protein solubilizing solvent

during trypsin digestion[44]. In this approach, the plasma membrane protein

population was isolated from the human epidermis and dispersed in 50 mM

ammonium bicarbonate, pH 7.9. The proteins were reduced and alkylated using TCEP

and iodoacetamide (IAA), respectively. The membrane proteins were separated using

ultracentrifugation at 100,000 × g. The protein pellet was further solubilized in 60% v/v

methanol in 50 mM ammonium bicarbonate. The proteins were digested by trypsin

(trypsin/protein ratio: 1/20) at 37 °C for 5 hours in the same solubilizing buffer. The

acidified digest was analyzed using two dimensional (SCX/RP) LC-MS.

This strategy was found to have advantages compared to detergent- and

chaotrope-based solubilization: 1) the same methanol-based buffer conditions were

used for solubilization, denaturation and proteolysis; 2) sample dilution and dialysis

steps, which typically decrease solubilizing

capacity and subsequent proteolytic efficiency, were completely eliminated; and 3) methanol and ammonium

bicarbonate, both volatile water-soluble compounds, are removable by lyophilization after

digestion, making the methanol-based buffer approach MS-friendly. Other solvents

such as acetonitrile and trifluoroethanol are also used for solubilization and digestion of

membrane proteins.

1.3.1.2 Cleavable surfactant

Surfactants are stable and strong solubilizing agents. Environmental

concerns, such as the low biodegradability of surfactants, have become one of the

main driving forces for the development of cleavable surfactants. Although cleavable

surfactants were first synthesized many years ago, Norris et al. applied nonacid

cleavable detergents for MALDI mass spectrometry profiling of whole cells [104].

They showed that cleavable surfactants result in an increase in the number of proteins

analyzed by increasing protein solubility. Cl
