Application of Liquid Chromatography-Mass Spectrometry-Based Protein and Proteomic

Analytical Approaches to Chinese Hamster Ovary Cell Based Industrial Biopharmaceutical

Production

by Yuanwei Gao

B.S. in Chemistry, Tsinghua University, Beijing, China

M.S. in Forensic Sciences, Sam Houston State University, Texas, U.S.

A dissertation submitted to

The Faculty of

the College of Science of

Northeastern University

in partial fulfillment of the requirements

for the degree of Doctor of Philosophy

June 28th, 2016

Dissertation directed by

Barry L. Karger

Director of the Barnett Institute, Distinguished Professor

James L. Waters Chair in Analytical Chemistry

Acknowledgements

I would like to take this opportunity to express my sincere gratitude to the people who

helped me during my dissertation, although it is not possible to identify all of them.

First of all, I would like to thank my advisor Professor Barry L. Karger. I first learned a lot

from his wonderful class. We have worked together closely since I joined Dr. Karger’s group, and

it was a valuable experience to have his advice and support. Dr. Karger’s insightful guidance and

great scientific intuition have inspired me. I know for certain that what I learned from him will be

beneficial throughout the rest of my career.

I would like to express my gratitude to Dr. Alexander R. Ivanov because his comments in

scientific discussions and careful explanation of experimental details have helped not only my

research project but also my career development. His deep knowledge and encouragement have

been very valuable to me.

I also want to specifically thank Somak Ray and Simion Kreimer, two colleagues, who

worked very closely with me on my research projects, for their valuable scientific discussions and

contributions.

An acknowledgement also goes to my committee members: Professor Roger W. Giese,

Professor Olga Vitek, and Professor Paul Vouros, for their help in my graduate study.

I would like to thank all of the former and present people in Dr. Karger’s group as well as

the personnel in Barnett Institute: Arseniy Belov, Dr. Siyuan Liu, Dr. Xianzhe Wang, Yu Wang,

Di Wu, Yanjun Liu, Shan Jiang, Dr. Shiaw-Lin Billy Wu, Dr. David R. Bush, Dr. Krishan Kumar,

Dr. Daniel Shujia Dai, Dr. Vennela Mullangi, Dr. Wenqin Ni, Dr. Chen Li, Dr. Zhenke Liu, Dr.

Siyang Li, Dr. Adam Hall, Dr. James Glick, Dr. Suli Liu, Dr. Fan Zhang, Dr. Fangfei Yan, Dr. Ye

Zhang, Victoria Berger, Yang Tang, Zhidan Chen, Nancy Carbone, and Emanuelle Hestermann. I

appreciate their help in both my daily life and scientific research and the friendships which I have

gained during these years.

I also would like to acknowledge my collaborators for their assistance during our

collaboration: Dr. Nicholas R. Abu-Absi, Dr. Michael C. Borys, Dr. Amanda Lewis, Dr. Jin Mi,

Dr. Zhengjian Li, Dr. Mesredin Mussa, Dr. Zizhou Xing, Dr. Zhijun Tan, and Dr. Li Tao from

Bristol-Myers Squibb and Dr. Kristine Brazin and Professor Ellis Reinherz at the Dana-Farber Cancer

Institute.

Finally, I would like to thank my family: my parents and grandparents for their

unconditional love, support, and faith in me.

Abstract of Dissertation

Therapeutic proteins have emerged rapidly over the past several decades, providing

effective and innovative medicines for a wide range of previously refractory human diseases.

Chinese hamster ovary (CHO) cells have become the predominant choice as the cellular expression

system for such therapeutic production in the biopharmaceutical industry. The high throughput of

the protein drug production depends on both the efficient upstream process yielding high product

titers and proficient downstream purification with high product recovery and effective impurity

removal. Numerous efforts have been made in both the upstream and downstream processes of

CHO-based manufacturing to improve productivity. Although advances have been achieved, many

challenges remain. The underlying biology of CHO cell productivity is not yet fully understood,

which hampers efforts to optimize cell cultivation.

Moreover, it is challenging to transfer cell cultivation results obtained at the

bench-top scale to large-scale production bioreactors, since CHO cells frequently behave

differently in bioreactors of different types and sizes. At the same time, efficient

downstream purification is also essential to ensure drug product quality. Considering the potential

safety risks to patients, the identification and quantitation of impurity residues in therapeutic

proteins, especially host cell proteins (HCP), is of great importance but challenging due to the bulk

drug product background. New analytical technologies and strategies which can be applied to the

therapeutic protein production process are needed.

Liquid chromatography-mass spectrometry (LC-MS)-based approaches are a powerful tool

for proteomics and protein analysis, capable of providing the most comprehensive information to

date. LC-MS analysis has been extending the depth and accuracy of proteomics study. Global cell

constituent analysis or ’Omics, including proteomics and metabolomics, can provide in-depth

global characterization of CHO cells. A deeper understanding of CHO biology can potentially

improve the optimization of manufacturing bioprocesses. Moreover, LC-MS-based methods are

also strong candidates for HCP analysis.

This dissertation aims at adapting state-of-the-art LC-MS-based protein and proteomic

approaches to the industrial biopharmaceutical processes, for the benefit of industrial therapeutic

drug production. In Chapter 1, the industrial therapeutic protein production platform is introduced

as well as the technology of LC-MS-based protein and proteomics analysis.

In Chapter 2, a study is presented where a CHO-DG44 production cell line showed

different phenotypic behaviors during the scaling-up process when cultured in the production scale

(5-KL scale) and bench-top scale (20-L) bioreactors with two copper levels in the culture media

for each scale. Relative quantitative proteomics based on high-resolution two dimensional liquid

chromatography coupled to tandem mass spectrometry (2D-LC-MS/MS) was applied. Multi-

omics including proteomics and metabolomics were employed to study CHO cell systems in order

to understand the phenotypic behavior. The results revealed that CHO cells underwent intermittent

hypoxia in the large production bioreactor due to the less efficient oxygen transfer and longer

mixing times compared to the bench-top scale. This resulted in lower productivity and viability

for the production scale.

In collaboration with Simion Kreimer, Ph.D. candidate in chemistry at Northeastern,

Chapter 3 describes a workflow for HCP analysis in a therapeutic monoclonal antibody, taking

advantage of the high resolution capabilities of the Orbitrap mass spectrometer. A spectral library

was developed based on two-dimensional high pH/low pH reversed phase (RP/RP) liquid

chromatography coupled to tandem mass spectrometry (LC/MS/MS) with data dependent

acquisition (DDA). Then, a novel data independent acquisition-to-parallel reaction monitoring

(DIA-to-PRM) approach was developed for HCP identification and quantitative estimation. The

methodology is demonstrated to be capable of detecting HCPs at the low ppm level in the bulk

product background after purification. Several HCPs were quantified with isotopically labeled

peptides as internal standards.

The studies described in this dissertation demonstrate the power of LC-MS-based

approaches to address biopharmaceutical industry needs, by studying CHO biology as well as

evaluating impurities in the final product. In future studies, the findings and methods developed in

this thesis can be applied to improve biopharmaceutical productivity and quality.

Table of Contents

Acknowledgements

Abstract of Dissertation

List of Figures

List of Tables

List of Abbreviations

Chapter 1: Overview of Therapeutic Protein Production by Chinese Hamster Ovary Cells and Liquid Chromatography Mass Spectrometry Based Quantitative Proteomics

1.1 Abstract

1.2 Overview of recombinant therapeutic protein production

1.3 Principle of recombinant biopharmaceutical synthesis by mammalian cell expression systems

1.4 CHO as a therapeutic protein production host

1.4.1 Advantages of CHO expression system for commercial recombinant therapeutic protein production

1.4.2 A brief history of CHO cell lines applied to biotech industry

1.5 The general platform of therapeutic protein production by CHO cells

1.6 Industrial platform of therapeutic protein production by CHO cells

1.6.1 The platform of the upstream process

1.6.2 Challenges of upstream process

1.6.3 The platform of downstream process

1.6.4 Challenges of downstream process

1.7 Current advances for CHO-based therapeutic protein production

1.7.1 Understanding CHO cell production and CHO cell engineering through ‘Omics approaches

1.7.2 Current approaches of host cell protein identification and quantitation

1.8 Introduction of liquid chromatography mass spectrometry-based quantitative proteomics and protein analysis

1.8.1 Two dimensional liquid chromatography

1.8.2 Mass spectrometry

1.8.3 LC-MS based quantitative proteomics and protein analysis

1.8.3.1 Label-free quantitation

1.8.3.2 Labeled quantitation approaches

1.8.4 MS-based proteomic data interpretation

1.8.4.1 Peptide and protein identification

1.8.4.2 Biological analysis

1.9 Conclusion

1.10 Reference

Chapter 2: Combined Metabolomics and Proteomics Reveals Hypoxia as A Cause of Lower Productivity on Scale-up to a 5000-Liter CHO Bioprocess

2.1 Abstract

2.2 Introduction

2.3 Materials and methods

2.3.1 Chemicals and reagents

2.3.2 CHO Cell Culture Conditions

2.3.3 Metabolomic analysis

2.3.4 Sample preparation for proteomics

2.3.5 2D LC-MS/MS

2.3.6 Construction and annotation of DG44 CHO cell proteome database

2.3.7 Protein identification of proteomics analysis

2.3.8 Quantitation and differential expression analysis

2.3.9 Data filtering technique applied on the proteomics data

2.3.10 Interaction network and pathway analysis

2.3.11 Western blotting

2.3.12 Quantitation of fibronectin levels by ELISA

2.3.13 Real-Time PCR

2.4 Results

2.4.1 CHO cell growth and productivity in 5-KL vs. 20-L scale bioreactors with two levels of copper concentration in the media (conducted by Bristol Myers Squibb)

2.4.2 Proteomic and metabolomics analysis platform

2.4.3 Analysis of combined differentially regulated proteins and metabolites in the 5-KL reveals significant reduction in ROS with higher level of copper concentration in the media and no significant copper effect in the 20-L reactor

2.4.4 Hypoxia (intermittent) in 5-KL bioreactor reduces cell viability and productivity

2.4.5 Analysis of additional differentially regulated proteins supports the ROS and hypoxia roles in the 5-KL bioreactor

2.4.6 The differentially regulated proteins related to important biological functions and pathways

2.4.7 Superoxide dismutase 1 is potentially involved in the reduction of intermittent hypoxia and oxidative stress with addition of copper in the 5-KL bioreactor

2.5 Discussion

2.6 Conclusion

2.7 Appendix

2.7.1 Perspective of biological effects caused by additional copper in the media

2.8 Reference

Chapter 3: Identification and Quantitation of Host Cell Proteins in Therapeutic Product

3.1 Preface and Abstract

3.2 Introduction

3.3 Materials and Methods

3.3.1 Chemicals and reagents

3.3.2 Sample preparation

3.3.3 LC-MS/MS

3.3.4 Mass spectrometry parameters

3.3.5 HCP identification and quantitative estimation through 2D LC-MS/MS

3.3.6 HCP quantitation through PRM with isotopically labeled peptides

3.3.7 Spectral assay generation

3.4 Results and discussion

3.4.1 Low pH RP LC gradient optimization

3.4.2 HCP sample preparation protocol

3.4.3 HCP identification and estimation by 2D LC-MS/MS with DDA for PA and CEX samples for preliminary testing

3.4.4 HCP identification by 2D LC-MS/MS with DDA mode for UF/DF samples

3.4.5 HCP quantitation based on PRM and isotopically labeled internal standards

3.4.6 The generation of assay library from 2D-microflow-LC-MS-DDA

3.4.7 The insights provided by the preliminary results from 2D-LC-MS/MS-DDA and the generation of the novel workflow

3.5 Conclusion

3.6 References

List of Figures

Figure 1-1 The fundamental scheme of therapeutic protein production by mammalian cell lines.

Figure 1-2 The classic secretory pathway for recombinant therapeutic protein secretion.

Figure 1-3 The platform of therapeutic protein production by CHO.

Figure 1-4 Several examples of bioreactor types for therapeutic protein production.

Figure 1-5 Two popular bioreactor feeding modes in biopharmaceutical industry.

Figure 1-6 The simplified purification platform based on chromatography for mAb and related proteins such as Fc fusion proteins for downstream process.

Figure 1-7 Information flow in cells and the connection between ‘Omics.

Figure 1-8 The general workflow of proteomics analysis based on liquid chromatography coupled with tandem mass spectrometry (LC-MS/MS).

Figure 1-9 The scheme of an orbitrap.

Figure 1-10 Construction of the Q Exactive.

Figure 1-11 The scheme of (A) data dependent acquisition (DDA) and (B) data independent acquisition (DIA).

Figure 1-12 The schemes of PRM and SRM processes.

Figure 1-13 The categories of labeling approaches.

Figure 1-14 The reaction of enzymatic labeling of 16O/18O (115).

Figure 1-15 Chemical reaction of dimethyl labeling.

Figure 1-16 The scheme of (A) isobaric labeling reagents and (B) labeled peptide.

Figure 1-17 TMT 6-plex labeling reagents and the technology principle.

Figure 1-18 Chemical structure of major isobaric reagents.

Figure 2-1 The cell density, viability, titer productivity, and lactate profiles of the 5-KL and 20-L bioreactors.

Figure 2-2 Prediction of significantly repressed biological functions related to cell fate for the 5-KL bioreactor using IPA.

Figure 2-3 Prediction of significantly repressed biological functions related to ROS generation for the 5-KL bioreactor using IPA.

Figure 2-4 Prediction of the formation of ROS for 5-KL vs 20-L scales with low and high copper conditions.

Figure 2-5 Results demonstrating hypoxic stress.

Figure 2-6 Western blotting of SOD1, a copper-binding enzyme, for the 5-KL and 20-L scales and under different copper levels.

Figure 2-7 The scheme of the summary that increased copper reveals hypoxia as a cause of lower productivity on scale-up to industrial CHO bioprocess.

Figure 2-8 Prediction of significantly activated biological functions for the 20-L bioreactor using IPA at Day 6.

Figure 3-1 The number of proteins as a function of PSM counts for PA and CEX samples.

Figure 3-2 The calibration curves of standard peptides from TAA SpikeTide Set.

Figure 3-3 Precursors and fragments of the peptide VHSFPTLK of protein disulfide-isomerase.

Figure 3-4 The scheme of the DIA-to-PRM HCP analysis workflow.

List of Tables

Table 2-1 The number of identified and quantified proteins with the 5-KL and 20-L bioreactors from the proteomic data analysis.

Table 2-2 Numbers of differentially regulated proteins and metabolites at each time point for both scales. Differential regulation was determined by comparing the high and low copper conditions of a given scale.

Table 2-3 MetaCore analysis of proteomic data of the 5-KL scale. The significantly differentially regulated proteins related to apoptosis and cell adhesion pathways.

Table 2-4 Differentially regulated proteins and metabolites with the 5-KL and 20-L bioreactors.

Table 3-1 The number of identified HCPs and peptides for different lengths of the LC separation gradient.

Table 3-2 The list of identified HCPs in the PA sample with at least 50 PSM counts.

Table 3-3 The list of identified HCPs in the CEX sample with at least 10 PSMs and their corresponding PSM counts in the PA sample.

Table 3-4 The identified HCPs in the UF/DF sample and their PSM counts in the PA sample*.

Table 3-5 Peptide pairs chosen from SpikeTide Set TAA, their identification and calibration linear range against the post-ultrafiltration digested sample.

Table 3-6 Target peptides and quantitation results for peptides from several HCPs.

Table 3-7 The quantitative information of the selected HCPs of the two biological replicates.

List of Abbreviations

%V %Viability

2-DE Two dimensional gel electrophoresis

ARSA Arylsulfatase A

BCA Bicinchoninic acid

CDACF Chemically defined, animal component-free

CEX Cation exchange chromatography

CHO Chinese hamster ovary

DDA Data dependent acquisition

DIA Data independent acquisition

DiART Deuterium isobaric amine-reactive tag

DiLeu N,N-dimethyl leucines

DTT Dithiothreitol

ESI Electrospray ionization

FT-ICR Fourier transform ion cyclotron resonance

GPx Glutathione peroxidase

GSH Reduced glutathione

HCPs Host cell proteins

HIC Hydrophobic interaction chromatography

HILIC Hydrophilic interaction chromatography

HPLC High performance liquid chromatography

IAM Iodoacetamide

IPA Ingenuity Pathway Analysis

iTRAQ Isobaric tag for relative and absolute quantitation

LC-MS/MS Liquid chromatography tandem mass spectrometry

Lys-C Lysyl endopeptidase

mAb Monoclonal antibody

PA Protein A

PAT Process analytical technology

PD Proteome Discoverer

pI Isoelectric point

PRM Parallel reaction monitoring

PSMs Peptide spectral matches

Qp Specific production rate

ROS Reactive oxygen species

RP Reversed phase

SEC Size exclusion chromatography

SOD1 Superoxide dismutase 1

TAA Tumor associated antigens

TEAB Triethylammonium bicarbonate

TMT Tandem mass tag

TOF Time of flight

VCD Viable cell density

Chapter 1: Overview of Therapeutic Protein Production by Chinese

Hamster Ovary Cells and Liquid Chromatography Mass

Spectrometry Based Quantitative Proteomics

1.1 Abstract

Development of recombinant therapeutic proteins has led to a significant revolution in

modern medicine. Success in large-scale biopharmaceutical production is key to bringing

sufficient amounts of drugs to market and thereby benefiting numerous patients. Chinese hamster

ovary (CHO) cell lines have become the predominant choice as the production host of therapeutic

proteins in the biopharmaceutical industry. Numerous efforts have been made to increase the

therapeutic protein productivity through optimization of industrial upstream and downstream

processes. Despite the significant advances achieved, many challenges remain. Currently,

bioprocess development is still empirical, laborious, and time-consuming due to the limited

understanding of CHO biology. Cell cultures characterized at small scale often do not

behave the same way at production scale, hindering prediction of productivity in

production-scale bioprocesses. Meanwhile, the evaluation of low-abundance impurity residues

in the final product, especially host cell proteins (HCPs), requires detection methods with high

sensitivity and wide dynamic range, which traditional approaches cannot achieve. New

instrumentation and bioinformatics tools for studying and analyzing proteins and proteomes

have been developed rapidly, providing powerful means to address challenges in the

biopharmaceutical industry. In this chapter, CHO-based therapeutic protein production,

including its principles and industrial manufacturing processes, is reviewed first, together with

advances and challenges to date. Then, quantitative protein and proteomics techniques based

on liquid chromatography coupled with mass spectrometry (LC-MS) and related quantitation

approaches are described.

1.2 Overview of recombinant therapeutic protein production

The revolution in modern medicine led by recombinant therapeutic protein products has

emerged over the past several decades. The first recombinant therapeutic protein drug, human

insulin from Eli Lilly, was approved for clinical use in 1982, marking the beginning of the success

of biopharmaceuticals (1). By 2014, over 200 biopharmaceuticals had been approved in the

United States and European Union and were commercially available, with estimated annual revenue

of more than 100 billion dollars and growing (1, 2). Recombinant therapeutic protein products

include monoclonal antibodies, recombinant fusion proteins, cytokines, hormones, and blood

products (3). To date, these products have provided effective therapies for a wide range of

previously refractory human diseases such as cancers and immunological disorders. Notably,

among the biopharmaceuticals approved in recent years, the fraction of monoclonal antibodies

(mAb) and related proteins (e.g., Fc fusion proteins) is steadily increasing, reaching around 50%

of the overall biopharmaceutical approvals in the 2010-2014 time period (1).

Recombinant therapeutic proteins are produced by cell hosts, which are genetically

engineered with recombinant DNA encoding the drug product. Therapeutic proteins need to be

synthesized in their biologically active forms to be effective therapies, requiring correct protein

folding (higher order structure) and post-translational modifications (PTMs). Mammalian cell

lines are competitive host candidates for certain products such as monoclonal antibodies (mAb)

because other hosts, such as microbial hosts, may not be capable of generating critical PTMs,

especially glycosylation. Among several choices of mammalian cell lines for recombinant

therapeutic protein production, Chinese hamster ovary (CHO) cells are the most widely used cell

lines and have become the workhorse of therapeutic protein production in industry (4). Since the

therapeutic drugs involved in this thesis are mAbs and related proteins, the discussion in this chapter

will focus on such biopharmaceuticals and CHO cell host expression systems.

The practical impact of therapeutic proteins in the real world could not have

been achieved without the success of large-scale biopharmaceutical production. After the

production cell lines are selected and developed, bioprocess operation is optimized and then scaled

up to production bioreactors (kiloliter, kL). With large-scale bioprocess cultivation, large amounts

of the drug product can be harvested from the biomass, followed by downstream purification.

Finally, the high purity therapeutic protein products can enter the market.

Improvements in productivity, which are critical to increasing availability to patients, can

rely on genetic engineering of the cell host and optimization of the manufacturing process. However,

such bioprocess development is generally empirical due to the limited understanding of the biology

underlying CHO production of the product. At the same time, the evaluation of residual impurities,

especially host cell proteins (HCPs), in the drug product is of great importance for drug quality

control, considering the potential safety risks of such impurities for patients. However, HCP

identification and quantitation are still challenging due to the fact that the HCPs can be at the low

part-per-million (ppm) level.

New instrumentation and bioinformatics tools have been applied to global cell biologics or

“omics” studies including MS-based quantitative proteomics, metabolomics, and genomics. As

powerful tools, these omics methodologies are able to provide comprehensive characterization of

CHO cells and the therapeutic protein product, potentially improving the understanding of the

manufacturing process. In proteomics in particular, the development of high resolution mass

spectrometry (MS) as well as high efficiency liquid chromatography (LC) separation provides

versatile LC-MS based strategies for shotgun proteomics as well as targeted protein quantitation,

platforms which have high potential not only for studying the systems biology of CHO expression systems

but also for HCP identification and quantitation.

This chapter will first briefly describe the principle of recombinant therapeutic protein

synthesis by mammalian cell lines, and then discuss the general platform of how

biopharmaceuticals turn into life-saving formulated drugs. Common upstream and downstream

processes will be discussed, since this information provides the big picture of which issues need

to be addressed and what strategies could be potential solutions. Current advances in LC-MS based

protein and proteomics analysis and multi-‘Omics studies will be introduced, as LC-MS based

methods can address many of the challenges in the biopharmaceutical industry.

1.3 Principle of recombinant biopharmaceutical synthesis by mammalian cell expression

systems.

Recombinant protein products, especially mAbs and related proteins, are generally

synthesized by mammalian cell lines. These hosts include hybridoma cell lines, mouse myeloma

cells, human embryonic kidney 293 cells (HEK-293), baby hamster kidney cells (BHK-21), and

Chinese hamster ovary cells (CHO) (5-8).

Figure 1-1 The fundamental scheme of therapeutic protein production by mammalian cell lines.

The DNA sequence encoding the protein drug of interest is cloned into the expression vectors, and

these vectors are then introduced into host cells and integrated into the genome. The transfected

host cells undergo protein synthesis through transcription, translation, and subsequent

post-translational modification. The resultant recombinant protein is secreted into the culture media.

Through several rounds of screening, clones which can express a relatively high amount of the

protein drug are selected as possible production cell line candidates.

The scheme of the fundamental principles for therapeutic protein production by

mammalian cell lines is shown in Figure 1-1. The DNA sequence encoding the protein drug of

interest is cloned into the expression vector, whose sequence is designed and optimized

for protein expression in host cells. The vector carrying the DNA sequence of interest is introduced

into the host cell and then integrated into the host genome. Then, the transfected host cell

undergoes protein synthesis. The synthesized recombinant therapeutic protein is transported

through the classical secretory pathway out of the cell. The protein product can then be harvested

from the cell culture media. Clones which can express a relatively high amount of protein with a

good cell growth profile, while generating the desired protein structure (including glycosylation),

are selected as the production cell line candidates through multiple rounds of selection screening.

The classical secretory pathway is illustrated in Figure 1-2. The genes encoding the protein

of interest are transcribed into mRNA and then translated into nascent peptides of the target protein.

The protein is designed to be expressed with a signal peptide at the N-terminus, which can be

recognized by the “signal recognition particle”, a protein-RNA complex, leading to translocation

of the nascent peptide into the endoplasmic reticulum (ER) (9). Once the growing peptide chain

is inserted into the ER membrane, the signal peptide is cleaved by the signal peptidase (9), and

the translation process continues. Within the ER, proteins undergo folding and PTM formation

such as disulfide bonding and glycosylation, and are then transported to the Golgi apparatus (10). The

resultant proteins are encapsulated in a vesicle formed from the Golgi apparatus, and delivered to

the cell membrane. The vesicle fuses with the cell membrane, releasing the protein into the

extracellular space.

Figure 1-2 The classic secretory pathway for recombinant therapeutic protein secretion.

The gene which encodes the protein of interest is transcribed into mRNA, and the translation

process starts at the ribosome. A short peptide, called the signal peptide, is present at the N-terminus of

the nascent peptide of the therapeutic protein and directs the nascent peptide towards the secretory

pathway. The signal peptide can be recognized by the signal recognition particle, resulting in

translocation of the nascent peptide into the ER. Then the translation of the therapeutic protein

continues in the ER, where protein folding and PTM formation occur. The resultant proteins

are then transported to the Golgi apparatus, and passed into the extracellular space.

A major advantage of collecting protein products in the media instead of accumulating

them inside the host cells is that it facilitates the subsequent purification. Biomass, including cells and

debris, can be removed simply by centrifugation and filtration. It is worth mentioning that the

media should be protein-free or low in protein content to maximize the advantage of

secreting protein drugs. Mammalian cell lines generally require animal component-containing

media, such as media supplemented with bovine serum, because they need hormones and growth factors for growth.

Potential contamination of the drug product by components of such media

raises safety concerns since the protein drugs are secreted into the media. Clearly, animal

component-free media are highly desired for commercial drug production, and mammalian cell

lines which can be cultured in chemically defined media are the primary choices.

1.4 CHO as a therapeutic protein production host.

The success of CHO-based therapeutic products began with the approval in 1987 of the first

therapeutic protein produced in a CHO cell line, tissue plasminogen activator (r-tPA), from

Genentech (11). To date, nearly 70% of all therapeutic proteins are produced in CHO cell lines

(4).

1.4.1 Advantages of CHO expression system for commercial recombinant therapeutic

protein production

CHO cells have several important features making them the most popular and widely used

production mammalian cell line in industry. First, CHO cells readily incorporate artificially

transfected genes and are able to express large amounts of the desired protein. Secondly, as a

mammalian cell line, CHO cells can provide the appropriate glycoforms for the glycoprotein

therapeutics, which cannot be easily achieved by common microbial hosts such as Escherichia

coli. Over the past three decades, CHO cell lines have built up a record of producing proteins which are bioactive in humans

with compatible glycoforms. Thirdly, CHO is not a susceptible host for

a large number of human pathogenic agents. A report in 1989 showed that CHO did not propagate

at least 44 human pathogenic viruses including HIV and polio (4). This means that these

pathogenic viruses, which could infect patients, are not expected to be present in CHO-based therapeutic products.

CHO cells also have the advantage of high adaptability to industrial large-scale production.

Although the original CHO cells are adherent in culture, CHO cells are able to grow

to high densities in suspension, making it possible to scale up to thousand-liter bioreactors.

Also, as noted, CHO cells can grow in chemically defined, animal component-free (CDACF)

media. The complexity of chemically defined media is generally limited, benefiting the

downstream purification with fewer contaminants from the media to remove and to monitor.

A successful record, together with knowledge and expertise about the safety and efficacy of CHO-based

therapeutics on the market, has been accumulated over the past two decades. This information eases

FDA approval of new drugs made in CHO cell lines. In addition, the experience and understanding

of upstream CHO cell growth and downstream purification make CHO cells the likely

first choice for industrial production for the next several decades. Consequently, the study of

CHO biology is important to support the biopharmaceutical industry with productivity

improvements of current products and new drugs in the future. In Chapter 2, the underlying biology

of CHO cell lines is studied at both the production and laboratory scales.

1.4.2 A brief history of CHO cell lines applied to biotech industry.

CHO cells were isolated from a female Chinese hamster ovary and first established in

culture plates by Dr. Theodore T. Puck in 1957 (12). After a period of time, cells underwent

spontaneous immortalization, likely due to a genetic change (13). These immortalized original CHO cells

were then provided to several laboratories, and many strains of CHO were

generated from this original cell line and its descendants, for example, CHO-K1, CHO-DXB11, and

CHO-DG44; CHO-DXB11 was itself later derived from CHO-K1. The original CHO cells

could only grow in adherent culture, but many of their derivatives are able to grow in suspension

culture. To date, these strains are widely employed as parental cell lines in the biopharmaceutical

industry (4, 13). Importantly, CHO cells were reported to grow successfully in serum-free

media as early as 1977 (14). In Chapter 2, the CHO production cell line under investigation is

CHO-DG44.

1.5 The general platform of therapeutic protein production by CHO cells

To bring CHO cell lines into industrial production of a therapeutic protein, one needs

to 1) develop a production cell line in the lab and then 2) transfer this cell line to large-scale

industrial production. The latter part, the industrial production platform, contains two major parts:

1) scale-up of the bench-top cultivation to large-scale bioreactors, including the development

and optimization of the bioprocess operation, and 2) purification of the protein product after harvest. The

large-scale cultivation is called the upstream process, and the purification is the downstream

process. One example of this platform applied to therapeutic antibodies and related drugs is shown

in Figure 1-3. All of the listed steps play a critical role in the success of protein drug production

with high quantity and quality within reasonable cost and time. Current knowledge and

development of each step will be introduced in the following sections, as well as the remaining

challenges.

Figure 1-3 The platform of therapeutic protein production by CHO.

A. Production cell line development at the bench-top. B. Therapeutic protein production at

industrial large scale, shown for the example of therapeutic mAbs and related proteins such as Fc

fusion proteins (15). Reprinted with permission from Shukla et al. (15). (a) Upstream bioprocess.

After the cells are taken from cell banks and thawed, a series of expansion steps are performed

with seed bioreactors, and then cells are transferred to the production bioreactor. The biomass is

removed from the product by centrifugation and filtration. (b) Downstream purification. For mAbs

and related proteins, Protein A chromatography is the first step to clean up most of the impurities.

At least one additional polishing chromatography step, such as ion exchange, removes

further impurities. Ultrafiltration/diafiltration (UF/DF) is applied to transfer the product into

the formulation buffer at the desired concentration.

1.6 Industrial platform of therapeutic protein production by CHO cells

With the success of biopharmaceuticals in modern medicine, the large patient population,

and the generally high doses, especially for mAbs and related proteins, there is a need for a very

large amount of product with consistent and reproducible quality. The biopharmaceutical industry

is under pressure to bring sufficient product to market at lower cost to payers. Thus, the

development of a high-yielding, scalable and robust biopharmaceutical production process is

always a significant focus in industry. Such demand must be met by both upstream

processing with high titer at large manufacturing scale and downstream processing with efficient

purification of drug substances. In the following section, the general design of upstream and

downstream processes will be described, focusing on the production of mAbs and related proteins by CHO cells.

1.6.1 The platform of the upstream process

An understanding of the upstream process is of great importance for studying the CHO growth

profile and underlying biology. The engineering design determines the cell cultivation

environment, and this information provides hints about potential sources of cell growth stress

and their causes. In Chapter 2, the CHO cells under investigation were cultured in stirred

tank bioreactors in fed-batch mode, both of which will be introduced in this section.

The upstream process has seen significant advances. The production bioreactor has been

scaled up to as large as 200,000 L to date (16), and the productivity has improved from 0.05 to

2-10 g/L (3). The general workflow of the upstream process is shown in Figure 1-3B(a). The

upstream process development and optimization generally focuses on 1) designing the production

bioreactor configuration and 2) optimizing the bioprocess control such as pH, temperature,

medium and feeding. In the upstream process, the bioreactor design and manufacturing control

determine the culture conditions such as nutrient supply and dissolved oxygen concentration,

which are critical for product quality and quantity.

Figure 1-4 Several examples of bioreactor types for therapeutic protein production.

A. Stirred tank bioreactor, B. Airlift bioreactor, C. Disposable wave reactor (bench-top scale).

Reprinted with permission from Jain et al. (17).

For the bioreactor, the ideal design must satisfy several requirements: 1) sufficient mass

transfer; 2) adequate oxygen supply; and 3) low shear stress (18). One of the most popular bioreactor

designs for suspension culture is the stirred tank, which is widely used for biopharmaceutical production. As

shown in Figure 1-4A, the impeller blades stir to mix oxygen and nutrients within the culture

medium inside the bioreactor. The agitation rate, as well as the shape and diameter of

the impeller blades, is optimized to reach acceptable mass and gas transfer and to minimize cell lysis

caused by turbulence. This stirred-tank bioreactor can be scaled up conveniently, and the product

quality can be controlled relatively easily. It is one of the most important reactor designs in

industrial production.

The airlift bioreactor (Figure 1-4B) is another large-scale reactor design compatible with

suspension culture, in which mass transfer and oxygen mixing are achieved by introducing gas

bubbles that move through the bioreactor. Gas (air or another gas mixture) is introduced into one part of

the reactor, and a non-gassed circulating flow is generated in the other region of the reactor (Figure

1-4B). The geometric design of the reactor and the operational parameters can be optimized to

increase mass transfer efficiency and to reduce shear stress (19). Without the mixing blades used in

the stirred-tank reactor, the airlift bioreactor generates less shear stress and is more energy efficient.

Disposable bioreactors are employed not only in small-scale production but also at the

thousand-liter scale. Compared to traditional hard-piped bioreactor configurations, this single-use

bioreactor system has advantages including lower capital investment cost, more flexibility, higher

process replication, and lower risk of cross-contamination without the need for cleaning and

sterilization (17, 20). The economic benefits have encouraged the use of disposable bioreactors

in the biopharmaceutical industry, and several vendors have provided commercial disposable

stirred-tank bioreactors up to the 1,000 L scale (21). The trend is toward adoption of the disposable bioreactor,

which will likely be the future of biopharmaceutical production. There are several designs for

disposable bioreactors, including wave bioreactors, orbital shaken bioreactors, and stirred-tank

bioreactors (17, 20, 22); see Figure 1-4C. A working scale of up to 2,000 L has been reported (21).

Despite the current advances, challenges remain, such as the limited scalability, restricted design

options, and the lack of standardization (22). Moreover, the extractables from the disposable

bioreactor material may affect cell growth performance and drug quality.

Besides the bioreactor design, the bioprocess operation is also critical for cell growth and

drug productivity, including growth media composition, feeding, pH, temperature, etc. The feeding

methodology is one of the most important factors for optimization of the process. Ideally,

nutrients should be at a level sufficient for cell growth without affecting product quality, and the

accumulation of metabolic waste should be minimized as much as possible. Other factors such as pH

should also be kept at suitable levels for cell growth. The accumulation of lactate and ammonia

has been widely reported to impair cell growth performance (17, 23, 24). Notably, the

glycosylation pattern of the glycoprotein product can be affected by nutrient starvation, media

components, metabolic waste accumulation, and pH (25-29), potentially disturbing the quality of

drug product.

Fed-batch and perfusion are the most popular feeding modes currently applied in industry

(8) (Figure 1-5). In fed-batch (Figure 1-5A), nutrients necessary for the culture are added

intermittently or continuously into the bioreactor during the cultivation time. The drug product is

usually harvested at the end of the operation, resulting in a high concentration of product in the

medium. Fed-batch operation is also easy to adapt to different clones. The nutrients can be maintained

at certain levels, and the pH of the media can be controlled, but the waste will accumulate in

the reactor (17, 23, 30). In perfusion mode (Figure 1-5B), on the other hand, fresh media is added,

and the cell-free spent media is removed from the bioreactor (31). This process can decrease waste

accumulation and keep nutrients at the desired level. Cultivation can last much longer than

in fed-batch mode. However, lower cell viability is observed, despite high cell density, due to the

accumulation of dead cells and released intracellular biomass (8). A procedure called “bleeding”,

which involves removing cell-containing media at a low flow rate, is necessary to decrease the

cell death rate and increase viability. Viable cell density, however, decreases with “bleeding”

because of unavoidable removal of viable cells during the process (17).
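
The contrast between the two feeding modes can be illustrated with a minimal mass-balance sketch. It assumes, purely for illustration, a constant viable cell density X, a constant specific waste production rate q, a constant culture volume, and no waste at time zero. The function names and all numerical values are hypothetical and are not taken from the cited studies. Under these assumptions waste accumulates linearly in fed-batch mode, whereas in perfusion mode medium exchange at dilution rate D drives the waste concentration toward a plateau of qX/D.

```python
import math


def waste_fed_batch(q_waste, cell_density, hours):
    """Waste concentration (mmol/L) when no spent medium is removed.

    q_waste: specific waste production rate, mmol per 1e9 cells per hour
    cell_density: viable cell density, 1e9 cells per liter (assumed constant)
    """
    return q_waste * cell_density * hours


def waste_perfusion(q_waste, cell_density, dilution_rate, hours):
    """Waste concentration (mmol/L) with continuous medium exchange.

    Solves dW/dt = q*X - D*W with W(0) = 0, where D is the dilution
    rate in 1/h (reactor volumes exchanged per hour).
    """
    plateau = q_waste * cell_density / dilution_rate
    return plateau * (1.0 - math.exp(-dilution_rate * hours))


if __name__ == "__main__":
    # Hypothetical illustration values, not measured data.
    q, x, d = 0.01, 10.0, 0.04
    for t in (24, 120, 240):
        print(t, round(waste_fed_batch(q, x, t), 2),
              round(waste_perfusion(q, x, d, t), 2))
```

With these placeholder values the fed-batch curve keeps rising, while the perfusion curve levels off near qX/D = 2.5 mmol/L. The sketch does not model cell growth, cell death, or bleeding.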

Figure 1-5 Two popular bioreactor feeding modes in biopharmaceutical industry.

A. Fed-batch. B. Perfusion. Modified and reprinted with permission from Birch et al. (24).

1.6.2 Challenges of upstream process

Cell cultivation conditions defined by both bioreactor design and operation control are

crucial in achieving high productivity. Large-scale bioreactor facilities are expensive and cannot

easily be rebuilt or significantly readjusted without high cost. As a result, in practice,

optimization of bioprocess operation becomes the key factor for improving culture performance with

existing bioreactor configurations.

For bioprocess development, cell culture is first characterized at the laboratory scale and

then scaled up to a large production bioreactor. The first notable challenge is that the process

development is to some extent empirical. Currently, the understanding of CHO biology is limited,

hindering the prediction of cell responses to environmental perturbations. Consequently,

bioprocess optimization requires extensive experimentation, which is laborious and time

consuming. To reduce the resource and time burden, small volume reactors at the

liter scale are widely employed for bioprocess development because of their easy handling and low

cost (32). Process development has even been driven to the smaller milliliter scale, combined with

robotics technology to reach high throughput (33, 34).

Developing the bioprocess at small scale leads to another

significant challenge of the upstream process, scalability. Clones developed in bench-top bioreactors

may not behave in a similar way after scaling up to large-scale bioreactors with seemingly identical

parameters, hindering prediction of productivity in the production-scale bioprocess. In particular, it

is known that productivity is often lower in large (kL) relative to small (L) reactor scales (17, 35,

36). The main reason is that, due to the restriction of bioreactor physical design, not all of the

variables in the bioprocess can be maintained simultaneously during the scale-up from a liter-level

scale to a thousand-liter bioreactor. Another reason is that some variables that strongly influence

culture performance are not readily measurable, hindering efforts to evaluate and control them.

For example, constant mixing time in the reactor usually cannot be used

as a scaling criterion. Mixing time is an overall indicator of mass transfer and gas mixing

efficiency, but it is not convenient to measure. In addition, to keep the same mixing time

as at small scale, the specific power input at large scale can be impractically high (35, 37).
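
The scale dependence of the specific power input can be made concrete with a textbook scaling sketch. It assumes geometrically similar stirred tanks (impeller diameter D proportional to the cube root of volume V), fully turbulent flow with a constant impeller power number, and a mixing time inversely proportional to the agitation rate N. These are common engineering approximations rather than results from the cited references, and the function names are hypothetical. Under these assumptions the power per unit volume scales as N^3 D^2, so holding mixing time constant (constant N) makes the specific power input grow with the 2/3 power of the volume ratio.

```python
def power_per_volume_ratio_at_constant_mixing_time(v_small_l, v_large_l):
    """Factor by which P/V must rise to keep mixing time unchanged.

    Constant mixing time implies constant agitation rate N, and
    P/V ~ N**3 * D**2 with D ~ V**(1/3), hence P/V ~ V**(2/3).
    """
    return (v_large_l / v_small_l) ** (2.0 / 3.0)


def mixing_time_ratio_at_constant_power_per_volume(v_small_l, v_large_l):
    """Factor by which mixing time rises if P/V is held constant instead.

    Constant N**3 * D**2 gives N ~ D**(-2/3); with mixing time ~ 1/N
    and D ~ V**(1/3), mixing time ~ V**(2/9).
    """
    return (v_large_l / v_small_l) ** (2.0 / 9.0)


if __name__ == "__main__":
    # Hypothetical scale-up from a 2 L bench-top to a 2,000 L reactor.
    print(power_per_volume_ratio_at_constant_mixing_time(2.0, 2000.0))
    print(mixing_time_ratio_at_constant_power_per_volume(2.0, 2000.0))
```

For a 1,000-fold increase in volume, the first function returns about 100 and the second about 4.6. Under the stated assumptions, matching the small-scale mixing time would therefore require roughly a hundredfold higher specific power input, while holding the specific power input constant lengthens the mixing time several-fold.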

Specifically, CHO cells do not have a protective cell wall, and agitation in the bioreactor

must be controlled within a certain range to prevent cell damage (38). Thus, in large-scale CHO

cultivation, the relatively low agitation inevitably leads to limited mass transfer and gas-phase

mixing, resulting in substrate gradients (35, 38-40), which would impact the cell growth

performance and product quality. Oxygen transfer efficiency is one of the critical factors in the

scale-up process. Oxygen generally enters the bioreactor as sparged bubbles. Because of the limited

solubility of the gas in aqueous solution, only small amounts of oxygen can be dissolved in the culture

media. Meanwhile, cells consume oxygen rapidly before it can be dispersed across the entire

culture (39). Consequently, homogeneity of dissolved oxygen is generally compromised in large

scale bioreactors. In Chapter 2, different CHO cell growth profiles and behaviors during scale-up

were investigated, and the limited homogeneity of dissolved oxygen is shown to be the cause of

the observed differences relative to the lab scale process.
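
The effect of limited oxygen transfer can be sketched with the standard dissolved-oxygen balance, dC/dt = kLa (C* - C) - qO2 X, where kLa is the volumetric mass transfer coefficient, C* the saturation concentration, qO2 the specific oxygen uptake rate, and X the viable cell density. This is a well-mixed, single-compartment approximation that is not taken from the work described here. The function name is hypothetical, and the numbers are placeholders rather than measured values. At steady state the balance gives C = C* - qO2 X / kLa, so a lower kLa (for example, from gentler agitation) directly lowers the dissolved oxygen available to the cells.

```python
def steady_state_dissolved_oxygen(kla_per_h, c_sat_mmol_l, q_o2, cell_density):
    """Steady-state dissolved oxygen (mmol/L) in a well-mixed reactor.

    kla_per_h: volumetric mass transfer coefficient, 1/h
    c_sat_mmol_l: oxygen saturation concentration, mmol/L
    q_o2: specific oxygen uptake rate, mmol per 1e9 cells per hour
    cell_density: viable cell density, 1e9 cells per liter

    Sets dC/dt = kLa*(C_sat - C) - q_o2*X to zero and solves for C.
    A negative result means demand exceeds supply; it is reported as
    0.0 to flag an oxygen-limited culture.
    """
    uptake_rate = q_o2 * cell_density  # mmol/L/h
    concentration = c_sat_mmol_l - uptake_rate / kla_per_h
    return max(concentration, 0.0)


if __name__ == "__main__":
    # Placeholder values chosen only to show the trend.
    for kla in (40.0, 20.0, 10.0):
        print(kla, steady_state_dissolved_oxygen(kla, 0.2, 0.3, 10.0))
```

Because a production-scale reactor is not well mixed, local concentrations can deviate from this average in either direction. The sketch only shows how the average responds to the transfer coefficient.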

Despite the significant advances achieved in the upstream process,

many challenges remain. Because the underlying biology is not fully understood, it is not

easy to predict or model the cell growth characteristics and production, especially in large-scale

bioreactors. The understanding of how bioprocess operation can influence product quality (e.g.

glycosylation patterns) is also limited. Additional optimization, with monitoring of

the quality of the drug product, is therefore required. All of these steps must be repeated for every new drug

product.

1.6.3 The platform of downstream process

After upstream cultivation, the drug product carries along undesired impurities which can

be product-related and/or process-related. Product-related impurities are

product variants with properties different from the desired product, such as degraded or aggregated

protein drug or molecules with undesired PTMs or misfolding (41). Process-related

impurities are defined as those introduced by bioprocess manufacturing. They include host cell

proteins (HCPs), DNA/RNA, lipids, and small molecule chemicals from media and host cells, and

leachables (e.g. Protein A) (41). Impurities co-purified with the drug product can pose

safety risks for patients. The goal of downstream processing is to remove these impurities as

fully as possible with high recovery of the protein product while maintaining activity. The

introduction of the downstream process in this section will focus on the purification of mAbs and

related proteins. The downstream purification of such proteins has evolved towards a common

platform in industry due to the general nature of this type of drug product. One example of

downstream purification is shown in Figure 1-3B (b). However, because of the various properties

of different products, there is no universal template or procedure which can be employed directly.

mAbs and related protein products are secreted into the media, and hence, the removal of

cells and cell debris is the first step of purification, which is also called cell culture harvest. For

large scale production, centrifugation is employed because it is economical and easily scalable for

large volumes (42). Then, depth filtration, which uses several layers of porous

material as the filtration medium to trap particles, follows centrifugation to remove any residual

cellular debris. A series of capture and polishing steps are performed to eliminate other impurities.

This part of the purification process can be categorized into two groups, 1) chromatography-based

and 2) non-chromatographic, both of which will be introduced in detail later. Additionally, drug

product purification also requires the removal of potential viruses, especially for the mammalian

cell host, so typically at least two orthogonal steps of viral inactivation and size-based filtration

are needed in the downstream process (15, 24). To complete the downstream process, the last

step, ultrafiltration/diafiltration (UF/DF), is employed to reduce the storage volumes and to bring

the drug product into the formulation buffer through buffer exchange (15). A membrane with a

specific molecular weight cutoff is used, and the transmembrane pressure and cross-flow rate during the

filtration process are controlled for most mAbs and related proteins (42).

Chromatography-based approaches have high purification efficiency and have been widely

used in the biopharmaceutical industry. In the majority of cases, Protein A affinity chromatography

is used at the beginning as the capture step, followed by at least one ion exchange step

and then hydrophobic interaction (HIC) or size exclusion chromatography (SEC) as polishing

steps. An example of chromatography-based downstream purification that is widely adopted

by the biopharmaceutical industry is shown in Figure 1-6.

Figure 1-6 The simplified purification platform based on chromatography for mAb and related

proteins such as Fc fusion proteins for downstream process.

The purification of mAbs and related proteins usually starts with Protein A affinity chromatography,

a step in which most of the impurities can be removed. Then one or several other chromatography

steps, such as ion exchange, hydrophobic interaction (HIC), or size exclusion chromatography

(SEC), called polishing steps, are applied to purify the drug product further. Finally, an

ultrafiltration/diafiltration (UF/DF) step is employed to reduce the storage volumes and to bring

the drug product into the formulation buffer through buffer exchange.

Protein A binds specifically to the Fc region of IgG molecules with an affinity constant of

about 10^8 M^-1 (43). The drug product is captured by Protein A, while many impurities

flow through the column. Commonly, this Protein A capture step yields drug product with

more than 98% purity (15, 44). The residual HCPs and DNA, protein drug aggregates, and

low molecular weight contaminants such as leached Protein A fragments are generally removed

by the subsequent polishing steps. The choice of chromatography mode depends on the nature of

the protein drug and the impurities. The detailed procedure for each step requires optimization for

different drug products.

The chromatography-based approaches are generally expensive. To pursue cost efficiency,

non-chromatographic purification methods have been recently developed. Although a number of

such techniques have been reported, these approaches have not been widely employed in industry.

Aqueous two-phase extraction (ATPE) has been investigated to separate the product from the

biomass (45). The advantages include scalability and high capacity. It is also easy to perform

compared with Protein A chromatography. However, due to the limited understanding of the

underlying mechanism of ATPE and the involvement of complex interactions of multiple species,

ATPE design and process optimization are difficult (20, 45). Precipitation of the

proteins of interest is another approach. It has been applied in laboratory-scale protein purification

and is promising for large scale. However, the selectivity of this technique needs improvement.

Co-purification with charged polymers has also been reported to improve the

selectivity and precipitation efficiency (15, 20). Other techniques such as crystallization and

charged UF membranes have also been reported.

1.6.4 Challenges of downstream process

The biopharmaceutical industry has been providing safe drugs to patients for decades,

indicating the success in eliminating toxic impurities by the downstream purification process.

However, several challenges remain. First, the capacity of downstream purification is becoming a

bottleneck of therapeutic protein production. The ever higher titer values reached through

advances in the upstream process dramatically increase the burden of downstream purification.

Optimization of upstream operation, such as medium adjustment or pH/temperature

control can improve productivity without significantly raising the cost. However, the downstream

capacity cannot increase significantly by optimizing purification protocols, especially with

chromatography-based approaches. As a result, the cost of the downstream process scales at

least linearly with the increased titer values. Moreover, the possible change of impurity profiles

due to the upstream optimization may require updating the existing purification strategies,

increasing time and resource input for the downstream process. To date, the downstream process

has taken a significant proportion of the total therapeutic protein manufacturing cost, as much as

50%-80% (46).

The other difficulty is detecting and monitoring the impurity species remaining in the

products, especially HCPs. After several purification steps, impurities in the final product are at

very low levels. For example, HCPs can be at the ppm level and DNA at the ppb level for the final

drug product (24). Identification and quantitation of the impurities are challenging within the high

background of the therapeutic protein. Getting this information about impurities is of importance

for risk assessment. Moreover, being able to monitor the impurity levels along each purification

step allows the knowledge-based optimization of the downstream process and leads to efficient

removal of critical impurities. Currently, residual DNA impurities can be detected and quantified

with PCR related techniques such as real-time PCR (47, 48). The detection and quantitation of

HCPs will be discussed in the next section.
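
As a worked illustration of the units used above, HCP content is conventionally expressed as ng of HCP per mg of product, which is a mass ratio in ppm. The sketch below uses hypothetical function names and hypothetical numbers (not measurements from this work) to convert a measured amount into ppm and to show how many orders of magnitude separate a trace impurity from the product background.

```python
import math

def hcp_ppm(hcp_ng: float, product_mg: float) -> float:
    """HCP level in ppm, i.e. ng of HCP per mg of therapeutic protein."""
    if product_mg <= 0:
        raise ValueError("product_mg must be positive")
    return hcp_ng / product_mg

def orders_of_magnitude_below_product(ppm: float) -> float:
    """Orders of magnitude by which an impurity at `ppm` lies below the product."""
    return math.log10(1e6 / ppm)

# Hypothetical example: 50 ng of HCP found in 10 mg of product
level = hcp_ppm(50.0, 10.0)
gap = orders_of_magnitude_below_product(level)
print(f"{level:.1f} ppm, {gap:.1f} orders of magnitude below the product")
```

Under these assumptions, an impurity at 1 ppm sits six orders of magnitude below the product, which gives a sense of the detection problem described in this section.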

Notably, although the impurities are required to be removed as much as possible, there is

no universal standard of the maximum allowable levels of impurities remaining in the final product

from a regulatory perspective. Each drug is examined and evaluated for risk assessment on a case-

by-case basis according to the patient population, dose, and route (41). This fact reflects the

complexity of the characterization and risk assessment of the therapeutic protein products.

1.7 Current advances for CHO-based therapeutic protein production.

1.7.1 Understanding CHO cell production and CHO cell engineering through ‘Omics

approaches

High titer values of protein drug are not only related to high specific productivity, but also

directly affected by the viable cell density within the cultivation time. For the upstream process,

cell line development and manufacturing operation optimization remain empirical, one of the

major reasons being the limited understanding of the underlying biology of CHO host. Currently,

the results of the study of CHO metabolic pathways have benefited the long-term goal of increasing

the productivity and developing new protein drug products. The ‘Omics studies, including

genomics, transcriptomics, proteomics, glycomics, fluxomics, and secretomics, have been proven

as powerful tools to understand CHO biology and to provide valuable information for the

biopharmaceutical industry. Cell engineering based on the critical metabolic pathways related to

protein expression and cell fate helps increase the quality and quantity of the drug product.

Figure 1-7 Information flow in cells and the connection between ‘Omics.

The sequencing of the CHO genome by Xu et al. in 2011 was a breakthrough in CHO systems

biology (49). In this genomic sequence database, the CHO-K1 cell line was annotated with 24,383

genes (49). It revealed that CHO-K1 does not express many viral entry genes, explaining why

CHO cells resist many virus infections (49). Since then, the Chinese hamster genome and six CHO

cell lines, which are derived from CHO-K1, CHO-DG44, and CHO-S, were sequenced (50). With

this advanced genome information, other ‘Omics such as transcriptomics, proteomics and

metabolomics can provide more detailed global information on mRNAs, proteins, and metabolites

with better accuracy, and offer more precise information for the understanding of the industrially

relevant cell lines. It would not only be useful to discover key metabolic pathways but also to

reveal gene targets for cell line engineering and protein biomarkers for cell status evaluation and

bioprocess monitoring. The connection of multi-‘Omics is shown in Figure 1-7. It is also a

reflection of information flow in cells.

Transcriptomics is the analysis of mRNA expression levels, i.e., the genomic information

at the transcription level (51). Technologies including DNA microarray, RNA sequencing (RNA-

Seq), and microRNA (miRNA) profiling are employed for transcriptomic analysis (52). Depending

on the cell type, CHO cells can turn on or off their own sets of genes, leading to a cell-specific

gene expression pattern. Thus, different CHO cell lines can yield different transcriptomes as well

as other ‘Omics information even if they share similar genomes (3). In proteomics, the entire

complement of expressed proteins and/or their expression levels are analyzed (53). Not all protein

expression levels are necessarily directly correlated to the corresponding mRNA levels (3). Since

proteins are direct executors in the biological system, the change of protein expression profile can

reveal the disturbance of metabolic pathway, providing the first-hand information of CHO cell

growth status. The secretome is a subset of the proteome. The secreted proteins can regulate the

interactions of cell-to-cell or cell-to-extracellular matrix and may affect the cell growth behavior

(54). Mass spectrometry-based proteomic analysis coupled with two-dimensional liquid

chromatography and/or gel electrophoresis has been developed as one of the most powerful tools

for proteomics analysis. Glycomics involves characterization of glycan structure of the CHO

system, including the protein glycosylation pattern. The glycosylation patterns affect protein

functions, resulting in disturbance of protein drug quality.

In the biopharmaceutical industry, all of these ‘Omics studies of CHO eventually aim at

increasing productivity and/or quality of the protein product. The deep understanding obtained

from these studies can guide strategies of cell engineering by recognizing biomarkers for desired

phenotypes. For example, Doolan et al. reported the transcriptomic study on ten CHO-K1 cell lines

with a range of growth rates, which are derived from a single parent cell line (55). They reported

that the high growth rate was a multi-gene effect, involving several cellular processes including

upregulation of DNA replication and mitosis and downregulation of cell proliferation (55). The

regulation of relevant genes ALDH7A1 and CBX5 agreed with previous studies (55, 56),

indicating that they could be potential biomarkers for high growth rate. Moreover, the

understanding of cellular response to perturbations in the environment can improve bioprocess

operation control. One example is a study from Bristol-Myers Squibb that used nuclear magnetic

resonance (NMR) to investigate the metabolome of CHO production cell lines cultured in

production scale (5-kL) and benchtop scale (7-L) bioreactors (36). In this study, with the same

media and bioprocess operation parameters, the CHO cells showed higher viability and

productivity in the small bioreactor, and 30 metabolites were determined to be related, leading to

a high reliance on glycolysis (36). This study revealed, for the first time, the potential underlying biological

changes that occur during scale-up to production scale.

Besides analysis with each individual ‘Omics technique, a new trend of using combined ‘Omics

techniques to obtain comprehensive information has emerged. The combined global transcriptome,

targeted metabolome analysis, and targeted protein analysis have been applied to study the

erythropoietin production in CHO-K1 cells under different growth conditions, suggesting the

bottleneck of heterologous proteins production is in energy metabolism (57).

In Chapter 2, the combined global proteomics and metabolomics were employed to

investigate the causes of different phenotypes shown by a CHO cell line in two different scales of

bioreactors. Targeted transcription and western blotting analysis were used to support the

hypothesis. It is an example of the power of ‘Omics studies to improve the understanding of CHO biology.

1.7.2 Current approaches of host cell protein identification and quantitation

As mentioned above, despite a series of purification steps, the protein drugs are still

inevitably co-purified with some impurities from the host cell mass and/or the media. Optimization

of the downstream process relies on the analysis of the residual impurities as the drug product is

being purified in individual steps.

HCPs are a major class of impurities. They have been identified as a critical quality

attribute (CQA), which means that they are considered to affect patient safety (41). Residual HCPs

can be potentially immunogenic and toxic, may block the active sites of drug product and even

have proteolytic activity. HCPs which are similar to human proteins are also of concern because

they may trigger autoimmune responses in the human body by causing cross-reactivity with human

proteins (58, 59). Because HCPs can pose potential safety risks to patients

and/or deactivate the drug product (44, 60-62), their identity and quantity are of great importance.

Moreover, the presence of HCPs plays a critical role for therapeutic protein approval by regulatory

agencies. The suspension of two clinical trials at Phase III for IB 1001, a recombinant factor IX,

resulted from the HCP content because of concern for drug safety (63). With biosimilar

therapeutics emerging rapidly, information on HCP presence can be one of the critical factors

requiring attention. The identification and quantitation of HCPs in the drug product is therefore of

growing interest.

HCPs are a complex group of proteins from the host cells. They have significantly different

properties such as a wide range of hydrophobicities, molecular masses, and isoelectric points (pI).

The other challenge of HCP detection is the wide dynamic range, which requires approaches with

high sensitivity and selectivity to detect trace levels of HCPs in the presence of the high therapeutic

protein background. Therefore, HCP analysis requires (i) high dynamic range in order to detect

HCPs at less than 10 ppm, and even down to 1 ppm, in the bulk product background; (ii)

comprehensive identification of all HCPs with high confidence; (iii) accurate quantitation

without bias; and (iv) high throughput and short analysis time. It would be ideal that the method can

also be flexible in transferring from one therapeutic protein product to another. Such approaches

are needed not only to ensure the final drug product quality but also to provide information for

downstream process optimization, potentially reducing the cost of drug production. However,

current methods cannot reach all of these desired goals.

Currently, the conventional method to determine the overall level of HCPs is enzyme

linked immunosorbent assay (ELISA), which is considered the gold standard of HCP detection in

the biopharmaceutical industry. This method is very sensitive and able to detect ppm levels of

HCPs (1 ppm – 100 ppm). The polyclonal antibodies for ELISA are generated by using the null

cell line as an HCP pool. Alongside ELISA, which serves as the quantitative approach, two dimensional

gel electrophoresis (2-DE) and western blotting are widely used to detect HCPs, especially

to compare HCP changes at various stages of purification or with different protocols of a certain

purification stage.

However, there are some disadvantages of these conventional methods. The efficiency and

accuracy of immunospecific methods, ELISA and western blot, depend on the polyclonal

antibodies used for HCP detection. Certain HCPs have low immunogenicity in the host animals

used to generate the polyclonal antibodies, and hence ELISA and western blotting could

underestimate or even miss certain species. Also, the immune response between humans and

animals can be different. Moreover, each set of polyclonal antibodies only correlates with the HCP

pool that is used to raise the antibodies, which means that the assays are not interchangeable. As a

result, the generic assays developed from general CHO proteome can provide inaccurate results.

To reach more accurate results, each protein drug should have its own customized immunospecific

assay.

ELISA cannot provide identification or distribution of individual HCPs (60). Since ELISA

reflects the sum of the signal responses of a range of HCPs, its overall sensitivity is higher than

that of western blot, which distributes the signal to a number of protein bands. On the other hand,

western blot as well as other non-immunospecific detection methods based on gels can provide

HCP distributions. 2-DE coupled with colorimetric or fluorescent

staining is semi-quantitative with a limited dynamic range. This method can reveal the distribution

of HCPs but cannot provide identity of individual species (64, 65). 2-DE followed by mass

spectrometry (MS) analysis of protein spots provides extension of sensitivity and identification

information (66), but it still cannot completely meet the requirement of a >10^5 dynamic range.

Moreover, the gel-based protocols are generally laborious and time consuming with a limited

throughput format. Capillary isoelectric focusing coupled with tandem mass spectrometer online

by electrospray ionization (cIEF-ESI-MS/MS) has also been reported for HCP analysis (67).

Liquid chromatography (LC) is advantageous for HCP analysis. LC separation provides

many choices based on orthogonal separation mechanisms, and it is flexible in terms of sample

amount for handling. When coupled with mass spectrometry, it can yield a wide dynamic range

and provide comprehensive information. It has been reported that two dimensional LC/MS^E (see

section 1.8.2 Mass spectrometry Data acquisition) was employed for HCP identification and

quantitation (68-70), and multiple reaction monitoring (MRM) was employed for accurate

quantitation as a targeted approach (68). However, the total run time for 2D-LC/MS method may

hinder its application because the 10 to 12 hour-test per sample is not practical when one requires

quick response for decision making, which is an important part of process development and control

(41).

Despite the current drawback, the development of qualitative and quantitative protein

analysis based on liquid chromatography mass spectrometry (LC-MS) is still an effective platform

for HCP analysis. Its versatility and flexibility provide a real possibility to reach high sensitivity

and selectivity within a short analysis time. LC-MS has resulted in an increased depth of proteomic

profiling and a large number of protein identities (71). It is promising to explore optimization of

the approach into a rapid and efficient workflow for comprehensive and accurate HCP analysis.

Because of the particular concern about which HCPs are present, and in what amounts, in the final

drug product, several approaches based on LC-MS for HCP analysis have been examined in this

thesis, and a general workflow for HCP identification and quantitation based on LC-MS/MS is

described in Chapter 3.

1.8 Introduction of liquid chromatography mass spectrometry-based quantitative

proteomics and protein analysis

Investigation of relative CHO protein expression levels, i.e., quantitative proteomic

analysis, provides valuable information for understanding the CHO systems biology with a variety

of phenotypic changes under environmental perturbations. The samples for proteomic study are

generally ones with high complexity and a wide dynamic range. LC-MS/MS based approaches

provide a powerful tool for shotgun quantitative proteomics, and the method has become widely

used to study complex proteomes. Because of the complex and dynamic nature of proteomes, a

wide range of proteomic strategies have emerged to address a variety of biological questions. The

general workflow of LC-MS/MS based shotgun quantitative proteomics is shown in Figure 1-8.

Figure 1-8 The general workflow of proteomics analysis based on liquid chromatography coupled

with tandem mass spectrometry (LC-MS/MS).

After enzymatic digestion of the proteome, the resultant peptides are separated by

multidimensional LC and analyzed by high resolution/high mass accuracy mass spectrometry. The

MS raw data are then analyzed by the bioinformatics tools with protein identification and/or

quantitation information and interpreted into biology mechanisms.

In the general workflow of proteomic study based on LC-MS/MS, the proteomic sample is

first digested by enzymes, typically trypsin and/or lysyl endopeptidase (Lys-C). The resultant

peptide mixture is separated by multi-dimensional LC and then analyzed by mass spectrometry

on-line with electrospray ionization (ESI). The multidimensional LC separation, which has high

separation power, is employed to extend the dynamic range of detection. The multi-dimensional

LC separation can be on-line or off-line, depending on the instrumentation configuration and the

specific analysis requirements. With high resolution/high mass accuracy tandem mass

spectrometry, the m/z values and the intensities of the peptide precursor ions and their fragment

ions are both collected, and the raw data are searched with a protein sequence database for protein

identification. The high resolution/high mass accuracy MS approach provides large amounts of

MS1 and MS2 information on the peptide species, followed by bioinformatics analysis and data

interpretation. The quantitative information can also be obtained through MS analysis, which may

require labeling or spiked-in internal standards (72). Moreover, the LC-MS based proteomic

platform is suitable for high-throughput protein analysis and can be adapted for HCP

analysis, as discussed in Chapter 3.

Systems biology study of CHO cell lines is valuable and meaningful for improvement of

therapeutic protein production and new drug development in the biopharmaceutical industry. LC-

MS/MS based quantitative proteomics is a powerful tool to discover critical metabolic pathways,

potentially benefitting industrial drug production. In this section, the introduction of LC-MS/MS

based quantitative proteomics technology will be presented from multidimensional LC separation,

to MS instrumentation. Then, current quantitation approaches for LC-MS will be discussed.

1.8.1 Two dimensional liquid chromatography

High separation power from LC is essential for LC-MS/MS based strategies, especially

when one aims to analyze proteomic samples with high complexity. Sufficient LC separation of

the analyte species can overcome the adverse effect caused by ion suppression, which is a

particular concern for mass spectrometry analysis. Electrospray ionization efficiency of the

analytes is affected by the environment in which ionization is occurring, especially the presence

of other molecules. With a complex sample containing a large number of peptides, insufficient LC

separation can result in co-elution of many peptide species, inducing low ionization rate for certain

peptides, especially those that are hydrophobic and/or with low abundance. Non-charged species

cannot reach and/or cannot be detected by the mass analyzer. As a result, the MS run suffering

from ion suppression will encounter signal loss, poor reproducibility, and compromised sensitivity.

These adverse effects caused by low ionization efficiency cannot be corrected or reduced by

modifying the MS data acquisition strategy regardless of the sensitivity and selectivity of the MS

instrumentation. Even with the targeted MS methods, where only target ions are selected and

enriched, such as in selected reaction monitoring (SRM), the quantitation results could be impacted

because of ion suppression (73). Moreover, too many co-eluting species can cause undersampling

of MS since the scan speed of the mass analyzer is limited. Therefore, LC separation with high

separation power is highly desirable for complex sample analysis.

LC with extensive separation power, achieved by means of multi-dimensional separation, is widely

employed for LC-MS/MS based proteomic analysis. Multi-dimensional LC achieves high

resolving power by combining two or more orthogonal chromatography methods (74). The most

popular approach is two dimensional liquid chromatography (2D-LC).
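
The benefit of combining orthogonal dimensions is often summarized by the product rule for peak capacity, an idealized upper bound that is not stated in the text above. The sketch below is a minimal illustration under that assumption. The function name, the example peak capacities, and the `orthogonality` discount factor are all hypothetical.

```python
def peak_capacity_2d(n1: float, n2: float, orthogonality: float = 1.0) -> float:
    """Estimated peak capacity of a 2D-LC separation.

    n1, n2: peak capacities of the first and second dimensions.
    orthogonality: assumed fraction (0-1) of the 2D separation space that
    is actually used; 1.0 gives the ideal product rule n1 * n2.
    """
    if not 0.0 <= orthogonality <= 1.0:
        raise ValueError("orthogonality must be between 0 and 1")
    return n1 * n2 * orthogonality

# Hypothetical values: 20 first-dimension fractions, second-dimension capacity of 150
ideal = peak_capacity_2d(20, 150)           # ideal product rule
practical = peak_capacity_2d(20, 150, 0.5)  # only half of the 2D space used
print(ideal, practical)
```

The discount factor is a simplification. It only conveys that poorly orthogonal combinations deliver less than the ideal product of the two dimensions.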

There are several widely used 2D-LC strategies for peptide separation of proteomic

samples. The first dimension separation can be strong cation exchange chromatography (SCX), size

exclusion chromatography (SEC), hydrophilic interaction chromatography (HILIC), or reversed

phase (RP) chromatography. The second dimension separation, on the other hand, is generally RP

LC (75) due to its high resolution power and compatibility of coupling with MS online. SCX/RP

separation is one of the earliest reported 2D-LC separation strategies (76). In SCX/RP, the peptide

mixture is first separated based on charge differences by SCX, and the eluate is then further

separated according to the peptide hydrophobicity by RP chromatography (77). However, since

most of the peptides are charged 2+ or 3+, they are eluted from the SCX column in a narrow elution

window, impacting the overall separation power of SCX/RP system. SCX/RP combination was

reported to have a reduced practical resolution power (78).

On the other hand, high pH/low pH RP/RP has been shown to provide the best practical

separation power compared to other 2D-LC combinations (75). Due to the various pKa values of

amino acid residues of the peptides, the change of pH alters the peptide charge states and thereby

their hydrophobicity indexes. As a result, the separation selectivity can be significantly different

for the first (high pH) in comparison to the second (low pH) dimension separation, leading to high

overall resolution power, especially when a wide pH gap exists between these two separations (e.g.

pH~10 for high pH and pH~2 for low pH) (79). Because of the advantages of high pH/low pH

RP/RP separation, it has been used for the CHO cell lysate analysis in Chapter 2.
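
The effect of pH on peptide charge can be illustrated with a simple Henderson-Hasselbalch estimate. The sketch below is illustrative only: the pKa values are approximate textbook numbers assumed here (real values shift with sequence context), the peptide sequence is hypothetical, and the calculation ignores any interaction with the stationary phase.

```python
# Approximate pKa values (assumed textbook values, not taken from this work)
POSITIVE_PKA = {"K": 10.5, "R": 12.5, "H": 6.0, "N_term": 9.0}
NEGATIVE_PKA = {"D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1, "C_term": 2.3}

def net_charge(sequence: str, ph: float) -> float:
    """Estimate the net charge of a peptide at a given pH."""
    groups = list(sequence.upper()) + ["N_term", "C_term"]
    charge = 0.0
    for g in groups:
        if g in POSITIVE_PKA:
            # Fraction of the basic group that is protonated (+1)
            charge += 1.0 / (1.0 + 10 ** (ph - POSITIVE_PKA[g]))
        elif g in NEGATIVE_PKA:
            # Fraction of the acidic group that is deprotonated (-1)
            charge -= 1.0 / (1.0 + 10 ** (NEGATIVE_PKA[g] - ph))
    return charge

peptide = "LVEDFAK"  # hypothetical tryptic peptide
for ph in (2.0, 10.0):
    print(f"pH {ph:>4}: net charge {net_charge(peptide, ph):+.2f}")
```

With these assumed pKa values, the example peptide is net positive at pH 2 and net negative at pH 10, consistent with the change in retention selectivity between the two dimensions.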

1.8.2 Mass spectrometry

High resolution mass analyzers

Significant advances of mass spectrometry (MS) instrumentation have been achieved over

the past several decades. To date, it has become a powerful tool which can provide comprehensive

information for proteomic studies. MS allows the analysis within a reasonable time of thousands

of ionic species (peptides) within a wide dynamic range with high resolution, mass accuracy, and

sensitivity (80, 81). The instrument used in the experiments of Chapter 2 and Chapter 3 is a hybrid

quadrupole-Orbitrap, Q Exactive (Thermo Fisher Scientific). Consequently, this section will focus

on the MS analyzer type that Orbitrap belongs to, the mass analyzers with very high resolution and

high mass accuracy.

Mass analyzers with the highest mass accuracy (< 5 ppm, even smaller with preferable

conditions) to date are time-of-flight (TOF), Fourier transform ion cyclotron resonance (FT-ICR),

and Orbitrap (82). In TOF mass analyzers, analyte ions are accelerated by an electric field to gain

a specific kinetic energy, and their m/z values are determined by the time that it takes for the ions

to fly in the vacuum flight tube to the detector. Theoretically, higher resolution can be achieved by

increasing the ion flight path length, but the instrument size is restricted by the lab setting, and the

sensitivity could be compromised because of ion losses during ion transfer between orthogonal

TOF or at the detector (83, 84). FT-ICR and Orbitrap mass analyzers both collect time-domain

signals of the ion spatial motions and use a Fourier transform algorithm to convert the signals into

m/z information. Such devices can collect signals from a wide range of m/z values simultaneously,

which means that the whole spectrum can be obtained at once (85). In FT-ICR, ions move under a

magnetic field, and are then excited by a transient electric field. After the excitation, the resultant

coherent ion motions yield time-domain signals which can be collected and transferred to m/z

values. Ions in Orbitrap (Figure 1-9) oscillate around a carefully shaped central electrode under an

electric field, and the motion frequency is Fourier transformed to m/z values. Moreover, ions

are injected with the coherent motion into Orbitrap by a curved quadrupole ion trap, called C-trap,

and the oscillation starts immediately. It does not require ion excitation inside the Orbitrap, which

is needed in FT-ICR. Compared to FT-ICR, Orbitrap is more compact and cost efficient, and easier

to maintain. Importantly, the mass resolution of FT-ICR is inversely proportional to m/z, whereas that

of Orbitrap is inversely proportional to the square root of m/z. As a result, the mass-resolving power decreases more

slowly with the Orbitrap than the FT-ICR with increased m/z values.
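
The two scaling behaviors can be compared numerically. The sketch below assumes a fixed acquisition time and a hypothetical reference resolving power at m/z 200. The same reference value is used for both analyzers only to isolate the scaling, and it does not represent the specification of any instrument.

```python
import math

def resolving_power(mz: float, r_ref: float, mz_ref: float, analyzer: str) -> float:
    """Scale a reference resolving power to another m/z.

    FT-ICR:   R proportional to 1 / (m/z)
    Orbitrap: R proportional to 1 / sqrt(m/z)
    """
    ratio = mz_ref / mz
    if analyzer == "ft-icr":
        return r_ref * ratio
    if analyzer == "orbitrap":
        return r_ref * math.sqrt(ratio)
    raise ValueError(f"unknown analyzer: {analyzer}")

# Hypothetical reference: R = 100,000 at m/z 200 for both analyzers
for mz in (200, 800, 1800):
    icr = resolving_power(mz, 100_000, 200, "ft-icr")
    orbi = resolving_power(mz, 100_000, 200, "orbitrap")
    print(f"m/z {mz:>4}: FT-ICR {icr:>9,.0f}   Orbitrap {orbi:>9,.0f}")
```

Under these assumptions, a fourfold increase in m/z reduces the FT-ICR resolving power fourfold but the Orbitrap resolving power only twofold.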

Figure 1-9 The scheme of an orbitrap.

The ion injection and the ion motion path are shown in red. This figure is reprinted with permission

from Marshall et al. (85).

Nowadays, many commercialized mass spectrometers which provide high resolution and

high mass accuracy consist of a combination of multiple mass analyzers, called hybrid

instruments. For example, LTQ-Orbitrap (Thermo Fisher Scientific) consists of a linear ion trap

and Orbitrap. It utilizes the high ion trapping capacity and MS^n fragmentation ability of the linear

ion trap and the high mass accuracy and resolution of Orbitrap.

The Q Exactive series (Thermo Fisher Scientific) combines a quadrupole and Orbitrap (86).

The quadrupole can guide and select ions between specified m/z ranges with fast switching times,

which results in fast time scale for fragmentation for selected ions for MS2 scan, allowing an

efficient multiplexed scan mode (86). The combination of the quadrupole and Orbitrap results in

high scan speed, mass resolution, and mass accuracy. Moreover, the quadrupole technology is well

established, making the instrument design particularly robust. In the Q Exactive series (Figure

1-10), ion fragmentation is achieved by higher-energy C-trap dissociation (HCD) in the HCD

collision cell (87), a design that enables the detection of low m/z fragment ions, allowing the analysis

of isobaric labeled samples for quantitative proteomic study (see next section). This is advantageous

compared to another widely used fragmentation strategy based on the quadrupole ion trap, called

ion-trap based collision-induced dissociation (CID). The quadrupole ion trap is not able to trap

low-mass fragment ions and usually induces lowest-energy fragmentation (87), making the

instrumentation relying on such fragmentation strategy unable to analyze isobaric labeled

samples. The quadrupole is also compact, so the combination of quadrupole and Orbitrap makes

the instrument a “benchtop-instrument”. In this thesis, all experiments involving MS

instrumentation were performed on a Q Exactive (Thermo Fisher Scientific, San Jose, CA). Its

construction is shown in Figure 1-10. The Q Exactive can reach a maximum resolution of 140,000

at 200 m/z with mass accuracy < 1 ppm with internal calibration and < 3 ppm with external

calibration.
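
To put these specifications in absolute terms, the sketch below converts a ppm mass accuracy and a resolution value into m/z units. It assumes the common full-width-at-half-maximum definition of resolution, R = (m/z) / Δ(m/z). The function names are illustrative.

```python
def ppm_error(measured_mz: float, theoretical_mz: float) -> float:
    """Mass measurement error in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6

def mz_tolerance(mz: float, ppm: float) -> float:
    """Absolute m/z window corresponding to a ppm tolerance."""
    return mz * ppm / 1e6

def peak_width_fwhm(mz: float, resolution: float) -> float:
    """Peak width (FWHM) implied by R = (m/z) / delta(m/z)."""
    return mz / resolution

print(mz_tolerance(200.0, 3.0))          # 3 ppm at m/z 200 is 0.0006 m/z units
print(peak_width_fwhm(200.0, 140_000))   # roughly 0.0014 m/z units at R = 140,000
```

At m/z 200, a 3 ppm accuracy therefore corresponds to well under one thousandth of an m/z unit.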

Figure 1-10 Construction of the Q Exactive.

Reprinted with permission from Michalski et al. (86). The Q Exactive is a hybrid of a quadrupole and an

Orbitrap. The quadrupole can filter selected ions at a fast time scale, and the Orbitrap can detect

the ions with high resolution and mass accuracy. In MS2 scan with higher-energy C-trap

dissociation (HCD), precursor ions are fragmented by the HCD collision cell, and the resultant

fragment ions are injected into Orbitrap by C-trap.

There are other advanced hybrid instruments commercially available. The Orbitrap

Fusion series (Thermo Fisher Scientific) brings quadrupole, Orbitrap and linear ion trap together,

reaching mass resolution as high as 500,000 at m/z 200. The TripleTOF system (ABSciex) uses

quadrupole and time-of-flight mass analyzers to reach high scan speed and a wide ion detection

range. The SYNAPT mass spectrometer (Waters) combines time-of-flight and ion mobility mass

spectrometry to improve the ion separation. These state-of-the-art MS instruments provide

powerful tools for protein and proteomics study.

Data acquisition

Besides the MS instrumentation, the information that can be obtained from MS analysis

highly relies on MS data acquisition strategies. The emergence of automated data acquisition

allows the unattended collection of large amounts of data with m/z information of both precursor

and fragment ions when the mass spectrometer is coupled with LC providing continuous separation

for highly complex proteomic samples.

Figure 1-11 The scheme of (A) data dependent acquisition (DDA) and (B) data independent

acquisition (DIA).

In DDA, each MS2 spectrum is obtained from a specific precursor ion. On the other hand, each

MS2 in DIA is from all precursor ions within an m/z range.

For shotgun proteomics based on LC-MS/MS, the most widely used strategy is data

dependent acquisition (DDA) (Figure 1-11A). With DDA, precursor ions are initially detected with

a survey scan, obtaining the mass (m/z values and charge states) and the intensity, called a full-

scan mass (MS1) spectrum. Then, a subset of precursors is selected automatically following a

predefined rule for subsequent fragmentation (MS2 spectra) (88). Commonly, the predefined rule

is selecting the precursors with the highest abundance in the MS1 spectrum, e.g. top 15, because

the peptide precursors with high intensity are more likely to yield a MS2 spectrum with high

quality. To avoid selecting the same precursor with high abundance redundantly during peptide

elution, a strategy called “dynamic exclusion” is widely applied in DDA, in which the precursor

that has been selected and fragmented with a good MS2 spectrum will not be reselected over a

certain period of time. DDA attempts to obtain the maximum number of unique precursors and

their MS2 spectra. DDA is powerful and versatile, and to date it is the most widely used approach

for shotgun proteomics.
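
The top-N selection with dynamic exclusion described above can be sketched as follows. This is a simplified illustration, not the acquisition logic of any vendor software. The function name, the 30 s exclusion time, and the absolute m/z tolerance are assumptions, and charge-state filtering is omitted.

```python
def select_precursors(ms1_peaks, excluded_until, now, top_n=15,
                      exclusion_s=30.0, mz_tol=0.01):
    """Pick up to top_n most intense precursors that are not dynamically excluded.

    ms1_peaks:      list of (mz, intensity) tuples from one survey scan.
    excluded_until: dict mapping previously selected m/z -> time (s) at which
                    its exclusion expires; updated in place.
    now:            retention time of the current survey scan in seconds.
    """
    # Drop exclusion entries that have expired
    for mz in [m for m, t in excluded_until.items() if t <= now]:
        del excluded_until[mz]

    selected = []
    for mz, intensity in sorted(ms1_peaks, key=lambda p: p[1], reverse=True):
        if len(selected) == top_n:
            break
        if any(abs(mz - ex) <= mz_tol for ex in excluded_until):
            continue  # fragmented recently; skip to reach new precursors
        selected.append(mz)
        excluded_until[mz] = now + exclusion_s
    return selected

# Hypothetical survey scans one second apart, with top_n reduced to 2 for brevity
exclusion = {}
scan1 = [(500.25, 1e6), (620.80, 5e5), (433.10, 2e5)]
scan2 = [(500.25, 9e5), (620.80, 6e5), (433.10, 3e5), (710.40, 1e5)]
print(select_precursors(scan1, exclusion, now=0.0, top_n=2))
print(select_precursors(scan2, exclusion, now=1.0, top_n=2))
```

In the second scan the two most intense precursors are still excluded, so the selection moves on to the less abundant species, which is the purpose of dynamic exclusion.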

However, there can be hundreds of peptide species in one MS1 full scan spectrum. With

the finite instrument scan speed and specific LC separation window, only a limited number of

peptide precursors can be selected for MS2 acquisition in DDA (89). DDA is designed to pick the

precursors with high abundance, resulting in a sampling bias toward the most abundant compounds

and an undersampling of species with low abundance. To overcome this issue, an alternate strategy,

data independent acquisition (DIA), has emerged and been rapidly developed over the past

several years. In DIA, peptide precursors are fragmented systematically without considering MS1

information (90), and it is programmed to fragment all precursors within a wide m/z window (e.g.

m/z 400 to 2000), as shown in Figure 1-11B. There are several DIA strategies which have been

reported with different instrumentation settings. For example, the method called MS^E (91), which

can be performed with Waters SYNAPT mass spectrometers, fragments all precursors co-eluting

from LC and entering the mass spectrometer at the same time (Figure 1-11B). Another strategy

uses relatively narrow precursor isolation windows (e.g. 25 m/z width) to subdivide the wide m/z

range. The precursors within the narrow range are then fragmented together, and the multiple

isolation windows are set to cover the whole m/z range, leading to a decrease in the complexity of

the MS2 spectra (Figure 1-11B). This strategy can be used on the TripleTOF system (ABSciex),

called SWATH (92), and on the Q Exactive series (Thermo Fisher Scientific), called DIA with

multiplexed MS/MS (93). By eliminating the sampling bias of DDA, DIA is able to provide more

reproducible run-to-run data and has a better chance of observing species of low abundance because

all precursor ions are fragmented (94). However, DIA raw data are inherently complex

because precursors are not individually selected, and interpreting the resulting noisy DIA data

is challenging. In this thesis, DDA was used in Chapter 2 for the systems biology study of CHO cells

producing a biopharmaceutical. The high throughput and data analysis tools provided

comprehensive peptide identification and quantitation. DIA was applied to analyze low-abundance

HCPs in the therapeutic product in Chapter 3, to take advantage of its unbiased sampling.
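The contrast between the two acquisition modes can be illustrated with a minimal Python sketch. It is not the acquisition logic of any instrument: the function names, the example scan, and the 400 to 1200 m/z range are assumptions made only for illustration, while the 25 m/z window width follows the example given above.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def top_n_precursors(ms1_peaks: List[Peak], n: int) -> List[float]:
    """DDA-style selection: return the m/z of the n most intense MS1 peaks."""
    ranked = sorted(ms1_peaks, key=lambda peak: peak[1], reverse=True)
    return [mz for mz, _ in ranked[:n]]


def dia_windows(mz_start: float, mz_end: float, width: float) -> List[Tuple[float, float]]:
    """DIA-style scheme: contiguous fixed-width isolation windows covering the range."""
    windows = []
    low = mz_start
    while low < mz_end:
        high = min(low + width, mz_end)
        windows.append((low, high))
        low = high
    return windows


# Hypothetical MS1 scan with abundant and low-abundance species.
scan = [(450.2, 9.0e6), (512.7, 3.0e4), (633.3, 2.5e6), (701.9, 8.0e3), (845.4, 6.1e5)]

selected = top_n_precursors(scan, n=2)      # only the two most intense precursors
windows = dia_windows(400.0, 1200.0, 25.0)  # every precursor falls in some window
```

In this toy scan the three weaker peaks are never selected by the top-2 DDA rule, whereas every peak falls inside one of the DIA windows and is therefore fragmented.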

Discovery proteomics based on DDA and DIA is a powerful tool to study and quantitate

the overall species in a sample and to provide hypotheses for systems biology. Individual proteins

of interest, such as potential biomarkers, can subsequently be identified. MS-based

targeted approaches, e.g., selected or multiple reaction monitoring (SRM or MRM) (95) and parallel

reaction monitoring (PRM) (96), can then be applied to quantify these individual proteins.

MS-based targeted approaches require specific information on each target peptide, such

as the m/z values of the precursor ions, fragment ion information, and the LC retention

time. In the analysis, the specific peptide precursor ions are selected by predefined m/z values

(optionally in combination with retention time windows), and the resultant fragment ions are then detected

(95, 97). Heavy isotopically labeled analogues of the target peptides can be spiked into the

samples at known amounts as internal standards. The quantitation of the specific peptides is

achieved by comparing the signal intensities between the internal standards and the analytes of

interest. Since the internal standards are isotopologues that elute at the same time as the analytes,

variations caused by ion suppression are eliminated. These internal standards can even be used as

identification confirmation of the target peptides because peptides with the same sequence share

the same retention time and fragmentation patterns. Moreover, most of the interferences are filtered

out before they can reach the mass analyzer, so the sensitivity and selectivity of this approach are

high enough to detect low amol levels of analytes in a complex biochemical background (97).
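The internal standard calculation described above reduces to a simple ratio. The following Python sketch is illustrative only: the function name and the numerical values are hypothetical, and it assumes that the light analyte and its heavy standard have the same response factor.

```python
def analyte_amount(light_area: float, heavy_area: float, heavy_spiked_fmol: float) -> float:
    """Estimate the endogenous (light) peptide amount from the light/heavy peak-area ratio.

    Assumes the heavy standard co-elutes with the analyte and responds identically.
    """
    if heavy_area <= 0:
        raise ValueError("heavy standard signal must be positive")
    return (light_area / heavy_area) * heavy_spiked_fmol


# Hypothetical peak areas; 100 fmol of heavy standard spiked into the sample.
amount = analyte_amount(light_area=2.0e5, heavy_area=8.0e5, heavy_spiked_fmol=100.0)
print(f"estimated analyte amount: {amount:.1f} fmol")
```

Because both species are measured in the same scan window, a suppression effect that scales both signals equally cancels in the ratio.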

SRM (or MRM) is currently the most widely used MS-based targeted approach for

quantitation of peptides and small molecules. The scheme of SRM is shown in Figure 1-12A. SRM

is most commonly performed with a triple quadrupole MS (98), although other instruments such

as QqTOF have also been reported for SRM. The target peptide precursors are selected by the first

quadrupole, and the selected ions are fragmented by the collision cell. Then, the third quadrupole

or TOF acts as a filter to select several specific fragment ions with predefined m/z values. SRM

requires the knowledge of both the peptide precursor information and the corresponding

fragmentation information, the combination of which is called a “transition”. For a given peptide

precursor, the intensities of individual fragment ions vary in a wide range, and the fragment ions

with high intensity are preferred as transition choices to maximize the detection sensitivity.

Moreover, MS parameters for every single target peptide, especially collision energy values, need

to be optimized in order to ensure the presence of specific fragment ions (98). Consequently,

significant effort is needed to develop an SRM assay, which is often laborious and

time consuming.
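The preference for intense fragment ions when choosing transitions can be sketched as follows. This is a simplified illustration rather than a validated assay design procedure: the function name, the precursor m/z, and the fragment table are hypothetical, and a real workflow would also screen candidate transitions for interferences and optimize collision energy.

```python
from typing import Dict, List, Tuple


def pick_transitions(
    precursor_mz: float,
    fragments: Dict[str, Tuple[float, float]],
    k: int = 3,
) -> List[Tuple[float, float, str]]:
    """Return the k transitions (precursor m/z, fragment m/z, ion name) with the highest intensity."""
    ranked = sorted(fragments.items(), key=lambda item: item[1][1], reverse=True)
    return [(precursor_mz, mz, name) for name, (mz, _) in ranked[:k]]


# Hypothetical fragment ions of one peptide precursor: name -> (m/z, intensity).
fragment_table = {
    "y7": (813.4, 4.0e5),
    "y6": (700.3, 9.0e5),
    "y5": (587.2, 1.5e5),
    "b3": (312.1, 6.0e4),
    "y4": (474.2, 7.0e5),
}

transitions = pick_transitions(precursor_mz=523.8, fragments=fragment_table, k=3)
```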

Figure 1-12 The schemes of PRM and SRM processes.

A. SRM (MRM). SRM is performed with a triple quadrupole mass spectrometer. Specific transitions,

which are precursor-fragment pairs, are selected and detected. Several transitions can be detected

for one peptide precursor. B. PRM. PRM, on the other hand, is generally used with a Q-Orbitrap.

Instead of selecting specific transitions, all fragments of one peptide precursor are detected to take

advantage of the high resolution of the Orbitrap and its property that the whole spectrum can be

obtained at once.

PRM is an alternative when a hybrid quadrupole-Orbitrap instrument is available (Figure

1-12B). PRM has also recently been reported on QqTOF instruments (99). Instead of detecting specific

transitions, PRM collects all of the fragment ion information for a certain peptide precursor,

utilizing a mass analyzer which can obtain the whole spectrum at once with high

resolution (Orbitrap), or one able to scan fast enough to detect many fragment ions within a short time (TOF).

After data acquisition, 3 to 7 fragment ions can be chosen for quantitation against the spiked-in

isotopically labeled peptide internal standards. In PRM assay development, one does not need to

pick specific fragment ions or optimize MS parameters for specific transitions. Thus, PRM assay

development is more time efficient and less laborious, and can be established more rapidly

compared to the conventional SRM/MRM. In Chapter 3, the PRM approach was employed to

quantify individual HCPs in a therapeutic antibody drug.

1.8.3 LC-MS based quantitative proteomics and protein analysis

LC-MS based quantitative proteomics strategies have emerged as powerful tools for

systems biology study and biomarker discovery. The capabilities of broad proteome coverage and

accuracy of quantitation of these approaches keep improving, and these strategies have been

applied to address a wide range of biological questions. LC-MS based quantitative proteomics

approaches can be categorized into two groups: label-free and labeled approaches. Their

applications, advantages, and disadvantages are discussed in this section.

In Chapter 2, LC-MS/MS based relative quantitative proteomics using the shotgun

approach is used for a broad survey of the overall proteome and expression differences across

several CHO cell samples at different growth time points and cultivation conditions. Potential

biomarkers for CHO cell growth status are also suggested. In Chapter 3, label-free quantitation

and targeted MS based on LC-MS with PRM were both applied for individual HCP quantitation.

1.8.3.1 Label-free quantitation

Label-free quantitation requires less sample handling than labeled quantitation, because it does not

require modification and/or specific treatment of the proteins or peptides.

After enzymatic digestion, the resultant peptide mixture is analyzed by LC-MS. Each

sample needs separate MS runs, and the quantitative information is obtained by

comparison across the corresponding individual MS runs. Hence, this strategy is

straightforward and cost-efficient. It can also be applied to any type of biological sample.

Label-free approaches can be based on either intensity or spectral-counting (100). For the

former, peak intensities or areas of specific peptides in the chromatographic profile are used as an

indicator of their abundance. Spectral counting relies on the positive correlation between the

number of identified MS/MS spectra and the peptide/protein amounts based on data dependent

acquisition (DDA) (89). Label-free analysis allows an unlimited number of samples to be

compared. However, each sample must be prepared and analyzed individually. Thus, throughput is not

as high as that of labeling techniques. Moreover, the direct comparison across several LC-MS/MS

runs can be affected by the run-to-run variation, and therefore several replicates are required to

build statistical validation of the analysis.
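The two label-free readouts can be sketched in a few lines of Python. The protein identifiers and intensity values are hypothetical, and the sketch omits the normalization and statistical testing that a real comparison across replicate runs would need.

```python
from collections import Counter
from statistics import mean
from typing import Iterable, List


def spectral_counts(psm_proteins: Iterable[str]) -> Counter:
    """Spectral counting: number of identified MS/MS spectra assigned to each protein."""
    return Counter(psm_proteins)


def fold_change(intensities_a: List[float], intensities_b: List[float]) -> float:
    """Intensity-based comparison: ratio of mean peak areas (condition B over condition A)."""
    return mean(intensities_b) / mean(intensities_a)


# Hypothetical PSM list from one DDA run (one entry per identified spectrum).
counts = spectral_counts(["P1", "P2", "P1", "P3", "P1", "P2"])

# Hypothetical peak areas of one peptide in three replicate runs per condition.
ratio = fold_change([1.0e6, 1.2e6, 0.8e6], [2.0e6, 2.2e6, 1.8e6])
```

Replicate runs enter the intensity comparison through the mean, which is one simple way of damping the run-to-run variation mentioned above.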

1.8.3.2 Labeled quantitation approaches

Labeling approaches require the introduction of one or several heavy isotopes, typically

13C, 15N, 18O, and/or D, into the proteins or peptides. MS can recognize the predictable mass

difference or specific reporter signals introduced by the labeling and also distinguish identical

peptides from different samples. Thus, samples labeled with different isotopes or isotopic

combinations can be pooled together and analyzed in one MS run. Such sample multiplexing

increases the throughput significantly by reducing the overall analysis time. Moreover, in these

approaches, MS run-to-run variation can be eliminated. The identical peptides are expected to have

similar behavior during LC-MS analysis including retention time, ionization efficiency, and ESI

signal response factor (101), and hence testing within one MS run provides additional statistical

validation. These approaches are especially desirable when one is interested in the differential

regulations of proteomes among samples under several physiological conditions.

There are two major categories of labeling strategy, in vivo metabolic labeling and in vitro

chemical derivatization processes. Stable isotope labeling by amino acids in cell culture (SILAC)

is widely used as an in vivo metabolic labeling approach (102). There are several types of chemical

derivatization labeling techniques, including enzymatic labeling of 16O/18O (103), dimethyl

labeling (104), isotope-coded affinity tag (ICAT) labeling (105), and isobaric mass tag labeling

(106, 107).

Based on different MS spectral information used to recognize identical peptides from

different samples, there are two types of labeling techniques. One is to recognize the peptides by

predictable mass shifts of the precursor ions in the MS1 spectra. Specific mass shifts correspond

to specific labeled samples. Also, in this strategy quantitative information generally depends on

precursor ions, by calculating the ratio of precursor ion intensities of the heavy/light peptide pairs.

The other technique is to obtain the information at the MS2 level of the specific “reporter ions”

for each sample from an isobaric mass tag containing different combinations of isotopes. An

illustration of the categories is shown in Figure 1-13. In the following sections, the major labeling

approaches will be discussed in detail. In Chapter 2, the isobaric mass tag labeling, TMT, was used

to compare different cell culture growths.

Figure 1-13 The categories of labeling approaches.

Based on the MS spectral recognition (in red), MS can recognize the labeling of the MS1 precursor

ions for SILAC, ICAT, 16O/18O labeling, and dimethyl labeling. The quantitative information can

be obtained at the MS2 level for isobaric labeling including iTRAQ, TMT, DiLeu, and DiART.

According to the labeling mechanisms (in purple), SILAC is metabolic labeling, and chemical

labeling includes ICAT, 16O/18O labeling, dimethyl labeling, and isobaric labeling. Moreover,

SILAC, ICAT, and isobaric labeling are generally performed at the protein level, and 16O/18O,

dimethyl, and isobaric labeling can be used at peptide level after enzymatic digestion.

Metabolic labeling

In SILAC, cells are cultured in the growth media containing one or several essential amino

acids labeled with stable isotopes. The labeled amino acids are then metabolically incorporated

into all of the expressed proteins after several cycles of cell replication. Media

containing different sets of labeled amino acids can provide a series of samples with various mass

differences. These samples are pooled together based on either equal number of cells or equal

amount of total protein and then analyzed by MS after necessary sample handling and preparation.

Peptide intensity ratios of the “heavy” and “light” pairs represent their relative abundances,

which can be interpreted for relative expression levels of the corresponding proteins. Traditional

SILAC typically uses Arg and/or Lys labeled with 13C and/or 15N as the labeled amino acids, and

up to 5-plex SILAC has been reported (108, 109). SILAC provides the most accurate

quantitation information among quantitative proteomic approaches. Since the samples can be

pooled at the very early stage of sample handling, the systematic and random variations from

sample preparation can be significantly reduced (110). However, SILAC is not easy to apply to

tissue samples or biofluids, which limits its application mainly to cell culture.

Moreover, SILAC may not be practical at the large production scale (liters to

kiloliters) used in the biopharmaceutical industry.
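The ratio-based readout of SILAC can be illustrated with a short Python sketch. Summarizing the peptide ratios by their median is an assumption made here for robustness against outliers, not a procedure stated in this thesis, and the intensity values are hypothetical.

```python
from statistics import median
from typing import List, Tuple


def protein_ratio(peptide_pairs: List[Tuple[float, float]]) -> float:
    """Summarize heavy/light MS1 intensity ratios of a protein's peptides by their median.

    Each pair is (light_intensity, heavy_intensity); pairs without a light signal are skipped.
    """
    ratios = [heavy / light for light, heavy in peptide_pairs if light > 0]
    if not ratios:
        raise ValueError("no usable heavy/light pairs")
    return median(ratios)


# Hypothetical heavy/light pairs for three peptides of one protein.
pairs = [(1.0e6, 2.0e6), (5.0e5, 1.1e6), (2.0e5, 3.6e5)]
silac_ratio = protein_ratio(pairs)
```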

Enzymatic labeling with 16O/18O

Enzymatic labeling of 16O/18O is one of the earliest isotopic labeling techniques used in

proteomics (111). As shown in Figure 1-14, 18O atoms can be introduced into the C terminus of

peptides during or after enzymatic protein digestion in H2 18O (18O-labeled water) with enzymes such as

trypsin, Lys-C and Glu-C (101, 111). This approach is relatively inexpensive and easy to perform

(101). However, since different labeling efficiencies usually occur for different peptides and

oxygen back-exchange can happen, the labeling protocol needs to be optimized (112-114). Up to

three samples can be multiplexed.

Figure 1-14 The reaction of enzymatic labeling of 16O/18O (115).

ICAT

ICAT chemical reagents specifically react with Cys residues of proteins and peptides.

ICAT labeling is generally performed at the protein level, and the “light” and “heavy” isotopically

labeled ICAT reagents (typically duplex) provide a mass shift of 9 Da per labeled Cys residue

(105). After labeling, the proteins undergo enzymatic digestion. The ICAT labels contain a biotin

group, so the Cys-containing peptides, which are labeled by ICAT reagents, can then

be isolated and enriched by an avidin column. The enriched Cys-containing peptides are then analyzed

by MS (105, 116). The sample complexity is thereby reduced, increasing

the possibility of identification and quantification of proteins with low abundance. However,

proteins containing no Cys residues cannot be analyzed, and proteins containing few Cys residues may be

identified by only a single peptide or not at all.

Dimethyl labeling

Dimethyl labeling can reach up to triplex sample comparison, and the reaction has been

automated online with LC-MS (104, 117, 118). The labeling reaction is performed at the peptide

level after protein digestion. Dimethyl labeling is based on the reaction between primary amines

and formaldehyde to form a Schiff base, which is then reduced by cyanoborohydride (Figure 1-15)

(119). The primary amine groups of the peptides, namely the N-termini and the epsilon-amino groups of

Lys residues, are converted to dimethylamines, and an N-terminal proline can be

converted to a monomethylamine (120). With the combination of different forms of formaldehyde

(normal, deuterated, and deuterated with 13C labeling) and cyanoborohydride (normal and

deuterated) (Figure 1-15), triplex “mass tags” can be obtained. The advantages of the approach

include low cost, quick reaction, and high labeling efficiency (sub-micrograms to milligrams of

sample) (104). A disadvantage, however, is that the deuterated groups are usually near the

hydrophobic portion of the peptides, so identical peptides carrying different

tags can show noticeable retention time shifts in RP chromatography, complicating the

data analysis. The relative quantitation of the identical peptide species requires a search across a

retention time range instead of in one spectrum (104).

Figure 1-15 Chemical reaction of dimethyl labeling.

Figure is reprinted with the permission from Boersema et al. (117).

Isobaric labeling-based relative quantitation

In isobaric labeling, the quantitative and qualitative information on the analytes is obtained at

the MS2 level during MS analysis. The reagents in an isobaric labeling set share the same chemical structure

and the same total mass, but carry different combinations of isotopic substitutions. The general scheme

of the isobaric labeling reagent structure is shown in Figure 1-16. The structure is composed of a

reactive group, a mass normalizer group, and a mass reporter group. The reactive group is used to

covalently attach the mass tag onto the peptide. The mass reporter groups of the individual reagents

have different masses in one set, resulting from different isotopic combinations, distinguishable in

the MS2 spectrum. The function of the mass normalizer group, which also contains different

combinations of isotopes, is to balance the mass difference of the mass reporter group, making the

overall reagent isobaric.

Figure 1-16 The scheme of (A) isobaric labeling reagents and (B) labeled peptide.

Most of the isobaric labeling reagents react with the primary amine groups of peptides, namely the

N-terminus and the epsilon-amino groups of Lys residues, and such reactions occur without major side

reactions.

In isobaric labeling-based approaches, each reagent in an isobaric labeling reagent

set provides every analyte in a given sample with a specific isobaric mass tag, and the

samples under study are then pooled together. The samples can be labeled before or

after enzymatic digestion. Identical peptides from the multiplexed samples co-elute with the same

LC retention time. The precursor ions are indistinguishable in the MS1 spectrum and will be

isolated together for the following fragmentation event. During the MS2 scan event, two types of

non-overlapping product ions are generated, (1) reporter ions from the labeled mass tags at low

m/z values and (2) peptide fragment ions at high m/z values. In the MS2 scans, the signal intensities

of the reporter ions from the different mass tags provide quantitation information across the

different samples, and the MS2 peptide fragment ions are used for peptide identification. An

example of the widely used isobaric labeling reagent, tandem mass tag (TMT), as well as the

detection principle, is shown in Figure 1-17.
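The separation of reporter ions from peptide fragment ions in one MS2 spectrum can be sketched in Python. The sketch is a simplification under stated assumptions: reporter channels are matched at nominal m/z 126 to 131 with a hypothetical tolerance of 0.3 Th, the spectrum is invented, and no isotopic impurity correction is applied.

```python
from typing import Dict, List, Sequence, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def reporter_intensities(
    ms2_peaks: List[Peak],
    channels: Sequence[int] = (126, 127, 128, 129, 130, 131),
    tol: float = 0.3,
) -> Dict[int, float]:
    """Sum the MS2 intensity found within +/- tol of each nominal reporter m/z."""
    return {
        channel: sum(intensity for mz, intensity in ms2_peaks if abs(mz - channel) <= tol)
        for channel in channels
    }


def relative_abundance(intensities: Dict[int, float]) -> Dict[int, float]:
    """Express each channel as a fraction of the total reporter signal."""
    total = sum(intensities.values())
    return {ch: (value / total if total > 0 else 0.0) for ch, value in intensities.items()}


# Hypothetical MS2 spectrum: six reporter ions at low m/z plus two peptide fragment ions.
spectrum = [
    (126.13, 100.0), (127.12, 200.0), (128.13, 100.0),
    (129.13, 200.0), (130.14, 100.0), (131.14, 300.0),
    (520.3, 900.0), (633.4, 450.0),
]

reporters = reporter_intensities(spectrum)
fractions = relative_abundance(reporters)
```

The peaks at m/z 520.3 and 633.4 are ignored by the reporter extraction; in a real workflow they would be the ions used for peptide identification.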

There are advantages of isobaric labeling technology over the MS1-based detection

approaches. First, this technique is very flexible. It can be applied to any type of sample including

cell lines, tissues, and body fluids and can be employed at the protein or peptide level. Second,

isobaric labeling is able to reach high numbers of sample multiplexing without significantly

reducing the sensitivity of the MS analysis. The precursor ions of identical peptides from different

samples show the same m/z value, and the co-isolation of these precursor ions thus does not

compromise the MS1 sensitivity while still allowing MS2 spectra to be obtained. As a result,

isobaric labeling can commonly reach high numbers of multiplexing, boosting the throughput

significantly. The 8-plex iTRAQ is commercially available from Sciex (Framingham, MA), and

10-plex TMT is available from Thermo Scientific (Rockford, IL) (121). The combinatorial isobaric mass

tags (CMTs) were reported to potentially reach 28-plex in one set (122).

Figure 1-17 TMT 6-plex labeling reagents and the technology principle.

A. The chemical structure of TMT reagent, and the new peptide bond formation at the primary

amine group with the TMT isobaric tags. The isotope distribution for the TMT 6-plex is also shown.

B. The scheme of how TMT labeling technique provides identification and quantitation

information for multiple samples in one LC-MS/MS run.

However, only mass spectrometers which can detect low m/z values in the MS2 scan can

be used for isobaric labeling because the m/z values of the reporter ions generally range from m/z

110 to 135. Moreover, the quantitative accuracy can be compromised (121, 123-125). The isolation

window for precursor ion selection is often from 1.5 to 3 Th. Besides the targeted precursors, all

other precursor ions of the co-eluting peptides within this isolation window are also selected. After

fragmentation, the reporter ions from all the co-isolated labeled peptide precursor ions

contribute to the measured reporter ion signal. Consequently, the measured analyte abundances can be inaccurate.

The effect of such interference is unpredictable and depends on the complexity of the samples. However,

the relative quantitative differences in proteomic samples are typically considered to be

underestimated and compressed, under the assumption that the majority of proteins are not

differentially regulated in biological studies. Thus, the quantitative data interpretation of

isobaric labeling requires special attention. One reported approach uses triple-stage

mass spectrometry (MS3) on an instrument capable of both ion-trap-based CID and HCD fragmentation

to increase the quantitative accuracy (124). The precursor ions are first fragmented by CID

at low fragmentation energy, and the most intense product ions are then selected for the

subsequent HCD fragmentation (MS3). In this way, interference can be largely removed. However, such an

instrument was not available for this thesis, so a specific ratio cutoff was used in Chapter 2

instead of considering the actual fold change values.
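The ratio compression described above can be made concrete with a small worked example in Python. The numbers are hypothetical: a target peptide with a true 4:1 ratio between two channels is co-isolated with an interfering peptide that is unchanged (1:1) between the same channels.

```python
def observed_ratio(
    target_a: float,
    target_b: float,
    interference_a: float,
    interference_b: float,
) -> float:
    """Reporter ratio (channel A over channel B) when co-isolated signal adds to both channels."""
    return (target_a + interference_a) / (target_b + interference_b)


# Hypothetical reporter intensities for the target peptide alone.
true_ratio = observed_ratio(400.0, 100.0, 0.0, 0.0)

# The same target with an unchanged co-isolated peptide contributing 100 units per channel.
compressed_ratio = observed_ratio(400.0, 100.0, 100.0, 100.0)

print(f"true ratio {true_ratio:.1f}, observed ratio {compressed_ratio:.1f}")
```

The observed ratio moves toward 1, which is why a cutoff on the measured ratio is more defensible than reporting the measured value as the actual fold change.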

Figure 1-18 Chemical structure of major isobaric reagents.

A. iTRAQ (107); B. DiLeu (126); C. DiART (127); D. CMT (122). In this figure, (a) shows the

chemical structure of the isobaric reagent, and (b) illustrates the new peptide bond formation at the

primary amine group with the isobaric tag.

Several types of isobaric reagents, each with its own reaction chemistry,

have been developed. Tandem mass tag (TMT) reagents (106) and isobaric tag

for relative and absolute quantitation (iTRAQ) reagents (107) are commercially available and most

commonly used. Several other novel isobaric labeling reagents have also been reported including

N,N-dimethyl leucines (DiLeu) (126), deuterium isobaric amine-reactive tag (DiART) (127), and

combinatorial isobaric mass tags (CMT) (122). The chemical structures of these isobaric reagents

are shown in Figures 1-17 and 1-18. All of the reagents react with the primary amine groups of

peptides, namely the N-terminus and the epsilon-amino groups of Lys residues, and this reaction has

been shown to proceed without major side reactions (101). Other isobaric labels, for specific

post-translational modifications, and cysteine-specific isobaric tags are based on other

chemistries such as carbonyl- and sulfhydryl-reactive groups (121).

In Chapter 2, TMT 6-plex (reporter ions from m/z 126 to 131) was applied to study

proteome profile changes of CHO cells across several time points under different culture conditions.

This approach is more suitable and practical than other techniques such as SILAC because

industrial large scale cultivation was studied.

Notably, as mentioned previously, there are multiple choices of MS data acquisition

strategies and of label-free and labeled quantitative approaches. Individual choices can be combined,

yielding an extensive number of workflows. For example, DDA and isobaric labeling can be used

for discovery proteomics, and SRM can also work with labeled approaches to quantify target

proteins with improved sensitivity. The choice depends on the question

to be addressed.

1.8.4 MS-based proteomic data interpretation

1.8.4.1 Peptide and protein identification

In shotgun bottom-up proteomic analysis, peptide identification is the first step in

interpreting the MS data; the identified peptides are then used to infer the presence of the corresponding proteins.

Quantitative information can also be obtained at the peptide level to reflect the regulation level of proteins.

To date, there are three strategies to identify peptides from MS2 spectra: database searching, de

novo sequencing and spectral library searching.

Currently, database searching is the most widely used strategy. The data interpretation in

database searching is fundamentally supported by a protein sequence database (128). In this

strategy, the protein sequences in the database are digested in silico with a specific enzyme, and

information on all possible precursors and their corresponding possible fragment ions for the peptide

candidates is collected. Then the experimental MS raw data (MS1 and MS2 spectra) are

compared to this information, yielding a matching score for each peptide spectral match (PSM)

based on either similarity or probability, which can indicate how confident the match is. SEQUEST

(129) and Mascot (130) are widely used examples of this strategy. Note that, with the database

searching algorithm alone, false positive PSMs still occur. To control the false discovery

rate, a PSM evaluation strategy called the target-decoy strategy is frequently used (131). A ‘decoy’

database which contains reversed or shuffled peptide sequences and the ‘target’ database which is

composed of true protein sequences are combined, and the experimental MS raw data are searched

against this combined ‘target-decoy’ database. Since the PSMs matching the ‘decoy’ sequences

are known to be false, the cutoff of the matching score can be determined based on the desired

false discovery rate (FDR). Database searching is effective and relatively accurate. However, it

cannot recognize peptides which are not present in the database. In Chapter 2, database searching

was used for peptide and protein identification for CHO proteomic MS data analysis.
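The target-decoy logic can be illustrated with a minimal Python sketch. It assumes that the FDR above a score threshold is estimated as the number of decoy PSMs divided by the number of target PSMs, which is one common convention and not necessarily the exact estimator of any particular search tool; the function name and the PSM scores are hypothetical.

```python
from typing import List, Optional, Tuple


def score_cutoff(psms: List[Tuple[float, bool]], max_fdr: float = 0.01) -> Optional[float]:
    """Return the lowest score threshold whose estimated FDR (decoys / targets) is <= max_fdr.

    psms holds (score, is_decoy) pairs; returns None if no threshold qualifies.
    """
    ranked = sorted(psms, key=lambda psm: psm[0], reverse=True)
    targets = 0
    decoys = 0
    cutoff = None
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Keep extending the accepted list while the estimate stays within the limit.
        if targets > 0 and decoys / targets <= max_fdr:
            cutoff = score
    return cutoff


# Hypothetical search results against a combined target-decoy database.
results = [
    (95.0, False), (90.0, False), (85.0, False), (80.0, False),
    (70.0, True), (65.0, False), (60.0, True), (55.0, False),
]

cutoff_20 = score_cutoff(results, max_fdr=0.2)  # tolerates one decoy among five targets
cutoff_0 = score_cutoff(results, max_fdr=0.0)   # stops before the first decoy
```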

The algorithm of de novo sequencing, on the other hand, only relies on the peptide fragment

pattern of experimental MS data to determine the peptide sequences, without the assistance of a

protein sequence database (132). The approach can identify peptides which are not contained in

the reference database, which could be particularly helpful to identify protein variants and proteins

from organisms whose genomes have not been sequenced (132, 133). However, de novo

sequencing suffers from inaccuracy caused by factors such as noisy spectra, incomplete ion series,

and limited MS2 mass accuracy (133). As a result, database searching has generally outperformed

the de novo sequencing strategy. However, the rapid development of MS instrumentation

promises to provide higher quality MS2 spectra, which may draw more attention to de

novo sequencing.

Spectral library searching (134-136) is based on the assumption that under similar

conditions, a given peptide would yield nearly identical MS2 spectra including the fragment ion

species and their intensities. MS2 spectra with confident peptide spectral matches (PSMs)

can be compiled into a reference called a spectral library. The newly obtained experimental

MS data can then be compared with the PSMs in the spectral library to identify known peptides.

Moreover, LC retention time information can also be added in the spectral library. When one is

using similar LC separation conditions (e.g. RP LC), the retention time information can increase

the identification certainty. This method is time efficient and accurate, especially

when using a comprehensive, high-quality spectral library. However, it can only identify

peptides contained in the spectral library, and it also requires that new experiments

follow data acquisition conditions similar to those with which the spectral library was built.
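Spectral library matching rests on a spectrum similarity measure. The Python sketch below uses a cosine similarity (normalized dot product) on unit-width m/z bins, which is a common choice but an assumption here, since the scoring function of the cited tools is not described in this section; the library entries, peptide names, and query spectrum are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def binned(peaks: List[Peak], bin_width: float = 1.0) -> Dict[int, float]:
    """Sum peak intensities into fixed-width m/z bins."""
    vector: Dict[int, float] = {}
    for mz, intensity in peaks:
        key = int(mz // bin_width)
        vector[key] = vector.get(key, 0.0) + intensity
    return vector


def cosine_similarity(peaks_a: List[Peak], peaks_b: List[Peak], bin_width: float = 1.0) -> float:
    """Normalized dot product of two binned spectra (1.0 for identical intensity patterns)."""
    a = binned(peaks_a, bin_width)
    b = binned(peaks_b, bin_width)
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm > 0 else 0.0


# Hypothetical spectral library with two reference spectra.
library = {
    "PEPTIDEA": [(175.1, 50.0), (304.2, 100.0), (417.2, 80.0)],
    "PEPTIDEB": [(147.1, 100.0), (262.1, 40.0), (375.2, 90.0)],
}

# Hypothetical experimental spectrum resembling the first entry.
query = [(175.1, 45.0), (304.2, 110.0), (417.3, 70.0)]
best_match = max(library, key=lambda name: cosine_similarity(query, library[name]))
```

A retention time term could be added to the score in the same spirit as described above, for example by rejecting library entries whose recorded retention time differs too much from that of the query.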

For DDA data, these algorithms can be applied directly since each MS2 spectrum originates from a

selected precursor. On the other hand, the data analysis for DIA may require some conversion

because the MS2 spectra are noisy and precursors are not individually isolated. There are generally two strategies

for DIA data analysis. The first one is called targeted extraction, which relies on spectral

library searching. In this approach, a spectral library is required, which should contain PSMs and

their retention times. Then, the spectral information from the DIA data can be extracted and

analyzed based on the spectral library for peptide identification. OpenSWATH is one such

approach (137). The other strategy is referred to as untargeted peptide identification, which can

utilize database searching to interpret DIA data. In this strategy, DIA data are computationally

reconstructed into pseudo MS2 spectra. That is, from the DIA data, each precursor is grouped

with all of its possible fragment ions. The resultant precursor-fragment group, called a pseudo MS2

spectrum, can then be submitted to a database search for peptide identification. DIA-Umpire

is one of these tools (138).

The identified peptides are then used for protein inference. Two general groups of strategies,

probabilistic and non-probabilistic methods, have been developed. The non-probabilistic approach

provides the protein entries which can be explained by the identified peptides with high confidence.

In the probabilistic method, the quality of PSMs will also be taken into consideration to provide a

probability evaluation of the protein presence. For example, it may assign a high probability for a

protein with multiple low scoring PSMs.
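How several low-scoring PSMs can add up to a confident protein can be shown with a deliberately simplified Python sketch. It assumes that the peptide-level probabilities are independent and combines them as one minus the probability that every peptide assignment is wrong; actual probabilistic inference tools use more elaborate models, and the probabilities below are hypothetical.

```python
from math import prod
from typing import Iterable


def protein_probability(peptide_probs: Iterable[float]) -> float:
    """Probability that at least one peptide assignment is correct, assuming independence."""
    return 1.0 - prod(1.0 - p for p in peptide_probs)


# Four hypothetical low-scoring peptides versus one moderately confident peptide.
many_weak = protein_probability([0.5, 0.5, 0.5, 0.5])
single_moderate = protein_probability([0.8])

print(f"four weak peptides: {many_weak:.4f}, one moderate peptide: {single_moderate:.2f}")
```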

Notably, no single tool or search engine yields the “best” results, and it has been

reported that combining multiple algorithms gives a more robust pipeline that provides

more peptide identifications than a single approach (133). Several software packages provide

a platform into which the desired tools can be inserted to build a customized MS data analysis pipeline. For

example, the commercial software Proteome Discoverer (Thermo Scientific) can integrate several

search engines and PSM evaluation methods. In Chapter 2, we used three search

engines in Proteome Discoverer 1.4 to maximize the number of identified proteins.

1.8.4.2 Biological analysis

In shotgun proteomic analysis for systems biology study, the protein abundance

information needs to be further interpreted to provide biological reasons or to generate testable

hypotheses for the biological system under specific perturbations. Generally, a large number of

differentially regulated proteins/metabolites/mRNAs can be determined through statistical

analysis, and the biofunctions and/or pathways that are supported by differentially regulated

molecules can be selected as candidates of cellular responses under certain circumstances. The

hypotheses can then be generated, and follow-up experiments performed to test them.

The lists of differentially regulated proteins/metabolites/mRNAs from ’Omics experiments

are generally long, and manual data analysis can hence be laborious and yield little biological

insight. In order to interpret the data, several strategies have been developed. One popular approach

is Gene Ontology (GO) enrichment analysis (http://geneontology.org/page/go-enrichment-analysis).

GO annotates proteins/genes based on three categories: “biological process”,

“molecular function”, and “cellular component” (139). After the proteins from the experiments are

annotated with GO terms, enrichment analysis identifies which specific GO terms

are over-represented in samples under perturbation, highlighting the

underlying biology that is likely involved. Another strategy is pathway analysis. The pathway database is built

up by assigning the relevant proteins/mRNA/metabolites into certain pathways based on their

biological effects. Then, the protein list obtained from experiments is mapped into the pathway

database. The specific pathways which are over-represented by the experimental data are

considered to be biological processes affected by the perturbations. The Kyoto Encyclopedia of Genes

and Genomes (KEGG) (140) and the Ingenuity Pathway Knowledge Base are examples of such

pathway databases.
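Over-representation of a GO term or pathway is commonly assessed with a hypergeometric (one-sided Fisher) test. The Python sketch below shows that calculation under this assumption; it is not the scoring method of any specific software named in this section, the counts are hypothetical, and no multiple-testing correction is applied.

```python
from math import comb


def enrichment_p_value(background: int, annotated: int, selected: int, overlap: int) -> float:
    """P(X >= overlap) for X ~ Hypergeometric(background, annotated, selected).

    background: proteins in the reference set
    annotated:  background proteins carrying the GO term or pathway annotation
    selected:   differentially regulated proteins
    overlap:    selected proteins carrying the annotation
    """
    upper = min(annotated, selected)
    favorable = sum(
        comb(annotated, i) * comb(background - annotated, selected - i)
        for i in range(overlap, upper + 1)
    )
    return favorable / comb(background, selected)


# Hypothetical experiment: 2000 quantified proteins, 40 annotated to one pathway,
# 100 differentially regulated proteins, 8 of which belong to that pathway.
p_value = enrichment_p_value(background=2000, annotated=40, selected=100, overlap=8)
print(f"enrichment p-value: {p_value:.2e}")
```

With 100 of 2000 proteins selected, about 2 annotated proteins would be expected by chance, so an overlap of 8 gives a small p-value and the pathway would be flagged as over-represented.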

Currently, many biological databases and software tools are available for ’Omics data

analysis, such as Ingenuity Pathway Analysis (IPA) (Ingenuity® Systems, www.ingenuity.com,

Redwood City, CA), MetaCore (Thomson Reuters, https://portal.genego.com/, New York City,

NY), and the Database for Annotation, Visualization and Integrated Discovery (DAVID) (141) (Leidos Biomedical Research,

Inc., https://david.ncifcrf.gov/, Frederick, MD). All of the software mentioned here can provide both

GO term and pathway analysis. In this thesis, IPA and MetaCore were used in Chapter 2 to analyze

the proteomic and metabolomic data for biological interpretation. For both IPA and MetaCore, the

biology databases are generated from the published literature and current knowledge. Proteomics,

metabolomics, and transcriptomics data can be used as input. Specifically, IPA can analyze

combined proteomic and metabolomics data sets (142), which was utilized in Chapter 2.

1.9 Conclusion

The success of biopharmaceutical production in industry depends on advances in

upstream bioprocessing and downstream purification at large scale. Despite the achievements to

date, the limited understanding of the systems biology of CHO cells, the primary workhorse for

therapeutic protein production, hinders upstream bioprocess development. Meanwhile, the

detection and evaluation of residual HCPs in the final drug product remains challenging.

LC-MS-based quantitative proteomic and protein analysis is a powerful tool to study the systems

biology of CHO cell lines as well as a promising platform for HCP analysis.

1.10 References

1. Walsh G (2014) Biopharmaceutical benchmarks 2014. Nat. Biotechnol. 32(10):992-1000.


3. Datta P, Linhardt RJ, & Sharfstein ST (2013) An 'omics approach towards CHO cell

engineering. Biotechnol. Bioeng. 110(5):1255-1271.

4. Jayapal KR, Wlaschin KF, Hu WS, & Yap MGS (2007) Recombinant protein therapeutics

from CHO cells - 20 years and counting. Chem. Eng. Prog. 103(10):40-47.

5. Griffin TJ, Seth G, Xie HW, Bandhakavi S, & Hu WS (2007) Advancing mammalian cell

culture engineering using genome-scale technologies. Trends Biotechnol. 25(9):401-408.

6. Barnes LM, Bentley CM, & Dickson AJ (2001) Characterization of the stability of

recombinant protein production in the GS-NS0 expression system. Biotechnol. Bioeng.

73(4):261-270.

7. Baldi L, et al. (2005) Transient gene expression in suspension HEK-293 cells: Application

to large-scale protein production. Biotechnol. Prog. 21(1):148-153.

8. Chu L & Robinson DK (2001) Industrial choices for protein production by large-scale cell

culture. Curr. Opin. Biotechnol. 12(2):180-187.

9. Dalton AC & Barton WA (2014) Over-expression of secreted proteins from mammalian

cell lines. Protein Sci. 23(5):517-525.

10. Kober L, Zehe C, & Bode J (2012) Development of a novel ER stress based selection

system for the isolation of highly productive clones. Biotechnol. Bioeng. 109(10):2599-

2611.

11. Kaufman RJ, et al. (1985) Coamplification and coexpression of human tissue-type

plasminogen-activator and murine dihydrofolate-reductase sequences in Chinese-hamster

ovary cells. Mol. Cell. Biol. 5(7):1750-1759.

12. Tjio JH & Puck TT (1958) Genetics of somatic mammalian cells. II. Chromosomal

constitution of cells in tissue culture. J. Exp. Med. 108(2):259-268.

13. Wurm FM (2013) CHO quasispecies-Implications for manufacturing processes. Processes.

1:296-311.

14. Hamilton WG & Ham RG (1977) Clonal growth of Chinese-hamster cell lines in protein-

free media. In. Vitro. Cell. Dev. B. 13(9):537-547.

15. Shukla AA & Thommes J (2010) Recent advances in large-scale production of monoclonal

antibodies and related proteins. Trends Biotechnol. 28(5):253-261.

16. Farid SS (2007) Process economics of industrial monoclonal antibody manufacture. J.

Chromatogr. B Analyt. Technol. Biomed. Life Sci. 848(1):8-18.

17. Jain E & Kumar A (2008) Upstream processes in antibody production: Evaluation of

critical parameters. Biotechnol. Adv. 26(1):46-72.

18. Glacken MW, Fleischaker RJ, & Sinskey AJ (1983) Large-scale production of mammalian-

cells and their products-engineering principles and barriers to scale-up. Ann. N. Y. Acad.

Sci. 413(DEC):355-372.

19. Grima EM, Chisti Y, & MooYoung M (1997) Characterization of shear rates in airlift

bioreactors for animal cell culture. J. Biotechnol. 54(3):195-210.

20. Gronemeyer P, Ditz R, & Strube J (2014) Trends in upstream and downstream process

development for antibody manufacturing. Bioeng. 1:188-212.

21. De Jesus M & Wurm FM (2011) Manufacturing recombinant proteins in kg-ton quantities

using animal cells in bioreactors. Eur. J. Pharm. Biopharm. 78(2):184-188.

22. Shukla AA & Gottschalk U (2013) Single-use disposable technologies for

biopharmaceutical manufacturing. Trends Biotechnol. 31(3):147-154.

23. Bibila TA & Robinson DK (1995) In pursuit of the optimal fed-batch process for

monoclonal-antibody production. Biotechnol. Prog. 11(1):1-13.

24. Birch JR & Racher AJ (2006) Antibody production. Adv. Drug Delivery Rev. 58(5-6):671-

685.

25. Goochee CF & Monica T (1990) Environmental-effects on protein glycosylation. Nat.

Biotechnol. 8(5):421-427.

26. Wong DCF, Wong KTK, Goh LT, Heng CK, & Yap MGS (2005) Impact of dynamic

online fed-batch strategies on metabolism, productivity and N-glycosylation quality in

CHO cell cultures. Biotechnol. Bioeng. 89(2):164-177.

27. Yang M & Butler M (2000) Effects of ammonia on CHO cell growth, erythropoietin

production, and glycosylation. Biotechnol. Bioeng. 68(4):370-380.

28. Andersen DC, Bridges T, Gawlitzek M, & Hoy C (2000) Multiple cell culture factors can

affect the glycosylation of Asn-184 in CHO-produced tissue-type plasminogen activator.

Biotechnol. Bioeng. 70(1):25-31.

29. Baker KN, et al. (2001) Metabolic control of recombinant protein N-glycan processing in

NS0 and CHO cells. Biotechnol. Bioeng. 73(3):188-202.

30. Xie LZ & Wang DIC (1997) Integrated approaches to the design of media and feeding

strategies for fed-batch cultures of animal cells. Trends Biotechnol. 15(3):109-113.

31. Voisard D, Meuwly F, Ruffieux PA, Baer G, & Kadouri A (2003) Potential of cell retention

techniques for large-scale high-density perfusion culture of suspended mammalian cells.

Biotechnol. Bioeng. 82(7):751-765.

32. Betts JI & Baganz F (2006) Miniature bioreactors: current practices and future

opportunities. Microb. Cell. Fact. 5:14.

33. Tai M, Ly A, Leung I, & Nayar G (2015) Efficient high-throughput biological process

characterization: Definitive screening design with the Ambr250 bioreactor system.

Biotechnol. Prog. 31(5):1388-1395.

34. Bhambure R, Kumar K, & Rathore AS (2011) High-throughput process development for

biopharmaceutical drug substances. Trends Biotechnol. 29(3):127-135.

35. Xing ZZ, Kenty BN, Li ZJ, & Lee SS (2009) Scale-up analysis for a CHO cell culture

process in large-scale bioreactors. Biotechnol. Bioeng. 103(4):733-746.

36. Aranibar N, et al. (2011) NMR-based metabolomics of mammalian cell and tissue cultures.

J. Biomol. NMR 49(3-4):195-206.

37. Junker BH (2004) Scale-up methodologies for Escherichia coli and yeast fermentation

processes. J Biosci. Bioeng. 97(6):347-364.

38. Marks DM (2003) Equipment design considerations for large scale cell culture.

Cytotechnology. 42(1):21-33.

39. Sieblist C, Jenzsch M, Pohlscheidt M, & Lubbert A (2011) Insights into large-scale cell-

culture reactors: I. Liquid mixing and oxygen supply. Biotechnol. J. 6(12):1532-1546.

40. Sieblist C, et al. (2011) Insights into large-scale cell-culture reactors: II. Gas-phase mixing

and CO2 stripping. Biotechnol. J. 6(12):1547-1556.

41. Tscheliessnig AL, Konrath J, Bates R, & Jungbauer A (2013) Host cell protein analysis in

therapeutic protein bioprocessing - methods and applications. Biotechnol. J. 8(6):655-670.

42. Shukla AA, Hubbard B, Tressel T, Guhan S, & Low D (2007) Downstream processing of

monoclonal antibodies - Application of platform approaches. J. Chromatogr. B Analyt.

Technol. Biomed. Life Sci. 848(1):28-39.

43. Hober S, Nord K, & Linhult M (2007) Protein A chromatography for antibody purification.

J. Chromatogr. B Analyt. Technol. Biomed. Life Sci. 848(1):40-47.

44. Shukla AA & Hinckley P (2008) Host cell protein clearance during protein A

chromatography: development of an improved column wash step. Biotechnol. Prog.

24(5):1115-1121.

45. Azevedo AM, Rosa PAJ, Ferreira IF, & Aires-Barros MR (2009) Chromatography-free

recovery of biopharmaceuticals through aqueous two-phase processing. Trends Biotechnol.

27(4):240-247.

46. Guiochon G & Beaver LA (2011) Separation science is the key to successful

biopharmaceuticals. J. Chromatogr. A 1218(49):8836-8858.

47. Briggs J & Panfili PR (1991) Quantitation of DNA and protein impurities in

biopharmaceuticals. Anal. Chem. 63(9):850-859.

48. Wolter T & Richter A (2005) Assays for controlling host-cell impurities in

biopharmaceuticals. Bioprocess. Int. 3(2):2-6.

49. Xu X, et al. (2011) The genomic sequence of the Chinese hamster ovary (CHO)-K1 cell

line. Nat. Biotechnol. 29(8):735-U131.

50. Lewis NE, et al. (2013) Genomic landscapes of Chinese hamster ovary cell lines as

revealed by the Cricetulus griseus draft genome. Nat. Biotechnol. 31(8):759-765.

51. Kim JY, Kim YG, & Lee GM (2012) CHO cells in biotechnology for production of

recombinant proteins: current state and further potential. Appl. Microbiol. Biotechnol.

93(3):917-930.

52. Kildegaard HF, Baycin-Hizal D, Lewis NE, & Betenbaugh MJ (2013) The emerging CHO

systems biology era: harnessing the 'omics revolution for biotechnology. Curr. Opin.

Biotechnol. 24(6):1102-1107.

53. Mallick P & Kuster B (2010) Proteomics: a pragmatic perspective. Nat. Biotechnol.

28(7):695-709.

54. Chaudhuri S, et al. (2015) Investigation of CHO secretome: Potential way to improve

recombinant protein production from bioprocess. J. Bioprocess. Biotech. 5(7):1000240.

55. Doolan P, et al. (2013) Transcriptomic analysis of clonal growth rate variation during CHO

cell line development. J. Biotechnol. 166(3):105-113.

56. Clarke C, et al. (2011) Large scale microarray profiling and coexpression network analysis

of CHO cells identifies transcriptional modules associated with growth and productivity.

J. Biotechnol. 155(3):350-359.

57. Ley D, et al. (2015) Multi-omic profiling of EPO-producing Chinese hamster ovary cell

panel reveals metabolic adaptation to heterologous protein production. Biotechnol. Bioeng.

112(11):2373-2387.

58. de Zafra CLZ, Quarmby V, Francissen K, Vanderlaan M, & Zhu-Shimoni J (2015) Host

cell proteins in biotechnology-derived products: A risk assessment framework. Biotechnol.

Bioeng. 112(11):2284-2291.

59. Hogwood CEM, Bracewell DG, & Smales CM (2014) Measurement and control of host

cell proteins (HCPs) in CHO cell bioprocesses. Curr. Opin. Biotechnol. 30:153-160.

60. Wang X, Hunter AK, & Mozier NM (2009) Host cell proteins in biologics development:

identification, quantitation and risk assessment. Biotechnol. Bioeng. 103(3):446-458.

61. Beatson R, et al. (2011) Transforming growth factor-beta 1 is constitutively secreted by

Chinese hamster ovary cells and is functional in human cells. Biotechnol. Bioeng.

108(11):2759-2764.

62. Gao SX, et al. (2011) Fragmentation of a highly purified monoclonal antibody attributed

to residual CHO cell protease activity. Biotechnol. Bioeng. 108(4):977-982.

63. Gutierrez AH, Moise L, & De Groot AS (2012) Of hamsters and men: A new perspective

on host cell proteins. Hum. Vaccin. Immunother. 8(9):1172-1174.

64. Krawitz DC, Forrest W, Moreno GT, Kittleson J, & Champion KM (2006) Proteomic

studies support the use of multi-product immunoassays to monitor host cell protein

impurities. Proteomics 6(1):94-110.

65. Jin M, Szapiel N, Zhang J, Hickey J, & Ghose S (2010) Profiling of host cell proteins by

two-dimensional difference gel electrophoresis (2D-DIGE): implications for downstream

process development. Biotechnol. Bioeng. 105(2):306-316.

66. Levy NE, Valente KN, Choe LH, Lee KH, & Lenhoff AM (2014) Identification and

characterization of host cell protein product-associated impurities in monoclonal antibody

bioprocessing. Biotechnol. Bioeng. 111(5):904-912.

67. Zhu GJ, et al. (2012) A rapid cIEF-ESI-MS/MS method for host cell protein analysis of a

recombinant human monoclonal antibody. Talanta 98:253-256.

68. Doneanu CE, et al. (2012) Analysis of host-cell proteins in biotherapeutic proteins by

comprehensive online two-dimensional liquid chromatography/mass spectrometry. mAbs

4(1):24-44.

69. Farrell A, et al. (2015) Quantitative host cell protein analysis using two dimensional data

independent LC-MS^E. Anal. Chem. 87(18):9186-9193.

70. Doneanu CE, et al. (2015) Enhanced detection of low-abundance host cell protein (HCP)

impurities in high-purity monoclonal antibodies down to 1 ppm using ion mobility mass

spectrometry coupled with multidimensional liquid chromatography. Anal. Chem.

87(20):10283-10291.

71. Aebersold R & Mann M (2003) Mass spectrometry-based proteomics. Nature

422(6928):198-207.

72. Bantscheff M, Schirle M, Sweetman G, Rick J, & Kuster B (2007) Quantitative mass

spectrometry in proteomics: a critical review. Anal. Bioanal. Chem. 389(4):1017-1031.

73. Furey A, Moriarty M, Bane V, Kinsella B, & Lehane M (2013) Ion suppression; A critical

review on causes, evaluation, prevention and applications. Talanta 115:104-122.

74. Motoyama A & Yates JR (2008) Multidimensional LC separations in shotgun proteomics.

Anal. Chem. 80(19):7187-7193.

75. Gilar M, Olivova P, Daly AE, & Gebler JC (2005) Orthogonality of separation in two-

dimensional liquid chromatography. Anal. Chem. 77(19):6426-6434.

76. Link AJ, et al. (1999) Direct analysis of protein complexes using mass spectrometry. Nat.

Biotechnol. 17(7):676-682.

77. Washburn MP, Wolters D, & Yates JR (2001) Large-scale analysis of the yeast proteome

by multidimensional protein identification technology. Nat. Biotechnol. 19(3):242-247.

78. Peng JM, Elias JE, Thoreen CC, Licklider LJ, & Gygi SP (2003) Evaluation of

multidimensional chromatography coupled with tandem mass spectrometry (LC/LC-

MS/MS) for large-scale protein analysis: The yeast proteome. J. Proteome Res. 2(1):43-

50.

79. Gilar M, Olivova P, Daly AE, & Gebler JC (2005) Two-dimensional separation of peptides

using RP-RP-HPLC system with different pH in first and second separation dimensions. J.

Sep. Sci. 28(14):1694-1703.

80. Michalski A, Cox J, & Mann M (2011) More than 100,000 Detectable Peptide Species

Elute in Single Shotgun Proteomics Runs but the Majority is Inaccessible to Data-

Dependent LC-MS/MS. J. Proteome Res. 10(4):1785-1793.

81. Mann M & Kelleher NL (2008) Precision proteomics: The case for high resolution and

high mass accuracy. Proc. Natl. Acad. Sci. U. S. A. 105(47):18132-18138.

82. Yates JR, Ruse CI, & Nakorchevsky A (2009) Proteomics by mass spectrometry:

Approaches, advances, and applications. Annual Review of Biomedical Engineering,

(Annual Reviews, Palo Alto), Vol 11, pp 49-79.

83. Chernushevich IV (2000) Duty cycle improvement for a quadrupole-time-of-flight mass

spectrometer and its use for precursor ion scans. Eur. J. Mass Spectrom. 6(6):471-479.

84. Chernushevich IV, Loboda AV, & Thomson BA (2001) An introduction to quadrupole-

time-of-flight mass spectrometry. J. Mass Spectrom. 36(8):849-865.

85. Marshall AG & Hendrickson CL (2008) High-Resolution Mass Spectrometers. Annual

Review of Analytical Chemistry, (Annual Reviews, Palo Alto), Vol 1, pp 579-599.

86. Michalski A, et al. (2011) Mass Spectrometry-based Proteomics Using Q Exactive, a High-

performance Benchtop Quadrupole Orbitrap Mass Spectrometer. Mol. Cell. Proteomics

10(9):11.

87. Olsen JV, et al. (2007) Higher-energy C-trap dissociation for peptide modification analysis.

Nat. Methods 4(9):709-712.

88. Stahl DC, Swiderek KM, Davis MT, & Lee TD (1996) Data-controlled automation of

liquid chromatography tandem mass spectrometry analysis of peptide mixtures. J. Am. Soc.

Mass Spectrom. 7(6):532-540.

89. Liu HB, Sadygov RG, & Yates JR (2004) A model for random sampling and estimation of

relative protein abundance in shotgun proteomics. Anal. Chem. 76(14):4193-4201.

90. Chapman JD, Goodlett DR, & Masselon CD (2014) Multiplexed and data-independent

tandem mass spectrometry for global proteome profiling. Mass Spectrom. Rev. 33(6):452-

470.

91. Plumb RS, et al. (2006) UPLC/MSE; a new approach for generating molecular fragment

information for biomarker structure elucidation. Rapid Commun. Mass Spectrom.

20(13):1989-1994.

92. Gillet LC, et al. (2012) Targeted data extraction of the MS/MS spectra generated by data-

independent acquisition: A new concept for consistent and accurate proteome analysis. Mol.

Cell. Proteomics 11(6):17.

93. Egertson JD, et al. (2013) Multiplexed MS/MS for improved data-independent acquisition.

Nat. Methods 10(8):744-746.

94. Geromanos SJ, et al. (2009) The detection, correlation, and comparison of peptide

precursor and product ions from data independent LC-MS with data dependant LC-MS/MS.

Proteomics 9(6):1683-1695.

95. Picotti P & Aebersold R (2012) Selected reaction monitoring-based proteomics: workflows,

potential, pitfalls and future directions. Nat. Methods 9(6):555-566.

96. Peterson AC, Russell JD, Bailey DJ, Westphall MS, & Coon JJ (2012) Parallel reaction

monitoring for high resolution and high mass accuracy quantitative, targeted proteomics.

Mol. Cell. Proteomics 11(11):1475-1488.

97. Gallien S, et al. (2012) Targeted proteomic quantification on quadrupole-orbitrap mass

spectrometer. Mol. Cell. Proteomics 11(12):1709-1723.

98. Lange V, Picotti P, Domon B, & Aebersold R (2008) Selected reaction monitoring for

quantitative proteomics: a tutorial. Mol. Syst. Biol. 4:14.

99. Schilling B, et al. (2015) Multiplexed, Scheduled, High-Resolution Parallel Reaction

Monitoring on a Full Scan QqTOF Instrument with Integrated Data-Dependent and

Targeted Mass Spectrometric Workflows. Anal. Chem. 87(20):10222-10229.

100. Zhu WH, Smith JW, & Huang CM (2010) Mass spectrometry-based label-free quantitative

proteomics. J. Biomed. Biotechnol. 2010:840518-840523.

101. Xie F, Liu T, Qian WJ, Petyuk VA, & Smith RD (2011) Liquid chromatography-mass

spectrometry-based quantitative proteomics. J. Biol. Chem. 286(29):25443-25449.

102. Ong SE, et al. (2002) Stable isotope labeling by amino acids in cell culture, SILAC, as a

simple and accurate approach to expression proteomics. Mol. Cell. Proteomics 1(5):376-

386.

103. Stewart II, Thomson T, & Figeys D (2001) O-18 Labeling: a tool for proteomics. Rapid

Commun. Mass Spectrom. 15(24):2456-2465.

104. Boersema PJ, Raijmakers R, Lemeer S, Mohammed S, & Heck AJR (2009) Multiplex

peptide stable isotope dimethyl labeling for quantitative proteomics. Nat. Protoc. 4(4):484-

494.

105. Gygi SP, et al. (1999) Quantitative analysis of complex protein mixtures using isotope-

coded affinity tags. Nat. Biotechnol. 17(10):994-999.

106. Thompson A, et al. (2003) Tandem mass tags: A novel quantification strategy for

comparative analysis of complex protein mixtures by MS/MS. Anal. Chem. 75(8):1895-

1904.

107. Ross PL, et al. (2004) Multiplexed protein quantitation in Saccharomyces cerevisiae using

amine-reactive isobaric tagging reagents. Mol. Cell. Proteomics 3(12):1154-1169.

108. Tzouros M, et al. (2013) Development of a 5-plex SILAC Method Tuned for the

Quantitation of Tyrosine Phosphorylation Dynamics. Mol. Cell. Proteomics 12(11):3339-

3349.

109. Molina H, et al. (2009) Temporal Profiling of the Adipocyte Proteome during

Differentiation Using a Five-Plex SILAC Based Strategy. J. Proteome Res. 8(1):48-58.

110. Merrill AE, et al. (2014) NeuCode labels for relative protein quantification. Mol. Cell.

Proteomics 13(9):2503-2512.

111. Schnolzer M, Jedrzejewski P, & Lehmann WD (1996) Protease-catalyzed incorporation of

O-18 into peptide fragments and its application for protein sequencing by electrospray and

matrix-assisted laser desorption/ionization mass spectrometry. Electrophoresis 17(5):945-

953.

112. Qian WJ, et al. (2005) Quantitative proteome analysis of human plasma following in vivo

lipopolysaccharide administration using O-16/O-18 labeling and the accurate mass and

time tag approach. Mol. Cell. Proteomics 4(5):700-709.

113. Fenselau C & Yao XD (2009) (18)O(2)-labeling in quantitative proteomic strategies: A

status report. J. Proteome Res. 8(5):2140-2143.

114. Petritis BO, Qian WJ, Camp DG, & Smith RD (2009) A Simple Procedure for Effective

Quenching of Trypsin Activity and Prevention of (18)O-Labeling Back-Exchange. J.

Proteome Res. 8(5):2157-2163.

115. Miyagi M & Rao KCS (2007) Proteolytic O-18-labeling strategies for quantitative

proteomics. Mass Spectrom. Rev. 26(1):121-136.

116. Shiio Y & Aebersold R (2006) Quantitative proteome analysis using isotope-coded affinity

tags and mass spectrometry. Nat. Protoc. 1(1):139-145.

117. Boersema PJ, Aye TT, van Veen TAB, Heck AJR, & Mohammed S (2008) Triplex protein

quantification based on stable isotope labeling by peptide dimethylation applied to cell and

tissue lysates. Proteomics 8(22):4624-4632.

118. Raijmakers R, et al. (2008) Automated online sequential isotope labeling for protein

quantitation applied to proteasome tissue-specific diversity. Mol. Cell. Proteomics

7(9):1755-1762.

119. Hsu JL, Huang SY, Chow NH, & Chen SH (2003) Stable-isotope dimethyl labeling for

quantitative proteomics. Anal. Chem. 75(24):6843-6852.

120. Hsu JL, Huang SY, Shiea JT, Huang WY, & Chen SH (2005) Beyond quantitative

proteomics: Signal enhancement of the a(1) ion as a mass tag for peptide sequencing using

dimethyl labeling. J. Proteome Res. 4(1):101-108.

121. Rauniyar N & Yates JR (2014) Isobaric labeling-based relative quantification in shotgun

proteomics. J. Proteome Res. 13(12):5293-5309.

122. Braun CR, et al. (2015) Generation of multiple reporter ions from a single isobaric reagent

increases multiplexing capacity for quantitative proteomics. Anal. Chem. 87(19):9855-

9863.

123. Ow SY, et al. (2009) iTRAQ underestimation in simple and complex mixtures: "the good,

the bad and the ugly". J. Proteome Res. 8(11):5347-5355.

124. Ting L, Rad R, Gygi SP, & Haas W (2011) MS3 eliminates ratio distortion in isobaric

multiplexed quantitative proteomics. Nat. Methods 8(11):937-940.

125. Wenger CD, et al. (2011) Gas-phase purification enables accurate, multiplexed proteome

quantification with isobaric tagging. Nat. Methods 8(11):933-935.

126. Xiang F, Ye H, Chen RB, Fu Q, & Li LJ (2010) N,N-Dimethyl Leucines as Novel Isobaric

Tandem Mass Tags for Quantitative Proteomics and Peptidomics. Anal. Chem. 82(7):2817-

2825.

127. Zhang JX, Wang Y, & Li SW (2010) Deuterium Isobaric Amine-Reactive Tags for

Quantitative Proteomics. Anal. Chem. 82(18):7588-7595.

128. Eng JK, Searle BC, Clauser KR, & Tabb DL (2011) A face in the crowd: Recognizing

peptides through database search. Mol. Cell. Proteomics 10(11):9.

129. Eng JK, McCormack AL, & Yates JR (1994) An approach to correlate tandem mass-

spectral data of peptides with amino acid sequences in a protein database. J. Am. Soc. Mass

Spectrom. 5(11):976-989.

130. Perkins DN, Pappin DJC, Creasy DM, & Cottrell JS (1999) Probability-based protein

identification by searching sequence databases using mass spectrometry data.

Electrophoresis 20(18):3551-3567.

131. Elias JE & Gygi SP (2010) Target-Decoy Search Strategy for Mass Spectrometry-Based

Proteomics. Proteome Bioinformatics, Methods in Molecular Biology, eds Hubbard SJ &

Jones AR (Humana Press Inc, Totowa), Vol 604, pp 55-71.

132. Seidler J, Zinn N, Boehm ME, & Lehmann WD (2010) De novo sequencing of peptides

by MS/MS. Proteomics 10(4):634-649.

133. Hoopmann MR & Moritz RL (2013) Current algorithmic solutions for peptide-based

proteomics data generation and identification. Curr. Opin. Biotechnol. 24(1):31-38.

134. Craig R, Cortens JC, Fenyo D, & Beavis RC (2006) Using annotated peptide mass

spectrum libraries for protein identification. J. Proteome Res. 5(8):1843-1849.

135. Frewen BE, Merrihew GE, Wu CC, Noble WS, & MacCoss MJ (2006) Analysis of peptide

MS/MS spectra from large-scale proteomics experiments using spectrum libraries. Anal.

Chem. 78(16):5678-5684.

136. Lam H, et al. (2007) Development and validation of a spectral library searching method

for peptide identification from MS/MS. Proteomics 7(5):655-667.

137. Rost HL, et al. (2014) OpenSWATH enables automated, targeted analysis of data-

independent acquisition MS data. Nat. Biotechnol. 32(3):219-223.

138. Tsou CC, et al. (2015) DIA-Umpire: comprehensive computational framework for data-

independent acquisition proteomics. Nat. Methods 12(3):258-264.

139. Ashburner M, et al. (2000) Gene Ontology: tool for the unification of biology. Nat. Genet.

25(1):25-29.

140. Kanehisa M & Goto S (2000) KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic

Acids Res. 28(1):27-30.

141. Huang DW, Sherman BT, & Lempicki RA (2009) Systematic and integrative analysis of

large gene lists using DAVID bioinformatics resources. Nat. Protoc. 4(1):44-57.

142. Kramer A, Green J, Pollard J, & Tugendreich S (2014) Causal analysis approaches in

Ingenuity Pathway Analysis. Bioinformatics 30(4):523-530.

Chapter 2: Combined Metabolomics and Proteomics Reveals

Hypoxia as A Cause of Lower Productivity on Scale-up to a

5000-Liter CHO Bioprocess

The paper based on this chapter was published in Biotechnology Journal in 2016

(DOI: 10.1002/biot.201600030). A poster on this work was presented at the 64th

American Society for Mass Spectrometry (ASMS) conference in June 2016 (abstract ID: 280605).

Yuanwei Gao1, Somak Ray1, Shujia Dai1, Alexander R. Ivanov1, Nicholas R. Abu-Absi2, Amanda

M. Lewis2, Zhuangrong Huang2, Zizhuo Xing2, Michael C. Borys2, Zheng Jian Li2, Barry L.

Karger1

1Barnett Institute and Department of Chemistry and Chemical Biology, Northeastern University,

Boston, MA, 02115

2Biologics Development, Global Manufacturing and Supply, Bristol-Myers Squibb, 38 Jackson

Road, Devens, MA 01434

I thank Somak Ray for proteomic sequence database construction and statistical data analysis, Dr.

Shujia Dai for his contribution to the early stage of this study, Dr. Alexander Ivanov for discussion,

and Dr. Barry Karger for conceptual design, idea contribution, and manuscript composition. I also

thank the scientists from Bristol-Myers Squibb for their strong collaboration, especially, Dr.

Nicholas Abu-Absi for providing bioreactor data, and Dr. Amanda Lewis for helpful discussions.

2.1 Abstract

Large-scale bioprocessing is key to the successful manufacturing of a biopharmaceutical.

However, cell viability and productivity are often lower in the scale-up from laboratory to

production. In this study, we analyzed CHO cells, which showed lower percent viabilities and

productivity in a 5-KL production scale bioreactor compared to a 20-L bench-top scale under

seemingly identical process parameters. An increase in copper concentration in the media from 0.02

μM to 0.4 μM led to a doubling of percent viability in the production scale albeit still at a lower

level than the bench-top scale. Combined metabolomics and proteomics revealed the increased

copper reduced the presence of reactive oxygen species (ROS) in the 5-KL scale process. The

reduction in oxidative stress was supported by the increased level of glutathione peroxidase in the

lower copper level condition. The excess ROS was shown to be due to hypoxia (intermittent), as

evidenced by the reduction in fibronectin with increased copper. The 20-L scale showed much less

hypoxia and thus less excess ROS generation, resulting in little to no impact to productivity with

the increased copper in the media. The study illustrates the power of ‘Omics in aiding the

understanding of biological processes in biopharmaceutical production.

2.2 Introduction

Biologics, including antibodies, hormones and cytokines, represent an increasingly

important class of therapeutics, with 7 of the 10 top selling drugs in 2013 in this class (1). The

majority of biologics are manufactured using mammalian cellular hosts, especially Chinese

hamster ovary (CHO) cells. The biotechnology industry is under pressure to bring therapeutics to

market faster and at lower cost. In order to meet these demands, it is increasingly important for

the industry to develop high yielding, scalable, and robust processes that are controllable and well

understood. Many advances in the field of bioprocessing are facilitating this goal. Engineering

design of large volume bioreactors to provide cells with an optimal environment for high cell

density and high yield processes continues to improve. At the same time, understanding the

biology of cellular processes during CHO cell protein production is being actively pursued (2, 3).

Recently, molecular profiling-based genomics, proteomics, and metabolomics have been applied

to bioreactor processes to study cellular production (4-6). Ultimately, the combination of multiple

‘Omics methods will lead to an improved understanding of the manufacturing and cellular

processes, resulting in improvements in bioreactor productivity (4), production of specific

glycoforms (6-8), and potential identification of biomarkers for process assessment and control

(9).

At present, there are important gaps in our understanding that need to be addressed. A

significant gap is process scalability. To date, the vast majority of biological development studies

have been conducted using small volume reactors at the liter scale because of the ease of handling

and cost. Moreover, advances in high-throughput technologies and robotics are driving bioprocess

development to even smaller, milliliter scale (10). Yet, it is known that productivity is generally

lower in large (KL) production relative to small (L or less) reactor scales (2, 11, 12).

Accurate and reproducible manipulation of the large-scale bioprocess is central to the

success of the expensive and time-consuming production of biopharmaceuticals. Considerable

effort has been made by industry to develop and qualify bench-top bioreactor systems as

representative models of cell behavior at the manufacturing scale. Application of the ‘Omics tools

should aid our ability to translate results from laboratory scale to the large scale, leading to a

significant impact on biotechnology production.

This paper presents, for the first time, a combined proteomic and metabolomic study to

compare a CHO culture reactor at the manufacturing (5-KL) versus the laboratory scale (20-L).

Phenotypically, it was found that the viable cell density during the stationary phase of the process

was significantly lower for the production scale reactor, using the same media and dosing regimen.

At the same time, an unexpected 20-fold increase in trace level copper (Cu2+) in the buffering

agent, sodium carbonate, from 0.02 μM to 0.4 μM, led to a 2-fold increase in the viable cell density

for the 5-KL process while having only a minor effect on the 20-L scale. Copper, a well-known

trace metal of cell culture media, serves as a cofactor of many enzymes controlling their functional

states and activity levels (13, 14). There have been several previous reports describing the effects

of copper levels on the productivity of CHO cells, albeit for low volume reactors or shake flasks

(15-20).

The present study identifies the changes in proteomic and metabolomic profiles that occur

as a result of the increased copper on the production scale process relative to the laboratory scale.

Statistically significant network and pathway analysis of the combined ‘Omics results revealed

that an excess of reactive oxygen species (ROS) occurred in the 5-KL reactor and that the increased

level of copper reduced this stress, leading to decreased apoptosis and cell death. On the other

hand, the oxidative stress was found to be less pronounced for the 20-L bioreactor, and, as a result,

the influence of copper level less significant. Based on the ‘Omics data, along with qPCR, ELISA

and western blotting, the excess ROS production for the 5-KL production scale reactor has been

attributed to intermittent hypoxia resulting as the cells periodically enter zones of lower oxygen

concentration. The hypoxia is likely due to limited mass transfer and homogeneity of the gas

throughout the 5-KL bioreactor. For the 20-L scale, oxygen distribution was more complete,

leading to far less hypoxia and stress.

2.3 Materials and methods

2.3.1 Chemicals and reagents

Formic acid, urea, triethylammonium bicarbonate buffer (TEAB) (1.0 M, pH 8.5),

dithiothreitol (DTT), iodoacetamide (IAM), phenylmethanesulfonyl fluoride (PMSF), and

ammonium hydroxide solution (≥ 25% in H2O) were purchased from Sigma-Aldrich (St. Louis,

MO). Sequencing-grade modified trypsin was from Promega (Madison, WI), and mass

spectrometry grade lysyl endopeptidase (Lys-C) was purchased from Wako (Richmond, VA). The

bicinchoninic acid (BCA) protein assay kit, tandem mass tag (TMT) 6-plex kit, Halt™ protease

and phosphatase inhibitor cocktail (EDTA free), cell extraction buffer, SuperSignal West Femto

trial kit, Quant-iT™ protein assay kit, RiboPure RNA extraction kits (AM1924), LC-MS grade

water, LC-MS grade acetonitrile, and SDS-polyacrylamide NuPAGE Novex 4-12% Bis-Tris

protein gels were from Thermo Fisher Scientific (Rockford, IL). The PVDF membranes for the

western blotting protein transfer, ECL western blotting substrate kit, and fibronectin ELISA kit

(ab108849) were from Abcam (Cambridge, MA). The primary antibodies against β-actin (sc-

47778), superoxide dismutase 1 (SOD1) (sc-11407), fibronectin (sc-9068), and glutathione

peroxidase (GPx 1/2) (sc-30147), as well as the secondary antibodies against the corresponding

primary antibodies were purchased from Santa Cruz Biotechnology (Heidelberg, Germany). First strand

synthesis kits (330401), SYBR green mix (330520) and RT-PCR primers were purchased from

SABiosciences, a division of Qiagen (Valencia, CA).

2.3.2 CHO Cell Culture Conditions

A CHO DG44 cell line expressing a recombinant antibody fusion protein using a vector

with the dihydrofolate reductase-deficient (DHFR) selection marker was used for all experiments.

The bioreactor experiments all employed the same proprietary, chemically defined media. The cell

line and process conditions were similar to those previously described (12). Cell cultures were

expanded in a series of shake flasks, rocker bags, and seed bioreactors to generate enough cells to

inoculate production bioreactors. Rocker bags were used in place of seed bioreactors for small

scale experiments; however, the number of population doublings was controlled to be similar for

all experiments. The media contained 0.02 μM and 0.4 μM CuSO4 for the low and high

concentrations (15), respectively. The high copper concentration resulted from impurities in

Na2CO3 used for pH control. Experiments were carried out in either 20-L or 5-KL bioreactors with

initial working volumes of 11-L or 3-KL, respectively. The temperature, initially controlled at

37 °C, was reduced to a lower temperature at a pre-defined time to extend the viability and

productivity of the culture. The pH in the bioreactor was controlled through the addition of CO2

gas and 1 M Na2CO3. The bioreactors were operated in fed-batch mode, with the timing and

amount of feed determined by measured glucose concentrations in the bioreactor. Dextran sulfate

was added to the bioreactors on Day 3. The time of culture harvest was determined according to

proprietary pre-determined criteria.

Cell culture samples were monitored for cell density and viability using either Cedex

(Roche Diagnostics Corp., Indianapolis, IN) or ViCell (Beckman Coulter, Inc., Indianapolis, IN)

automated cell counters that operate using the trypan blue dye exclusion method. Monitoring of

pH, pCO2, pO2, glucose, lactate, glutamine, glutamate, and ammonium was performed using

either BioProfile 400 or NovaFlex instruments (Nova Biomedical, Waltham, MA). Dissolved

oxygen (DO) was measured using in-line probes and controlled at the same set points throughout

the bioreactor runs for all experiments at both scales. All bioreactors were sparged with fixed flow

rates of air to match vessel volumes per minute at each bioreactor scale. Oxygen was supplied to

the bioreactors automatically by the controller to maintain the specified DO set-points, and the DO

profiles for all bioreactors were similar for all runs regardless of culture condition or scale. Product

titers were measured by a Protein-A high performance liquid chromatography (HPLC) assay based

on reference standards of the purified product at known concentrations. The number and frequency

of samples were determined in part by the schedule of GMP manufacturing operations, different

harvest timing between bioreactors, and complexity of sample analysis. Supernatant and cell pellet

samples were collected from bioreactors to enable metabolomic and proteomic analyses. A

smaller sub-set of samples was analyzed via proteomic methods due to the expense and complexity

of the methods involved. Cell pellets containing either 5 (for metabolomics) or 10 (for proteomics)

× 10⁶ total cells/mL were retained. Sample time points were Day 3, Day 5, and Day 7 for the 5-KL

scale, and Day 3, Day 6, and Day 10 for the 20-L scale. These time points correspond to

exponential, stationary, and early death phase at the corresponding bioreactor scales. Two

biological replicates (two bioreactors) were taken for all time points of high and low copper

conditions at both scales.

The cell cultivation in this chapter was performed by Bristol-Myers Squibb (Devens, MA),

and they also provided the data related to the cultivation performance.

2.3.3 Metabolomic analysis

Metabolomic analysis was performed by Metabolon, Inc. (Durham, NC) according to their

standard analysis platform (21). For data analysis, with each replicate, the intracellular metabolite

signal intensities were first normalized by total cellular protein amount at a given time point, and

then these protein-normalized intensities were divided by Day 0 protein-normalized intensities of

the particular metabolite. The metabolite ratios (high copper : low copper) for each replicate were

calculated by dividing the resultant normalized intensities for the high copper sample by that of

the low copper sample for a given time point. The replicate ratios were either averaged or ratios

of average normalized intensities were calculated. A cut-off ratio of 1.5 or 0.67 was used to define

if the metabolite was up or down regulated, respectively. For comparison of metabolite abundance

between 5-KL and 20-L samples, shared metabolites involved in the Ingenuity Pathway Analysis

(IPA) (Qiagen, Redwood City, CA) reported “reactive oxygen formation” biofunction were

identified from the Day 7 (5-KL bioreactor) and Day 10 (20-L bioreactor) results. Ratios of

replicate average abundance were calculated for these selected metabolites for Day 7/Day 10 with

high vs. high copper, and low vs. low copper samples. The metabolites and their corresponding

log-ratios were subjected to the IPA ‘disease and biofunction activation’ prediction tool.
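The normalization and cut-off steps described above can be sketched in a few lines of Python. This is a minimal illustration only: the function names and the numerical values are hypothetical, and treating the 1.5 and 0.67 cut-offs as inclusive is an assumption, since the text does not state how boundary values were handled.

```python
def normalized_intensity(intensity, protein, day0_intensity, day0_protein):
    """Protein-normalize a metabolite signal, then express it relative to Day 0."""
    return (intensity / protein) / (day0_intensity / day0_protein)


def classify_ratio(ratio, up=1.5, down=0.67):
    """Label a high:low copper ratio using the cut-offs given in the text.

    Inclusive boundaries are an assumption of this sketch.
    """
    if ratio >= up:
        return "up"
    if ratio <= down:
        return "down"
    return "unchanged"


# Hypothetical values for one metabolite in one replicate at one time point.
high = normalized_intensity(9000.0, 3.0, 2000.0, 2.0)  # high copper sample
low = normalized_intensity(4500.0, 3.0, 2000.0, 2.0)   # low copper sample
ratio = high / low                                      # high copper : low copper
label = classify_ratio(ratio)
```

Replicate ratios computed this way would then be averaged, or the ratio would be taken from replicate-averaged normalized intensities, as described in the text.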

2.3.4 Sample preparation for proteomics

A pellet consisting of approximately 10⁷ cells for each sample was reconstituted in 500 μL

of cell lysis buffer (10 M urea and 5 mM DTT in 100 mM TEAB pH 8.0). Cell lysate was prepared

by using a Model 505 Sonic Dismembrator (Thermo Fisher Scientific, Pittsburgh, PA). The lysate

protein concentration was determined using the BCA protein assay. Approximately 100 μg protein

was denatured and reduced with freshly prepared 10 mM DTT at 37 ℃ for 1 hour, and then

alkylated with 10 mM IAM in the dark at room temperature for 45 minutes with the presence of

10 M urea and 100 mM TEAB (pH 8.0). Proteins were precipitated by adding cold acetone and

maintained at -20 ℃ overnight. After the supernatant was discarded, the protein was reconstituted

in 200 μL of 25 mM TEAB (pH 8.0) in 90% water and 10% acetonitrile. Digestion with Lys-C

was conducted at 37 ℃ for 6 hours with an enzyme to protein ratio of 1:200 (w/w), and then the

tryptic digestion was performed with an enzyme to protein ratio of 1:50 (w/w) at 40 ℃ overnight.

Protein digests (100 μg per TMT channel) from the samples of a given bioreactor size for three

cell growth time points under high and low copper media were labeled with six TMT channels,

following the protocol supplied by the manufacturer. The solutions were lyophilized to dryness

and stored at -80 ℃ prior to 2D-LC/MS analysis.

2.3.5 2D LC-MS/MS

The labeled protein digest mixture was separated and analyzed by 2D high pH/low pH

reversed phase (RP/RP) liquid chromatography coupled with a Q-Exactive mass spectrometer

(Thermo Fisher Scientific, San Jose, CA). The first-dimension separation was off line with the

platform containing an Agilent 1200 series system with diode array detector (Agilent Technologies,

Santa Clara, CA) and a 300Extend_C18 column (3.5 μm beads, 2.1x150 mm) (Agilent

Technologies). Mobile phase A and B were 20 mM ammonium formate in water (pH 10), and 20

mM ammonium formate (pH 10) in 90% acetonitrile/10% water, respectively. After reconstitution

with mobile phase A, 200 μg of labeled digest was injected on the column. After desalting by

mobile phase A for 1 hour at 200 μL/min, a gradient was then run at a flow rate of 200 μL/min

(from 2% B to 100% B in 44 minutes, 100% B to 2% B and 2% B for 9 minutes). The fractions

were collected in 2-minute intervals, and pooled to equalize protein levels for a final fraction

number of 19 based on the UV absorption profile at 214 nm.

For the second dimension LC, samples were analyzed on an Ultimate 3000

chromatography system with a home-packed IntegraFrit column (20 cm x 75 μm, New Objective,

Woburn, MA) with 200 Å Magic C18 AQ particles (3 μm diameter) (Michrom Bioresources,

Auburn, CA). Mobile phase A was 0.1% formic acid in water, and mobile phase B was 0.1%

formic acid in acetonitrile. The flow rate was 300 nL/min for the sample injection and desalting

processes, followed by a separation gradient (2% B to 32 % in 120 minutes, 32% B to 90% B in

20 minutes, 90% B for 3 minutes) with 200 nL/min flow rate. The sample was then detected online

by the Q-Exactive mass spectrometer.

MS data were collected in the data-dependent acquisition mode with a survey single

stage MS (MS1) scan followed by higher-energy collisional dissociation (HCD) MS/MS scans of the top 12

most intense precursor ions. The full MS scans were acquired in the Orbitrap with a resolution of

70,000 (m/z = 200) and a scan range of m/z 375 to 1600. HCD spectra were acquired for MS2

with a resolution of 17,500 and the fixed first mass of m/z 100. The isolation window was 2.0 m/z.

For accurate mass measurement, the lock mass option was enabled using the

polydimethylcyclosiloxane ion at m/z 445.12002 as an internal calibrant.

2.3.6 Construction and annotation of DG44 CHO cell proteome database

The DG44 CHO protein sequence database was developed by Somak Ray, the second

author of the paper published corresponding to this chapter. DG44 CHO transcriptomic sequences

were pooled from published transcriptomic (22) and in-house sequencing data by Roche 454

(Branford, CT) using fifty-base, single-end runs. The final transcriptomic data was assembled

using CLC Genomics Workbench Version 4 (http://www.clcbio.com/) with the NCBI mouse

RefSeq set of transcripts (http://www.ncbi.nlm.nih.gov/refseq/) as reference. For annotating the

resulting CHO transcript, annotations for the same mouse sequence that led to the assembly of the

final CHO sequence in the CLC Genomics software were used.

To augment the set of the transcriptomic sequences, the published CHO genome sequence

(23) from NCBI RefSeq was used. First, the transcript subsequence coding for amino acids was

identified using the transcript nucleotide sequence along with the corresponding mouse protein

sequence using a dynamic programming alignment algorithm based method “FrameBot” (24)

which also corrects for frameshift mutations. The CHO protein coding sequences, which were less

than 90 percent of the length compared to the corresponding mouse protein sequences, were

subjected to extension of their sequences using homolog sequences from the CHO genome. For

extension, a low-complexity masked CHO genome sequence was generated using Windowmasker

(25). The ‘protein2genome’ (abbreviated p2g) module from the “Exonerate” software (26) was

run to best align the mouse protein homolog of the CHO protein against the low-complexity

masked CHO genomic sequence. The original protein coding DG44 CHO transcript sequence and

the protein coding CHO sequence from the top hit of p2g were globally aligned, and gaps were

filled using sequence information from the CHO homolog. The corresponding amino acid

sequence from the extended transcript sequences along with the rest of the DG44 sequences were

generated using FrameBot.

Those mouse RefSeq proteins for which no homologous DG44 CHO transcriptomic

sequence was found were searched against the new CHO genome. These mouse sequences were

used as queries to search the ‘Windowmasker’ masked CHO genome with the ‘p2g’ module

of Exonerate. The top hit of p2g from the CHO genome was retained. Together the peptide

sequences derived from transcripts of DG44 CHO and mouse homologs present in the CHO

genome made up the protein sequence database with a total 18,075 sequences.

The CHO proteome sequence database construction and annotation were carried out by

Somak Ray, the second author of the published paper corresponding to this chapter.

2.3.7 Protein identification of proteomics analysis

The raw data files from each LC-MS/MS run were processed in Proteome Discoverer 1.4

(PD 1.4) (Thermo Fisher Scientific) and searched against the CHO database with three search

engines: Sequest HT, Mascot, and MS Amanda (27). Cysteine carbamidomethylation and TMT 6-

plex modification at the N-terminus and lysine were set as fixed modifications, along with

oxidation of methionine and deamidation of asparagine and glutamine set as dynamic

modifications. Up to two missed tryptic cleavages were allowed. Mass tolerance was set at 10

ppm for precursor ions, and 0.05 Da for fragment ions. Percolator was used to filter matches to

1% peptide false discovery rate (FDR). The quantitation method was chosen as TMTe 6plex

(custom), with the peak integration tolerance of 20 ppm. The proteins with at least one unique

peptide identified in all 6 time points with reporter ions satisfying the above criteria were

considered as “identified proteins”.
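The precursor and fragment mass tolerances quoted above are expressed in different units (ppm and Da). The short Python sketch below shows how each tolerance translates into an acceptance test on an m/z value. It is illustrative only and is not how Proteome Discoverer or the search engines implement matching; the function names and example values are hypothetical.

```python
def ppm_error(observed_mz, theoretical_mz):
    """Relative mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def within_precursor_tolerance(observed_mz, theoretical_mz, tol_ppm=10.0):
    """Relative (ppm) tolerance, as set for precursor ions."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm


def within_fragment_tolerance(observed_mz, theoretical_mz, tol_da=0.05):
    """Absolute (Da) tolerance, as set for fragment ions."""
    return abs(observed_mz - theoretical_mz) <= tol_da


# Hypothetical example: an error of about 8 ppm at m/z 500 passes the 10 ppm window.
precursor_ok = within_precursor_tolerance(500.004, 500.0)
fragment_ok = within_fragment_tolerance(200.04, 200.0)
```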

2.3.8 Quantitation and differential expression analysis

The reporter ion intensities of a given TMT channel for each peptide-spectrum match (PSM)

extracted from PD 1.4 were normalized by dividing individual intensities by the sum of the

intensities of that particular channel. The proteins with at least two PSMs identified with reporter

ions in all six TMT channels were considered as “quantified proteins”. Among these “quantified

proteins”, an intensity-based filtering technique was applied on each TMT channel to improve the

reproducibility between the two replicates from separate bioreactor runs (see next section). Then,

for a given replicate, the common proteins from all six channels were determined as the protein

list with high confidence of quantitation information. The common proteins identified in the two

separate replicates of a given bioreactor size were taken as the final list with high confidence of

quantitation.
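As a rough sketch of the first two steps above, the following Python code normalizes each reporter ion intensity by its channel sum and selects proteins with at least two PSMs carrying reporter ions in all channels. The data layout, the function names, and the use of a zero intensity to represent a missing reporter ion are assumptions made for illustration; the values are invented.

```python
from collections import defaultdict


def normalize_channels(psms):
    """Divide each PSM reporter intensity by the sum of its TMT channel.

    psms: list of (protein, [intensity per channel]) tuples.
    """
    n_channels = len(psms[0][1])
    channel_sums = [sum(vals[c] for _, vals in psms) for c in range(n_channels)]
    return [(prot, [v / s for v, s in zip(vals, channel_sums)])
            for prot, vals in psms]


def quantified_proteins(psms, min_psms=2):
    """Proteins with at least min_psms PSMs having reporter ions in every channel."""
    counts = defaultdict(int)
    for prot, vals in psms:
        if all(v > 0 for v in vals):  # assumption: 0 means a missing reporter ion
            counts[prot] += 1
    return {prot for prot, n in counts.items() if n >= min_psms}


# Hypothetical six-channel PSM table.
psms = [
    ("P1", [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]),
    ("P1", [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]),
    ("P2", [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]),  # only one PSM
    ("P3", [1.0, 0.0, 1.0, 1.0, 1.0, 1.0]),    # missing one channel
    ("P3", [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]),
]
normalized = normalize_channels(psms)
quantified = quantified_proteins(psms)
```

In this toy table only P1 would count as quantified: P2 has a single PSM, and P3 has only one PSM with signal in all six channels.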

Relative quantitation of proteins (high copper vs. low copper for each time point) was

achieved by pairwise comparison of TMT reporter ion intensities among samples using the DanteR

software (version 0.1.1; Pacific Northwest National Laboratory, Richland, WA;

http://omics.pnl.gov) (28). The median of log2-ratios of the proteins obtained from DanteR was

adjusted to zero. Then, the protein log ratios were determined, based on the protein list obtained

after the intensity-based filtering technique. The average of each protein log2-ratio (high copper:

low copper) from the two separate bioreactor replicates was taken as the protein log ratio. The

ratio-based filtering technique (see next section) was applied to choose the differentially expressed

proteins with high confidence.
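The median adjustment and replicate averaging described above can be illustrated with a short Python sketch. It does not reproduce the internals of DanteR; the dictionaries, function names, and abundance values are hypothetical and serve only to show the arithmetic.

```python
import math
from statistics import median


def centered_log2_ratios(high, low):
    """log2(high/low) per protein, shifted so that the median ratio is zero.

    high, low: dicts mapping protein -> abundance for the two copper conditions.
    """
    raw = {p: math.log2(high[p] / low[p]) for p in high if p in low}
    shift = median(raw.values())
    return {p: r - shift for p, r in raw.items()}


def average_replicates(rep1, rep2):
    """Mean log2 ratio over the two bioreactor replicates (shared proteins only)."""
    return {p: (rep1[p] + rep2[p]) / 2 for p in rep1 if p in rep2}


# Hypothetical abundances for three proteins in one replicate.
high = {"A": 4.0, "B": 2.0, "C": 1.0}
low = {"A": 1.0, "B": 1.0, "C": 1.0}
rep1 = centered_log2_ratios(high, low)
rep2 = {"A": 0.5, "B": 0.0, "C": -0.5}  # invented second replicate
protein_log_ratio = average_replicates(rep1, rep2)
```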

2.3.9 Data filtering technique applied on the proteomics data

Because of the complexity of the bioreactor process at the large scale, as well as the general

sampling procedures, we adopted a strategy of several levels of filtering to obtain consistent results

to compare the data between the high and low copper conditions.

For the first level, we filtered proteins according to their abundances as measured by the

normalized sum of the PSMs of TMT reporter ion intensities in order to remove outliers, based on

an M-A plot (29). Each PSM reporter ion intensity was normalized by dividing by the

corresponding sum of all PSM intensities belonging to that particular channel. For each TMT

channel, the sum of all normalized reporter ion intensities of all PSMs for each protein was

calculated as a measure of abundance of that protein. For a given channel, the log2- average

abundance for each protein in the two replicates was plotted in ascending order. The average log2-

abundances were next binned, with each bin containing data points for 300 proteins. The log2

ratio of the abundances was calculated, and only those proteins within ±1.5σ of the mean ratio of

log abundance of that bin were retained for further consideration. Among them, the shared proteins

of all six TMT channels were taken as “proteins after the intensity-based filtering technique”. The

first level of filtering technique was instituted by Somak Ray, the second author of the published

paper corresponding to this chapter.
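A minimal Python sketch of the intensity-based filter for a single TMT channel is given below. It follows the steps in the text (sort by average log2 abundance, bin, keep proteins within ±1.5σ of the bin mean log2 ratio), but it is not the original implementation. The function name and data layout are hypothetical, and the use of the population standard deviation is an assumption, since the text does not specify which estimator was used.

```python
import math
from statistics import mean, pstdev


def intensity_filter(rep1, rep2, bin_size=300, n_sigma=1.5):
    """Keep proteins whose replicate log2 ratio lies within n_sigma of its bin mean.

    rep1, rep2: dicts mapping protein -> summed normalized reporter intensity
    for one TMT channel in the two bioreactor replicates.
    """
    rows = []
    for p in rep1:
        if p in rep2:
            a = (math.log2(rep1[p]) + math.log2(rep2[p])) / 2  # average log2 abundance
            m = math.log2(rep1[p] / rep2[p])                   # log2 ratio of replicates
            rows.append((a, m, p))
    rows.sort()  # ascending average abundance

    kept = set()
    for start in range(0, len(rows), bin_size):
        chunk = rows[start:start + bin_size]
        ratios = [m for _, m, _ in chunk]
        mu, sigma = mean(ratios), pstdev(ratios)  # population SD is an assumption
        kept.update(p for _, m, p in chunk if abs(m - mu) <= n_sigma * sigma)
    return kept


# Hypothetical toy data with one irreproducible protein (P5); small bin for illustration.
rep1 = {"P1": 1.0, "P2": 1.0, "P3": 1.0, "P4": 1.0, "P5": 8.0}
rep2 = {"P1": 1.0, "P2": 1.0, "P3": 1.0, "P4": 1.0, "P5": 1.0}
kept = intensity_filter(rep1, rep2, bin_size=5)
```

The proteins retained after this filter would be the intersection of the `kept` sets obtained for all six channels.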

To determine the differentially expressed proteins, a second level of filtering was applied

in which three criteria were used. (a) The protein log ratios of the two replicates for each bioreactor

size had the same sign. (b) The average of the protein log2 ratios from the two replicates

was > 0.30 (fold change 1.23) for up-regulation and < -0.30 (fold change 0.81) for down-regulation.

(c) Each of the log2 ratios of the two replicates was > 0.11 (fold change 1.08, which is 12.5%

less than 1.23) for up-regulation and < -0.11 (fold change 0.93) for down-regulation. After this

second level of filtering, the remaining differentially expressed proteins were used for network

and pathway analysis with the averaged log ratios from the two replicates.
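The three criteria of the second filtering level can be written compactly as follows. This is a sketch under the assumption that all thresholds apply to log2 ratios (high copper : low copper); the function name is hypothetical.

```python
def classify_protein(r1, r2, avg_cut=0.30, rep_cut=0.11):
    """Apply criteria (a)-(c) to the log2 ratios r1 and r2 of the two replicates.

    Returns "up", "down", or None if the protein is not differentially expressed.
    """
    if r1 * r2 <= 0:  # (a) both replicates must have the same sign
        return None
    avg = (r1 + r2) / 2
    if avg > avg_cut and min(r1, r2) > rep_cut:    # (b) and (c), up-regulation
        return "up"
    if avg < -avg_cut and max(r1, r2) < -rep_cut:  # (b) and (c), down-regulation
        return "down"
    return None


# The log2 cut-offs correspond to the fold changes quoted in the text.
fold_changes = [round(2 ** x, 2) for x in (0.30, -0.30, 0.11, -0.11)]
```

For example, replicate ratios of 0.6 and 0.05 average to 0.325 and so pass criterion (b), but the protein is rejected because the second replicate does not exceed 0.11.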

2.3.10 Interaction network and pathway analysis

MetaCore (Thomson Reuters, https://portal.genego.com/, New York City, NY) and

Ingenuity Pathway Analysis (IPA) were used to map the significant differentially regulated

proteins and metabolites into biological networks and pathway maps. The list of differentially

expressed proteins and metabolites for specific samples and their corresponding log2-ratios were

subjected to IPA core analysis using its default values. Unless otherwise noted, for the IPA

“diseases or biological functions” analysis, only disease/biological functions involving both

metabolites and proteins that were statistically significant with a |Z score| ≥ 2.0 for activation or

repression and p-value < 0.05 were reported. The Day 3 time point of both replicates was

not considered for further analysis because of the limited differences found between the high and

low copper conditions.

Also, the differentially expressed proteins for each time point were submitted into

MetaCore, in which the “pathway maps” in the “Functional Ontology Enrichment” tool was used.

Pathways with p-values less than 0.01 were considered as statistically significant. The activation

or deactivation of the pathways was determined by the up- or down-regulation of the relevant

differentially regulated proteins and their functions.

2.3.11 Western blotting

The cell lysates were prepared using radioimmunoprecipitation assay (RIPA) buffer with

addition of the protease inhibitor cocktail and 1 mM PMSF for the western blot analysis. The total

protein concentration was determined by the BCA assay. About 40 μg of protein was denatured

and separated by SDS-polyacrylamide gel electrophoresis. Proteins were then transferred to PVDF

membranes. SOD1, fibronectin, and GPx were detected, with β-actin used as the loading control.

After incubating with the primary and then secondary antibodies, protein bands were visualized

using the ChemiDoc MP imaging system (Bio-Rad Laboratories, Hercules, CA).

2.3.12 Quantitation of fibronectin levels by ELISA

The cell pellets were washed with ice-cold 1×PBS and lysed in cell extraction buffer with

addition of the protease inhibitor cocktail and 1 mM PMSF. After incubation on ice for 30 minutes

with occasional vortexing, the lysates were centrifuged at 13,000 g for 10 min at 4°C. The

supernatants were subsequently transferred, and total protein concentration was determined using

Quant-iTTM protein assay kit. By following the protocol supplied by the manufacturer, fibronectin

levels were measured by the fibronectin mouse ELISA kit with a microplate reader Spectramax

384 (Molecular Devices, Sunnyvale, CA) at a wavelength of 450 nm. Fibronectin levels were

determined using the standard curve generated using the fibronectin standard provided by the

ELISA kit.

2.3.13 Real-Time PCR

The whole cell RNA was extracted from cell pellets using the RiboPure kit. RNA was

converted to cDNA using the RT2 First Strand Synthesis Kit. Quantitative PCR was carried out

using SYBR Green Master Mix and PCR primers. Primers for FN1 (encoding fibronectin) were

optimized for use in CHO was optimized for use in rat. All measurements were done in duplicate,

and a difference of less than or equal to 0.5 CT between duplicates was considered acceptable.

The ViiA™ 7 Real-Time PCR System (Thermo Fisher Scientific, Foster City, CA) and software

were used to run RT-PCR and analyze results. The comparative CT method was used to normalize

measurements relative to hypoxanthine-guanine phosphoribosyltransferase-like (LOC100769768),

an established CHO housekeeping gene. After normalization, relative mRNA expression for each

gene of interest was compared across scales, treatments, and time using the comparative CT method. A

log fold change greater than 2 was considered statistically significant.
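For reference, the comparative CT calculation can be sketched as below. The sketch uses the standard 2^(-ΔΔCT) form, which assumes approximately 100% amplification efficiency; that assumption, the function names, and the CT values are illustrative and are not taken from the experimental data.

```python
def delta_ct(ct_target, ct_reference):
    """CT of the gene of interest minus CT of the housekeeping gene."""
    return ct_target - ct_reference


def fold_change(ct_target_test, ct_ref_test, ct_target_ctrl, ct_ref_ctrl):
    """Relative expression (test vs. control) by the comparative CT method."""
    ddct = (delta_ct(ct_target_test, ct_ref_test)
            - delta_ct(ct_target_ctrl, ct_ref_ctrl))
    return 2.0 ** (-ddct)


def duplicates_acceptable(ct_a, ct_b, max_diff=0.5):
    """Duplicate wells are accepted if they differ by no more than 0.5 CT."""
    return abs(ct_a - ct_b) <= max_diff


# Hypothetical CT values: target gene and housekeeping gene in two conditions.
fc = fold_change(24.0, 20.0, 26.0, 20.0)  # ddCT = 4 - 6 = -2, so a 4-fold increase
ok = duplicates_acceptable(24.1, 24.5)
```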

The ELISA test of fibronectin and the qPCR test were performed by Bristol-Myers Squibb

(Devens, MA).

2.4 Results

A significant difference in cell culture performance at the 5-KL scale was initially observed

due to a 20-fold difference in copper concentration (0.02 μM to 0.4 μM), as described below.

Interestingly, for the laboratory scale (20-L), with the same conditions, no significant phenotypic

difference was observed for the two copper concentration levels, and the overall cell performance

(viability and product titer) was much higher compared to manufacturing scale. We sought to

understand the underlying biological cause for the phenotypic differences at large scale due to

copper and bioreactor scale through proteomics and metabolomics studies.

2.4.1 CHO cell growth and productivity in 5-KL vs. 20-L scale bioreactors with two levels

of copper concentration in the media (conducted by Bristol-Myers Squibb)

Scale-up of a CHO DG44 fed-batch bioreactor process from 20-L to 5-KL scale showed a

dramatic decrease in productivity (Figure 2-1). During one run of the production scale, a doubling

of the productivity during the stationary phase was observed. After some testing, the increased

productivity was attributed to an elevated level, from 0.02 μM to 0.4 μM, of the trace metal, copper,

found in Na2CO3 used for pH control. Copper concentrations in basal and feed media were similar

for all experiments. For simplicity, we will refer throughout the paper to the two concentration

levels as low and high copper, respectively.

Figure 2-1 The cell density, viability, titer productivity, and lactate profiles of the 5-KL and 20-L

bioreactors.

(A) Viable cell density, (B) viability, (C) normalized titer profiles, and (D) lactate profiles for

20-L (dashed lines) and 5-KL (solid lines) bioreactors under the high (blue lines) and low (red lines)

copper conditions. There are two bioreactor experiments for each condition. Profiles for the 5-KL

conditions extend out for variable durations from day 9 to day 12 since they all met the pre-defined

harvest criteria at different times, while the 20-L cultures were all extended out to 12 days.

As seen in Figure 2-1A, viable cell density (VCD) was similar for all conditions for the

first three days of culture. For the 20-L scale, VCD continued to increase for an additional 4 days

before leveling off, while for the 5-KL scale, VCD sharply decreased after 3 days before leveling

off. Significantly, the high copper condition for the 5-KL scale maintained 2-fold higher VCD

levels compared to the low copper condition. In contrast, at the 20-L scale, the VCD was only

about 15% higher with the high copper level compared to the low copper level. Similar trends were

observed for the %viability (%V) profiles (Figure 2-1B) with a steep decline in %V after 3 days

for the 5-KL scale. The high copper condition for the production scale process leveled off at %V

about twice that of the low copper condition; however, the high copper level still had a significantly

lower %V than found on the 20-L scale. Finally, in contrast to the 5-KL scale, for the 20-L scale,

little or no difference in %V profiles was observed between the copper levels.

The titer profiles normalized to the low copper condition for the 5-KL scale (Figure 2-1C)

also follow the trends observed for the VCD and %V profiles. The product titers in the 20-L

bioreactors reached 3.5 to 4 times higher compared to the 5-KL bioreactors, demonstrating that

extrapolation from laboratory to production scale can be limited. The high copper condition

increased the titer by 50% for the large scale, but this was still far lower than the titer observed in

20-L bioreactors. On the other hand, the increase in titer resulting from higher Cu was only about

10% for the 20-L reactors. The increase in productivity with additional copper in the 5-KL scale

and for the 20-L relative to the 5-KL scale follows directly the increase in the number of viable

cells. Further, and importantly, the specific productivity as well as the product quality attributes

were determined to be similar, independent of the copper level or scale of the reactor.

Interestingly, comparing the profiles across bioreactor scales showed little or no lactate

consumption for the 5-KL, with no significant effect resulting from increased copper (Figure 2-

1D). For the 20-L scale, the lactate concentration decreased after 120 hours; however, again, there

was no difference between the copper concentration levels. Addition of copper has previously been

shown to result in consumption of lactate, which is generally considered a desirable phenotype for

improvement of production and viability (16-20, 30). Our results are at variance with those

reported in the literature; however, it was pointed out in a recent paper that the lactate behavior with

added copper is dependent on the cell line and conditions used (19). We next explored the causes

for the phenotypic changes of cultured cells observed in Figure 2-1 using the combination of

proteomics and metabolomics.

2.4.2 Proteomic and metabolomics analysis platform

Cells growing under the high (0.40 μM) or low (0.02 μM) copper levels at three different

time points in the 20-L and 5-KL bioreactors were collected and the cell pellets processed for

proteomic study. Relative quantitation of protein expression was achieved by the TMT

multiplexing labeling method with six channels. With each bioreactor size, relative protein

expression at specific time points under the two concentrations of copper was profiled. Proteins

were subsequently identified using the annotated CHO DG44 proteome database with Proteome

Discoverer (PD) 1.4. Cell pellets from three time points for the high and low Cu conditions at both

scales, were analyzed from duplicate bioreactors. The strategy of high resolution 2D-RP/RP LC

coupled to a Q-Exactive mass spectrometer achieved high coverage of the CHO cell proteome with

TMT reporter ion quantitation.

Between 6400 and 7000 proteins were identified in the samples of both bioreactor scales,

with the number of common quantifiable proteins close to 5000, considering only proteins with at

least 2 PSMs in all 3 time points in both replicates. The number of identified and quantifiable

proteins is listed in Table 2-1.

Table 2-1 The number of identified and quantified proteins with the 5-KL and 20-L bioreactors

from the proteomic data analysis.

Protein numbers                                                20-L scale    5-KL scale
Total number of identified proteins                            5967          6352
Number of quantified proteins                                  4941          5354
Number of proteins after intensity-based filtering technique   4027          4199

Due to the large volume operation and general sampling procedures, as described above,

we utilized several levels of data filtering of proteins to obtain consistent results to compare high

and low copper conditions. With the first level of filtering, more than 4000 proteins remained.

Second, the thresholds of the protein log2 ratios of each replicate and the average protein log2

ratios were set to determine the differentially regulated proteins, resulting in fewer than 100

differentially expressed proteins at each time point. In addition, 212 and 282 metabolites

were identified and quantified for the large and small bioreactor scales, respectively. The numbers

of differentially regulated proteins and metabolites determined are shown in Table 2-2 and the

differentially regulated proteins and metabolites can be found in Table 2-4 in section 2.7 Appendix.

Table 2-2 Numbers of differentially regulated proteins and metabolites at each time point for

both scales. Differential regulation was determined by comparing the high and low copper conditions

at a given scale.

                                                  5-KL scale          20-L scale
                                                  Day 5     Day 7     Day 6     Day 10
Number of differentially regulated proteins       99        66        44        34
Number of differentially regulated metabolites    73        105       69        62

2.4.3 Analysis of combined differentially regulated proteins and metabolites in the 5-KL

reveals significant reduction in ROS with higher level of copper concentration in the

media and no significant copper effect in the 20-L reactor

As detailed in the section 2.3 Materials and Methods, we conducted proteomic analysis and

obtained metabolomics data for both bioreactor scales at various time points at high vs. low copper.

The differentially regulated proteins and metabolites under the various conditions are listed in

Table 2-4 in section 2.6 Appendix.

We focus first on results from analysis of the 5-KL scale cultures. Using IPA (31), we

combined the differentially regulated protein and metabolite data at several time points (Days 3, 5

and 7); however, for Days 3 and 5, no biological functions were found to be differentially altered

at a statistically significant level between the two copper levels. Importantly, on Day 7 we

observed significant changes due to the higher copper concentration (Figure 2-2 and Figure 2-3).

Figure 2-2 shows that an increase in copper led to a significant decrease in the biological functions

“cell death”, “killing of cells” and “apoptosis” (Z-values < -2), as defined by IPA. The reduction

in these biological functions is consistent with the measured phenotypic changes shown in Figure

2-1.
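To make the significance cut-off concrete, the sketch below computes an activation z-score in its simplest unweighted form, z = (N+ - N-) / sqrt(N), where N+ and N- are the numbers of input molecules whose measured change is consistent with the function being increased or decreased. This is an assumption-laden simplification: IPA's own calculation may weight and bias-correct the score, and the counts used here are made up rather than taken from Figure 2-2.

```python
from math import sqrt

def activation_z_score(signs):
    """Unweighted activation z-score.

    signs: iterable of +1 (molecule's change is consistent with the function
    being increased) or -1 (consistent with it being decreased).
    """
    signs = list(signs)
    if not signs:
        raise ValueError("need at least one molecule with a predicted direction")
    if any(s not in (1, -1) for s in signs):
        raise ValueError("each entry must be +1 or -1")
    return sum(signs) / sqrt(len(signs))

# Made-up example: 16 molecules, 3 pointing to an increase and 13 to a decrease.
z = activation_z_score([1] * 3 + [-1] * 13)   # (3 - 13) / sqrt(16)
is_significant = abs(z) >= 2                  # the |Z| >= 2 convention used in the text
```

A strongly negative score therefore requires that most of the contributing proteins and metabolites point in the same, repressing direction.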

Figure 2-2 Prediction of significantly repressed biological functions related to cell fate for the 5-KL bioreactor using IPA.

Combined differentially regulated proteins and metabolites (high vs. low copper) were used as input to the IPA tool. The color codes for quantitative measurements and predicted outcomes are shown in the inset panel. Biological functions related to cell death and survival (blue octagon) were significantly repressed at Day 7. The Z scores of cell death, killing of cells, and apoptosis of pancreatic cancer cell lines were -2.227, -2.179, and -2.176, respectively.

Figure 2-3 Prediction of significantly repressed biological functions related to ROS generation for the 5-KL bioreactor using IPA.

The color codes for quantitative measurements and predicted outcomes are shown in the inset panel. Biological functions related to free radical scavenging (blue octagon) were significantly repressed at Day 7. The Z score of production of reactive oxygen species was -2.686, and the Z score of synthesis of reactive oxygen species was -2.836.

Figure 2-3 presents the other significant biological function from the combined data – the

high copper condition reduced the level of ROS. The results in Figure 2-2 and Figure 2-3 suggest

that the reduction in ROS with higher copper is related to a decreased level of cell death. This is

not surprising given that increased ROS is known to cause damage to proteins, DNA, and lipids,

leading to cell death (32-34). There are a number of metabolites and proteins listed in Figure 2-2

and Figure 2-3 that are related to both oxidative stress and cell death, supporting the connection

between the two biological functions.

With respect to metabolites, cholesterol, linoleic acid, oleic acid, and glutamic acid were

observed to be down-regulated with the high copper condition on Day 7. Cholesterol can

potentially alter the mitochondrial membrane potential and hence increase ROS generation,

resulting in activation of apoptosis (35, 36). Oleic and linoleic acids have been reported to induce

ROS production through activation of NADPH oxidase (37, 38). These acids can also lead to free

fatty acid-mediated apoptosis and cell death (39, 40). Glutamic acid can cause increased ROS

production by affecting the function of succinate dehydrogenase in mitochondria (41). Thus, the

down-regulation of these metabolites with additional copper supports the reduction of ROS

generation and cell death. Further, glycine and cysteine were up-regulated in the high copper

condition, both of which are known to suppress apoptosis (42-44). Moreover, glycine and

guanosine, which was also found to be up-regulated with additional copper, were reported to

inhibit the ROS generation caused by glutamic acid (41).

With respect to proteins, the high copper condition led to down-regulation of BAX (Bcl-2-associated X protein) and BAK1 (Bcl-2 antagonist or killer) on Day 5 and Day 7, respectively. BAX and BAK1 are well-known apoptotic regulators belonging to the Bcl-2 (B cell lymphoma 2) family, which controls the apoptotic mitochondrial events by governing the permeability of the mitochondrial membrane (45). Their down-regulation should decrease apoptosis (46). Arylsulfatase A (ARSA), which can hydrolyze ascorbic acid 2-sulfate to ascorbic acid (47), was up-regulated with high copper on Day 7, which again points to a reduction of both cell death and ROS production. Additional analysis of individual proteins and metabolites related to cell death and ROS production can be found in section 2.4.5 Additional differentially regulated protein analysis.

We next examine the combined ’Omics data for the 20-L scale, where the productivity was much higher than at the 5-KL scale and where increased copper had a much smaller influence. Here, three time points were examined: Days 3, 6 and 10. No significant difference in ROS production or cell death was found between the two copper levels at any time point, in agreement with the results in Figure 2-1. We did find, however, a significant increase in the biological function of protein synthesis on Day 6 (Z = 2.213) (Figure 2-8) and a decrease in cell aggregation on Days 6 (Z = -2.203) and 10 (Z = -2.203). The increase in protein synthesis may relate to the small increase in titer observed for the higher copper level in Figure 2-1C.

To further compare the 20-L and 5-KL scales, we selected metabolites from the IPA

biological function category “reactive oxygen formation” that were observed for both Day 7 for

the 5-KL and Day 10 for the 20-L scales and compared the two scales under the same copper

concentration level. The shared relevant metabolites were reduced glutathione (GSH), sphingosine,

sorbitol, NAD+, cysteine, guanosine, glycine, cholesterol, homocysteine, and glutamic acid.

Subjecting these metabolites to IPA analysis, we found that at the low copper condition, ROS

formation was significantly elevated for the 5-KL scale, relative to the 20-L scale (Z = 2.190), and

was still higher for the high copper condition (comparing again 5-KL to 20-L) but not at a

significant level (Z = 1.452) (Figures 2-4A and 2-4B). The conclusion is that ROS stress is

observed under the low copper condition for the industrial production scale, relative to the 20-L scale,

but that the high copper condition for the 5-KL scale moderates the stress sufficiently that the

ROS difference between the two scales is no longer statistically significant.

Figure 2-4 Prediction of the formation of ROS for 5-KL vs. 20-L scales with low and high copper conditions.

The color legends are the same as in Figure 2-2. (A) Z score of 2.190 under the low copper condition; (B) Z score of 1.452 under the high copper condition; (C) western blotting of glutathione peroxidase (GPx) 1/2 for the 5-KL (Day 5 and Day 7) and 20-L (Day 6 and Day 10) scales under different copper levels. A higher GPx 1/2 level indicates higher oxidative stress. ROS formation is significantly activated in the 5-KL relative to the 20-L scale under the low copper condition, but not under the high copper condition, indicating that the difference in ROS formation between the 5-KL and 20-L scales is moderated by additional copper. The quantitative estimation was performed with ImageJ (http://imagej.nih.gov/ij/).

To provide additional support for the finding of oxidative stress in the large scale bioreactor and its reduction at the higher copper level, the level of glutathione peroxidase (GPx) was determined by western blotting. GPx, a marker for oxidative stress, catalyzes the reduction of peroxides using reduced glutathione, forming glutathione disulfide and water (33). As seen in Figure 2-4C, GPx was down-regulated with high copper on Days 5 and 7 of the 5-KL scale, reflecting a reduced cellular response to the lower oxidative stress. Further, for the 20-L scale, western blotting showed no significant difference in GPx between the two copper levels on Days 6 or 10 (Figure 2-4C). As a clear demonstration of the difference in oxidative stress between the 5-KL and 20-L scales, Figure 2-4C further shows that a much higher level of GPx is produced in the industrial scale bioreactor under the low copper condition. The cells at the 5-KL scale respond to the increased oxidative stress by producing more GPx to try to alleviate this stress. With further study, GPx could become a biomarker for oxidative stress in CHO cell bioreactors. The picture that emerges is that the 5-KL reactor is under oxidative stress and that this stress affects viable cell density (apoptosis and cell death).

2.4.4 Hypoxia (intermittent) in 5-KL bioreactor reduces cell viability and productivity

The cause of the increased ROS in the 5-KL industrial scale was next explored. One of the

major concerns in process control, especially for production scale bioreactors, is the level and

uniformity of oxygen mixing throughout the reactor. Previous studies have shown that the oxygen

transfer coefficient is 50% lower in 5-KL scale relative to the 20-L scale (11), suggesting the

potential for insufficient oxygen mass transfer in 5-KL bioreactors. In addition, the mixing time

of a 5-KL bioreactor at its maximum allowed agitation (>100 seconds) is much longer than that of

a 20-L bioreactor (42-86 seconds) (11). It is reasonable to assume the presence of an oxygen

gradient in 5-KL bioreactors during cell culture. Since cells are in continual movement due to

stirring with a relatively long mixing time, they can intermittently experience

lower oxygen levels, i.e. hypoxia. Furthermore, hypoxia is known to increase ROS (48), and,

separately, it has been shown that intermittent hypoxia induces potentially even greater cellular

oxidative stress than continuous hypoxia (48-52). In the present study, hypoxia in the large

bioreactor and the increased ROS observed in the ’Omics studies (Figure 2-2 and Figure 2-3) are likely

related.
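The mass-transfer argument in this paragraph can be made explicit with the standard expression for the oxygen transfer rate (OTR). This is only a sketch: it assumes that the "oxygen transfer coefficient" cited from (11) is the volumetric coefficient $k_L a$ and that both scales operate at the same driving force. The symbols $C^{*}$ (oxygen saturation concentration) and $C_L$ (bulk dissolved oxygen concentration) are introduced here for illustration.

```latex
\mathrm{OTR} = k_L a \,\bigl(C^{*} - C_L\bigr),
\qquad
\frac{\mathrm{OTR}_{5\text{-KL}}}{\mathrm{OTR}_{20\text{-L}}}
= \frac{(k_L a)_{5\text{-KL}}}{(k_L a)_{20\text{-L}}} \approx 0.5
\quad \text{at equal } \bigl(C^{*} - C_L\bigr).
```

Under these assumptions the 5-KL vessel can deliver only about half the oxygen per unit volume and time, which is consistent with the suggestion of insufficient oxygen mass transfer at that scale.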

Although DO profiles are monitored and controlled at the same set-points for all

experimental conditions, the measurements taken are indicative of DO levels at a single point in

the bioreactor. Furthermore, DO gradients leading to hypoxic conditions are difficult to measure

in large tanks. Therefore, to support the hypothesis of hypoxic stress in the large scale process,

we measured the relative levels of fibronectin by western blotting, ELISA, and qPCR. Fibronectin

is known to be up-regulated in hypoxia (53, 54). As shown in Figures 2-5A and 2-5C (ELISA and

western blotting), relative to high copper, fibronectin is greater at the low copper level on Day 7

for the production scale process of 5-KL. qPCR (Figure 2-5B) also showed up-regulation of the

fibronectin gene for the lower copper level on Day 7 for the production scale (Day 5 was not

measured by qPCR). Further, the increased level of fibronectin with process time for the 5-KL

scale with the low copper condition indicates that the effects of hypoxia were cumulative. On the

other hand, for the 20-L scale, Figure 2-5 shows the difference in fibronectin level between the two

copper conditions to be far smaller than for the 5-KL scale. The results support the view that intermittent

hypoxic stress is a factor in the reduction of the viable cell density after Day 3 (Figure 2-1) for the

industrial scale process and that the higher copper level is able to moderate the stress, resulting in

the higher productivity for the 5-KL bioreactor. These results suggest that fibronectin could

become a biomarker for hypoxic stress with production scale CHO cell bioreactors.

Figure 2-5 Results demonstrating hypoxic stress.

The regulation profile of fibronectin, the selected marker for hypoxic stress, from (A) ELISA, (B) qPCR, and (C) western blotting for the 5-KL and 20-L scales under different copper levels at selected time points. A higher level of fibronectin correlates with a higher hypoxic stress level. The error bars were calculated from the two biological replicates. The quantitative estimation was performed with ImageJ (http://imagej.nih.gov/ij/).

2.4.5 Analysis of additional differentially regulated proteins supports the ROS and hypoxia

roles in the 5-KL bioreactor

We next explored the differentially regulated proteins separately using a different data analysis platform, MetaCore, to search for additional non-metabolic pathways that could further support the connection between hypoxia, oxidative stress, and the influence of copper. For the 5-KL scale, deactivation of apoptotic and cell adhesion-related pathways was found with high copper (see Table 2-3). It has been reported that increased ROS levels can alter endothelial barrier function and cause the differential regulation of certain cell adhesion-related proteins (51, 55). Thus, the reduction in these pathways with high copper supports the reduction in ROS and oxidative stress in the large bioreactor.

Table 2-3 MetaCore analysis of proteomic data of the 5-KL scale: the significantly differentially regulated proteins related to apoptosis and cell adhesion pathways.

| Pathway groups (a) | Day 5 | Day 7 |
| --- | --- | --- |
| Cell adhesion | N-cadherin (0.52) (b), β-actin (0.93) | β-catenin (-0.44), p120-catenin (-0.36), desmoplakin (-0.37) |
| Apoptosis and survival | Bax (-0.38) | eIF2S1 (-0.36), Bak (-0.51) |

a Significant pathways with p values < 0.01.

b Log2 ratios of high vs. low copper conditions.
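Because the values in Table 2-3 are log2 ratios, it can help to translate them into linear fold changes. The short sketch below does this for a few entries copied from the table; the helper names are introduced here for illustration only. For example, a log2 ratio of -0.38 (Bax) corresponds to about 0.77-fold, i.e. roughly a 23% decrease at high copper.

```python
def fold_change(log2_ratio):
    """Convert a log2(high copper / low copper) ratio to a linear fold change."""
    return 2.0 ** log2_ratio

def percent_change(log2_ratio):
    """Percent change of the high copper level relative to low copper."""
    return (fold_change(log2_ratio) - 1.0) * 100.0

# Log2 ratios copied from Table 2-3 (5-KL scale, high vs. low copper).
table_2_3 = {"N-cadherin": 0.52, "Bax": -0.38, "β-catenin": -0.44, "Bak": -0.51}

for protein, log2_ratio in table_2_3.items():
    print(f"{protein}: {fold_change(log2_ratio):.2f}-fold "
          f"({percent_change(log2_ratio):+.0f}%)")
```

Seen this way, the changes reported for individual proteins are modest in magnitude, which is one reason the pathway-level analysis carries more weight than any single entry.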

Further, β-catenin, one of the adherens junction proteins, was found to be significantly

down-regulated with the high level of copper in the production scale bioreactor (Table 2-3). The

level of β-catenin is decreased as excess ROS is diminished (55), supporting the role of copper in

the reduction of ROS. Moreover, since β-catenin is an important control of FOXO, HIF-1, and

Wnt signaling, the regulation of these signaling pathways may potentially play a role in the cell

fate under hypoxic conditions (56). With increase in copper, several proteins downstream of Wnt

signaling, MMP-3 (Day 5) (57) and CD44 (Day 7) (58), were found to be down-regulated,

indicating the deactivation of Wnt signaling (Table 2-4). The deactivation of Wnt/β-catenin

signaling likely slowed the G1/S phase transition of the cell cycle through the regulation of cyclin

D1 and c-Myc (59, 60). This result could decrease cell death and thus raise CHO cell productivity

(61).

2.4.6 The differentially regulated proteins related to important biological functions and

pathways

As shown in Figures 2-2 and 2-3 and Tables 2-3 and 2-4, there are several differentially

regulated proteins and metabolites involved in significant biological functions and pathways with

the 5-KL scale. Multiple endocrine neoplasia I (MEN1), which was down-regulated with the high

copper condition on Day 7, can induce apoptosis through the response of BAX and BAK1 (62).

p120-Catenin (CTNND1) was also down-regulated on Day 7 with high copper for the 5-KL scale.

The reduction of this protein was reported to decrease apoptosis (63). Moreover, p120-catenin

directly functions as a part of the “core” cadherin-catenin complex (64) with other members

including β-catenin, which was also down-regulated on Day 7. Other proteins, including folliculin

(FLCN, also known as BHD) (65), angiomotin (AMOT) (66), and spliceosome-associated protein

CWC15 (CWC15) (67) were all reported to be involved in the apoptotic process. The down-regulation

of these proteins on Day 7 with high copper points to the reduced cell death seen in

Figure 2-1.

γ-Glutamyltransferase (GGTL3) was down-regulated on Day 7 with the high copper condition. It is involved in glutathione degradation for physiological functions other than scavenging ROS (68, 69). The lower level of GGTL3 indicates a lower GSH degradation rate and a larger GSH pool available as antioxidant. Moreover, several amino acids were found to be differentially regulated in the high copper condition. Histidine, being a singlet oxygen free radical scavenger, can protect the cell from ROS-induced apoptosis and cell death (70). γ-Aminobutyric acid (GABA), which was found to be up-regulated in high copper, can also prevent cell death by inhibiting apoptosis through membrane depolarization and Ca2+ influx, the latter of which activates PI3-K/Akt-dependent growth and survival pathways (71). A number of polyamine compounds such as spermine, spermidine and putrescine were also found to be up-regulated in the presence of high copper. Polyamines can protect against ROS-generated stress by reducing oxidative damage of DNA. Polyamines can also protect cells against ROS-induced glutathione oxidation, lipid peroxidation and protein oxidation (72). Urea, a breakdown product of cellular nitrogenous compounds, was found to be down-regulated in the high copper samples. Urea has been shown to increase mitochondria-associated ROS generation (73).

2.4.7 Superoxide dismutase 1 is potentially involved in the reduction of intermittent

hypoxia and oxidative stress with addition of copper in the 5-KL bioreactor

Copper is a well-known trace metal co-factor for a number of enzymes (13). Given the reduction in oxidative stress with increased copper, we sought potential copper-binding enzyme targets that act as antioxidants. One such copper-binding enzyme is superoxide dismutase 1 (SOD1), which catalyzes the conversion of the superoxide radical to oxygen or hydrogen peroxide (74). While SOD1 was detected in our proteomics workflow, it was removed by the filtering steps. Nevertheless, we probed SOD1 at both copper levels on Day 7 by western blotting. SOD1 was found to be up-regulated on Day 7 for high copper at the 5-KL scale (Figure 2-6). This up-regulation may result in a stronger defense against the excess ROS generated by the intermittent hypoxia. On the other hand, additional copper did not up-regulate SOD1 expression at the 20-L scale (Figure 2-6), likely because the additional antioxidant was not needed. Moreover, as seen in the western blotting, the expression of SOD1 was similar between the 5-KL and 20-L scales under the low copper condition. Since copper is a regulator of a wide range of signal transduction pathways, the full range of its specific, multifaceted molecular actions clearly requires further investigation.

Figure 2-6 Western blotting of SOD1, a copper-binding enzyme, for the 5-KL and 20-L scales under different copper levels.

The quantitative estimation was performed with ImageJ (http://imagej.nih.gov/ij/).

2.5 Discussion

We have, for the first time, presented a systems biology study of a manufacturing scale CHO bioprocess. In this work, lower cell viability and process productivity were observed at a 5-KL production scale, relative to a 20-L benchtop bioreactor (Figure 2-1). We found that, due to trace copper in the sodium bicarbonate used to control pH (an increase in copper from 0.02 μM to 0.4 μM), the cell viability and process productivity of the 5-KL scale were doubled during the stationary phase of the process, without disturbing the product quality attributes. This increase in copper concentration had only a minor effect on the performance of the 20-L bioreactor. These results clearly show that phenotypic behavior at the lab scale cannot necessarily be extrapolated to the production scale process.

In previous studies, researchers found that increased copper led to high productivity of the

CHO cell process with a decrease (consumption) of lactate in the stationary region. It was reasoned

that the higher copper level affected copper binding to the COX proteins, thus reducing ROS

produced in the mitochondria (18, 20). In the present study, we did not find lactate consumption

for the 5-KL scale. Importantly, the previous studies used much smaller laboratory scale bioreactor

volumes of 5 L or less, at least 1000-fold smaller than the manufacturing scale here.

Also, the concentration of copper was generally higher, and process protocols likely differed from

our study (e.g., media, CHO strains, feeding regimens, reactor designs, etc.).

We sought to understand the reasons for doubling of the titer with the addition of trace

amounts of copper in the 5-KL scale, while the same effect was not observed for the lab scale

process. Quantitative proteomics was conducted on the two bioreactor scales at the two copper

levels in the growth, stationary and early death phases. For the 5-KL production scale, analysis of

the proteins differentially regulated above and below the designated cut-off threshold of high

versus low copper provided only limited statistically significant insight into the causes of the

phenotypic differences. Analysis of the metabolomics data also provided only limited insight.

Importantly, the combination of the proteomics and metabolomics differentially regulated data did

lead to statistically significant biological insight. The success of the combined ’Omics analysis

demonstrates the potential of ’Omics in elucidating the underlying biology of complex processes

(systems biology). Undoubtedly, further additions of other ’Omics data, e.g. lipidomics,

transcriptomics, etc., would provide deeper insight into the biology of the production of

biopharmaceuticals.

The picture that emerges from our ’Omics analysis is that, for the 5-KL scale, cell death and ROS production were reduced when the higher level of copper was present (Figures 2-2 and 2-3). The reduced ROS generation was likely related to the decreased cell death, since oxidative stress caused by excess ROS is a well-known trigger for apoptosis (33, 45, 75). The reduction, in the high copper condition, of oleic, linoleic and glutamic acids, which are known to increase ROS production, supports this conclusion. That apoptosis was decreased at the high copper level can be clearly seen in the reduced levels of BAX and BAK1. On the other hand, analysis of the 20-L scale between the two copper levels did not show statistically significant differences in ROS production, suggesting less oxidative stress than found for the 5-KL scale. Western blot analysis of glutathione peroxidase, a marker for oxidative stress, clearly shows that the 5-KL scale is under more stress than the 20-L scale (Figure 2-4).

We then investigated the cause of the increased ROS production for the industrial scale

process. Given the general concern of gas mixing in kiloliter scale processes and a previous study

that suggested the existence of lower oxygen concentration zones in large-scale bioreactors (11),

we hypothesized that the periodic contact of CHO cells with lower oxygen regions (intermittent

hypoxia) was related to the increased ROS level in the large scale bioreactor. Hypoxia is well

known to create excess ROS (48, 52). Fibronectin, which is known to be up-regulated under

hypoxia (53, 54), was found by qPCR, ELISA and western blotting to be higher in the large scale

bioreactor compared to the 20-L scale under the low copper condition (Figure 2-5). Thus, it seems

likely that the difficulty in efficient oxygen mixing for the 5-KL scale resulted in intermittent

hypoxia.

Figure 2-5 further shows that increased copper had a significant effect in reducing the level

of fibronectin in the production scale process, and only a minor reduction of the protein for the lab

scale process. Thus, the increased copper would appear to be counteracting the hypoxia. As

the ’Omics results show, the higher copper level reduced the production of ROS, an outgrowth of

the hypoxia. Further study would be required to determine exactly how copper affects the

production of ROS, given that copper, like other trace metals, can bind to many enzymes. We did

find by western blotting that SOD1, a copper-binding enzyme and antioxidant, appears to be

affected by the increased copper (Figure 2-6). However, more study would be required to elucidate

the biology involved.

In conclusion, the present study is a clear example of how multi-omics approaches can be

applied to explore the biology of the process and improve productivity (4). Such strategies will

drive improvement in bioprocessing and enable the efficient development of robust, scalable and

well-understood processes. Furthermore, such studies can lead to biomarkers that can be utilized

to rapidly monitor changes in a process. With continued advances, there is little doubt systems

biology will find increasing use in biomanufacturing through a greater understanding of the

biology of the process.

2.6 Conclusion

In summary (Figure 2-7), in the industrial scale 5-KL bioreactors for biopharmaceutical production, CHO cells undergo intermittent hypoxia due to incomplete oxygen mixing. As found through combined metabolomics and proteomics datasets, this condition leads to excess ROS. The resultant stress causes decreased productivity compared to bench-scale 20-L bioreactors. Additional copper, from 0.02 μM to 0.4 μM, reduced the stress and improved the productivity 2-fold for the 5-KL scale. For the 20-L scale, copper did not affect productivity significantly, as the cell stress was lower.

Figure 2-7 Summary scheme: increased copper reveals hypoxia as a cause of lower productivity on scale-up to an industrial CHO bioprocess.

2.7 Appendix

2.7.1 Perspective of biological effects caused by additional copper in the media.

CHO cells in the large scale bioreactor were shown to encounter hypoxic stress as well as hypoxia-induced oxidative stress caused by excess ROS. An increase in copper concentration helped the cells resist these stresses, leading to increased viability. Above, we proposed that the copper-binding protein SOD1 could be one route by which copper affected the bioprocess at the large scale. However, in addition to SOD1, there are other copper-influenced processes that could be affected, based on the current biological analysis. Two of the most promising hypotheses are listed below; both need further study to be refined or confirmed.

First, mitogen-activated protein kinase (MAPK) signal transduction (14, 76, 77) could have been affected by additional copper, which may aid cell survival under cellular stresses. The significant pathways from the MetaCore proteomic analysis of both the 5-KL and 20-L bioreactors were studied, and MAPK signal transduction attracted our attention. The pathway maps from MetaCore contain many biological sub-processes, and MAPK signaling was the one most often shared among the top 10 statistically significant pathways (ranked by p value) of the 5-KL and 20-L scales. Although the proteins directly related to MAPK signaling, such as mitogen-activated protein kinase kinases (MAPKKs) and mitogen-activated protein kinases (MAPKs), were not differentially regulated, the activity of the MAPKKs and MAPKs depends on their phosphorylated forms, not their overall protein amounts. Unfortunately, phosphoproteomics was beyond the scope of the present study. For the 5-KL bioreactor, JNK and p38 were related to apoptosis and survival pathways. For the 20-L bioreactor, ERK was related to cell adhesion and cytoskeleton remodeling pathways, and JNK appeared in the development pathways. Moreover, the elevation of N-cadherin in the 20-L bioreactor on Day 6 could also indicate enhanced ERK signaling with the high copper condition. The N-cadherin (CDH2) level was reported to decrease after ERK inhibition, and the knockdown of N-cadherin also decreased the phosphorylation of ERK (78). Also, the activation of JNK could increase phosphorylated Bax and Bak, resulting in the activation of these two pro-apoptotic proteins (79), and p38 would activate the phosphorylated Bax as well (80). Importantly, the oxidative stress caused by ROS can potentially induce activation of MAPK pathways (77), which are reported to be regulated by copper (14). It is possible that copper helped to balance the activation status of the MAP kinases, which benefited cell survival under the oxidative stress in the large scale bioreactor. Thus, MAPK signaling could be a candidate process through which copper affected the growth status of CHO cells.
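The way a sub-process such as MAPK signaling can be singled out as "most shared" among top-ranked pathway maps can be sketched as a simple tally. The map names and sub-process annotations below are hypothetical placeholders, not MetaCore output, and only two maps per scale are shown instead of ten for brevity.

```python
from collections import Counter

def most_shared_process(top_maps_by_scale):
    """Count, across all scales, in how many top-ranked pathway maps each
    sub-process appears; return (sub-process, count) pairs, most shared first."""
    counts = Counter()
    for maps in top_maps_by_scale.values():
        for sub_processes in maps.values():
            counts.update(set(sub_processes))  # count each map at most once
    return counts.most_common()

# Hypothetical top-ranked maps per scale with their sub-process annotations.
top_maps = {
    "5-KL": {
        "map_A": ["MAPK signaling", "apoptosis"],
        "map_B": ["MAPK signaling", "cell adhesion"],
    },
    "20-L": {
        "map_C": ["MAPK signaling", "cytoskeleton remodeling"],
        "map_D": ["cell adhesion"],
    },
}
ranking = most_shared_process(top_maps)
```

A tally of this kind only flags a recurring theme; as noted in the text, it says nothing about the phosphorylation state that actually determines MAPK activity.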

Second, it is possible that the regulation between Wnt signaling and FOXO- and HIF-1-related signal transduction was influenced by additional copper, leading to a stronger defense against both hypoxic and oxidative stresses at the 5-KL scale. In our study, β-catenin was found to be down-regulated with additional copper on Day 7 at the 5-KL scale. For one, a higher ROS level is known to increase the abundance of β-catenin (55), supporting the role of copper in the reduction of ROS at the large scale. In addition, β-catenin is a key protein that regulates the activity balance of FOXO, HIF-1, and Wnt signaling, all of which play a role in cell fate under hypoxic conditions (56). With the high copper condition at the 5-KL scale, several proteins downstream of Wnt signaling, MMP-3 (Day 5) (57) and CD44 (Day 7) (58), were down-regulated, indicating the deactivation of Wnt signal transduction. The deactivation of Wnt/β-catenin signaling can potentially slow or even arrest the G1/S phase transition of the cell cycle through the regulation of cyclin D1 and c-Myc (59, 60). The arrest or slowing of cells at the G1/S transition could be one reason for the decreased cell death and hence the increased host cell productivity for the therapeutic protein at the 5-KL scale (61, 81). Moreover, cellular oxygen consumption under hypoxic conditions can deplete oxygen further, resulting in even more severe hypoxic stress. For this reason, a beneficial cellular stress response to hypoxia would be cell cycle arrest to reduce oxygen consumption, increasing the chances of cell survival (82). Thus, copper might promote an advantageous response through the regulation of the Wnt/FOXO/HIF-1 signaling balance in the CHO cells under stress at the large scale.

Based on the biological analysis, MAPK signaling and Wnt/FOXO/HIF-1 signaling are

promising targets which might have been affected by additional copper, but further study is

required to explore these hypotheses. Moreover, the effects of copper, as with any trace metal, are likely

complex and multifactorial. The unraveling of this complex effect will need to be explored in

future studies.

Figure 2-8 Prediction of significantly activated biological functions for the 20-L bioreactor using IPA at Day 6.

The color codes for quantitative measurements and predicted outcomes are shown in the inset panel. The biological function related to synthesis of protein (orange octagon) is significantly activated, with a Z score of 2.213.

Table 2-4 Differentially regulated proteins and metabolites with the 5-KL and 20-L bioreactors.

5-KL scale Day 5

Columns, left to right: differentially expressed proteins (RefSeq, gene name, average Day 5 log2 ratio, high copper/low copper), then differentially regulated metabolites (metabolite name, average Day 5 log2 ratio).

NM_177717.4 4732456N10Rik 0.52 γ-glutamylglycine 1.44

NM_001163493.1 Stard13 0.60 palmitoyl sphingomyelin -0.63

NM_153319.2 Amot 0.40 n-butyl oleate -0.70

NM_175318.4 Spty2d1 0.38 13-methylmyristic acid -0.73

NM_172382.2 Kdm4a 0.44 leucylphenylalanine -0.90

NM_145615.4 Etfa 0.65 leucylisoleucine -0.66

NM_009610.2 Actg2 0.93 γ-glutamyltryptophan 0.80

NM_175659.1 Hist1h2ah 2.19 isoleucylglycine -0.66

NM_025314.3 Dtd1 0.39 Alanyl-Leucine -1.26

NM_026758.3 Mphosph6 0.38 Valyl-Phenylalanine -1.15

NM_133939.1 Lsm8 0.51 (S)-3-methyl-2-oxopentanoic acid 0.75

NM_001048250.2 2810008M24Rik 0.51 1-oleoyl lysophosphatidylcholine 1.26

NM_001033135.3 -1.01 1-oleoylglycerol -0.64

NM_001001602.2 Dab2ip -0.38 1-palmitoylglycerol -1.02

NM_146183.2 Zfp428 -0.40 2-oleoylglycerol -0.99

NM_198113.2 Ssh3 -0.41 2-palmitoylglycerol -1.86

NM_139236.3 Nol6 -0.68 3'-adenylic acid -0.72

NM_027992.3 Tmem106b -0.74 4-guanidinobutanoic acid 0.69

NM_172508.2 Dse -0.50 4-hydroxyphenyllactic acid, (DL)-isomer 1.01

NM_145531.2 Spg11 -0.41 9Z-hexadecenoic acid -0.86

NM_026636.2 5430437P03Rik -0.41 adenine -0.61

NM_029841.3 2510039O18Rik -0.32 α-D-glucose 6-phosphate 0.88

NM_001005767.4 Parl -0.50 Asp-Leu -1.71

NM_183106.2 Ttc17 -0.33 behenic acid -1.30

NM_008822.2 Pex7 -0.52 cis-10-heptadecenoic acid -1.37

NM_197991.2 2310044H10Rik -0.31 desmosterol 0.68

NM_130881.2 Pabpc4 -0.47 D-fructose 0.61

NM_007527.3 Bax -0.38 erythronic acid -0.75

NM_133683.3 Tmem19 -0.30 GABA -0.73

NM_010380.3 H2-D1 -0.50 galactitol 0.90

NM_144857.1 BC011248 -0.53 galactose-1-phosphate -0.67

NM_008788.2 Pcolce -0.34 γ-glutamylisoleucine 0.67

NM_008408.4 Stt3a -0.38 γ-glutamyl-leucine 0.73

NM_080561.3 Rnf216 0.34 γ-glutamylphenylalanine 0.73

NM_175394.2 Wtap 0.30 γ-glutamylthreonine 0.76

NM_030252.2 BC003266 0.32 γ-glutamyl-valine 0.75

NM_001177965.1 Naa10 0.33 GDP fucose 0.66

NM_001008238.2 Bnip2 0.47 glutathione -0.91

NM_001076676.1 Usp33 0.37 glycylleucine -2.19

NM_144806.2 Prpsap2 0.32 guanine 0.93

NM_009104.2 Rrm2 0.34 hypotaurine 0.80

NM_009736.3 Bag1 0.42 Ile-Ala -0.64

NM_134131.2 Tnfaip8 0.33 Ile-Ile -1.19

NM_172552.3 Tdg 0.37 inositol 1-phosphate -1.52

NM_025349.2 Lsm7 0.39 lauric acid -0.66

NM_138753.2 Hexim1 0.33 Leu-Leu -0.93

NM_001205226.1 Cnot1 0.30 L-glutamic acid -0.77

NM_021414.5 Ahcyl2 0.55 L-glutamine -1.99

NM_001033439.2 Lrch1 0.40 L-homocysteine -0.93

NM_177545.4 Vangl1 0.42 linoleic acid -0.95

NM_001099624.2 Rapgef2 0.47 L-lactic acid 0.93

NM_025365.3 Tomm6 0.54 L-ornithine 1.43

NM_009680.3 Ap3b1 0.37 L-serine 0.59

NM_013840.3 Uxt 0.38 myristic acid -0.60

NM_001161816.1 Gm15455 0.37 N-acetyl-L-tyrosine -0.60

NM_001177946.1 Aamdc 0.31 nervonic acid 0.63

NM_009193.1 Slbp 0.39 N-formylmethionine 0.69

NM_007664.4 Cdh2 0.52 oleylamide -1.04

NM_001162989.1 Phax 0.33 orotic acid 1.83

NM_008761.3 Fxyd5 -0.38 Phe-Leu -0.90

NM_019478.3 Pqbp1 -0.38 phenylalanylphenylalanine -1.13

NM_007856.2 Dhcr7 -0.38 Phe-Ser -1.12

NM_011127.2 Prrx1 -0.31 putrescine 1.24

NM_010809.1 Mmp3 -0.47 pyrophosphate -0.68

NM_011399.3 Slc25a17 -0.30 pyrrolidonecarboxylic acid 0.59

NM_001135567.1 1190007I07Rik -0.31 rac-1-stearoylglycerol -0.72

NM_008130.2 Gli3 -0.59 Ser-Leu -1.53

NM_001081118.1 Phrf1 -0.32 stearic acid amide -1.05

NM_028440.1 3110003A17Rik -0.87 trans-4-hydroxy-L-proline 0.59

NM_026744.3 Mrpl53 -0.33 Tyr-Leu -0.62

NM_019951.1 Sec11a -0.45 UDP-D-galactose -0.80

NM_028208.1 Ptar1 -0.38 urea -0.66

NM_172402.3 Slc25a32 -0.31 xanthosine monophosphate 0.83

NM_026554.4 Ncbp2 -0.47

NM_019755.4 Plp2 -0.35

NM_011014.2 Sigmar1 -0.34

XM_003946427.1 Pwp2 -0.56

NM_008222.4 Hccs -0.39

NM_178602.3 Grinl1a -0.49

NM_198027.2 Alkbh6 -0.35

NM_175518.5 D730040F13Rik -0.50

NM_001160182.1 Tor1aip2 -0.36

NM_027259.1 Polr2i -0.30

NM_001081151.1 Gan -0.40

NM_025969.4 1700034H14Rik -0.37

NM_009444.1 Tgoln2 -0.40

NM_025813.3 Mfsd1 -0.43

NM_145959.3 D15Ertd621e -0.34

NM_008889.2 Ppp1r14b -0.31

NM_001078167.1 Sfrs1 -0.36

NM_146019.2 Chd3 -0.32

NM_026698.2 Tmem129 -0.38

NM_001167680.1 Rhbdf2 -0.31

NM_024187.4 U2af1 -0.41

NM_053162.2 Mrpl34 -0.40

NM_020585.2 Golga7 -0.30

NM_029211.2 Rnf121 -0.34

NM_009279.3 Ssr4 -0.30

NM_030566.2 Rabep2 -0.35

5-KL scale Day 7


RefSeq  Gene names  Average Day 7 log2 ratios  Metabolite names  Average Day 7 log2 ratios (high copper/low copper)

NM_177717.4 4732456N10Rik -0.95 γ-glutamylglycine 1.34

NM_001033135.3 Rnf149 -0.98 palmitoyl sphingomyelin -0.96

NM_144857.1 BC011248 -0.55 n-butyl oleate -1.20

NM_026229.4 Gpr89 -0.40 13-methylmyristic acid -2.08

NM_025368.3 Josd2 -0.58 γ-glutamyltryptophan 1.35

NM_021330.4 Acp1 -0.52 1-stearoyl-GPI (18:0) -1.16

NM_011515.4 Vamp7 -0.36 isoleucylglycine -0.64

NM_025628.2 Cox6b1 -0.33 Leucyl-Methionine 1.22

NM_001177785.1 Cd44 -0.35 (S)-3-methyl-2-oxopentanoic acid 1.62

NM_023434.3 Tox4 0.36 13,16-docosadienoic acid -1.20

NM_175151.4 Tatdn1 0.33

1-oleoyl


NM_153319.2 Amot 0.46 1-oleoylglycerol -1.33

NM_011749.4 Zfp148 0.32 1-oleoyl-lysophosphatidylethanolamine -0.98

NM_031998.2 Tsga14 0.48 1-palmitoylglycerol -1.14

NM_029759.3 Fam54b 0.31 2-oleoylglycerol -3.07

NM_175095.4 Commd2 0.41 2-palmitoylglycerol -1.66

NM_177780.3 Dock5 0.39 3'-adenylic acid -2.26

NM_026899.3 Ssu72 0.30 4-hydroxybutanoic acid -1.15

NM_025946.5 Romo1 1.28 4-hydroxyphenyllactic acid, (DL)-isomer 0.64

NM_028626.1 Mcee 0.64 9Z-hexadecenoic acid -1.82

NM_153530.2 Dis3l2 0.35 9Z-tetradecenoic acid -0.91

NM_009713.4 Arsa 0.39 acetylleucine 0.83

NM_008130.2 Gli3 -0.34 α-D-glucose 6-phosphate 0.90

NM_026120.4 2410127L17Rik -0.31 α-ketoisocaproic acid 1.04

NM_024231.2 Zfpl1 -0.31 azelaic acid -0.82

NM_026758.3 Mphosph6 -0.48 behenic acid -0.84

NM_019951.1 Sec11a -0.69 β-alanine -0.85

NM_025436.2 Sc4mol -0.31 β-glycerophosphoric acid -1.05

NM_013897.2 Timm8b -0.41 biopterin 0.74

NM_173376.3 Rbmx2 -0.48 cadaverine 2.18

NM_010593.1 Jup -0.44 cholesterol -1.05

NM_026932.4 Ebna1bp2 -0.36 cis-10-heptadecenoic acid -1.70

NM_146018.1 Flcn -0.34 coenzyme A 0.64

NM_026282.5 Spc24 -0.35 D-fructose 0.74

NM_007523.2 Bak1 -0.51 eicosa-11Z, 14Z-dienoic acid -1.49

NM_198102.2 Tra2a -0.37 erythritol 1.18

NM_198113.2 Ssh3 -0.58 ethanolamine -1.58

NM_027992.3 Tmem106b -0.69 fumaric acid 0.65

NM_019733.2 Rbpms -0.31 GABA 1.55

NM_030147.2 Brd8 -0.34 galactose-1-phosphate -0.67

NM_197991.2 2310044H10Rik -0.40 γ-glutamylisoleucine 0.80

NM_025813.3 Mfsd1 -0.38 γ-glutamyl-leucine 1.02

NM_026114.3 Eif2s1 -0.36 γ-glutamylphenylalanine 0.61

NM_007924.2 Ell -0.40 γ-glutamylthreonine 1.05

NM_009009.4 Rad21 -0.31 γ-glutamyl-valine 1.24

NM_145531.2 Spg11 -0.36 glycine 0.75

NM_018776.1 Crlf3 -0.43 glycylglycine 0.92

NM_133835.2 Ubac1 -0.31 glycylleucine -0.94

NM_019837.2 Nudt3 -0.30 gondoic acid -2.06

NM_001205226.1 Cnot1 -0.42 guanosine 0.98

NM_023153.3 Cwc15 -0.30 heptadecanoic acid -1.16

NM_018740.2 Rai12 -0.64 heptanoic acid -0.86

NM_144786.2 Ggt7 -0.41 hypoxanthine 1.22

NM_001168488.1 Men1 -0.36 indole-3-lactic acid 2.86

NM_001135567.1 1190007I07Rik -0.50 inositol 1-phosphate -1.98

NM_001177556.1 Gng12 -0.33 L-alanine 0.80

NM_026490.2 Mrpl19 -0.35 lauric acid -1.44

NM_028812.3 Gtf2e1 -0.40 L-cysteine 0.68

NM_028440.1 3110003A17Rik -0.34 L-cystine -0.74

NM_145073.2 Hist1h3g -0.30 Leu-Trp 0.67

NM_026448.3 Klhl7 -0.33 L-glutamic acid -0.86

NM_026452.2 Coq9 -0.47 L-glutamic acid 5-methyl ester 0.83

NM_023842.2 Dsp -0.37 L-histidine 0.97

NM_001085450.1 Ctnnd1 -0.36 L-homocysteine -1.23

NM_145610.2 Ppan -0.36 L-homoserine 1.35

NM_133763.1 Dnttip1 -0.30 lignoceric acid -1.12

linoleic acid -2.00

L-isoleucine 0.63

L-lactic acid 1.09

L-ornithine -0.59

L-serine 0.75

myristic acid -1.70

N-acetyl amino acid 0.99

N-acetyl-L-methionine -1.26

NAD+ 0.65

nervonic acid -0.88

N-formylmethionine 0.72

oleic acid -0.95

oleylamide 1.05

orotic acid 1.26

pentadecanoic acid -1.17

pelargonic acid -0.78

phosphate -0.80

phosphorylcholine 0.85

putrescine 1.54

pyridoxal -0.65

pyrophosphate -1.13

pyrrolidonecarboxylic acid 0.70

rac-1-stearoylglycerol -1.27

riboflavin 0.71

S-adenosylhomocysteine -0.60

Ser-Leu -1.38

sn-glycero-3-phosphocholine -1.55

sorbitol 1.49

spermidine 1.32

trans-4-hydroxy-L-proline 0.78

Tyr-Leu -1.39

spermine 3.19

UDP 1.28

UDP-D-galactose -0.60

undecanoic acid -1.28

uracil -1.09

urea -1.47

UTP 2.16

vaccenic acid -1.34


20-L scale Day 6


RefSeq Gene names Average Day 6 log2 ratios




XM_003945892.1 LOC101056392 0.72 isoleucylglycine 0.74

NM_198294.2 Tanc1 0.60 γ-glutamyltryptophan 0.66

NM_001077265.1 Hnrnpd 0.62 γ-glutamylglycine 2.80

NM_001033294.3 Ddx31 0.33 1,2-dipalmitoylglycerol 1.02

NM_019880.3 Mtch1 0.39 (S)-2-hydroxystearic acid -0.63

NM_025558.5 Cyb5b 0.35 (S)-3-methyl-2-oxopentanoic acid 0.83

NM_026411.1 1700021F05Rik -0.49

1-oleoyl


NM_173376.3 Rbmx2 -0.44 2-oleoylglycerol 0.82

NM_145070.3 Hip1r -0.35 2-tyrosine 1.32

NM_023060.3 Eefsec -0.52 4'-phosphopantetheine -0.59

NM_001033196.2 Znfx1 -0.36 5-aminovaleric acid 0.71

NM_009483.1 Kdm6a -0.59 5'-methylthioadenosine 0.60

NM_001081412.2 Bcr -0.80 7β-hydroxycholesterol 0.72

NM_172758.4 Slc38a7 -0.93 acetylcholine 1.08

NM_001029850.3 Magi1 -0.50 acetylleucine 0.73

NM_021305.3 Sec61a2 -0.37 adenine 0.91

NM_011787.2 Amfr -0.53 adenosine -0.61

NM_025573.3 Sfrs9 -0.40 α-D-glucose 6-phosphate 0.66

NM_026422.2 Mrrf -0.39 α-ketoisocaproic acid 0.90

NM_028173.4 Tram1 -0.43 ascorbic acid 1.19

NM_010744.3 Tmed1 -0.33 cadaverine 1.52

NM_175294.3 Nucks1 -0.46 CDP ethanolamine -0.84

NM_029934.3 Mboat7 0.36 citric acid 0.81

NM_172911.3 D8Ertd82e 0.66 coenzyme A 0.65

NM_019921.2 Akap10 0.44 deoxyguanosine 0.62

NM_001136069.2 Ldha 0.42 deoxyuridine 0.69

NM_025403.3 Nop10 0.33 D-glyceric acid 0.69

NM_024166.6 Chchd2 0.41 D-sphingosine 3.04

NM_001162989.1 Phax 0.44 D-threitol 0.84

NM_011417.3 Smarca4 0.39 galactitol 0.63

NM_007513.4 Slc7a1 0.52 galactose 1.23

NM_011743.2 Zfp106 0.35 γ-butyrobetaine 0.82

NM_139153.2 Agap3 -0.30 γ-glutamylalanine 0.76

NM_011135.4 Cnot7 -0.33 γ-glutamylcysteine 0.65

NM_026169.4 Frmd8 -0.36 γ-glutamylglutamate 0.83

NM_001171582.1 Mars -0.32 γ-glutamylmethionine 0.68

NM_080793.5 Setd7 -0.42 glucose-1-phosphate 0.67

NM_009805.4 Cflar -0.34 glutathione 0.85

NM_009336.2 Vps72 -0.33 glycylglycine 0.75

NM_146251.4 Pnpla7 -0.30 glycylleucine 2.20

NM_001033528.1 Usp36 -0.34 guanosine 0.65

NM_011497.3 Aurka -0.31 inositol 1-phosphate 0.93

NM_001081218.1 Hcfc2 -0.37 L-α-lysophosphatidylcholine, palmitoyl 1.05

NM_013790.2 Abcc5 -0.41 L-α-lysophosphatidylcholine, stearoyl 1.24

lanosterol 0.66

L-cystine 1.25

L-homoserine 1.21

L-kynurenine 0.83

L-leucine 0.83

L-lysine 0.68

L-malic acid 0.70

L-xylonate 1.41

N-acetyl-β-alanine 0.67

N-acetyl-D-glucosamine 6-phosphate 1.27

N-acetylneuraminic acid 0.60

NAD+ 0.84

N-glycolylneuraminic acid 0.65

oleylamide -1.05

rac-1-stearoylglycerol 0.61

L-Arabitol 1.92

ribitol 0.65

ribose 1.03

sedoheptulose 7-phosphate 1.39

S-methyl-L-cysteine 0.67

sn-glycerol-3-phosphate 0.59

thymidine 0.96

thymine 0.87

UDP-D-galactose 0.82

xanthosine monophosphate 0.75

20-L scale Day 10






NM_052976.3 Ophn1 0.38 1,2-dipalmitoylglycerol 0.69

NM_008943.2 Psen1 0.52 2-hydroxy-3-methylvalerate 0.79

NM_011625.1 Ppp1r13b 0.41 γ-glutamylglycine 2.67

NM_145541.4 Rap1a 0.37 N-acetylisoleucine 0.65

NM_008234.3 Hells 0.44 γ-glutamyltryptophan 0.70

NM_007483.2 Rhob -0.38 (S)-3-methyl-2-oxopentanoic acid 1.21

NM_001161816.1 Gm15455 -0.37

1-oleoyl


NM_008471.2 Krt19 -0.49 1-stearoyl-2-hydroxy-sn-glycero-3-phosphoethanolamine 0.64

NM_025542.2 2410001C21Rik -0.56 2-hydroxyisovaleric acid 0.85

NM_023172.3 Ndufb9 -0.45 2-tyrosine 1.05

NM_019760.3 Serinc1 -0.46 4'-phosphopantetheine -0.70

NM_008788.2 Pcolce -0.38 5-aminovaleric acid 0.80

NM_010233.1 Fn1 -0.50 acetylcholine 1.18

NM_025461.5 Cox16 0.53 acetylleucine 0.85

NM_144522.5 Tbc1d10b 0.32 adenine 1.10

NM_019921.2 Akap10 0.38 alanylglycine 1.85

NM_145596.3 Gatad2a 0.38 α-D-glucose 6-phosphate 1.73

NM_023128.4 Palm 0.31 α-hydroxyisocaproic acid 1.09

NM_175150.3 Txndc15 0.34 α-ketoisocaproic acid 1.20

NM_139146.2 Satb2 0.31 ascorbic acid 1.72

NM_028364.2 Puf60 0.36 cadaverine 1.63

NM_172468.2 Snx30 -0.32 CDP ethanolamine -0.62

NM_025698.1 Tmed7 -0.34 citric acid 0.68

NM_025666.2 Ubr7 -0.31 coenzyme A 0.63

NM_177878.2 Mblac1 -0.41 desmosterol 0.63

NM_177717.4 4732456N10Rik -0.30 D-glyceric acid 0.89

NM_007833.4 Dcn -0.30 D-sphingosine 1.90

NM_183270.2 Chchd8 -0.40 erythronic acid 0.61

NM_011580.3 Thbs1 -0.33 galactitol 0.95

NM_020579.2 B4galt3 -0.32 γ-glutamylalanine 1.22

NM_027772.2 Pdss2 -0.30 γ-glutamylcysteine 1.62

NM_027410.1 Tecpr1 -0.31 γ-glutamylglutamate 1.24

NM_020587.2 Sfrs4 -0.33 γ-glutamylmethionine 0.66

NM_028173.4 Tram1 -0.35 glucose-1-phosphate 0.77

glutathione 1.65

glycylleucine 1.20

inositol 1-phosphate 1.22

L-α-lysophosphatidylcholine, stearoyl 2.21

lanosterol 0.85

L-asparagine 0.71

lathosterol 0.85

L-cystine 0.94

L-glutamine 0.76

L-homoserine 1.34

L-lactic acid 0.66

L-lysine 0.69

L-ornithine 0.63

L-xylonate 0.71

N-acetyl-α-D-galactosamine 0.83

N-acetyl-D-glucosamine 6-phosphate 1.13

NAD+ 0.65

NADH 0.64

N-glycolylneuraminic acid 0.85

palmitoylethanolamide -0.99

phenylalanylglycine 1.10

ribitol 0.89

sedoheptulose 7-phosphate 1.25

S-methyl-L-cysteine 1.06

sorbitol 1.40

stearic acid amide 3.04

thymine 1.06

UDP-D-galactose 1.11
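
As a reading aid for Table 2-4, a log2 ratio can be converted back to a linear high copper/low copper ratio by raising 2 to that power, so that a value of 1.00 corresponds to a two-fold increase and -1.00 to a two-fold decrease. The Python sketch below shows this conversion for three entries taken from the 5-KL Day 5 rows. The function names are hypothetical, and the cutoff of log2(1.5) is an assumption used only for illustration, not necessarily the criterion applied in this work.

```python
import math


def fold_change(log2_ratio):
    """Convert log2(high copper / low copper) to a linear ratio."""
    return 2.0 ** log2_ratio


def direction(log2_ratio, cutoff):
    """Label a log2 ratio as 'up', 'down', or 'unchanged' for a given cutoff."""
    if log2_ratio >= cutoff:
        return "up"
    if log2_ratio <= -cutoff:
        return "down"
    return "unchanged"


# Assumed illustrative cutoff: a 1.5-fold change in either direction.
ASSUMED_CUTOFF = math.log2(1.5)

# Entries copied from the 5-KL Day 5 rows of Table 2-4.
rows = [
    ("γ-glutamylglycine", 1.44),
    ("glycylleucine", -2.19),
    ("Hist1h2ah", 2.19),
]

summary = [
    (name, round(fold_change(value), 2), direction(value, ASSUMED_CUTOFF))
    for name, value in rows
]
```

For example, the log2 ratio of 1.44 for γ-glutamylglycine corresponds to roughly a 2.7-fold higher level under the high copper condition.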

2.8 References


2. Jayapal KR, Wlaschin KF, Hu WS, & Yap MGS (2007) Recombinant protein therapeutics

from CHO cells - 20 years and counting. Chem. Eng. Prog. 103(10):40-47.

3. Kim JY, Kim YG, & Lee GM (2012) CHO cells in biotechnology for production of

recombinant proteins: current state and further potential. Appl. Microbiol. Biotechnol.

93(3):917-930.

4. Lewis AM, Abu-Absi NR, Borys MC, & Li ZJ (2016) The use of 'Omics technology to

rationally improve industrial mammalian cell line performance. Biotechnol. Bioeng.

113(1):26-38.

5. Farrell A, McLoughlin N, Milne JJ, Marison IW, & Bones J (2014) Application of multi-

omics techniques for bioprocess design and optimization in Chinese hamster ovary cells.

J. Proteome Res. 13(7):3144-3159.

6. Kildegaard HF, Baycin-Hizal D, Lewis NE, & Betenbaugh MJ (2013) The emerging CHO

systems biology era: harnessing the 'omics revolution for biotechnology. Curr. Opin.

Biotechnol. 24(6):1102-1107.

7. Birch JR & Racher AJ (2006) Antibody production. Adv. Drug Delivery Rev.

58(5-6):671-685.

8. Yang Z, et al. (2015) Engineered CHO cells for production of diverse, homogeneous

glycoproteins. Nat. Biotechnol. 33(8):842-844.

9. Mallick P & Kuster B (2010) Proteomics: a pragmatic perspective. Nat. Biotechnol.

28(7):695-709.

10. Tai M, Ly A, Leung I, & Nayar G (2015) Efficient high-throughput biological process

characterization: Definitive screening design with the Ambr250 bioreactor system.

Biotechnol. Prog. 31(5):1388-1395.

11. Xing ZZ, Kenty BN, Li ZJ, & Lee SS (2009) Scale-up analysis for a CHO cell culture

process in large-scale bioreactors. Biotechnol. Bioeng. 103(4):733-746.

12. Aranibar N, et al. (2011) NMR-based metabolomics of mammalian cell and tissue cultures.

J. Biomol. NMR 49(3-4):195-206.

13. Kim BE, Nevitt T, & Thiele DJ (2008) Mechanisms for copper acquisition, distribution

and regulation. Nat. Chem. Biol. 4(3):176-185.

14. Grubman A & White AR (2014) Copper as a key regulator of cell signalling pathways.

Expert Rev. Mol. Med. 16:e11.

15. Qian YM, et al. (2011) Cell culture and gene transcription effects of copper sulfate on

Chinese Hamster Ovary cells. Biotechnol. Prog. 27(4):1190-1194.

16. Luo J, et al. (2012) Comparative metabolite analysis to understand lactate metabolism shift

in Chinese hamster ovary cell culture process. Biotechnol. Bioeng. 109(1):146-156.

17. Yuk IH, et al. (2014) Effects of copper on CHO cells: Insights from gene expression

analyses. Biotechnol. Prog. 30(2):429-442.

18. Kang S, et al. (2014) Proteomics analysis of altered cellular metabolism induced by

insufficient copper level. J. Biotechnol. 189:15-26.

19. Yuk IH, et al. (2015) Effects of Copper on CHO Cells: Cellular Requirements and Product

Quality Considerations. Biotechnol. Prog. 31(1):226-238.

20. Nargund S, Qiu JS, & Goudar CT (2015) Elucidating the role of copper in CHO cell energy

metabolism using C-13 metabolic flux analysis. Biotechnol. Prog. 31(5):1179-1186.

21. Lawton KA, et al. (2008) Analysis of the adult human plasma metabolome.

Pharmacogenomics 9(4):383-397.

22. Schaub J, et al. (2010) CHO gene expression profiling in biopharmaceutical process

analysis and design. Biotechnol. Bioeng. 105(2):431-438.



24. Wang Q, et al. (2013) Ecological patterns of nifH genes in four terrestrial climatic zones

explored with targeted metagenomics using FrameBot, a new informatics tool. mBio

4(5):e00592-00513.

25. Morgulis A, Gertz EM, Schaffer AA, & Agarwala R (2006) WindowMasker: window-

based masker for sequenced genomes. Bioinformatics 22(2):134-141.

26. Slater GS & Birney E (2005) Automated generation of heuristics for biological sequence

comparison. BMC Bioinf. 6:31.

27. Dorfer V, et al. (2014) MS Amanda, a universal identification algorithm optimized for high

accuracy tandem mass spectra. J. Proteome Res. 13(8):3679-3684.

28. Taverner T, et al. (2012) DanteR: an extensible R-based tool for quantitative analysis of -

omics data. Bioinformatics 28(18):2404-2406.

29. Onsongo G, et al. (2010) LTQ-iQuant: A freely available software pipeline for automated

and accurate protein quantification of isobaric tagged peptide data from LTQ instruments.

Proteomics 10(19):3533-3538.

30. Li F, Vijayasankaran N, Shen A, Kiss R, & Amanullah A (2010) Cell culture processes for

monoclonal antibody production. mAbs 2(5):466-479.

31. Kramer A, Green J, Pollard J, & Tugendreich S (2014) Causal analysis approaches in

Ingenuity Pathway Analysis. Bioinformatics 30(4):523-530.

32. Finkel T & Holbrook NJ (2000) Oxidants, oxidative stress and the biology of ageing.

Nature 408(6809):239-247.

33. Valko M, et al. (2007) Free radicals and antioxidants in normal physiological functions

and human disease. Int. J. Biochem. Cell Biol. 39(1):44-84.

34. D'Autreaux B & Toledano MB (2007) ROS as signalling molecules: mechanisms that

generate specificity in ROS homeostasis. Nat. Rev. Mol. Cell Biol. 8(10):813-824.

35. Lu XL, et al. (2011) Cholesterol induces pancreatic beta cell apoptosis through oxidative

stress pathway. Cell Stress Chaperones 16(5):539-548.

36. Subramanian S, et al. (2011) Dietary cholesterol exacerbates hepatic steatosis and

inflammation in obese LDL receptor-deficient mice. J. Lipid Res. 52(9):1626-1635.

37. Hatanaka E, et al. (2013) Oleic, linoleic and linolenic acids increase ROS production by

fibroblasts via NADPH oxidase activation. PLoS One 8(4):e58626.

38. Shirakawa J, et al. (2011) Protective effects of dipeptidyl peptidase-4 (DPP-4) inhibitor

against increased beta cell apoptosis induced by dietary sucrose and linoleic acid in mice

with diabetes. J. Biol. Chem. 286(29):25467-25476.

39. Wrede CE, Dickson LM, Lingohr MK, Briaud I, & Rhodes CJ (2002) Protein kinase B/Akt

prevents fatty acid-induced apoptosis in pancreatic beta-cells (INS-1). J. Biol. Chem.

277(51):49676-49684.

40. Shirakawa J, et al. (2011) Protective effects of dipeptidyl peptidase-4 (DPP-4) inhibitor

against increased beta-cell apoptosis induced by dietary sucrose and linoleic acid in mice

with diabetes. J. Biol. Chem. 286(29):25467-25476.

41. Corte CLD, Bastos LL, Dobrachinski F, Rocha JBT, & Soares FAA (2012) The

combination of organoselenium compounds and guanosine prevents glutamate-induced

oxidative stress in different regions of rat brains. Brain Res. 1430:101-111.

42. Schwartz LB, Carcangiu ML, Bradham L, & Schwartz PE (1991) Rapidly progressive

squamous-cell carcinoma of the cervix coexisting with human-immunodeficiency-virus

infection-clinical opinion. Gynecol. Oncol. 41(3):255-258.

43. Cullinan SB & Diehl JA (2004) PERK-dependent activation of Nrf2 contributes to redox

homeostasis and cell survival following endoplasmic reticulum stress. J. Biol. Chem.

279(19):20108-20117.

44. Beal MF (1995) Aging, energy, and oxidative stress in neurodegenerative diseases. Ann.

Neurol. 38(3):357-366.

45. Elmore S (2007) Apoptosis: A review of programmed cell death. Toxicol. Pathol.

35(4):495-516.

46. Gross A, McDonnell JM, & Korsmeyer SJ (1999) BCL-2 family members and the

mitochondria in apoptosis. Genes Dev. 13(15):1899-1911.

47. Fluharty AL, Stevens RL, Miller RT, Shapiro SS, & Kihara H (1976) Ascorbic acid 2-

sulfate sulfhohydrolase activity of human arylsulfatase-A. Biochim. Biophys. Acta

429(2):508-516.

48. Clanton TL (2007) Hypoxia-induced reactive oxygen species formation in skeletal muscle.

J. Appl. Physiol. 102(6):2379-2388.

49. Lavie L & Lavie P (2009) Molecular mechanisms of cardiovascular disease in OSAHS:

the oxidative stress link. Eur. Resp. J. 33(6):1467-1484.

50. Prabhakar NR, Kumar GK, Nanduri J, & Semenza GL (2007) ROS signaling in systemic

and cellular responses to chronic intermittent hypoxia. Antioxid. Redox Signal. 9(9):1397-

1403.

51. Makarenko VV, et al. (2014) Intermittent hypoxia-induced endothelial barrier dysfunction

requires ROS-dependent MAP kinase activation. Am. J. Physiol.-Cell Physiol.

306(8):C745-C752.

52. Majmundar AJ, Wong WHJ, & Simon MC (2010) Hypoxia-inducible factors and the

response to hypoxic stress. Mol. Cell. 40(2):294-309.

53. Lokmic Z, Musyoka J, Hewitson TD, & Darby IA (2012) Hypoxia and hypoxia signaling

in tissue repair and fibrosis. International Review of Cell and Molecular Biology,

ed Jeon KW (Elsevier Academic Press Inc, San Diego), Vol 296, pp 139-185.

54. Qian Y, et al. (2014) Hypoxia influences protein transport and epigenetic repression of

CHO cell cultures in shake flasks. Biotechnol. J. 9(11):1413-1424.

55. Usatyuk PV & Natarajan V (2005) Regulation of reactive oxygen species-induced

endothelial cell-cell and cell-matrix contacts by focal adhesion kinase and adherens

junction proteins. Am. J. Physiol.-Lung Cell. Mol. Physiol. 289(6):L999-L1010.

56. Hoogeboom D & Burgering BMT (2009) Should I stay or should I go: beta-catenin decides

under stress. Biochim. Biophys. Acta-Rev. Cancer 1796(2):63-74.

57. Prieve MG & Moon RT (2003) Stromelysin-1 and mesothelin are differentially regulated

by Wnt-5a and Wnt-1 in C57mg mouse mammary epithelial cells. BMC Dev. Biol. 3:2.

58. Wielenga VJM, et al. (1999) Expression of CD44 in Apc and Tcf mutant mice implies

regulation by the WNT pathway. Am. J. Pathol. 154(2):515-523.

59. Davidson G & Niehrs C (2010) Emerging links between CDK cell cycle regulators and

Wnt signaling. Trends Cell Biol. 20(8):453-460.

60. Niehrs C & Acebron SP (2012) Mitotic and mitogenic Wnt signalling. EMBO J.

31(12):2705-2713.

61. Jain E & Kumar A (2008) Upstream processes in antibody production: Evaluation of

critical parameters. Biotechnol. Adv. 26(1):46-72.

62. Schnepp RW, et al. (2004) Menin induces apoptosis in murine embryonic fibroblasts. J.

Biol. Chem. 279(11):10685-10691.

63. Schackmann RCJ, et al. (2013) Loss of p120-catenin induces metastatic progression of

breast cancer by inducing anoikis resistance and augmenting growth factor receptor

signaling. Cancer Res. 73(15):4937-4949.

64. Goodwin M & Yap AS (2004) Classical cadherin adhesion molecules: coordinating cell

adhesion, signaling and the cytoskeleton. J. Mol. Histol. 35(8-9):839-844.

65. Cash TP, Gruber JJ, Hartman TR, Henske EP, & Simon MC (2011) Loss of the Birt-Hogg-

Dube tumor suppressor results in apoptotic resistance due to aberrant TGF beta-mediated

transcription. Oncogene 30(22):2534-2546.

66. Zheng YJ, et al. (2009) Angiomotin-Like Protein 1 Controls Endothelial Polarity and

Junction Stability During Sprouting Angiogenesis. Circ. Res. 105(3):260-270.

67. Hitomi JI, et al. (2008) Identification of a molecular signaling network that regulates a

cellular necrotic cell death pathway. Cell 135(7):1311-1323.

68. Leh H, et al. (1996) Cloning and expression of a novel type (III) of human gamma-

glutamyltransferase truncated mRNA. FEBS Lett. 394(3):258-262.

69. Wu GY, Fang YZ, Yang S, Lupton JR, & Turner ND (2004) Glutathione metabolism and

its implications for health. J. Nutr. 134(3):489-492.

70. Napoli C, et al. (2000) Mildly oxidized low density lipoprotein activates multiple apoptotic

signaling pathways in human coronary cells. FASEB J. 14(13):1996-2007.

71. Soltani N, et al. (2011) GABA exerts protective and regenerative effects on islet beta cells

and reverses diabetes. Proc. Natl. Acad. Sci. U. S. A. 108(28):11692-11697.

72. Rhee HJ, Kim EJ, & Lee JK (2007) Physiological polyamines: simple primordial stress

molecules. J. Cell Mol. Med. 11(4):685-703.

73. Zhou XM, Burg MB, & Ferraris JD (2012) Water restriction increases renal inner

medullary manganese superoxide dismutase (MnSOD). Am. J. Physiol. Renal Physiol.

303(5):F674-F680.

74. Abreu IA & Cabelli DE (2010) Superoxide dismutases-a review of the metal-associated

mechanistic variations. Biochim. Biophys. Acta,- Proteins Proteomics 1804(2):263-274.

75. Perez-Matute P, Zulet MA, & Martinez JA (2009) Reactive species and diabetes:

counteracting oxidative stress to improve health. Curr. Opin. Pharmacol. 9(6):771-779.

76. Raman M, Chen W, & Cobb MH (2007) Differential regulation and properties of MAPKs.

Oncogene 26(22):3100-3112.

77. Son Y, et al. (2011) Mitogen-activated protein kinases and reactive oxygen species: How

can ROS activate MAPK pathways? J. Signal Transduction 2011:792639.

78. Velpula KK, et al. (2012) Glioma stem cell invasion through regulation of the

interconnected ERK, integrin alpha 6 and N-cadherin signaling pathway. Cell. Signal.

24(11):2076-2084.

79. Weston CR & Davis RJ (2007) The JNK signal transduction pathway. Curr. Opin. Cell

Biol. 19(2):142-149.

80. Kim BJ, Ryu SW, & Song BJ (2006) JNK- and p38 kinase-mediated phosphorylation of

Bax leads to its activation and mitochondrial translocation and to apoptosis of human

hepatoma HepG2 cells. J. Biol. Chem. 281(30):21256-21265.

81. Oh SKW, Vig P, Chua F, Teo WK, & Yap MGS (1993) Substantial overproduction of

antibodies by applying osmotic-pressure and sodium butyrate. Biotechnol. Bioeng.

42(5):601-610.

82. Ortmann B, Druker J, & Rocha S (2014) Cell cycle progression in response to oxygen

levels. Cell. Mol. Life Sci. 71(18):3569-3582.

Chapter 3: Identification and Quantitation of Host Cell

Proteins in Therapeutic Product

Two posters based on this chapter were presented at the 63rd conference of the American Society for

Mass Spectrometry (ASMS) in June 2015 (abstract ID: 620) and at the 64th ASMS conference in

June 2016 (abstract ID: 283778), respectively. A manuscript based on this chapter is in preparation.

Yuanwei Gao1, Simion Kreimer1, Somak Ray1, Alexander R. Ivanov1, Mi Jin2, Zhijun Tan2,

Nesredin Mussa2, Li Tao2, Zhengjian Li2, Barry L. Karger1

1Barnett Institute and Department of Chemistry and Chemical Biology, Northeastern University,

Boston, MA, 02115

2Biologics Development, Global Manufacturing and Supply, Bristol-Myers Squibb, 38 Jackson

Road, Devens, MA 01434

I thank Simion Kreimer for strong collaboration and script construction, Somak Ray for script

writing, Dr. Alexander Ivanov for discussion, and Dr. Barry Karger for conceptual design and idea

contribution. I also want to thank the scientists at Bristol-Myers Squibb for their collaboration and

sample donation, especially, Dr. Mi Jin for initiating the project and Dr. Nesredin Mussa for

discussions.

3.1 Preface and Abstract

Host cell proteins (HCPs) are a major class of process-related impurities in

biopharmaceutical products. HCP analysis is critical to ensure drug quality, and HCP clearance is

an important indicator of bioprocess robustness. HCP detection requires the analysis of multiple

species over a wide dynamic concentration range relative to the high therapeutic protein product

background at high throughput and reasonable cost. The conventional method, ELISA, however,

cannot satisfy all circumstances for HCP detection and monitoring. Liquid chromatography-mass

spectrometry (LC-MS)-based approaches have been shown to be powerful for HCP analysis,

emerging as the most promising method to complement ELISA.

In this work, a therapeutic monoclonal antibody drug sample was provided by Bristol-Myers

Squibb (Devens, MA) at three purification stages: Protein A chromatography (PA), cation

exchange chromatography (CEX), and ultrafiltration/diafiltration (UF/DF), processes widely

employed in the downstream purification of monoclonal antibody drugs. The goal of this work is

to develop a method for HCP identification and quantitation that can not only detect and

quantify HCPs at the single-digit ppm level in the drug product but can also be used at any stage

of purification to support downstream process design.

In the present study, preliminary information on the HCP population and distribution in

each purification-stage sample was obtained with two-dimensional liquid chromatography-mass

spectrometry (2D-LC-MS) in the data-dependent acquisition (DDA) mode. Several HCPs

identified in the post-UF/DF sample were quantified using 1D-LC-MS with parallel reaction

monitoring (PRM), with isotopically labeled peptides as internal standards, demonstrating that

1D-LC-MS-PRM was capable of detecting and quantifying HCPs at the low ppm level. Important

properties of the sample for HCP analysis were also determined, including sample complexity,

dynamic range, and potential difficulties, as well as the limitations of the LC-MS-DDA method employed.
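
To make the ppm scale concrete, the Python sketch below shows one way a light/heavy peak-area ratio from a PRM measurement could be converted to an HCP level in ppm, taken here as ng of HCP per mg of antibody. It is a minimal illustration under stated assumptions, not the exact calculation used in this work: the function name, the parameter names, and all numerical values are hypothetical, the peptide is assumed to be released quantitatively (one mole of peptide per mole of protein), and the spiked heavy peptide amount is assumed to be known accurately.

```python
def hcp_ppm(light_area, heavy_area, heavy_spiked_fmol, hcp_mw_da, mab_loaded_ug):
    """Estimate an HCP level in ppm (ng HCP per mg mAb) from a PRM peak-area ratio.

    light_area / heavy_area : integrated peak areas of the endogenous (light)
                              and isotopically labeled (heavy) peptide
    heavy_spiked_fmol       : amount of heavy peptide in the injected sample
    hcp_mw_da               : molecular weight of the intact HCP in Da (g/mol)
    mab_loaded_ug           : amount of antibody in the injected sample
    """
    if heavy_area <= 0 or mab_loaded_ug <= 0:
        raise ValueError("heavy_area and mab_loaded_ug must be positive")
    # Endogenous peptide amount, assuming equal response of light and heavy forms.
    hcp_fmol = (light_area / heavy_area) * heavy_spiked_fmol
    # 1 fmol x 1 g/mol = 1e-15 g = 1e-6 ng.
    hcp_ng = hcp_fmol * hcp_mw_da * 1e-6
    mab_mg = mab_loaded_ug / 1000.0
    return hcp_ng / mab_mg


# Hypothetical example: light/heavy ratio of 0.5, 10 fmol heavy peptide,
# a 50 kDa HCP, and 100 ug of antibody in the injected sample.
example_ppm = hcp_ppm(5.0e5, 1.0e6, 10.0, 50000.0, 100.0)
```

In this hypothetical case, 5 fmol of a 50 kDa protein is 0.25 ng, which in 0.1 mg of antibody corresponds to 2.5 ppm. An impurity at this level is roughly five to six orders of magnitude less abundant than the product.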

Based on these results, in collaboration with Simion Kreimer, a Ph.D. candidate in our lab,

a novel data-independent acquisition (DIA)-to-PRM workflow with high sensitivity and selectivity

was designed for HCP identification and quantitation. The method was demonstrated to detect HCPs

at low ppm levels with reasonably rapid throughput. A detailed discussion of the DIA-to-PRM

workflow can be found in Simion Kreimer’s thesis. In the current chapter, a spectral assay library

for the targeted DIA data analysis was generated from the 2D-LC-MS/MS analysis of the mAb from

the Protein A stage of purification, and the overall workflow is summarized.

3.2 Introduction

Biopharmaceuticals are generally synthesized in non-human host cells and require

purification from other components derived from the expression system. Host cell proteins (HCPs)

are a significant class of process-related impurities that are inevitably co-purified with the

biopharmaceutical product despite multiple steps of downstream purification. Additional

background on downstream processing and current HCP detection methods can be found in Chapter

1 (pages 36-40 and 44-47). This section provides further details of the HCP analytical

challenges and current advances in mass spectrometry-based methods for HCP analysis. We focus

on mAbs and related proteins expressed in the CHO expression system, as this is the focus of our

research.

For biopharmaceutical production, HCP detection and monitoring in downstream

purification processes are of great importance for two reasons. First, the presence of HCPs, i.e.,

impurities, in the final drug product is a critical product quality concern. Besides the safety concern

of a potential immunological response (1), some residual HCP species can be proteases (2, 3), which

might generate degraded product that could be inactive or even harmful, or influence drug storage

stability (2-4). Second, the ability to clear HCPs efficiently and consistently is a benchmark of

manufacturing capability and robustness, and is also part of process validation (5). Risk

assessment of the HCPs present in the final product is therefore important, and the

biopharmaceutical industry needs to mitigate actual or potential safety issues through

optimized downstream purification. Moreover, from a regulatory perspective, the function of each

purification process is expected to be well described and understood, which requires information

on which impurities remain or are removed during particular purification steps (4). To obtain such

an understanding of the manufacturing process, HCP detection and monitoring is critical, and the

ideal method should be applicable at any stage of the manufacturing process.

Multi-analyte enzyme-linked immunosorbent assay (ELISA), which has high throughput,

high sensitivity and selectivity, is the current “gold standard” for HCP detection (1, 5, 6). It can

detect multiple analytes and provide total HCP quantitation (1 ppm-100 ppm) (7), but it provides

no information on individual HCPs. Two-dimensional gel electrophoresis (2D-DE) or 2D-DE in

combination with western blotting is usually employed to offer the complementary information

such as HCP distribution and, particularly, individual HCP properties including molecular weight

and isoelectric point (pI). These tools, especially ELISA, have been used to provide valuable

residual HCP information in drug product for decades.

However, these conventional methods cannot fit all circumstances for HCP detection and

monitoring. 2D-DE has low sensitivity, and some HCPs can be masked by the overloaded drug

product. The HCP detection and quantitation of immunodetection-based methods such as ELISA

and western blotting rely on the quantity and affinity of the pool of anti-HCP antibodies.

There are several disadvantages hampering the HCP detection by ELISA. First, not all HCP

species can be detected with high sensitivity. HCPs that are non- or weakly immunoreactive in the animal

(e.g. rabbit) used to generate the antibodies can be underestimated because the anti-HCP antibodies against such species are of low abundance

or low affinity. Considering the fact that the immune

response between humans and animals can be different, this underestimation of certain species

could lead to a potential safety issue. Second, some HCP species, which are immunoreactive and

can be recognized by ELISA, may not be detected with sufficient accuracy and hence the overall

HCP quantitation by a given ELISA assay could be underestimating the HCP level. Also, dilution

dependent non-linearity of HCP ELISA is often observed especially with samples at a late

purification stage (5). In this case, one or several HCPs can saturate the corresponding polyclonal

antibodies within the whole anti-HCP antibody pool, leading to higher observed HCP

concentration in the original sample with a higher sample dilution factor (5). It has been reported

that phospholipase B-like 2 is one such HCP resulting in ELISA non-linearity (8). In principle,

if the sample is diluted sufficiently for an ELISA test, these HCPs can reach a sufficiently low

level that would not saturate the antibodies. However, usually the detection limit of the ELISA

assay is reached first with the sample dilution before the plateau of non-linearity is reached. In this

case, a process specific or analyte-specific ELISA assay needs to be developed to eliminate such

an effect.

Third, the complex anti-HCP antibody pool of the multi-analyte ELISA could cross-react

with the therapeutic drug product (1). Such cross-reactivity would impact the HCP ELISA

quantitation especially at late purification stages. Fourth, ELISA assays are not readily interchangeable. A

given HCP ELISA only responds to the product synthesized by the cell line used to develop the

assay and may show unreliable HCP quantitation to another bioprocess. For example, during

clinical development, the HCP level of somatotropin was detected as 20 ppm with a commercially

available general ELISA kit, but tested as actually 1400 ppm HCP with a process-specific ELISA

assay (9). ELISA could thus fail to demonstrate the HCP changes resulting from bioprocess

development. However, it is time consuming (12-18 months) to develop a validated ELISA with

high sensitivity and selectivity (1). Consequently, the general commercial ELISA is often used for

drug candidates at early stage of clinical experiments, and a process-specific ELISA is only

developed for the most promising molecules and bioprocesses (9).

It is important to note that ELISA cannot provide individual HCP information. According

to previous HCP studies for monoclonal antibodies and Fc-fusion proteins, although certain HCPs

such as clusterin and heat shock protein have been shown to be associated with many of the drug

products (10-12), there are some HCPs which are drug product-specific (11, 12). It has been

reported that the difference of only two residues near the complementarity determining regions

(CDRs) of the mAb yielded significant changes in the HCP profile (11). As a result, it is difficult

to predict HCP distribution for a new drug product, even for closely related molecules. This fact

demonstrates the necessity to develop a process-related ELISA for each new drug product. Thus,

although ELISA has proven effective for HCP detection, orthogonal methods,

particularly ones that can rapidly provide the HCP distribution independent of immunoreactivity, are

desired.

Mass spectrometry (MS)-based technology has emerged as the most promising orthogonal

method to ELISA for HCP detection (4, 5, 9, 11). MS-based methods are able to identify and

quantify a large number of protein species with relatively high throughput. The proteomic

approach of the combination of in-gel protein digestion and mass spectrometry with data

dependent acquisition has been applied to HCP identification (13-15). However, this method still

cannot reach the required high dynamic range (>10^5), hampering the detection of HCPs at the 10

ppm level or lower in the background of the bulk biopharmaceutical. The top-down approach,

surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF), has

been reported as a screening tool for HCP detection (6, 13, 16, 17). The advantages of SELDI-

TOF include simple and time efficient sample preparation and quantitative estimation of individual

HCPs. The method can be used to track individual HCPs for optimizing a given purification step

and/or during downstream purification in a high throughput manner (17, 18). However, the

sensitivity of the method decreases rapidly for proteins larger than 30 kDa, and it is difficult to

obtain identification without assistance of other approaches (17).
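
The dynamic range requirement quoted above follows from simple arithmetic, sketched below for illustration. The sketch assumes the common convention that 1 ppm corresponds to 1 ng of HCP per mg of drug product; the function name is hypothetical and is not part of any method described in this chapter.

```python
import math


def required_dynamic_range(hcp_ppm: float) -> float:
    """Orders of magnitude separating the drug product from an HCP at hcp_ppm.

    Assumption: ppm is a mass ratio (ng of HCP per mg of drug product),
    so 1 ppm corresponds to a ratio of 1e-6.
    """
    if hcp_ppm <= 0:
        raise ValueError("hcp_ppm must be positive")
    return math.log10(1e6 / hcp_ppm)


# Illustrative levels only
for level in (100, 10, 1):
    orders = required_dynamic_range(level)
    print(f"{level} ppm -> about {orders:.0f} orders of magnitude")
```

Under this assumption, an HCP at 10 ppm sits five orders of magnitude below the bulk product, which is the >10^5 range referred to in the text.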

Liquid chromatography coupled with MS (LC/MS) has also been reported for HCP

detection and is becoming the most promising approach (19-22). HPLC coupled with matrix-

assisted laser desorption/ionization time-of-flight (MALDI-TOF/TOF) off-line was reported for

HCP analysis for a null cell line supernatant (23) and HCP recovery procedure for sample

preparation optimization (24). Although this method was not applied to HCP detection with the

drug product present, these studies provide helpful insights into sample preparation for the HCP

analysis. Doneanu et al. reported use of 2D-LC/MS in a data independent MS^E mode with a Q-TOF-

MS for HCP identification and quantitative estimation, and that multiple reaction monitoring

(MRM) was used for accurate quantitation with isotopically labeled peptides (19). With ion

mobility coupled to the Q-TOF to improve species separation, this 2D-LC/MS^E technique has been

reported to identify and quantify HCPs at the 1 ppm level (25). 2D-LC/MS^E provides both

identification and quantitative information of HCPs with high sensitivity and selectivity. However,

this 2D-LC separation approach is time consuming, requiring more than 12 hours per run, which

may not be practical for large number of samples and/or rapid screening purposes.

Qualitative and quantitative protein analysis based on LC-MS remains

an effective platform for HCP analysis. In the present study, two-dimensional high pH/low pH

reversed phase (RP/RP) liquid chromatography coupled to tandem mass spectrometry (2D-LC-

MS/MS) with data dependent acquisition (DDA) was initially employed to investigate the residual

HCPs in the mAb sample during three purification steps, Protein A chromatography (PA), cation

exchange chromatography (CEX), and ultrafiltration/diafiltration (UF/DF) (buffer exchange),

initially building up the understanding of the basics of the HCP distribution in terms of downstream

impurity removal and providing insight into the development of new HCP detection strategy. Four

HCPs, clusterin, putative phospholipase B, 78 kDa glucose-regulated protein precursor, and

protein disulfide-isomerase, were quantified in the UF/DF sample using 1D-LC-MS with parallel

reaction monitoring (PRM) using isotopically labeled peptides as internal standards,

demonstrating that 1D-LC-MS was able to detect and quantify HCPs at low ppm level with PRM.

This study not only provides a preliminary understanding of the HCP sample analysis, but also

supports downstream processing by comprehensively tracking HCP profiles across different

purification steps.

Based on the initial information obtained from the 2D-LC-MS-DDA analysis, in

collaboration with Simion Kreimer, Ph.D. candidate in our lab, a novel DIA-to-PRM workflow

for HCP analysis was developed, with high dynamic range for the relatively low

complexity sample. This workflow was demonstrated to detect HCPs at low ppm levels in a

purified therapeutic mAb expressed by CHO cell system, providing comprehensive HCP profiles

of a certain drug product. A high resolution Orbitrap mass spectrometer was employed in this

method. As described in this chapter, a spectral assay library was constructed with the 2D-LC-

MS/MS analysis of the mAb after early stage PA purification. A targeted spectral library search

based on OpenSWATH and untargeted database search relying on DIA-Umpire were combined to

promote DIA data interpretation followed by PRM verification. This workflow was developed by

Simion Kreimer and is detailed in his thesis. This methodology can provide identification of HCPs

at the sub-ppm level with a high sensitivity and specificity, and estimation of individual HCPs at

the low ppm level. This novel workflow can be used as a general method for HCP detection and

monitoring in the biopharmaceutical industry.

3.3 Materials and Methods

3.3.1 Chemicals and reagents

The therapeutic product, a monoclonal antibody (mAb), was provided by Bristol-Myers

Squibb (Devens, MA). This product was purified by Protein A (PA) chromatography, cation

exchange chromatography (CEX), and ultrafiltration/diafiltration (UF/DF), and the samples were

obtained after each purification step. Triethylammonium bicarbonate buffer (TEAB) (1.0 M, pH

8.0), dithiothreitol (DTT), iodoacetamide (IAM), urea, MS RT calibration mix, LC-MS grade

ammonium hydroxide solution (≥ 25% in H2O), and LC-MS grade formic acid were obtained from

Sigma-Aldrich (St. Louis, MO). LC-MS grade water, LC-MS grade acetonitrile, Pierce™ peptide

retention time calibration mixture, and the bicinchoninic acid (BCA) protein assay kit were from

Thermo Fisher Scientific (Rockford, IL). Sequencing-grade modified trypsin was purchased from

Promega (Madison, WI). The mass spectrometry grade lysyl endopeptidase (Lys-C) was purchased

from Wako (Richmond, VA). Stable isotopically labeled and non-labeled peptide standards from

SpikeTides tumor associated antigens (TAA) set, as well as custom isotopically labeled peptide

standards of target HCPs for quantitation were obtained from JPT Peptide Technologies GmbH

(Berlin, Germany).

3.3.2 Sample preparation

The protein concentration of the mAb samples was determined by the BCA protein assay.

Approximately 300 μg protein was denatured by 10 M urea in 100 mM TEAB (pH 8.0). The

sample was reduced with 10 mM DTT and 10 M urea in 100 mM TEAB (pH 8.0) at 37 °C for 1

hour and then alkylated with 10 mM IAM and 10 M urea in 100 mM TEAB (pH 8.0) in the dark

at room temperature for 45 minutes. Cold acetone pre-chilled at -20 °C was added at six times

the solution volume, and the mixture was incubated at -20 ºC overnight to precipitate the protein.

Then the mixture was centrifuged at 14,000 g for 15 minutes, and the supernatant was discarded.

The digestion buffer was 25 mM TEAB (pH 8.0) in 90% water and 10% acetonitrile. A

volume of 100 μL digestion buffer was added to each sample to redissolve the precipitated proteins.

Lys-C and trypsin stock solutions were prepared individually with the digestion buffer.

Digestion with Lys-C was performed at 37 ºC for 6 hours with an enzyme to protein ratio of 1:100

(w/w). Next, the trypsin was added to the mixture with an enzyme to protein ratio of 1:50 (w/w).

The digestion was conducted at 40 ºC overnight for about 18 hours. Then, the digestion mixture

was dried by speed vacuum.
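
For illustration, the enzyme amounts implied by the stated ratios can be worked out as in the sketch below. It assumes that both ratios refer to the approximately 300 μg of starting protein, which the text does not state explicitly; the function name is hypothetical.

```python
def enzyme_mass_ug(protein_ug: float, ratio_denominator: float) -> float:
    """Enzyme mass (ug) for a 1:ratio_denominator (w/w) enzyme to protein ratio."""
    if ratio_denominator <= 0:
        raise ValueError("ratio_denominator must be positive")
    return protein_ug / ratio_denominator


protein_ug = 300.0  # approximate protein amount per sample, from the text

lys_c_ug = enzyme_mass_ug(protein_ug, 100)   # Lys-C at 1:100 (w/w)
trypsin_ug = enzyme_mass_ug(protein_ug, 50)  # trypsin at 1:50 (w/w)

print(f"Lys-C: {lys_c_ug} ug, trypsin: {trypsin_ug} ug")
```

Under this assumption, about 3 μg of Lys-C and 6 μg of trypsin would be used per 300 μg sample.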

For 2D-LC-MS/MS analysis of HCP identification and quantitative estimation, two

replicates were performed for the protein A sample with separate digestion, and three replicates

for CEX samples. For HCP screening in UF/DF samples with 2D-LC-MS/MS, two technical

replicates were tested with one digestion.

3.3.3 LC-MS/MS

The resultant digest mixture was separated and analyzed by 2D high pH/low pH reversed

phase (RP/RP) liquid chromatography coupled with high resolution/mass accuracy mass

spectrometer. Two different sets of experiments were performed: one was with the nanoflow-LC

for the second dimension separation, and the other was with the microflow-LC for the second

dimension separation.

For the nanoflow-LC as the second dimension, the first-dimension of the separation was

performed off-line. The platform consisted of an Agilent 1200 series (Santa Clara, CA) LC system

with diode array detector, a 300Extend_C18 column (3.5 µm beads, 2.1x150 mm), and a Gilson

FC 203B fraction collector (Gilson Inc., Middleton, WI). Mobile phase A was 20 mM ammonium

formate in water (pH 10), and mobile phase B was 20 mM ammonium formate (pH 10) in 90%

acetonitrile/10% water. The lyophilized digested sample was solubilized with 40 µL of mobile

phase A with vortexing and 10 seconds of sonication to maximize the recovery. After injection,

the column was flushed with mobile phase A at 0.2 mL/min for 10 minutes for desalting. A

gradient was then run at a flow rate of 200 µL/min (from 2% B to 100% B in 44 minutes, 100% B

down to 2% B and 2% B for 9 minutes). The fractions were collected in 2-minute intervals from

1 to 55 minutes across the gradient, for a total of 27 fractions. For the PA and CEX samples, based

on the UV absorption profile at 214 nm, several fractions were pooled to equalize protein levels

for a final fraction number of 20. Each fraction was lyophilized to dryness and stored at -80 ºC.

For the UF/DF sample, the fractions were pooled to five. The resultant five fractions were also

dried by speed vacuum and stored at -80 ºC. For the second dimension LC, each fraction was

reconstituted with 0.1% formic acid in water. For spectral library generation, the retention time

markers were added into each fraction before LC-MS analysis. Samples were analyzed on an

Ultimate 3000 chromatography system coupled to the Q Exactive mass spectrometer (Thermo

Fisher Scientific). The column was a home-packed IntegraFrit column (New Objective, Woburn,

MA), 25 cm x 75 μm, with 200 Å Magic C18 AQ particles (3 μm diameter) (Michrom Bioresources,

Auburn, CA). Mobile phase A was 0.1% formic acid in water, and mobile phase B was 0.1%

formic acid in acetonitrile. The flow rate was 200 nL/min, and the separation gradient was 2% B

to 32 % in 120 minutes, 32% B to 90% B in 20 minutes, 90% B for 3 minutes.
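
As a check on the first-dimension fraction count given above, the collection windows can be enumerated as in the following sketch. The helper name is hypothetical, and the subsequent pooling of fractions based on the UV profile is not modeled because its rule is not specified in the text.

```python
import math


def fraction_windows(start_min: float, end_min: float, interval_min: float):
    """Return (start, end) collection windows covering start_min to end_min."""
    if interval_min <= 0 or end_min <= start_min:
        raise ValueError("invalid collection settings")
    n = math.ceil((end_min - start_min) / interval_min)
    return [
        (start_min + i * interval_min,
         min(start_min + (i + 1) * interval_min, end_min))
        for i in range(n)
    ]


# 2-minute fractions collected from 1 to 55 minutes
windows = fraction_windows(1, 55, 2)
print(len(windows), windows[0], windows[-1])
```

Collecting from 1 to 55 minutes in 2-minute steps gives (55 - 1) / 2 = 27 windows, consistent with the 27 fractions reported.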

For the microflow-LC as the second dimension, the first-dimension of the separation was

also performed off-line with the same LC-UV system and mobile phases as described above. The

separation column, on the other hand, was an XBridge peptide BEH C18 column (3.5 µm beads,

300Å, 2.1x150 mm) (Waters, Milford, MA). After injection of 1.8 mg protein digest sample, the

desalting process was performed with mobile phase A at 0.20 mL/min for 60 minutes until the UV

absorption reached the baseline. A 60-minute gradient was then used for fractionation at a flow

rate of 200 μL/min (from 2% B to 42% B in 46 minutes, 42% B to 100% B in 2 minutes, 100% B

for 7 minutes, 100% B down to 2% B and 2% B for 5 minutes). A total of 24 fractions

were collected at 2.5-minute intervals from 1 to 60 minutes across the gradient, and the

fractions were then pooled to a final number of 10 fractions. These fractions were lyophilized to

dryness and stored at -80 ºC. For the second dimension LC separation, the retention time marker

mix was added into each fraction before the LC-MS analysis. Samples were analyzed on an

Ultimate 3000 chromatography system coupled to the Q Exactive Plus mass spectrometer (Thermo

Fisher Scientific). To generate the desirable flow rate, the flow selector of the nanoflow-LC

system was replaced by one which was capable of producing flow rates of 2.5-50 μL/min. The same

mobile phases were used as described above, and the column was an ACQUITY UPLC M-class

peptide CSH C18 column (1.7 µm beads, 130Å, 0.3 x 150 mm) (Waters, Milford, MA). The flow

rate was 10 μL/min, and a 2-hour gradient was used (3% B for 10 minutes, 3% to 6% B within 11

minutes, 6% to 32% B in 110 minutes, then to 95% B for 10 minutes, 95% B for 9 minutes, and

down to 3% B for 10 minutes).

For 1D LC-MS/MS, after the enzymatic digestion, 2 μL of 1% formic acid in water were

added into the resultant mixture to end the digestion. Then the sample was lyophilized and stored

at -80 ºC. LC-MS/MS analysis conditions were the same as the second dimension LC separation

described above.

3.3.4 Mass spectrometry parameters

For the data dependent acquisition (DDA) mode, MS data were collected with a survey

single stage MS (MS1) scan followed by higher-energy collisional dissociation (HCD) MS/MS scans of the

top 12 most intense precursor ions. For the full MS scans, the resolution was 70,000 (m/z = 200)

with a scan range of m/z 375 to 1600. The automatic gain control (AGC) target value was set at

1 x 10^6 ions. The MS2 spectra were acquired with a resolution of 17,500 (m/z = 200). The isolation

window was m/z 2.0, and the AGC target value was 1 x 10^5. The normalized collision energy was

28.0. The maximum ion injection time was 100 milliseconds for both MS1 and MS2 scans. Target

ions that had been selected for MS/MS were dynamically excluded for 60 seconds. The intensity

threshold, which displayed the minimum intensity required to initiate a data dependent scan, was

1.0 x 10^4 ion counts. For accurate mass measurement, the lock mass option was enabled using the

polydimethylcyclosiloxane ion at m/z 445.12002 as an internal calibrant.
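
The precursor selection logic of a top-12 DDA method with an intensity threshold and dynamic exclusion can be sketched as follows. This is a simplified illustration and not the instrument control software: the function name, the data layout, and the fixed m/z matching tolerance `mz_tol` are assumptions introduced here.

```python
def select_precursors(peaks, now_s, excluded, top_n=12, threshold=1.0e4,
                      exclusion_s=60.0, mz_tol=0.01):
    """Pick precursors for MS/MS from one survey scan.

    peaks:    list of (mz, intensity) tuples from the MS1 scan
    now_s:    acquisition time of the survey scan in seconds
    excluded: dict mapping excluded m/z -> expiry time (s); updated in place
    mz_tol:   hypothetical matching tolerance (m/z units) for exclusion
    """
    # Remove exclusion entries that have expired
    for mz in [m for m, expiry in excluded.items() if expiry <= now_s]:
        del excluded[mz]

    selected = []
    # Walk through the peaks from most to least intense
    for mz, intensity in sorted(peaks, key=lambda p: p[1], reverse=True):
        if len(selected) == top_n or intensity < threshold:
            break  # enough precursors, or all remaining peaks are too weak
        if any(abs(mz - ex) <= mz_tol for ex in excluded):
            continue  # recently fragmented, skip
        selected.append(mz)
        excluded[mz] = now_s + exclusion_s
    return selected


# Illustrative survey scan (values are made up)
scan = [(500.0, 5.0e5), (600.0, 2.0e4), (700.0, 5.0e3)]
exclusion_list = {}
print(select_precursors(scan, 0.0, exclusion_list))
```

With dynamic exclusion, an abundant peptide from the therapeutic product is fragmented once and then skipped for 60 seconds, which leaves MS/MS scans available for lower-intensity co-eluting species.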

PRM was set with the pre-knowledge of the retention time for each target peptide. For the

LC-MS/MS using nanoflow-LC, PRM was used for peptide quantitation. To generate the target

peptide-specific spectral library, a peptide standard mixture without the mAb peptide background

was tested by DDA with the same method as the second dimension LC separation in the 2D-LC-

MS/MS. The spectral library for these target peptides was generated, and the retention time

information and the dominant charge states of each peptide standard were obtained manually. In

the PRM MS setting, MS1 full scan was performed first, and then the MS2 scans were obtained

based on the schedule of retention time window and m/z values in the inclusion list. Both the

MS1 and MS2 scans were acquired with a resolution of 70,000 (m/z = 200), an AGC target of 3 x 10^6, and a maximum

injection time of 300 milliseconds. The MS1 full scan range was m/z 330 to 1500. For the MS2

scan, the isolation window was 2.0 m/z, and the normalized collision energy was 28.0.

For the LC-MS/MS using microflow-LC, PRM was used for both confirmation of the

putative identification of DIA analysis and quantitative estimation of the HCPs (for details see

Simion Kreimer’s thesis).

3.3.5 HCP identification and quantitative estimation through 2D LC-MS/MS

HCP identification and quantitative estimation were achieved through 2D LC-MS/MS

using nanoflow-LC as the second dimension separation. The raw MS data files obtained from the

DDA mode were processed in Proteome Discoverer 1.4 (PD 1.4) (Thermo Fisher Scientific). The

raw data was combined and searched against the CHO protein sequence database (26). Sequest

HT, Mascot (PD 1.4), and MS Amanda (27) were used. Cysteine carbamidomethylation was set

as fixed modification, and oxidation of methionine and deamidation of asparagine and glutamine

were set as dynamic modifications, allowing for two missed tryptic cleavages. Mass tolerance for

the precursor ions was set at 10 ppm, and that for fragment ions at 0.05 Da. The peptide false

discovery rate (FDR) was 1%.
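
To make the 10 ppm precursor tolerance concrete, the sketch below converts a relative tolerance into an absolute m/z window and computes the mass error of an observed ion. The function names and the example m/z values are hypothetical.

```python
def ppm_window(mz: float, tol_ppm: float):
    """Return the (low, high) m/z window for a relative tolerance in ppm."""
    delta = mz * tol_ppm * 1e-6
    return mz - delta, mz + delta


def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Relative mass error of an observed ion in ppm."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


# Illustrative precursor at m/z 500 with the 10 ppm search tolerance
low, high = ppm_window(500.0, 10.0)
print(f"accepted window: {low:.4f} to {high:.4f}")
```

At m/z 500, a 10 ppm tolerance corresponds to a window of plus or minus 0.005 m/z units, and the window widens proportionally at higher m/z.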

For PA and CEX samples, within each set of fractions, the peptide-spectrum match (PSM)

number of the unique peptide for HCPs was calculated, and normalized by the total PSMs of all

fractions. The average of the PSM number of each HCP was calculated between the replicates, and

this average was used as the PSM count for each HCP. For PA samples, proteins identified in both

of the replicates with at least 2 PSMs in the PA sample were considered as the identified HCPs.

For CEX samples, HCPs identified in two out of the three replicates with at least 2 PSMs were

considered as the identified HCPs. The HCP PSM counts were also used as an indicator of HCP

abundance.
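
The PSM-based estimate described above can be sketched as follows. Several details are assumptions made for illustration, since the text does not specify them: each replicate is rescaled to the mean total PSM number so that values stay on a PSM-count scale, replicates in which an HCP was not detected contribute zero to the average, and the minimum of 2 PSMs is applied to the averaged value. The function name and data layout are hypothetical.

```python
def summarize_hcp_psms(replicates, min_replicates, min_psms=2.0):
    """Average normalized PSM counts per HCP and apply identification criteria.

    replicates: list of (hcp_psms, total_psms) tuples, one per replicate, where
                hcp_psms maps HCP name -> PSM count summed over all fractions
                and total_psms is the total PSM number of that replicate.
    """
    mean_total = sum(total for _, total in replicates) / len(replicates)
    proteins = set().union(*(counts.keys() for counts, _ in replicates))

    summary = {}
    for protein in proteins:
        # Rescale each replicate to the mean total PSM number (assumption)
        normalized = [counts.get(protein, 0) * mean_total / total
                      for counts, total in replicates]
        n_detected = sum(1 for counts, _ in replicates
                         if counts.get(protein, 0) > 0)
        average = sum(normalized) / len(normalized)
        if n_detected >= min_replicates and average >= min_psms:
            summary[protein] = average
    return summary


# Made-up counts for two PA replicates (both replicates required)
pa = [({"HCP_A": 10, "HCP_B": 1}, 1000), ({"HCP_A": 12, "HCP_C": 5}, 1000)]
print(summarize_hcp_psms(pa, min_replicates=2))
```

For the CEX samples the same function would be called with three replicates and `min_replicates=2`.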

For UF/DF samples, two technical replicates were tested for each fraction, and the sum of

the PSM of the unique peptide for HCPs from all MS runs was considered as the PSM count for the

given HCPs. The HCPs with at least 4 PSMs were manually checked, and the ones with validated

peptide matches were considered as identified HCPs in the UF/DF sample.

3.3.6 HCP quantitation through PRM with isotopically labeled peptides

The quantitation of HCP was based on the PRM experiments with nanoflow-LC-MS/MS.

The mixture of all target peptides was tested by nanoflow-LC-MS/MS with the DDA mode, and

the raw MS file was searched against the protein sequence database containing all the corresponding

proteins. The target peptide-specific spectral library was then generated through Skyline for the

following PRM data analysis, which was also performed with Skyline. For PRM method

evaluation, 11 peptides were chosen from the SpikeTides TAA set (Table 3-1), and

both of the isotopically labeled and non-labeled peptides were spiked into the digested UF/DF

sample. The ppm level was estimated with the assumption that the average HCP molecular weight

was 50 kDa. For HCP quantitation, the stable isotopically labeled peptides for the target HCPs

were spiked in the digested PA, CEX and UF/DF samples as internal standards. For HCP

quantitation through PRM, 5 to 7 product ions were used for each peptide, depending on the quality

of the transitions. The peak area ratio of each product ion of the light HCP peptide and heavy

internal standard peptide was calculated, and the average ratio of the several chosen product ions

was obtained to calculate the amount of the HCP peptide. When several peptides were quantified

for a certain HCP, the average of those peptides was taken to calculate the abundance of the

corresponding HCP.
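
The light-to-heavy ratio calculation described above can be sketched as below. The function names, the units (fmol of spiked standard, μg of mAb analyzed), and the conversion of the result to ppm as ng of HCP per mg of mAb are assumptions for illustration; in the study, the peak areas themselves were extracted with Skyline.

```python
def peptide_amount_fmol(light_areas, heavy_areas, heavy_spiked_fmol):
    """Amount of the endogenous (light) peptide from product ion area ratios.

    light_areas / heavy_areas: dicts mapping product ion label -> peak area
    heavy_spiked_fmol:         amount of heavy internal standard spiked in
    """
    ratios = [light_areas[ion] / heavy_areas[ion] for ion in light_areas]
    mean_ratio = sum(ratios) / len(ratios)  # average over the chosen ions
    return heavy_spiked_fmol * mean_ratio


def hcp_ppm(peptide_amounts_fmol, hcp_mw_da, mab_ug):
    """HCP level in ppm (ng HCP per mg mAb), averaged over its peptides."""
    mean_fmol = sum(peptide_amounts_fmol) / len(peptide_amounts_fmol)
    hcp_ng = mean_fmol * hcp_mw_da * 1e-6   # fmol x g/mol -> ng
    return hcp_ng / (mab_ug * 1e-3)         # ug of mAb -> mg


# Made-up peak areas for five product ions of one peptide
light = {"y4": 200.0, "y5": 400.0, "y6": 100.0, "y7": 300.0, "b3": 50.0}
heavy = {"y4": 1000.0, "y5": 2000.0, "y6": 500.0, "y7": 1500.0, "b3": 250.0}
amount = peptide_amount_fmol(light, heavy, heavy_spiked_fmol=10.0)
print(amount, hcp_ppm([amount], hcp_mw_da=50_000, mab_ug=1.0))
```

With the 50 kDa average molecular weight assumed in the text, 1 fmol of HCP per μg of mAb corresponds to about 50 ppm under this convention.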

3.3.7 Spectral assay generation

The PA sample was digested and analyzed by 2D-microflow-LC-MS-DDA with 10

fractions from the first dimension separation, and the resultant raw data were used to generate an

assay library which was used to assist the targeted analysis of DIA data interpretation. The resultant

raw DDA data were searched by Myrimatch (28) and MS-GF+ (29) with semi-tryptic searching.

Cysteine carbamidomethylation was set as fixed modification. Oxidation of methionine and

glutamine, and isotopically labeled lysine (+10 m/z) and arginine (+8 m/z) were set as dynamic

modifications. Two missed tryptic cleavages were allowed. Mass tolerance for the precursor ions

and the fragment ions was set at 6 ppm. The peptide false discovery rate (FDR) was 1%. The

validation of spectral matches was performed by PeptideShaker (30). Then the search results were

processed by an in-house script, which selected the highest scoring spectrum for each charge state

of the peptides identified and generated a PRM assay based on at least 3 and at most 8 of the

highest intensity identified b and/or y ion transitions, with retention times normalized based on a

set of retention time calibrant peptides.
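
The selection logic described for the assay library can be illustrated with the sketch below. It is not the in-house script itself, which is not reproduced in this chapter: the data layout and function name are hypothetical, a higher score is assumed to indicate a better match, and retention time normalization against the calibrant peptides is omitted.

```python
def build_assay(psms, min_transitions=3, max_transitions=8):
    """Build a simple PRM assay from peptide-spectrum matches.

    psms: list of dicts with keys
          "peptide", "charge", "score", "rt",
          "fragments" -> list of (ion_label, intensity), e.g. ("y5", 1200.0)
    """
    # Keep the highest scoring spectrum for each (peptide, charge) pair
    best = {}
    for psm in psms:
        key = (psm["peptide"], psm["charge"])
        if key not in best or psm["score"] > best[key]["score"]:
            best[key] = psm

    assay = {}
    for key, psm in best.items():
        # Only b and y ions are used as transitions
        by_ions = [f for f in psm["fragments"] if f[0][:1] in ("b", "y")]
        top = sorted(by_ions, key=lambda f: f[1], reverse=True)[:max_transitions]
        if len(top) >= min_transitions:
            assay[key] = {"rt": psm["rt"],
                          "transitions": [label for label, _ in top]}
    return assay
```

Peptides with fewer than three usable b or y ions are dropped in this sketch, which mirrors the stated minimum of three transitions.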

3.4 Results and discussion

Therapeutic mAb samples at various stages of purification were provided by Bristol-Myers

Squibb. The estimated HCP level was determined at Bristol-Myers Squibb using a commercial (non-

optimized) ELISA kit (Cygnus Technologies, Southport, NC). The ELISA results showed more

than 700 ppm of HCPs in the PA sample, about 50 ppm in the CEX sample, and less than 20 ppm

in the UF/DF sample. No further information such as HCP identity or specific HCP levels was

known. HCP analysis consists of a sample of relatively low complexity with a concentration range

of up to 5 to 6 orders of magnitude between the HCPs and the therapeutic drug. In order to obtain

preliminary information on the sample, we performed 2D-LC-MS/MS-DDA utilizing high

resolution/mass accuracy mass spectrometry. The preliminary information described below

provided significant insight for developing a novel workflow. With the collaboration of Simion

Kreimer, a workflow for HCP analysis based on 1D-LC-MS-DIA, followed by PRM, was

developed. In the following, the preliminary studies using 2D-LC-MS/MS-DDA and quantitation

of HCPs using 1D-LC-MS-PRM with isotopically labeled peptides will be described and the

workflow summarized. The details of the novel workflow will be presented in Simion Kreimer’s

thesis.

3.4.1 Low pH RP LC gradient optimization

For low pH RP nanoflow-LC-MS/MS, we optimized the gradient time for HCP sample

analysis. The gradient time affects the LC separation power as well as MS performance. With a

given gradient steepness and a given amount of analyte loaded on the column, longer gradient

times yield improved resolution, but wider peak widths, and hence lower signal intensity. Higher

LC separation power decreases species overlap and hence reduces potential ion suppression for

MS from peptides from the therapeutic drug, but lower signal intensity might affect detection of

HCP peptides. Shorter gradient times will yield sharper peaks and thus higher signal, and also

higher throughput. However, large amounts of therapeutic product peptides may reduce the chance

to detect co-eluting HCP peptides at low levels. Thus, an optimal gradient time

must be found.

We tested digested CEX purified samples with gradient times of 2 hours, 3 hours, and 4

hours, respectively, ramping linearly from 2% to 32 % of mobile phase B using DDA for MS

analysis. The resultant MS raw data were searched with the CHO protein sequence database

obtained from the C. griseus genome published in 2013 by Lewis et al. (26) by PD 1.4. The results

showed that the 2-hour gradient separation yielded the best performance, balancing separation

power and sensitivity, with highest number of identified peptides and HCPs (Table 3-1).

Table 3-1 The number of identified HCPs and peptides as a function of LC separation gradient length.

                                     2-hour gradient    3-hour gradient    4-hour gradient

Number of identified HCPs (a)              41                 33                 25

Number of identified peptides (b)          558                531                466

(a) HCPs identified in at least two out of three technical replicates.

(b) Total number of peptides identified from the therapeutic protein as well as the HCPs, excluding

peptides identified from proteins considered as common contaminants in the common Repository

of Adventitious Proteins (cRAP) sequence database (31).

3.4.2 HCP sample preparation protocol

HCP sample preparation for LC-MS-based approaches is challenging. There are several

critical factors that need to be considered when choosing the suitable protocol for HCP analysis.

First, sample preparation without HCP enrichment is preferred. HCP species co-purified with the

mAb would either have similar physicochemical properties to those of the mAb or be associated

with the mAb molecule itself by attractive interactions. Consequently, certain HCP species could

be lost with any enrichment method.

Second, acetone precipitation was chosen as the desalting and protein recovery procedure

before protein digestion, instead of using denaturing detergents such as RapiGest™ SF surfactant.

Acetone precipitation is a well-accepted and widely used protein recovery procedure for proteomic

sample preparation (32, 33). The procedure is easy to use with a high level of protein resuspension

(33). With acetone precipitation, most of the formulation components (e.g. salts) in the original

mAb sample can be removed with good protein recovery (24, 32). On the other hand, using

denaturing detergents such as RapiGest™ SF does not remove the original buffer components and

requires a separate desalting procedure. Thus, we chose acetone precipitation as the sample

preparation step.

3.4.3 HCP identification and estimation by 2D LC-MS/MS with DDA for PA and CEX

samples for preliminary testing

Preliminary HCP identification and abundance estimation for the PA and CEX samples

were conducted by 2D LC-MS/MS with DDA. As mentioned in Materials and Methods Section

3.3.3, 20 fractions were taken from the first dimension high pH LC separation, and the 2-hour linear

gradient was used for the second dimension low pH separation. The average of normalized PSM

counts (number of peptide spectrum matches) between replicates was used as an estimate for label-

free quantitation.

The HCP distribution in terms of the PSM counts is shown in Figure 3-1, and the specific

HCPs are shown in Tables 3-2 and 3-3. PSM counting is a widely used label-free strategy for

relative quantitation. It is based on the empirical observation of positive correlation between the

number of identified MS/MS spectra and the peptide/protein amounts based on data dependent

acquisition (DDA) (34). Since this approach does not directly measure any protein or peptide

physical properties, it is semi-quantitative. However, since we aim at getting preliminary

understanding of the HCP sample, PSM counting is suitable for relative amount comparison for a

given HCP between the samples after different purification steps and for individual HCPs in the

same sample with significant difference of PSM counts.

With the PA sample, 728 HCPs were identified in both of the replicates with at least 2

PSMs. Among the HCPs, 43 were with at least 50 PSMs, and 211 HCPs were within the 10 to 49

PSM range. The number of the proteins identified with less than 10 PSMs was 474. Although a

large number of HCPs were detected, many were of low abundance. On the other hand, in the CEX

sample, 151 HCPs were identified in at least two out of three replicates with at least 2 PSMs. Five

HCPs were with at least 50 PSMs, and 20 were with from 10 to 49 PSMs. As shown in Figure 3-

1, the total number of identified HCPs, as well as the number with high PSM counts in the CEX

samples, significantly decreased compared with those in the PA sample, as expected. Examining

individual HCPs in Tables 3-2 and 3-3, a large number identified in the PA sample with high PSM

counts were no longer found in the CEX sample or were found with very low PSM counts (HCPs

with less than 10 PSMs in the CEX sample are not shown). The results demonstrate that CEX

chromatography was effective in reducing the residual HCPs carried along from the Protein A

purification.
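
The grouping of HCPs by PSM count used in this comparison can be sketched as follows. The function names and bin labels are hypothetical; the final lines only check the arithmetic of the counts reported in the text.

```python
from collections import Counter


def psm_bin(psm_count: float):
    """Assign a PSM count to one of the three reporting categories."""
    if psm_count >= 50:
        return ">=50"
    if psm_count >= 10:
        return "10-49"
    if psm_count >= 2:
        return "2-9"
    return None  # below the identification threshold


def bin_hcps(psm_counts):
    """Count HCPs per category; psm_counts maps HCP name -> PSM count."""
    bins = (psm_bin(count) for count in psm_counts.values())
    return Counter(b for b in bins if b is not None)


# Consistency check of the PA numbers reported above
pa_reported = {">=50": 43, "10-49": 211, "2-9": 474}
assert sum(pa_reported.values()) == 728

# CEX: the number in the 2-9 category is implied by subtraction
cex_low = 151 - 5 - 20
print(cex_low)
```

The three PA categories sum to the 728 identified HCPs, and the CEX totals imply 126 HCPs in the 2 to 9 PSM category.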

Figure 3-1 The number of proteins as a function of PSM counts for PA and CEX samples.

A. The comparison of identified HCP numbers for the PA and CEX samples. B. A chart of the

comparison of identified HCP numbers for the PA and CEX samples. The identified HCPs were

grouped into three categories: those with at least 50 PSMs, from 10 to 49 PSMs, and from 2 to 9

PSMs.

Table 3-2 The list of identified HCPs in the PA sample with at least 50 PSM counts.

No.    HCP identified with at least 50 PSMs in the PA sample    PSM counts    Number of identified unique peptides

1 Putative phospholipase B-like 2 503 55

2 Clusterin 377 34

3 Elongation factor 2 320 53

4 Endoplasmin 277 58

5 78 kDa glucose-regulated protein precursor 249 44

6 Serine protease HTRA1 isoform X2 242 27

7 Pyruvate kinase PKM isoform X2 239 48

8 Glyceraldehyde-3-phosphate dehydrogenase 172 28

9 Elongation factor 1-alpha 1 168 25

10 Protein disulfide-isomerase, partial 128 29

11 Glutathione S-transferase P 1 128 20

12 α-Enolase isoform X3 121 33

13 Lysosomal alpha-glucosidase isoform X2 111 26

14 Filamin-B isoform X4 110 41

15 Elongation factor 1-gamma 110 20

16 Protein disulfide-isomerase A3 precursor 109 26

17 Isoamyl acetate-hydrolyzing esterase 1 homolog 109 24

18 Calreticulin precursor 104 18

19 Cytosolic purine 5'-nucleotidase 99 22

20 Hypoxia up-regulated protein 1 precursor 97 33

21 Heat shock cognate 71 kDa protein 94 22

22 Actin, cytoplasmic 1 88 16

23 Filamin-A isoform X4 86 33

24 Complement C1r subcomponent 84 20

25 Complement C1s subcomponent 84 20

26 T-complex protein 1 subunit theta isoform X3 80 22

27 Heat shock protein HSP 90-beta 77 28

28 Transketolase isoform X2 75 23

29 Peroxiredoxin-1 73 19

30 Fructose-bisphosphate aldolase A isoform X3 73 21

31 Uncharacterized protein LOC103163294 68 2

32 Lumican 67 12

33 Phosphoglycerate kinase 1 67 16

34 Alanine--tRNA ligase, cytoplasmic 66 26

35 Plasminogen activator inhibitor 1 isoform X2 66 21

36 Prolow-density lipoprotein receptor-related protein 1 isoform X3 65 25

37 Myosin-9 isoform X3 62 29

38 Adenylyl cyclase-associated protein 1 59 18

39 ATP-citrate synthase isoform X3 58 21

40 T-complex protein 1 subunit zeta 58 20

41 Lipoprotein lipase isoform X2 52 23

42 Protein-glutamine gamma-glutamyltransferase 2 51 22

43 Fibronectin isoform X11 50 55

Table 3-3 The list of identified HCPs in the CEX sample with at least 10 PSMs and their corresponding PSM counts in the PA sample.

No.    HCP identified with at least 10 PSMs in the CEX sample    Theoretical pI    PSM counts in the CEX sample    Number of identified unique peptides in the CEX sample    PSM counts in the PA sample    Number of identified unique peptides in the PA sample

1 Putative phospholipase B-like 2 5.90 175 42 503 55

2 78 kDa glucose-regulated protein precursor 5.07 127 40 249 44

3 Uncharacterized protein LOC103163294 6.32 97 5 68 2

4 Protein disulfide-isomerase 4.84 75 24 128 29

5 Clusterin 5.51 55 14 377 34


LOC103161293 4.88 32 4 19 3

LOC100756391 isoform X2 5.58 29 17 19 11

8 Anionic trypsin-2 isoform X2 4.79 26 3 23 2

9 Calreticulin precursor 4.33 24 10 104 18

10 Multidrug resistance protein 1 8.86 21 1 (4)* 1

11 Protein artemis isoform X6 8.65 21 3 (1)* 1

12 Desmoplakin isoform X2 6.53 21 13 4 1

13 Apoptogenic protein 1, mitochondrial 9.33 20 1 18 2



9.12 18 14 8 1

15 Fam178a family with sequence similarity 178, member A 8.95 18 2 27 2

16 Olfactory receptor 11H6 8.76 18 8 19 3

17 Anionic trypsin-2 7.46 17 4 16 1

18 Fibrous sheath-interacting protein 2 isoform X2 5.78 16 4 22 2

19 Myeloid cell surface antigen CD33 isoform 2 precursor 8.29 15 1 (9)* 1

20 Glutathione S-transferase P 1 7.64 15 4 128 20

21 Ig κ chain V-III region MOPC 321-like isoform X2 6.08 13 3 24 3

22 Glyceraldehyde-3-phosphate dehydrogenase 8.49 13 6 172 28

23 Complement C1r subcomponent 5.70 12 6 84 20

24 Protein-glutamine gamma-glutamyltransferase 2 5.08 10 9 51 22

25 Hypoxia up-regulated protein 1 precursor 5.09 10 7 97 33

* These HCPs were found in only one replicate in the PA sample analysis, and the PSM counts obtained in that replicate are shown

in the parentheses.

The highlighted rows indicate the HCPs with a theoretical pI higher than 8.00 that were at a higher or comparable level in the CEX

sample compared with the PA sample.

Since the PA and CEX samples were analyzed under the same 2D-LC-MS/MS protocol

with 20 fractions collected with the first dimension separation, the PSM counts can be used as an

indicator of the relative HCP level. The HCPs identified with at least 10 PSMs in the CEX sample

are listed in Table 3-3 as well as their corresponding PSM counts obtained in the PA sample.

Comparing Tables 3-2 and 3-3, it can be seen that most of the HCPs were significantly decreased

in PSM counts in the CEX sample in comparison to the PA sample. For example, the PSM count

of putative phospholipase B-like 2 dropped from 503 to 175 after CEX purification; the clusterin

PSM number was 377 in the PA sample and 55 in the CEX sample; and calreticulin precursor had

104 and 24 PSMs in the PA and CEX samples, respectively.

However, several HCPs showed comparable, or even higher, levels in the CEX samples.

Apoptogenic protein 1, uncharacterized protein LOC103162254 isoform X2, fam178a family with

sequence similarity 178, and Olfactory receptor 11H6 are found with roughly 20 PSM counts in

both the CEX and PA samples. All have a theoretical pI value larger than 8. With CEX purification,

it is likely that these basic HCPs co-eluted with the mAb. A similar explanation can be applied to

the uncharacterized protein LOC103163294 with a theoretical pI of 6.32. This protein showed

comparable levels between the PA and CEX samples with more than 50 PSMs.
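
The comparison described here can be made explicit by computing the ratio of CEX to PA PSM counts for each HCP. The sketch below uses values copied from Table 3-3; the 0.5 ratio cut-off and the function name are illustrative assumptions, not criteria used in this study, and PSM ratios remain semi-quantitative.

```python
# (name, theoretical pI, PSM count in CEX, PSM count in PA); values from Table 3-3.
TABLE_3_3_SUBSET = [
    ("Putative phospholipase B-like 2", 5.90, 175, 503),
    ("Clusterin", 5.51, 55, 377),
    ("Glyceraldehyde-3-phosphate dehydrogenase", 8.49, 13, 172),
    ("Apoptogenic protein 1, mitochondrial", 9.33, 20, 18),
    ("Fam178a", 8.95, 18, 27),
    ("Olfactory receptor 11H6", 8.76, 18, 19),
]


def retained_after_cex(rows, min_ratio=0.5):
    """Return (name, pI, CEX/PA ratio) for HCPs whose PSM ratio is >= min_ratio.

    min_ratio is a hypothetical cut-off chosen only for this illustration.
    """
    kept = []
    for name, pi, psm_cex, psm_pa in rows:
        ratio = psm_cex / psm_pa
        if ratio >= min_ratio:
            kept.append((name, pi, round(ratio, 2)))
    return kept


for name, pi, ratio in retained_after_cex(TABLE_3_3_SUBSET):
    print(f"{name}: pI {pi}, CEX/PA PSM ratio {ratio}")
```

With this subset, only the three basic proteins pass the cut-off, whereas glyceraldehyde-3-phosphate dehydrogenase (pI 8.49) is strongly reduced, so a high theoretical pI alone does not guarantee retention.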

Interestingly, three HCPs in Table 3-3, multidrug resistance protein 1, protein artemis

isoform X6, and myeloid cell surface antigen CD33 isoform 2 precursor, were not considered as

identified HCPs in the PA sample, because they were only detected in one replicate out of two

with low PSM counts (less than 10 PSMs in the single replicate). However, the proteins passed the

filtering criteria and were considered to be identified HCPs with relatively high PSM numbers in

the CEX sample. It is possible that they were “enriched” by the CEX chromatography since Protein

A purification and CEX chromatography are orthogonal methods. The fact that the HCPs were

only identified in one out of two replicates of the Protein A sample shows that the reproducibility

for low level species when determined by DDA is limited due to stochastic sampling, as is well

known (35).
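
The replicate-based filtering criteria described above can be written as a small helper. This is a sketch under stated assumptions: the threshold of 2 PSMs is taken to apply to each replicate separately (the text does not specify this), the function name is hypothetical, and the CEX per-replicate counts are invented for illustration. Only the 4 PSMs found in a single PA replicate for multidrug resistance protein 1 come from Table 3-3.

```python
def is_identified(psms_per_replicate, min_replicates, min_psms=2):
    """True if at least min_replicates replicates each contain min_psms or more PSMs."""
    hits = sum(1 for n in psms_per_replicate if n >= min_psms)
    return hits >= min_replicates


# PA sample: two replicates, identification required in both.
# Multidrug resistance protein 1 had 4 PSMs in one replicate and none in the other.
pa_ok = is_identified([4, 0], min_replicates=2)

# CEX sample: identification required in two out of three replicates.
# These per-replicate counts are hypothetical.
cex_ok = is_identified([8, 7, 6], min_replicates=2)

print("PA:", pa_ok, "CEX:", cex_ok)
```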

3.4.4 HCP identification by 2D LC-MS/MS with DDA mode for UF/DF samples

HCP identification of the UF/DF samples was achieved by 2D LC-MS/MS-DDA with 5

fractions from the 1D high pH separation. A total of 18 HCPs were identified in the UF/DF

sample, as shown in Table 3-4. The number of unique peptides for each HCP is also listed. UF/DF

is a buffer exchange procedure with a specific molecular weight cutoff (in this case 30 kDa) which

is generally used to concentrate, and at the same time, remove impurities of low molecular weight.

Table 3-4 The identified HCPs in the UF/DF sample and their PSM counts in the PA sample*.

No.    HCP identified in the UF/DF sample    Molecular weight    Theoretical pI    Number of identified unique peptides

1    78 kDa glucose-regulated protein    72.3 kDa    5.07    8

2    PAX-interacting protein 1 isoform X2    104.4 kDa    6.57    1

3    Clusterin    51.7 kDa    5.51    4

4    DNA repair endonuclease XPF    103.2 kDa    6.79    1

5    Tubulin polyglutamylase TTLL11 isoform X3    62.7 kDa    8.90    1

6    Putative phospholipase B-like 2    65.8 kDa    5.90    4

7    Heparin cofactor 2    54.3 kDa    6.25    2

8    Protein disulfide-isomerase    54.2 kDa    4.84    3

9    cAMP-specific 3',5'-cyclic phosphodiesterase 4D isoform X1    84.5 kDa    5.02    1

10    Protein FAM35A    102.5 kDa    6.54    1

11    Leukocyte immunoglobulin-like receptor subfamily B member 3 isoform X3    70.6 kDa    5.91    1

12    Probable G-protein coupled receptor 75    59.4 kDa    9.19    1

13    Protocadherin-9    113.8 kDa    5.23    1

14    Ewing's tumor-associated antigen 1 isoform X2    94.5 kDa    6.45    1

15    Cadherin-13    77.6 kDa    4.96    1

16    Cadherin EGF LAG seven-pass G-type receptor 3 isoform X2    357.7 kDa    6.30    1



49.9 kDa    9.12    1

18    MAP/microtubule affinity-regulating kinase 4 isoform X2    74.0 kDa    9.72    1

The shaded rows indicate the HCPs that were not identified in either the PA or the CEX

samples.

Comparing the identified HCP list of the UF/DF (Table 3-4), CEX (Table 3-3), and PA

(Table 3-2) samples, several HCPs that were among the most abundant in the PA sample remain at

the top of the list in the CEX sample and were also identified in the UF/DF samples. These proteins are

putative phospholipase B-like 2, 78 kDa glucose-regulated protein, protein disulfide-isomerase,

and clusterin. Interestingly, examining other HCP studies, these four proteins also stand out as

being among the most commonly reported HCPs with mAb and Fc fusion proteins.

Clusterin has been identified in different mAbs even after several chromatographic

purifications based on different separation mechanisms including Protein A and CEX (10, 19, 21,

36, 37). Moreover, Levy et al. reported that clusterin showed interaction with several different

mAbs and Fc fusion proteins through cross-interaction chromatography (CIC) (11). Similarly,

putative phospholipase B-like 2 (8, 36), 78 kDa glucose-regulated protein (19, 21, 36, 37), and

protein disulfide-isomerase (13, 21, 36) have been identified after Protein A purification and

CEX chromatography for different mAb molecules.

As shown in Table 3-4, most of the identified HCPs, 13 out of 18, were identified with

only one unique peptide. After multiple purification steps, this post-UF/DF mAb sample contained

relatively low amounts of HCPs, resulting in low numbers of unique peptides. Note that the PSM

counts obtained in these experiments cannot be used to compare the relative amounts of a given

HCP with those in Tables 3-2 or 3-3, because the number of fractions collected after the first

dimension separation was different.

3.4.5 HCP quantitation based on PRM and isotopically labeled internal standards

Quantitation of several HCPs was next developed with 1D-LC-MS using parallel reaction

monitoring (PRM) to explore an approach for HCP analysis. To evaluate the sensitivity and

specificity of the PRM method, 11 standard peptides, chosen from a commercial tumor associated

antigen (TAA) kit, were spiked into the digested UF/DF sample with their isotopically labeled

homologues. The overall protein amount of the sample was determined by BCA assay. The ppm

level of each peptide will depend on the molecular weight of the protein to which it is associated.

Here, we assumed the protein to be at 50 kDa. With 1D-nano-LC-MS using the PRM approach, 2

µg of digested sample was injected on the column, representing 0.04 fmol per injection or 1 ppm

level of the protein. Within the background of UF/DF digested sample, 8 peptides could be

identified at the 1 ppm (and potentially lower) level, and 1 peptide could be detected at the 5 ppm

level. Two peptides could not be detected even at the 40 ppm level, likely due to interference

(ion suppression) from the monoclonal antibody drug peptides. With the isotopically labeled peptides

as internal standards, 7 peptides showed good linearity, with an R² value greater than 0.99, from 1

ppm to 40 ppm, 1 peptide from 1 ppm to 20 ppm, and 1 peptide from 5 ppm to 40 ppm. The

summary of the results is shown in Table 3-5, and the calibration curves in Figure 3-2. The results

demonstrate that the PRM method can be used to quantify very low HCP levels using isotopically

labeled peptides as internal standards.
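
The conversion between a ppm level and the molar amount on column follows directly from the numbers given above: 1 ppm of a 2 µg injection is 2 pg of protein, and 2 pg of a 50 kDa protein is 0.04 fmol. The short Python sketch below reproduces this arithmetic; the function name is hypothetical, and ppm is taken as a mass ratio (ng of HCP per mg of total protein).

```python
def ppm_to_fmol(ppm, injected_ug, protein_mw_da):
    """Convert a ppm level (mass of HCP per mass of total protein) to fmol on column."""
    hcp_grams = ppm * 1e-6 * injected_ug * 1e-6  # ppm fraction times injected mass in g
    moles = hcp_grams / protein_mw_da            # g divided by g/mol
    return moles * 1e15                          # mol to fmol


# Values from the text: 2 ug injected, protein assumed to be 50 kDa.
for level in (1, 5, 20, 40):
    print(f"{level} ppm -> {ppm_to_fmol(level, 2, 50_000):.2f} fmol on column")
```

Because the result scales inversely with molecular weight, the same molar amount of peptide corresponds to a higher ppm level for a larger protein.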

Table 3-5 Peptide pairs chosen from SpikeTide Set TAA, their identification and calibration

linear range against the post-ultrafiltration digested sample

Peptides chosen from SpikeTides Set TAA    Identifiable at 1 ppm level    Linear range for calibration curve

KPAAGFLPSLLK    √    1-40 ppm

LVSALIGEEK    √    1-40 ppm

VIEASFPAGVDSSPR    √    1-40 ppm

EGTPPIEER    √    1-40 ppm

VGILHLGSR    X (can be identified from the 5 ppm level)    5-40 ppm

ESESTAGSFSLSVR    X (cannot be identified at any level)    NA

GAAPPAAATAYDR    √    1-40 ppm

TLGDSSAGEIALSTR    √    1-40 ppm

GLALWEAYR    √    1-20 ppm

AASWGLPSVSLDLPR    X (cannot be identified at any level)    NA

TFEDIPLEEPEVK    √    1-40 ppm

Figure 3-2 The calibration curves of standard peptides from TAA SpikeTide Set. (A)

KPAAGFLPSLLK. (B) LVSALIGEEK. (C) VIEASFPAGVDSSPR. (D) EGTPPIEER. (E)

GAAPPAAATAYDR. (F) TLGDSSAGEIALSTR. (G) TFEDIPLEEPEVK. (H) GLALWEAYR.

(I) VGILHLGSR.
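
A calibration curve of this kind is an ordinary least-squares fit of the light-to-heavy peak area ratio against the spiked level, with linearity judged by R². The sketch below shows the calculation in plain Python. The spiked levels and area ratios are hypothetical numbers chosen only to illustrate the fit; they are not data from this study, and the function name is an assumption.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept; returns (slope, intercept, R^2)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1.0 - ss_res / ss_tot
    return slope, intercept, r_squared


# Hypothetical spiked levels (ppm) and light/heavy peak area ratios.
levels_ppm = [1, 5, 10, 20, 40]
area_ratios = [0.026, 0.124, 0.251, 0.498, 1.003]

slope, intercept, r_squared = fit_line(levels_ppm, area_ratios)
print(f"slope = {slope:.4f}, intercept = {intercept:.4f}, R^2 = {r_squared:.4f}")
print("meets the R^2 > 0.99 criterion:", r_squared > 0.99)
```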

Table 3-6 Target peptides and quantitation results for peptides from several HCPs.

HCPs    Target peptides    Molecular weight    UF/DF sample    CEX sample    PA sample

Clusterin    EIQNAVQGVK, LTQQYNELLHSLQTK    51.7 kDa    20 ppm (CV% 20%)*    18 ppm (CV% 42%)    387 ppm (CV% 1%)

Putative phospholipase B    VTSFSLAK, SVLLDAASGQLR, AFIPNGPSPGSR    65.8 kDa    39 ppm (CV% 6%)    54 ppm (CV% 56%)    336 ppm (CV% 12%)

78 kDa glucose-regulated protein    TWNDPSVQQDIK, NQLTSNPENTVFDAK    72.3 kDa    66 ppm (CV% 3%)    50 ppm (CV% 22%)    88 ppm (CV% 8%)

Protein disulfide-isomerase    VHSFPTLK    54.2 kDa    3 ppm (CV% 18%)    3 ppm (CV% 12%)    7 ppm (CV% 3%)

*The quantitation is based on two biological replicates and three technical replicates. The calculation of the CV% is based on the two

biological replicates. The technical replicates CVs were much smaller (shown in Table 3-7). The lysine and arginine residues of the

internal standard peptides are labeled with stable isotopes that produce a mass shift of +8 Da and +10 Da, respectively.

Table 3-7 The quantitative information of the selected HCPs of the two biological replicates.

HCPs    UF/DF sample, Replicate 1    UF/DF sample, Replicate 2    CEX sample, Replicate 1    CEX sample, Replicate 2    PA sample, Replicate 1    PA sample, Replicate 2

Clusterin    22 ppm, CV% 4.6%*    17 ppm, CV% 3.1%    13 ppm, CV% 1.5%    23 ppm, CV% 2.4%    390 ppm, CV% 1.1%    383 ppm, CV% 4.9%

Putative phospholipase B    38 ppm, CV% 10.5%    41 ppm, CV% 3.3%    32 ppm, CV% 0.5%    75 ppm, CV% 2.3%    308 ppm, CV% 0.9%    364 ppm, CV% 1.4%

78 kDa glucose-regulated protein    64 ppm, CV% 14.9%    67 ppm, CV% 8.9%    42 ppm, CV% 4.7%    57 ppm, CV% 5.7%    92 ppm, CV% 1.2%    83 ppm, CV% 4.4%

Protein disulfide-isomerase    3 ppm, CV% 8.5%    2 ppm, CV% 37%    2 ppm, CV% 4.4%    3 ppm, CV% 7.6%    8 ppm, CV% 10.8%    5 ppm, CV% 9.7%

*The calculation of the CV% is based on the three technical replicates.
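
The relationship between the per-replicate values in Table 3-7 and the summary values in Table 3-6 can be illustrated with a short calculation. The sketch assumes that the CV% is the sample standard deviation divided by the mean, which is not stated explicitly in the text; the function name is hypothetical. The input values are the rounded ppm levels for putative phospholipase B from Table 3-7, so the output may differ slightly from Table 3-6, which was presumably computed from unrounded values.

```python
from statistics import mean, stdev


def mean_and_cv(values):
    """Return (mean, CV%) using the sample standard deviation (n - 1 denominator)."""
    m = mean(values)
    return m, 100.0 * stdev(values) / m


# Putative phospholipase B, biological replicates 1 and 2 (ppm, from Table 3-7).
replicates = {
    "UF/DF": [38, 41],
    "CEX": [32, 75],
    "PA": [308, 364],
}

for sample, values in replicates.items():
    m, cv = mean_and_cv(values)
    print(f"{sample}: mean {m:.1f} ppm, CV {cv:.1f}%")
```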

Given the PRM method developed above, the four HCPs discussed in the previous section

- clusterin, putative phospholipase B, 78 kDa glucose-regulated protein, and protein disulfide-

isomerase - were quantitated in the PA, CEX, and UF/DF samples using homologous isotopically

labelled peptides as internal standards. The quantitative results are listed in Table 3-6. A large

decrease in concentration of the tested HCPs between the PA and CEX samples can be seen. On the other

hand, little change occurred between the CEX and UF/DF samples, as ultrafiltration was used

mainly as a buffer exchange step and the filter cut-off was 30 kDa.

Note that protein disulfide-isomerase was at a very low level in the PA sample and did not

change very much in the CEX and UF/DF samples. The precursor and product ions of both the heavy and

light peptide “VHSFPTLK” are shown in Figure 3-3. It is clear that the quality of the transitions

was good for identification and quantitation even at such a low level. For low level peptides that

yield only poor quality MS/MS spectra, the isotopically labeled homologues help to confirm

the identification and quantitation of the peptide, since the peptide and its isotopic internal

standard have the same retention time and fragmentation pattern.
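
Quantitation against an isotopically labeled internal standard can be sketched as a single-point calculation: the light-to-heavy peak area ratio multiplied by the known amount of heavy peptide gives the amount of endogenous peptide, which is then converted to ppm. The sketch below assumes one peptide per protein molecule and complete digestion; the function name, peak areas, and spiked amount are hypothetical, and the actual quantitation in this work may have used calibration curves instead.

```python
def hcp_ppm(light_area, heavy_area, heavy_fmol, protein_mw_da, injected_ug):
    """Estimate the HCP level in ppm (ng HCP per mg total protein) from a light/heavy ratio."""
    light_fmol = (light_area / heavy_area) * heavy_fmol
    hcp_ng = light_fmol * 1e-15 * protein_mw_da * 1e9  # fmol -> mol -> g -> ng
    injected_mg = injected_ug * 1e-3
    return hcp_ng / injected_mg


# Hypothetical example: peak areas and spiked heavy amount are invented;
# 54.2 kDa is the molecular weight of protein disulfide-isomerase (Table 3-6).
estimate = hcp_ppm(light_area=1.5e5, heavy_area=1.0e6,
                   heavy_fmol=1.0, protein_mw_da=54_200, injected_ug=2)
print(f"estimated level: {estimate:.1f} ppm")
```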

Figure 3-3 Precursors and fragments of the peptide VHSFPTLK of protein disulfide-isomerase.

(A) The light peptide. (B) The heavy peptide. The spectrum is from the PRM analysis by Skyline.
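
The precursor m/z values of the light and heavy forms of VHSFPTLK can be estimated from monoisotopic residue masses. The sketch below is illustrative: the doubly charged state is an assumption (the charge state is not given in the text), the +8 Da lysine shift is the nominal value quoted in the footnote of Table 3-6 rather than the exact isotopic mass difference, and the function name is hypothetical.

```python
# Monoisotopic residue masses (Da) for the residues in VHSFPTLK.
MONO = {
    "V": 99.06841, "H": 137.05891, "S": 87.03203, "F": 147.06841,
    "P": 97.05276, "T": 101.04768, "L": 113.08406, "K": 128.09496,
}
WATER = 18.01056   # Da, added once per peptide
PROTON = 1.00728   # Da, added once per charge


def precursor_mz(sequence, charge, label_shift_da=0.0):
    """Monoisotopic m/z of a peptide ion, with an optional heavy-label mass shift."""
    neutral = sum(MONO[aa] for aa in sequence) + WATER + label_shift_da
    return (neutral + charge * PROTON) / charge


# Charge 2 is assumed here; +8 Da is the nominal shift for heavy lysine.
light = precursor_mz("VHSFPTLK", 2)
heavy = precursor_mz("VHSFPTLK", 2, label_shift_da=8.0)
print(f"light: {light:.3f}, heavy: {heavy:.3f}, difference: {heavy - light:.3f}")
```

The m/z difference between the pair is the label shift divided by the charge, so the two precursors are isolated and fragmented separately while co-eluting.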

3.4.6 The generation of assay library from 2D-microflow-LC-MS-DDA

A spectral assay library was generated with the PA sample for DIA data analysis. 2D-LC-MS-DDA

was used, and the second dimension separation was a microflow-LC system with an ACQUITY

UPLC M-class peptide CSH C18 column (1.7 µm beads, 130 Å, 0.3 x 150 mm). Ten fractions from

the first dimension separation were used instead of 20 in order to increase the time efficiency. In

order to obtain as many spectral assays as possible, three injections for each fraction were made;

30 µg of sample was injected for the first run, and 15 µg for the second run. The MS

raw files were then analyzed by database search. In the third run, around 100 species that were

identified with high confidence in the previous two runs were excluded from MS2 scans. In this way,

a total of 4,535 assays corresponding to 3,505 peptides in various charge states were

generated, and the number of proteins identified was 759. The reasons for using the PA sample as well

as microflow-LC are discussed in the next section.
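
The exclusion step for the third injection can be sketched as follows. The text does not state how the roughly 100 high-confidence species were selected, so this sketch makes an assumption: precursors identified in both earlier runs are ranked by the lower of their two search scores, and the top entries are exported as (m/z, retention time) pairs. The data layout, score convention, and function name are all hypothetical.

```python
def build_exclusion_list(run1, run2, max_entries=100):
    """Pick precursors identified in both runs, ranked by their lower score.

    run1 and run2 map (peptide, charge) -> (mz, retention_time_min, score),
    where a higher score means a more confident identification (assumption).
    Returns a list of (mz, retention_time_min) taken from run1.
    """
    shared = set(run1) & set(run2)
    ranked = sorted(
        shared,
        key=lambda key: (min(run1[key][2], run2[key][2]), key),
        reverse=True,
    )
    return [(run1[key][0], run1[key][1]) for key in ranked[:max_entries]]


# Hypothetical identifications from the first two injections of one fraction.
first_run = {
    ("PEPTIDEA", 2): (400.2, 10.0, 50.0),
    ("PEPTIDEB", 2): (500.3, 20.0, 90.0),
    ("PEPTIDEC", 3): (600.1, 30.0, 70.0),
}
second_run = {
    ("PEPTIDEA", 2): (400.2, 10.1, 60.0),
    ("PEPTIDEB", 2): (500.3, 20.2, 80.0),
}

print(build_exclusion_list(first_run, second_run, max_entries=2))
```

Excluding these precursors frees MS2 scans in the third run for species that were not sampled before, which is the purpose of the step described above.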

3.4.7 The insights provided by the preliminary results from 2D-LC-MS/MS-DDA and the

generation of the novel workflow

In the present study, the 2D-LC-MS/MS strategy with the PSM counting approach for relative

quantitation provided high sensitivity and selectivity to distinguish the HCP peptides of low

abundance from the high level of therapeutic protein peptides. 1D-LC-MS-PRM showed high

sensitivity to quantify species at very low ppm levels with high throughput. This strategy was able

to identify and track HCP amounts in the therapeutic mAb samples across several purification steps,

which supports the rational design of downstream processing. The separation power of 2D-LC decreases

species co-elution and hence reduces potential ion suppression for MS. The DDA mode is currently

the conventional strategy for MS data acquisition, and many data analysis methods are available

and ready to use. PSM counting was used to estimate the HCP level across different purification

steps and provided a straightforward reference for the purification efficiency.

Disadvantages of 2D-LC-MS-DDA strategy for HCP analysis

Despite the high resolving power and straightforward data interpretation, some

disadvantages remain for the 2D-LC-MS-DDA strategy. First, the throughput of 2D-LC is low.

For a given sample, it took more than a week for triplicate runs of the second dimension

LC to analyze the 20 fractions from the first dimension separation. Decreasing the number of

fractions can increase the throughput but may compromise the analysis sensitivity due to co-elution

of HCP and therapeutic peptides. Nonetheless, even with the 5 fractions used to analyze the UF/DF

sample, the throughput is still undesirable, since each sample needs several days for analysis. Thus,

this strategy may not be practical when one aims to screen HCPs rapidly.

Second, with DDA, the reproducibility of low-abundance HCP detection is limited. As

discussed in section 1.8.2, Chapter 1, DDA sampling for MS2 scans is biased toward the most

abundant species, e.g. the top 15 most abundant precursors, and analytes of low abundance

may not be sampled in every technical replicate run due to such stochastic sampling. With the PA

sample, for example, there were more than 1600 HCP species detected in either replicate, but only

about half of them were identified in both replicates. Since such species were usually identified with a single

unique peptide and only a few PSMs, it is difficult to determine which ones were true

identifications.
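
The replicate overlap referred to here is simply the ratio between the number of proteins found in both replicates and the number found in either one. A minimal sketch is given below; the function name and the toy protein sets are hypothetical and serve only to show the calculation.

```python
def replicate_overlap(replicate_a, replicate_b):
    """Return (found in either, found in both, fraction found in both)."""
    either = replicate_a | replicate_b
    both = replicate_a & replicate_b
    fraction = len(both) / len(either) if either else 0.0
    return len(either), len(both), fraction


# Toy example with invented protein identifiers.
rep1 = {"HCP_A", "HCP_B", "HCP_C", "HCP_D"}
rep2 = {"HCP_B", "HCP_C", "HCP_E", "HCP_F"}

n_either, n_both, fraction = replicate_overlap(rep1, rep2)
print(f"{n_either} in either replicate, {n_both} in both ({fraction:.0%})")
```

For the PA sample, the 728 HCPs identified in both replicates out of more than 1600 detected in either replicate correspond to a fraction slightly below one half.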

Third, to avoid false positives and/or false negatives, manually checking the identified

peptides from the database search can help to increase the confidence, as we did when we

analyzed the UF/DF sample. However, it can be laborious, especially when there are a large

number of HCPs that need to be confirmed, as in the PA sample. Sometimes manual checking

still cannot be definitive due to the low quality of the MS2 spectra.

Meanwhile, we also observed that the retention time of a given species could vary somewhat

from run to run. The retention time can be sensitive to factors which might not be easy to control

during the nanoflow-LC experiment. Since PRM collects MS/MS on a predefined schedule, these

variations of retention time could cause some difficulty to set up the PRM parameters in the present

study.

Moreover, this 2D-LC-MS-DDA strategy provides a rough HCP distribution in the sample

and relative amount changes of HCPs across different purification steps. 1D-LC-MS-PRM was

able to quantify individual HCPs with a high throughput using isotopically labeled peptides. On

the other hand, these customized internal standards need to be synthesized in-house or ordered

from a third party after the HCP identification, which may be time consuming, and expensive. As

a result, a more rapid approach to estimate the individual and/or overall HCP levels is desirable.

Overview of the DIA-to-PRM HCP analysis workflow

Based on the preliminary results and experience, we understood that we were dealing with

hundreds of HCP species in the HCP sample of early purification stage and several tens in the final

product. Such samples are not as complex as proteomic samples such as cell lysates (with

thousands of proteins). The dynamic range of the HCP sample is high, however, because the protein species

of interest are at very low levels against the high abundance of the therapeutic protein.

In collaboration with Simion Kreimer and detailed in his thesis, a DIA-to-PRM HCP

analysis workflow was developed for HCP identification and quantitative estimation (Figure 3-4).

In this workflow, 1D-microflow-LC-MS with DIA was used to test the sample at later stages of

purification. A therapeutic-specific spectral assay library was generated from the sample at early

purification stage (Protein A) by 2D-LC-MS-DDA, which was used to assist the targeted DIA data

analysis. The untargeted DIA data analysis was achieved with the CHO protein sequence database.

The combination of targeted and untargeted DIA data analysis was applied to interpret the DIA

raw data. The putative peptide identifications were then tested by the 1D-LC-PRM method to

validate the identification. With this workflow, a total number of 37 HCPs were identified in the

UF/DF sample. Compared with 18 HCPs identified with the 2D-LC-MS-DDA approach cited

earlier, the increased number of identified HCPs indicates improved sensitivity of the HCP

analysis.

Figure 3-4 The scheme of the DIA-to-PRM HCP analysis workflow.

Insights provided with 2D-LC-MS-DDA and the reasoning of the DIA-to-PRM HCP analysis

workflow

The preliminary results obtained with the 2D-LC-MS-DDA provided valuable insight to

guide the design of the DIA-to-PRM workflow. First, despite the high sensitivity with nanoflow

LC-MS, microflow-LC separation was chosen for the new workflow to enhance the robustness

and reproducibility of the retention time as well as signal intensity. With a wider column (300 µm),

more sample material can be injected, which compensates for the potential decrease in sensitivity. The

increased robustness and reproducibility allowed the automation of PRM parameter setting as well

as HCP quantitative estimation without isotopically labeled homologues. Second, DIA was used instead of

DDA. As discussed in section 1.8.2, Chapter 1, DIA systematically fragments all precursor ions, which

is suitable for detection of low-abundance species.

Third, the therapeutic-specific spectral assay library generated from the PA sample was

used for targeted DIA data analysis. The HCPs identified in the Protein A pool are a reasonable HCP reference

for the sample after purification, especially for the relatively high abundance HCPs. Examining the

HCPs detected in either of the two replicates of the PA sample regardless of their PSM counts,

around 1600 proteins, all HCPs identified with at least 10 PSMs in the CEX sample (Table 3-3)

were found in the PA sample, indicating the good coverage of the HCP reference from the PA

sample. The null CHO cell line, which could cover the whole proteome, was not used for spectral assay

generation. The reason is that certain HCPs may be at low abundance among the CHO proteins,

but can be carried along with the drug product through Protein A purification due to attractive

interactions between the mAb and the HCP, and hence be present at a relatively high level in the PA sample.

As a result, using the PA sample can yield more identified peptides and higher quality MS2 spectra

of HCPs, compared with the spectral assays generated from the null CHO cell line.

Fourth, besides the targeted spectral assay, the untargeted database search is needed for

more complete DIA analysis. Although the spectral assay obtained from the PA sample is a good

reference for HCP pool, this reference does not necessarily reflect the comprehensive residual HCP

profile. For example, three HCPs identified with high PSM counts in the CEX sample were detected in only one

replicate of the PA sample analysis. There were 22 proteins identified in the CEX sample that could

not be found among the 1600 proteins of the PA sample, though all 22 proteins had very low PSM

counts, between 2 and 10. This result indicates that certain HCPs were masked by the presence

of other highly abundant HCPs in the PA sample with DDA analysis; however, they were observed

in the later CEX sample through the orthogonal purification mechanism. The untargeted database

search using the CHO protein sequence database can overcome this drawback to enhance the depth

of DIA data interpretation.

Moreover, the putative identifications obtained from the DIA data analysis can be validated

by the subsequent PRM test, which avoids tedious manual checking and offers high sensitivity

and selectivity. The preliminary results showed that 1D-LC-PRM can identify HCP species at the

single-digit ppm level.

3.5 Conclusion

As HCPs are a major class of process-related impurities, their detection and quantitation are of great

importance for drug quality control. In this chapter, 2D-LC-MS with DDA was used to analyze

the HCP distribution and profile changes in the mAb therapeutic product along several

purification steps. Several individual HCPs were quantified with isotopically labeled peptides as

internal standards using 1D-LC-MS-PRM. The preliminary results and understanding obtained

from this study provided valuable information for the subsequent workflow development. A DIA-to-PRM

workflow for HCP analysis was developed in collaboration with Simion Kreimer, and

details of this workflow can be found in Simion Kreimer’s thesis.

3.6 References

1. Wang X, Hunter AK, & Mozier NM (2009) Host cell proteins in biologics development:

identification, quantitation and risk assessment. Biotechnol. Bioeng. 103(3):446-458.

2. Gao SX, et al. (2011) Fragmentation of a highly purified monoclonal antibody attributed

to residual CHO cell protease activity. Biotechnol. Bioeng. 108(4):977-982.

3. Robert F, et al. (2009) Degradation of an Fc-Fusion Recombinant Protein by Host Cell

Proteases: Identification of a CHO Cathepsin D Protease. Biotechnol. Bioeng.

104(6):1132-1141.

4. Bracewell DG, Francis R, & Smales CM (2015) The future of host cell protein (HCP)

identification during process development and manufacturing linked to a risk-based

management for their control. Biotechnol. Bioeng. 112(9):1727-1737.

5. Zhu-Shimoni J, et al. (2014) Host cell protein testing by ELISAs and the use of orthogonal

methods. Biotechnol. Bioeng. 111(12):2367-2379.

6. Tscheliessnig AL, Konrath J, Bates R, & Jungbauer A (2013) Host cell protein analysis in

therapeutic protein bioprocessing - methods and applications. Biotechnol. J. 8(6):655-670.

7. Flatman S, Alam I, Gerard J, & Mussa N (2007) Process analytics for purification of

monoclonal antibodies. J. Chromatogr. B Analyt. Technol. Biomed. Life Sci. 848(1):79-87.

8. Vanderlaan M, et al. (2015) Hamster phospholipase B-like 2 (PLBL2): A host-cell protein

impurity in therapeutic monoclonal antibodies derived from Chinese hamster ovary cells.

Bioprocess. Int. 13(4):18-55.

9. de Zafra CLZ, Quarmby V, Francissen K, Vanderlaan M, & Zhu-Shimoni J (2015) Host

cell proteins in biotechnology-derived products: A risk assessment framework. Biotechnol.

Bioeng. 112(11):2284-2291.

10. Levy NE, Valente KN, Lee KH, & Lenhoff AM (2016) Host cell protein impurities in

chromatographic polishing steps for monoclonal antibody purification. Biotechnol. Bioeng.

113(6):1260-1272.

11. Levy NE, Valente KN, Choe LH, Lee KH, & Lenhoff AM (2014) Identification and

characterization of host cell protein product-associated impurities in monoclonal antibody

bioprocessing. Biotechnol. Bioeng. 111(5):904-912.

12. Aboulaich N, et al. (2014) A novel approach to monitor clearance of host cell proteins

associated with monoclonal antibodies. Biotechnol. Prog. 30(5):1114-1124.

13. Tait AS, Hogwood CEM, Smales CM, & Bracewell DG (2012) Host cell protein dynamics

in the supernatant of a mAb producing CHO cell line. Biotechnol. Bioeng. 109(4):971-982.

14. Hogwood CEM, Tait AS, Koloteva-Levine N, Bracewell DG, & Smales CM (2013) The

dynamics of the CHO host cell protein profile during clarification and protein A capture in

a platform antibody purification process. Biotechnol. Bioeng. 110(1):240-251.

15. Krawitz DC, Forrest W, Moreno GT, Kittleson J, & Champion KM (2006) Proteomic

studies support the use of multi-product immunoassays to monitor host cell protein

impurities. Proteomics 6(1):94-110.

16. Bomans K, et al. (2013) Identification and monitoring of host cell proteins by mass

spectrometry combined with high performance immunochemistry testing. PLoS One

8(11):11.

17. Tarrant RDR, Velez-Suberbie ML, Tait AS, Smales CM, & Bracewell DG (2012) Host cell

protein adsorption characteristics during protein a chromatography. Biotechnol. Prog.

28(4):1037-1044.

18. Berrill A, Ho SV, & Bracewell DG (2010) Product and contaminant measurement in

bioprocess development by SELDI-MS. Biotechnol. Prog. 26(3):881-887.

19. Doneanu CE, et al. (2012) Analysis of host-cell proteins in biotherapeutic proteins by

comprehensive online two-dimensional liquid chromatography/mass spectrometry. mAbs

4(1):24-44.


independent LC-MSE. Anal. Chem. 87(18):9186-9193.

21. Zhang QC, et al. (2014) Comprehensive tracking of host cell proteins during monoclonal

antibody purifications using mass spectrometry. mAbs 6(3):659-670.

22. Schenauer MR, Flynn GC, & Goetze AM (2012) Identification and quantification of host

cell protein impurities in biotherapeutics using mass spectrometry. Anal. Biochem.

428(2):150-157.

23. Valente KN, Lenhoff AM, & Lee KH (2015) Expression of difficult-to-remove host cell

protein impurities during extended Chinese hamster ovary cell culture and their impact on

continuous bioprocessing. Biotechnol. Bioeng. 112(6):1232-1242.

24. Valente KN, Schaefer AK, Kempton HR, Lenhoff AM, & Lee KH (2014) Recovery of

Chinese hamster ovary host cell proteins for proteomic analysis. Biotechnol. J. 9(1):87-99.

25. Doneanu CE, et al. (2015) Enhanced detection of low-abundance host cell protein (HCP)

impurities in high-purity monoclonal antibodies down to 1 ppm using ion mobility mass

spectrometry coupled with multidimensional liquid chromatography. Anal. Chem.

87(20):10283-10291.



27. Dorfer V, et al. (2014) MS Amanda, a universal identification algorithm optimized for high

accuracy tandem mass spectra. J. Proteome Res. 13(8):3679-3684.

28. Tabb DL, Fernando CG, & Chambers MC (2007) MyriMatch: Highly accurate tandem

mass spectral peptide identification by multivariate hypergeometric analysis. J. Proteome

Res. 6(2):654-661.

29. Kim S & Pevzner PA (2014) MS-GF plus makes progress towards a universal database

search tool for proteomics. Nat. Commun. 5:10.

30. Vaudel M, et al. (2015) PeptideShaker enables reanalysis of MS-derived proteomics data

sets. Nat. Biotechnol. 33(1):22-24.

31. Mellacheruvu D, et al. (2013) The CRAPome: a contaminant repository for affinity

purification-mass spectrometry data. Nat. Methods 10(8):730-736.

32. Jiang L, He L, & Fountoulakis M (2004) Comparison of protein precipitation methods for

sample preparation prior to proteomic analysis. J. Chromatogr. A 1023(2):317-320.

33. Bodzon-Kulakowska A, et al. (2007) Methods for samples preparation in proteomic

research. J. Chromatogr. B Analyt. Technol. Biomed. Life Sci. 849(1-2):1-31.

34. Liu HB, Sadygov RG, & Yates JR (2004) A model for random sampling and estimation of

relative protein abundance in shotgun proteomics. Anal. Chem. 76(14):4193-4201.

35. Chapman JD, Goodlett DR, & Masselon CD (2014) Multiplexed and data-independent

tandem mass spectrometry for global proteome profiling. Mass Spectrom. Rev. 33(6):452-

470.

36. Joucla G, et al. (2013) Cation exchange versus multimodal cation exchange resins for

antibody capture from CHO supernatants: Identification of contaminating Host Cell

Proteins by mass spectrometry. J. Chromatogr. B Analyt. Technol. Biomed. Life Sci.

942:126-133.


independent LC-MS^E. Anal. Chem. 87(18):9186-9193.
