Application of Liquid Chromatography-Mass Spectrometry-Based Protein and Proteomic
Analytical Approaches to Chinese Hamster Ovary Cell Based Industrial Biopharmaceutical
Production
by Yuanwei Gao
B.S. in Chemistry, Tsinghua University, Beijing, China
M.S. in Forensic Sciences, Sam Houston State University, Texas, U.S.
A dissertation submitted to
The Faculty of
the College of Science of
Northeastern University
in partial fulfillment of the requirements
for the degree of Doctor of Philosophy
June 28th, 2016
Dissertation directed by
Barry L. Karger
Director of the Barnett Institute, Distinguished Professor
James L. Waters Chair in Analytical Chemistry
Acknowledgements
I would like to take this opportunity to express my sincere gratitude to the people who
helped me during my dissertation, although it is not possible to identify all of them.
First of all, I would like to thank my advisor, Professor Barry L. Karger. I first learned a great deal
from his wonderful class. We have worked together closely since I joined Dr. Karger’s group, and
it has been a valuable experience to have his advice and support. Dr. Karger’s insightful guidance and
great scientific intuition have inspired me. I know for certain that what I learned from him will be
beneficial throughout the rest of my career.
I would like to express my gratitude to Dr. Alexander R. Ivanov because his comments in
scientific discussions and careful explanation of experimental details have helped not only my
research project but also my career development. His deep knowledge and encouragement have
been very valuable to me.
I also want to specifically thank Somak Ray and Simion Kreimer, two colleagues, who
worked very closely with me on my research projects, for their valuable scientific discussions and
contributions.
An acknowledgement also goes to my committee members: Professor Roger W. Giese,
Professor Olga Vitek, and Professor Paul Vouros, for their help in my graduate study.
I would like to thank all former and present members of Dr. Karger’s group as well as
the personnel of the Barnett Institute: Arseniy Belov, Dr. Siyuan Liu, Dr. Xianzhe Wang, Yu Wang,
Di Wu, Yanjun Liu, Shan Jiang, Dr. Shiaw-Lin Billy Wu, Dr. David R. Bush, Dr. Krishan Kumar,
Dr. Daniel Shujia Dai, Dr. Vennela Mullangi, Dr. Wenqin Ni, Dr. Chen Li, Dr. Zhenke Liu, Dr.
Siyang Li, Dr. Adam Hall, Dr. James Glick, Dr. Suli Liu, Dr. Fan Zhang, Dr. Fangfei Yan, Dr. Ye
Zhang, Victoria Berger, Yang Tang, Zhidan Chen, Nancy Carbone, and Emanuelle Hestermann. I
appreciate their help in both my daily life and scientific research and the friendships which I have
gained during these years.
I would also like to acknowledge my collaborators for their assistance during our
collaboration: Dr. Nicholas R. Abu-Absi, Dr. Michael C. Borys, Dr. Amanda Lewis, Dr. Jin Mi,
Dr. Zhengjian Li, Dr. Mesredin Mussa, Dr. Zizhou Xing, Dr. Zhijun Tan, and Dr. Li Tao from
Bristol-Myers Squibb and Dr. Kristine Brazin and Professor Ellis Reinherz at the Dana-Farber Cancer
Institute.
Finally, I would like to thank my family, my parents and grandparents, for their
unconditional love, support, and faith in me.
Abstract of Dissertation
Therapeutic proteins have emerged rapidly over the past several decades, providing
effective and innovative medicines for a wide range of previously refractory human diseases.
Chinese hamster ovary (CHO) cells have become the predominant choice as the cellular expression
system for such therapeutic production in the biopharmaceutical industry. High-throughput
protein drug production depends on both an efficient upstream process yielding high product
titers and proficient downstream purification with high product recovery and effective impurity
removal. Numerous efforts have been made in both the upstream and downstream processes of
CHO-based manufacturing to improve productivity. Although advances have been achieved, many
challenges remain. The underlying biology of CHO cell productivity is not yet fully understood,
hampering efforts to optimize cell cultivation.
Moreover, it is challenging to apply cell cultivation results obtained at the
bench-top scale to large-scale production bioreactors, since CHO cells frequently behave
differently in bioreactors of different types and sizes. At the same time, efficient
downstream purification is also essential to ensure drug product quality. Considering the potential
safety risks to patients, the identification and quantitation of impurity residues in therapeutic
proteins, especially host cell proteins (HCP), is of great importance but challenging due to the bulk
drug product background. New analytical technologies and strategies which can be applied to the
therapeutic protein production process are needed.
Liquid chromatography-mass spectrometry (LC-MS)-based approaches are powerful tools
for proteomics and protein analysis, capable of providing the most comprehensive information to
date. LC-MS analysis continues to extend the depth and accuracy of proteomic studies. Global cell
constituent analysis, or ‘Omics, including proteomics and metabolomics, can provide in-depth
global characterization of CHO cells. A deeper understanding of CHO biology can potentially
improve the optimization of manufacturing bioprocesses. Moreover, LC-MS-based methods are
also strong candidates for HCP analysis.
This dissertation aims at adapting state-of-the-art LC-MS-based protein and proteomic
approaches to industrial biopharmaceutical processes, for the benefit of industrial therapeutic
drug production. In Chapter 1, the industrial therapeutic protein production platform is introduced,
along with the technology of LC-MS-based protein and proteomics analysis.
In Chapter 2, a study is presented where a CHO-DG44 production cell line showed
different phenotypic behaviors during the scaling-up process when cultured in the production scale
(5-KL scale) and bench-top scale (20-L) bioreactors with two copper levels in the culture media
for each scale. Relative quantitative proteomics based on high-resolution two dimensional liquid
chromatography coupled to tandem mass spectrometry (2D-LC-MS/MS) was applied. Multi-
omics including proteomics and metabolomics were employed to study CHO cell systems in order
to understand the phenotypic behavior. The results revealed that CHO cells underwent intermittent
hypoxia in the large production bioreactor due to less efficient oxygen transfer and longer
mixing times compared to the bench-top scale. This resulted in lower productivity and viability
at the production scale.
Chapter 3, carried out in collaboration with Simion Kreimer, Ph.D. candidate in chemistry at
Northeastern, describes a workflow for HCP analysis in a therapeutic monoclonal antibody, taking
advantage of the high-resolution capabilities of the Orbitrap mass spectrometer. A spectral library
was developed based on two-dimensional high pH/low pH reversed phase (RP/RP) liquid
chromatography coupled to tandem mass spectrometry (LC/MS/MS) with data dependent
acquisition (DDA). Then, a novel data independent acquisition-to-parallel reaction monitoring
(DIA-to-PRM) approach was developed for HCP identification and quantitative estimation. The
methodology is demonstrated to be capable of detecting HCPs at the low ppm level in the bulk
product background after purification. Several HCPs were quantified with isotopically labeled
peptides as internal standards.
The studies described in this dissertation demonstrate the power of LC-MS-based
approaches to address biopharmaceutical industry needs, by studying CHO biology as well as
evaluating impurities in the final product. In future studies, the discoveries and methods developed in
this thesis can be applied to improve biopharmaceutical productivity and quality.
Table of Contents
Acknowledgements ......................................................................................................................... ii
Abstract of Dissertation ................................................................................................................. iv
List of Figures ................................................................................................................................ xi
List of Tables ............................................................................................................................... xiii
List of Abbreviations .................................................................................................................... xv
Chapter 1: Overview of Therapeutic Protein Production by Chinese Hamster Ovary Cells and
Liquid Chromatography Mass Spectrometry Based Quantitative Proteomics ............................... 1
1.1 Abstract ............................................................................................................................ 2
1.2 Overview of recombinant therapeutic protein production ............................................... 3
1.3 Principle of recombinant biopharmaceutical synthesis by mammalian cell expression
systems. ....................................................................................................................................... 5
1.4 CHO as a therapeutic protein production host. ................................................................ 9
1.4.1 Advantages of CHO expression system for commercial recombinant therapeutic
protein production ....................................................................................................................... 9
1.4.2 A brief history of CHO cell lines applied to biotech industry. ................................... 11
1.5 The general platform of therapeutic protein production by CHO cells .......................... 11
1.6 Industrial platform of therapeutic protein production by CHO cells ............................. 14
1.6.1 The platform of the upstream process ........................................................................ 14
1.6.2 Challenges of upstream process ................................................................................. 18
1.6.3 The platform of downstream process ......................................................................... 21
1.6.4 Challenges of downstream process............................................................................. 24
1.7 Current advances for CHO-based therapeutic protein production. ................................ 25
1.7.1 Understanding CHO cell production and CHO cell engineering through ‘Omics
approaches ................................................................................................................................. 25
1.7.2 Current approaches of host cell protein identification and quantitation ..................... 29
1.8 Introduction of liquid chromatography mass spectrometry-based quantitative
proteomics and protein analysis ................................................................................................ 32
1.8.1 Two dimensional liquid chromatography ................................................................... 34
1.8.2 Mass spectrometry ...................................................................................................... 36
1.8.3 LC-MS based quantitative proteomics and protein analysis ...................................... 46
1.8.3.1 Label-free quantitation ............................................................................................ 47
1.8.3.2 Labeled quantitation approaches ............................................................................. 47
1.8.4 MS-based proteomic data interpretation ..................................................................... 60
1.8.4.1 Peptide and protein identification ........................................................................... 60
1.8.4.2 Biological analysis .................................................................................................. 63
1.9 Conclusion ...................................................................................................................... 64
1.10 Reference ........................................................................................................................ 65
Chapter 2: Combined Metabolomics and Proteomics Reveals Hypoxia as A Cause of Lower
Productivity on Scale-up to a 5000-Liter CHO Bioprocess .......................................................... 81
2.1 Abstract .......................................................................................................................... 82
2.2 Introduction .................................................................................................................... 83
2.3 Materials and methods ................................................................................................... 85
2.3.1 Chemicals and reagents .............................................................................................. 85
2.3.2 CHO Cell Culture Conditions ..................................................................................... 86
2.3.3 Metabolomic analysis ................................................................................................. 88
2.3.4 Sample preparation for proteomics ............................................................................. 89
2.3.5 2D LC-MS/MS ........................................................................................................... 89
2.3.6 Construction and annotation of DG44 CHO cell proteome database ......................... 91
2.3.7 Protein identification of proteomics analysis ............................................................. 92
2.3.8 Quantitation and differential expression analysis ....................................................... 93
2.3.9 Data filtering technique applied on the proteomics data ............................................ 94
2.3.10 Interaction network and pathway analysis ................................................................ 95
2.3.11 Western blotting ........................................................................................................ 96
2.3.12 Quantitation of fibronectin levels by ELISA ............................................................ 96
2.3.13 Real-Time PCR ........................................................................................................ 97
2.4 Results ............................................................................................................................ 97
2.4.1 CHO cell growth and productivity in 5-KL vs. 20-L scale bioreactors with two levels
of copper concentration in the media (conducted by Bristol Myers Squibb)............................ 98
2.4.2 Proteomic and metabolomics analysis platform ....................................................... 101
2.4.3 Analysis of combined differentially regulated proteins and metabolites in the 5-KL
reveals significant reduction in ROS with higher level of copper concentration in the media
and no significant copper effect in the 20-L reactor ............................................................... 103
2.4.4 Hypoxia (intermittent) in 5-KL bioreactor reduces cell viability and productivity . 109
2.4.5 Analysis of additional differentially regulated proteins supports the ROS and hypoxia
roles in the 5-KL bioreactor .................................................................................................... 111
2.4.6 The differentially regulated proteins related to important biological functions and
pathways .................................................................................................................................. 113
2.4.7 Superoxide dismutase 1 is potentially involved in the reduction of intermittent
hypoxia and oxidative stress with addition of copper in the 5-KL bioreactor ........................ 114
2.5 Discussion .................................................................................................................... 115
2.6 Conclusion .................................................................................................................... 118
2.7 Appendix ...................................................................................................................... 120
2.7.1 Perspective of biological effects caused by additional copper in the media. ........... 120
2.8 Reference ...................................................................................................................... 147
Chapter 3: Identification and Quantitation of Host Cell Proteins in Therapeutic Product ......... 156
3.1 Preface and Abstract..................................................................................................... 157
3.2 Introduction .................................................................................................................. 159
3.3 Materials and Methods ................................................................................................. 165
3.3.1 Chemicals and reagents ............................................................................................ 165
3.3.2 Sample preparation ................................................................................................... 165
3.3.3 LC-MS/MS ............................................................................................................... 166
3.3.4 Mass spectrometry parameters ................................................................................. 169
3.3.5 HCP identification and quantitative estimation through 2D LC-MS/MS ................ 170
3.3.6 HCP quantitation through PRM with isotopically labeled peptides............................. 171
3.3.7 Spectral assay generation .......................................................................................... 172
3.4 Results and discussion .................................................................................................. 173
3.4.1 Low pH RP LC gradient optimization ...................................................................... 173
3.4.2 HCP sample preparation protocol ............................................................................. 175
3.4.3 HCP identification and estimation by 2D LC-MS/MS with DDA for PA and CEX
samples for preliminary testing ............................................................................................... 175
3.4.4 HCP identification by 2D LC-MS/MS with DDA mode for UF/DF samples ......... 182
3.4.5 HCP quantitation based on PRM and isotopically labeled internal standards ......... 184
3.4.6 The generation of assay library from 2D-microflow-LC-MS-DDA ........................ 192
3.4.7 The insights provided by the preliminary results from 2D-LC-MS/MS-DDA and the
generation of the novel workflow ........................................................................................... 193
3.5 Conclusion .................................................................................................................... 198
3.6 References .................................................................................................................... 199
List of Figures
Figure 1- 1 The fundamental scheme of therapeutic protein production by mammalian cell lines. .. 6
Figure 1- 2 The classic secretory pathway for recombinant therapeutic protein secretion. ........... 8
Figure 1- 3 The platform of therapeutic protein production by CHO. .......................................... 13
Figure 1- 4 Several examples of bioreactor types for therapeutic protein production. ................. 15
Figure 1- 5 Two popular bioreactor feeding modes in biopharmaceutical industry. .................... 18
Figure 1- 6 The simplified purification platform based on chromatography for mAb and related
proteins such as Fc fusion proteins for downstream process. ....................................................... 22
Figure 1- 7 Information flow in cells and the connection between ‘Omics. ................................. 26
Figure 1- 8 The general workflow of proteomics analysis based on liquid chromatography
coupled with tandem mass spectrometry (LC-MS/MS). .............................................................. 33
Figure 1- 9 The scheme of an orbitrap. ......................................................................................... 38
Figure 1- 10 Construction of the Q Exactive. ............................................................................... 39
Figure 1- 11 The scheme of (A) data dependent acquisition (DDA) and (B) data independent
acquisition (DIA). ......................................................................................................................... 41
Figure 1- 12 The schemes of PRM and SRM processes. .............................................................. 45
Figure 1- 13 The categories of labeling approaches. .................................................................... 49
Figure 1- 14 The reaction of enzymatic labeling of 16O/18O (115). .............................................. 51
Figure 1- 15 Chemical reaction of dimethyl labeling. .................................................................. 53
Figure 1- 16 The scheme of (A) isobaric labeling reagents and (B) labeled peptide. .................. 54
Figure 1- 17 TMT 6-plex labeling reagents and the technology principle. .................................. 56
Figure 1- 18 Chemical structure of major isobaric reagents. ........................................................ 58
Figure 2- 1 The cell density, viability, titer productivity, and lactate profiles of the 5-KL and 20-
L bioreactors. ................................................................................................................................ 99
Figure 2- 2 Prediction of significantly repressed biological functions related to cell fate for the 5-
KL bioreactor using IPA. ............................................................................................................ 104
Figure 2- 3 Prediction of significantly repressed biological functions related to ROS generation
for the 5-KL bioreactor using IPA. ............................................................................................. 105
Figure 2- 4 Prediction of the formation of ROS for 5-KL vs 20-L scales with low and high
copper conditions. ....................................................................................................................... 108
Figure 2- 5 Results demonstrating hypoxic stress. ..................................................................... 111
Figure 2- 6 Western blotting of SOD1, a copper-binding enzyme, for the 5-KL and 20-L scales
and under different copper levels. ............................................................................................... 115
Figure 2- 7 The scheme of the summary that increased copper reveals hypoxia as a cause of
lower productivity on scale-up to industrial CHO bioprocess. ................................................... 119
Figure 2- 8 Prediction of significantly activated biological functions for the 20-L bioreactor
using IPA at Day 6. ..................................................................................................................... 123
Figure 3- 1 The number of proteins as a function of PSM counts for PA and CEX samples. ... 177
Figure 3- 2 The calibration curves of standard peptides from TAA SpikeTide Set. .................. 187
Figure 3- 3 Precursors and fragments of the peptide VHSFPTLK of protein disulfide-isomerase.
..................................................................................................................................................... 192
Figure 3- 4 The scheme of the DIA-to-PRM HCP analysis workflow. ...................................... 196
List of Tables
Table 2- 1 The number of identified and quantified proteins with the 5-KL and 20-L bioreactors
from the proteomic data analysis. ............................................................................................... 102
Table 2- 2 Numbers of differentially regulated proteins and metabolites at each time point for
both scales. Differential regulation is determined by comparing the high and low copper conditions of
a given scale. ............................................................................................................................... 102
Table 2- 3 MetaCore analysis of proteomic data of the 5-KL scale. The significant differentially
regulated proteins related to apoptosis and cell adhesion pathways ........................................... 112
Table 2- 4 Differentially regulated proteins and metabolites with the 5-KL and 20-L bioreactors.
..................................................................................................................................................... 124
Table 3- 1 The number of identified HCPs and peptides with different lengths of LC
separation gradient. ..................................................................................................................... 174
Table 3- 2 The list of identified HCPs in the PA sample with at least 50 PSM counts. ............ 177
Table 3- 3 The list of identified HCPs in the CEX sample with at least 10 PSMs and their
corresponding PSM counts in the PA sample. ............................................................................ 179
Table 3- 4 The identified HCPs in the UF/DF sample and their PSM counts in the PA sample*.
..................................................................................................................................................... 183
Table 3- 5 Peptide pairs chosen from SpikeTide Set TAA, their identification and calibration
linear range against the post-ultrafiltration digested sample ...................................................... 186
Table 3- 6 Target peptides and quantitation results for peptides from several HCPs. ................ 189
Table 3- 7 The quantitative information of the selected HCPs of the two biological replicates. 190
List of Abbreviations
%V %Viability
2-DE Two dimensional gel electrophoresis
ARSA Arylsulfatase A
BCA Bicinchoninic acid
CDACF Chemically defined, animal component-free
CEX Cation exchange chromatography
CHO Chinese hamster ovary
DDA Data dependent acquisition
DIA Data independent acquisition
DiART Deuterium isobaric amine-reactive tag
DiLeu N,N-dimethyl leucines
DTT Dithiothreitol
ESI Electrospray ionization
FT-ICR Fourier transform ion cyclotron resonance
GPx Glutathione peroxidase
GSH Reduced glutathione
HCPs Host cell proteins
HIC Hydrophobic interaction chromatography
HILIC Hydrophilic interaction chromatography
HPLC High performance liquid chromatography
IAM Iodoacetamide
IPA Ingenuity Pathway Analysis
iTRAQ Isobaric tag for relative and absolute quantitation
LC-MS/MS Liquid chromatography tandem mass spectrometry
Lys-C Lysyl endopeptidase
mAb Monoclonal antibody
PA Protein A
PAT Process analytical technology
PD Proteome Discoverer
pI Isoelectric point
PRM Parallel reaction monitoring
PSMs Peptide spectral matches
Qp Specific production rate
ROS Reactive oxygen species
RP Reversed phase
SEC Size exclusion chromatography
SOD1 Superoxide dismutase 1
TAA Tumor associated antigens
TEAB Triethylammonium bicarbonate
TMT Tandem mass tag
TOF Time of flight
VCD Viable cell density
Chapter 1: Overview of Therapeutic Protein Production by Chinese
Hamster Ovary Cells and Liquid Chromatography Mass
Spectrometry Based Quantitative Proteomics
1.1 Abstract
Development of recombinant therapeutic proteins has led to a significant revolution in
modern medicine. Success in large-scale biopharmaceutical production is the key to bringing
sufficient amounts of drugs to market, benefiting numerous patients. Chinese hamster
ovary (CHO) cell lines have become the predominant choice as the production host of therapeutic
proteins in the biopharmaceutical industry. Numerous efforts have been made to increase
therapeutic protein productivity through optimization of industrial upstream and downstream
processes. Despite the significant advances achieved, many challenges remain. Currently,
bioprocess development is still empirical, laborious, and time-consuming due to the limited
understanding of CHO biology. Cell cultures characterized at small scale often do not
behave similarly at production scale, hindering prediction of productivity in
production-scale bioprocesses. Meanwhile, the evaluation of low-abundance impurity residues
in the final product, especially host cell proteins (HCPs), requires detection methods with high
sensitivity and wide dynamic range, which cannot be achieved by traditional approaches. New
instrumentation and bioinformatics tools with the power to study and analyze proteins and
proteomes have been developed rapidly, providing powerful means to address challenges in the
biopharmaceutical industry. In this chapter, CHO-based therapeutic protein production,
including its principles and industrial manufacturing processes, is reviewed first, along with
advances and challenges to date. Then, quantitative protein and proteomics techniques based
on liquid chromatography coupled with mass spectrometry (LC-MS) and related quantitation
approaches are described.
1.2 Overview of recombinant therapeutic protein production
The revolution in modern medicine led by recombinant therapeutic protein products has
emerged over the past several decades. The first recombinant therapeutic protein drug, human
insulin from Eli Lilly, was approved for clinical use in 1982, marking the beginning of the success
of biopharmaceuticals (1). By 2014, over 200 biopharmaceuticals had been approved in the
United States and European Union and were commercially available, with estimated annual revenue
of more than 100 billion dollars and growing (1, 2). Recombinant therapeutic protein products
include monoclonal antibodies, recombinant fusion proteins, cytokines, hormones, and blood
products (3). To date, these products have provided effective therapies for a wide range of
previously refractory human diseases such as cancers and immunological disorders. Notably,
among the biopharmaceuticals approved in recent years, the fraction of monoclonal antibodies
(mAbs) and related proteins (e.g., Fc fusion proteins) has steadily increased, reaching around 50%
of overall biopharmaceutical approvals in the 2010-2014 time period (1).
Recombinant therapeutic proteins are produced by cell hosts that are genetically
engineered with recombinant DNA encoding the drug product. Therapeutic proteins need to be
synthesized in their biologically active forms to be effective therapies, requiring correct protein
folding (higher order structure) and post-translational modifications (PTMs). Mammalian cell
lines are competitive host candidates for certain products such as monoclonal antibodies (mAbs)
because other hosts, such as microbial hosts, may not be capable of generating critical PTMs,
especially glycosylation. Among the several choices of mammalian cell lines for recombinant
therapeutic protein production, Chinese hamster ovary (CHO) cells are the most widely used
and have become the workhorse of therapeutic protein production in industry (4). Since the
therapeutic drugs involved in this thesis are mAbs and related proteins, the discussion in this chapter
will focus on such biopharmaceuticals and CHO cell host expression systems.
The practical impact of therapeutic proteins in the real world could not have
been achieved without the success of large-scale biopharmaceutical production. After the
production cell lines are selected and developed, bioprocess operation is optimized and then scaled
up to production bioreactors (kiloliter, KL). With large-scale bioprocess cultivation, large amounts
of the drug product can be harvested from the biomass, followed by downstream purification.
Finally, the high-purity therapeutic protein products can enter the market.
Improvements in productivity, which can rely on genetic engineering of the cell host and
optimization of the manufacturing process, are critical to increasing availability to patients. However,
such bioprocess development is generally empirical due to the limited understanding of the biology
of CHO cells during production. At the same time, the evaluation of residual impurities,
especially host cell proteins (HCPs), in the drug product is of great importance for drug quality
control, considering the potential safety risks of such impurities for patients. However, HCP
identification and quantitation are still challenging because HCPs can be present at the low
part-per-million (ppm) level.
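To make the ppm scale concrete, the following minimal Python sketch converts a measured HCP mass
into ppm relative to the drug product (1 ppm is taken here as 1 ng of HCP per mg of product) and
estimates the HCP mass contained in one dose. The function names and all numerical values are
illustrative assumptions, not data from this work.

```python
def hcp_ppm(hcp_ng: float, product_mg: float) -> float:
    """Return HCP content in ppm, taken as ng of HCP per mg of drug product."""
    if product_mg <= 0:
        raise ValueError("product_mg must be positive")
    return hcp_ng / product_mg


def hcp_per_dose_ug(ppm: float, dose_mg: float) -> float:
    """Return the HCP mass (in micrograms) contained in one dose of dose_mg."""
    return ppm * dose_mg / 1000.0  # ng -> ug


if __name__ == "__main__":
    # Hypothetical measurement: 50 ng of HCP in 5 mg of purified product.
    level = hcp_ppm(50.0, 5.0)
    print(f"HCP level: {level:.1f} ppm")
    # Hypothetical 100 mg dose at that impurity level.
    print(f"HCP per 100 mg dose: {hcp_per_dose_ug(level, 100.0):.2f} ug")
```

At 10 ppm the HCPs make up only one part in 100,000 of the protein mass, which illustrates why their
detection against the background of the therapeutic protein is demanding.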
New instrumentation and bioinformatics tools have been applied to global cell biology, or
“omics”, studies including MS-based quantitative proteomics, metabolomics, and genomics. As
powerful tools, these omics methodologies are able to provide comprehensive characterization of
CHO cells and the therapeutic protein product, potentially improving the understanding of the
manufacturing process. Especially in proteomics, the development of high resolution mass
spectrometry (MS) as well as high efficiency liquid chromatography (LC) separation provides
versatile LC-MS based strategies for shotgun proteomics as well as targeted protein quantitation,
platforms which have high potential not only to study the systems biology of CHO expression systems
but also for HCP identification and quantitation.
This chapter will first briefly describe the principle of recombinant therapeutic protein
synthesis by mammalian cell lines, and then discuss the general platform of how
biopharmaceuticals turn into life-saving formulated drugs. Common upstream and downstream
processes will be discussed, since this information provides the big picture of which issues need
to be addressed and what strategies could be potential solutions. Current advances in LC-MS based
protein and proteomics analysis and multi-‘omics studies will be introduced, as LC-MS based
methods can address many of the challenges in the biopharmaceutical industry.
1.3 Principle of recombinant biopharmaceutical synthesis by mammalian cell expression
systems.
Recombinant protein products, especially mAbs and related proteins, are generally
synthesized by mammalian cell lines. These hosts include hybridoma cell lines, mouse myeloma
cells, human embryonic kidney 293 cells (HEK-293), baby hamster kidney cells (BHK-21), and
Chinese hamster ovary cells (CHO) (5-8).
Figure 1- 1 The fundamental scheme of therapeutic protein production by mammalian cell lines.
DNA sequence encoding the protein drug of interest is cloned into the expression vectors, and
these vectors are then introduced into host cells and integrated into the genome. The transfected
host cells undergo protein synthesis through transcription, translation, and subsequent
post-translational modification. The resultant recombinant protein is secreted into the culture media.
Through several rounds of screening, clones which can express a relatively high amount of the
protein drug are selected as possible production cell line candidates.
The scheme of the fundamental principles for therapeutic protein production by
mammalian cell lines is shown in Figure 1-1. The DNA sequence encoding the protein drug of
interest is cloned into the expression vector, whose sequence is designed and optimized
for protein expression in host cells. The vector carrying the DNA sequence of interest is introduced
into the host cell and then integrated into the host genome. Then, the transfected host cell
undergoes protein synthesis. The synthesized recombinant therapeutic protein is transported
through the classical secretory pathway out of the cell. The protein product can then be harvested
from the cell culture media. Clones which can express a relatively high amount of protein with a
good cell growth profile, while generating the desired protein structure (including glycosylation),
are selected as the production cell line candidates through multiple rounds of selection screening.
The classical secretory pathway is illustrated in Figure 1-2. The genes encoding the protein
of interest are transcribed into mRNA and then translated into nascent peptides of the target protein.
The protein is designed to be expressed with a signal peptide at the N-terminus, which can be
recognized by the “signal recognition particle”, a protein-RNA complex, leading to translocation
of the nascent peptide into the endoplasmic reticulum (ER) (9). After the growing peptide chain
resides within the ER membrane, the signal peptide gets cleaved by the signal peptidase (9), and
the translation process continues. Within the ER, proteins undergo folding and PTM formation
such as disulfide bonding and glycosylation, and then are transported to the Golgi apparatus (10). The
resultant proteins are encapsulated in a vesicle formed from the Golgi apparatus, and delivered to
the cell membrane. The vesicle fuses with the cell membrane, releasing the protein into the
extracellular space.
Figure 1- 2 The classic secretory pathway for recombinant therapeutic protein secretion.
The gene which encodes the protein of interest is transcribed into mRNA, and the translation
process starts at the ribosome. A short peptide, called the signal peptide, is present at the N-terminus of
the nascent peptide of the therapeutic protein and leads the nascent peptide towards the secretory
pathway. The signal peptide is recognized by the signal recognition particle, resulting in
translocation of the nascent peptide into the ER. Translation of the therapeutic protein then
continues in the ER, where protein folding and PTM formation occur. The resultant proteins
are then transported to the Golgi apparatus and released into the extracellular space.
A major advantage of collecting protein products in the media instead of accumulating
them inside the host cells is to facilitate the subsequent purification. Biomass, including cells and
debris, can be removed simply by centrifugation and filtration. It is worth mentioning that the
media should be protein-free or low in protein to maximize the advantage of
secreting protein drugs. Mammalian cell lines generally require animal component-containing
media, such as media supplemented with bovine serum, for cell growth because of their requirement
for hormones and growth factors. Potential contamination of the drug product by animal
component-containing media raises safety concerns since the protein drugs are secreted into the media.
Clearly, animal component-free media are highly desired for commercial drug production, and mammalian cell
lines which can be cultured in chemically defined media are primary choices.
1.4 CHO as a therapeutic protein production host.
The success of CHO-based therapeutic products began in 1987 with the approval of the first
therapeutic protein produced in a CHO cell line, recombinant tissue plasminogen activator (r-tPA) from
Genentech (11). To date, nearly 70% of all therapeutic proteins are produced in CHO cell lines
(4).
1.4.1 Advantages of CHO expression system for commercial recombinant therapeutic
protein production
CHO cells have several important features that make them the most popular and widely used
mammalian production cell line in industry. First, CHO cells readily incorporate
transfected genes and are able to express large amounts of the desired protein. Secondly, as a
mammalian cell line, CHO cells can provide appropriate glycoforms for glycoprotein
therapeutics, which cannot be easily achieved by common microbial hosts such as Escherichia
coli. Over the past three decades, CHO cell lines have built up a record of producing proteins
with compatible glycoforms that are bioactive in humans. Thirdly, CHO is not a susceptible host for
a large number of human pathogenic agents. A report in 1989 showed that CHO did not propagate
at least 44 human pathogenic viruses including HIV and polio (4). Consequently, these
pathogenic viruses, which could infect patients, are not expected to be present in CHO-based
therapeutic products.
CHO cells also offer high adaptability for industrial large-scale production.
Although the original CHO cells are adherent in culture, CHO cells are able to grow
to high densities in suspension, making it possible to scale up to thousand-liter bioreactors.
Also, as noted, CHO cells can grow in chemically defined, animal component-free (CDACF)
media. The complexity of chemically defined media is generally limited, benefiting the
downstream purification with fewer contaminants from the media to remove and to monitor.
A successful record of knowledge and expertise regarding the safety and efficacy of CHO-based
therapeutics on the market has been accumulated over the past two decades. This information eases
FDA approval of new drugs made in CHO cell lines. In addition, the experience with and understanding
of upstream CHO cell growth and downstream purification make CHO cells the likely
first choice for industrial production for the next several decades. Consequently, the study of
CHO biology is important to support the biopharmaceutical industry with productivity
improvements for current products and for new drugs in the future. In Chapter 2, the underlying biology
of CHO cell lines is studied at both the production and laboratory scales.
1.4.2 A brief history of CHO cell lines applied to biotech industry.
CHO cells were isolated from the ovary of a female Chinese hamster and first established in
culture plates by Dr. Theodore T. Puck in 1957 (12). After a period of time, the cells underwent
spontaneous immortalization, likely due to a genetic change (13). These original immortalized
CHO cells were then provided to several laboratories, and many strains of CHO were
generated from this original cell line and its subsequent generations, for example, CHO-K1,
CHO-DXB11, and CHO-DG44. CHO-DXB11 was later derived from CHO-K1. The original CHO cells
could only grow in adherent culture, but many of their derivatives are able to grow in suspension
culture. To date, these strains are widely employed as parental cell lines in the biopharmaceutical
industry (4, 13). Importantly, CHO cells were reported to grow successfully in serum-free
media as early as 1977 (14). In Chapter 2, the CHO production cell line under investigation is
CHO-DG44.
1.5 The general platform of therapeutic protein production by CHO cells
To bring CHO cell lines into industrial production of a therapeutic protein, one needs
to 1) develop a production cell line in the lab and then 2) transfer this cell line to large-scale
industrial production. The latter part, the industrial production platform, contains two major parts:
1) scale-up of the bench-top cultivation to large-scale bioreactors, including the development
and optimization of the bioprocess operation, and 2) purification of the protein product after harvest. The
large-scale cultivation is called the upstream process, and the purification is the downstream
process. One example of this platform applied to therapeutic antibodies and related drugs is shown
in Figure 1-3. All of the listed steps play a critical role in the success of protein drug production
with high quantity and quality within reasonable cost and time. Current knowledge and
development of each step will be introduced in the following sections, as well as the remaining
challenges.
Figure 1- 3 The platform of therapeutic protein production by CHO.
A. Production cell line development at the bench-top. B. Therapeutic protein production at
industrial large scale, shown for the example of therapeutic mAbs and related proteins such as Fc
fusion proteins (15). Reprinted with permission from Shukla et al. (15). (a) Upstream bioprocess.
After the cells are taken from cell banks and thawed, a series of expansion steps are performed
with seed bioreactors, and then cells are transferred to the production bioreactor. The biomass is
removed from the product by centrifugation and filtration. (b) Downstream purification. For mAbs
and related proteins, Protein A chromatography is the first step and removes most of the impurities.
At least one additional polishing chromatography step, such as ion exchange, reduces the
impurities further. Ultrafiltration/diafiltration (UF/DF) is applied to transfer the product into
the formulation buffer at the desired concentration.
1.6 Industrial platform of therapeutic protein production by CHO cells
With the success of biopharmaceuticals in modern medicine, the large patient population,
and the generally high doses, especially for mAbs and related proteins, there is a need for a very
large amount of product with consistent and reproducible quality. The biopharmaceutical industry
is under pressure to bring sufficient product to market at lower cost to payers. Thus, the
development of a high-yielding, scalable and robust biopharmaceutical production process is
always a significant focus in industry. This demand must be met by both upstream
processing with high titer at large manufacturing scale and downstream processing with efficient
purification of drug substances. In the following section, the general design of upstream and
downstream processes will be described, focusing on mAbs and related protein production by CHO cells.
1.6.1 The platform of the upstream process
An understanding of the upstream process is of great importance for studying the CHO growth
profile and underlying biology. The engineering design determines the cell cultivation
environment, and this information provides hints about potential sources of cell growth stress
and their causes. In Chapter 2, the CHO cells under investigation were cultured in stirred
tank bioreactors in fed-batch mode, both of which will be introduced in this section.
The upstream process has seen significant advances. The production bioreactor has been
scaled up to as large as 200,000 L to date (16), and the productivity has improved from 0.05 to
2-10 g/L (3). The general workflow of the upstream process is shown in Figure 1-3B(a). The
upstream process development and optimization generally focuses on 1) designing the production
bioreactor configuration and 2) optimizing the bioprocess control such as pH, temperature,
medium and feeding. In the upstream process, the bioreactor design and manufacturing control
determine the culture conditions such as nutrient supply and dissolved oxygen concentration,
which are critical for product quality and quantity.
Figure 1- 4 Several examples of bioreactor types for therapeutic protein production.
A. Stirred tank bioreactor, B. Airlift bioreactor, C. Disposable wave reactor (bench-top scale).
Reprinted with permission from Jain et al. (17).
For the bioreactor, an ideal design addresses several factors: 1) sufficient mass
transfer; 2) adequate oxygen supply; and 3) low shear stress (18). One of the most popular bioreactor
designs for suspension culture is the stirred tank, widely used for biopharmaceutical production. As
shown in Figure 1-4A, the impeller blades stir to mix oxygen and nutrients within the culture
medium inside the bioreactor. The agitation rate is optimized as well as the shape and diameter of
the impeller blades in order to reach acceptable mass and gas transfer and to minimize cell lysis
caused by turbulence. This stirred-tank bioreactor can be scaled up conveniently, and the product
quality can be controlled relatively easily. It is one of the most important reactor designs in
industrial production.
The airlift bioreactor (Figure 1-4B) is another large-scale reactor design compatible with
suspension culture, in which mass transfer and oxygen mixing are achieved by introducing gas
bubbles that move through the bioreactor. Gas (air or another gas mixture) is introduced into one part of
the reactor, and a non-gassed circulating flow is generated in the other region of the reactor (Figure
1-4B). The geometric design of the reactor and the operational parameters can be optimized to
increase mass transfer efficiency and to reduce shear stress (19). Without the mixing blades used in
the stirred-tank reactor, the airlift bioreactor has lower shear stress and higher energy efficiency.
Disposable bioreactors are employed not only in small-scale production but also at the
thousand-liter scale. Compared to traditional hard-piped bioreactor configurations, this single-use
bioreactor system has advantages including lower capital investment, more flexibility, higher
process replication, and lower risk of cross-contamination without the need for cleaning and
sterilization (17, 20). The economic benefits have encouraged the use of disposable bioreactors
in the biopharmaceutical industry, and several vendors have provided commercial disposable
stirred-tank bioreactors up to the 1,000 L scale (21). The trend is toward adoption of the disposable
bioreactor, and it will likely be the future of biopharmaceutical production. There are several designs for
disposable bioreactors, including wave bioreactors, orbital shaken bioreactors, and stirred-tank
bioreactors (17, 20, 22); see Figure 1-4C. A working scale of up to 2,000 L has been reported (21).
Despite the current advances, challenges remain, such as the limited scalability, restricted design
options, and the lack of standardization (22). Moreover, extractables from the disposable
bioreactor material may affect cell growth performance and drug quality.
Besides the bioreactor design, the bioprocess operation is also critical for cell growth and
drug productivity, including growth media composition, feeding, pH, temperature, etc. The feeding
methodology is one of the most important factors for optimization of the process. Ideally, the
nutrients should be at a sufficient level for cell growth without affecting the product quality, and the
accumulation of metabolic waste should be minimized as much as possible. Other factors such as pH
should also be kept at suitable levels for cell growth. The accumulation of lactate and ammonia
has been widely reported to impair cell growth performance (17, 23, 24). Notably, the
glycosylation pattern of the glycoprotein product can be affected by nutrient starvation, media
components, metabolic waste accumulation, and pH (25-29), potentially compromising the quality of
the drug product.
Fed-batch and perfusion are the most popular feeding modes currently applied in industry
(8) (Figure 1-5). In fed-batch (Figure 1-5A), nutrients necessary for the culture are added
intermittently or continuously into the bioreactor during the cultivation time. The drug product is
usually harvested at the end of the operation, resulting in a high concentration of product in the
medium. Fed-batch is also easy to adapt for use with different clones. The nutrients can be maintained
at certain levels, and the pH of the media can be controlled, but the waste will accumulate in
the reactor (17, 23, 30). In perfusion mode (Figure 1-5B), on the other hand, fresh media is added,
and the cell-free spent media is removed from the bioreactor (31). This process can decrease waste
accumulation and keep the nutrients at the desired level. Cultivation time can last much longer than
in fed-batch. However, although high cell density is reached, lower cell viability is observed due to the
accumulation of dead cells and released intracellular material (8). A procedure called “bleeding”,
which involves removing cell-containing media at a low flow rate, is necessary to decrease the
cell death rate and increase viability. Viable cell density, however, decreases with “bleeding”
because of the unavoidable removal of viable cells during the process (17).
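The contrast in waste accumulation between the two feeding modes can be illustrated with a
simple mass balance. The sketch below is a deliberately idealized model, assuming a constant
volumetric waste production rate and a constant culture volume: without outflow (fed-batch) the
waste concentration grows linearly, dC/dt = r, whereas with a constant dilution rate D (perfusion)
it follows dC/dt = r - D*C and levels off at r/D. The function names and parameter values are
hypothetical and chosen only for illustration.

```python
import math


def waste_fed_batch(rate_mM_per_day: float, t_days: float) -> float:
    """Waste concentration (mM) with no outflow: C(t) = r * t."""
    return rate_mM_per_day * t_days


def waste_perfusion(rate_mM_per_day: float, dilution_per_day: float,
                    t_days: float) -> float:
    """Waste concentration (mM) with dilution rate D: C(t) = (r / D) * (1 - exp(-D * t))."""
    if dilution_per_day <= 0:
        raise ValueError("dilution_per_day must be positive")
    plateau = rate_mM_per_day / dilution_per_day
    return plateau * (1.0 - math.exp(-dilution_per_day * t_days))


if __name__ == "__main__":
    rate = 2.0      # assumed waste production, mM per day
    dilution = 1.0  # assumed perfusion rate, reactor volumes per day
    for day in (1, 7, 14):
        fb = waste_fed_batch(rate, day)
        pf = waste_perfusion(rate, dilution, day)
        print(f"day {day:2d}: fed-batch {fb:5.1f} mM, perfusion {pf:4.2f} mM")
```

In a real culture the cell density, and therefore the production rate, changes over time, so the
sketch captures only the qualitative difference between accumulation and steady-state removal.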
Figure 1- 5 Two popular bioreactor feeding modes in biopharmaceutical industry.
A. Fed-batch. B. Perfusion. Modified and reprinted with permission from Birch et al. (24).
1.6.2 Challenges of upstream process
Cell cultivation conditions defined by both bioreactor design and operation control are
crucial in achieving high productivity. Large-scale bioreactor facilities are expensive and cannot
easily be rebuilt or significantly readjusted without high cost. As a result, in practice,
optimization of the bioprocess operation becomes the key factor for improving culture performance with
existing bioreactor configurations.
For bioprocess development, cell culture is first characterized at the laboratory scale and
then scaled up to a large production bioreactor. The first notable challenge is that the process
development is to some extent empirical. Currently, the understanding of CHO biology is limited,
hindering the prediction of cell responses to environmental perturbations. Consequently,
bioprocess optimization requires extensive experimentation, which is laborious and
time-consuming. To address this low resource and time efficiency, small volume reactors at the
liter scale are widely employed for bioprocess development because of their easy handling and low
cost (32). Process development has even been driven to the smaller milliliter scale, combined with
robotics technology to reach high throughput (33, 34).
Using these small-scale approaches to bioprocess development leads to another
significant challenge of the upstream process: scalability. Clones developed in bench-top bioreactors
may not behave in a similar way after scaling up to large-scale bioreactors with seemingly identical
parameters, hindering prediction of productivity in the production-scale bioprocess. In particular, it
is known that productivity is often lower in large (kL) relative to small (L) reactor scales (17, 35,
36). The main reason is that, due to the restrictions of bioreactor physical design, not all of the
variables in the bioprocess can be held constant simultaneously during the scale-up from a liter-level
scale to a thousand-liter bioreactor. Another reason is that some variables that strongly influence
culture performance are not readily measurable, hindering efforts at their evaluation and control.
For example, constant mixing time in the reactor usually cannot be used
as a scaling criterion. Mixing time is an overarching indicator of mass transfer and gas mixing
efficiency, but it is not convenient to measure. Another issue is that, to keep the mixing time
of the small scale, the specific power input required at the large scale can be impractically high (35, 37).
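The growth of the specific power input at constant mixing time can be estimated with textbook
stirred-tank relations. The sketch below assumes geometrically similar tanks in the turbulent
regime, a constant impeller power number (so that power scales as N^3 * D^5 for agitation speed N
and impeller diameter D), culture volume proportional to D^3, and a mixing time inversely
proportional to N. Under these assumptions, holding the mixing time constant means holding N
constant, and the power per unit volume grows as D^2. These correlations and the function name are
assumptions introduced for illustration; they are not taken from references 35 or 37.

```python
def power_per_volume_ratio(volume_small_l: float, volume_large_l: float) -> float:
    """Ratio of specific power input (large/small) needed to keep the agitation
    speed, and hence the turbulent mixing time, unchanged on scale-up.

    Assumes geometric similarity: P ~ N**3 * D**5 and V ~ D**3, so at constant N
    the specific power P/V scales as D**2 = (V_large / V_small) ** (2/3).
    """
    if volume_small_l <= 0 or volume_large_l <= 0:
        raise ValueError("volumes must be positive")
    length_ratio = (volume_large_l / volume_small_l) ** (1.0 / 3.0)
    return length_ratio ** 2


if __name__ == "__main__":
    # Hypothetical scale-up from a 2 L bench-top reactor to a 2,000 L bioreactor.
    ratio = power_per_volume_ratio(2.0, 2000.0)
    print(f"Specific power input must increase about {ratio:.0f}-fold")
```

For this hypothetical 1,000-fold increase in volume, the specific power input would have to rise
roughly 100-fold, which illustrates why constant mixing time is rarely a practical scaling criterion.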
Specifically, CHO cells do not have a protective cell wall, and agitation in the bioreactor
must be controlled within a certain range to prevent cell damage (38). Thus, in large-scale CHO
cultivation, the relatively low agitation inevitably leads to limited mass transfer and gas-phase
mixing, resulting in substrate gradients (35, 38-40), which can impact cell growth
performance and product quality. Oxygen transfer efficiency is one of the critical factors in the
scale-up process. Oxygen generally enters the bioreactor as sparged bubbles. Because of the limited
solubility of the gas in aqueous solution, only small amounts of oxygen can be dissolved in the culture
media. Meanwhile, cells consume oxygen rapidly before it can be dispersed across the entire
culture (39). Consequently, homogeneity of dissolved oxygen is generally compromised in
large-scale bioreactors. In Chapter 2, different CHO cell growth profiles and behaviors during scale-up
were investigated, and the limited homogeneity of dissolved oxygen is shown to be the cause of
the observed differences relative to the lab-scale process.
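The balance between oxygen supply and demand described above is commonly written as an oxygen
transfer rate, OTR = kLa * (C_sat - C_L), set against an oxygen uptake rate, OUR = qO2 * X, where
kLa is the volumetric mass transfer coefficient, C_sat the saturation concentration, C_L the
dissolved oxygen concentration, qO2 the specific consumption rate, and X the viable cell density.
The following minimal sketch evaluates this standard balance; the function names and all parameter
values are placeholders chosen for illustration and are not measurements from this work.

```python
def oxygen_transfer_rate(kla_per_h: float, c_sat_mM: float, c_liquid_mM: float) -> float:
    """OTR in mmol/L/h from the film model: kLa * (C_sat - C_L)."""
    return kla_per_h * (c_sat_mM - c_liquid_mM)


def max_supported_cell_density(kla_per_h: float, c_sat_mM: float,
                               c_min_mM: float, q_o2_mmol_per_cell_h: float) -> float:
    """Cell density (cells/L) at which uptake (qO2 * X) equals the transfer
    rate when the dissolved oxygen sits at the minimum acceptable level."""
    if q_o2_mmol_per_cell_h <= 0:
        raise ValueError("q_o2_mmol_per_cell_h must be positive")
    return oxygen_transfer_rate(kla_per_h, c_sat_mM, c_min_mM) / q_o2_mmol_per_cell_h


if __name__ == "__main__":
    kla = 10.0    # assumed kLa, 1/h
    c_sat = 0.20  # assumed saturation concentration, mM
    c_min = 0.08  # assumed minimum dissolved oxygen, mM
    q_o2 = 3e-10  # assumed specific uptake, mmol per cell per hour
    otr = oxygen_transfer_rate(kla, c_sat, c_min)
    x_max = max_supported_cell_density(kla, c_sat, c_min, q_o2)
    print(f"OTR = {otr:.2f} mmol/L/h, supports about {x_max:.1e} cells/L")
```

Because this balance treats the reactor as well mixed, it gives an average picture only; the
gradients discussed above mean that local oxygen availability in a large tank can fall below it.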
Despite the significant advances achieved for the upstream process,
many challenges remain. Because the underlying biology is not fully understood, it is not
easy to predict or model the cell growth characteristics and production, especially in large-scale
bioreactors. The understanding of how bioprocess operation can influence product quality (e.g.,
glycosylation patterns) is also limited. Additional optimization, with monitoring of
the quality of the drug product, is also required. All of these steps must be performed for every new drug
product.
1.6.3 The platform of downstream process
After the upstream cultivation, the drug product carries along undesired impurities, which can
be product-related and/or process-related. The product-related impurities are
product variants with properties different from those of the desired product, such as degraded or aggregated
protein drug or molecules with undesired PTMs or misfolding (41). The process-related
impurities are defined as those introduced by bioprocess manufacturing. They include host cell
proteins (HCPs), DNA/RNA, lipids, small molecule chemicals from media and host cells, and
leachables (e.g., Protein A) (41). Impurities co-purified with the drug product can pose potential
safety risks for patients. The goal of downstream processing is to remove these impurities as
much as possible, with high recovery of the protein product while maintaining activity. The
introduction to the downstream process in this section will focus on the purification of mAbs and
related proteins. The downstream purification of such proteins has evolved towards a common
platform in industry due to the general nature of this type of drug product. One example of
downstream purification is shown in Figure 1-3B(b). However, because of the various properties
of different products, there is no universal template or procedure which can be employed directly.
mAbs and related protein products are secreted into the media, and hence, the removal of
cells and cell debris is the first step of purification, which is also called cell culture harvest. For
large scale production, centrifugation is employed because it is economical and easily scalable for
large volumes (42). Then, depth filtration, which uses several layers of porous
material as the filtration medium to trap particles, follows centrifugation to remove any residual
cellular debris. A series of capture and polishing steps are performed to eliminate other impurities.
This part of the purification process can be categorized into two groups, 1) chromatography-based
and 2) non-chromatographic, both of which will be introduced in detail later. Additionally, drug
product purification also requires the removal of potential viruses, especially for the mammalian
cell host, so typically at least two orthogonal steps of viral inactivation and size-based filtration
are needed in the downstream process (15, 24). To complete the downstream process, the last
step, ultrafiltration/diafiltration (UF/DF), is employed to reduce the storage volume and to bring
the drug product into the formulation buffer through buffer exchange (15). A membrane with a
specific molecular weight cutoff is used, and the transmembrane pressure and cross-flow rate during the
filtration process are controlled for most mAbs and related proteins (42).
Chromatography-based approaches have high purification efficiency and have been widely
used in the biopharmaceutical industry. In the majority of cases, Protein A affinity chromatography
is used at the beginning as the capture step, followed by at least one ion exchange step
and then hydrophobic interaction chromatography (HIC) or size exclusion chromatography (SEC) as polishing
steps. An example of chromatography-based downstream purification which is widely adopted
by the biopharmaceutical industry is shown in Figure 1-6.
Figure 1- 6 The simplified purification platform based on chromatography for mAb and related
proteins such as Fc fusion proteins for downstream process.
The purification of mAbs and related proteins usually starts with Protein A affinity chromatography,
in which most of the impurities are removed. Then one or several other chromatography
steps, called polishing steps, such as ion exchange, hydrophobic interaction chromatography (HIC),
or size exclusion chromatography (SEC), are applied to purify the drug product further. Finally, an
ultrafiltration/diafiltration (UF/DF) step is employed to reduce the storage volume and to bring
the drug product into the formulation buffer through buffer exchange.
Protein A binds specifically to the Fc region of IgG molecules with an affinity constant of
about 10^8 M^-1 (43). The drug product is captured by Protein A, and many impurities
flow through the column. Commonly, this Protein A capture step yields drug product with
more than 98% purity (15, 44). The residual HCPs and DNA, protein drug aggregates, and
low molecular weight contaminants such as leached Protein A fragments are generally removed
by the subsequent polishing steps. The choice of chromatography mode depends on the nature of
the protein drug and the impurities. The detailed procedure for each step requires optimization for
different drug products.
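To give a sense of what an affinity constant of about 10^8 M^-1 implies, the sketch below
converts it to a dissociation constant and evaluates the fraction of IgG bound for a simple 1:1
equilibrium with the free ligand in excess, f = Ka*[L] / (1 + Ka*[L]). This is an idealized solution
equilibrium rather than a model of a chromatography column, and the function names and the ligand
concentration are illustrative assumptions.

```python
def dissociation_constant(ka_per_M: float) -> float:
    """Return Kd (M) as the reciprocal of the affinity constant Ka (1/M)."""
    if ka_per_M <= 0:
        raise ValueError("ka_per_M must be positive")
    return 1.0 / ka_per_M


def fraction_bound(ka_per_M: float, free_ligand_M: float) -> float:
    """Fraction of IgG bound at equilibrium for 1:1 binding, ligand in excess."""
    x = ka_per_M * free_ligand_M
    return x / (1.0 + x)


if __name__ == "__main__":
    ka = 1e8  # affinity constant, 1/M
    print(f"Kd = {dissociation_constant(ka) * 1e9:.0f} nM")
    # Hypothetical free Protein A ligand concentration of 1 micromolar.
    print(f"Fraction of IgG bound: {fraction_bound(ka, 1e-6):.3f}")
```

With these assumed values the dissociation constant is about 10 nM, and roughly 99% of the IgG is
bound at a free ligand concentration of 1 micromolar.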
The chromatography-based approaches are generally expensive. To pursue cost efficiency,
non-chromatographic purification methods have been recently developed. Although a number of
such techniques have been reported, these approaches have not been widely employed in industry.
Aqueous two-phase extraction (ATPE) has been investigated to separate the product from the
biomass (45). The advantages include scalability and high capacity. It is also easy to perform
compared with Protein A chromatography. However, due to the limited understanding of the
underlying mechanism of ATPE and the involvement of complex interactions among multiple species,
ATPE design and process optimization are difficult (20, 45). Precipitation of the
proteins of interest is another approach. It has been applied in laboratory-scale protein purification
and is promising for large scale. However, the selectivity of this technique needs improvement.
Approaches using co-purification with charged polymers have been reported to improve the
selectivity and precipitation efficiency (15, 20). Other techniques such as crystallization and
charged UF membranes have also been reported.
1.6.4 Challenges of downstream process
The biopharmaceutical industry has been providing safe drugs to patients for decades,
indicating the success of the downstream purification process in eliminating toxic impurities.
However, several challenges remain. First, the capacity of downstream purification is becoming a
bottleneck of therapeutic protein production. The increasingly high titers reached through
advances in the upstream process dramatically increase the burden on downstream purification.
Optimization of upstream operation, such as medium adjustment or pH/temperature
control, can improve productivity without significantly raising the cost. However, the downstream
capacity cannot be increased significantly by optimizing purification protocols, especially with
chromatography-based approaches. As a result, the cost of the downstream process scales at
least linearly with the increased titer values. Moreover, possible changes in impurity profiles
due to upstream optimization may require updating the existing purification strategies,
increasing the time and resource input for the downstream process. To date, the downstream process
accounts for a significant proportion of the total therapeutic protein manufacturing cost, as much as
50%-80% (46).
The other difficulty is detecting and monitoring the impurity species remaining in the
products, especially HCPs. After several purification steps, impurities in the final product are at
very low levels. For example, HCPs can be at the ppm level and DNA at the ppb level for the final
drug product (24). Identification and quantitation of the impurities are challenging within the high
background of the therapeutic protein. Getting this information about impurities is of importance
for risk assessment. Moreover, being able to monitor the impurity levels along each purification
step allows the knowledge-based optimization of the downstream process and leads to efficient
removal of critical impurities. Currently, residual DNA impurities can be detected and quantified
with PCR related techniques such as real-time PCR (47, 48). The detection and quantitation of
HCPs will be discussed in the next section.
Notably, although the impurities are required to be removed as much as possible, there is
no universal standard of the maximum allowable levels of impurities remaining in the final product
from a regulatory perspective. Each drug is examined and evaluated for risk assessment on a case-
by-case basis according to the patient population, dose, and route (41). This fact reflects the
complexity of the characterization and risk assessment of the therapeutic protein products.
1.7 Current advances for CHO-based therapeutic protein production
1.7.1 Understanding CHO cell production and CHO cell engineering through ‘Omics
approaches
High titer values of protein drug are not only related to high specific productivity, but also
directly affected by the viable cell density within the cultivation time. For the upstream process,
cell line development and manufacturing operation optimization remain empirical, one of the
major reasons being the limited understanding of the underlying biology of CHO host. Currently,
the results of the study of CHO metabolic pathways have benefited the long-term goal of increasing
the productivity and developing new protein drug products. The ‘Omics studies, including
genomics, transcriptomics, proteomics, glycomics, fluxomics, and secretomics, have been proven
as powerful tools to understand CHO biology and to provide valuable information for the
biopharmaceutical industry. Cell engineering based on the critical metabolic pathways related to
protein expression and cell fate helps increase the quality and quantity of the drug product.
Figure 1-7 Information flow in cells and the connection between ‘Omics.
The sequencing of the CHO genome by Xu et al. in 2011 was a breakthrough in CHO systems
biology (49). In this genomic sequence database, the CHO-K1 cell line was annotated with 24,383
genes (49). It revealed that CHO-K1 does not express many viral entry genes, explaining why
CHO cells resist many virus infections (49). Since then, the Chinese hamster genome and six CHO
cell lines, which are derived from CHO-K1, CHO-DG44, and CHO-S, were sequenced (50). With
this advanced genome information, other ‘Omics such as transcriptomics, proteomics and
metabolomics can provide more detailed global information on mRNAs, proteins, and metabolites
with better accuracy, and offer more precise information for the understanding of the industrially
relevant cell lines. It would not only be useful to discover key metabolic pathways but also to
reveal gene targets for cell line engineering and protein biomarkers for cell status evaluation and
bioprocess monitoring. The connection of multi-‘Omics is shown in Figure 1-7. It is also a
reflection of information flow in cells.
Transcriptomics is the analysis of mRNA expression levels, i.e., the genomic information
at the transcription level (51). Technologies including DNA microarray, RNA sequencing (RNA-
Seq), and microRNA (miRNA) profiling are employed for transcriptomic analysis (52). Depending
on the cell type, CHO cells can turn on or off their own sets of genes, leading to a cell-specific
gene expression pattern. Thus, different CHO cell lines can yield different transcriptomes as well
as other ‘Omics information even if they share similar genomes (3). In proteomics, the entire
complement of expressed proteins and/or their expression levels are analyzed (53). Not all protein
expression levels are necessarily directly correlated to the corresponding mRNA levels (3). Since
proteins are direct executors in the biological system, the change of protein expression profile can
reveal the disturbance of metabolic pathway, providing the first-hand information of CHO cell
growth status. The secretome is a subset of the proteome. The secreted proteins can regulate the
interactions of cell-to-cell or cell-to-extracellular matrix and may affect the cell growth behavior
(54). Mass spectrometry-based proteomic analysis coupled with two-dimensional liquid
chromatography and/or gel electrophoresis has been developed as one of the most powerful tools
for proteomics analysis. Glycomics involves characterization of glycan structure of the CHO
system, including the protein glycosylation pattern. The glycosylation patterns affect protein
functions, resulting in disturbance of protein drug quality.
In the biopharmaceutical industry, all of these ‘Omics studies of CHO eventually aim at
increasing productivity and/or quality of the protein product. The deep understanding obtained
from these studies can guide strategies of cell engineering by recognizing biomarkers for desired
phenotypes. For example, Doolan et al. reported the transcriptomic study on ten CHO-K1 cell lines
with a range of growth rates, which are derived from a single parent cell line (55). They reported
that the high growth rate was a multi-gene effect, involving several cellular processes including
upregulation of DNA replication and mitosis and downregulation of cell proliferation (55). The
regulation of relevant genes ALDH7A1 and CBX5 agreed with previous studies (55, 56),
indicating that they could be potential biomarkers for high growth rate. Moreover, the
understanding of cellular response to perturbations in the environment can improve bioprocess
operation control. One example is a study from Bristol-Myers Squibb by using nuclear magnetic
resonance (NMR) to investigate the metabolome of CHO production cell lines cultured in
production scale (5-KL) and benchtop scale (7-L) bioreactors (36). In this study, with the same
media and bioprocess operation parameters, the CHO cells showed higher viability and
productivity in the small bioreactor, and 30 metabolites were determined to be related, leading to
a high reliance on glycolysis (36). It, for the first time, revealed the potential underlying biology
changes during scalability to a production scale.
Besides analysis with each individual ‘Omics technique, a new trend of using combined ‘Omics
techniques to obtain comprehensive information has emerged. The combined global transcriptome,
targeted metabolome analysis, and targeted protein analysis have been applied to study the
erythropoietin production in CHO-K1 cells under different growth conditions, suggesting the
bottleneck of heterologous protein production is in energy metabolism (57).
In Chapter 2, the combined global proteomics and metabolomics were employed to
investigate the causes of different phenotypes shown by a CHO cell line in two different scales of
bioreactors. Targeted transcription and western blotting analysis were used to support the
hypothesis. It is an example of the power of ‘Omics study to improve the understanding of CHO biology.
1.7.2 Current approaches of host cell protein identification and quantitation
As mentioned above, despite a series of purification steps, the protein drugs are still
inevitably co-purified with some impurities from the host cell mass and/or the media. Optimization
of the downstream process relies on the analysis of the residual impurities as the drug product is
being purified in individual steps.
HCPs are a major class of impurities. They have been identified as a critical quality
attribute (CQA), which means that they are considered to affect patient safety (41). Residual HCPs
can be potentially immunogenic and toxic, may block the active sites of drug product and even
have proteolytic activity. HCPs which are similar to human proteins are also of concern because
they may trigger autoimmune responses in the human body by causing cross-reactivity with human
proteins (58, 59). Because HCPs can induce or result in potential safety risks to patients
and/or deactivate the drug product (44, 60-62), their identity and quantity are of great importance.
Moreover, the presence of HCPs plays a critical role for therapeutic protein approval by regulatory
agencies. The suspension of two clinical trials at Phase III for IB 1001, a recombinant factor IX,
resulted from the HCP content because of concern for drug safety (63). With biosimilar
therapeutics emerging rapidly, the information of HCP presence can be one of the critical factors
requiring attention. The identification and quantitation of HCPs in the drug product is therefore of
growing interest.
HCPs are a complex group of proteins from the host cells. They have significantly different
properties such as a wide range of hydrophobicities, molecular masses, and isoelectric points (pI).
The other challenge of HCP detection is the wide dynamic range, which requires approaches with
high sensitivity and selectivity to detect trace levels of HCPs in the presence of the high therapeutic
protein background. Therefore, HCP analysis requires (i) high dynamic range in order to detect
HCPs at less than 10 ppm, and even down to 1 ppm, in the bulk product background; (ii)
comprehensive identification of all HCPs with high confidence; (iii) accurate quantitation
without bias; and (iv) high throughput and short analysis time. It would be ideal if the method could
also be flexible in transferring from one therapeutic protein product to another. Such approaches
are needed not only to ensure the final drug product quality but also to provide information for
downstream process optimization, potentially reducing the cost of drug production. However,
current methods cannot reach all of these desired goals.
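To make the dynamic range requirement concrete, the following minimal Python sketch converts an HCP level in ppm into the product-to-HCP abundance ratio that a method must span. It assumes the common mass-ratio convention of ng of HCP per mg of therapeutic protein, which the text does not define explicitly, and the function names are illustrative only.

```python
import math


def hcp_ppm(hcp_ng: float, product_mg: float) -> float:
    """Mass ratio in ppm, assuming 1 ng of HCP per 1 mg of product equals 1 ppm."""
    if product_mg <= 0:
        raise ValueError("product_mg must be positive")
    return hcp_ng / product_mg


def required_dynamic_range(ppm: float) -> float:
    """Product-to-HCP abundance ratio that must be covered at a given ppm level."""
    if ppm <= 0:
        raise ValueError("ppm must be positive")
    return 1e6 / ppm


def orders_of_magnitude(ppm: float) -> float:
    """The same requirement expressed as a number of decades."""
    return math.log10(required_dynamic_range(ppm))
```

Under this convention, an HCP at 10 ppm requires a dynamic range of five orders of magnitude, and an HCP at 1 ppm requires six.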
Currently, the conventional method to determine the overall level of HCPs is enzyme
linked immunosorbent assay (ELISA), which is considered the gold standard of HCP detection in
the biopharmaceutical industry. This method is very sensitive and able to detect ppm levels of
HCPs (1 ppm – 100 ppm). The polyclonal antibodies for ELISA are generated by using the null
cell line as an HCP pool. Combined with ELISA as a quantitative approach, two dimensional
gel electrophoresis (2-DE) and western blotting are widely used to detect HCPs, especially
to compare HCP changes at various stages of purification or with different protocols of a certain
purification stage.
However, there are some disadvantages of these conventional methods. The efficiency and
accuracy of immunospecific methods, ELISA and western blot, depend on the polyclonal
antibodies used for HCP detection. Certain HCPs have low immunogenicity in the host animals
used to generate the polyclonal antibodies, and hence ELISA and western blotting could
underestimate or even miss certain species. Also, the immune response between humans and
animals can be different. Moreover, each set of polyclonal antibodies only correlates with the HCP
pool that is used to raise the antibodies, which means that the assays are not interchangeable. As a
result, the generic assays developed from general CHO proteome can provide inaccurate results.
To reach more accurate results, each protein drug should have its own customized immunospecific
assay.
ELISA cannot provide identification or distribution of individual HCPs (60). Since ELISA
reflects the sum of the signal responses of a range of HCPs, its overall sensitivity is higher than
that of western blot, which distributes the signal to a number of protein bands. On the other hand,
western blot as well as other non-immunospecific detection methods based on gels can provide
HCP distributions. 2-DE (2-dimensional electrophoresis) coupled with colorimetric or fluorescent
staining is semi-quantitative with a limited dynamic range. This method can reveal the distribution
of HCPs but cannot provide identity of individual species (64, 65). 2-DE followed by mass
spectrometry (MS) analysis of protein spots provides extension of sensitivity and identification
information (66), but it still cannot completely meet the requirement of a >10^5 dynamic range.
Moreover, the gel-based protocols are generally laborious and time consuming with a limited
throughput format. Capillary isoelectric focusing coupled online with tandem mass spectrometry
by electrospray ionization (cIEF-ESI-MS/MS) has also been reported for HCP analysis (67).
Liquid chromatography (LC) is advantageous for HCP analysis. LC separation provides
many choices based on orthogonal separation mechanisms, and it is flexible in terms of sample
amount for handling. When coupled with mass spectrometry, it can yield a wide dynamic range
and provide comprehensive information. It has been reported that two dimensional LC/MS^E (see
section 1.8.2 Mass spectrometry Data acquisition) was employed for HCP identification and
quantitation (68-70), and multiple reaction monitoring (MRM) was employed for accurate
quantitation as a targeted approach (68). However, the total run time for 2D-LC/MS method may
hinder its application because the 10 to 12 hour-test per sample is not practical when one requires
quick response for decision making, which is an important part of process development and control
(41).
Despite this drawback, qualitative and quantitative protein
analysis based on liquid chromatography mass spectrometry (LC-MS) remains an effective platform
for HCP analysis. Its versatility and flexibility provide a real possibility to reach high sensitivity
and selectivity within a short analysis time. LC-MS has resulted in an increased depth of proteomic
profiling and a large number of protein identities (71). It is promising to explore optimization of
the approach into a rapid and efficient workflow for comprehensive and accurate HCP analysis.
Because of the particular concern about which HCPs are present, and in what amounts, in the final
drug product, several approaches based on LC-MS for HCP analysis have been examined in this
thesis, and a general workflow for HCP identification and quantitation based on LC-MS/MS is
described in Chapter 3.
1.8 Introduction of liquid chromatography mass spectrometry-based quantitative
proteomics and protein analysis
Investigation of relative CHO protein expression levels, i.e., quantitative proteomic
analysis, provides valuable information for understanding the CHO systems biology with a variety
of phenotypic changes under environmental perturbations. The samples for proteomic study are
generally ones with high complexity and a wide dynamic range. LC-MS/MS based approaches
provide a powerful tool for shotgun quantitative proteomics, and the method has become widely
used to study complex proteomes. Because of the complex and dynamic nature of proteomes, a
wide range of proteomic strategies have emerged to address a variety of biological questions. The
general workflow of LC-MS/MS based shotgun quantitative proteomics is shown in Figure 1-8.
Figure 1-8 The general workflow of proteomics analysis based on liquid chromatography coupled
with tandem mass spectrometry (LC-MS/MS).
After enzymatic digestion of the proteome, the resultant peptides are separated by
multidimensional LC and analyzed by high resolution/high mass accuracy mass spectrometry. The
MS raw data are then analyzed with bioinformatics tools to obtain protein identification and/or
quantitation information, which is interpreted in terms of biological mechanisms.
In the general workflow of proteomic study based on LC-MS/MS, the proteomic sample is
first digested by enzymes, typically trypsin and/or lysyl endopeptidase (Lys-C). The resultant
peptide mixture is separated by multi-dimensional LC and then analyzed by mass spectrometry
on-line with electrospray ionization (ESI). The multidimensional LC separation, which has high
separation power, is employed to extend the dynamic range of detection. The multi-dimensional
LC separation can be on-line or off-line, depending on the instrumentation configuration and the
specific analysis requirements. With high resolution/high mass accuracy tandem mass
spectrometry, the m/z values and the intensities of the peptide precursor ions and their fragment
ions are both collected, and the raw data are searched with a protein sequence database for protein
identification. The high resolution/high mass accuracy MS approach provides a large amount of
MS1 and MS2 information on the peptide species, followed by bioinformatics analysis and data
interpretation. The quantitative information can also be obtained through MS analysis, which may
require labeling or spiked-in internal standards (72). Moreover, the LC-MS based proteomic study
platform is suitable for protein analysis with high-throughput, which can be adapted for HCP
analysis, as discussed in Chapter 3.
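The digestion step of this workflow can be illustrated with a minimal in silico sketch. It assumes the commonly used simplified cleavage rules: trypsin cleaves C-terminal to lysine (K) or arginine (R) except when the next residue is proline (P), and Lys-C cleaves C-terminal to K. The function name, the `missed_cleavages` parameter, and the example sequence are hypothetical and are not taken from the workflow used in this thesis.

```python
import re


def digest(sequence, enzyme="trypsin", missed_cleavages=0):
    """Return peptides from a simplified in silico digestion (illustrative only)."""
    seq = sequence.upper()
    if enzyme == "trypsin":
        # Cleave after K or R unless the following residue is P.
        sites = [m.end() for m in re.finditer(r"[KR](?!P)", seq)]
    elif enzyme == "lys-c":
        # Cleave after every K (simplified rule).
        sites = [m.end() for m in re.finditer(r"K", seq)]
    else:
        raise ValueError("unsupported enzyme: " + enzyme)

    # Peptide boundaries; a cleavage site at the very end adds no new peptide.
    bounds = [0] + [s for s in sites if s < len(seq)] + [len(seq)]
    pieces = [seq[a:b] for a, b in zip(bounds, bounds[1:])]

    # Join neighbouring pieces to model up to `missed_cleavages` missed sites.
    peptides = []
    for i in range(len(pieces)):
        last = min(i + missed_cleavages, len(pieces) - 1)
        for j in range(i, last + 1):
            peptides.append("".join(pieces[i:j + 1]))
    return peptides
```

For the made-up sequence AAKPGRTTKCC, the trypsin rule skips the K that precedes P and yields AAKPGR, TTK, and CC, whereas the Lys-C rule yields AAK, PGRTTK, and CC.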
Systems biology study of CHO cell lines is valuable and meaningful for improvement of
therapeutic protein production and new drug development in the biopharmaceutical industry. LC-
MS/MS based quantitative proteomics is a powerful tool to discover critical metabolic pathways,
potentially benefitting industrial drug production. In this section, LC-MS/MS based
quantitative proteomics technology is introduced, from multidimensional LC separation
to MS instrumentation. Then, current quantitation approaches for LC-MS are discussed.
1.8.1 Two dimensional liquid chromatography
High separation power from LC is essential for LC-MS/MS based strategies, especially
when one aims to analyze proteomic samples with high complexity. Sufficient LC separation of
the analyte species can overcome the adverse effect caused by ion suppression, which is a
particular concern for mass spectrometry analysis. Electrospray ionization efficiency of the
analytes is affected by the environment in which ionization is occurring, especially the presence
of other molecules. With a complex sample containing a large number of peptides, insufficient LC
separation can result in co-elution of many peptide species, inducing low ionization rate for certain
peptides, especially those that are hydrophobic and/or with low abundance. Non-charged species
cannot reach and/or cannot be detected by the mass analyzer. As a result, the MS run suffering
from ion suppression will encounter signal loss, poor reproducibility, and compromised sensitivity.
These adverse effects caused by low ionization efficiency cannot be corrected or reduced by
modifying the MS data acquisition strategy regardless of the sensitivity and selectivity of the MS
instrumentation. Even with the targeted MS methods, where only target ions are selected and
enriched, such as in selected reaction monitoring (SRM), the quantitation results could be impacted
because of ion suppression (73). Moreover, too many co-eluting species can cause undersampling
of MS since the scan speed of the mass analyzer is limited. Therefore, LC separation with high
separation power is highly desirable for complex sample analysis.
LC with extensive separation power by means of multi-dimensional separation, is widely
employed for LC-MS/MS based proteomic analysis. Multi-dimensional LC achieves high
resolving power by combining two or more orthogonal chromatography methods (74). The most
popular approach is two dimensional liquid chromatography (2D-LC).
There are several widely used 2D-LC strategies for peptide separation of proteomic
samples. The first dimension separation can be strong cation exchange chromatography (SCX), size
exclusion chromatography (SEC), hydrophilic interaction chromatography (HILIC), or reversed
phase (RP) chromatography. The second dimension separation, on the other hand, is generally RP
LC (75) due to its high resolution power and compatibility of coupling with MS online. SCX/RP
separation is one of the earliest reported 2D-LC separation strategies (76). In SCX/RP, the peptide
mixture is first separated based on charge differences by SCX, and the eluate is then further
separated according to the peptide hydrophobicity by RP chromatography (77). However, since
most of the peptides are charged 2+ or 3+, they are eluted from the SCX column in a narrow elution
window, impacting the overall separation power of SCX/RP system. SCX/RP combination was
reported to have a reduced practical resolution power (78).
On the other hand, high pH/low pH RP/RP has been shown to provide the best practical
separation power compared to other 2D-LC combinations (75). Due to the various pKa values of
amino acid residues of the peptides, the change of pH alters the peptide charge states and thereby
their hydrophobicity indexes. As a result, the separation selectivity can be significantly different
for the first (high pH) in comparison to the second (low pH) dimension separation, leading to high
overall resolution power, especially when a wide pH gap exists between these two separations (e.g.
pH~10 for high pH and pH~2 for low pH) (79). Because of the advantages of high pH/low pH
RP/RP separation, it was used for the CHO cell lysate analysis in Chapter 2.
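The effect of pH on peptide charge can be estimated with a simple Henderson-Hasselbalch calculation, sketched below. The pKa values are approximate textbook values chosen for illustration (an assumption; published scales differ, and neighboring residues shift the real values), and the peptide sequence is made up.

```python
# Approximate side-chain and terminal pKa values (illustrative assumptions).
PKA_POSITIVE = {"nterm": 9.0, "K": 10.5, "R": 12.5, "H": 6.0}
PKA_NEGATIVE = {"cterm": 3.1, "D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}


def net_charge(sequence, ph):
    """Estimate the net charge of a peptide at a given pH."""
    seq = sequence.upper()
    basic = [PKA_POSITIVE["nterm"]] + [PKA_POSITIVE[a] for a in seq if a in PKA_POSITIVE]
    acidic = [PKA_NEGATIVE["cterm"]] + [PKA_NEGATIVE[a] for a in seq if a in PKA_NEGATIVE]
    # Fraction of each basic group that is protonated (charge +1).
    positive = sum(1.0 / (1.0 + 10.0 ** (ph - pka)) for pka in basic)
    # Fraction of each acidic group that is deprotonated (charge -1).
    negative = sum(1.0 / (1.0 + 10.0 ** (pka - ph)) for pka in acidic)
    return positive - negative
```

With these assumed values, the hypothetical peptide DEAGK carries a net charge of about +1.9 at pH 2 and about -2.1 at pH 10, which illustrates why retention in the two RP dimensions can differ substantially.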
1.8.2 Mass spectrometry
High resolution mass analyzers
Significant advances of mass spectrometry (MS) instrumentation have been achieved over
the past several decades. To date, it has become a powerful tool which can provide comprehensive
information for proteomic studies. MS allows the analysis within a reasonable time of thousands
of ionic species (peptides) within a wide dynamic range with high resolution, mass accuracy, and
sensitivity (80, 81). The instrument used in the experiments of Chapter 2 and Chapter 3 is a hybrid
quadrupole-Orbitrap, the Q Exactive (Thermo Fisher Scientific). Consequently, this section will focus
on the class of mass analyzers to which the Orbitrap belongs, namely those with very high resolution and
high mass accuracy.
Mass analyzers with the highest mass accuracy (< 5 ppm, and even lower under favorable
conditions) to date are time-of-flight (TOF), Fourier transform ion cyclotron resonance (FT-ICR),
and Orbitrap (82). In TOF mass analyzers, analyte ions are accelerated by an electric field to gain
a specific kinetic energy, and their m/z values are determined by the time that it takes for the ions
to fly in the vacuum flight tube to the detector. Theoretically, higher resolution can be achieved by
increasing the ion flight path length, but the instrument size is restricted by the lab setting, and the
sensitivity could be compromised because of ion losses during ion transfer between orthogonal
TOF or at the detector (83, 84). FT-ICR and Orbitrap mass analyzers both collect time-domain
signals of the ion spatial motions and use a Fourier transform algorithm to convert the signals into
m/z information. Such devices can collect signals from a wide range of m/z values simultaneously,
which means that the whole spectrum can be obtained at once (85). In FT-ICR, ions move under a
magnetic field, and are then excited by a transient electric field. After the excitation, the resultant
coherent ion motions yield time-domain signals which can be collected and transferred to m/z
values. Ions in Orbitrap (Figure 1-9) oscillate around a carefully shaped central electrode under an
electric field, and the motion frequency is Fourier transformed to m/z values. Moreover, ions
are injected with the coherent motion into Orbitrap by a curved quadrupole ion trap, called C-trap,
and the oscillation starts immediately. It does not require ion excitation inside the Orbitrap, which
is needed in FT-ICR. Compared to FT-ICR, Orbitrap is more compact and cost efficient, and easier
to maintain. Importantly, the mass resolution of FT-ICR is inversely proportional to m/z, and that
of Orbitrap is inversely proportional to the square root of m/z. As a result, the mass-resolving power decreases more
slowly with the Orbitrap than the FT-ICR with increased m/z values.
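The two scaling laws can be compared numerically with the short sketch below. It extrapolates a resolution specified at a reference m/z to other m/z values, using 1/sqrt(m/z) scaling for the Orbitrap and 1/(m/z) scaling for FT-ICR. Applying the same reference point to both analyzers is an assumption made only for comparison, and the function names are illustrative.

```python
import math


def orbitrap_resolution(mz, ref_resolution, ref_mz):
    """Resolution scaled as 1/sqrt(m/z) from a reference point."""
    return ref_resolution * math.sqrt(ref_mz / mz)


def fticr_resolution(mz, ref_resolution, ref_mz):
    """Resolution scaled as 1/(m/z) from a reference point."""
    return ref_resolution * ref_mz / mz
```

Starting from the 140,000 resolution at m/z 200 quoted for the Q Exactive in this section, the Orbitrap scaling gives 70,000 at m/z 800, whereas the FT-ICR scaling applied to the same starting point would give 35,000.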
Figure 1-9 The scheme of an Orbitrap.
The ion injection and the ion motion path are shown in red. This figure is reprinted with permission
from Marshall et al. (85).
Nowadays, many commercialized mass spectrometers which provide high resolution and
high mass accuracy consist of a combination of multiple mass analyzers and are called hybrid
instruments. For example, LTQ-Orbitrap (Thermo Fisher Scientific) consists of a linear ion trap
and Orbitrap. It utilizes the high ion trapping capacity and MS^n fragmentation ability of the linear
ion trap and the high mass accuracy and resolution of Orbitrap.
The Q Exactive series (Thermo Fisher Scientific) combines a quadrupole and Orbitrap (86).
The quadrupole can guide and select ions between specified m/z ranges with fast switching times,
which results in fast time scale for fragmentation for selected ions for MS2 scan, allowing an
efficient multiplexed scan mode (86). The combination of the quadrupole and Orbitrap results in
high scan speed, mass resolution, and mass accuracy. Moreover, the quadrupole technology is well
established, making the instrument design particularly robust. In the Q Exactive series (Figure 1-10),
ion fragmentation is achieved by higher-energy C-trap dissociation (HCD) in the HCD collision
cell (87), a design that enables the detection of low m/z fragment ions, allowing the analysis
of isobaric labeled samples for quantitative proteomic study (see next section). It is advantageous
compared to another widely used fragmentation strategy based on the quadrupole ion trap, called
ion-trap based collision-induced dissociation (CID). The quadrupole ion trap is not able to trap
low-mass fragment ions and usually induces lowest-energy fragmentation (87), making
instrumentation that relies on such a fragmentation strategy unable to analyze isobaric labeled
samples. The quadrupole is also compact, so the combination of quadrupole and Orbitrap makes
the instrument a “benchtop instrument”. In this thesis, all experiments involving MS
instrumentation were performed on a Q Exactive (Thermo Fisher Scientific, San Jose, CA). Its
construction is shown in Figure 1-10. The Q Exactive can reach a maximum resolution of 140,000
at 200 m/z with mass accuracy < 1 ppm with internal calibration and < 3 ppm with external
calibration.
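Mass accuracy in ppm is the relative deviation of the measured m/z from the theoretical m/z multiplied by 10^6. The sketch below computes this error and the absolute m/z tolerance that corresponds to a given ppm specification. The function names and example numbers are illustrative only.

```python
def ppm_error(measured_mz, theoretical_mz):
    """Relative mass error in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def mz_tolerance(theoretical_mz, ppm):
    """Absolute m/z half-window that corresponds to a ppm tolerance."""
    return theoretical_mz * ppm * 1e-6
```

For an ion at m/z 500, a 3 ppm specification corresponds to a window of about ±0.0015 m/z, and a 1 ppm specification to about ±0.0005 m/z.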
Figure 1-10 Construction of the Q Exactive.
Reprinted with permission from Michalski et al. (86). The Q Exactive is a hybrid of a quadrupole and
an Orbitrap. The quadrupole can filter selected ions at a fast time scale, and the Orbitrap can detect
the ions with high resolution and mass accuracy. In an MS2 scan with higher-energy C-trap
dissociation (HCD), precursor ions are fragmented in the HCD collision cell, and the resultant
fragment ions are injected into the Orbitrap via the C-trap.
Other advanced hybrid instruments are commercially available. The Orbitrap
Fusion series (Thermo Fisher Scientific) brings quadrupole, Orbitrap and linear ion trap together,
reaching mass resolution as high as 500,000 at m/z 200. The TripleTOF system (ABSciex) uses
quadrupole and time-of-flight mass analyzers to reach high scan speed and a wide ion detection
range. The SYNAPT mass spectrometer (Waters) combines time-of-flight and ion mobility mass
spectrometry to improve the ion separation. These state-of-the-art MS instruments provide
powerful tools for protein and proteomics study.
Data acquisition
Besides the MS instrumentation, the information that can be obtained from MS analysis
highly relies on MS data acquisition strategies. The emergence of automated data acquisition
allows the unattended collection of large amounts of data with m/z information of both precursor
and fragment ions when the mass spectrometer is coupled with LC providing continuous separation
for highly complex proteomic samples.
Figure 1-11 The scheme of (A) data dependent acquisition (DDA) and (B) data independent
acquisition (DIA).
In DDA, each MS2 spectrum is obtained from a specific precursor ion. On the other hand, each
MS2 in DIA is from all precursor ions within a m/z range.
For shotgun proteomics based on LC-MS/MS, the most widely used strategy is data
dependent acquisition (DDA) (Figure 1-11A). With DDA, precursor ions are initially detected with
a survey scan, obtaining the mass (m/z values and charge states) and the intensity, called a full-
scan mass (MS1) spectrum. Then, a subset of precursors is selected automatically following a
predefined rule for subsequent fragmentation (MS2 spectra) (88). Commonly, the predefined rule
is selecting the precursors with the highest abundance in the MS1 spectrum, e.g. top 15, because
the peptide precursors with high intensity are more likely to yield a MS2 spectrum with high
quality. To avoid selecting the same precursor with high abundance redundantly during peptide
elution, a strategy called “dynamic exclusion” is widely applied in DDA, in which the precursor
that has been selected and fragmented with a good MS2 spectrum will not be reselected over a
certain period of time. DDA attempts to obtain the maximum number of unique precursors and
their MS2 spectra. DDA is powerful and versatile, and to date it is the most widely used approach
for shotgun proteomics.
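The top-N selection rule with dynamic exclusion described above can be sketched as follows. This is a simplified illustration and not the instrument firmware logic: the exclusion duration, the ppm matching tolerance, and the function and parameter names are assumptions, and the sketch excludes every selected precursor without judging MS2 spectrum quality.

```python
def select_precursors(survey_scan, scan_time, excluded,
                      top_n=15, exclusion_s=30.0, tol_ppm=10.0):
    """Pick up to top_n precursor m/z values from one MS1 survey scan.

    survey_scan: list of (mz, intensity) tuples.
    excluded: dict mapping excluded m/z to its expiry time; updated in place.
    """
    # Forget exclusions whose time window has passed.
    for mz in [m for m, expiry in excluded.items() if expiry <= scan_time]:
        del excluded[mz]

    def is_excluded(mz):
        return any(abs(mz - ex) / ex * 1e6 <= tol_ppm for ex in excluded)

    selected = []
    # Consider the most intense precursors first.
    for mz, _intensity in sorted(survey_scan, key=lambda peak: peak[1], reverse=True):
        if len(selected) >= top_n:
            break
        if is_excluded(mz):
            continue
        selected.append(mz)
        excluded[mz] = scan_time + exclusion_s
    return selected
```

Calling the function on successive survey scans with the same `excluded` dictionary shows the intended behavior: the most intense precursors are chosen first, lower-abundance precursors are reached while those are excluded, and the original precursors become eligible again once the exclusion period has elapsed.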
However, there can be hundreds of peptide species in one MS1 full scan spectrum. With
the finite instrument scan speed and specific LC separation window, only a limited number of
peptide precursors can be selected for MS2 acquisition in DDA (89). DDA is designed to pick the
precursors with high abundance, resulting in a sampling bias toward the most abundant compounds
and an undersampling of species with low abundance. To overcome this issue, an alternate strategy,
data independent acquisition (DIA), has emerged and been rapidly developed over the past
several years. In DIA, peptide precursors are fragmented systematically without considering MS1
information (90), and it is programmed to fragment all precursors within a wide m/z window (e.g.
m/z 400 to 2000), as shown in Figure 1-11B. There are several DIA strategies which have been
reported with different instrumentation settings. For example, the method called MS^E (91), which
can be performed on the Waters SYNAPT mass spectrometer, fragments all precursors co-eluting
from LC and entering the mass spectrometer at the same time (Figure 1-11B). Another strategy
uses relatively narrow precursor isolation windows (e.g. 25 m/z width) to subdivide the wide m/z
range. The precursors within the narrow range are then fragmented together, and the multiple
isolation windows are set to cover the whole m/z range, leading to a decrease in the complexity of
the MS2 spectra (Figure 1-11B). This strategy can be used on the TripleTOF system (ABSciex),
called SWATH (92), and on the Q Exactive series (Thermo Fisher Scientific), called DIA with
multiplexed MS/MS (93). By eliminating the sampling bias of DDA, DIA is able to provide more
reproducible run to run data and has a better chance to observe species of low abundance because
of fragmentation of all precursor ions (94). However, DIA raw data are inherently complicated
due to the absence of precursor selection, and it is therefore challenging to interpret noisy DIA
data. In this thesis, DDA was used in Chapter 2 for the systems biology study of CHO cells
producing a biopharmaceutical. The high throughput and data analysis tools provided
comprehensive peptide identification and quantitation. DIA was applied to analyze low abundant
HCPs in the therapeutic product in Chapter 3, to take advantage of its un-biased sampling.
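The narrow-window DIA scheme can be illustrated by generating the isolation windows that tile a precursor m/z range. The sketch below uses contiguous, non-overlapping windows of fixed width, which is a simplifying assumption (practical methods may use overlapping or variable-width windows), and the function names are hypothetical.

```python
def isolation_windows(mz_start, mz_end, width):
    """Tile [mz_start, mz_end] with contiguous windows of the given width."""
    if width <= 0 or mz_end <= mz_start:
        raise ValueError("need width > 0 and mz_end > mz_start")
    windows = []
    lower = mz_start
    while lower < mz_end:
        upper = min(lower + width, mz_end)
        windows.append((lower, upper))
        lower = upper
    return windows


def window_for(precursor_mz, windows):
    """Return the window (lower inclusive, upper exclusive) containing a precursor."""
    for lower, upper in windows:
        if lower <= precursor_mz < upper:
            return (lower, upper)
    return None
```

With the example values mentioned in the text, a 25 m/z window width over m/z 400 to 2000 gives 64 windows per cycle, and every precursor in a given window contributes to the same MS2 spectrum.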
Discovery proteomics based on DDA and DIA is a powerful tool to study and quantitate
the overall species in a sample and to provide hypotheses for systems biology. After that,
individual proteins can be identified and recognized as significant, for example as biomarkers. Then, MS-based
targeted approaches, e.g., selected or multiple reaction monitoring (SRM or MRM) (95) and parallel
reaction monitoring (PRM) (96), can be applied for the quantitation of individual proteins.
With MS-based targeted approaches, specific information on each target peptide is needed, such
as the m/z values of the precursor ions, fragment ion information, and the LC retention time.
In the analysis, the specific peptide precursor ions are selected by predefined m/z values
(optionally in combination with retention time windows), and the resultant fragment ions are then
detected (95, 97). Heavy isotopically labeled analogues of the target peptides can be spiked into
the samples at known amounts as internal standards. The specific peptides are quantified by
comparing the signal intensities of the internal standards and the analytes of
interest. Since the internal standards are isotopologues that elute at the same time as the analytes,
variations caused by ion suppression largely cancel out. These internal standards can also be used
to confirm the identity of the target peptides because peptides with the same sequence share
the same retention time and fragmentation patterns. Moreover, most of the interferences are filtered
out before they can reach the mass analyzer, so the sensitivity and selectivity of this approach are
high enough to detect low amol levels of analytes in a complicated biochemical background (97).
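The internal standard calculation described above can be sketched as follows. This is a minimal illustration that assumes the labeled standard and the analyte have identical response factors and that the summed fragment ion peak areas are used as the signal; the function name, the peak areas, and the spiked amount are hypothetical.

```python
def quantify_with_internal_standard(analyte_areas, standard_areas, standard_amount):
    """Estimate the analyte amount from the light/heavy signal ratio.

    analyte_areas and standard_areas are the peak areas of the same fragment
    ions for the endogenous (light) peptide and the spiked (heavy) standard.
    The result carries the same unit as standard_amount.
    """
    heavy_signal = sum(standard_areas)
    if heavy_signal <= 0:
        raise ValueError("internal standard signal must be positive")
    light_signal = sum(analyte_areas)
    return light_signal / heavy_signal * standard_amount


# Hypothetical peak areas for three fragment ions and a 50 fmol spike.
light = [1.2e5, 8.0e4, 4.0e4]
heavy = [6.0e5, 4.0e5, 2.0e5]
estimated_fmol = quantify_with_internal_standard(light, heavy, 50.0)
```

Because both signals are measured in the same scan window, a suppression effect that reduces the light and heavy signals by the same factor leaves the ratio unchanged.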
SRM (or MRM) is currently the most widely used MS-based targeted approach for
quantitation of peptides and small molecules. The scheme of SRM is shown in Figure 1-12A. SRM
is most commonly performed with a triple quadrupole MS (98), although other instruments such
as QqTOF have also been reported to be used for SRM. The target peptide precursors are selected
by the first quadrupole, and the selected ions are fragmented in the collision cell. Then, the
third quadrupole or TOF acts as a filter to select several specific fragment ions with predefined
m/z values. SRM requires knowledge of both the peptide precursor and the corresponding
fragment ions; each precursor-fragment pair is called a “transition”. For a given peptide
precursor, the intensities of individual fragment ions vary over a wide range, and the fragment ions
with high intensity are preferred as transition choices to maximize the detection sensitivity.
Moreover, MS parameters for every target peptide, especially collision energy values, need
to be optimized to ensure the presence of specific fragment ions (98). Consequently,
significant effort is needed for SRM assay method development, which is often laborious and
time consuming.
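The preference for intense fragment ions when choosing transitions can be illustrated with a short sketch. It only ranks fragments by intensity; real assay development also considers interference and collision energy optimization, which are not modeled here. The function name, fragment labels, and all numerical values are hypothetical.

```python
def select_transitions(precursor_mz, fragments, n=3):
    """Pick the n most intense fragment ions of one precursor as transitions.

    fragments maps a fragment label to a (fragment_mz, intensity) tuple.
    Each returned transition is (precursor_mz, fragment_mz, label).
    """
    ranked = sorted(fragments.items(), key=lambda item: item[1][1], reverse=True)
    return [(precursor_mz, mz, label) for label, (mz, _intensity) in ranked[:n]]


# Hypothetical fragment ion list for a doubly charged peptide precursor.
fragments = {
    "b3": (312.2, 5.0e3),
    "y4": (478.3, 1.5e4),
    "y5": (591.4, 9.0e4),
    "y6": (704.4, 6.0e4),
    "y7": (817.5, 3.0e4),
}
transitions = select_transitions(523.8, fragments, n=3)
```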
Figure 1-12 The schemes of PRM and SRM processes.
A. SRM (MRM). SRM is performed with a triple quadrupole mass spectrometer. Specific transitions,
which are precursor-fragment pairs, are selected and detected. Several transitions can be detected
for one peptide precursor. B. PRM. PRM, on the other hand, is generally used with a Q-Orbitrap.
Instead of selecting specific transitions, all fragments of one peptide precursor are detected to take
advantage of the high resolution of the Orbitrap and its ability to acquire the whole spectrum at
once.
PRM is an alternative when one has a hybrid quadrupole-Orbitrap instrument (Figure
1-12B). PRM has also recently been reported to run on QqTOF (99). Instead of detecting specific
transitions, PRM collects all of the fragment ion information for a given peptide precursor,
utilizing a mass analyzer which can obtain the whole spectrum at once with high
resolution (Orbitrap), or one able to scan fast enough to detect many fragment ions within a short
time (TOF). After data acquisition, 3 to 7 fragment ions can be chosen for quantitation against the
spiked-in isotopically labeled peptide internal standards. In PRM assay development, one does not
need to pick specific fragment ions or optimize MS parameters for specific transitions. Thus, PRM
assay development is less laborious and can be established more rapidly than
conventional SRM/MRM. In Chapter 3, the PRM approach was employed to
quantify individual HCPs in a therapeutic antibody drug.
1.8.3 LC-MS based quantitative proteomics and protein analysis
LC-MS based quantitative proteomics strategies have emerged as powerful tools for
systems biology study and biomarker discovery. The proteome coverage and the
quantitative accuracy of these approaches keep improving, and these strategies have been
applied to address a wide range of biological questions. LC-MS based quantitative proteomics
approaches can be categorized into two groups: label-free and labeled approaches. Their
applications, advantages, and disadvantages will be discussed in this section.
In Chapter 2, LC-MS/MS-based relative quantitative proteomics using a shotgun
approach is used for a broad survey of the overall proteome and expression differences across
several CHO cell samples at different growth time points and cultivation conditions. Potential
biomarkers for CHO cell growth status are also suggested. In Chapter 3, label-free quantitation
and targeted MS based on LC-MS with PRM were both applied for individual HCP quantitation.
1.8.3.1 Label-free quantitation
Label-free quantitation requires less sample handling than labeled quantitation because it does
not require modification and/or specific treatment of proteins/peptides. After enzymatic
digestion, the resultant peptide mixture is analyzed by LC-MS. Each
sample needs separate MS runs, and the quantitative information is obtained by
comparison across the corresponding individual MS runs. Hence, this strategy is
straightforward and cost-efficient. It can also be applied to any type of biological sample.
Label-free approaches can be based on either intensity or spectral counting (100). For the
former, peak intensities or areas of specific peptides in the chromatographic profile are used as an
indicator of their abundance. Spectral counting relies on the positive correlation between the
number of identified MS/MS spectra and the peptide/protein amounts in data dependent
acquisition (DDA) (89). Label-free analysis places no limit on the number of samples that can be
compared. However, each sample must be handled and analyzed individually. Thus, throughput is not
as high as that of labeling techniques. Moreover, the direct comparison across several LC-MS/MS
runs can be affected by run-to-run variation, and therefore several replicates are required to
support statistical validation of the analysis.
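The need for replicates in intensity-based label-free quantitation can be made concrete with a small sketch. It assumes that one peptide peak area is available per replicate run, and it uses the coefficient of variation (CV) as a simple measure of run-to-run variation; this choice of statistic, the function names, and the peak areas are illustrative assumptions, not the procedure used in this thesis.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Run-to-run variation of replicate measurements, in percent."""
    return stdev(values) / mean(values) * 100.0


def label_free_fold_change(reference_runs, treated_runs):
    """Ratio of mean peak areas between two conditions measured in separate runs."""
    return mean(treated_runs) / mean(reference_runs)


# Hypothetical peak areas of one peptide in three replicate runs per condition.
reference = [100.0, 110.0, 90.0]
treated = [200.0, 190.0, 210.0]

cv_reference = coefficient_of_variation(reference)
fold_change = label_free_fold_change(reference, treated)
```

A fold change is only meaningful when it clearly exceeds the replicate CV, which is why a single run per sample is rarely sufficient.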
1.8.3.2 Labeled quantitation approaches
Labeling approaches require the introduction of one or several heavy stable isotopes, typically
13C, 15N, 18O, and/or D, into the proteins or peptides. MS can recognize the predictable mass
difference or specific reporter signals introduced by the labeling and thereby distinguish identical
peptides from different samples. Thus, samples labeled with different isotopes or isotopic
combinations can be pooled together and analyzed in one MS run. Such sample multiplexing
increases the throughput significantly by reducing the overall analysis time. Moreover, in these
approaches, MS run-to-run variation can be eliminated. The identical peptides are expected to
behave similarly during LC-MS analysis, including retention time, ionization efficiency, and ESI
signal response factor (101), and hence analysis within one MS run provides additional statistical
validation. These approaches are especially desirable when one is interested in the differential
regulation of proteomes among samples under several physiological conditions.
There are two major categories of labeling strategy, in vivo metabolic labeling and in vitro
chemical derivatization processes. Stable isotope labeling by amino acids in cell culture (SILAC)
is widely used as an in vivo metabolic labeling approach (102). There are several types of chemical
derivatization labeling techniques, including enzymatic labeling of 16O/18O (103), dimethyl
labeling (104), isotope-coded affinity tag (ICAT) labeling (105), and isobaric mass tag labeling
(106, 107).
Based on different MS spectral information used to recognize identical peptides from
different samples, there are two types of labeling techniques. One is to recognize the peptides by
predictable mass shifts of the precursor ions in the MS1 spectra. Specific mass shifts correspond
to specific labeled samples. Also, in this strategy quantitative information generally depends on
precursor ions, by calculating the ratio of precursor ion intensities of the heavy/light peptide pairs.
The other technique is to obtain the information at the MS2 level of the specific “reporter ions”
for each sample from an isobaric mass tag containing different combinations of isotopes. An
illustration of the categories is shown in Figure 1-13. In the following sections, the major labeling
approaches will be discussed in detail. In Chapter 2, isobaric mass tag labeling with TMT was used
to compare cell cultures at different growth time points and cultivation conditions.
Figure 1-13 The categories of labeling approaches.
Based on the MS spectral recognition (in red), MS can recognize the labeling of the MS1 precursor
ions for SILAC, ICAT, 16O/18O labeling, and dimethyl labeling. The quantitative information can
be obtained at the MS2 level for isobaric labeling including iTRAQ, TMT, DiLeu, and DiART.
According to the labeling mechanisms (in purple), SILAC is metabolic labeling, and chemical
labeling includes ICAT, 16O/18O labeling, dimethyl labeling, and isobaric labeling. Moreover,
SILAC, ICAT, and isobaric labeling are generally performed at the protein level, and 16O/18O,
dimethyl, and isobaric labeling can be used at peptide level after enzymatic digestion.
Metabolic labeling
In SILAC, cells are cultured in growth media containing one or several essential amino
acids labeled with stable isotopes. The labeled amino acids are then metabolically incorporated
into all of the expressed proteins after several cell cycles of replication. Media
containing different sets of labeled amino acids can provide a series of samples with various mass
differences. These samples are pooled together based on either equal number of cells or equal
amount of total protein and then analyzed by MS after the necessary sample handling and preparation.
Peptide intensity ratios of the “heavy” and “light” pairs represent their relative abundances,
which can be interpreted as relative expression levels of the corresponding proteins. Traditional
SILAC typically uses Arg and/or Lys labeled with 13C and/or 15N as the labeled amino acids, and
multiplexing as high as 5-plex has been reported (108, 109). SILAC provides the most accurate
quantitation information among quantitative proteomic approaches. Since the samples can be
pooled at a very early stage of sample handling, the systematic and random variations from
sample preparation can be significantly reduced (110). However, SILAC is not easy to apply to
tissue samples or biofluids, limiting its application mainly to cell culture.
Moreover, SILAC may not be practical at the large production scale (liters to kiloliters)
of the biopharmaceutical industry.
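The MS1-level readout of a SILAC experiment can be sketched as follows. The example assumes a Lys label carrying six 13C and two 15N atoms, a commonly used combination that is chosen here for illustration only, and it summarizes a protein by the median of its peptide heavy/light ratios, which is one of several possible summary statistics. All intensities and function names are hypothetical.

```python
from statistics import median

# Mass differences between heavy and light isotopes, in Da.
DELTA_13C = 1.0033548  # 13C minus 12C
DELTA_15N = 0.9970349  # 15N minus 14N


def label_mass_shift(n_13c, n_15n):
    """Mass added to a peptide by one labeled residue."""
    return n_13c * DELTA_13C + n_15n * DELTA_15N


def heavy_precursor_mz(light_mz, charge, mass_shift):
    """Expected m/z of the heavy partner of a light precursor ion."""
    return light_mz + mass_shift / charge


def protein_ratio(peptide_pairs):
    """Median heavy/light intensity ratio over (light, heavy) peptide pairs."""
    return median(heavy / light for light, heavy in peptide_pairs)


# Assumed label: Lys with six 13C and two 15N atoms (about +8.014 Da).
lys_shift = label_mass_shift(6, 2)
heavy_mz = heavy_precursor_mz(500.0, 2, lys_shift)

# Hypothetical (light, heavy) precursor intensities of three peptides.
pairs = [(1.0e6, 2.0e6), (5.0e5, 1.1e6), (2.0e5, 3.6e5)]
ratio = protein_ratio(pairs)
```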
Enzymatic labeling with 16O/18O
Enzymatic labeling of 16O/18O is one of the earliest isotopic labeling techniques used in
proteomics (111). As shown in Figure 1-14, 18O atoms can be introduced into the C terminus of
peptides during or after protein enzymatic digestion in H218O solution with enzymes such as
trypsin, Lys-C and Glu-C (101, 111). This approach is relatively inexpensive and easy to perform
(101). However, since labeling efficiency usually differs among peptides and
oxygen back-exchange can occur, the labeling protocol needs to be optimized (112-114). Up to
three samples can be multiplexed.
Figure 1-14 The reaction of enzymatic labeling of 16O/18O (115).
ICAT
ICAT chemical reagents specifically react with Cys residues of proteins and peptides.
ICAT labeling is generally performed at the protein level, and the “light” and “heavy” isotopically
labeled ICAT reagents (typically duplex) provide a mass shift of 9 Da per labeled Cys residue
(105). After labeling, the proteins undergo enzymatic digestion. The ICAT labels contain a biotin
group, so the Cys-containing peptides, which are labeled by the ICAT reagents, can then
be isolated and enriched with an avidin column. The enriched Cys-containing peptides are then
analyzed by MS (105, 116). The sample complexity is thereby reduced, increasing
the possibility of identifying and quantifying low-abundance proteins. However,
proteins containing no Cys residue cannot be analyzed, and proteins with few Cys residues may be
identified by only a single peptide or not identified at all.
Dimethyl labeling
Dimethyl labeling can reach up to triplex sample comparison, and the reaction has been
automated online with LC-MS (104, 117, 118). The labeling reaction is performed at the peptide
level after protein digestion. Dimethyl labeling is based on the reaction between primary amines
and formaldehyde to form a Schiff base, which is then reduced by cyanoborohydride (Figure 1-15)
(119). The primary amine groups of the peptides, which are the N-termini and the epsilon amino
groups of Lys residues, are converted to dimethylamines, and an N-terminal proline can be
converted to a monomethylamine (120). With the combination of different forms of formaldehyde
(normal, deuterated, and deuterated with 13C labeled) and cyanoborohydride (normal and
deuterated) (Figure 1-15), triplex “mass tags” can be obtained. The advantages of the approach
include low cost, quick reaction, and high labeling efficiency over a wide range of sample
amounts (sub-micrograms to milligrams) (104). However, a disadvantage is that the deuterated
groups are usually around the hydrophobic portion of the peptides, so identical peptides carrying
different tags can show noticeable retention time shifts in RP chromatography, complicating the
data analysis. The relative quantitation of the identical peptide species then requires a search
across a retention time range instead of within one spectrum (104).
Figure 1-15 Chemical reaction of dimethyl labeling.
Figure is reprinted with the permission from Boersema et al. (117).
Isobaric labeling-based relative quantitation
In isobaric labeling, the quantitative and qualitative information on the analytes is obtained at
the MS2 level during MS analysis. The reagents in an isobaric labeling set share the same chemical
structure and identical mass, but carry different combinations of isotopic substitutions. The
general scheme of the isobaric labeling reagent structure is shown in Figure 1-16. The structure is
composed of a reactive group, a mass normalizer group, and a mass reporter group. The reactive
group is used to covalently attach the mass tag onto the peptide. The mass reporter groups of the
individual reagents in one set have different masses, resulting from different isotopic
combinations, and are distinguishable in the MS2 spectrum. The function of the mass normalizer
group, which also contains different combinations of isotopes, is to balance the mass difference
of the mass reporter group, making the overall reagent isobaric.
Figure 1-16 The scheme of (A) isobaric labeling reagents and (B) labeled peptide.
Most of the isobaric labeling reagents react with the primary amine groups of peptides, which are
N-terminus and epsilon amino group of Lys-residues, and such reactions occur without major side
reactions.
For the isobaric labeling-based approaches, each reagent in an isobaric labeling reagent
set provides every analyte in a given sample with a specific isobaric mass tag, and the
samples under study are then pooled together. The samples can be labeled before or
after enzymatic digestion. Identical peptides from the multiplexed samples co-elute with the same
LC retention time. Their precursor ions are indistinguishable in the MS1 spectrum and are
isolated together for the subsequent fragmentation event. During the MS2 scan event, two types of
non-overlapping product ions are generated: (1) reporter ions from the labeled mass tags at low
m/z values and (2) peptide fragment ions at higher m/z values. In the MS2 scans, the signal
intensities of the reporter ions from the different mass tags provide quantitation information
across the different samples, and the MS2 peptide fragment ions are used for peptide
identification. An example of a widely used isobaric labeling reagent, tandem mass tag (TMT), as
well as the detection principle, is shown in Figure 1-17.
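How reporter ion intensities are read from an MS2 spectrum can be sketched as below. The reporter m/z values are the commonly quoted monoisotopic values for TMT 6-plex rounded to four decimals, and the matching tolerance, the function names, and the example peak list are assumptions made for illustration; a production workflow would also apply isotopic impurity corrections, which are omitted here.

```python
# Commonly quoted TMT 6-plex reporter ion m/z values (rounded).
TMT6_REPORTER_MZ = {
    "126": 126.1277,
    "127": 127.1248,
    "128": 128.1344,
    "129": 129.1315,
    "130": 130.1411,
    "131": 131.1382,
}


def extract_reporter_intensities(peaks, tolerance=0.01):
    """Return the most intense peak within the tolerance of each reporter m/z.

    peaks is a list of (mz, intensity) tuples from one MS2 spectrum.
    Channels without a matching peak are reported as 0.0.
    """
    reporters = {}
    for channel, target_mz in TMT6_REPORTER_MZ.items():
        matches = [inten for mz, inten in peaks if abs(mz - target_mz) <= tolerance]
        reporters[channel] = max(matches) if matches else 0.0
    return reporters


def relative_abundance(reporters, reference="126"):
    """Express every channel relative to a chosen reference channel."""
    reference_intensity = reporters[reference]
    if reference_intensity <= 0:
        raise ValueError("reference channel has no signal")
    return {ch: inten / reference_intensity for ch, inten in reporters.items()}


# Hypothetical MS2 peak list: six reporter ions plus two peptide fragment ions.
ms2_peaks = [
    (126.1278, 1000.0), (127.1249, 2000.0), (128.1343, 1500.0),
    (129.1316, 500.0), (130.1410, 1000.0), (131.1381, 3000.0),
    (175.1190, 800.0), (548.2800, 1200.0),
]
reporters = extract_reporter_intensities(ms2_peaks)
ratios = relative_abundance(reporters)
```

The peptide fragment ions at higher m/z are ignored by the extraction step and remain available for sequence identification.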
There are advantages of isobaric labeling technology over the MS1-based detection
approaches. First, this technique is very flexible. It can be applied to any type of sample including
cell lines, tissues, and body fluids and can be employed at the protein or peptide level. Second,
isobaric labeling is able to reach high numbers of sample multiplexing without significantly
reducing the sensitivity of the MS analysis. The precursor ions of identical peptides from different
samples show the same m/z value, and the co-isolation of these precursor ions thus does not
compromise the MS1 sensitivity while still allowing MS2 spectra to be obtained. As a result,
isobaric labeling can commonly reach high numbers of multiplexing, boosting the throughput
significantly. The 8-plex iTRAQ is commercially available from Sciex (Framingham, MA), and
10-plex TMT is available from Thermo Scientific (Rockford, IL) (121). The combinatorial isobaric
mass tags (CMTs) were reported to potentially reach 28-plex in one set (122).
Figure 1-17 TMT 6-plex labeling reagents and the technology principle.
A. The chemical structure of TMT reagent, and the new peptide bond formation at the primary
amine group with the TMT isobaric tags. The isotope distribution for the TMT 6-plex is also shown.
B. The scheme of how TMT labeling technique provides identification and quantitation
information for multiple samples in one LC-MS/MS run.
However, only mass spectrometers which can detect low m/z values in the MS2 scan can
be used for isobaric labeling because the m/z values of the reporter ions generally range from m/z
110 to 135. Moreover, the quantitative accuracy can be compromised (121, 123-125). The isolation
window for precursor ion selection is often 1.5 to 3 Th wide. Besides the targeted precursor, all
other precursor ions of co-eluting peptides within this isolation window are also selected. After
fragmentation, all of the co-isolated labeled peptide precursor ions
contribute to the reporter ion signals. Consequently, the measured analyte abundances can be
incorrect. The effect of such interference is unpredictable and depends on the complexity of the
samples. However, the relative quantitation differences in proteomic samples are typically
considered to be underestimated and compressed, under the assumption that the majority of proteins
are not differentially regulated in biological studies. Thus, the quantitative data interpretation
of isobaric labeling requires special attention. One reported approach utilizes triple-stage
mass spectrometry (MS3) on an instrument capable of both ion-trap-based CID and HCD fragmentation
to increase the quantitative accuracy (124). The precursor ions are first fragmented by CID
at low fragmentation energy, and the most intense product ions are then selected for
subsequent HCD fragmentation (MS3). In this way, interference can be removed. However, such an
instrument was not available for this thesis, so a specific ratio cutoff was used in Chapter 2
instead of considering the actual fold change values.
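The ratio compression caused by co-isolated precursors can be illustrated with a simple mixing model. It assumes that the interfering peptides are not regulated (ratio 1:1), following the assumption stated above, and that the target peptide contributes a fraction p of the reference channel signal; the observed ratio is then p·r + (1 − p) for a true ratio r. The function name and the numerical values are hypothetical.

```python
def observed_reporter_ratio(true_ratio, target_fraction):
    """Reporter ion ratio measured when unregulated peptides are co-isolated.

    target_fraction is the share of the reference channel signal that comes
    from the target peptide (1.0 means no interference).
    """
    if not 0.0 < target_fraction <= 1.0:
        raise ValueError("target_fraction must be in (0, 1]")
    return target_fraction * true_ratio + (1.0 - target_fraction)


# Hypothetical case: a true 4-fold change with 30% interfering signal.
compressed = observed_reporter_ratio(4.0, 0.7)
clean = observed_reporter_ratio(4.0, 1.0)
```

Under this model the measured fold change always lies between 1 and the true value, which is consistent with judging regulation by a ratio cutoff rather than by the numerical fold change.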
Figure 1-18 Chemical structure of major isobaric reagents.
A. iTRAQ (107); B. DiLeu (126); C. DiART (127); D. CMT (122). In this figure, (a) shows the
chemical structure of the isobaric reagent, and (b) illustrates the new peptide bond formation at the
primary amine group with the isobaric tag.
Several types of isobaric reagents, with their corresponding reaction chemistries, have been
developed. Tandem mass tag (TMT) reagents (106) and isobaric tag
for relative and absolute quantitation (iTRAQ) reagents (107) are commercially available and most
commonly used. Several other novel isobaric labeling reagents have also been reported, including
N,N-dimethyl leucines (DiLeu) (126), deuterium isobaric amine-reactive tag (DiART) (127), and
combinatorial isobaric mass tags (CMT) (122). The chemical structures of these isobaric reagents
are shown in Figures 1-17 and 1-18. All of the reagents react with the primary amine groups of
peptides, which are the N-terminus and the epsilon amino group of Lys residues, and this reaction
has been shown to proceed without major side reactions (101). Other isobaric labels, for specific
post-translational modifications and for cysteine-specific tagging, are based on other
chemistries such as carbonyl- and sulfhydryl-reactive groups (121).
In Chapter 2, TMT 6-plex (reporter ions from m/z 126 to 131) was applied to study
proteome profile changes of CHO cells across several time points under different culture
conditions. This approach is more suitable and practical than other techniques such as SILAC
because industrial large-scale cultivation was studied.
Notably, as mentioned previously, there are multiple choices of MS data acquisition
strategy and of label-free and labeled quantitative approaches. Individual choices can be combined,
yielding an extensive number of workflows. For example, DDA and isobaric labeling can be used
for discovery proteomics, and SRM can also work with labeled approaches to quantify target
proteins with enhanced sensitivity. The choice depends on the question
to be addressed.
1.8.4 MS-based proteomic data interpretation
1.8.4.1 Peptide and protein identification
In shotgun bottom-up proteomic analysis, peptide identification is the first step in
interpreting the MS data, and it is then used to infer the presence of the corresponding proteins.
The quantitative information can also be obtained from the peptides to reflect the regulation
level of the proteins. To date, there are three strategies to identify peptides from MS2 spectra:
database searching, de novo sequencing, and spectral library searching.
Currently, database searching is the most widely used strategy. Data interpretation in
database searching relies on a protein sequence database (128). In this
strategy, the protein sequences in the database are digested in silico with a specific enzyme, and
all possible precursors and their corresponding possible fragment ions for the peptide
candidates are calculated. The experimental MS raw data (MS1 and MS2 spectra) are then
compared to this information, yielding a matching score for each peptide spectral match (PSM)
based on either similarity or probability, which indicates how confident the match is. SEQUEST
(129) and Mascot (130) are widely used examples of this strategy. Note that, with the database
searching algorithm alone, false positive PSMs still occur. To control the false discovery
rate, a PSM evaluation strategy called the target-decoy strategy is frequently used (131). A ‘decoy’
database which contains reversed or shuffled peptide sequences and the ‘target’ database which is
composed of true protein sequences are combined, and the experimental MS raw data are searched
against this combined ‘target-decoy’ database. Since the PSMs matching the ‘decoy’ sequences
are known to be false, the cutoff of the matching score can be determined based on the desired
false discovery rate (FDR). Database searching is effective and relatively accurate. However, it
cannot recognize peptides which are not present in the database. In Chapter 2, database searching
was used for peptide and protein identification in the CHO proteomic MS data analysis.
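The way a score cutoff follows from the target-decoy search can be sketched as follows. The sketch estimates the FDR at a given cutoff as the number of decoy PSMs divided by the number of target PSMs scoring at or above it, which is one common estimator and is assumed here rather than taken from the cited work. The function name and the example scores are hypothetical, and ties between scores are not treated specially.

```python
def score_cutoff_for_fdr(psms, max_fdr=0.01):
    """Return the lowest score cutoff whose estimated FDR is within max_fdr.

    psms is a list of (score, is_decoy) tuples from a combined target-decoy
    search, where higher scores indicate better matches. Returns None if no
    cutoff satisfies the requested FDR.
    """
    ranked = sorted(psms, key=lambda psm: psm[0], reverse=True)
    targets = 0
    decoys = 0
    cutoff = None
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        # Estimated FDR among all PSMs scoring at least `score`.
        if targets > 0 and decoys / targets <= max_fdr:
            cutoff = score
    return cutoff


# Hypothetical PSM scores; True marks a match to a decoy sequence.
psms = [
    (95.0, False), (90.0, False), (85.0, False), (80.0, False),
    (75.0, True), (70.0, False), (65.0, True), (60.0, False),
]
strict_cutoff = score_cutoff_for_fdr(psms, max_fdr=0.0)
relaxed_cutoff = score_cutoff_for_fdr(psms, max_fdr=0.25)
```

PSMs scoring at or above the returned cutoff would be accepted, and a stricter FDR moves the cutoff toward higher scores at the cost of fewer identifications.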
The algorithm of de novo sequencing, on the other hand, only relies on the peptide fragment
pattern of experimental MS data to determine the peptide sequences, without the assistance of a
protein sequence database (132). The approach can identify peptides which are not contained in
the reference database, which could be particularly helpful to identify protein variants and proteins
from organisms whose genomes have not been sequenced (132, 133). However, de novo
sequencing suffers from inaccuracy caused by factors such as noisy spectra, incomplete ion series,
and limited MS2 mass accuracy (133). As a result, database searching has generally outperformed
the de novo sequencing strategy. However, the rapid development of MS instrumentation
promises MS2 spectra of higher quality, which may bring more attention to de
novo sequencing.
Spectral library searching (134-136) is based on the assumption that, under similar
conditions, a given peptide yields nearly identical MS2 spectra, including the fragment ion
species and their intensities. If an MS2 spectrum has been confidently assigned as a peptide
spectral match (PSM), it can be added to a reference collection called a spectral library. Newly
obtained experimental MS data can then be compared with the PSMs in the spectral library to
identify known peptides. Moreover, LC retention time information can be added to the spectral
library. When similar LC separation conditions are used (e.g. RP LC), the retention time
information can increase the identification certainty. This method is highly time efficient and
accurate, especially when a comprehensive, high-quality spectral library is used. However, it can
only identify peptides contained in the spectral library, and it requires that new experiments
follow data acquisition conditions similar to those under which the spectral library was built.
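The comparison of an experimental spectrum with library spectra can be sketched with a normalized dot product (cosine similarity) on binned peaks. This similarity measure, the 1 m/z bin width, the acceptance threshold, and all names and peak lists are assumptions chosen for illustration; published spectral library search tools use their own scoring schemes.

```python
import math


def bin_spectrum(peaks, bin_width=1.0):
    """Sum (mz, intensity) peaks into fixed-width m/z bins."""
    binned = {}
    for mz, intensity in peaks:
        key = int(mz // bin_width)
        binned[key] = binned.get(key, 0.0) + intensity
    return binned


def cosine_similarity(peaks_a, peaks_b, bin_width=1.0):
    """Normalized dot product of two binned spectra (0 = unrelated, 1 = identical)."""
    a = bin_spectrum(peaks_a, bin_width)
    b = bin_spectrum(peaks_b, bin_width)
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm > 0 else 0.0


def best_library_match(query_peaks, library, min_score=0.7):
    """Return (peptide, score) of the best library entry, or (None, score) if too low."""
    best_name, best_score = None, 0.0
    for name, reference_peaks in library.items():
        score = cosine_similarity(query_peaks, reference_peaks)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < min_score:
        return None, best_score
    return best_name, best_score


# Hypothetical two-entry library and one experimental spectrum.
library = {
    "PEPTIDE_A": [(175.1, 50.0), (304.2, 100.0), (417.2, 80.0)],
    "PEPTIDE_B": [(147.1, 100.0), (276.2, 40.0), (505.3, 90.0)],
}
query = [(175.12, 45.0), (304.18, 110.0), (417.25, 70.0), (620.3, 5.0)]
match_name, match_score = best_library_match(query, library)
```

Because fragment intensities enter the score, the comparison is only reliable when the query and library spectra were acquired under similar fragmentation conditions, as noted above.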
For DDA data, these algorithms can be applied directly since each MS2 spectrum is from a
selected precursor. On the other hand, the data analysis for DIA may require some conversion
because of the noisy MS2 spectra and the lack of individual precursor selection. There are
generally two strategies for DIA data analysis. The first, called targeted extraction, relies on
spectral library searching. This approach requires a spectral library containing PSMs and
their retention times. The spectral information in the DIA data is then extracted and
analyzed based on the spectral library for peptide identification. OpenSWATH is one such
approach (137). The other strategy, referred to as untargeted peptide identification, can
utilize database searching to interpret DIA data. In this strategy, DIA data are computationally
reconstructed into pseudo MS2 spectra. That is, from the DIA data, each precursor is grouped
with all of its possible fragment ions. The resultant precursor-fragment group, called a pseudo MS2
spectrum, can then be submitted to database searching for peptide identification. DIA-Umpire
is one of these tools (138).
Peptide identification is then used for protein inference. Two general groups of strategies,
probabilistic and non-probabilistic methods, have been developed. The non-probabilistic approach
provides the protein entries which can be explained by the identified peptides with high confidence.
In the probabilistic method, the quality of the PSMs is also taken into consideration to provide a
probability evaluation of the protein presence. For example, it may assign a high probability to a
protein with multiple low scoring PSMs.
Notably, no single tool or search engine yields the “best” results, and it has been
reported that combining multiple algorithms gives a more robust pipeline with
more peptide identifications than a single approach (133). Several software packages provide a
platform into which the desired tools can be inserted to build a customized MS data analysis
pipeline. For example, the commercial software Proteome Discoverer (Thermo Scientific) can
integrate several search engines and PSM evaluation methods. In Chapter 2, we used three search
engines in Proteome Discoverer 1.4 to maximize the number of identified proteins.
1.8.4.2 Biological analysis
In shotgun proteomic analysis for systems biology study, the protein abundance
information needs to be further interpreted to provide biological reasons or to generate testable
hypotheses for the biological system under specific perturbations. Generally, a large number of
differentially regulated proteins/metabolites/mRNAs can be determined through statistical
analysis, and the biofunctions and/or pathways that are supported by differentially regulated
molecules can be selected as candidates of cellular responses under certain circumstances. The
hypotheses can then be generated, and follow up experiments performed to test the hypotheses.
The lists of differentially regulated proteins/metabolites/mRNAs from ’Omics experiments
are generally long, and manual data analysis is hence laborious and unlikely to yield biological
insight. To interpret the data, several strategies have been developed. One popular approach
is Gene Ontology (GO) enrichment analysis (http://geneontology.org/page/go-enrichment-analysis).
GO annotates proteins/genes in three categories: “biological process”,
“molecular function”, and “cellular component” (139). After the proteins from the experiments are
annotated with GO terms, enrichment analysis identifies which GO terms are
over-represented in samples under perturbation, highlighting the
underlying biology likely to be involved. Another strategy is pathway analysis. A pathway database
is built by assigning the relevant proteins/mRNAs/metabolites to pathways based on their
biological effects. The protein list obtained from the experiments is then mapped onto the pathway
database. The specific pathways which are over-represented in the experimental data are
considered to be biological processes affected by the perturbations. The Kyoto Encyclopedia of
Genes and Genomes (KEGG) (140) and the Ingenuity Pathway Knowledge Base are examples of such
pathway databases.
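Over-representation of a GO term or pathway can be tested in several ways; a minimal sketch using the one-sided hypergeometric test is given below. The choice of this test is an assumption made for illustration and is not a description of the statistics implemented in any of the tools named in this section. The function name and all counts are hypothetical.

```python
from math import comb


def enrichment_p_value(population, annotated, sample, hits):
    """One-sided hypergeometric p-value for over-representation.

    population: number of proteins in the background proteome
    annotated:  number of background proteins carrying the term or pathway
    sample:     number of differentially regulated proteins
    hits:       number of regulated proteins carrying the term or pathway
    Returns the probability of observing at least `hits` by chance.
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical counts: 1000 background proteins, 50 of them annotated with
# the term, 100 regulated proteins, 15 of which carry the term (5 expected).
p_value = enrichment_p_value(1000, 50, 100, 15)
```

In practice many terms are tested at once, so the resulting p-values would additionally require a multiple testing correction.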
Currently many biological databases and software tools are available for ‘Omics data
analysis, such as Ingenuity Pathway Analysis (IPA) (Ingenuity® Systems, www.ingenuity.com,
Redwood City, CA), MetaCore (Thomson Reuters, https://portal.genego.com/, New York City,
NY), and the Database for Annotation, Visualization and Integrated Discovery (DAVID) (141)
(Leidos Biomedical Research, Inc., https://david.ncifcrf.gov/, Frederick, MD). All of the software
mentioned here can provide both GO term and pathway analysis. In this thesis, IPA and MetaCore
were used in Chapter 2 for the biological analysis of the proteomic and metabolomic data. For
both IPA and MetaCore, the biological databases are built from the current literature and
knowledge. Proteomic, metabolomic, and transcriptomic data can be used as input. Specifically,
IPA can analyze combined proteomic and metabolomic data sets (142), which was utilized in
Chapter 2.
1.9 Conclusion
The success of the biopharmaceutical production in industry depends on advances of
upstream bioprocess and downstream purification at the large scale. Despite the achievements to
date, the limited understanding of the systems biology of CHO, the primary workhorse for
therapeutic protein production, hinders the upstream bioprocess development. Meanwhile, the
detection and evaluation of residual HCPs in the final drug product is still challenging. LC-MS
based quantitative proteomic and protein analysis is a powerful tool to study the systems biology
of CHO cell lines as well as a promising platform for HCP analysis.
1.10 References
1. Walsh G (2014) Biopharmaceutical benchmarks 2014. Nat. Biotechnol. 32(10):992-1000.
2. Walsh G (2010) Biopharmaceutical benchmarks 2010. Nat. Biotechnol. 28(9):917-924.
3. Datta P, Linhardt RJ, & Sharfstein ST (2013) An 'omics approach towards CHO cell
engineering. Biotechnol. Bioeng. 110(5):1255-1271.
4. Jayapal KR, Wlaschin KF, Hu WS, & Yap MGS (2007) Recombinant protein therapeutics
from CHO cells - 20 years and counting. Chem. Eng. Prog. 103(10):40-47.
5. Griffin TJ, Seth G, Xie HW, Bandhakavi S, & Hu WS (2007) Advancing mammalian cell
culture engineering using genome-scale technologies. Trends Biotechnol. 25(9):401-408.
6. Barnes LM, Bentley CM, & Dickson AJ (2001) Characterization of the stability of
recombinant protein production in the GS-NS0 expression system. Biotechnol. Bioeng.
73(4):261-270.
7. Baldi L, et al. (2005) Transient gene expression in suspension HEK-293 cells: Application
to large-scale protein production. Biotechnol. Prog. 21(1):148-153.
8. Chu L & Robinson DK (2001) Industrial choices for protein production by large-scale cell
culture. Curr. Opin. Biotechnol. 12(2):180-187.
9. Dalton AC & Barton WA (2014) Over-expression of secreted proteins from mammalian
cell lines. Protein Sci. 23(5):517-525.
10. Kober L, Zehe C, & Bode J (2012) Development of a novel ER stress based selection
system for the isolation of highly productive clones. Biotechnol. Bioeng. 109(10):2599-
2611.
11. Kaufman RJ, et al. (1985) Coamplification and coexpression of human tissue-type
plasminogen-activator and murine dihydrofolate-reductase sequences in Chinese-hamster
ovary cells. Mol. Cell. Biol. 5(7):1750-1759.
12. Tjio JH & Puck TT (1958) Genetics of somatic mammalian cells. II. Chromosomal
constitution of cells in tissue culture. J. Exp. Med. 108(2):259-268.
13. Wurm FM (2013) CHO quasispecies-Implications for manufacturing processes. Processes.
1:296-311.
14. Hamilton WG & Ham RG (1977) Clonal growth of Chinese-hamster cell lines in protein-
free media. In. Vitro. Cell. Dev. B. 13(9):537-547.
15. Shukla AA & Thommes J (2010) Recent advances in large-scale production of monoclonal
antibodies and related proteins. Trends Biotechnol. 28(5):253-261.
16. Farid SS (2007) Process economics of industrial monoclonal antibody manufacture. J.
Chromatogr. B Analyt. Technol. Biomed. Life Sci. 848(1):8-18.
17. Jain E & Kumar A (2008) Upstream processes in antibody production: Evaluation of
critical parameters. Biotechnol. Adv. 26(1):46-72.
18. Glacken MW, Fleischaker RJ, & Sinskey AJ (1983) Large-scale production of mammalian-
cells and their products-engineering principles and barriers to scale-up. Ann. N. Y. Acad.
Sci. 413(DEC):355-372.
19. Grima EM, Chisti Y, & MooYoung M (1997) Characterization of shear rates in airlift
bioreactors for animal cell culture. J. Biotechnol. 54(3):195-210.
20. Gronemeyer P, Ditz R, & Strube J (2014) Trends in upstream and downstream process
development for antibody manufacturing. Bioeng. 1:188-212.
21. De Jesus M & Wurm FM (2011) Manufacturing recombinant proteins in kg-ton quantities
using animal cells in bioreactors. Eur. J. Pharm. Biopharm. 78(2):184-188.
22. Shukla AA & Gottschalk U (2013) Single-use disposable technologies for
biopharmaceutical manufacturing. Trends Biotechnol. 31(3):147-154.
23. Bibila TA & Robinson DK (1995) In pursuit of the optimal fed-batch process for
monoclonal-antibody production. Biotechnol. Prog. 11(1):1-13.
24. Birch JR & Racher AJ (2006) Antibody production. Adv. Drug Delivery Rev. 58(5-6):671-
685.
25. Goochee CF & Monica T (1990) Environmental-effects on protein glycosylation. Nat.
Biotechnol. 8(5):421-427.
26. Wong DCF, Wong KTK, Goh LT, Heng CK, & Yap MGS (2005) Impact of dynamic
online fed-batch strategies on metabolism, productivity and N-glycosylation quality in
CHO cell cultures. Biotechnol. Bioeng. 89(2):164-177.
27. Yang M & Butler M (2000) Effects of ammonia on CHO cell growth, erythropoietin
production, and glycosylation. Biotechnol. Bioeng. 68(4):370-380.
28. Andersen DC, Bridges T, Gawlitzek M, & Hoy C (2000) Multiple cell culture factors can
affect the glycosylation of Asn-184 in CHO-produced tissue-type plasminogen activator.
Biotechnol. Bioeng. 70(1):25-31.
29. Baker KN, et al. (2001) Metabolic control of recombinant protein N-glycan processing in
NS0 and CHO cells. Biotechnol. Bioeng. 73(3):188-202.
30. Xie LZ & Wang DIC (1997) Integrated approaches to the design of media and feeding
strategies for fed-batch cultures of animal cells. Trends Biotechnol. 15(3):109-113.
31. Voisard D, Meuwly F, Ruffieux PA, Baer G, & Kadouri A (2003) Potential of cell retention
techniques for large-scale high-density perfusion culture of suspended mammalian cells.
Biotechnol. Bioeng. 82(7):751-765.
32. Betts JI & Baganz F (2006) Miniature bioreactors: current practices and future
opportunities. Microb. Cell. Fact. 5:14.
33. Tai M, Ly A, Leung I, & Nayar G (2015) Efficient high-throughput biological process
characterization: Definitive screening design with the Ambr250 bioreactor system.
Biotechnol. Prog. 31(5):1388-1395.
34. Bhambure R, Kumar K, & Rathore AS (2011) High-throughput process development for
biopharmaceutical drug substances. Trends Biotechnol. 29(3):127-135.
35. Xing ZZ, Kenty BN, Li ZJ, & Lee SS (2009) Scale-up analysis for a CHO cell culture
process in large-scale bioreactors. Biotechnol. Bioeng. 103(4):733-746.
36. Aranibar N, et al. (2011) NMR-based metabolomics of mammalian cell and tissue cultures.
J. Biomol. NMR 49(3-4):195-206.
37. Junker BH (2004) Scale-up methodologies for Escherichia coli and yeast fermentation
processes. J. Biosci. Bioeng. 97(6):347-364.
38. Marks DM (2003) Equipment design considerations for large scale cell culture.
Cytotechnology. 42(1):21-33.
39. Sieblist C, Jenzsch M, Pohlscheidt M, & Lubbert A (2011) Insights into large-scale cell-
culture reactors: I. Liquid mixing and oxygen supply. Biotechnol. J. 6(12):1532-1546.
40. Sieblist C, et al. (2011) Insights into large-scale cell-culture reactors: II. Gas-phase mixing
and CO2 stripping. Biotechnol. J. 6(12):1547-1556.
41. Tscheliessnig AL, Konrath J, Bates R, & Jungbauer A (2013) Host cell protein analysis in
therapeutic protein bioprocessing - methods and applications. Biotechnol. J. 8(6):655-670.
42. Shukla AA, Hubbard B, Tressel T, Guhan S, & Low D (2007) Downstream processing of
monoclonal antibodies - Application of platform approaches. J. Chromatogr. B Analyt.
Technol. Biomed. Life Sci. 848(1):28-39.
43. Hober S, Nord K, & Linhult M (2007) Protein A chromatography for antibody purification.
J. Chromatogr. B Analyt. Technol. Biomed. Life Sci. 848(1):40-47.
44. Shukla AA & Hinckley P (2008) Host cell protein clearance during protein A
chromatography: development of an improved column wash step. Biotechnol. Prog.
24(5):1115-1121.
45. Azevedo AM, Rosa PAJ, Ferreira IF, & Aires-Barros MR (2009) Chromatography-free
recovery of biopharmaceuticals through aqueous two-phase processing. Trends Biotechnol.
27(4):240-247.
46. Guiochon G & Beaver LA (2011) Separation science is the key to successful
biopharmaceuticals. J. Chromatogr. A 1218(49):8836-8858.
47. Briggs J & Panfili PR (1991) Quantitation of DNA and protein impurities in
biopharmaceuticals. Anal. Chem. 63(9):850-859.
48. Wolter T & Richter A (2005) Assays for controlling host-cell impurities in
biopharmaceuticals. Bioprocess. Int. 3(2):2-6.
49. Xu X, et al. (2011) The genomic sequence of the Chinese hamster ovary (CHO)-K1 cell
line. Nat. Biotechnol. 29(8):735-U131.
50. Lewis NE, et al. (2013) Genomic landscapes of Chinese hamster ovary cell lines as
revealed by the Cricetulus griseus draft genome. Nat. Biotechnol. 31(8):759-765.
51. Kim JY, Kim YG, & Lee GM (2012) CHO cells in biotechnology for production of
recombinant proteins: current state and further potential. Appl. Microbiol. Biotechnol.
93(3):917-930.
52. Kildegaard HF, Baycin-Hizal D, Lewis NE, & Betenbaugh MJ (2013) The emerging CHO
systems biology era: harnessing the 'omics revolution for biotechnology. Curr. Opin.
Biotechnol. 24(6):1102-1107.
53. Mallick P & Kuster B (2010) Proteomics: a pragmatic perspective. Nat. Biotechnol.
28(7):695-709.
54. Chaudhuri S, et al. (2015) Investigation of CHO secretome: Potential way to improve
recombinant protein production from bioprocess. J. Bioprocess. Biotech. 5(7):1000240.
55. Doolan P, et al. (2013) Transcriptomic analysis of clonal growth rate variation during CHO
cell line development. J. Biotechnol. 166(3):105-113.
56. Clarke C, et al. (2011) Large scale microarray profiling and coexpression network analysis
of CHO cells identifies transcriptional modules associated with growth and productivity.
J. Biotechnol. 155(3):350-359.
57. Ley D, et al. (2015) Multi-omic profiling of EPO-producing Chinese hamster ovary cell
panel reveals metabolic adaptation to heterologous protein production. Biotechnol. Bioeng.
112(11):2373-2387.
58. de Zafra CLZ, Quarmby V, Francissen K, Vanderlaan M, & Zhu-Shimoni J (2015) Host
cell proteins in biotechnology-derived products: A risk assessment framework. Biotechnol.
Bioeng. 112(11):2284-2291.
59. Hogwood CEM, Bracewell DG, & Smales CM (2014) Measurement and control of host
cell proteins (HCPs) in CHO cell bioprocesses. Curr. Opin. Biotechnol. 30:153-160.
60. Wang X, Hunter AK, & Mozier NM (2009) Host cell proteins in biologics development:
identification, quantitation and risk assessment. Biotechnol. Bioeng. 103(3):446-458.
61. Beatson R, et al. (2011) Transforming growth factor-beta 1 is constitutively secreted by
Chinese hamster ovary cells and is functional in human cells. Biotechnol. Bioeng.
108(11):2759-2764.
62. Gao SX, et al. (2011) Fragmentation of a highly purified monoclonal antibody attributed
to residual CHO cell protease activity. Biotechnol. Bioeng. 108(4):977-982.
63. Gutierrez AH, Moise L, & De Groot AS (2012) Of hamsters and men A new perspective
on host cell proteins. Hum. Vaccin. Immunother. 8(9):1172-1174.
64. Krawitz DC, Forrest W, Moreno GT, Kittleson J, & Champion KM (2006) Proteomic
studies support the use of multi-product immunoassays to monitor host cell protein
impurities. Proteomics 6(1):94-110.
65. Jin M, Szapiel N, Zhang J, Hickey J, & Ghose S (2010) Profiling of host cell proteins by
two-dimensional difference gel electrophoresis (2D-DIGE): implications for downstream
process development. Biotechnol. Bioeng. 105(2):306-316.
66. Levy NE, Valente KN, Choe LH, Lee KH, & Lenhoff AM (2014) Identification and
characterization of host cell protein product-associated impurities in monoclonal antibody
bioprocessing. Biotechnol. Bioeng. 111(5):904-912.
67. Zhu GJ, et al. (2012) A rapid cIEF-ESI-MS/MS method for host cell protein analysis of a
recombinant human monoclonal antibody. Talanta 98:253-256.
68. Doneanu CE, et al. (2012) Analysis of host-cell proteins in biotherapeutic proteins by
comprehensive online two-dimensional liquid chromatography/mass spectrometry. mAbs
4(1):24-44.
69. Farrell A, et al. (2015) Quantitative host cell protein analysis using two dimensional data
independent LC-MS^E. Anal. Chem. 87(18):9186-9193.
70. Doneanu CE, et al. (2015) Enhanced detection of low-abundance host cell protein (HCP)
impurities in high-purity monoclonal antibodies down to 1 ppm using ion mobility mass
spectrometry coupled with multidimensional liquid chromatography. Anal. Chem.
87(20):10283-10291.
71. Aebersold R & Mann M (2003) Mass spectrometry-based proteomics. Nature
422(6928):198-207.
72. Bantscheff M, Schirle M, Sweetman G, Rick J, & Kuster B (2007) Quantitative mass
spectrometry in proteomics: a critical review. Anal. Bioanal. Chem. 389(4):1017-1031.
73. Furey A, Moriarty M, Bane V, Kinsella B, & Lehane M (2013) Ion suppression; A critical
review on causes, evaluation, prevention and applications. Talanta 115:104-122.
74. Motoyama A & Yates JR (2008) Multidimensional LC separations in shotgun proteomics.
Anal. Chem. 80(19):7187-7193.
75. Gilar M, Olivova P, Daly AE, & Gebler JC (2005) Orthogonality of separation in two-
dimensional liquid chromatography. Anal. Chem. 77(19):6426-6434.
76. Link AJ, et al. (1999) Direct analysis of protein complexes using mass spectrometry. Nat.
Biotechnol. 17(7):676-682.
77. Washburn MP, Wolters D, & Yates JR (2001) Large-scale analysis of the yeast proteome
by multidimensional protein identification technology. Nat. Biotechnol. 19(3):242-247.
78. Peng JM, Elias JE, Thoreen CC, Licklider LJ, & Gygi SP (2003) Evaluation of
multidimensional chromatography coupled with tandem mass spectrometry (LC/LC-
MS/MS) for large-scale protein analysis: The yeast proteome. J. Proteome Res. 2(1):43-
50.
79. Gilar M, Olivova P, Daly AE, & Gebler JC (2005) Two-dimensional separation of peptides
using RP-RP-HPLC system with different pH in first and second separation dimensions. J.
Sep. Sci. 28(14):1694-1703.
80. Michalski A, Cox J, & Mann M (2011) More than 100,000 Detectable Peptide Species
Elute in Single Shotgun Proteomics Runs but the Majority is Inaccessible to Data-
Dependent LC-MS/MS. J. Proteome Res. 10(4):1785-1793.
81. Mann M & Kelleher NL (2008) Precision proteomics: The case for high resolution and
high mass accuracy. Proc. Natl. Acad. Sci. U. S. A. 105(47):18132-18138.
82. Yates JR, Ruse CI, & Nakorchevsky A (2009) Proteomics by mass spectrometry:
Approaches, advances, and applications. Annual Review of Biomedical Engineering,
(Annual Reviews, Palo Alto), Vol 11, pp 49-79.
83. Chernushevich IV (2000) Duty cycle improvement for a quadrupole-time-of-flight mass
spectrometer and its use for precursor ion scans. Eur. J. Mass Spectrom. 6(6):471-479.
84. Chernushevich IV, Loboda AV, & Thomson BA (2001) An introduction to quadrupole-
time-of-flight mass spectrometry. J. Mass Spectrom. 36(8):849-865.
85. Marshall AG & Hendrickson CL (2008) High-Resolution Mass Spectrometers. Annual
Review of Analytical Chemistry, (Annual Reviews, Palo Alto), Vol 1, pp 579-599.
86. Michalski A, et al. (2011) Mass Spectrometry-based Proteomics Using Q Exactive, a High-
performance Benchtop Quadrupole Orbitrap Mass Spectrometer. Mol. Cell. Proteomics
10(9):11.
87. Olsen JV, et al. (2007) Higher-energy C-trap dissociation for peptide modification analysis.
Nat. Methods 4(9):709-712.
88. Stahl DC, Swiderek KM, Davis MT, & Lee TD (1996) Data-controlled automation of
liquid chromatography tandem mass spectrometry analysis of peptide mixtures. J. Am. Soc.
Mass Spectrom. 7(6):532-540.
89. Liu HB, Sadygov RG, & Yates JR (2004) A model for random sampling and estimation of
relative protein abundance in shotgun proteomics. Anal. Chem. 76(14):4193-4201.
90. Chapman JD, Goodlett DR, & Masselon CD (2014) Multiplexed and data-independent
tandem mass spectrometry for global proteome profiling. Mass Spectrom. Rev. 33(6):452-
470.
91. Plumb RS, et al. (2006) UPLC/MSE; a new approach for generating molecular fragment
information for biomarker structure elucidation. Rapid Commun. Mass Spectrom.
20(13):1989-1994.
92. Gillet LC, et al. (2012) Targeted data extraction of the MS/MS spectra generated by data-
independent acquisition: A new concept for consistent and accurate proteome analysis. Mol.
Cell. Proteomics 11(6):17.
93. Egertson JD, et al. (2013) Multiplexed MS/MS for improved data-independent acquisition.
Nat. Methods 10(8):744-746.
94. Geromanos SJ, et al. (2009) The detection, correlation, and comparison of peptide
precursor and product ions from data independent LC-MS with data dependant LC-MS/MS.
Proteomics 9(6):1683-1695.
95. Picotti P & Aebersold R (2012) Selected reaction monitoring-based proteomics: workflows,
potential, pitfalls and future directions. Nat. Methods 9(6):555-566.
96. Peterson AC, Russell JD, Bailey DJ, Westphall MS, & Coon JJ (2012) Parallel reaction
monitoring for high resolution and high mass accuracy quantitative, targeted proteomics.
Mol. Cell. Proteomics 11(11):1475-1488.
97. Gallien S, et al. (2012) Targeted proteomic quantification on quadrupole-orbitrap mass
spectrometer. Mol. Cell. Proteomics 11(12):1709-1723.
98. Lange V, Picotti P, Domon B, & Aebersold R (2008) Selected reaction monitoring for
quantitative proteomics: a tutorial. Mol. Syst. Biol. 4:14.
99. Schilling B, et al. (2015) Multiplexed, Scheduled, High-Resolution Parallel Reaction
Monitoring on a Full Scan QqTOF Instrument with Integrated Data-Dependent and
Targeted Mass Spectrometric Workflows. Anal. Chem. 87(20):10222-10229.
100. Zhu WH, Smith JW, & Huang CM (2010) Mass spectrometry-based label-free quantitative
proteomics. J. Biomed. Biotechnol. 2010:840518-840523.
101. Xie F, Liu T, Qian WJ, Petyuk VA, & Smith RD (2011) Liquid chromatography-mass
spectrometry-based quantitative proteomics. J. Biol. Chem. 286(29):25443-25449.
102. Ong SE, et al. (2002) Stable isotope labeling by amino acids in cell culture, SILAC, as a
simple and accurate approach to expression proteomics. Mol. Cell. Proteomics 1(5):376-
386.
103. Stewart II, Thomson T, & Figeys D (2001) O-18 Labeling: a tool for proteomics. Rapid
Commun. Mass Spectrom. 15(24):2456-2465.
104. Boersema PJ, Raijmakers R, Lemeer S, Mohammed S, & Heck AJR (2009) Multiplex
peptide stable isotope dimethyl labeling for quantitative proteomics. Nat. Protoc. 4(4):484-
494.
105. Gygi SP, et al. (1999) Quantitative analysis of complex protein mixtures using isotope-
coded affinity tags. Nat. Biotechnol. 17(10):994-999.
106. Thompson A, et al. (2003) Tandem mass tags: A novel quantification strategy for
comparative analysis of complex protein mixtures by MS/MS. Anal. Chem. 75(8):1895-
1904.
107. Ross PL, et al. (2004) Multiplexed protein quantitation in Saccharomyces cerevisiae using
amine-reactive isobaric tagging reagents. Mol. Cell. Proteomics 3(12):1154-1169.
108. Tzouros M, et al. (2013) Development of a 5-plex SILAC Method Tuned for the
Quantitation of Tyrosine Phosphorylation Dynamics. Mol. Cell. Proteomics 12(11):3339-
3349.
109. Molina H, et al. (2009) Temporal Profiling of the Adipocyte Proteome during
Differentiation Using a Five-Plex SILAC Based Strategy. J. Proteome Res. 8(1):48-58.
110. Merrill AE, et al. (2014) NeuCode labels for relative protein quantification. Mol. Cell.
Proteomics 13(9):2503-2512.
111. Schnolzer M, Jedrzejewski P, & Lehmann WD (1996) Protease-catalyzed incorporation of
O-18 into peptide fragments and its application for protein sequencing by electrospray and
matrix-assisted laser desorption/ionization mass spectrometry. Electrophoresis 17(5):945-
953.
112. Qian WJ, et al. (2005) Quantitative proteome analysis of human plasma following in vivo
lipopolysaccharide administration using O-16/O-18 labeling and the accurate mass and
time tag approach. Mol. Cell. Proteomics 4(5):700-709.
113. Fenselau C & Yao XD (2009) (18)O(2)-labeling in quantitative proteomic strategies: A
status report. J. Proteome Res. 8(5):2140-2143.
114. Petritis BO, Qian WJ, Camp DG, & Smith RD (2009) A Simple Procedure for Effective
Quenching of Trypsin Activity and Prevention of (18)O-Labeling Back-Exchange. J.
Proteome Res. 8(5):2157-2163.
115. Miyagi M & Rao KCS (2007) Proteolytic O-18-labeling strategies for quantitative
proteomics. Mass Spectrom. Rev. 26(1):121-136.
116. Shiio Y & Aebersold R (2006) Quantitative proteome analysis using isotope-coded affinity
tags and mass spectrometry. Nat. Protoc. 1(1):139-145.
117. Boersema PJ, Aye TT, van Veen TAB, Heck AJR, & Mohammed S (2008) Triplex protein
quantification based on stable isotope labeling by peptide dimethylation applied to cell and
tissue lysates. Proteomics 8(22):4624-4632.
118. Raijmakers R, et al. (2008) Automated online sequential isotope labeling for protein
quantitation applied to proteasome tissue-specific diversity. Mol. Cell. Proteomics
7(9):1755-1762.
119. Hsu JL, Huang SY, Chow NH, & Chen SH (2003) Stable-isotope dimethyl labeling for
quantitative proteomics. Anal. Chem. 75(24):6843-6852.
120. Hsu JL, Huang SY, Shiea JT, Huang WY, & Chen SH (2005) Beyond quantitative
proteomics: Signal enhancement of the a(1) ion as a mass tag for peptide sequencing using
dimethyl labeling. J. Proteome Res. 4(1):101-108.
121. Rauniyar N & Yates JR (2014) Isobaric labeling-based relative quantification in shotgun
proteomics. J. Proteome Res. 13(12):5293-5309.
122. Braun CR, et al. (2015) Generation of multiple reporter ions from a single isobaric reagent
increases multiplexing capacity for quantitative proteomics. Anal. Chem. 87(19):9855-
9863.
123. Ow SY, et al. (2009) iTRAQ underestimation in simple and complex mixtures: "the good,
the bad and the ugly". J. Proteome Res. 8(11):5347-5355.
124. Ting L, Rad R, Gygi SP, & Haas W (2011) MS3 eliminates ratio distortion in isobaric
multiplexed quantitative proteomics. Nat. Methods 8(11):937-940.
125. Wenger CD, et al. (2011) Gas-phase purification enables accurate, multiplexed proteome
quantification with isobaric tagging. Nat. Methods 8(11):933-935.
126. Xiang F, Ye H, Chen RB, Fu Q, & Li LJ (2010) N,N-Dimethyl Leucines as Novel Isobaric
Tandem Mass Tags for Quantitative Proteomics and Peptidomics. Anal. Chem. 82(7):2817-
2825.
127. Zhang JX, Wang Y, & Li SW (2010) Deuterium Isobaric Amine-Reactive Tags for
Quantitative Proteomics. Anal. Chem. 82(18):7588-7595.
128. Eng JK, Searle BC, Clauser KR, & Tabb DL (2011) A face in the crowd: Recognizing
peptides through database search. Mol. Cell. Proteomics 10(11):9.
129. Eng JK, McCormack AL, & Yates JR (1994) An approach to correlate tandem mass-
spectral data of peptides with amino acid sequences in a protein database. J. Am. Soc. Mass
Spectrom. 5(11):976-989.
130. Perkins DN, Pappin DJC, Creasy DM, & Cottrell JS (1999) Probability-based protein
identification by searching sequence databases using mass spectrometry data.
Electrophoresis 20(18):3551-3567.
131. Elias JE & Gygi SP (2010) Target-Decoy Search Strategy for Mass Spectrometry-Based
Proteomics. Proteome Bioinformatics, Methods in Molecular Biology, eds Hubbard SJ &
Jones AR (Humana Press Inc, Totowa), Vol 604, pp 55-71.
132. Seidler J, Zinn N, Boehm ME, & Lehmann WD (2010) De novo sequencing of peptides
by MS/MS. Proteomics 10(4):634-649.
133. Hoopmann MR & Moritz RL (2013) Current algorithmic solutions for peptide-based
proteomics data generation and identification. Curr. Opin. Biotechnol. 24(1):31-38.
134. Craig R, Cortens JC, Fenyo D, & Beavis RC (2006) Using annotated peptide mass
spectrum libraries for protein identification. J. Proteome Res. 5(8):1843-1849.
135. Frewen BE, Merrihew GE, Wu CC, Noble WS, & MacCoss MJ (2006) Analysis of peptide
MS/MS spectra from large-scale proteomics experiments using spectrum libraries. Anal.
Chem. 78(16):5678-5684.
136. Lam H, et al. (2007) Development and validation of a spectral library searching method
for peptide identification from MS/MS. Proteomics 7(5):655-667.
137. Rost HL, et al. (2014) OpenSWATH enables automated, targeted analysis of data-
independent acquisition MS data. Nat. Biotechnol. 32(3):219-223.
138. Tsou CC, et al. (2015) DIA-Umpire: comprehensive computational framework for data-
independent acquisition proteomics. Nat. Methods 12(3):258-264.
139. Ashburner M, et al. (2000) Gene Ontology: tool for the unification of biology. Nat. Genet.
25(1):25-29.
140. Kanehisa M & Goto S (2000) KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic
Acids Res. 28(1):27-30.
141. Huang DW, Sherman BT, & Lempicki RA (2009) Systematic and integrative analysis of
large gene lists using DAVID bioinformatics resources. Nat. Protoc. 4(1):44-57.
142. Kramer A, Green J, Pollard J, & Tugendreich S (2014) Causal analysis approaches in
Ingenuity Pathway Analysis. Bioinformatics 30(4):523-530.
Chapter 2: Combined Metabolomics and Proteomics Reveals
Hypoxia as A Cause of Lower Productivity on Scale-up to a
5000-Liter CHO Bioprocess
The paper based on this chapter was published in Biotechnology Journal in 2016 (DOI:
10.1002/biot.201600030). A poster on this work was presented at the 64th
American Society for Mass Spectrometry (ASMS) conference in June 2016 (abstract ID: 280605).
Yuanwei Gao1, Somak Ray1, Shujia Dai1, Alexander R. Ivanov1, Nicholas R. Abu-Absi2, Amanda
M. Lewis2, Zhuangrong Huang2, Zizhuo Xing2, Michael C. Borys2, Zheng Jian Li2, Barry L.
Karger1
1Barnett Institute and Department of Chemistry and Chemical Biology, Northeastern University,
Boston, MA, 02115
2Biologics Development, Global Manufacturing and Supply, Bristol-Myers Squibb, 38 Jackson
Road, Devens, MA 01434
I thank Somak Ray for proteomic sequence database construction and statistical data analysis, Dr.
Shujia Dai for his contributions to the early stage of this study, Dr. Alexander Ivanov for discussion,
and Dr. Barry Karger for conceptual design, idea contribution, and manuscript composition. I also
thank the scientists from Bristol-Myers Squibb for their strong collaboration, especially, Dr.
Nicholas Abu-Absi for providing bioreactor data, and Dr. Amanda Lewis for helpful discussions.
2.1 Abstract
Large-scale bioprocessing is key to the successful manufacturing of a biopharmaceutical.
However, cell viability and productivity are often lower in the scale-up from laboratory to
production. In this study, we analyzed CHO cells, which showed lower percent viabilities and
productivity in a 5-KL production scale bioreactor compared to a 20-L bench-top scale under
seemingly identical process parameters. An increase in copper concentration in the media from 0.02
μM to 0.4 μM led to a doubling of percent viability in the production scale albeit still at a lower
level than the bench-top scale. Combined metabolomics and proteomics revealed that the increased
copper reduced the presence of reactive oxygen species (ROS) in the 5-KL scale process. The
reduction in oxidative stress was supported by the increased level of glutathione peroxidase in the
lower copper level condition. The excess ROS was shown to be due to hypoxia (intermittent), as
evidenced by the reduction in fibronectin with increased copper. The 20-L scale showed much less
hypoxia and thus less excess ROS generation, resulting in little to no impact on productivity with
the increased copper in the media. The study illustrates the power of ‘Omics in aiding the
understanding of biological processes in biopharmaceutical production.
2.2 Introduction
Biologics, including antibodies, hormones and cytokines, represent an increasingly
important class of therapeutics, with 7 of the 10 top selling drugs in 2013 in this class (1). The
majority of biologics are manufactured using mammalian cellular hosts, especially Chinese
hamster ovary (CHO) cells. The biotechnology industry is under pressure to bring therapeutics to
market faster and at lower cost. In order to meet these demands, it is increasingly important for
the industry to develop high yielding, scalable, and robust processes that are controllable and well
understood. Many advances in the field of bioprocessing are facilitating this goal. Engineering
design of large volume bioreactors to provide cells with an optimal environment for high cell
density and high yield processes continues to improve. At the same time, understanding the
biology of cellular processes during CHO cell protein production is being actively pursued (2, 3).
Molecular profiling-based genomics, proteomics, and metabolomics have recently been applied
to bioreactor processes to study cellular production (4-6). Ultimately, the combination of multiple
‘Omics methods will lead to an improved understanding of the manufacturing and cellular
processes, resulting in improvements in bioreactor productivity (4), production of specific
glycoforms (6-8), and potential identification of biomarkers for process assessment and control
(9).
At present, there are important gaps in our understanding that need to be addressed. A
significant gap is process scalability. To date, the vast majority of biological development studies
have been conducted using small volume reactors at the liter scale because of the ease of handling
and cost. Moreover, advances in high-throughput technologies and robotics are driving bioprocess
development to even smaller, milliliter scale (10). Yet, it is known that productivity is generally
lower in large (KL) production relative to small (L or less) reactor scales (2, 11, 12).
Accurate and reproducible manipulation of the large-scale bioprocess is central to the
success of the expensive and time-consuming production of biopharmaceuticals. Considerable
effort has been made by industry to develop and qualify bench-top bioreactor systems as
representative models of cell behavior at the manufacturing scale. Application of the ‘Omics tools
should aid our ability to translate results from laboratory scale to the large scale, leading to a
significant impact on biotechnology production.
This paper presents, for the first time, a combined proteomic and metabolomic study to
compare a CHO culture reactor at the manufacturing (5-KL) versus the laboratory scale (20-L).
Phenotypically, it was found that the viable cell density during the stationary phase of the process
was significantly lower for the production scale reactor, using the same media and dosing regimen.
At the same time, an unexpected 20-fold increase in trace level copper (Cu2+) in the buffering
agent, sodium carbonate, from 0.02 μM to 0.4 μM, led to a 2-fold increase in the viable cell density
for the 5-KL process while having only a minor effect on the 20-L scale. Copper, a well-known
trace metal of cell culture media, serves as a cofactor of many enzymes controlling their functional
states and activity levels (13, 14). There have been several previous reports describing the effects
of copper levels on the productivity of CHO cells, albeit for low volume reactors or shake flasks
(15-20).
The present study identifies the changes in proteomic and metabolomic profiles that occur
as a result of the increased copper on the production scale process relative to the laboratory scale.
Statistically significant network and pathway analysis of the combined ‘Omics results revealed
that an excess of reactive oxygen species (ROS) occurred in the 5-KL reactor and that the increased
level of copper reduced this stress, leading to decreased apoptosis and cell death. On the other
hand, the oxidative stress was found to be less pronounced for the 20-L bioreactor, and, as a result,
the influence of copper level less significant. Based on the ‘Omics data, along with qPCR, ELISA
and western blotting, the excess ROS production for the 5-KL production scale reactor has been
attributed to intermittent hypoxia, which arises as the cells periodically enter zones of lower oxygen
concentration. The hypoxia is likely due to limited mass transfer and homogeneity of the gas
throughout the 5-KL bioreactor. For the 20-L scale, oxygen distribution was more complete,
leading to far less hypoxia and stress.
2.3 Materials and methods
2.3.1 Chemicals and reagents
Formic acid, urea, triethylammonium bicarbonate buffer (TEAB) (1.0 M, pH 8.5),
dithiothreitol (DTT), iodoacetamide (IAM), phenylmethanesulfonyl fluoride (PMSF), and
ammonium hydroxide solution (≥ 25% in H2O) were purchased from Sigma-Aldrich (St. Louis,
MO). Sequencing-grade modified trypsin was from Promega (Madison, WI), and mass
spectrometry grade lysyl endopeptidase (Lys-C) was purchased from Wako (Richmond, VA). The
bicinchoninic acid (BCA) protein assay kit, tandem mass tag (TMT) 6-plex kit, Halt™ protease
and phosphatase inhibitor cocktail (EDTA free), cell extraction buffer, SuperSignal West Femto
trial kit, Quant-iT™ protein assay kit, RiboPure RNA extraction kits (AM1924), LC-MS grade
water, LC-MS grade acetonitrile, and SDS-polyacrylamide NuPAGE Novex 4-12% Bis-Tris
protein gels were from Thermo Fisher Scientific (Rockford, IL). The PVDF membranes for the
western blotting protein transfer, ECL western blotting substrate kit, and fibronectin ELISA kit
(ab108849) were from Abcam (Cambridge, MA). The primary antibodies against β-actin (sc-
47778), superoxide dismutase 1 (SOD1) (sc-11407), fibronectin (sc-9068), and glutathione
peroxidase (GPx 1/2) (sc-30147), as well as the secondary antibodies against the corresponding
primary antibodies, were purchased from Santa Cruz Biotechnology (Heidelberg, Germany). First strand
synthesis kits (330401), SYBR green mix (330520) and RT-PCR primers were purchased from
SABiosciences, a division of Qiagen (Valencia, CA).
2.3.2 CHO Cell Culture Conditions
A CHO DG44 cell line expressing a recombinant antibody fusion protein using a vector
with the dihydrofolate reductase-deficient (DHFR) selection marker was used for all experiments.
The bioreactor experiments all employed the same proprietary, chemically defined media. The cell
line and process conditions were similar to those previously described (12). Cell cultures were
expanded in a series of shake flasks, rocker bags, and seed bioreactors to generate enough cells to
inoculate production bioreactors. Rocker bags were used in place of seed bioreactors for small
scale experiments; however, the number of population doublings was controlled to be similar for
all experiments. The media contained 0.02 μM and 0.4 μM CuSO4 for the low and high
concentrations (15), respectively. The high copper concentration resulted from impurities in
Na2CO3 used for pH control. Experiments were carried out in either 20-L or 5-KL bioreactors with
initial working volumes of 11-L or 3-KL, respectively. The temperature, initially controlled at
37 °C, was reduced to a lower temperature at a pre-defined time to extend the viability and
productivity of the culture. The pH in the bioreactor was controlled through the addition of CO2
gas and 1 M Na2CO3. The bioreactors were operated in fed-batch mode, with the timing and
amount of feed determined by measured glucose concentrations in the bioreactor. Dextran sulfate
was added to the bioreactors on Day 3. The time of culture harvest was determined according to
proprietary pre-determined criteria.
Cell culture samples were monitored for cell density and viability using either Cedex
(Roche Diagnostics Corp., Indianapolis, IN) or ViCell (Beckman Coulter, Inc., Indianapolis, IN)
automated cell counters that operate using the trypan blue dye exclusion method. Monitoring of
pH, pCO2, pO2, glucose, lactate, glutamine, glutamate, and ammonium was performed using
either BioProfile 400 or NovaFlex instruments (Nova Biomedical, Waltham, MA). Dissolved
oxygen (DO) was measured using in-line probes and controlled at the same set points throughout
the bioreactor runs for all experiments at both scales. All bioreactors were sparged with fixed flow
rates of air to match vessel volumes per minute at each bioreactor scale. Oxygen was supplied to
the bioreactors automatically by the controller to maintain the specified DO set-points, and the DO
profiles for all bioreactors were similar for all runs regardless of culture condition or scale. Product
titers were measured by a Protein-A high performance liquid chromatography (HPLC) assay based
on reference standards of the purified product at known concentrations. The number and frequency
of sampling were determined in part by the schedule of GMP manufacturing operations, different
harvest timing between bioreactors, and the complexity of sample analysis. Supernatant and cell pellet
samples were collected from bioreactors to enable metabolomic and proteomic analyses. A
smaller sub-set of samples was analyzed via proteomic methods due to the expense and complexity
of the methods involved. Cell pellets containing either 5 × 10⁶ (for metabolomics) or 10 × 10⁶ (for proteomics)
total cells/mL were retained. Sample time points were Day 3, Day 5, and Day 7 for the 5-KL
scale, and Day 3, Day 6, and Day 10 for the 20-L scale. These time points correspond to
exponential, stationary, and early death phase at the corresponding bioreactor scales. Two
biological replicates (two bioreactors) were taken for all time points of high and low copper
conditions at both scales.
The cell cultivation in this chapter was performed by Bristol-Myers Squibb (Devens, MA),
and they also provided the data related to the cultivation performance.
2.3.3 Metabolomic analysis
Metabolomic analysis was performed by Metabolon, Inc. (Durham, NC) according to their
standard analysis platform (21). In the data analysis, for each replicate, the intracellular metabolite
signal intensities were first normalized by total cellular protein amount at a given time point, and
then these protein-normalized intensities were divided by the Day 0 protein-normalized intensities of
the particular metabolite. The metabolite ratios (high copper : low copper) for each replicate were
calculated by dividing the resultant normalized intensity for the high copper sample by that of
the low copper sample for a given time point. Either the replicate ratios were averaged, or ratios
of the replicate-averaged normalized intensities were calculated. A cut-off ratio of 1.5 or 0.67 was used to define
whether a metabolite was up- or down-regulated, respectively. For comparison of metabolite abundance
between 5-KL and 20-L samples, shared metabolites involved in the “reactive oxygen formation”
biofunction reported by Ingenuity Pathway Analysis (IPA) (Qiagen, Redwood City, CA) were
identified from the Day 7 (5-KL bioreactor) and Day 10 (20-L bioreactor) results. Ratios of
replicate-averaged abundance (Day 7/Day 10) were calculated for these selected metabolites for the
high vs. high copper and low vs. low copper samples. The metabolites and their corresponding
log-ratios were subjected to the IPA ‘disease and biofunction activation’ prediction tool.
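The normalization and ratio steps described in this section can be illustrated with a minimal Python sketch. The function names and all numeric values below are hypothetical, and treating the 1.5 and 0.67 cut-offs as inclusive is an assumption, since the text does not state how boundary values were handled.

```python
def normalized_intensity(signal, protein, day0_signal, day0_protein):
    """Protein-normalize a metabolite signal, then divide by the Day 0 value."""
    return (signal / protein) / (day0_signal / day0_protein)


def classify_ratio(ratio, up=1.5, down=0.67):
    """Apply the 1.5 / 0.67 cut-offs; inclusive boundaries are an assumption."""
    if ratio >= up:
        return "up"
    if ratio <= down:
        return "down"
    return "unchanged"


# Hypothetical values for one metabolite, one replicate, one time point.
high = normalized_intensity(signal=9000.0, protein=3.0,
                            day0_signal=4000.0, day0_protein=2.0)
low = normalized_intensity(signal=4500.0, protein=3.0,
                           day0_signal=4000.0, day0_protein=2.0)
ratio = high / low  # high copper : low copper
print(ratio, classify_ratio(ratio))
```

In this toy case the high copper sample is twice as abundant as the low copper sample after normalization, so the metabolite would be flagged as up-regulated.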
2.3.4 Sample preparation for proteomics
A pellet consisting of approximately 10⁷ cells for each sample was reconstituted in 500 μL
of cell lysis buffer (10 M urea and 5 mM DTT in 100 mM TEAB pH 8.0). Cell lysate was prepared
by using a Model 505 Sonic Dismembrator (Thermo Fisher Scientific, Pittsburgh, PA). The lysate
protein concentration was determined using the BCA protein assay. Approximately 100 μg protein
was denatured and reduced with freshly prepared 10 mM DTT at 37 ℃ for 1 hour, and then
alkylated with 10 mM IAM in the dark at room temperature for 45 minutes in the presence of
10 M urea and 100 mM TEAB (pH 8.0). Proteins were precipitated by adding cold acetone and
maintained at -20 ℃ overnight. After the supernatant was discarded, the protein was reconstituted
in 200 μL of 25 mM TEAB (pH 8.0) in 90% water and 10% acetonitrile. Lys-C digestion
was conducted at 37 ℃ for 6 hours with an enzyme-to-protein ratio of 1:200 (w/w), and then the
tryptic digestion was performed with an enzyme-to-protein ratio of 1:50 (w/w) at 40 ℃ overnight.
Protein digests (100 μg per TMT channel) from the samples of a given bioreactor size for three
cell growth time points under high and low copper media were labeled with six TMT channels,
following the protocol supplied by the manufacturer. The solutions were lyophilized to dryness
and stored at -80 ℃ prior to 2D-LC/MS analysis.
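As a small worked example of the enzyme-to-protein ratios quoted in this section, the following Python sketch computes the enzyme mass implied by a 1:N (w/w) ratio for the approximately 100 μg of protein per sample. The helper name is hypothetical; the arithmetic is the only point being illustrated.

```python
def enzyme_mass_ug(protein_ug, ratio_denominator):
    """Enzyme mass for an enzyme-to-protein ratio of 1:ratio_denominator (w/w)."""
    return protein_ug / ratio_denominator


protein_ug = 100.0                            # approximate protein amount per sample
lys_c_ug = enzyme_mass_ug(protein_ug, 200)    # Lys-C at 1:200 (w/w)
trypsin_ug = enzyme_mass_ug(protein_ug, 50)   # trypsin at 1:50 (w/w)
print(lys_c_ug, trypsin_ug)
```

Under these ratios, 100 μg of protein corresponds to 0.5 μg of Lys-C and 2 μg of trypsin.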
2.3.5 2D LC-MS/MS
The labeled protein digest mixture was separated and analyzed by 2D high pH/low pH
reversed phase (RP/RP) liquid chromatography coupled with a Q-Exactive mass spectrometer
(Thermo Fisher Scientific, San Jose, CA). The first-dimension separation was off line with the
platform containing an Agilent 1200 series system with diode array detector (Agilent Technologies,
Santa Clara, CA) and a 300Extend_C18 column (3.5 μm beads, 2.1x150 mm) (Agilent
Technologies). Mobile phase A and B were 20 mM ammonium formate in water (pH 10), and 20
mM ammonium formate (pH 10) in 90% acetonitrile/10% water, respectively. After reconstitution
with mobile phase A, 200 μg of labeled digest was injected on the column. After desalting by
mobile phase A for 1 hour at 200 μL/min, a gradient was then run at a flow rate of 200 μL/min
(from 2% B to 100% B in 44 minutes, 100% B to 2% B and 2% B for 9 minutes). The fractions
were collected in 2-minute intervals, and pooled to equalize protein levels for a final fraction
number of 19 based on the UV absorption profile at 214 nm.
For the second dimension LC, samples were analyzed on an Ultimate 3000
chromatography system with a home-packed IntegraFrit column (20 cm x 75 μm, New Objective,
Woburn, MA) with 200 Å Magic C18 AQ particles (3 μm diameter) (Michrom Bioresources,
Auburn, CA). Mobile phase A was 0.1% formic acid in water, and mobile phase B was 0.1%
formic acid in acetonitrile. The flow rate was 300 nL/min for the sample injection and desalting
processes, followed by a separation gradient (2% B to 32% B in 120 minutes, 32% B to 90% B in
20 minutes, 90% B for 3 minutes) with 200 nL/min flow rate. The sample was then detected online
by the Q-Exactive mass spectrometer.
MS data were collected in data-dependent acquisition mode with a survey single-stage
MS (MS1) scan followed by higher-energy collisional dissociation (HCD) MS/MS scans of the 12
most intense precursor ions. The full MS scans were acquired in the Orbitrap with a resolution of
70,000 (m/z = 200) and a scan range of m/z 375 to 1600. HCD spectra were acquired for MS2
with a resolution of 17,500 and a fixed first mass of m/z 100. The isolation window was 2.0 m/z.
For accurate mass measurement, the lock mass option was enabled using the
polydimethylcyclosiloxane ion at m/z 455.12002 as an internal calibrant.
2.3.6 Construction and annotation of DG44 CHO cell proteome database
The DG44 CHO protein sequence database was developed by Somak Ray, the second
author of the published paper corresponding to this chapter. DG44 CHO transcriptomic sequences
were pooled from published transcriptomic (22) and in-house sequencing data by Roche 454
(Branford, CT) using fifty-base, single-end runs. The final transcriptomic data was assembled
using CLC Genomics Workbench Version 4 (http://www.clcbio.com/) with the NCBI mouse
RefSeq set of transcripts (http://www.ncbi.nlm.nih.gov/refseq/) as reference. For annotating the
resulting CHO transcript, annotations for the same mouse sequence that led to the assembly of the
final CHO sequence in the CLC Genomics software were used.
To augment the set of the transcriptomic sequences, the published CHO genome sequence
(23) from NCBI RefSeq was used. First, the transcript subsequence coding for amino acids was
identified using the transcript nucleotide sequence along with the corresponding mouse protein
sequence using a dynamic programming alignment algorithm based method “FrameBot” (24)
which also corrects for frameshift mutations. The CHO protein coding sequences, which were less
than 90 percent of the length compared to the corresponding mouse protein sequences, were
subjected to extension of their sequences using homolog sequences from the CHO genome. For
extension, a low-complexity masked CHO genome sequence was generated using Windowmasker
(25). The ‘protein2genome’ (abbreviated p2g) module from the “Exonerate” software (26) was
run to best align the mouse protein homolog of the CHO protein against the low-complexity
masked CHO genomic sequence. The original protein coding DG44 CHO transcript sequence and
the protein coding CHO sequence from the top hit of p2g were globally aligned, and gaps were
filled using sequence information from the CHO homolog. The corresponding amino acid
sequences from the extended transcript sequences, along with those of the rest of the DG44 sequences, were
generated using FrameBot.
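The 90 percent length criterion used to select coding sequences for extension can be sketched as a simple check. This is only an illustration of the threshold logic described above, not the actual pipeline; the function name and the sequence lengths are hypothetical.

```python
def needs_extension(cho_length, mouse_length, threshold=0.90):
    """True if the CHO coding sequence is shorter than `threshold` of the mouse protein."""
    return cho_length / mouse_length < threshold


# Hypothetical sequence lengths in amino acids: (CHO, mouse homolog)
candidates = {"geneA": (430, 500), "geneB": (495, 500), "geneC": (450, 500)}
to_extend = sorted(name for name, (cho, mouse) in candidates.items()
                   if needs_extension(cho, mouse))
print(to_extend)
```

Only sequences flagged by such a check would be passed on to the genome-based extension step; a sequence at exactly 90 percent is not flagged under this reading of "less than 90 percent".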
Those mouse RefSeq proteins for which no homologous DG44 CHO transcriptomic
sequence was found were searched against the new CHO genome. These mouse sequences were
used as queries against the ‘Windowmasker’-masked CHO genome using the ‘p2g’ module
of Exonerate. The top hit of p2g from the CHO genome was retained. Together, the peptide
sequences derived from transcripts of DG44 CHO and mouse homologs present in the CHO
genome made up the protein sequence database, with a total of 18,075 sequences.
The construction and annotation of the CHO proteome sequence database were instituted by
Somak Ray, the second author of the published paper corresponding to this chapter.
2.3.7 Protein identification of proteomics analysis
The raw data files from each LC-MS/MS run were processed in Proteome Discoverer 1.4
(PD 1.4) (Thermo Fisher Scientific) and searched against the CHO database with three search
engines: Sequest HT, Mascot, and MS Amanda (27). Cysteine carbamidomethylation and TMT 6-
plex modification at the N-terminus and lysine were set as fixed modifications, along with
oxidation of methionine and deamidation of asparagine and glutamine set as dynamic
modifications. Up to two missed tryptic cleavages were allowed. Mass tolerance was set at 10
ppm for precursor ions, and 0.05 Da for fragment ions. Percolator was used to filter matches to
1% peptide false discovery rate (FDR). The quantitation method was chosen as TMTe 6plex
(custom), with the peak integration tolerance of 20 ppm. The proteins with at least one unique
peptide identified in all six TMT channels with reporter ions satisfying the above criteria were
considered as “identified proteins”.
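The two kinds of mass tolerance used in the database search (10 ppm for precursor ions and 0.05 Da for fragment ions) can be made concrete with a short Python sketch. The function names and the m/z values are hypothetical; the search engines apply these tolerances internally, so this is only an illustration of what the settings mean.

```python
def ppm_error(observed_mz, theoretical_mz):
    """Relative mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def precursor_ok(observed_mz, theoretical_mz, tol_ppm=10.0):
    """Precursor tolerance is relative (ppm)."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm


def fragment_ok(observed_mz, theoretical_mz, tol_da=0.05):
    """Fragment tolerance is absolute (Da)."""
    return abs(observed_mz - theoretical_mz) <= tol_da


# Hypothetical m/z values.
print(precursor_ok(500.004, 500.000))   # error of about 8 ppm
print(precursor_ok(500.006, 500.000))   # error of about 12 ppm
print(fragment_ok(175.150, 175.119))    # error of about 0.031 Da
```

A relative tolerance scales with m/z, whereas the absolute fragment tolerance is the same across the whole mass range.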
2.3.8 Quantitation and differential expression analysis
The reporter ion intensities of a given TMT channel for each peptide-spectrum match (PSM)
extracted from PD 1.4 were normalized by dividing individual intensities by the sum of the
intensities of that particular channel. The proteins with at least two PSMs identified with reporter
ions in all six TMT channels were considered as “quantified proteins”. Among these “quantified
proteins”, an intensity-based filtering technique was applied on each TMT channel to improve the
reproducibility between the two replicates from separate bioreactor runs (see next section). Then,
for a given replicate, the proteins common to all six channels were taken as the protein
list with high confidence of quantitation information. The common proteins identified in the two
separate replicates of a given bioreactor size were taken as the final list with high confidence of
quantitation.
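The channel-sum normalization and the "quantified protein" criterion described in this section can be sketched as follows. This is a minimal illustration, not the processing actually performed in PD 1.4: the data layout, the channel labels, and the reading of "at least two PSMs with reporter ions in all six channels" as "two PSMs with non-zero signal in every channel" are assumptions.

```python
from collections import defaultdict

CHANNELS = ("126", "127", "128", "129", "130", "131")  # TMT 6-plex reporter labels


def normalize_by_channel_sum(psms):
    """Divide each reporter intensity by the summed intensity of its channel."""
    totals = {ch: sum(p["intensities"][ch] for p in psms) for ch in CHANNELS}
    return [
        {"protein": p["protein"],
         "intensities": {ch: p["intensities"][ch] / totals[ch] for ch in CHANNELS}}
        for p in psms
    ]


def quantified_proteins(psms, min_psms=2):
    """Proteins with at least `min_psms` PSMs carrying signal in every channel."""
    counts = defaultdict(int)
    for p in psms:
        if all(p["intensities"][ch] > 0 for ch in CHANNELS):
            counts[p["protein"]] += 1
    return {protein for protein, n in counts.items() if n >= min_psms}


# Hypothetical PSM table: two PSMs for P1, one for P2.
psms = [
    {"protein": "P1", "intensities": {ch: 100.0 for ch in CHANNELS}},
    {"protein": "P1", "intensities": {ch: 300.0 for ch in CHANNELS}},
    {"protein": "P2", "intensities": {ch: 600.0 for ch in CHANNELS}},
]
normalized = normalize_by_channel_sum(psms)
quantified = quantified_proteins(psms)
print(normalized[0]["intensities"]["126"], quantified)
```

After normalization the intensities in each channel sum to one, which removes differences in total loading between channels before proteins are compared.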
Relative quantitation of proteins (high copper vs. low copper for each time point) was
achieved by pairwise comparison of TMT reporter ion intensities among samples using the DanteR
software (version 0.1.1; Pacific Northwest National Laboratory, Richland, WA;
http://omics.pnl.gov) (28). The median of log2-ratios of the proteins obtained from DanteR was
adjusted to zero. Then, the protein log ratios were determined, based on the protein list obtained
after the intensity-based filtering technique. The average of each protein log2-ratio (high copper:
low copper) from the two separate bioreactor replicates was taken as the protein log ratio. The
ratio-based filtering technique (see next section) was applied to choose the differentially expressed
proteins with high confidence.
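The median adjustment and replicate averaging of the log2-ratios described in this section can be sketched in a few lines of Python. This is not the DanteR implementation; the function names and the ratio values are hypothetical and serve only to show the order of operations.

```python
from statistics import median


def median_center(log2_ratios):
    """Shift all log2-ratios so that their median becomes zero."""
    centre = median(log2_ratios.values())
    return {protein: value - centre for protein, value in log2_ratios.items()}


def average_replicates(rep1, rep2):
    """Average the log2-ratios of proteins present in both replicates."""
    shared = rep1.keys() & rep2.keys()
    return {protein: (rep1[protein] + rep2[protein]) / 2 for protein in shared}


# Hypothetical log2(high copper / low copper) values from two bioreactor replicates.
rep1 = median_center({"A": 0.5, "B": 0.1, "C": -0.3})
rep2 = median_center({"A": 0.7, "B": 0.2, "C": 0.0})
protein_log_ratio = average_replicates(rep1, rep2)
print(protein_log_ratio)
```

Centering each replicate before averaging keeps a global offset in one replicate from biasing the averaged ratio.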
2.3.9 Data filtering technique applied on the proteomics data
Because of the complexity of the bioreactor process at the large scale, as well as the general
sampling procedures, we adopted a strategy of several levels of filtering to obtain consistent results
to compare the data between the high and low copper conditions.
For the first level, we filtered proteins according to their abundances as measured by the
normalized sum of the PSMs of TMT reporter ion intensities in order to remove outliers, based on
an M-A plot (29). Each PSM reporter ion intensity was normalized by dividing by the
corresponding sum of all PSM intensities belonging to that particular channel. For each TMT
channel, the sum of all normalized reporter ion intensities of all PSMs for each protein was
calculated as a measure of abundance of that protein. For a given channel, the log2-average
abundance for each protein in the two replicates was plotted in ascending order. The average
log2-abundances were next binned, with each bin containing data points for 300 proteins. The log2
ratio of the abundances between the two replicates was calculated, and only those proteins within ±1.5σ of the mean log2
ratio of that bin were retained for further consideration. Among them, the shared proteins
of all six TMT channels were taken as “proteins after the intensity-based filtering technique”. The
first level of filtering technique was instituted by Somak Ray, the second author of the published
paper corresponding to this chapter.
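The intensity-based filter described above can be sketched for a single TMT channel as follows. This is a hedged illustration rather than the original implementation: it assumes that the x-axis value is the mean of the two replicate log2-abundances, that the ratio is the difference of the log2-abundances, and that σ is the population standard deviation within a bin. The protein names and abundances are hypothetical, and a bin size of 5 is used only to keep the example small (the text uses 300).

```python
import math
from statistics import mean, pstdev


def intensity_filter(abund_rep1, abund_rep2, bin_size=300, n_sigma=1.5):
    """Keep proteins whose replicate log2-ratio lies within n_sigma of the bin mean."""
    rows = []
    for protein in abund_rep1.keys() & abund_rep2.keys():
        a1 = math.log2(abund_rep1[protein])
        a2 = math.log2(abund_rep2[protein])
        rows.append((0.5 * (a1 + a2), a1 - a2, protein))  # (average, ratio, name)
    rows.sort()  # ascending average log2-abundance
    kept = set()
    for start in range(0, len(rows), bin_size):
        chunk = rows[start:start + bin_size]
        ratios = [ratio for _, ratio, _ in chunk]
        mu, sigma = mean(ratios), pstdev(ratios)
        kept.update(p for _, ratio, p in chunk if abs(ratio - mu) <= n_sigma * sigma)
    return kept


# Hypothetical normalized abundances; P5 disagrees 8-fold between replicates.
rep1 = {"P1": 1.0, "P2": 2.0, "P3": 4.0, "P4": 8.0, "P5": 16.0}
rep2 = {"P1": 1.0, "P2": 2.0, "P3": 4.0, "P4": 8.0, "P5": 2.0}
kept = intensity_filter(rep1, rep2, bin_size=5)
print(sorted(kept))
```

Applying such a filter to each of the six channels and intersecting the resulting sets would give the list of proteins retained after the intensity-based filtering step.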
To determine the differentially expressed proteins, a second level of filtering was applied
in which three criteria were used. (a) The protein log ratios of the two replicates for each bioreactor
size had the same sign. (b) The average of the protein log ratios from the two replicates
was > 0.30 (fold change 1.23) for up-regulation and < -0.30 (fold change 0.81) for down-regulation.
(c) Each of the protein log ratios of the two replicates was > 0.11 (fold change 1.08, which is 12.5%
less than 1.23) for up-regulation and < -0.11 (fold change 0.93) for down-regulation. After this
second level of filtering, the remaining differentially expressed proteins were used for network
and pathway analysis with the averaged log ratios from the two replicates.
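The three criteria of the second filtering level can be written as a single decision function. The sketch below is one reading of criteria (a) to (c), with a hypothetical function name and hypothetical log2-ratios; the thresholds are the ones given in the text.

```python
def classify_protein(log_r1, log_r2, avg_cut=0.30, rep_cut=0.11):
    """Return 'up', 'down', or None for a pair of replicate log2-ratios."""
    if log_r1 * log_r2 <= 0:                  # (a) both replicates share a sign
        return None
    average = (log_r1 + log_r2) / 2
    if average > avg_cut and min(log_r1, log_r2) > rep_cut:     # (b) and (c), up
        return "up"
    if average < -avg_cut and max(log_r1, log_r2) < -rep_cut:   # (b) and (c), down
        return "down"
    return None


# Hypothetical replicate log2(high copper / low copper) values.
print(classify_protein(0.50, 0.20))    # passes all three criteria
print(classify_protein(0.55, 0.08))    # average passes, one replicate fails (c)
print(classify_protein(-0.40, -0.30))  # consistent down-regulation
print(classify_protein(0.50, -0.10))   # opposite signs fail (a)

# Fold changes corresponding to the log2 thresholds.
print(round(2 ** 0.30, 2), round(2 ** -0.30, 2), round(2 ** 0.11, 2), round(2 ** -0.11, 2))
```

The per-replicate cut-off in (c) guards against a protein passing on the strength of one replicate alone.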
2.3.10 Interaction network and pathway analysis
MetaCore (Thomson Reuters, https://portal.genego.com/, New York City, NY) and
Ingenuity Pathway Analysis (IPA) were used to map the significant differentially regulated
proteins and metabolites into biological networks and pathway maps. The list of differentially
expressed proteins and metabolites for specific samples and their corresponding log2-ratios were
subjected to IPA core analysis using its default values. Unless otherwise noted, for the IPA
“diseases or biological functions” analysis, only disease/biological functions involving both
metabolites and proteins that were statistically significant with a |Z score| ≥ 2.0 for activation or
repression and p-value < 0.05 were reported. The Day 3 time point of both replicates was
not considered for further analysis because of the limited differences found between the high and
low copper conditions.
Also, the differentially expressed proteins for each time point were submitted into
MetaCore, in which the “pathway maps” of the “Functional Ontology Enrichment” tool were used.
Pathways with p-values less than 0.01 were considered as statistically significant. The activation
or deactivation of the pathways was determined by the up- or down-regulation of the relevant
differentially regulated proteins and their functions.
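The significance thresholds stated in this section can be summarized as simple predicates. The sketch below only encodes the reporting rules given in the text (|Z score| ≥ 2.0 with p < 0.05 for IPA functions, p < 0.01 for MetaCore pathway maps); the function names are hypothetical, the p-values are invented for illustration, and neither IPA nor MetaCore is called.

```python
def ipa_reportable(z_score, p_value, z_cut=2.0, p_cut=0.05):
    """IPA function is reported if |Z| >= z_cut and p < p_cut."""
    return abs(z_score) >= z_cut and p_value < p_cut


def ipa_direction(z_score):
    """Positive Z is read as activation, negative Z as repression."""
    return "activated" if z_score > 0 else "repressed"


def metacore_significant(p_value, p_cut=0.01):
    """MetaCore pathway map is significant if p < p_cut."""
    return p_value < p_cut


# Hypothetical results: (Z score, p-value)
print(ipa_reportable(-2.2, 0.001), ipa_direction(-2.2))
print(ipa_reportable(-1.5, 0.001))
print(metacore_significant(0.004), metacore_significant(0.03))
```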
2.3.11 Western blotting
The cell lysates were prepared using radioimmunoprecipitation assay (RIPA) buffer with
addition of the protease inhibitor cocktail and 1 mM PMSF for the western blot analysis. The total
protein concentration was determined by the BCA assay. About 40 μg of protein was denatured
and separated by SDS-polyacrylamide gel electrophoresis. Proteins were then transferred to PVDF
membranes. SOD1, fibronectin, and GPx were detected, with β-actin used as the loading control.
After incubating with the primary and then secondary antibodies, protein bands were visualized
using the ChemiDoc MP imaging system (Bio-Rad Laboratories, Hercules, CA).
2.3.12 Quantitation of fibronectin levels by ELISA
The cell pellets were washed with ice-cold 1×PBS and lysed in cell extraction buffer with
addition of the protease inhibitor cocktail and 1 mM PMSF. After incubation on ice for 30 minutes
with occasional vortexing, the lysates were centrifuged at 13,000 g for 10 min at 4°C. The
supernatants were subsequently transferred, and total protein concentration was determined using
the Quant-iT™ protein assay kit. Following the protocol supplied by the manufacturer, fibronectin
levels were measured with the fibronectin mouse ELISA kit on a SpectraMax 384 microplate
reader (Molecular Devices, Sunnyvale, CA) at a wavelength of 450 nm. Fibronectin levels were
determined using the standard curve generated using the fibronectin standard provided by the
ELISA kit.
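Reading a sample concentration from a standard curve can be sketched as below. The text does not state which curve model was used, so piecewise-linear interpolation between standards is an assumption made here for simplicity (kit protocols often recommend a fitted curve instead). The standard concentrations, absorbances, and the function name are hypothetical.

```python
def interpolate_concentration(absorbance, standards):
    """Linear interpolation on (concentration, absorbance) standard points."""
    points = sorted(standards, key=lambda point: point[1])
    for (c0, a0), (c1, a1) in zip(points, points[1:]):
        if a0 <= absorbance <= a1:
            return c0 + (absorbance - a0) * (c1 - c0) / (a1 - a0)
    raise ValueError("absorbance lies outside the range of the standards")


# Hypothetical standards: (concentration in ng/mL, absorbance at 450 nm)
standards = [(0.0, 0.05), (10.0, 0.25), (20.0, 0.45), (40.0, 0.85)]
sample_concentration = interpolate_concentration(0.35, standards)
print(sample_concentration)
```

Samples whose absorbance falls outside the range of the standards raise an error here rather than being extrapolated.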
2.3.13 Real-Time PCR
The whole cell RNA was extracted from cell pellets using the RiboPure kit. RNA was
converted to cDNA using the RT² First Strand Synthesis Kit. Quantitative PCR was carried out
using SYBR Green Master Mix and PCR primers. Primers for FN1 (encoding fibronectin) were
optimized for use in CHO was optimized for use in rat. All measurements were done in duplicate,
and a difference of less than or equal to 0.5 CT between duplicates was considered acceptable.
The ViiA™ 7 Real-Time PCR System (Thermo Fisher Scientific, Foster City, CA) and software
were used to run RT-PCR and analyze results. The comparative CT method was used to normalize
measurements relative to hypoxanthine-guanine phosphoribosyltransferase-like (LOC100769768),
an established CHO housekeeping gene. After normalization, relative mRNA expression for each
gene of interest was compared across scales, treatments, and time using the comparative CT method. A
log fold change greater than 2 was considered statistically significant.
The ELISA test of fibronectin and the qPCR test were performed by Bristol-Myers Squibb
(Devens, MA).
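The comparative CT calculation referred to in this section can be sketched in Python. This is the standard 2^(-ΔΔCT) form, which assumes approximately 100% amplification efficiency; whether exactly this form was used is not stated in the text. The function names and CT values are hypothetical, and the 0.5 CT check mirrors the duplicate acceptance criterion given above.

```python
def mean_ct(duplicates, max_spread=0.5):
    """Average duplicate CT values, rejecting pairs that differ by more than max_spread."""
    first, second = duplicates
    if abs(first - second) > max_spread:
        raise ValueError("duplicate CT values differ by more than the accepted spread")
    return (first + second) / 2


def fold_change(target_sample, ref_sample, target_calibrator, ref_calibrator):
    """Relative expression by the comparative CT method, 2 ** (-ddCT)."""
    dct_sample = mean_ct(target_sample) - mean_ct(ref_sample)
    dct_calibrator = mean_ct(target_calibrator) - mean_ct(ref_calibrator)
    return 2 ** (-(dct_sample - dct_calibrator))


# Hypothetical duplicate CT values for a gene of interest and the housekeeping gene.
fc = fold_change(target_sample=(24.0, 24.2), ref_sample=(20.0, 20.2),
                 target_calibrator=(26.1, 25.9), ref_calibrator=(20.0, 20.0))
print(fc)
```

In this example the gene of interest has a ΔCT two cycles lower in the sample than in the calibrator, which corresponds to roughly four-fold higher relative expression.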
2.4 Results
A significant difference in cell culture performance at the 5-KL scale was initially observed
due to a 20-fold difference in copper concentration (0.02 μM to 0.4 μM), as described below.
Interestingly, for the laboratory scale (20-L), with the same conditions, no significant phenotypic
difference was observed for the two copper concentration levels, and the overall cell performance
(viability and product titer) was much higher compared to manufacturing scale. We sought to
understand the underlying biological cause for the phenotypic differences at large scale due to
copper and bioreactor scale through proteomics and metabolomics studies.
2.4.1 CHO cell growth and productivity in 5-KL vs. 20-L scale bioreactors with two levels
of copper concentration in the media (conducted by Bristol-Myers Squibb)
Scale-up of a CHO DG44 fed-batch bioreactor process from 20-L to 5-KL scale showed a
dramatic decrease in productivity (Figure 2-1). During one run of the production scale, a doubling
of the productivity during the stationary phase was observed. After some testing, the increased
productivity was attributed to an elevated level, from 0.02 μM to 0.4 μM, of the trace metal, copper,
found in Na2CO3 used for pH control. Copper concentrations in basal and feed media were similar
for all experiments. For simplicity, we will refer throughout the paper to the two concentration
levels as low and high copper, respectively.
Figure 2-1 The cell density, viability, titer productivity, and lactate profiles of the 5-KL and 20-L
bioreactors.
(A) Viable cell density, (B) viability, (C) normalized titer profiles, and (D) lactate profiles for 20-L
(dashed lines) and 5-KL (solid lines) bioreactors under the high (blue lines) and low (red lines)
copper conditions. There are two bioreactor experiments for each condition. Profiles for the 5-KL
conditions extend out for variable durations from day 9 to day 12 since they all met the pre-defined
harvest criteria at different times, while the 20-L cultures were all extended out to 12 days.
As seen in Figure 2-1A, viable cell density (VCD) was similar for all conditions for the
first three days of culture. For the 20-L scale, VCD continued to increase for an additional 4 days
before leveling off, while for the 5-KL scale, VCD sharply decreased after 3 days before leveling
off. Significantly, the high copper condition for the 5-KL scale maintained 2-fold higher VCD
levels compared to the low copper condition. In contrast, at the 20-L scale, the VCD was only
about 15% higher with the high copper level compared to the low copper level. Similar trends were
observed for the %viability (%V) profiles (Figure 2-1B) with a steep decline in %V after 3 days
for the 5-KL scale. The high copper condition for the production scale process leveled off at %V
about twice that of the low copper condition; however, the high copper level still had a significantly
lower %V than found on the 20-L scale. Finally, in contrast to the 5-KL scale, for the 20-L scale,
little or no difference in %V profiles was observed between the copper levels.
The titer profiles normalized to the low copper condition for the 5-KL scale (Figure 2-1C)
also follow the trends observed for the VCD and %V profiles. The product titers in the 20-L
bioreactors reached 3.5 to 4 times higher compared to the 5-KL bioreactors, demonstrating that
extrapolation from laboratory to production scale can be limited. The high copper condition
increased the titer by 50% for the large scale, but this was still far lower than the titer observed in
20-L bioreactors. On the other hand, the increase in titer resulting from higher Cu was only about
10% for the 20-L reactors. The increase in productivity with additional copper in the 5-KL scale
and for the 20-L relative to the 5-KL scale follows directly the increase in the number of viable
cells. Further, and importantly, the specific productivity as well as the product quality attributes
were determined to be similar, independent of the copper level or scale of the reactor.
Interestingly, comparing the profiles across bioreactor scales showed little or no lactate
consumption for the 5-KL, with no significant effect resulting from increased copper (Figure 2-
1D). For the 20-L scale, the lactate concentration decreased after 120 hours; however, again, there
was no difference between the copper concentration levels. Addition of copper has previously been
shown to result in consumption of lactate, which is generally considered a desirable phenotype for
improvement of production and viability (16-20, 30). Our results are at variance with those
reported in the literature; however, it was pointed out in a recent paper that the lactate behavior with
added copper is dependent on the cell line and conditions used (19). We next explored the causes
for the phenotypic changes of cultured cells observed in Figure 2-1 using the combination of
proteomics and metabolomics.
2.4.2 Proteomic and metabolomics analysis platform
Cells growing under the high (0.40 μM) or low (0.02 μM) copper levels at three different
time points in the 20-L and 5-KL bioreactors were collected and the cell pellets processed for
proteomic study. Relative quantitation of protein expression was achieved by the TMT
multiplexing labeling method with six channels. With each bioreactor size, relative protein
expression at specific time points under the two concentrations of copper was profiled. Proteins
were subsequently identified using the annotated CHO DG44 proteome database with Proteome
Discoverer (PD) 1.4. Cell pellets from three time points for the high and low Cu conditions at both
scales, were analyzed from duplicate bioreactors. The strategy of high resolution 2D-RP/RP LC
coupled to a Q-Exactive mass spectrometer achieved high coverage of the CHO cell proteome with
TMT reporter ion quantitation.
Between 6400 and 7000 proteins were identified in the samples of both bioreactor scales,
with the number of common quantifiable proteins close to 5000, considering only proteins with at
least 2 PSMs in all 3 time points in both replicates. The number of identified and quantifiable
proteins is listed in Table 2-1.
Table 2-1 The number of identified and quantified proteins with the 5-KL and 20-L bioreactors
from the proteomic data analysis.
Protein numbers                                               20-L scale   5-KL scale
Total number of identified proteins                           5967         6352
Number of quantified proteins                                 4941         5354
Number of proteins after intensity-based filtering technique  4027         4199
Due to the large volume operation and general sampling procedures, as described above,
we utilized several levels of data filtering of proteins to obtain consistent results to compare high
and low copper conditions. With the first level of filtering, more than 4000 proteins remained.
Second, the thresholds of the protein log2 ratios of each replicate and the average protein log2
ratios were set to determine the differentially regulated proteins, resulting in fewer than 100
differentially expressed proteins at each time point. In the metabolomic analysis, 212 and 282 metabolites
were identified and quantified for the large (5-KL) and small (20-L) bioreactor scales, respectively. The numbers
of differentially regulated proteins and metabolites determined are shown in Table 2-2 and the
differentially regulated proteins and metabolites can be found in Table 2-4 in section 2.7 Appendix.
Table 2-2 Numbers of differentially regulated proteins and metabolites at each time point for
both scales. Differential regulation was determined by comparing the high and low copper conditions at
a given scale.
                                                  5-KL scale        20-L scale
                                                  Day 5    Day 7    Day 6    Day 10
Number of differentially regulated proteins       99       66       44       34
Number of differentially regulated metabolites    73       105      69       62
2.4.3 Analysis of combined differentially regulated proteins and metabolites in the 5-KL
reveals significant reduction in ROS with higher level of copper concentration in the
media and no significant copper effect in the 20-L reactor
As detailed in the section 2.3 Materials and Methods, we conducted proteomic analysis and
obtained metabolomics data for both bioreactor scales at various time points at high vs. low copper.
The differentially regulated proteins and metabolites under the various conditions are listed in
Table 2-4 in section 2.6 Appendix.
We focus first on results from analysis of the 5-KL scale cultures. Using IPA (31), we
combined the differentially regulated protein and metabolite data at several time points (Days 3, 5
and 7); however, for Days 3 and 5, no biological functions were found to be differentially altered
at a statistically significant level between the two copper levels. Importantly, on Day 7 we
observed significant changes due to the higher copper concentration (Figure 2-2 and Figure 2-3).
Figure 2-2 shows that an increase in copper led to a significant decrease in the biological functions
“cell death”, “killing of cells” and “apoptosis” (Z-values < -2), as defined by IPA. The reduction
in these biological functions is consistent with the measured phenotypic changes shown in Figure
2-1.
Figure 2-2 Prediction of significantly repressed biological functions related to cell fate for the
5-KL bioreactor using IPA.
Combined differentially regulated proteins and metabolites (high vs. low copper) were used as
input to the IPA tool. The color codes for quantitative measurements and predicted outcomes are
shown in the inset panel. Significantly repressed biological functions related to cell death and
survival (blue octagon) at day 7. Z scores of cell death, killing of cells, and apoptosis of pancreatic
cancer cell lines were -2.227, -2.179, and -2.176, respectively.
Figure 2-3 Prediction of significantly repressed biological functions related to ROS generation for
the 5-KL bioreactor using IPA.
Combined differentially regulated proteins and metabolites (high vs. low copper) were used as
input to the IPA tool. The color codes for quantitative measurements and predicted outcomes are
shown in the inset panel. Significantly repressed biological functions related to free radical
scavenging (blue octagon) at day 7. Z score of production of reactive oxygen species was -2.686,
and Z score of synthesis of reactive oxygen species was -2.836.
Figure 2-3 presents the other significant biological function from the combined data – the
high copper condition reduced the level of ROS. The results in Figure 2-2 and Figure 2-3 suggest
that the reduction in ROS with higher copper is related to a decreased level of cell death. This is
not surprising given that increased ROS is known to cause damage to proteins, DNA, and lipids,
leading to cell death (32-34). There are a number of metabolites and proteins listed in Figure 2-2
and Figure 2-3 that are related to both oxidative stress and cell death, supporting the connection
between the two biological functions.
With respect to metabolites, cholesterol, linoleic acid, oleic acid, and glutamic acid were
observed to be down-regulated with the high copper condition on Day 7. Cholesterol can
potentially alter the mitochondrial membrane potential and hence increase ROS generation,
resulting in activation of apoptosis (35, 36). Oleic and linoleic acids have been reported to induce
ROS production through activation of NADPH oxidase (37, 38). These acids can also lead to free
fatty acid-mediated apoptosis and cell death (39, 40). Glutamic acid can cause increased ROS
production by affecting the function of succinate dehydrogenase in mitochondria (41). Thus, the
down-regulation of these metabolites with additional copper supports the reduction of ROS
generation and cell death. Further, glycine and cysteine were up-regulated in the high copper
condition, both of which are known to suppress apoptosis (42-44). Moreover, glycine and
guanosine, which was also found to be up-regulated with additional copper, were reported to
inhibit the ROS generation caused by glutamic acid (41).
With respect to proteins, the high copper condition leads to down-regulation of BAX (Bcl-2-
associated X protein) and BAK1 (Bcl-2 antagonist/killer 1) on Day 5 and Day 7, respectively.
BAX and BAK1 are well-known apoptotic regulators, belonging to the Bcl-2 (B cell lymphoma 2)
family, which control and regulate the apoptotic mitochondrial events by governing the
permeability of mitochondrial membrane (45). Their down-regulation should decrease apoptosis
(46). Arylsulfatase A (ARSA), which can hydrolyze ascorbic acid 2-sulfate to ascorbic acid (47),
was up-regulated with high copper on Day 7, which again points to reduction of both cell death
and ROS production. Additional analysis for individual proteins and metabolites which are related
to cell death and ROS production can be found in the section 2.4.5 Additional differentially
regulated protein analysis.
We next turn to examine the combined ’Omics data for the 20-L scale, where the
productivity was much higher than at the 5-KL scale and where increased copper had much less
influence. Here, three time points were examined: Days 3, 6 and 10. No significant
difference at any time point was found in ROS production or cell death between the two copper
levels, in agreement with the results in Figure 2-1. We did find, however, that there was a significant
increase in the biological function of protein synthesis on Day 6 (Z = 2.213) (Figure 2-8) and a
decrease in cell aggregation for Days 6 (Z = -2.203) and 10 (Z = -2.203). The increase in protein
synthesis may relate to the small increase in titer observed for the higher copper level in Figure 2-
1C.
To further compare the 20-L and 5-KL scales, we selected metabolites from the IPA
biological function category “reactive oxygen formation” that were observed for both Day 7 for
the 5-KL and Day 10 for the 20-L scales and compared the two scales under the same copper
concentration level. The shared relevant metabolites were reduced glutathione (GSH), sphingosine,
sorbitol, NAD+, cysteine, guanosine, glycine, cholesterol, homocysteine, and glutamic acid.
Subjecting these metabolites to IPA analysis, we found that at the low copper condition, ROS
formation was significantly elevated for the 5-KL scale, relative to the 20-L scale (Z = 2.190), and
was still higher for the high copper condition (comparing again 5-KL to 20-L) but not at a
significant level (Z = 1.452) (Figures 2-4A and 2-4B). The conclusion is that the ROS stress is
observed for the low copper condition for the industrial production scale, relative to the 20-L scale,
but that the high copper condition for the 5-KL scale moderates the stress sufficiently that the
difference in ROS between the two scales is no longer statistically significant.
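The significance calls made in this section can be restated as a short sketch. It assumes, following the "Z-values < -2" convention quoted above, that a function is called activated when Z > 2 and repressed when Z < -2; the Z-scores are the ones reported in the text, while the helper name `classify` and the dictionary layout are illustrative only and are not part of the IPA tool.

```python
# Z-scores quoted in the text for the IPA biological functions.
Z_SCORES = {
    "cell death (5-KL, Day 7, high vs. low copper)": -2.227,
    "killing of cells (5-KL, Day 7, high vs. low copper)": -2.179,
    "apoptosis (5-KL, Day 7, high vs. low copper)": -2.176,
    "production of ROS (5-KL, Day 7, high vs. low copper)": -2.686,
    "synthesis of ROS (5-KL, Day 7, high vs. low copper)": -2.836,
    "protein synthesis (20-L, Day 6, high vs. low copper)": 2.213,
    "ROS formation (5-KL vs. 20-L, low copper)": 2.190,
    "ROS formation (5-KL vs. 20-L, high copper)": 1.452,
}

# Assumed significance cutoff, taken from the |Z| > 2 convention used in the text.
SIGNIFICANCE_CUTOFF = 2.0


def classify(z, cutoff=SIGNIFICANCE_CUTOFF):
    """Label a Z-score as activated, repressed, or not significant."""
    if z > cutoff:
        return "activated"
    if z < -cutoff:
        return "repressed"
    return "not significant"


if __name__ == "__main__":
    for name, z in Z_SCORES.items():
        print(f"{name}: Z = {z:+.3f} -> {classify(z)}")
```

Under this cutoff, the 5-KL vs. 20-L comparison of ROS formation is called activated at low copper but not significant at high copper, which is the distinction drawn in the paragraph above.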
Figure 2-4 Prediction of the formation of ROS for 5-KL vs 20-L scales with low and high copper
conditions.
The color legends are the same as in Figure 2-2. (A) Z score 2.190 under the low copper condition;
(B) Z score 1.452 under the high copper condition. (C) Western blotting of glutathione peroxidase
(GPx) 1/2 for the 5-KL (Day 5 and Day 7) and 20-L (Day 6 and Day 10) scales under different copper
levels. A higher GPx 1/2 level indicates higher oxidative stress. ROS formation is
significantly activated in the 5-KL scale relative to the 20-L scale with the low copper condition,
but not with the high copper condition, indicating that the difference for 5-KL vs 20-L scales in ROS
formation is moderated by additional copper. The quantitative estimation was performed with
ImageJ (http://imagej.nih.gov/ij/).
To provide additional support for the finding of oxidative stress in the large scale bioreactor
and its reduction at the higher copper level, the level of glutathione peroxidase (GPx) was
determined by western blotting. GPx, a marker for oxidative stress, catalyzes the reduction of
peroxides by means of reduced glutathione, forming glutathione disulfide and water (33). As
seen in Figure 2-4C, GPx was down-regulated with high copper for Days 5 and 7 of the 5-KL scale,
reflecting the reduced response of the cells to lower oxidative stress. Further, for the 20-L scale,
GPx showed no significant difference between the two copper levels for western blotting on Days
6 or 10 (Figure 2-4C). As a clear demonstration of the difference in oxidative stress between the
5-KL and 20-L scales, Figure 2-4C further shows by western blotting that a much higher level of
GPx is produced for the industrial scale bioreactor under the low copper condition. The cells in
the 5-KL scale respond to the increased oxidative stress by producing more GPx to try to alleviate
this stress. With further study, GPx could become a biomarker for oxidative stress
in CHO cell bioreactors. The picture that emerges is that the 5-KL reactor is under oxidative
stress and that this stress affects viable cell density (apoptosis and cell death).
2.4.4 Hypoxia (intermittent) in 5-KL bioreactor reduces cell viability and productivity
The cause of the increased ROS in the 5-KL industrial scale was next explored. One of the
major concerns in process control, especially for production scale bioreactors, is the level and
uniformity of oxygen mixing throughout the reactor. Previous studies have shown that the oxygen
transfer coefficient is 50% lower in 5-KL scale relative to the 20-L scale (11), suggesting the
potential for insufficient oxygen mass transfer in 5-KL bioreactors. In addition, the mixing time
of a 5-KL bioreactor at its maximum allowed agitation (>100 seconds) is much longer than that of
a 20-L bioreactor (42-86 seconds) (11). It is reasonable to assume the presence of an oxygen
gradient in 5-KL bioreactors during cell culture. Since cells are under continual movement due to
stirring with a relatively long mixing time, cells can intermittently experience
lower oxygen levels, i.e. hypoxia. Furthermore, hypoxia is known to increase ROS (48), and,
separately, it has been shown that intermittent hypoxia induces potentially even greater cellular
oxidative stress than continuous hypoxia (48-52). In the present study, hypoxia in the large
bioreactor and increased ROS observed in the ‘Omics studies (Figure 2-2 and Figure 2-3) are likely
related.
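The oxygen-supply argument above can be made concrete with the standard gas-liquid mass transfer relation, OTR = kLa x (C* - C), where kLa is the volumetric oxygen transfer coefficient, C* the saturation concentration and C the bulk dissolved oxygen concentration. The sketch below is only illustrative: the 50% lower coefficient for the 5-KL scale comes from the text, whereas the absolute kLa and concentration values are hypothetical placeholders, not measurements from this study.

```python
def oxygen_transfer_rate(kla_per_h, c_sat_mg_per_l, c_bulk_mg_per_l):
    """Standard relation OTR = kLa * (C* - C), in mg O2 per liter per hour."""
    return kla_per_h * (c_sat_mg_per_l - c_bulk_mg_per_l)


# Hypothetical placeholder values (NOT measured in this study).
KLA_20L_PER_H = 10.0                 # assumed kLa for the 20-L scale
KLA_5KL_PER_H = 0.5 * KLA_20L_PER_H  # 50% lower for the 5-KL scale, as stated in the text
C_SAT_MG_PER_L = 7.0                 # assumed oxygen saturation concentration
C_BULK_MG_PER_L = 2.1                # assumed bulk concentration at the DO set-point

otr_20l = oxygen_transfer_rate(KLA_20L_PER_H, C_SAT_MG_PER_L, C_BULK_MG_PER_L)
otr_5kl = oxygen_transfer_rate(KLA_5KL_PER_H, C_SAT_MG_PER_L, C_BULK_MG_PER_L)

if __name__ == "__main__":
    print(f"OTR 20-L: {otr_20l:.1f} mg/L/h")
    print(f"OTR 5-KL: {otr_5kl:.1f} mg/L/h")
    print(f"Ratio 5-KL/20-L: {otr_5kl / otr_20l:.2f}")
```

At the same driving force (C* - C), halving kLa halves the oxygen transfer rate regardless of the placeholder values chosen, which is why a 50% lower coefficient points to a risk of insufficient oxygen supply at the larger scale.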
Although DO profiles are monitored and controlled at the same set-points for all
experimental conditions, the measurements taken are indicative of DO levels at a single point in
the bioreactor. Furthermore, DO gradients leading to hypoxic conditions are difficult to measure
in large tanks. Therefore, to support the hypothesis of hypoxic stress in the large scale process,
we measured the relative levels of fibronectin by western blotting, ELISA, and qPCR. Fibronectin
is known to be up-regulated in hypoxia (53, 54). As shown in Figures 2-5A and 2-5C (ELISA and
western blotting), relative to high copper, fibronectin is greater at the low copper level on Day 7
for the production scale process of 5-KL. qPCR (Figure 2-5B) also showed up-regulation of the
fibronectin gene for the lower copper level on Day 7 for the production scale (Day 5 was not
measured by qPCR). Further, the increased level of fibronectin with process time for the 5-KL
scale with the low copper condition indicates that the effects of hypoxia were cumulative. On the
other hand, for the 20-L scale, Figure 2-5 shows the difference in fibronectin level between the two
copper conditions to be far smaller than for the 5-KL scale. The results support the view that intermittent
hypoxic stress is a factor in the reduction of the viable cell density after Day 3 (Figure 2-1) for the
industrial scale process and that the higher copper level is able to moderate the stress, resulting in
the higher productivity for the 5-KL bioreactor. These results suggest that fibronectin could
become a biomarker for hypoxic stress with production scale CHO cell bioreactors.
Figure 2-5 Results demonstrating hypoxic stress.
The regulation profile of the selected marker for hypoxic stress, fibronectin, from (A) ELISA, (B)
qPCR, and (C) western blotting for the 5-KL and 20-L scales under different copper levels
at selected time points. A higher level of fibronectin correlates with a higher hypoxic stress level. The
error bars were calculated based on the two biological replicates. The quantitative estimation was
performed with ImageJ (http://imagej.nih.gov/ij/).
2.4.5 Analysis of additional differentially regulated proteins supports the ROS and hypoxia
roles in the 5-KL bioreactor
We next explored the differentially regulated proteins separately using a different data
analysis platform – MetaCore – to search for additional non-metabolic pathways which could
further support the connection between hypoxia, oxidative stress, and the influence of the copper.
For the 5-KL scale, deactivation of apoptotic and cell adhesion-related pathways was found with
high copper (see Table 2-3). It has been reported that increased ROS levels can alter endothelial
barrier function and cause the differential regulation of certain cell adhesion-related proteins (51,
55). Thus, the reduction in these pathways by high copper supports the reduction in ROS
and oxidative stress in the large bioreactor.
Table 2-3 MetaCore analysis of proteomic data of the 5-KL scale: significant differentially
regulated proteins related to apoptosis and cell adhesion pathways.
Pathway groups (a) | Day 5 | Day 7
Cell adhesion | N-cadherin (0.52) (b), β-actin (0.93) | β-catenin (-0.44), p120-catenin (-0.36), desmoplakin (-0.37)
Apoptosis and survival | Bax (-0.38) | eIF2S1 (-0.36), Bak (-0.51)
(a) Significant pathways with p values < 0.01.
(b) Log2 ratios of high vs. low copper conditions.
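Because the table reports log2 ratios, it can help to convert them to linear fold changes (fold change = 2 raised to the log2 ratio). The following is a minimal sketch using a subset of the values listed in the table; the function name `fold_change` and the dictionary are illustrative and not part of the MetaCore output.

```python
# Log2 ratios (high copper / low copper) taken from the table above.
LOG2_RATIOS = {
    "N-cadherin (Day 5)": 0.52,
    "Bax (Day 5)": -0.38,
    "β-catenin (Day 7)": -0.44,
    "Bak (Day 7)": -0.51,
}


def fold_change(log2_ratio):
    """Convert a log2 ratio to a linear high/low copper abundance ratio."""
    return 2.0 ** log2_ratio


if __name__ == "__main__":
    for protein, log2_ratio in LOG2_RATIOS.items():
        ratio = fold_change(log2_ratio)
        direction = "up" if log2_ratio > 0 else "down"
        print(f"{protein}: log2 = {log2_ratio:+.2f}, "
              f"high/low = {ratio:.2f} ({direction}-regulated)")
```

A log2 ratio of -0.38 for Bax, for instance, corresponds to a high/low copper abundance ratio of roughly 0.77, that is, a decrease of about one quarter rather than a halving.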
Further, β-catenin, one of the adherens junction proteins, was found to be significantly
down-regulated with the high level of copper in the production scale bioreactor (Table 2-3). The
level of β-catenin is decreased as excess ROS is diminished (55), supporting the role of copper in
the reduction of ROS. Moreover, since β-catenin is an important control of FOXO, HIF-1, and
Wnt signaling, the regulation of these signaling pathways may potentially play a role in the cell
fate under hypoxic conditions (56). With increase in copper, several proteins downstream of Wnt
signaling, MMP-3 (Day 5) (57) and CD44 (Day 7) (58), were found to be down-regulated,
indicating the deactivation of Wnt signaling (Table 2-4). The deactivation of Wnt/β-catenin
signaling likely slowed the G1/S phase transition of the cell cycle through the regulation of cyclin
D1 and c-Myc (59, 60). This result could decrease cell death and thus raise CHO cell productivity
(61).
2.4.6 The differentially regulated proteins related to important biological functions and
pathways
As shown in Figures 2-2 and 2-3 and Tables 2-3 and 2-4, there are several differentially
regulated proteins and metabolites involved in significant biological functions and pathways with
the 5-KL scale. Multiple endocrine neoplasia I (MEN1), which was down-regulated with the high
copper condition on Day 7, can induce apoptosis through the response of BAX and BAK1 (62).
p120-Catenin (CTNND1) was also down-regulated on Day 7 with high copper for the 5-KL scale.
The reduction of this protein was reported to decrease apoptosis (63). Moreover, p120-catenin
directly functions as a part of the “core” cadherin-catenin complex (64) with other members
including β-catenin, which was also down-regulated on Day 7. Other proteins, including folliculin
(FLCN, also known as BHD) (65), angiomotin (AMOT) (66), and spliceosome-associated protein
CWC15 (CWC15) (67) were all reported to be involved in the apoptotic process. The down-
regulation of these proteins on Day 7 with high copper points to the reduced cell death as seen in
Figure 2-1.
γ-Glutamyltransferase (GGTL3) was down-regulated on Day 7 with the high copper
condition. It is involved in glutathione degradation for physiological functions other than
scavenging ROS (68, 69). The lower level of GGTL3 indicates a lower GSH degradation rate and
a larger GSH pool as antioxidant. Moreover, several amino acids were found differentially
expressed in the high copper condition. Histidine, being a singlet oxygen free radical scavenger,
can protect the cell from ROS induced apoptosis and cell death (70). γ-Amino butyric acid (GABA),
which was found up-regulated in high copper, can also prevent cell death by inhibiting apoptosis
by membrane depolarization and Ca2+ influx, the latter of which activates PI3-K/Akt-dependent
growth and survival pathways (71). A number of polyamine compounds such as spermine,
spermidine and putrescine were also found up-regulated in the presence of high copper.
Polyamines can protect against ROS-generated stress by reducing oxidative damage of DNA.
Polyamines can also protect cells against ROS-induced glutathione oxidation, lipid peroxidation
and protein oxidation (72). Urea, a cellular nitrogenous compound breakdown product, was found
to be downregulated in the high copper samples. Urea has been shown to increase production of
mitochondria associated ROS generation (73).
2.4.7 Superoxide dismutase 1 is potentially involved in the reduction of intermittent
hypoxia and oxidative stress with addition of copper in the 5-KL bioreactor
Copper is a well-known trace metal co-factor for a number of enzymes (13). Given the
reduction in oxidative stress with increased copper, we sought potential copper binding enzyme
targets that acted as antioxidants. One such copper-binding enzyme is superoxide
dismutase 1 (SOD1), which catalyzes the conversion of the superoxide radical to oxygen and
hydrogen peroxide (74). While SOD1 was detected in our proteomics workflow, it was removed
by the filtering steps. Nevertheless, we probed SOD1 at both copper levels on Day 7 by western
blotting. SOD1 was found to be up-regulated on Day 7 for high copper with the 5-KL scale (Figure
2-6). This up-regulation may result in a stronger defense against the excess ROS generated by the
intermittent hypoxia. On the other hand, additional copper did not up-regulate SOD1 expression
for the 20-L scale (Figure 2-6), likely due to the lack of need for the additional antioxidant.
Moreover, as seen in the western blotting, the expression of SOD1 was similar between the 5-
KL and 20-L scales under the low copper condition. Since copper is a regulator of a wide range of
signal transduction pathways, the full range of specific multifaceted molecular actions of copper
clearly requires further investigation.
Figure 2-6 Western blotting of SOD1, a copper-binding enzyme, for the 5-KL and 20-L scales
and under different copper levels.
The quantitative estimation was performed with ImageJ (http://imagej.nih.gov/ij/).
2.5 Discussion
We have, for the first time, presented a systems biology study of a manufacturing scale
CHO bioprocess. In this work, lower cell viability and process productivity were observed in a 5-
KL production scale, relative to a 20-L benchtop bioreactor (Figure 2-1). We found that, due to
trace copper in the sodium bicarbonate used to control pH (increase in copper from 0.02 μM to 0.4
μM), the cell viability and process productivity of the 5-KL scale were doubled during the stationary
phase of the process, without at the same time disturbing the product quality attributes. This increase in
copper concentration had only a minor effect on the performance of the 20-L bioreactor. These
results clearly show that the extrapolation of phenotypic behavior on the lab scale does not
necessarily lead to the same behavior in the production scale process.
In previous studies, researchers found that increased copper led to high productivity of the
CHO cell process with a decrease (consumption) of lactate in the stationary region. It was reasoned
that the higher copper level affected copper binding to the COX proteins, thus reducing ROS
produced in the mitochondria (18, 20). In the present study, we did not find lactate consumption
for the 5-KL scale. Importantly, the previous studies used much smaller laboratory scale bioreactor
volumes of 5-L or less, at least a 1000-fold decrease compared to the manufacturing scale here.
Also, the concentration of copper was generally higher, and process protocols likely differed from
our study (e.g., media, CHO strains, feeding regimens, reactor designs, etc.).
We sought to understand the reasons for doubling of the titer with the addition of trace
amounts of copper in the 5-KL scale, while the same effect was not observed for the lab scale
process. Quantitative proteomics was conducted on the two bioreactor scales at the two copper
levels in the growth, stationary and early death phases. For the 5-KL production scale, analysis of
the proteins differentially regulated above and below the designated cut-off threshold of high
versus low copper provided only limited statistically significant insight into the causes of the
phenotypic differences. Analysis of the metabolomics data also provided only limited insight.
Importantly, the combination of the proteomics and metabolomics differentially regulated data did
lead to statistically significant biological insight. This success of the combined-omics
demonstrates the potential of ’Omics in elucidating underlying biology of complex processes
(systems biology). Undoubtedly, further additions of other ’Omics data, e.g. lipidomics,
transcriptomics, etc., would provide deeper insight into the biology of the production of
biopharmaceuticals.
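The idea of pooling the two differentially regulated lists into a single input can be sketched as follows. This is only an illustration of the data handling, not the IPA workflow itself: the values are a handful of Day 5 log2 ratios (high copper/low copper) from Table 2-4, and the function `combine_omics` and the record layout are hypothetical.

```python
# A few Day 5 log2 ratios (high copper / low copper) from Table 2-4.
PROTEINS = {"Bax": -0.38, "Mmp3": -0.47, "Cdh2": 0.52, "Actg2": 0.93}
METABOLITES = {
    "linoleic acid": -0.95,
    "L-glutamic acid": -0.77,
    "urea": -0.66,
    "putrescine": 1.24,
}


def combine_omics(proteins, metabolites):
    """Merge both lists into one list of (name, molecule type, log2 ratio)
    records, sorted from most down-regulated to most up-regulated."""
    records = [(name, "protein", value) for name, value in proteins.items()]
    records += [(name, "metabolite", value) for name, value in metabolites.items()]
    return sorted(records, key=lambda record: record[2])


combined = combine_omics(PROTEINS, METABOLITES)

if __name__ == "__main__":
    for name, kind, log2_ratio in combined:
        print(f"{name:16s} {kind:10s} {log2_ratio:+.2f}")
```

Keeping the molecule type alongside each entry lets a downstream tool treat proteins and metabolites as one evidence set while still allowing each list to be inspected on its own.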
The picture that emerges from our ’Omics analysis is that for the 5-KL scale, cell death
and ROS production were reduced when the higher level of copper was present (Figures 2-2 and
2-3). The reduced ROS generation was likely related to decreased cell death, since oxidative stress
caused by excess ROS is a well-known trigger for apoptosis (33, 45, 75). The reduction, in the high
copper condition, of oleic, linoleic and glutamic acids, which are known to increase ROS production,
supports this conclusion. That apoptosis was decreased in the high copper level can be clearly seen in the
reduction in the concentration of BAX and BAK1. On the other hand, analysis of the 20-L scale
between the two copper levels did not show statistically significant differences of ROS production,
suggesting less oxidative stress than found for the 5-KL scale. Western blot analysis of glutathione
peroxidase, a marker for oxidative stress, clearly shows that the 5-KL scale is under more stress
than the 20-L scale (Figure 2-4).
We then investigated the cause of the increased ROS production for the industrial scale
process. Given the general concern of gas mixing in kiloliter scale processes and a previous study
that suggested the existence of lower oxygen concentration zones in large-scale bioreactors (11),
we hypothesized that the periodic contact of CHO cells with lower oxygen regions (intermittent
hypoxia) was related to the increased ROS level in the large scale bioreactor. Hypoxia is well
known to create excess ROS (48, 52). Fibronectin, which is known to be up-regulated under
hypoxia (53, 54), was found by qPCR, ELISA and western blotting to be higher in the large scale
bioreactor compared to the 20-L scale under the low copper condition (Figure 2-5). Thus, it seems
likely that the difficulty in efficient oxygen mixing for the 5-KL scale resulted in intermittent
hypoxia.
Figure 2-5 further shows that increased copper had a significant effect in reducing the level
of fibronectin in the production scale process, and only a minor reduction of the protein for the lab
scale process. Thus, the increased copper would appear to be counteracting the hypoxia. As
the ’Omics results show, the higher copper level reduced the production of ROS, an outgrowth of
the hypoxia. Further study would be required to determine exactly how copper affects the
production of ROS, given that copper, as other trace metals, can bind to many enzymes. We did
find by western blotting that SOD1, a copper-binding enzyme and antioxidant, appears to be
affected by the increased copper (Figure 2-6). However, more study would be required to elucidate
the biology involved.
In conclusion, the present study is a clear example of how multi-omics approaches can be
applied to explore the biology of the process and improve productivity (4). Such strategies will
drive improvement in bioprocessing and enable the efficient development of robust, scalable and
well-understood processes. Furthermore, such studies can lead to biomarkers that can be utilized
to rapidly monitor changes in a process. With continued advances, there is little doubt systems
biology will find increasing use in biomanufacturing through a greater understanding of the
biology of the process.
2.6 Conclusion
In summary (Figure 2-7), in industrial-scale 5-KL bioreactors for
biopharmaceutical production, CHO cells undergo intermittent hypoxia due to incomplete oxygen
mixing. As found through combined metabolomics and proteomics datasets, this condition leads
to excess ROS. The resultant stress causes decreased productivity compared to bench-scale 20-L
bioreactors. Additional copper, from 0.02 μM to 0.4 μM, reduced the stress and improved the
productivity 2-fold for the 5-KL scale. For the 20-L scale, copper did not affect productivity
significantly as the cell stress was lower.
Figure 2-7 Summary scheme: increased copper reveals hypoxia as a cause of lower
productivity on scale-up to an industrial CHO bioprocess.
2.7 Appendix
2.7.1 Perspective of biological effects caused by additional copper in the media
CHO cells in the large scale bioreactor were shown to encounter hypoxic stress as well as
hypoxia-induced oxidative stress caused by excess ROS. An increase in copper concentration
helped the cells resist the stresses, leading to increased viability. Previously, we proposed that the
copper-binding protein SOD1 could be one of the ways in which copper affected the bioprocess
in the large scale. However, in addition to SOD1, there are other copper-influenced processes that
could possibly be affected based on the current biological analysis. Two of the most promising
hypotheses are listed below, but they will need further study to refine or confirm.
First, it could be the mitogen-activated protein kinase (MAPK) signaling transduction (14,
76, 77) that was affected by additional copper and which may aid in cell survival under cellular
stresses. The significant pathways from MetaCore proteomic analysis of both the 5-KL and 20-L
bioreactors were studied, and MAPK signaling transduction attracted our attention. The pathway
maps from MetaCore contain many sub-biological processes, and MAPK signaling was the one
most shared in the top 10 statistically significant pathways (ranked by the p values) of the 5-KL
and 20-L scales. Although the proteins directly related to MAPK signaling such as mitogen-
activated protein kinase kinases (MAPKKs) and mitogen-activated protein kinases (MAPKs) were
not differentially regulated, the activity of the MAPKKs and MAPKs depends on their
phosphorylated forms, not their overall protein amounts. Unfortunately, phosphoproteomics was
beyond the scope of the present study. For the 5-KL bioreactor, JNK and p38 were related to
apoptosis and survival pathways. For the 20-L bioreactor, ERK was related to cell adhesion and
cytoskeleton remodeling pathways, and JNK showed up in the development pathways. Moreover,
the elevation of N-cadherin in the 20-L bioreactor on Day 6 could also be an indicator of the
enhancement of ERK signaling with the high copper condition. The N-cadherin (CDH2) level was
reported to decrease after the ERK inhibition treatment, and the knockdown of N-cadherin also
caused a decrease of the phosphorylation rate of ERK (78). Also, the activation of JNK could
increase phosphorylated Bax and Bak, resulting in the activation of these two pro-apoptotic
proteins (79), and p38 would activate the phosphorylated Bax as well (80). Importantly, the
oxidative stress caused by ROS can potentially induce activation of MAPK pathways (77), which
are reported to be regulated by copper (14). It is possible that copper helped to balance the
activation status of the MAP kinases, which benefitted cell survival under the oxidative stress in
the large scale bioreactor. Thus, MAPK signaling could be a candidate process through which
copper affected the growth status of CHO cells.
Second, it is possible that the regulation between Wnt signaling and FOXO and HIF-1
related signaling transductions was influenced by additional copper, leading to a stronger defense
against both hypoxic and oxidative stresses in the 5-KL scale. In our study, β-catenin was found
to be down-regulated with additional copper on Day 7 in the 5-KL scale. First, a higher ROS level
is known to increase the abundance of β-catenin (55), supporting the role of copper in the reduction
of ROS in the large scale. Second, β-catenin is a key protein that regulates the activity balance of
FOXO, HIF-1, and Wnt signaling, all of which pathways play a role in cell fate under the hypoxic
condition (56). With the high copper condition in the 5-KL scale, several proteins which are
downstream of Wnt signaling, MMP-3 (Day 5) (57) and CD44 (Day 7) (58), were down-regulated,
indicating the deactivation of Wnt signaling transduction. The deactivation of Wnt/β-catenin
signaling can potentially slow or even arrest the G1/S phase transition of cell cycle through the
regulation of cyclin D1 and c-Myc (59, 60). The arresting/slowing of cells in G1/S phase could be
one reason for the decreased cell death and hence the increase in host cell productivity for the
therapeutic protein in the 5-KL scale (61, 81). Moreover, cellular oxygen consumption under
hypoxic conditions can lead to depleted oxygen, resulting in even more severe hypoxic stress. For
this reason, a beneficial cellular stress response to hypoxia would be cell cycle arrest to
reduce oxygen consumption, increasing cell survival chances (82). Thus, copper might promote
an advantageous response through the regulation of the Wnt/FOXO/HIF-1 signaling balance in the
CHO cells under stress in the large scale.
Based on the biological analysis, MAPK signaling and Wnt/FOXO/HIF-1 signaling are
promising targets which might have been affected by additional copper, but further study is
required to explore these hypotheses. Moreover, the effects of copper, as with any trace metal, are
likely complex and multifactorial. The unraveling of this complex effect will need to be explored in
future studies.
Figure 2-8 Prediction of significantly activated biological functions for the 20-L bioreactor using
IPA at Day 6.
Combined differentially regulated proteins and metabolites (high vs. low copper) were used as
input to the IPA tool. The color codes for quantitative measurements and predicted outcomes are
shown in the inset panel. The biological function related to synthesis of protein (orange octagon)
was significantly activated, with a Z score of 2.213.
Table 2-4 Differentially regulated proteins and metabolites with the 5-KL and 20-L bioreactors.
5-KL scale Day 5
Differentially Expressed Proteins: RefSeq, Gene name, Average Day 5 log2 ratio (high copper/low copper)
Differentially Regulated Metabolites: Metabolite name, Average Day 5 log2 ratio (high copper/low copper)
NM_177717.4 4732456N10Rik 0.52 γ-glutamylglycine 1.44
NM_001163493.1 Stard13 0.60 palmitoyl sphingomyelin -0.63
NM_153319.2 Amot 0.40 n-butyl oleate -0.70
NM_175318.4 Spty2d1 0.38 13-methylmyristic acid -0.73
NM_172382.2 Kdm4a 0.44 leucylphenylalanine -0.90
NM_145615.4 Etfa 0.65 leucylisoleucine -0.66
NM_009610.2 Actg2 0.93 γ-glutamyltryptophan 0.80
NM_175659.1 Hist1h2ah 2.19 isoleucylglycine -0.66
NM_025314.3 Dtd1 0.39 Alanyl-Leucine -1.26
NM_026758.3 Mphosph6 0.38 Valyl-Phenylalanine -1.15
NM_133939.1 Lsm8 0.51 (S)-3-methyl-2-oxopentanoic acid 0.75
NM_001048250.2 2810008M24Rik 0.51 1-oleoyl lysophosphatidylcholine 1.26
NM_001033135.3 -1.01 1-oleoylglycerol -0.64
NM_001001602.2 Dab2ip -0.38 1-palmitoylglycerol -1.02
NM_146183.2 Zfp428 -0.40 2-oleoylglycerol -0.99
NM_198113.2 Ssh3 -0.41 2-palmitoylglycerol -1.86
NM_139236.3 Nol6 -0.68 3'-adenylic acid -0.72
NM_027992.3 Tmem106b -0.74 4-guanidinobutanoic acid 0.69
NM_172508.2 Dse -0.50 4-hydroxyphenyllactic acid, (DL)-isomer 1.01
NM_145531.2 Spg11 -0.41 9Z-hexadecenoic acid -0.86
NM_026636.2 5430437P03Rik -0.41 adenine -0.61
NM_029841.3 2510039O18Rik -0.32 α-D-glucose 6-phosphate 0.88
NM_001005767.4 Parl -0.50 Asp-Leu -1.71
NM_183106.2 Ttc17 -0.33 behenic acid -1.30
NM_008822.2 Pex7 -0.52 cis-10-heptadecenoic acid -1.37
NM_197991.2 2310044H10Rik -0.31 desmosterol 0.68
NM_130881.2 Pabpc4 -0.47 D-fructose 0.61
NM_007527.3 Bax -0.38 erythronic acid -0.75
NM_133683.3 Tmem19 -0.30 GABA -0.73
NM_010380.3 H2-D1 -0.50 galactitol 0.90
NM_144857.1 BC011248 -0.53 galactose-1-phosphate -0.67
NM_008788.2 Pcolce -0.34 γ-glutamylisoleucine 0.67
NM_008408.4 Stt3a -0.38 γ-glutamyl-leucine 0.73
NM_080561.3 Rnf216 0.34 γ-glutamylphenylalanine 0.73
NM_175394.2 Wtap 0.30 γ-glutamylthreonine 0.76
NM_030252.2 BC003266 0.32 γ-glutamyl-valine 0.75
NM_001177965.1 Naa10 0.33 GDP fucose 0.66
NM_001008238.2 Bnip2 0.47 glutathione -0.91
NM_001076676.1 Usp33 0.37 glycylleucine -2.19
NM_144806.2 Prpsap2 0.32 guanine 0.93
NM_009104.2 Rrm2 0.34 hypotaurine 0.80
NM_009736.3 Bag1 0.42 Ile-Ala -0.64
NM_134131.2 Tnfaip8 0.33 Ile-Ile -1.19
NM_172552.3 Tdg 0.37 inositol 1-phosphate -1.52
NM_025349.2 Lsm7 0.39 lauric acid -0.66
NM_138753.2 Hexim1 0.33 Leu-Leu -0.93
NM_001205226.1 Cnot1 0.30 L-glutamic acid -0.77
NM_021414.5 Ahcyl2 0.55 L-glutamine -1.99
NM_001033439.2 Lrch1 0.40 L-homocysteine -0.93
NM_177545.4 Vangl1 0.42 linoleic acid -0.95
NM_001099624.2 Rapgef2 0.47 L-lactic acid 0.93
NM_025365.3 Tomm6 0.54 L-ornithine 1.43
NM_009680.3 Ap3b1 0.37 L-serine 0.59
NM_013840.3 Uxt 0.38 myristic acid -0.60
NM_001161816.1 Gm15455 0.37 N-acetyl-L-tyrosine -0.60
NM_001177946.1 Aamdc 0.31 nervonic acid 0.63
NM_009193.1 Slbp 0.39 N-formylmethionine 0.69
NM_007664.4 Cdh2 0.52 oleylamide -1.04
NM_001162989.1 Phax 0.33 orotic acid 1.83
NM_008761.3 Fxyd5 -0.38 Phe-Leu -0.90
NM_019478.3 Pqbp1 -0.38 phenylalanylphenylalanine -1.13
NM_007856.2 Dhcr7 -0.38 Phe-Ser -1.12
NM_011127.2 Prrx1 -0.31 putrescine 1.24
NM_010809.1 Mmp3 -0.47 pyrophosphate -0.68
NM_011399.3 Slc25a17 -0.30 pyrrolidonecarboxylic acid 0.59
NM_001135567.1 1190007I07Rik -0.31 rac-1-stearoylglycerol -0.72
NM_008130.2 Gli3 -0.59 Ser-Leu -1.53
NM_001081118.1 Phrf1 -0.32 stearic acid amide -1.05
NM_028440.1 3110003A17Rik -0.87 trans-4-hydroxy-L-proline 0.59
NM_026744.3 Mrpl53 -0.33 Tyr-Leu -0.62
NM_019951.1 Sec11a -0.45 UDP-D-galactose -0.80
NM_028208.1 Ptar1 -0.38 urea -0.66
NM_172402.3 Slc25a32 -0.31 xanthosine monophosphate 0.83
NM_026554.4 Ncbp2 -0.47
NM_019755.4 Plp2 -0.35
NM_011014.2 Sigmar1 -0.34
XM_003946427.1 Pwp2 -0.56
NM_008222.4 Hccs -0.39
NM_178602.3 Grinl1a -0.49
NM_198027.2 Alkbh6 -0.35
NM_175518.5 D730040F13Rik -0.50
NM_001160182.1 Tor1aip2 -0.36
NM_027259.1 Polr2i -0.30
NM_001081151.1 Gan -0.40
NM_025969.4 1700034H14Rik -0.37
NM_009444.1 Tgoln2 -0.40
NM_025813.3 Mfsd1 -0.43
NM_145959.3 D15Ertd621e -0.34
NM_008889.2 Ppp1r14b -0.31
NM_001078167.1 Sfrs1 -0.36
NM_146019.2 Chd3 -0.32
NM_026698.2 Tmem129 -0.38
NM_001167680.1 Rhbdf2 -0.31
NM_024187.4 U2af1 -0.41
NM_053162.2 Mrpl34 -0.40
NM_020585.2 Golga7 -0.30
NM_029211.2 Rnf121 -0.34
NM_009279.3 Ssr4 -0.30
NM_030566.2 Rabep2 -0.35
5-KL scale Day 7
Differentially Expressed Proteins Differentially Regulated Metabolites
RefSeq  Gene names  Average Day 7 log2 ratios (high copper/low copper)  Metabolite Names  Average Day 7 log2 ratios (high copper/low copper)
NM_177717.4 4732456N10Rik -0.95 γ-glutamylglycine 1.34
NM_001033135.3 Rnf149 -0.98 palmitoyl sphingomyelin -0.96
NM_144857.1 BC011248 -0.55 n-butyl oleate -1.20
NM_026229.4 Gpr89 -0.40 13-methylmyristic acid -2.08
NM_025368.3 Josd2 -0.58 γ-glutamyltryptophan 1.35
NM_021330.4 Acp1 -0.52 1-stearoyl-GPI (18:0) -1.16
NM_011515.4 Vamp7 -0.36 isoleucylglycine -0.64
NM_025628.2 Cox6b1 -0.33 Leucyl-Methionine 1.22
NM_001177785.1 Cd44 -0.35 (S)-3-methyl-2-oxopentanoic acid 1.62
NM_023434.3 Tox4 0.36 13,16-docosadienoic acid -1.20
NM_175151.4 Tatdn1 0.33 1-oleoyl lysophosphatidylcholine 1.26
NM_153319.2 Amot 0.46 1-oleoylglycerol -1.33
NM_011749.4 Zfp148 0.32 1-oleoyl-lysophosphatidylethanolamine -0.98
NM_031998.2 Tsga14 0.48 1-palmitoylglycerol -1.14
NM_029759.3 Fam54b 0.31 2-oleoylglycerol -3.07
NM_175095.4 Commd2 0.41 2-palmitoylglycerol -1.66
NM_177780.3 Dock5 0.39 3'-adenylic acid -2.26
NM_026899.3 Ssu72 0.30 4-hydroxybutanoic acid -1.15
NM_025946.5 Romo1 1.28 4-hydroxyphenyllactic acid, (DL)-isomer 0.64
NM_028626.1 Mcee 0.64 9Z-hexadecenoic acid -1.82
NM_153530.2 Dis3l2 0.35 9Z-tetradecenoic acid -0.91
NM_009713.4 Arsa 0.39 acetylleucine 0.83
NM_008130.2 Gli3 -0.34 α-D-glucose 6-phosphate 0.90
NM_026120.4 2410127L17Rik -0.31 α-ketoisocaproic acid 1.04
NM_024231.2 Zfpl1 -0.31 azelaic acid -0.82
NM_026758.3 Mphosph6 -0.48 behenic acid -0.84
NM_019951.1 Sec11a -0.69 β-alanine -0.85
NM_025436.2 Sc4mol -0.31 β-glycerophosphoric acid -1.05
NM_013897.2 Timm8b -0.41 biopterin 0.74
NM_173376.3 Rbmx2 -0.48 cadaverine 2.18
NM_010593.1 Jup -0.44 cholesterol -1.05
NM_026932.4 Ebna1bp2 -0.36 cis-10-heptadecenoic acid -1.70
NM_146018.1 Flcn -0.34 coenzyme A 0.64
NM_026282.5 Spc24 -0.35 D-fructose 0.74
NM_007523.2 Bak1 -0.51 eicosa-11Z, 14Z-dienoic acid -1.49
NM_198102.2 Tra2a -0.37 erythritol 1.18
NM_198113.2 Ssh3 -0.58 ethanolamine -1.58
NM_027992.3 Tmem106b -0.69 fumaric acid 0.65
NM_019733.2 Rbpms -0.31 GABA 1.55
NM_030147.2 Brd8 -0.34 galactose-1-phosphate -0.67
NM_197991.2 2310044H10Rik -0.40 γ-glutamylisoleucine 0.80
NM_025813.3 Mfsd1 -0.38 γ-glutamyl-leucine 1.02
NM_026114.3 Eif2s1 -0.36 γ-glutamylphenylalanine 0.61
NM_007924.2 Ell -0.40 γ-glutamylthreonine 1.05
NM_009009.4 Rad21 -0.31 γ-glutamyl-valine 1.24
NM_145531.2 Spg11 -0.36 glycine 0.75
NM_018776.1 Crlf3 -0.43 glycylglycine 0.92
NM_133835.2 Ubac1 -0.31 glycylleucine -0.94
NM_019837.2 Nudt3 -0.30 gondoic acid -2.06
NM_001205226.1 Cnot1 -0.42 guanosine 0.98
NM_023153.3 Cwc15 -0.30 heptadecanoic acid -1.16
NM_018740.2 Rai12 -0.64 heptanoic acid -0.86
NM_144786.2 Ggt7 -0.41 hypoxanthine 1.22
NM_001168488.1 Men1 -0.36 indole-3-lactic acid 2.86
NM_001135567.1 1190007I07Rik -0.50 inositol 1-phosphate -1.98
NM_001177556.1 Gng12 -0.33 L-alanine 0.80
NM_026490.2 Mrpl19 -0.35 lauric acid -1.44
NM_028812.3 Gtf2e1 -0.40 L-cysteine 0.68
NM_028440.1 3110003A17Rik -0.34 L-cystine -0.74
NM_145073.2 Hist1h3g -0.30 Leu-Trp 0.67
NM_026448.3 Klhl7 -0.33 L-glutamic acid -0.86
NM_026452.2 Coq9 -0.47 L-glutamic acid 5-methyl ester 0.83
NM_023842.2 Dsp -0.37 L-histidine 0.97
NM_001085450.1 Ctnnd1 -0.36 L-homocysteine -1.23
NM_145610.2 Ppan -0.36 L-homoserine 1.35
NM_133763.1 Dnttip1 -0.30 lignoceric acid -1.12
linoleic acid -2.00
L-isoleucine 0.63
L-lactic acid 1.09
L-ornithine -0.59
L-serine 0.75
myristic acid -1.70
N-acetyl amino acid 0.99
N-acetyl-L-methionine -1.26
NAD+ 0.65
nervonic acid -0.88
N-formylmethionine 0.72
oleic acid -0.95
oleylamide 1.05
orotic acid 1.26
pentadecanoic acid -1.17
pelargonic acid -0.78
phosphate -0.80
phosphorylcholine 0.85
putrescine 1.54
pyridoxal -0.65
pyrophosphate -1.13
pyrrolidonecarboxylic acid 0.70
rac-1-stearoylglycerol -1.27
riboflavin 0.71
S-adenosylhomocysteine -0.60
Ser-Leu -1.38
sn-glycero-3-phosphocholine -1.55
sorbitol 1.49
spermidine 1.32
trans-4-hydroxy-L-proline 0.78
Tyr-Leu -1.39
spermine 3.19
trans-4-hydroxy-L-proline 0.78
Tyr-Leu -1.39
UDP 1.28
UDP-D-galactose -0.60
undecanoic acid -1.28
uracil -1.09
urea -1.47
UTP 2.16
vaccenic acid -1.34
20-L scale Day 6
Differentially Expressed Proteins Differentially Regulated Metabolites
RefSeq  Gene names  Average Day 6 log2 ratios (high copper/low copper)  Metabolite Names  Average Day 6 log2 ratios (high copper/low copper)
XM_003945892.1 LOC101056392 0.72 isoleucylglycine 0.74
NM_198294.2 Tanc1 0.60 γ-glutamyltryptophan 0.66
NM_001077265.1 Hnrnpd 0.62 γ-glutamylglycine 2.80
NM_001033294.3 Ddx31 0.33 1,2-dipalmitoylglycerol 1.02
NM_019880.3 Mtch1 0.39 (S)-2-hydroxystearic acid -0.63
NM_025558.5 Cyb5b 0.35 (S)-3-methyl-2-oxopentanoic acid 0.83
NM_026411.1 1700021F05Rik -0.49 1-oleoyl lysophosphatidylcholine 1.16
NM_173376.3 Rbmx2 -0.44 2-oleoylglycerol 0.82
NM_145070.3 Hip1r -0.35 2-tyrosine 1.32
NM_023060.3 Eefsec -0.52 4'-phosphopantetheine -0.59
NM_001033196.2 Znfx1 -0.36 5-aminovaleric acid 0.71
NM_009483.1 Kdm6a -0.59 5'-methylthioadenosine 0.60
NM_001081412.2 Bcr -0.80 7β-hydroxycholesterol 0.72
NM_172758.4 Slc38a7 -0.93 acetylcholine 1.08
NM_001029850.3 Magi1 -0.50 acetylleucine 0.73
NM_021305.3 Sec61a2 -0.37 adenine 0.91
NM_011787.2 Amfr -0.53 adenosine -0.61
NM_025573.3 Sfrs9 -0.40 α-D-glucose 6-phosphate 0.66
NM_026422.2 Mrrf -0.39 α-ketoisocaproic acid 0.90
NM_028173.4 Tram1 -0.43 ascorbic acid 1.19
NM_010744.3 Tmed1 -0.33 cadaverine 1.52
NM_175294.3 Nucks1 -0.46 CDP ethanolamine -0.84
NM_029934.3 Mboat7 0.36 citric acid 0.81
NM_172911.3 D8Ertd82e 0.66 coenzyme A 0.65
NM_019921.2 Akap10 0.44 deoxyguanosine 0.62
NM_001136069.2 Ldha 0.42 deoxyuridine 0.69
NM_025403.3 Nop10 0.33 D-glyceric acid 0.69
NM_024166.6 Chchd2 0.41 D-sphingosine 3.04
NM_001162989.1 Phax 0.44 D-threitol 0.84
NM_011417.3 Smarca4 0.39 galactitol 0.63
NM_007513.4 Slc7a1 0.52 galactose 1.23
NM_011743.2 Zfp106 0.35 γ-butyrobetaine 0.82
NM_139153.2 Agap3 -0.30 γ-glutamylalanine 0.76
NM_011135.4 Cnot7 -0.33 γ-glutamylcysteine 0.65
NM_026169.4 Frmd8 -0.36 γ-glutamylglutamate 0.83
NM_001171582.1 Mars -0.32 γ-glutamylmethionine 0.68
NM_080793.5 Setd7 -0.42 glucose-1-phosphate 0.67
NM_009805.4 Cflar -0.34 glutathione 0.85
NM_009336.2 Vps72 -0.33 glycylglycine 0.75
NM_146251.4 Pnpla7 -0.30 glycylleucine 2.20
NM_001033528.1 Usp36 -0.34 guanosine 0.65
NM_011497.3 Aurka -0.31 inositol 1-phosphate 0.93
NM_001081218.1 Hcfc2 -0.37 L-α-lysophosphatidylcholine, palmitoyl 1.05
NM_013790.2 Abcc5 -0.41 L-α-lysophosphatidylcholine, stearoyl 1.24
lanosterol 0.66
L-cystine 1.25
L-homoserine 1.21
L-kynurenine 0.83
L-leucine 0.83
L-lysine 0.68
L-malic acid 0.70
L-xylonate 1.41
N-acetyl-β-alanine 0.67
N-acetyl-D-glucosamine 6-phosphate 1.27
N-acetylneuraminic acid 0.60
NAD+ 0.84
N-glycolylneuraminic acid 0.65
oleylamide -1.05
rac-1-stearoylglycerol 0.61
L-Arabitol 1.92
ribitol 0.65
ribose 1.03
sedoheptulose 7-phosphate 1.39
S-methyl-L-cysteine 0.67
sn-glycerol-3-phosphate 0.59
thymidine 0.96
thymine 0.87
UDP-D-galactose 0.82
xanthosine monophosphate 0.75
20-L scale Day 10
Differentially Expressed Proteins Differentially Regulated Metabolites
RefSeq  Gene names  Average Day 10 log2 ratios (high copper/low copper)  Metabolite Names  Average Day 10 log2 ratios (high copper/low copper)
NM_052976.3 Ophn1 0.38 1,2-dipalmitoylglycerol 0.69
NM_008943.2 Psen1 0.52 2-hydroxy-3-methylvalerate 0.79
NM_011625.1 Ppp1r13b 0.41 γ-glutamylglycine 2.67
NM_145541.4 Rap1a 0.37 N-acetylisoleucine 0.65
NM_008234.3 Hells 0.44 γ-glutamyltryptophan 0.70
NM_007483.2 Rhob -0.38 (S)-3-methyl-2-oxopentanoic acid 1.21
NM_001161816.1 Gm15455 -0.37 1-oleoyl lysophosphatidylcholine 0.65
NM_008471.2 Krt19 -0.49 1-stearoyl-2-hydroxy-sn-glycero-3-phosphoethanolamine 0.64
NM_025542.2 2410001C21Rik -0.56 2-hydroxyisovaleric acid 0.85
NM_023172.3 Ndufb9 -0.45 2-tyrosine 1.05
NM_019760.3 Serinc1 -0.46 4'-phosphopantetheine -0.70
NM_008788.2 Pcolce -0.38 5-aminovaleric acid 0.80
NM_010233.1 Fn1 -0.50 acetylcholine 1.18
NM_025461.5 Cox16 0.53 acetylleucine 0.85
NM_144522.5 Tbc1d10b 0.32 adenine 1.10
NM_019921.2 Akap10 0.38 alanylglycine 1.85
NM_145596.3 Gatad2a 0.38 α-D-glucose 6-phosphate 1.73
NM_023128.4 Palm 0.31 α-hydroxyisocaproic acid 1.09
NM_175150.3 Txndc15 0.34 α-ketoisocaproic acid 1.20
NM_139146.2 Satb2 0.31 ascorbic acid 1.72
NM_028364.2 Puf60 0.36 cadaverine 1.63
NM_172468.2 Snx30 -0.32 CDP ethanolamine -0.62
NM_025698.1 Tmed7 -0.34 citric acid 0.68
NM_025666.2 Ubr7 -0.31 coenzyme A 0.63
NM_177878.2 Mblac1 -0.41 desmosterol 0.63
NM_177717.4 4732456N10Rik -0.30 D-glyceric acid 0.89
NM_007833.4 Dcn -0.30 D-sphingosine 1.90
NM_183270.2 Chchd8 -0.40 erythronic acid 0.61
NM_011580.3 Thbs1 -0.33 galactitol 0.95
NM_020579.2 B4galt3 -0.32 γ-glutamylalanine 1.22
NM_027772.2 Pdss2 -0.30 γ-glutamylcysteine 1.62
NM_027410.1 Tecpr1 -0.31 γ-glutamylglutamate 1.24
NM_020587.2 Sfrs4 -0.33 γ-glutamylmethionine 0.66
NM_028173.4 Tram1 -0.35 glucose-1-phosphate 0.77
glutathione 1.65
glycylleucine 1.20
inositol 1-phosphate 1.22
L-α-lysophosphatidylcholine, stearoyl 2.21
lanosterol 0.85
L-asparagine 0.71
lathosterol 0.85
L-cystine 0.94
L-glutamine 0.76
L-homoserine 1.34
L-lactic acid 0.66
L-lysine 0.69
L-ornithine 0.63
L-xylonate 0.71
N-acetyl-α-D-galactosamine 0.83
N-acetyl-D-glucosamine 6-phosphate 1.13
NAD+ 0.65
NADH 0.64
N-glycolylneuraminic acid 0.85
palmitoylethanolamide -0.99
phenylalanylglycine 1.10
ribitol 0.89
sedoheptulose 7-phosphate 1.25
S-methyl-L-cysteine 1.06
sorbitol 1.40
stearic acid amide 3.04
thymine 1.06
UDP-D-galactose 1.11
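
As a reading aid for Table 2-4: each entry is a log2 ratio of high-copper to low-copper abundance, so a value of x corresponds to a linear ratio of 2^x. The short Python sketch below makes that conversion explicit for three values taken from the 5-KL scale Day 7 rows. The function names are illustrative and are not part of the original analysis pipeline.

```python
# Illustrative only: convert log2(high copper / low copper) ratios from
# Table 2-4 into linear fold changes. Function names are hypothetical.

def fold_change(log2_ratio):
    """Return the linear high/low ratio for a log2 ratio."""
    return 2.0 ** log2_ratio


def signed_fold_change(log2_ratio):
    """Report decreases as negative fold changes (e.g. -2.0 means halved)."""
    fc = fold_change(log2_ratio)
    return fc if fc >= 1.0 else -1.0 / fc


# Three entries from the 5-KL scale Day 7 part of the table.
log2_ratios = {"spermine": 3.19, "2-oleoylglycerol": -3.07, "Romo1": 1.28}

for name, value in log2_ratios.items():
    print(f"{name}: ratio {fold_change(value):.2f}, "
          f"signed fold change {signed_fold_change(value):+.2f}")
```

Read this way, a log2 ratio near 3 or -3 corresponds to roughly an eight-fold increase or decrease under high copper, whereas the smallest listed protein changes (about 0.30) correspond to a change of roughly 23%.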
2.8 References
1. Walsh G (2014) Biopharmaceutical benchmarks 2014. Nat. Biotechnol. 32(10):992-1000.
2. Jayapal KR, Wlaschin KF, Hu WS, & Yap MGS (2007) Recombinant protein therapeutics
from CHO cells - 20 years and counting. Chem. Eng. Prog. 103(10):40-47.
3. Kim JY, Kim YG, & Lee GM (2012) CHO cells in biotechnology for production of
recombinant proteins: current state and further potential. Appl. Microbiol. Biotechnol.
93(3):917-930.
4. Lewis AM, Abu-Absi NR, Borys MC, & Li ZJ (2016) The use of 'Omics technology to
rationally improve industrial mammalian cell line performance. Biotechnol. Bioeng.
113(1):26-38.
5. Farrell A, McLoughlin N, Milne JJ, Marison IW, & Bones J (2014) Application of multi-
omics techniques for bioprocess design and optimization in Chinese hamster ovary cells.
J. Proteome Res. 13(7):3144-3159.
6. Kildegaard HF, Baycin-Hizal D, Lewis NE, & Betenbaugh MJ (2013) The emerging CHO
systems biology era: harnessing the 'omics revolution for biotechnology. Curr. Opin.
Biotechnol. 24(6):1102-1107.
7. Birch JR & Racher AJ (2006) Antibody production. Adv. Drug Delivery Rev.
58(5-6):671-685.
8. Yang Z, et al. (2015) Engineered CHO cells for production of diverse, homogeneous
glycoproteins. Nat. Biotechnol. 33(8):842-844.
9. Mallick P & Kuster B (2010) Proteomics: a pragmatic perspective. Nat. Biotechnol.
28(7):695-709.
10. Tai M, Ly A, Leung I, & Nayar G (2015) Efficient high-throughput biological process
characterization: Definitive screening design with the Ambr250 bioreactor system.
Biotechnol. Prog. 31(5):1388-1395.
11. Xing ZZ, Kenty BN, Li ZJ, & Lee SS (2009) Scale-up analysis for a CHO cell culture
process in large-scale bioreactors. Biotechnol. Bioeng. 103(4):733-746.
12. Aranibar N, et al. (2011) NMR-based metabolomics of mammalian cell and tissue cultures.
J. Biomol. NMR 49(3-4):195-206.
13. Kim BE, Nevitt T, & Thiele DJ (2008) Mechanisms for copper acquisition, distribution
and regulation. Nat. Chem. Biol. 4(3):176-185.
14. Grubman A & White AR (2014) Copper as a key regulator of cell signalling pathways.
Expert Rev. Mol. Med. 16:e11.
15. Qian YM, et al. (2011) Cell culture and gene transcription effects of copper sulfate on
Chinese Hamster Ovary cells. Biotechnol. Prog. 27(4):1190-1194.
16. Luo J, et al. (2012) Comparative metabolite analysis to understand lactate metabolism shift
in Chinese hamster ovary cell culture process. Biotechnol. Bioeng. 109(1):146-156.
17. Yuk IH, et al. (2014) Effects of copper on CHO cells: Insights from gene expression
analyses. Biotechnol. Prog. 30(2):429-442.
18. Kang S, et al. (2014) Proteomics analysis of altered cellular metabolism induced by
insufficient copper level. J. Biotechnol. 189:15-26.
19. Yuk IH, et al. (2015) Effects of Copper on CHO Cells: Cellular Requirements and Product
Quality Considerations. Biotechnol. Prog. 31(1):226-238.
20. Nargund S, Qiu JS, & Goudar CT (2015) Elucidating the role of copper in CHO cell energy
metabolism using C-13 metabolic flux analysis. Biotechnol. Prog. 31(5):1179-1186.
21. Lawton KA, et al. (2008) Analysis of the adult human plasma metabolome.
Pharmacogenomics 9(4):383-397.
22. Schaub J, et al. (2010) CHO gene expression profiling in biopharmaceutical process
analysis and design. Biotechnol. Bioeng. 105(2):431-438.
23. Lewis NE, et al. (2013) Genomic landscapes of Chinese hamster ovary cell lines as
revealed by the Cricetulus griseus draft genome. Nat. Biotechnol. 31(8):759-765.
24. Wang Q, et al. (2013) Ecological patterns of nifH genes in four terrestrial climatic zones
explored with targeted metagenomics using FrameBot, a new informatics tool. mBio
4(5):e00592-13.
25. Morgulis A, Gertz EM, Schaffer AA, & Agarwala R (2006) WindowMasker: window-
based masker for sequenced genomes. Bioinformatics 22(2):134-141.
26. Slater GS & Birney E (2005) Automated generation of heuristics for biological sequence
comparison. BMC Bioinf. 6:31.
27. Dorfer V, et al. (2014) MS Amanda, a universal identification algorithm optimized for high
accuracy tandem mass spectra. J. Proteome Res. 13(8):3679-3684.
28. Taverner T, et al. (2012) DanteR: an extensible R-based tool for quantitative analysis of
-omics data. Bioinformatics 28(18):2404-2406.
29. Onsongo G, et al. (2010) LTQ-iQuant: A freely available software pipeline for automated
and accurate protein quantification of isobaric tagged peptide data from LTQ instruments.
Proteomics 10(19):3533-3538.
30. Li F, Vijayasankaran N, Shen A, Kiss R, & Amanullah A (2010) Cell culture processes for
monoclonal antibody production. mAbs 2(5):466-479.
31. Kramer A, Green J, Pollard J, & Tugendreich S (2014) Causal analysis approaches in
Ingenuity Pathway Analysis. Bioinformatics 30(4):523-530.
32. Finkel T & Holbrook NJ (2000) Oxidants, oxidative stress and the biology of ageing.
Nature 408(6809):239-247.
33. Valko M, et al. (2007) Free radicals and antioxidants in normal physiological functions
and human disease. Int. J. Biochem. Cell Biol. 39(1):44-84.
34. D'Autreaux B & Toledano MB (2007) ROS as signalling molecules: mechanisms that
generate specificity in ROS homeostasis. Nat. Rev. Mol. Cell Biol. 8(10):813-824.
35. Lu XL, et al. (2011) Cholesterol induces pancreatic beta cell apoptosis through oxidative
stress pathway. Cell Stress Chaperones 16(5):539-548.
36. Subramanian S, et al. (2011) Dietary cholesterol exacerbates hepatic steatosis and
inflammation in obese LDL receptor-deficient mice. J. Lipid Res. 52(9):1626-1635.
37. Hatanaka E, et al. (2013) Oleic, linoleic and linolenic acids increase ROS production by
fibroblasts via NADPH oxidase activation. PLoS One 8(4):e58626.
38. Shirakawa J, et al. (2011) Protective effects of dipeptidyl peptidase-4 (DPP-4) inhibitor
against increased beta cell apoptosis induced by dietary sucrose and linoleic acid in mice
with diabetes. J. Biol. Chem. 286(29):25467-25476.
39. Wrede CE, Dickson LM, Lingohr MK, Briaud I, & Rhodes CJ (2002) Protein kinase B/Akt
prevents fatty acid-induced apoptosis in pancreatic beta-cells (INS-1). J. Biol. Chem.
277(51):49676-49684.
40. Shirakawa J, et al. (2011) Protective effects of dipeptidyl peptidase-4 (DPP-4) inhibitor
against increased beta-cell apoptosis induced by dietary sucrose and linoleic acid in mice
with diabetes. J. Biol. Chem. 286(29):25467-25476.
41. Corte CLD, Bastos LL, Dobrachinski F, Rocha JBT, & Soares FAA (2012) The
combination of organoselenium compounds and guanosine prevents glutamate-induced
oxidative stress in different regions of rat brains. Brain Res. 1430:101-111.
42. Schwartz LB, Carcangiu ML, Bradham L, & Schwartz PE (1991) Rapidly progressive
squamous-cell carcinoma of the cervix coexisting with human-immunodeficiency-virus
infection-clinical opinion. Gynecol. Oncol. 41(3):255-258.
43. Cullinan SB & Diehl JA (2004) PERK-dependent activation of Nrf2 contributes to redox
homeostasis and cell survival following endoplasmic reticulum stress. J. Biol. Chem.
279(19):20108-20117.
44. Beal MF (1995) Aging, energy, and oxidative stress in neurodegenerative diseases. Ann.
Neurol. 38(3):357-366.
45. Elmore S (2007) Apoptosis: A review of programmed cell death. Toxicol. Pathol.
35(4):495-516.
46. Gross A, McDonnell JM, & Korsmeyer SJ (1999) BCL-2 family members and the
mitochondria in apoptosis. Genes Dev. 13(15):1899-1911.
47. Fluharty AL, Stevens RL, Miller RT, Shapiro SS, & Kihara H (1976) Ascorbic acid 2-
sulfate sulfhohydrolase activity of human arylsulfatase-A. Biochim. Biophys. Acta
429(2):508-516.
48. Clanton TL (2007) Hypoxia-induced reactive oxygen species formation in skeletal muscle.
J. Appl. Physiol. 102(6):2379-2388.
49. Lavie L & Lavie P (2009) Molecular mechanisms of cardiovascular disease in OSAHS:
the oxidative stress link. Eur. Resp. J. 33(6):1467-1484.
50. Prabhakar NR, Kumar GK, Nanduri J, & Semenza GL (2007) ROS signaling in systemic
and cellular responses to chronic intermittent hypoxia. Antioxid. Redox Signal. 9(9):1397-
1403.
51. Makarenko VV, et al. (2014) Intermittent hypoxia-induced endothelial barrier dysfunction
requires ROS-dependent MAP kinase activation. Am. J. Physiol.-Cell Physiol.
306(8):C745-C752.
52. Majmundar AJ, Wong WHJ, & Simon MC (2010) Hypoxia-inducible factors and the
response to hypoxic stress. Mol. Cell. 40(2):294-309.
53. Lokmic Z, Musyoka J, Hewitson TD, & Darby IA (2012) Hypoxia and hypoxia signaling
in tissue repair and fibrosis. International Review of Cell and Molecular Biology, ed Jeon KW
(Elsevier Academic Press Inc, San Diego), Vol 296, pp 139-185.
54. Qian Y, et al. (2014) Hypoxia influences protein transport and epigenetic repression of
CHO cell cultures in shake flasks. Biotechnol. J. 9(11):1413-1424.
55. Usatyuk PV & Natarajan V (2005) Regulation of reactive oxygen species-induced
endothelial cell-cell and cell-matrix contacts by focal adhesion kinase and adherens
junction proteins. Am. J. Physiol.-Lung Cell. Mol. Physiol. 289(6):L999-L1010.
56. Hoogeboom D & Burgering BMT (2009) Should I stay or should I go: beta-catenin decides
under stress. Biochim. Biophys. Acta-Rev. Cancer 1796(2):63-74.
57. Prieve MG & Moon RT (2003) Stromelysin-1 and mesothelin are differentially regulated
by Wnt-5a and Wnt-1 in C57mg mouse mammary epithelial cells. BMC Dev. Biol. 3:2.
58. Wielenga VJM, et al. (1999) Expression of CD44 in Apc and Tcf mutant mice implies
regulation by the WNT pathway. Am. J. Pathol. 154(2):515-523.
59. Davidson G & Niehrs C (2010) Emerging links between CDK cell cycle regulators and
Wnt signaling. Trends Cell Biol. 20(8):453-460.
60. Niehrs C & Acebron SP (2012) Mitotic and mitogenic Wnt signalling. EMBO J.
31(12):2705-2713.
61. Jain E & Kumar A (2008) Upstream processes in antibody production: Evaluation of
critical parameters. Biotechnol. Adv. 26(1):46-72.
62. Schnepp RW, et al. (2004) Menin induces apoptosis in murine embryonic fibroblasts. J.
Biol. Chem. 279(11):10685-10691.
63. Schackmann RCJ, et al. (2013) Loss of p120-catenin induces metastatic progression of
breast cancer by inducing anoikis resistance and augmenting growth factor receptor
signaling. Cancer Res. 73(15):4937-4949.
64. Goodwin M & Yap AS (2004) Classical cadherin adhesion molecules: coordinating cell
adhesion, signaling and the cytoskeleton. J. Mol. Histol. 35(8-9):839-844.
65. Cash TP, Gruber JJ, Hartman TR, Henske EP, & Simon MC (2011) Loss of the Birt-Hogg-
Dube tumor suppressor results in apoptotic resistance due to aberrant TGF beta-mediated
transcription. Oncogene 30(22):2534-2546.
66. Zheng YJ, et al. (2009) Angiomotin-Like Protein 1 Controls Endothelial Polarity and
Junction Stability During Sprouting Angiogenesis. Circ. Res. 105(3):260-270.
67. Hitomi JI, et al. (2008) Identification of a molecular signaling network that regulates a
cellular necrotic cell death pathway. Cell 135(7):1311-1323.
68. Leh H, et al. (1996) Cloning and expression of a novel type (III) of human gamma-
glutamyltransferase truncated mRNA. FEBS Lett. 394(3):258-262.
69. Wu GY, Fang YZ, Yang S, Lupton JR, & Turner ND (2004) Glutathione metabolism and
its implications for health. J. Nutr. 134(3):489-492.
70. Napoli C, et al. (2000) Mildly oxidized low density lipoprotein activates multiple apoptotic
signaling pathways in human coronary cells. FASEB J. 14(13):1996-2007.
71. Soltani N, et al. (2011) GABA exerts protective and regenerative effects on islet beta cells
and reverses diabetes. Proc. Natl. Acad. Sci. U. S. A. 108(28):11692-11697.
72. Rhee HJ, Kim EJ, & Lee JK (2007) Physiological polyamines: simple primordial stress
molecules. J. Cell Mol. Med. 11(4):685-703.
73. Zhou XM, Burg MB, & Ferraris JD (2012) Water restriction increases renal inner
medullary manganese superoxide dismutase (MnSOD). Am. J. Physiol. Renal Physiol.
303(5):F674-F680.
74. Abreu IA & Cabelli DE (2010) Superoxide dismutases-a review of the metal-associated
mechanistic variations. Biochim. Biophys. Acta, Proteins Proteomics 1804(2):263-274.
75. Perez-Matute P, Zulet MA, & Martinez JA (2009) Reactive species and diabetes:
counteracting oxidative stress to improve health. Curr. Opin. Pharmacol. 9(6):771-779.
76. Raman M, Chen W, & Cobb MH (2007) Differential regulation and properties of MAPKs.
Oncogene 26(22):3100-3112.
77. Son Y, et al. (2011) Mitogen-activated protein kinases and reactive oxygen species: How
can ROS activate MAPK pathways? J. Signal Transduction 2011:792639.
78. Velpula KK, et al. (2012) Glioma stem cell invasion through regulation of the
interconnected ERK, integrin alpha 6 and N-cadherin signaling pathway. Cell. Signal.
24(11):2076-2084.
79. Weston CR & Davis RJ (2007) The JNK signal transduction pathway. Curr. Opin. Cell
Biol. 19(2):142-149.
80. Kim BJ, Ryu SW, & Song BJ (2006) JNK- and p38 kinase-mediated phosphorylation of
Bax leads to its activation and mitochondrial translocation and to apoptosis of human
hepatoma HepG2 cells. J. Biol. Chem. 281(30):21256-21265.
81. Oh SKW, Vig P, Chua F, Teo WK, & Yap MGS (1993) Substantial overproduction of
antibodies by applying osmotic-pressure and sodium butyrate. Biotechnol. Bioeng.
42(5):601-610.
82. Ortmann B, Druker J, & Rocha S (2014) Cell cycle progression in response to oxygen
levels. Cell. Mol. Life Sci. 71(18):3569-3582.
Chapter 3: Identification and Quantitation of Host Cell
Proteins in Therapeutic Product
Two posters based on this chapter were presented at the 63rd conference of the American Society
for Mass Spectrometry (ASMS) in June 2015 (abstract ID: 620) and the 64th ASMS conference in
June 2016 (abstract ID: 283778), respectively. A manuscript based on this chapter is in preparation.
Yuanwei Gao1, Simion Kreimer1, Somak Ray1, Alexander R. Ivanov1, Mi Jin2, Zhijun Tan2,
Nesredin Mussa2, Li Tao2, Zhengjian Li2, Barry L. Karger1
1Barnett Institute and Department of Chemistry and Chemical Biology, Northeastern University,
Boston, MA, 02115
2Biologics Development, Global Manufacturing and Supply, Bristol-Myers Squibb, 38 Jackson
Road, Devens, MA 01434
I thank Simion Kreimer for strong collaboration and script construction, Somak Ray for script
writing, Dr. Alexander Ivanov for discussion, and Dr. Barry Karger for conceptual design and idea
contribution. I also want to thank the scientists at Bristol-Myers Squibb for their collaboration and
sample donation, especially, Dr. Mi Jin for initiating the project and Dr. Nesredin Mussa for
discussions.
3.1 Preface and Abstract
Host cell proteins (HCPs) are a major class of process-related impurities in
biopharmaceutical products. HCP analysis is critical to ensure drug quality, and HCP clearance is
an important indicator of bioprocess robustness. HCP detection requires the analysis of multiple
species over a wide dynamic concentration range, against a high background of therapeutic protein
product, at high throughput and reasonable cost. The conventional method, ELISA, cannot meet
these requirements in all circumstances. Liquid chromatography-mass spectrometry (LC-MS)-based
approaches have been shown to be powerful for HCP analysis and are emerging as the most
promising complement to ELISA.
In this work, a therapeutic monoclonal antibody drug sample was provided by Bristol-Myers
Squibb (Devens, MA) at three purification stages: Protein A chromatography (PA), cation
exchange chromatography (CEX), and ultrafiltration/diafiltration (UF/DF), processes widely
employed in the downstream purification of monoclonal antibody drugs. The goal of this work is
to develop a method for HCP identification and quantitation that can not only detect and
quantify HCPs at the single-digit ppm level in the drug product but can also be used at any stage
of purification to support downstream process design.
In the present study, preliminary information on the HCP population and distribution in
each purification-stage sample was obtained with two-dimensional liquid chromatography-mass
spectrometry (2D-LC-MS) in the data-dependent acquisition (DDA) mode. Several HCPs
identified in the post-UF/DF sample were quantified using 1D-LC-MS with parallel reaction
monitoring (PRM) and isotopically labeled peptides as internal standards, demonstrating that
1D-LC-MS-PRM is capable of detecting and quantifying HCPs at the low ppm level. Important
properties of the sample for HCP analysis were determined, including sample complexity, dynamic
range, and potential difficulties, as well as the limitations of the LC-MS-DDA method employed.
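
To make the ppm scale concrete, the following Python sketch shows one common way a PRM measurement with an isotopically labeled internal standard can be converted to an HCP level in ppm (ng of HCP per mg of mAb). It is a minimal illustration under stated assumptions, not the calculation used in this work: the function name, the parameter names, and all numerical values are hypothetical, and the sketch assumes complete digestion, one peptide copy per protein molecule, and equal MS response for the light and heavy peptide forms.

```python
def hcp_ppm(light_area, heavy_area, heavy_spike_fmol, hcp_mw_da, mab_load_ug):
    """Estimate HCP abundance in ppm (ng HCP per mg mAb) from one PRM peptide.

    Hypothetical helper. Assumes complete digestion, one peptide copy per
    protein, and equal response of the light (endogenous) and heavy
    (isotopically labeled) peptides.
    """
    if heavy_area <= 0 or mab_load_ug <= 0:
        raise ValueError("heavy_area and mab_load_ug must be positive")
    # Light/heavy peak area ratio scales the known amount of spiked standard.
    hcp_fmol = (light_area / heavy_area) * heavy_spike_fmol
    # 1 fmol of a protein of molar mass M g/mol weighs M * 1e-15 g = M / 1e6 ng.
    hcp_ng = hcp_fmol * hcp_mw_da / 1e6
    mab_mg = mab_load_ug / 1000.0
    return hcp_ng / mab_mg


# Hypothetical numbers, not taken from the study.
example = hcp_ppm(light_area=5.0e5, heavy_area=1.0e6,
                  heavy_spike_fmol=10.0, hcp_mw_da=50_000.0, mab_load_ug=100.0)
print(f"{example:.2f} ppm")
```

With these illustrative inputs, 5 fmol of a 50 kDa HCP in 100 µg of mAb corresponds to a few ppm, which shows why quantitation at this level demands a dynamic range of roughly six orders of magnitude relative to the product.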
Based on these results, in collaboration with Simion Kreimer, a Ph.D. candidate in our lab,
a novel data-independent acquisition (DIA)-to-PRM workflow with high sensitivity and selectivity
was designed for HCP identification and quantitation. The method was demonstrated to detect
HCPs at low ppm levels with reasonably rapid throughput. A detailed discussion of the
DIA-to-PRM workflow can be found in Simion Kreimer’s thesis. In the current chapter, a spectral
assay library for targeted DIA data analysis was generated from 2D-LC-MS/MS analysis of the mAb
sample taken at the Protein A stage of purification, and the overall workflow is summarized.
3.2 Introduction
Biopharmaceuticals are generally synthesized in non-human host cells and require
purification from other components derived from the expression system. Host cell proteins (HCPs)
are a significant class of process-related impurities that are inevitably co-purified with the
biopharmaceutical product despite multiple steps of downstream purification. Additional
background on downstream processing and current HCP detection methods can be found in Chapter
1 (pages 36-40 and 44-47). This section contains extended details of the HCP analytical
challenges and current advances of mass spectrometry-based methods for HCP analysis. We focus
on mAb and related proteins expressed in the CHO expression system as this is the focus of our
research.
For biopharmaceutical production, HCP detection and monitoring in downstream
purification processes are of great importance for two reasons. First, the presence of HCPs, i.e.
impurities, in the final drug product is a critical product quality concern. Besides safety concerns
of potential immunological response (1), some residual HCP species can be proteases (2, 3), which
might generate degraded product that could be inactive or even harmful, or influence drug storage
stability (2-4). Second, efficient and consistent HCP clearance is a benchmark of
manufacturing capability and robustness, and is also part of process validation (5). The risk
assessment of the HCP present in the final product is important. It is necessary for the
biopharmaceutical industry to take action to mitigate the actual or potential safety issues by
optimized downstream purification. Moreover, from a regulatory perspective, the function of each
purification process is also expected to be well described and understood, requiring information
on which impurities remain or are removed during certain purification steps (4). To obtain such
understanding of the manufacturing process, HCP detection and monitoring is critical, and the
ideal method should apply at any stage of the manufacturing process.
Multi-analyte enzyme-linked immunosorbent assay (ELISA), which has high throughput,
high sensitivity and selectivity, is the current “gold standard” for HCP detection (1, 5, 6). It can
detect multiple analytes and provide total HCP quantitation (1 ppm-100 ppm) (7), but
without individual HCP information. Two dimensional gel electrophoresis (2D-DE) or 2D-DE in
combination with western blotting is usually employed to offer the complementary information
such as HCP distribution and, particularly, individual HCP properties including molecular weight
and isoelectric point (pI). These tools, especially ELISA, have been used to provide valuable
residual HCP information in drug product for decades.
However, these conventional methods cannot fit all circumstances for HCP detection and
monitoring. 2D-DE has low sensitivity, and some HCPs can be masked by the overloaded drug
product. The HCP detection and quantitation of immunodetection-based methods such as ELISA
and western blotting rely on the quantity and affinity of the pool of anti-HCP antibodies.
There are several disadvantages hampering the HCP detection by ELISA. First, not all HCP
species can be detected with high sensitivity. HCPs that are non- or weakly immunoreactive in the
animal (e.g. rabbit) used to generate the antibodies can be underestimated because the anti-HCP
antibodies against such species are of low abundance or low affinity. Considering the fact that the
immune response can differ between humans and animals, this underestimation of certain species
could lead to a potential safety issue. Second, some HCP species, which are immunoreactive and
can be recognized by ELISA, may not be detected with sufficient accuracy and hence the overall
HCP quantitation by a given ELISA assay could be underestimating the HCP level. Also, dilution
dependent non-linearity of HCP ELISA is often observed especially with samples at a late
purification stage (5). In this case, one or several HCPs can saturate the corresponding polyclonal
antibodies within the whole anti-HCP antibody pool, leading to higher observed HCP
concentration in the original sample with a higher sample dilution factor (5). It has been reported
that phospholipase B-like 2 is one such HCP that causes ELISA non-linearity (8). In principle,
if the sample is diluted sufficiently for an ELISA test, these HCPs can reach a sufficiently low
level that would not saturate the antibodies. However, usually the detection limit of the ELISA
assay is reached first with the sample dilution before the plateau of non-linearity is reached. In this
case, a process specific or analyte-specific ELISA assay needs to be developed to eliminate such
an effect.
Third, the complex anti-HCP antibody pool of the multi-analyte ELISA could cross-react
with the therapeutic drug product (1). Such cross-reactivity would impact the HCP ELISA
quantitation especially at late purification stages. Fourth, ELISA assays are not readily
interchangeable. A given HCP ELISA responds only to the product synthesized by the cell line used to
develop the assay and may give unreliable HCP quantitation for another bioprocess. For example, during
clinical development, the HCP level of somatotropin was detected as 20 ppm with a commercially
available general ELISA kit, but tested as actually 1400 ppm HCP with a process-specific ELISA
assay (9). ELISA could thus fail to demonstrate the HCP changes resulting from bioprocess
development. However, it is time consuming (12-18 months) to develop a validated ELISA with
high sensitivity and selectivity (1). Consequently, the general commercial ELISA is often used for
drug candidates at early stage of clinical experiments, and a process-specific ELISA is only
developed for the most promising molecules and bioprocesses (9).
It is important to note that ELISA cannot provide individual HCP information. According
to previous HCP studies for monoclonal antibodies and Fc-fusion proteins, although certain HCPs
such as clusterin and heat shock protein have been shown to be associated with many of the drug
products (10-12), there are some HCPs which are drug product-specific (11, 12). It has been
reported that the difference of only two residues near the complementarity determining regions
(CDRs) of the mAb yielded significant changes in the HCP profile (11). As a result, it is difficult
to predict HCP distribution for a new drug product, even for closely related molecules. This fact
demonstrates the necessity to develop a process-related ELISA for each new drug product. Thus,
although ELISA has shown efficiency of HCP detection, orthogonal methods,
particularly ones that can rapidly provide the HCP distribution independent of immunoreactivity, are
desired.
Mass spectrometry (MS)-based technology has emerged as the most promising orthogonal
method to ELISA for HCP detection (4, 5, 9, 11). MS-based methods are able to identify and
quantify a large number of protein species with relatively high throughput. The proteomic
approach of the combination of in-gel protein digestion and mass spectrometry with data
dependent acquisition has been applied to HCP identification (13-15). However, this method still
cannot reach the required high dynamic range (>10^5), hampering the detection of HCPs at the 10
ppm level or lower in the background of the bulk biopharmaceutical. The top-down approach,
surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF), has
been reported as a screening tool for HCP detection (6, 13, 16, 17). The advantages of SELDI-
TOF include simple and time efficient sample preparation and quantitative estimation of individual
HCPs. The method can be used to track individual HCPs for optimizing a given purification step
and/or during downstream purification in a high throughput manner (17, 18). However, the
sensitivity of the method decreases rapidly for proteins larger than 30 kDa, and it is difficult to
obtain identification without assistance of other approaches (17).
Liquid chromatography coupled with MS (LC/MS) has also been reported for HCP
detection and is becoming the most promising approach (19-22). HPLC coupled with matrix-
assisted laser desorption/ionization time-of-flight (MALDI-TOF/TOF) off-line was reported for
HCP analysis for a null cell line supernatant (23) and HCP recovery procedure for sample
preparation optimization (24). Although this method was not applied to HCP detection with the
drug product present, these studies provide helpful insights into sample preparation for HCP
analysis. Doneanu et al. reported the use of 2D-LC/MS in a data independent MSE mode with a Q-TOF-
MS for HCP identification and quantitative estimation, with multiple reaction monitoring
(MRM) used for accurate quantitation with isotopically labeled peptides (19). With ion
mobility coupled to the Q-TOF to improve species separation, this 2D-LC/MSE technique has been
reported to identify and quantify HCPs at the 1 ppm level (25). 2D-LC/MSE provides both
identification and quantitative information of HCPs with high sensitivity and selectivity. However,
this 2D-LC separation approach is time consuming, requiring more than 12 hours per run, which
may not be practical for a large number of samples and/or rapid screening purposes.
Nevertheless, qualitative and quantitative protein analysis based on LC-MS remains
an effective platform for HCP analysis. In the present study, two-dimensional high pH/low pH
reversed phase (RP/RP) liquid chromatography coupled to tandem mass spectrometry (2D-LC-
MS/MS) with data dependent acquisition (DDA) was initially employed to investigate the residual
HCPs in the mAb sample after three purification steps, Protein A chromatography (PA), cation
exchange chromatography (CEX), and ultrafiltration/diafiltration (UF/DF) (buffer exchange),
building a basic understanding of the HCP distribution in terms of downstream
impurity removal and providing insight into the development of a new HCP detection strategy. Four
HCPs, clusterin, putative phospholipase B, 78 kDa glucose-regulated protein precursor, and
protein disulfide-isomerase, were quantified in the UF/DF sample using 1D-LC-MS with parallel
reaction monitoring (PRM) using isotopically labeled peptides as internal standards,
demonstrating that 1D-LC-MS was able to detect and quantify HCPs at low ppm level with PRM.
This study not only provides a preliminary understanding of the HCP sample analysis, but also
supports downstream processing by comprehensively tracking HCP profiles across different
purification steps.
Based on the initial information obtained from the 2D-LC-MS-DDA analysis, in
collaboration with Simion Kreimer, Ph.D. candidate in our lab, a novel DIA-to-PRM
workflow for HCP analysis was developed, with high dynamic range for the relatively low
complexity sample. This workflow was demonstrated to detect HCPs at low ppm levels in a
purified therapeutic mAb expressed by a CHO cell system, providing comprehensive HCP profiles
of a given drug product. A high resolution Orbitrap mass spectrometer was employed in this
method. As described in this chapter, a spectral assay library was constructed with the 2D-LC-
MS/MS analysis of the mAb after early stage PA purification. A targeted spectral library search
based on OpenSWATH and untargeted database search relying on DIA-Umpire were combined to
promote DIA data interpretation followed by PRM verification. This workflow was developed by
Simion Kreimer and is detailed in his thesis. This methodology can provide identification of HCPs
at the sub-ppm level with a high sensitivity and specificity, and estimation of individual HCPs at
the low ppm level. This novel workflow can be used as a general method for HCP detection and
monitoring in the biopharmaceutical industry.
3.3 Materials and Methods
3.3.1 Chemicals and reagents
The therapeutic product, a monoclonal antibody (mAb), was provided by Bristol-Myers
Squibb (Devens, MA). This product was purified by Protein A (PA) chromatography, cation
exchange chromatography (CEX), and ultrafiltration/diafiltration (UF/DF), and the samples were
obtained after each purification step. Triethylammonium bicarbonate buffer (TEAB) (1.0 M, pH
8.0), dithiothreitol (DTT), iodoacetamide (IAM), urea, MS RT calibration mix, LC-MS grade
ammonium hydroxide solution (≥ 25% in H2O), and LC-MS grade formic acid were obtained from
Sigma-Aldrich (St. Louis, MO). LC-MS grade water, LC-MS grade acetonitrile, Pierce™ peptide
retention time calibration mixture, and the bicinchoninic acid (BCA) protein assay kit were from
Thermo Fisher Scientific (Rockford, IL). Sequencing-grade modified trypsin was purchased from
Promega (Madison, WI). The mass spectrometry grade lysyl endopeptidase (Lys-C) was purchased
from Wako (Richmond, VA). Stable isotopically labeled and non-labeled peptide standards from
the SpikeTides tumor associated antigens (TAA) set, as well as custom isotopically labeled peptide
standards of target HCPs for quantitation were obtained from JPT Peptide Technologies GmbH
(Berlin, Germany).
3.3.2 Sample preparation
The protein concentrations of the mAb samples were determined by the BCA protein assay.
Approximately 300 μg protein was denatured by 10 M urea in 100 mM TEAB (pH 8.0). The
sample was reduced with 10 mM DTT and 10 M urea in 100 mM TEAB (pH 8.0) at 37 °C for 1
hour and then alkylated with 10 mM IAM and 10 M urea in 100 mM TEAB (pH 8.0) in the dark
at room temperature for 45 minutes. Cold acetone pre-chilled at -20 °C was added at six times
the solution volume, and the mixture was incubated at -20 °C overnight to precipitate the protein.
Then the mixture was centrifuged at 14,000 g for 15 minutes, and the supernatant was discarded.
The digestion buffer was 25 mM TEAB (pH 8.0) in 90% water and 10% acetonitrile. A
volume of 100 μL digestion buffer was added to each sample to redissolve the precipitated proteins.
Lys-C and trypsin stock solutions were prepared individually with the digestion buffer. Lys-C
digestion was performed at 37 °C for 6 hours with an enzyme to protein ratio of 1:100
(w/w). Next, trypsin was added to the mixture with an enzyme to protein ratio of 1:50 (w/w).
The digestion was conducted at 40 °C overnight for about 18 hours. Then, the digestion mixture
was dried by speed vacuum.
For 2D-LC-MS/MS analysis for HCP identification and quantitative estimation, two
replicates were performed for the Protein A sample with separate digestions, and three replicates
for CEX samples. For HCP screening in UF/DF samples with 2D-LC-MS/MS, two technical
replicates were tested with one digestion.
3.3.3 LC-MS/MS
The resultant digest mixture was separated and analyzed by 2D high pH/low pH reversed
phase (RP/RP) liquid chromatography coupled with a high resolution/mass accuracy mass
spectrometer. Two different sets of experiments were performed: one was with the nanoflow-LC
for the second dimension separation, and the other was with the microflow-LC for the second
dimension separation.
For the nanoflow-LC as the second dimension, the first-dimension of the separation was
performed off-line. The platform consisted of an Agilent 1200 series (Santa Clara, CA) LC system
with diode array detector, a 300Extend_C18 column (3.5 µm beads, 2.1x150 mm), and a Gilson
FC 203B fraction collector (Gilson Inc., Middleton, WI). Mobile phase A was 20 mM ammonium
formate in water (pH 10), and mobile phase B was 20 mM ammonium formate (pH 10) in 90%
acetonitrile/10% water. The lyophilized digested sample was solubilized with 40 µL of mobile
phase A with vortexing and 10 seconds of sonication to maximize the recovery. After injection,
the column was flushed with mobile phase A at 0.2 mL/min for 10 minutes for desalting. A
gradient was then run at a flow rate of 200 µL/min (from 2% B to 100% B in 44 minutes, 100% B
down to 2% B and 2% B for 9 minutes). The fractions were collected in 2-minute intervals from
1 to 55 minutes across the gradient, for a total of 27 fractions. For the PA and CEX samples, based
on the UV absorption profile at 214 nm, several fractions were pooled to equalize protein levels
for a final fraction number of 20. Each fraction was lyophilized to dryness and stored at -80 °C.
For the UF/DF sample, the fractions were pooled to five. The resultant five fractions were also
dried by speed vacuum and stored at -80 °C. For the second dimension LC, each fraction was
reconstituted with 0.1% formic acid in water. For spectral library generation, the retention time
markers were added into each fraction before LC-MS analysis. Samples were analyzed on an
Ultimate 3000 chromatography system coupled to the Q Exactive mass spectrometer (Thermo
Fisher Scientific). The column was a home-packed IntegraFrit column (New Objective, Woburn,
MA), 25 cm x 75 μm, with 200 Å Magic C18 AQ particles (3 μm diameter) (Michrom Bioresources,
Auburn, CA). Mobile phase A was 0.1% formic acid in water, and mobile phase B was 0.1%
formic acid in acetonitrile. The flow rate was 200 nL/min, and the separation gradient was 2% B
to 32 % in 120 minutes, 32% B to 90% B in 20 minutes, 90% B for 3 minutes.
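The fraction arithmetic of the off-line first dimension can be checked with a short sketch. The function name `fraction_windows` is hypothetical, and the sketch assumes that the fraction collector simply cuts fixed-width time windows between the start and end of collection; it is not a description of the collector's actual control software.

```python
import math

def fraction_windows(start_min, end_min, width_min):
    """Return (start, end) collection windows of fixed width, in minutes.

    Assumption: collection runs in back-to-back windows; the last window is
    truncated at end_min if the span is not a multiple of the width.
    """
    n = math.ceil((end_min - start_min) / width_min)
    windows = []
    for i in range(n):
        lo = start_min + i * width_min
        hi = min(lo + width_min, end_min)
        windows.append((lo, hi))
    return windows

# 2-minute fractions collected from 1 to 55 minutes across the gradient
nano_windows = fraction_windows(1, 55, 2)
print(len(nano_windows))
```

Under this assumption the 1 to 55 minute span yields 27 windows, consistent with the 27 fractions stated above, which were then pooled according to the UV profile.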
For the microflow-LC as the second dimension, the first-dimension of the separation was
also performed off-line with the same LC-UV system and mobile phases as described above. The
separation column, on the other hand, was an XBridge peptide BEH C18 column (3.5 µm beads,
300Å, 2.1x150 mm) (Waters, Milford, MA). After injection of 1.8 mg protein digest sample, the
desalting process was performed with mobile phase A at 0.20 mL/min for 60 minutes until the UV
absorption reached the baseline. A 60-minute gradient was then used for fractionation at a flow
rate of 200 μL/min (from 2% B to 42% B in 46 minutes, 42% B to 100% B in 2 minutes, 100% B
for 7 minutes, 100% B down to 2% B and 2% B for 5 minutes). A total of 24 fractions
were collected at 2.5-minute intervals from 1 to 60 minutes across the gradient, and the
fractions were then pooled to a final number of 10 fractions. These fractions were lyophilized to
dryness and stored at -80 °C. For the second dimension LC separation, the retention time marker
mix was added into each fraction before the LC-MS analysis. Samples were analyzed on an
Ultimate 3000 chromatography system coupled to the Q Exactive Plus mass spectrometer (Thermo
Fisher Scientific). To generate the desired flow rate, the flow selector of the nanoflow-LC
system was replaced by one capable of producing flow rates of 2.5-50 μL/min. The same
mobile phases were used as described above, and the column was an ACQUITY UPLC M-class
peptide CSH C18 column (1.7 µm beads, 130Å, 0.3 x 150 mm) (Waters, Milford, MA). The flow
rate was 10 μL/min, and a 2-hour gradient was used (3% B for 10 minutes, 3% to 6% B within 11
minutes, 6% to 32% B in 110 minutes, then to 95% B for 10 minutes, 95% B for 9 minutes, and
down to 3% B for 10 minutes).
For 1D LC-MS/MS, after the enzymatic digestion, 2 μL of 1% formic acid in water was
added to the resultant mixture to stop the digestion. Then the sample was lyophilized and stored
at -80 °C. LC-MS/MS analysis conditions were the same as for the second dimension LC separation
described above.
3.3.4 Mass spectrometry parameters
For the data dependent acquisition (DDA) mode, MS data were collected with a survey
single stage MS (MS1) scan followed by higher-energy collisional dissociation (HCD) MS/MS scans of the
top 12 most intense precursor ions. For the full MS scans, the resolution was 70,000 (m/z = 200)
with a scan range of m/z 375 to 1600. The automatic gain control (AGC) target value was set at
1 × 10^6 ions. The MS2 spectra were acquired with a resolution of 17,500 (m/z = 200). The isolation
window was m/z 2.0, and the AGC target value was 1 × 10^5. The normalized collision energy was
28.0. The maximum ion injection time was 100 milliseconds for both MS1 and MS2 scans. Target
ions that had been selected for MS/MS were dynamically excluded for 60 seconds. The intensity
threshold, which defines the minimum intensity required to initiate a data dependent scan, was
1.0 × 10^4 ion counts. For accurate mass measurement, the lock mass option was enabled using the
polydimethylcyclosiloxane ion at m/z 445.12002 as an internal calibrant.
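The precursor selection logic of the DDA method (top 12 precursors, intensity threshold, 60-second dynamic exclusion) can be illustrated with a minimal sketch. This is not the instrument firmware: the function `select_precursors` is hypothetical, and matching excluded ions by m/z rounded to two decimals is a simplifying assumption in place of the instrument's mass tolerance window.

```python
def select_precursors(peaks, scan_time_s, excluded, top_n=12,
                      min_intensity=1.0e4, exclusion_s=60.0):
    """Pick up to top_n precursors from one survey (MS1) scan.

    peaks: list of (mz, intensity) tuples from the survey scan.
    excluded: dict mapping rounded m/z -> time (s) at which the dynamic
              exclusion expires; updated in place.
    """
    candidates = []
    for mz, intensity in peaks:
        key = round(mz, 2)  # assumption: 0.01 m/z matching for exclusion
        if intensity < min_intensity:
            continue  # below the intensity threshold
        if excluded.get(key, -1.0) > scan_time_s:
            continue  # still dynamically excluded
        candidates.append((mz, intensity))
    # Most intense precursors first
    candidates.sort(key=lambda p: p[1], reverse=True)
    chosen = candidates[:top_n]
    for mz, _ in chosen:
        excluded[round(mz, 2)] = scan_time_s + exclusion_s
    return chosen

# Illustrative peaks only, not measured data
excluded = {}
scan = [(500.25, 5.0e5), (620.31, 8.0e3), (710.40, 2.0e4)]
first = select_precursors(scan, 0.0, excluded)
second = select_precursors(scan, 30.0, excluded)
third = select_precursors(scan, 61.0, excluded)
```

In this toy example the ion at m/z 620.31 is never selected because it is below the threshold, and the other two ions are skipped at 30 seconds and become eligible again after the 60-second exclusion has expired.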
PRM was set up with prior knowledge of the retention time of each target peptide. For the
LC-MS/MS using nanoflow-LC, PRM was used for peptide quantitation. To generate the target
peptide-specific spectral library, a peptide standard mixture without the mAb peptide background
was analyzed by DDA with the same method as the second dimension LC separation in the 2D-LC-
MS/MS. The spectral library for these target peptides was generated, and the retention time
information and the dominant charge states of each peptide standard were obtained manually. In
the PRM MS setting, an MS1 full scan was performed first, and then the MS2 scans were obtained
based on the schedule of retention time windows and m/z values in the inclusion list. Both
the MS1 and MS2 scans used a resolution of 70,000 (m/z = 200), an AGC target of 3 × 10^6, and a maximum
injection time of 300 milliseconds. The MS1 full scan range was m/z 330 to 1500. For the MS2
scan, the isolation window was 2.0 m/z, and the normalized collision energy was 28.0.
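The idea of a scheduled inclusion list can be sketched as follows. The class `PrmTarget`, the function `active_targets`, and the peptide labels, m/z values, and retention time windows are all hypothetical placeholders; the actual inclusion list was built from the manually obtained retention times and charge states.

```python
from dataclasses import dataclass

@dataclass
class PrmTarget:
    peptide: str          # hypothetical peptide label
    precursor_mz: float   # m/z of the dominant charge state
    rt_start_min: float   # start of the scheduled window
    rt_end_min: float     # end of the scheduled window

def active_targets(inclusion_list, rt_min):
    """Return the targets whose retention time window contains rt_min."""
    return [t for t in inclusion_list
            if t.rt_start_min <= rt_min <= t.rt_end_min]

# Placeholder entries, not the peptides used in this study
inclusion_list = [
    PrmTarget("peptide_A", 512.27, 30.0, 36.0),
    PrmTarget("peptide_B", 648.83, 34.0, 40.0),
    PrmTarget("peptide_C", 702.35, 55.0, 61.0),
]
names_at_35 = [t.peptide for t in active_targets(inclusion_list, 35.0)]
```

Scheduling keeps the number of concurrently monitored precursors small, so that each target is sampled often enough across its chromatographic peak despite the long injection time.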
For the LC-MS/MS using microflow-LC, PRM was used for both confirmation of the
putative identifications from DIA analysis and quantitative estimation of the HCPs (for details see
Simion Kreimer’s thesis).
3.3.5 HCP identification and quantitative estimation through 2D LC-MS/MS
HCP identification and quantitative estimation were achieved through 2D LC-MS/MS
using nanoflow-LC as the second dimension separation. The raw MS data files obtained from the
DDA mode were processed in Proteome Discoverer 1.4 (PD 1.4) (Thermo Fisher Scientific). The
raw data were combined and searched against the CHO protein sequence database (26). Sequest
HT, Mascot (PD 1.4), and MS Amanda (27) were used. Cysteine carbamidomethylation was set
as a fixed modification, and oxidation of methionine and deamidation of asparagine and glutamine
were set as dynamic modifications, allowing for two missed tryptic cleavages. Mass tolerance for
the precursor ions was set at 10 ppm, and that for fragment ions at 0.05 ppm. The peptide false
discovery rate (FDR) was 1%.
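As a reminder of what a ppm mass tolerance means in practice, the following sketch computes the relative mass error of an observed m/z against its theoretical value and applies the 10 ppm precursor tolerance. The function names and the example m/z values are illustrative assumptions; the search engines apply this test internally.

```python
def ppm_error(observed_mz, theoretical_mz):
    """Relative mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1.0e6

def within_tolerance(observed_mz, theoretical_mz, tol_ppm=10.0):
    """True if the absolute mass error does not exceed tol_ppm."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm

# Illustrative values: at m/z 800, 10 ppm corresponds to 0.008 m/z units
accepted = within_tolerance(800.004, 800.0)   # error of about 5 ppm
rejected = within_tolerance(800.010, 800.0)   # error of about 12.5 ppm
```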
For PA and CEX samples, within each set of fractions, the number of peptide-spectrum matches (PSMs)
of the unique peptides of each HCP was calculated and normalized by the total PSMs of all
fractions. The normalized PSM number of each HCP was averaged across the replicates, and
this average was used as the PSM count for each HCP. For PA samples, proteins identified in both
of the replicates with at least 2 PSMs in the PA sample were considered as the identified HCPs.
For CEX samples, HCPs identified in two out of the three replicates with at least 2 PSMs were
considered as the identified HCPs. The HCP PSM counts were also used as an indicator of HCP
abundance.
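The PSM-based filtering and averaging described above can be sketched as follows. Several details are assumptions made only for illustration: the function name `normalized_psm_counts` is hypothetical, the minimum of 2 PSMs is applied to the raw count in each replicate, and normalization rescales each replicate to the mean total PSM count of all replicates. The protein names and counts in the example are invented and are not results of this study.

```python
def normalized_psm_counts(replicates, totals, min_psms=2, min_replicates=2):
    """Average normalized PSM counts for proteins passing the replicate filter.

    replicates: list of dicts {protein: raw PSM count of unique peptides},
                one dict per replicate (all fractions combined).
    totals: total PSM count of each replicate (all proteins, all fractions).
    """
    # Assumption: rescale every replicate to the mean total PSM count
    reference_total = sum(totals) / len(totals)
    proteins = set().union(*replicates)
    result = {}
    for protein in proteins:
        hits = sum(1 for rep in replicates if rep.get(protein, 0) >= min_psms)
        if hits < min_replicates:
            continue  # not identified in enough replicates
        scaled = [rep.get(protein, 0) * reference_total / total
                  for rep, total in zip(replicates, totals)]
        result[protein] = sum(scaled) / len(scaled)
    return result

# Invented example with two replicates (PA-style criterion: both replicates)
rep1 = {"protein_1": 60, "protein_2": 3, "protein_3": 1}
rep2 = {"protein_1": 40, "protein_2": 5, "protein_3": 4}
pa_counts = normalized_psm_counts([rep1, rep2], totals=[1000, 1000],
                                  min_replicates=2)
```

For the CEX criterion (two out of three replicates), the same function would be called with three replicate dictionaries and `min_replicates=2`.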
For UF/DF samples, two technical replicates were tested for each fraction, and the sum of
the PSMs of the unique peptides of each HCP from all MS runs was considered as the PSM count for the
given HCP. The HCPs with at least 4 PSMs were manually checked, and the ones with validated
peptide matches were considered as identified HCPs in the UF/DF sample.
3.3.6 HCP quantitation through PRM with isotopically labeled peptides
The quantitation of HCP was based on the PRM experiments with nanoflow-LC-MS/MS.
The mixture of all target peptides was analyzed by nanoflow-LC-MS/MS in the DDA mode, and
the raw MS file was searched against the protein sequence database containing all the corresponding
proteins. The target peptide-specific spectral library was then generated through Skyline for the
following PRM data analysis, which was also performed with Skyline. For PRM method
evaluation, 11 peptides were chosen from the TAA SpikeTides set (Table 3-1), and
both the isotopically labeled and non-labeled peptides were spiked into the digested UF/DF
sample. The ppm level was estimated with the assumption that the average HCP molecular weight
was 50 kDa. For HCP quantitation, the stable isotopically labeled peptides for the target HCPs
were spiked in the digested PA, CEX and UF/DF samples as internal standards. For HCP
quantitation through PRM, 5 to 7 product ions were used for each peptide, depending on the quality
of the transitions. The peak area ratio of each product ion of the light HCP peptide and heavy
internal standard peptide was calculated, and the average ratio of the several chosen product ions
was obtained to calculate the amount of the HCP peptide. When several peptides were quantified
for a certain HCP, the average of those peptides was taken to calculate the abundance of the
corresponding HCP.
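The light-to-heavy ratio calculation can be summarized in a minimal sketch. The function names, the spiked amount, the peak areas, and the amount of mAb on column are hypothetical; the conversion to ppm assumes the usual definition of ng HCP per mg of drug product and treats the loaded protein as essentially all mAb. The actual peak integration was performed in Skyline.

```python
def peptide_amount_fmol(light_areas, heavy_areas, heavy_spiked_fmol):
    """Amount of the endogenous (light) peptide from product ion area ratios."""
    if len(light_areas) != len(heavy_areas) or not light_areas:
        raise ValueError("need matching, non-empty product ion area lists")
    # One light/heavy ratio per product ion, then the mean over product ions
    ratios = [light / heavy for light, heavy in zip(light_areas, heavy_areas)]
    mean_ratio = sum(ratios) / len(ratios)
    return mean_ratio * heavy_spiked_fmol

def hcp_ppm(peptide_amounts_fmol, hcp_mw_da, mab_ug):
    """HCP level in ppm (ng HCP per mg mAb), averaged over its peptides."""
    mean_fmol = sum(peptide_amounts_fmol) / len(peptide_amounts_fmol)
    hcp_ng = mean_fmol * 1e-15 * hcp_mw_da * 1e9  # fmol -> mol -> g -> ng
    mab_mg = mab_ug * 1e-3
    return hcp_ng / mab_mg

# Invented numbers: three product ions, 5 fmol heavy standard, 10 ug mAb on column
amount = peptide_amount_fmol([2.0e5, 1.0e5, 4.0e4],
                             [1.0e6, 5.0e5, 2.0e5],
                             heavy_spiked_fmol=5.0)
level = hcp_ppm([amount], hcp_mw_da=50_000, mab_ug=10.0)
```

With these invented inputs the peptide amount is about 1 fmol, which for a 50 kDa protein in 10 μg of mAb corresponds to about 5 ppm. The 50 kDa value mirrors the average molecular weight assumption used above for the method evaluation.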
3.3.7 Spectral assay generation
The PA sample was digested and analyzed by 2D-microflow-LC-MS-DDA with 10
fractions from the first dimension separation, and the resultant raw data were used to generate an
assay library to assist the targeted analysis in DIA data interpretation. The
raw DDA data were searched by MyriMatch (28) and MS-GF+ (29) with semi-tryptic searching.
Cysteine carbamidomethylation was set as a fixed modification. Oxidation of methionine and
glutamine, and isotopically labeled lysine (+10 m/z) and arginine (+8 m/z) were set as dynamic
modifications. Two missed tryptic cleavages were allowed. Mass tolerance for the precursor ions
and the fragment ions was set at 6 ppm. The peptide false discovery rate (FDR) was 1%. The
validation of spectral matches was performed by PeptideShaker (30). Then the search results were
processed by an in-house script, which selected the highest scoring spectrum for each charge state
of the peptides identified and generated a PRM assay based on at least 3 and at most 8 of the
highest intensity identified b and/or y ion transitions, with retention times normalized based on a
set of retention time calibrant peptides.
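The two operations attributed to the in-house script, transition selection and retention time normalization, can be illustrated with the sketch below. It is not the script itself: the function names are hypothetical, discarding peptides with fewer than three b/y ions is an assumed reading of the "at least 3" rule, and a simple least-squares line through the calibrant peptides is assumed for the retention time normalization.

```python
def select_transitions(fragments, min_n=3, max_n=8):
    """Keep the most intense identified b/y ions of one spectrum.

    fragments: list of (ion_label, mz, intensity), e.g. ("y7", 801.42, 3.2e5).
    Returns up to max_n ions, or [] if fewer than min_n b/y ions exist.
    """
    by_ions = [f for f in fragments if f[0][:1] in ("b", "y")]
    by_ions.sort(key=lambda f: f[2], reverse=True)
    if len(by_ions) < min_n:
        return []
    return by_ions[:max_n]

def fit_rt_normalizer(observed_rt, reference_rt):
    """Least-squares line mapping observed calibrant RTs to a reference scale."""
    n = len(observed_rt)
    mean_x = sum(observed_rt) / n
    mean_y = sum(reference_rt) / n
    sxx = sum((x - mean_x) ** 2 for x in observed_rt)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(observed_rt, reference_rt))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda rt: slope * rt + intercept

# Invented fragment list and calibrant retention times
fragments = [("y7", 801.42, 3.2e5), ("b3", 315.17, 9.0e4),
             ("y5", 603.31, 5.1e5), ("a2", 171.11, 7.0e5),
             ("y4", 490.23, 1.0e4)]
labels = [f[0] for f in select_transitions(fragments)]
normalize = fit_rt_normalizer([10.0, 20.0, 30.0], [0.0, 50.0, 100.0])
```

Expressing retention times on a calibrant-based scale allows the same assay library to be reused when the gradient or column changes, provided the calibrant peptides are spiked into every run.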
3.4 Results and discussion
Therapeutic mAb samples at various stages of purification were provided by Bristol-Myers
Squibb. The estimated HCP level was determined at Bristol-Myers Squibb using a commercial (non-
optimized) ELISA kit (Cygnus Technologies, Southport, NC). The ELISA results showed more
than 700 ppm of HCPs in the PA sample, about 50 ppm in the CEX sample, and less than 20 ppm
in the UF/DF sample. No further information such as HCP identity or specific HCP levels was
known. HCP analysis involves a sample of relatively low complexity with a concentration range
of up to 5 to 6 orders of magnitude between the HCPs and the therapeutic drug. In order to obtain
preliminary information on the sample, we performed 2D-LC-MS/MS-DDA utilizing high
resolution/mass accuracy mass spectrometry. The preliminary information described below
provided significant insight for developing a novel workflow. With the collaboration of Simion
Kreimer, a workflow for HCP analysis based on 1D-LC-MS-DIA, followed by PRM, was
developed. In the following, the preliminary studies using 2D-LC-MS/MS-DDA and quantitation
of HCPs using 1D-LC-MS-PRM with isotopically labeled peptides will be described and the
workflow summarized. The details of the novel workflow will be presented in Simion Kreimer’s
thesis.
3.4.1 Low pH RP LC gradient optimization
For low pH RP nanoflow-LC-MS/MS, we optimized the gradient time for HCP sample
analysis. The gradient time affects the LC separation power as well as MS performance. With a
given gradient steepness and a given amount of analyte loaded on the column, longer gradient
times yield improved resolution, but wider peak widths, and hence lower signal intensity. Higher
LC separation power decreases species overlap and hence reduces potential ion suppression for
MS from peptides from the therapeutic drug, but lower signal intensity might affect detection of
HCP peptides. Shorter gradient times will yield sharper peaks and thus higher signal, and also
higher throughput. However, large amounts of therapeutic product peptides may reduce the chance
to detect co-eluting HCP peptides at low levels. Thus, there needs to be an optimum in gradient
rate.
We tested digested CEX purified samples with gradient times of 2 hours, 3 hours, and 4
hours, respectively, ramping linearly from 2% to 32% of mobile phase B using DDA for MS
analysis. The resultant MS raw data were searched in PD 1.4 against the CHO protein sequence database
obtained from the C. griseus genome published in 2013 by Lewis et al. (26). The results
showed that the 2-hour gradient separation yielded the best performance, balancing separation
power and sensitivity, with the highest number of identified peptides and HCPs (Table 3-1).
Table 3-1 The number of identified HCPs and peptides as a function of the length of the LC
separation gradient.
                                  2-hour gradient   3-hour gradient   4-hour gradient
Number of identified HCPs a       41                33                25
Number of identified peptides b   558               531               466
a HCPs identified in at least two out of three technical replicates.
b Total number of peptides identified from the therapeutic protein as well as the HCPs, excluding
peptides identified from proteins considered as common contaminants in the common Repository
of Adventitious Proteins (cRAP) sequence database (31).
3.4.2 HCP sample preparation protocol
HCP sample preparation for LC-MS-based approaches is challenging. There are several
critical factors that need to be considered when choosing the suitable protocol for HCP analysis.
First, sample preparation without HCP enrichment is preferred. HCP species co-purified with the
mAb would either have similar physicochemical properties to those of the mAb or be associated
with the mAb molecule itself by attractive interactions. Consequently, certain HCP species could
be lost with any enrichment method.
Second, acetone precipitation was chosen as the desalting and protein recovery procedure
before protein digestion, instead of using denaturing detergents such as RapiGest™ SF surfactant.
Acetone precipitation is a well-accepted and widely used protein recovery procedure for proteomic
sample preparation (32, 33). The procedure is easy to use with a high level of protein resuspension
(33). With acetone precipitation, most of the formulation components (e.g. salts) in the original
mAb sample can be removed with good protein recovery (24, 32). On the other hand, using
denaturing detergents such as RapiGest™ SF does not remove the original buffer components and
requires a separate desalting procedure. Thus, we chose acetone precipitation as the sample
preparation step.
3.4.3 HCP identification and estimation by 2D LC-MS/MS with DDA for PA and CEX
samples for preliminary testing
Preliminary HCP identification and abundance estimation for the PA and CEX samples
were conducted by 2D LC-MS/MS with DDA. As mentioned in Materials and Methods Section
3.2, 20 fractions were taken from the first dimension high pH LC separation, and the 2-hour linear
gradient was used for the second dimension low pH separation. The average of normalized PSM
counts (number of peptide spectrum matches) between replicates was used as an estimate for label-
free quantitation.
The HCP distribution in terms of the PSM counts is shown in Figure 3-1, and the specific
HCPs are shown in Tables 3-2 and 3-3. PSM counting is a widely used label-free strategy for
relative quantitation. It is based on the empirical observation of positive correlation between the
number of identified MS/MS spectra and the peptide/protein amounts based on data dependent
acquisition (DDA) (34). Since this approach does not directly measure any protein or peptide
physical properties, it is semi-quantitative. However, since we aimed at a preliminary
understanding of the HCP sample, PSM counting is suitable for comparing the relative amount of a
given HCP between the samples after different purification steps, and for comparing individual HCPs in the
same sample when their PSM counts differ significantly.
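As a rough illustration of PSM counting, the sketch below scales each replicate's PSM counts to a common total and then averages them per protein. The text does not state how the counts were normalized, so the scaling scheme (each replicate scaled to the mean total PSM count), the function name, and the example counts are all assumptions.

```python
def average_normalized_psms(replicates):
    """Average PSM counts per protein after scaling replicates to a common total.

    replicates: list of dicts mapping protein name -> raw PSM count.
    Assumption: each replicate is scaled so that its total equals the mean
    total across replicates; the study's exact scheme is not specified.
    """
    totals = [sum(rep.values()) for rep in replicates]
    target = sum(totals) / len(totals)
    proteins = set().union(*replicates)
    averaged = {}
    for protein in proteins:
        scaled = [rep.get(protein, 0) * target / total
                  for rep, total in zip(replicates, totals)]
        averaged[protein] = sum(scaled) / len(scaled)
    return averaged


# Hypothetical counts for two replicates (not data from this study).
rep1 = {"HCP_A": 60, "HCP_B": 30, "HCP_C": 10}    # total 100
rep2 = {"HCP_A": 100, "HCP_B": 80, "HCP_C": 20}   # total 200
result = average_normalized_psms([rep1, rep2])
```

A protein missing from one replicate is counted as zero in that replicate, which lowers its average and reflects the stochastic sampling of low-level species.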
With the PA sample, 728 HCPs were identified in both replicates with at least 2
PSMs. Among these, 43 had at least 50 PSMs, and 211 fell within the 10 to 49
PSM range. The number of proteins identified with fewer than 10 PSMs was 474. Although a
large number of HCPs were detected, many were of low abundance. On the other hand, in the CEX
sample, 151 HCPs were identified in at least two out of three replicates with at least 2 PSMs. Five
HCPs had at least 50 PSMs, and 20 had from 10 to 49 PSMs. As shown in Figure 3-1,
the total number of identified HCPs, as well as the number with high PSM counts, decreased
significantly in the CEX sample compared with the PA sample, as expected. Examining
individual HCPs in Tables 3-2 and 3-3, a large number identified in the PA sample with high PSM
counts were no longer found in the CEX sample or were found with very low PSM counts (HCPs
with fewer than 10 PSMs in the CEX sample are not shown). The results demonstrate that CEX
chromatography was effective in reducing the residual HCPs carried along from the Protein A
purification.
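The grouping used for Figure 3-1 can be sketched as a simple binning of PSM counts. This is a minimal illustration, not the analysis script used in this work; the function name is hypothetical, and the example uses a handful of PA values from Tables 3-2 and 3-3 plus one made-up entry below the 2-PSM threshold.

```python
def bin_hcps_by_psm(psm_counts, min_psms=2):
    """Count proteins in the three PSM ranges: >=50, 10-49, and 2-9."""
    bins = {">=50": 0, "10-49": 0, "2-9": 0}
    for count in psm_counts.values():
        if count < min_psms:
            continue  # below the identification threshold, not counted
        if count >= 50:
            bins[">=50"] += 1
        elif count >= 10:
            bins["10-49"] += 1
        else:
            bins["2-9"] += 1
    return bins


# PSM counts in the PA sample taken from Tables 3-2 and 3-3, plus one
# hypothetical single-PSM protein to show the threshold filter.
pa_counts = {
    "Putative phospholipase B-like 2": 503,
    "Clusterin": 377,
    "Fibronectin isoform X11": 50,
    "Anionic trypsin-2 isoform X2": 23,
    "Desmoplakin isoform X2": 4,
    "Hypothetical single-PSM protein": 1,
}
pa_bins = bin_hcps_by_psm(pa_counts)
```

Applied to the full PA list, the three bins would hold 43, 211, and 474 proteins, which sum to the 728 HCPs reported above.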
Figure 3-1 The number of proteins as a function of PSM counts for PA and CEX samples.
A. The comparison of identified HCP numbers for the PA and CEX samples. B. A chart of the
comparison of identified HCP numbers for the PA and CEX samples. The identified HCPs were
grouped into three categories: those with at least 50 PSMs, from 10 to 49 PSMs, and from 2 to 9
PSMs.
Table 3-2 The list of identified HCPs in the PA sample with at least 50 PSM counts.

| No. | HCP identified with at least 50 PSMs in the PA sample | PSM counts | Number of identified unique peptides |
|---|---|---|---|
| 1 | Putative phospholipase B-like 2 | 503 | 55 |
| 2 | Clusterin | 377 | 34 |
| 3 | Elongation factor 2 | 320 | 53 |
| 4 | Endoplasmin | 277 | 58 |
| 5 | 78 kDa glucose-regulated protein precursor | 249 | 44 |
| 6 | Serine protease HTRA1 isoform X2 | 242 | 27 |
| 7 | Pyruvate kinase PKM isoform X2 | 239 | 48 |
| 8 | Glyceraldehyde-3-phosphate dehydrogenase | 172 | 28 |
| 9 | Elongation factor 1-alpha 1 | 168 | 25 |
| 10 | Protein disulfide-isomerase, partial | 128 | 29 |
| 11 | Glutathione S-transferase P 1 | 128 | 20 |
| 12 | α-Enolase isoform X3 | 121 | 33 |
| 13 | Lysosomal alpha-glucosidase isoform X2 | 111 | 26 |
| 14 | Filamin-B isoform X4 | 110 | 41 |
| 15 | Elongation factor 1-gamma | 110 | 20 |
| 16 | Protein disulfide-isomerase A3 precursor | 109 | 26 |
| 17 | Isoamyl acetate-hydrolyzing esterase 1 homolog | 109 | 24 |
| 18 | Calreticulin precursor | 104 | 18 |
| 19 | Cytosolic purine 5'-nucleotidase | 99 | 22 |
| 20 | Hypoxia up-regulated protein 1 precursor | 97 | 33 |
| 21 | Heat shock cognate 71 kDa protein | 94 | 22 |
| 22 | Actin, cytoplasmic 1 | 88 | 16 |
| 23 | Filamin-A isoform X4 | 86 | 33 |
| 24 | Complement C1r subcomponent | 84 | 20 |
| 25 | Complement C1s subcomponent | 84 | 20 |
| 26 | T-complex protein 1 subunit theta isoform X3 | 80 | 22 |
| 27 | Heat shock protein HSP 90-beta | 77 | 28 |
| 28 | Transketolase isoform X2 | 75 | 23 |
| 29 | Peroxiredoxin-1 | 73 | 19 |
| 30 | Fructose-bisphosphate aldolase A isoform X3 | 73 | 21 |
| 31 | Uncharacterized protein LOC103163294 | 68 | 2 |
| 32 | Lumican | 67 | 12 |
| 33 | Phosphoglycerate kinase 1 | 67 | 16 |
| 34 | Alanine--tRNA ligase, cytoplasmic | 66 | 26 |
| 35 | Plasminogen activator inhibitor 1 isoform X2 | 66 | 21 |
| 36 | Prolow-density lipoprotein receptor-related protein 1 isoform X3 | 65 | 25 |
| 37 | Myosin-9 isoform X3 | 62 | 29 |
| 38 | Adenylyl cyclase-associated protein 1 | 59 | 18 |
| 39 | ATP-citrate synthase isoform X3 | 58 | 21 |
| 40 | T-complex protein 1 subunit zeta | 58 | 20 |
| 41 | Lipoprotein lipase isoform X2 | 52 | 23 |
| 42 | Protein-glutamine gamma-glutamyltransferase 2 | 51 | 22 |
| 43 | Fibronectin isoform X11 | 50 | 55 |
Table 3-3 The list of identified HCPs in the CEX sample with at least 10 PSMs and their corresponding PSM counts in the PA
sample.

| No. | HCP identified with at least 10 PSMs in the CEX sample | Theoretical pI | PSM counts in the CEX sample | Number of identified unique peptides in the CEX sample | PSM counts in the PA sample | Number of identified unique peptides in the PA sample |
|---|---|---|---|---|---|---|
| 1 | Putative phospholipase B-like 2 | 5.90 | 175 | 42 | 503 | 55 |
| 2 | 78 kDa glucose-regulated protein precursor | 5.07 | 127 | 40 | 249 | 44 |
| 3 | Uncharacterized protein LOC103163294 | 6.32 | 97 | 5 | 68 | 2 |
| 4 | Protein disulfide-isomerase | 4.84 | 75 | 24 | 128 | 29 |
| 5 | Clusterin | 5.51 | 55 | 14 | 377 | 34 |
| 6 | Uncharacterized protein LOC103161293 | 4.88 | 32 | 4 | 19 | 3 |
| 7 | Uncharacterized protein LOC100756391 isoform X2 | 5.58 | 29 | 17 | 19 | 11 |
| 8 | Anionic trypsin-2 isoform X2 | 4.79 | 26 | 3 | 23 | 2 |
| 9 | Calreticulin precursor | 4.33 | 24 | 10 | 104 | 18 |
| 10 | Multidrug resistance protein 1 | 8.86 | 21 | 1 | (4)* | 1 |
| 11 | Protein artemis isoform X6 | 8.65 | 21 | 3 | (1)* | 1 |
| 12 | Desmoplakin isoform X2 | 6.53 | 21 | 13 | 4 | 1 |
| 13 | Apoptogenic protein 1, mitochondrial | 9.33 | 20 | 1 | 18 | 2 |
| 14 | Uncharacterized protein LOC103162254 isoform X2 | 9.12 | 18 | 14 | 8 | 1 |
| 15 | Fam178a family with sequence similarity 178, member A | 8.95 | 18 | 2 | 27 | 2 |
| 16 | Olfactory receptor 11H6 | 8.76 | 18 | 8 | 19 | 3 |
| 17 | Anionic trypsin-2 | 7.46 | 17 | 4 | 16 | 1 |
| 18 | Fibrous sheath-interacting protein 2 isoform X2 | 5.78 | 16 | 4 | 22 | 2 |
| 19 | Myeloid cell surface antigen CD33 isoform 2 precursor | 8.29 | 15 | 1 | (9)* | 1 |
| 20 | Glutathione S-transferase P 1 | 7.64 | 15 | 4 | 128 | 20 |
| 21 | Ig κ chain V-III region MOPC 321-like isoform X2 | 6.08 | 13 | 3 | 24 | 3 |
| 22 | Glyceraldehyde-3-phosphate dehydrogenase | 8.49 | 13 | 6 | 172 | 28 |
| 23 | Complement C1r subcomponent | 5.70 | 12 | 6 | 84 | 20 |
| 24 | Protein-glutamine gamma-glutamyltransferase 2 | 5.08 | 10 | 9 | 51 | 22 |
| 25 | Hypoxia up-regulated protein 1 precursor | 5.09 | 10 | 7 | 97 | 33 |

* These HCPs were found in only one replicate of the PA sample analysis, and the PSM counts obtained in that replicate are shown
in parentheses.
The highlighted rows indicate the HCPs with a theoretical pI higher than 8.00 which were at a higher or comparable level in the CEX
sample compared with the PA sample.
Since the PA and CEX samples were analyzed under the same 2D-LC-MS/MS protocol
with 20 fractions collected with the first dimension separation, the PSM counts can be used as an
indicator of the relative HCP level. The HCPs identified with at least 10 PSMs in the CEX sample
are listed in Table 3-3 as well as their corresponding PSM counts obtained in the PA sample.
Comparing Tables 3-2 and 3-3, it can be seen that most of the HCPs were significantly decreased
in PSM counts in the CEX sample in comparison to the PA sample. For example, the PSM count
of putative phospholipase B-like 2 dropped from 503 to 175 after CEX purification; the clusterin
PSM count was 377 in the PA sample and 55 in the CEX sample; and calreticulin precursor had
104 and 24 PSMs in the PA and CEX samples, respectively.
However, several HCPs showed comparable, or even higher, levels in the CEX sample.
Apoptogenic protein 1, uncharacterized protein LOC103162254 isoform X2, Fam178a (family with
sequence similarity 178, member A), and olfactory receptor 11H6 were found with roughly 20 PSMs in
both the CEX and PA samples. All have a theoretical pI value larger than 8. With CEX purification,
it is likely that these basic HCPs co-eluted with the mAb. A similar explanation can be applied to
the uncharacterized protein LOC103163294 with a theoretical pI of 6.32. This protein showed
comparable levels between the PA and CEX samples with more than 50 PSMs.
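The comparison described above can be expressed as a small filter over the values in Table 3-3. The sketch below flags basic HCPs (theoretical pI above 8) whose CEX PSM count is at least half of their PA PSM count. The 0.5 ratio is an illustrative threshold, not a criterion used in this work, and only a subset of the table rows is included.

```python
# (name, theoretical pI, PSMs in CEX sample, PSMs in PA sample), from Table 3-3.
TABLE_3_3_ROWS = [
    ("Putative phospholipase B-like 2", 5.90, 175, 503),
    ("Clusterin", 5.51, 55, 377),
    ("Apoptogenic protein 1, mitochondrial", 9.33, 20, 18),
    ("Uncharacterized protein LOC103162254 isoform X2", 9.12, 18, 8),
    ("Fam178a family with sequence similarity 178, member A", 8.95, 18, 27),
    ("Olfactory receptor 11H6", 8.76, 18, 19),
    ("Glyceraldehyde-3-phosphate dehydrogenase", 8.49, 13, 172),
]


def retained_basic_hcps(rows, pi_cutoff=8.0, min_ratio=0.5):
    """Names of HCPs with pI above pi_cutoff and a CEX/PA PSM ratio >= min_ratio.

    min_ratio=0.5 is an assumed, illustrative definition of "comparable".
    """
    return [name for name, pi, cex, pa in rows
            if pi > pi_cutoff and pa > 0 and cex / pa >= min_ratio]


flagged = retained_basic_hcps(TABLE_3_3_ROWS)
```

With these rows, the four basic HCPs named in the text are flagged, while glyceraldehyde-3-phosphate dehydrogenase (pI 8.49) is not, because its PSM count fell from 172 to 13.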
Interestingly, three HCPs in Table 3-3, multidrug resistance protein 1, protein artemis
isoform X6, and myeloid cell surface antigen CD33 isoform 2 precursor, were not considered as
identified HCPs in the PA sample, because they were only detected in one replicate out of two
with low PSM counts (less than 10 PSMs in the single replicate). However, the proteins passed the
filtering criteria and were considered to be identified HCPs with relatively high PSM numbers in
the CEX sample. It is possible that they were “enriched” by the CEX chromatography since Protein
A purification and CEX chromatography are orthogonal methods. The fact that the HCPs were
only identified in one out of two replicates of the Protein A sample shows that the reproducibility
for low level species when determined by DDA is limited due to stochastic sampling, as is well
known (35).
3.4.4 HCP identification by 2D LC-MS/MS with DDA mode for UF/DF samples
HCP identification of the UF/DF samples was achieved by 2D LC-MS/MS-DDA with 5
fractions from the first-dimension high pH separation. A total of 18 HCPs were identified in the UF/DF
sample, as shown in Table 3-4. The number of unique peptides for each HCP is also listed. UF/DF
is a buffer exchange procedure with a specific molecular weight cutoff (in this case 30 kDa), which
is generally used to concentrate the product and, at the same time, remove impurities of low molecular weight.
Table 3-4 The identified HCPs in the UF/DF sample and their PSM counts in the PA sample*.

| No. | HCP identified in the UF/DF sample | Molecular weight | Theoretical pI | Number of identified unique peptides |
|---|---|---|---|---|
| 1 | 78 kDa glucose-regulated protein | 72.3 kDa | 5.07 | 8 |
| 2 | PAX-interacting protein 1 isoform X2 | 104.4 kDa | 6.57 | 1 |
| 3 | Clusterin | 51.7 kDa | 5.51 | 4 |
| 4 | DNA repair endonuclease XPF | 103.2 kDa | 6.79 | 1 |
| 5 | Tubulin polyglutamylase TTLL11 isoform X3 | 62.7 kDa | 8.90 | 1 |
| 6 | Putative phospholipase B-like 2 | 65.8 kDa | 5.90 | 4 |
| 7 | Heparin cofactor 2 | 54.3 kDa | 6.25 | 2 |
| 8 | Protein disulfide-isomerase | 54.2 kDa | 4.84 | 3 |
| 9 | cAMP-specific 3',5'-cyclic phosphodiesterase 4D isoform X1 | 84.5 kDa | 5.02 | 1 |
| 10 | Protein FAM35A | 102.5 kDa | 6.54 | 1 |
| 11 | Leukocyte immunoglobulin-like receptor subfamily B member 3 isoform X3 | 70.6 kDa | 5.91 | 1 |
| 12 | Probable G-protein coupled receptor 75 | 59.4 kDa | 9.19 | 1 |
| 13 | Protocadherin-9 | 113.8 kDa | 5.23 | 1 |
| 14 | Ewing's tumor-associated antigen 1 isoform X2 | 94.5 kDa | 6.45 | 1 |
| 15 | Cadherin-13 | 77.6 kDa | 4.96 | 1 |
| 16 | Cadherin EGF LAG seven-pass G-type receptor 3 isoform X2 | 357.7 kDa | 6.30 | 1 |
| 17 | Uncharacterized protein LOC103162254 isoform X2 | 49.9 kDa | 9.12 | 1 |
| 18 | MAP/microtubule affinity-regulating kinase 4 isoform X2 | 74.0 kDa | 9.72 | 1 |

The shaded rows indicate the HCPs which were not identified in either the PA or the CEX
samples.
Comparing the identified HCP list of the UF/DF (Table 3-4), CEX (Table 3-3), and PA
(Table 3-2) samples, several HCPs, among the highest abundance in the PA sample, are still on
the top of the list in the CEX sample, and also identified in the UF/DF samples. These proteins are
putative phospholipase B-like 2, 78 kDa glucose-regulated protein, protein disulfide-isomerase,
and clusterin. Interestingly, examining other HCP studies, these four proteins also stand out as
being among the most commonly reported HCPs with mAb and Fc fusion proteins.
Clusterin has been identified in different mAbs even after several chromatographic
purifications based on different separation mechanisms including Protein A and CEX (10, 19, 21,
36, 37). Moreover, Levy et al. reported that clusterin showed interaction with several different
mAbs and Fc fusion proteins through cross-interaction chromatography (CIC) (11). Similarly,
putative phospholipase B-like 2 (8, 36), 78 kDa glucose-regulated protein (19, 21, 36, 37), and
protein disulfide-isomerase (13, 21, 36) have been identified after Protein A purification and
CEX chromatography for different mAb molecules.
As shown in Table 3-4, most of the identified HCPs, 13 out of 18, were identified with
only one unique peptide. After multiple purification steps, this post-UF/DF mAb sample contained
relatively low amounts of HCPs, resulting in low numbers of unique peptides. Note that the PSM
counts obtained in these experiments cannot be used to compare the relative amounts of a given
HCP with those in Tables 3-2 or 3-3, because the numbers of fractions collected from the first-dimension
separation were different.
3.4.5 HCP quantitation based on PRM and isotopically labeled internal standards
Quantitation of several HCPs was next developed with 1D-LC-MS using parallel reaction
monitoring (PRM) to explore an approach for HCP analysis. To evaluate the sensitivity and
specificity of the PRM method, 11 standard peptides, chosen from a commercial tumor-associated
antigen (TAA) kit, were spiked into the digested UF/DF sample with their isotopically labeled
homologues. The overall protein amount of the sample was determined by BCA assay. The ppm
level of each peptide depends on the molecular weight of the protein with which it is associated.
Here, we assumed the protein to be 50 kDa. With 1D-nano-LC-MS using the PRM approach, 2
µg of digested sample was injected on the column, so that 0.04 fmol of peptide per injection corresponds
to the 1 ppm level of the protein. Within the background of the digested UF/DF sample, 8 peptides could be
identified at the 1 ppm (and potentially lower) level, and 1 peptide could be detected at the 5 ppm
level. Two peptides could not be detected even at the 40 ppm level, likely due to interference
(ion suppression) from the peptides of the monoclonal antibody drug. With the isotopically labeled peptides
as internal standards, 7 peptides showed good linearity, with an R² value greater than 0.99, from 1
ppm to 40 ppm, 1 peptide from 1 ppm to 20 ppm, and 1 peptide from 5 ppm to 40 ppm. The
summary of the results is shown in Table 3-5, and the calibration curves in Figure 3-2. The results
demonstrate that the PRM method can be used to quantify very low HCP levels using isotopically
labeled peptides as internal standards.
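The relationship between the ppm level and the molar amount on column can be checked with a short calculation. The sketch below takes ppm as ng of HCP per mg of total protein and uses the 50 kDa molecular weight assumed in the text; the function name is hypothetical.

```python
def ppm_to_fmol(ppm, injected_ug, protein_mw_kda=50.0):
    """Convert an HCP level in ppm (ng HCP per mg total protein) to fmol on column."""
    hcp_mass_g = ppm * 1e-6 * injected_ug * 1e-6   # grams of HCP in the injection
    mw_g_per_mol = protein_mw_kda * 1e3            # kDa -> g/mol
    return hcp_mass_g / mw_g_per_mol * 1e15        # mol -> fmol


fmol_at_1ppm = ppm_to_fmol(1, injected_ug=2)     # about 0.04 fmol
fmol_at_40ppm = ppm_to_fmol(40, injected_ug=2)   # about 1.6 fmol
```

For a protein of different molecular weight, the same molar amount corresponds to a proportionally different ppm level, which is why the ppm value assigned to each peptide depends on the assumed protein mass.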
Table 3-5 Peptide pairs chosen from SpikeTide Set TAA, their identification and calibration
linear range against the post-ultrafiltration digested sample

| Peptides chosen from SpikeTides Set TAA | Identifiable at 1 ppm level | Linear range for calibration curve |
|---|---|---|
| KPAAGFLPSLLK | √ | 1-40 ppm |
| LVSALIGEEK | √ | 1-40 ppm |
| VIEASFPAGVDSSPR | √ | 1-40 ppm |
| EGTPPIEER | √ | 1-40 ppm |
| VGILHLGSR | X (can be identified from the 5 ppm level) | 5-40 ppm |
| ESESTAGSFSLSVR | X (cannot be identified at any level) | NA |
| GAAPPAAATAYDR | √ | 1-40 ppm |
| TLGDSSAGEIALSTR | √ | 1-40 ppm |
| GLALWEAYR | √ | 1-20 ppm |
| AASWGLPSVSLDLPR | X (cannot be identified at any level) | NA |
| TFEDIPLEEPEVK | √ | 1-40 ppm |
Figure 3-2 The calibration curves of standard peptides from TAA SpikeTide Set. (A)
KPAAGFLPSLLK. (B) LVSALIGEEK. (C) VIEASFPAGVDSSPR. (D) EGTPPIEER. (E)
GAAPPAAATAYDR. (F) TLGDSSAGEIALSTR. (G) TFEDIPLEEPEVK. (H) GLALWEAYR.
(I) VGILHLGSR.
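The linearity of such a calibration curve can be assessed with an ordinary least-squares fit of the light-to-heavy peak-area ratio against the spiked level. The sketch below is a generic illustration; the spike levels and area ratios are hypothetical values, not the measurements behind Figure 3-2.

```python
def linear_fit(x, y):
    """Least-squares fit y = slope * x + intercept; returns (slope, intercept, R^2)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Hypothetical spike levels (ppm) and light/heavy peak-area ratios.
levels_ppm = [1, 5, 10, 20, 40]
area_ratios = [0.011, 0.049, 0.102, 0.198, 0.401]
slope, intercept, r_squared = linear_fit(levels_ppm, area_ratios)
is_linear = r_squared > 0.99   # the acceptance criterion quoted in the text
```

An unknown level would then be read back from the curve as (ratio - intercept) / slope within the validated linear range.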
Table 3-6 Target peptides and quantitation results for peptides from several HCPs.

| HCPs | Target peptides | Molecular weight | UF/DF sample | CEX sample | PA sample |
|---|---|---|---|---|---|
| Clusterin | EIQNAVQGVK; LTQQYNELLHSLQTK | 51.7 kDa | 20 ppm (CV 20%)* | 18 ppm (CV 42%) | 387 ppm (CV 1%) |
| Putative phospholipase B | VTSFSLAK; SVLLDAASGQLR; AFIPNGPSPGSR | 65.8 kDa | 39 ppm (CV 6%) | 54 ppm (CV 56%) | 336 ppm (CV 12%) |
| 78 kDa glucose-regulated protein | TWNDPSVQQDIK; NQLTSNPENTVFDAK | 72.3 kDa | 66 ppm (CV 3%) | 50 ppm (CV 22%) | 88 ppm (CV 8%) |
| Protein disulfide-isomerase | VHSFPTLK | 54.2 kDa | 3 ppm (CV 18%) | 3 ppm (CV 12%) | 7 ppm (CV 3%) |

*The quantitation is based on two biological replicates and three technical replicates. The calculation of the CV% is based on the two
biological replicates. The technical replicate CVs were much smaller (shown in Table 3-7). The lysine and arginine residues of the
internal standard peptides are labeled with stable isotopes that produce a mass shift of +8 Da and +10 Da, respectively.
Table 3-7 The quantitative information of the selected HCPs of the two biological replicates.

| HCPs | UF/DF sample, Replicate 1 | UF/DF sample, Replicate 2 | CEX sample, Replicate 1 | CEX sample, Replicate 2 | PA sample, Replicate 1 | PA sample, Replicate 2 |
|---|---|---|---|---|---|---|
| Clusterin | 22 ppm (CV 4.6%)* | 17 ppm (CV 3.1%) | 13 ppm (CV 1.5%) | 23 ppm (CV 2.4%) | 390 ppm (CV 1.1%) | 383 ppm (CV 4.9%) |
| Putative phospholipase B | 38 ppm (CV 10.5%) | 41 ppm (CV 3.3%) | 32 ppm (CV 0.5%) | 75 ppm (CV 2.3%) | 308 ppm (CV 0.9%) | 364 ppm (CV 1.4%) |
| 78 kDa glucose-regulated protein | 64 ppm (CV 14.9%) | 67 ppm (CV 8.9%) | 42 ppm (CV 4.7%) | 57 ppm (CV 5.7%) | 92 ppm (CV 1.2%) | 83 ppm (CV 4.4%) |
| Protein disulfide-isomerase | 3 ppm (CV 8.5%) | 2 ppm (CV 37%) | 2 ppm (CV 4.4%) | 3 ppm (CV 7.6%) | 8 ppm (CV 10.8%) | 5 ppm (CV 9.7%) |

*The calculation of the CV% is based on the three technical replicates.
Given the PRM method developed above, the four HCPs discussed in the previous section
(clusterin, putative phospholipase B, 78 kDa glucose-regulated protein, and protein
disulfide-isomerase) were quantitated in the PA, CEX, and UF/DF samples using homologous isotopically
labeled peptides as internal standards. The quantitative results are listed in Table 3-6. A large
decrease in the concentration of the tested HCPs between the PA and CEX samples can be seen. On the other
hand, little change occurred between the CEX and UF/DF samples, as ultrafiltration was used
mainly as a buffer exchange step and the filter cut-off was 30 kDa.
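The biological-replicate summary can be illustrated with the clusterin values from Table 3-7. The sketch below computes the mean and CV% across the two biological replicates and the fold reduction from PA to CEX. It assumes the sample standard deviation (n - 1), which the text does not specify, and it uses the rounded ppm values from the table, so the CVs differ slightly from those reported in Table 3-6.

```python
from statistics import mean, stdev


def cv_percent(values):
    """Coefficient of variation in percent, using the sample standard deviation."""
    return stdev(values) / mean(values) * 100


# Rounded biological-replicate values for clusterin (ppm), from Table 3-7.
clusterin = {"PA": (390, 383), "CEX": (13, 23), "UF/DF": (22, 17)}

summary = {step: (mean(vals), cv_percent(vals)) for step, vals in clusterin.items()}
fold_reduction_pa_to_cex = mean(clusterin["PA"]) / mean(clusterin["CEX"])
```

With these inputs, the clusterin level falls roughly 21-fold between the PA and CEX samples, while the CEX and UF/DF means (18 and 19.5 ppm) are essentially unchanged.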
Note that protein disulfide-isomerase was at a very low level in the PA sample and did not
change very much in the CEX and UF/DF samples. The precursor and product ions of both the heavy and
light peptide “VHSFPTLK” are shown in Figure 3-3. It is clear that the quality of the transitions
was good for identification and quantitation even at such a low level. For low-level peptides that
yield only poor-quality MS/MS spectra, the isotopically labeled homologues help to confirm
the identification and quantitation of the peptide, since the peptide and its isotopic internal
standard have the same retention time and fragmentation pattern.
Figure 3-3 Precursors and fragments of the peptide VHSFPTLK of protein disulfide-isomerase.
(A) The light peptide. (B) The heavy peptide. The spectrum is from the PRM analysis by Skyline.
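The relationship between a light peptide and its labeled homologue can be sketched as follows. The code uses the nominal mass shifts given in the footnote to Table 3-6 (+8 Da for lysine, +10 Da for arginine) and assumes a single labeled residue at the C-terminus of a tryptic peptide. The single-point ratio estimate, the function names, and the peak areas are illustrative assumptions rather than the exact procedure applied in Skyline.

```python
# Nominal mass shifts of the labeled residues, as stated in the footnote
# to Table 3-6.
HEAVY_SHIFT_DA = {"K": 8.0, "R": 10.0}


def heavy_mz_offset(peptide, charge):
    """m/z offset of the heavy precursor relative to the light precursor.

    Assumes one labeled residue at the C-terminus (tryptic peptide ending in K or R).
    """
    return HEAVY_SHIFT_DA[peptide[-1]] / charge


def light_amount_fmol(light_area, heavy_area, heavy_spiked_fmol):
    """Single-point estimate: (light/heavy area ratio) x spiked heavy amount."""
    return light_area / heavy_area * heavy_spiked_fmol


offset = heavy_mz_offset("VHSFPTLK", charge=2)    # 8 Da / 2 = 4.0 m/z
estimate = light_amount_fmol(2.0e5, 1.0e6, 1.0)   # hypothetical areas
```

Because the heavy and light forms co-elute and fragment in the same way, product ions that contain the C-terminal residue are offset by the same mass shift, which is what allows a weak light signal to be confirmed against the heavy one.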
3.4.6 The generation of assay library from 2D-microflow-LC-MS-DDA
A spectral assay library was generated from the PA sample for DIA data analysis. 2D-LC-MS-DDA
was used, and the second-dimension separation was performed on a microflow-LC system with an ACQUITY
UPLC M-class peptide CSH C18 column (1.7 µm beads, 130 Å, 0.3 x 150 mm). Ten fractions from
the first-dimension separation were used instead of 20 in order to increase the time efficiency. In
order to obtain as many spectral assays as possible, three injections were made for each fraction;
30 µg of sample was injected for the first run, and 15 µg for the second run. The MS
raw files were then analyzed by database search. In the third run, around 100 species that had been
identified with high confidence in the previous two runs were excluded from MS2 scanning. In this way,
a total of 4,535 assays corresponding to 3,505 peptides in various charge states were
generated, and the number of proteins identified was 759. The reasons for using the PA sample as well
as microflow-LC are discussed in the next section.
3.4.7 The insights provided by the preliminary results from 2D-LC-MS/MS-DDA and the
generation of the novel workflow
In the present study, the 2D-LC-MS/MS strategy with the PSM counting approach for relative
quantitation provided high sensitivity and selectivity to distinguish the HCP peptides of low
abundance from the high level of therapeutic protein peptides. 1D-LC-MS-PRM showed high
sensitivity to quantify species at very low ppm levels with high throughput. This strategy was able
to identify and track HCP amounts in the therapeutic mAb samples across several purification steps,
which supports the rational design of downstream processing. The separation power of 2D-LC decreases
species co-elution and hence reduces potential ion suppression in MS. The DDA mode is currently
the conventional strategy for MS data acquisition, and many data analysis methods are available
and ready to use. PSM counting was used to estimate the HCP level across different purification
steps and provided a straightforward reference for the purification efficiency.
Disadvantages of 2D-LC-MS-DDA strategy for HCP analysis
Despite the high resolving power and straightforward data interpretation, some
disadvantages remain for the 2D-LC-MS-DDA strategy. First, the throughput of 2D-LC is low.
For a given sample, it took more than a week for triplicate runs of the second-dimension
LC to analyze the 20 fractions from the first-dimension separation. Decreasing the number of
fractions can increase the throughput but may compromise the analysis sensitivity due to co-elution
of HCP and therapeutic peptides. Moreover, even with the 5 fractions used to analyze the UF/DF
sample, the throughput is still undesirable, since each sample needs several days for analysis. Thus,
this strategy may not be practical when one aims to screen HCPs rapidly.
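The instrument time behind this estimate follows from simple arithmetic on the number of fractions, replicates, and the 2-hour gradient. In the sketch below, the per-run overhead (column equilibration, loading, washing) is an assumed value chosen only for illustration; it was not reported.

```python
def second_dimension_hours(fractions, replicates, gradient_h=2.0, overhead_h=0.0):
    """Total instrument time for the second-dimension runs of one sample."""
    return fractions * replicates * (gradient_h + overhead_h)


# 20 fractions, triplicate runs, 2-hour gradient: gradient time alone.
pa_hours = second_dimension_hours(20, 3)
# Same runs with an assumed 1 hour of overhead per injection.
pa_hours_with_overhead = second_dimension_hours(20, 3, overhead_h=1.0)

pa_days = pa_hours / 24
pa_days_with_overhead = pa_hours_with_overhead / 24
```

The gradient time alone amounts to 120 hours (5 days), so any realistic per-run overhead brings the total to about a week or more, in line with the experience described above.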
Second, with DDA, the reproducibility of low-abundance HCP detection is limited. As
discussed in Section 1.8.2, Chapter 1, DDA sampling for MS2 scans is biased toward the
high-abundance species, e.g., the 15 most abundant precursors, and analytes of low abundance
may not be sampled in every technical replicate run due to such stochastic sampling. With the PA
sample, for example, more than 1600 HCP species were detected in either replicate, but only
about half of them were identified in both replicates. Since the remaining species were usually identified
with a single unique peptide and only a few PSMs, it is difficult to determine which ones were true
identifications.
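The replicate overlap discussed here is a set comparison of the protein identifications from each run. The sketch below is a minimal illustration with hypothetical protein identifiers; it is not the filtering code used for the PA data.

```python
def replicate_overlap(rep_a, rep_b):
    """Return (shared IDs, all IDs, fraction of all IDs found in both replicates)."""
    shared = rep_a & rep_b
    union = rep_a | rep_b
    fraction = len(shared) / len(union) if union else 0.0
    return shared, union, fraction


# Hypothetical identifications from two replicate runs.
rep1 = {"P1", "P2", "P3", "P4", "P5", "P6"}
rep2 = {"P1", "P2", "P3", "P7", "P8"}
shared, union, fraction = replicate_overlap(rep1, rep2)
```

Requiring a protein to appear in both replicates, as was done for the PA sample, keeps only the shared set and discards the run-specific identifications that are hardest to verify.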
Third, to avoid false positives and/or false negatives, manually checking the peptides
identified by the database search can help to increase the confidence, as we did when we
analyzed the UF/DF sample. However, it can be laborious, especially when a large
number of HCPs need to be confirmed, as in the PA sample. Sometimes manual checking
is still not definitive because of the low quality of the MS2 spectra.
Meanwhile, we also observed that the retention time of a given species could vary somewhat
from run to run. The retention time can be sensitive to factors that are not easy to control
during the nanoflow-LC experiment. Since PRM collects MS/MS spectra on a predefined schedule, these
variations in retention time made it difficult to set up the PRM parameters in the present
study.
Moreover, this 2D-LC-MS-DDA strategy provides only a rough HCP distribution in the sample
and the relative changes in HCP amounts across different purification steps. 1D-LC-MS-PRM was
able to quantify individual HCPs with high throughput using isotopically labeled peptides. On
the other hand, these customized internal standards need to be synthesized in-house or ordered
from a third party after the HCP identification, which may be time-consuming and expensive. As
a result, a more rapid approach to estimate the individual and/or overall HCP levels is desirable.
Overview of the DIA-to-PRM HCP analysis workflow
Based on the preliminary results and experience, we understood that we were dealing with
hundreds of HCP species in the sample from the early purification stage and several tens in the final
product. Such samples are not as complex as proteomic samples such as cell lysates (with
thousands of proteins). However, the dynamic range of the HCP sample is high, since the protein species
of interest are at very low levels in a background of highly abundant therapeutic protein.
In collaboration with Simion Kreimer and detailed in his thesis, a DIA-to-PRM HCP
analysis workflow was developed for HCP identification and quantitative estimation (Figure 3-4).
In this workflow, 1D-microflow-LC-MS with DIA was used to test the sample at later stages of
purification. A therapeutic-specific spectral assay library was generated from the sample at early
purification stage (Protein A) by 2D-LC-MS-DDA, which was used to assist the targeted DIA data
analysis. The untargeted DIA data analysis was achieved with the CHO protein sequence database.
The combination of targeted and untargeted DIA data analysis was applied to interpret the DIA
raw data. The putative peptide identifications were then tested by the 1D-LC-PRM method to
validate the identification. With this workflow, a total number of 37 HCPs were identified in the
UF/DF sample. Compared with 18 HCPs identified with the 2D-LC-MS-DDA approach cited
earlier, the increased number of identified HCPs indicates improved sensitivity of the HCP
analysis.
Figure 3-4 The scheme of the DIA-to-PRM HCP analysis workflow.
Insights provided with 2D-LC-MS-DDA and the reasoning of the DIA-to-PRM HCP analysis
workflow
The preliminary results obtained with the 2D-LC-MS-DDA provided valuable insight to
guide the design of the DIA-to-PRM workflow. First, despite the high sensitivity of nanoflow
LC-MS, microflow-LC separation was chosen for the new workflow to enhance the robustness
and reproducibility of the retention time as well as the signal intensity. With a wider column (300 µm),
more sample material can be injected, which compensates for the potential decrease in sensitivity. The
increased robustness and reproducibility allowed the automation of PRM parameter setting as well
as HCP quantitative estimation without isotopically labeled homologues. Second, DIA was used instead of
DDA. As discussed in Section 1.8.2, Chapter 1, DIA systematically fragments all precursor ions, which
is suitable for the detection of low-abundance species.
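The difference between the two acquisition modes can be illustrated with a toy survey scan. In the sketch below, a top-N DDA cycle selects only the most intense precursors, whereas DIA isolation windows tile the whole m/z range so that every precursor is fragmented. The m/z range, the 25 m/z window width, the intensities, and the function names are illustrative assumptions and not the acquisition settings used in this work.

```python
def dda_top_n(precursors, n):
    """m/z values of the n most intense precursors, as in a top-N DDA cycle."""
    ranked = sorted(precursors, key=lambda p: p[1], reverse=True)
    return [mz for mz, _ in ranked[:n]]


def dia_windows(mz_start, mz_end, width):
    """Contiguous fixed-width isolation windows covering [mz_start, mz_end)."""
    windows = []
    low = mz_start
    while low < mz_end:
        windows.append((low, min(low + width, mz_end)))
        low += width
    return windows


def dia_covered(precursors, windows):
    """m/z values of precursors that fall inside any isolation window."""
    return [mz for mz, _ in precursors
            if any(low <= mz < high for low, high in windows)]


# Hypothetical (m/z, intensity) pairs: two abundant mAb peptides, two weak HCP peptides.
survey = [(450.2, 9e8), (612.8, 5e8), (533.3, 2e5), (701.4, 1e5)]
selected_dda = dda_top_n(survey, n=2)
selected_dia = dia_covered(survey, dia_windows(400, 1000, 25))
```

In this toy example the two weak precursors are never selected by the top-2 DDA cycle, while all four fall inside a DIA window; the cost is that DIA fragment spectra are multiplexed and need a spectral library or database search to interpret.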
Third, the therapeutic-specific spectral assay library generated from the PA sample was
used for targeted DIA data analysis. The HCPs identified in the Protein A pool are a reasonable HCP reference
for the sample after purification, especially for the relatively high-abundance HCPs. Examining the
HCPs detected in either of the two replicates of the PA sample regardless of their PSM counts
(around 1600 proteins), all HCPs identified with at least 10 PSMs in the CEX sample (Table 3-3)
were found in the PA sample, indicating the good coverage of the HCP reference from the PA
sample. The null CHO cell line, which could cover the whole proteome, was not used for spectral
assay generation. The reason is that certain HCPs may be at low abundance in the CHO cell proteome
but can be carried along with the drug product through Protein A purification due to attractive
interactions between the mAb and the HCP, and hence be present at a relatively high level in the PA sample.
As a result, using the PA sample can yield more identified HCP peptides and MS2 spectra of higher
quality, compared with the spectral assays generated from the null CHO cell line.
Fourth, besides the targeted spectral assay library, an untargeted database search is needed for
a more complete DIA analysis. Although the spectral assay library obtained from the PA sample is a good
reference for the HCP pool, this reference does not necessarily reflect the comprehensive residual HCP
profile. For example, three HCPs identified with high PSM counts in the CEX sample were detected in only one
replicate of the PA sample analysis. In addition, 22 proteins identified in the CEX sample could
not be found among the 1600 proteins of the PA sample, though all 22 proteins had very low PSM
counts, between 2 and 10. This result indicates that certain HCPs were masked by the presence
of other high-abundance HCPs in the PA sample with DDA analysis; however, they were observed
in the later CEX sample after purification by an orthogonal mechanism. The untargeted database
search using the CHO protein sequence database can overcome this drawback and enhance the depth
of DIA data interpretation.
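Combining the targeted and untargeted results amounts to taking the union of the two sets of putative identifications while keeping track of their origin. The sketch below is a schematic illustration with placeholder peptide names; it does not reproduce the software used for the DIA analysis.

```python
def merge_identifications(library_hits, database_hits):
    """Union of targeted (library) and untargeted (database) peptide identifications.

    Returns a dict mapping each peptide to the list of searches that found it.
    """
    merged = {}
    for peptide in library_hits | database_hits:
        sources = []
        if peptide in library_hits:
            sources.append("library")
        if peptide in database_hits:
            sources.append("database")
        merged[peptide] = sources
    return merged


# Placeholder peptide names (hypothetical, for illustration only).
library_hits = {"PEPTIDE_A", "PEPTIDE_B", "PEPTIDE_C"}
database_hits = {"PEPTIDE_C", "PEPTIDE_D"}
candidates = merge_identifications(library_hits, database_hits)
database_only = sorted(p for p, src in candidates.items() if src == ["database"])
```

The peptides found only by the database search correspond to HCPs missing from the PA-derived library, and the full candidate list is what would be passed on to PRM for validation.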
Moreover, the putative identifications obtained from the DIA data analysis can be validated
by the subsequent PRM test, which avoids tedious manual checking and offers high sensitivity
and selectivity. The preliminary results showed that 1D-LC-PRM can identify HCP species at the
single-digit ppm level.
3.5 Conclusion
As HCPs are a major class of process-related impurities, their detection and quantitation are of great
importance for drug quality control. In this chapter, 2D-LC-MS with DDA was used to analyze
the HCP distribution and the changes in the HCP profile of the mAb therapeutic product across several
purification steps. Several individual HCPs were quantified with isotopically labeled peptides as
internal standards using 1D-LC-MS-PRM. The preliminary results and understanding obtained
from this study provided valuable information for the subsequent workflow development. A DIA-to-PRM
workflow for HCP analysis was developed in collaboration with Simion Kreimer, and
details of this workflow can be found in Simion Kreimer’s thesis.
3.6 References
1. Wang X, Hunter AK, & Mozier NM (2009) Host cell proteins in biologics development:
identification, quantitation and risk assessment. Biotechnol. Bioeng. 103(3):446-458.
2. Gao SX, et al. (2011) Fragmentation of a highly purified monoclonal antibody attributed
to residual CHO cell protease activity. Biotechnol. Bioeng. 108(4):977-982.
3. Robert F, et al. (2009) Degradation of an Fc-Fusion Recombinant Protein by Host Cell
Proteases: Identification of a CHO Cathepsin D Protease. Biotechnol. Bioeng.
104(6):1132-1141.
4. Bracewell DG, Francis R, & Smales CM (2015) The future of host cell protein (HCP)
identification during process development and manufacturing linked to a risk-based
management for their control. Biotechnol. Bioeng. 112(9):1727-1737.
5. Zhu-Shimoni J, et al. (2014) Host cell protein testing by ELISAs and the use of orthogonal
methods. Biotechnol. Bioeng. 111(12):2367-2379.
6. Tscheliessnig AL, Konrath J, Bates R, & Jungbauer A (2013) Host cell protein analysis in
therapeutic protein bioprocessing - methods and applications. Biotechnol. J. 8(6):655-670.
7. Flatman S, Alam I, Gerard J, & Mussa N (2007) Process analytics for purification of
monoclonal antibodies. J. Chromatogr. B Analyt. Technol. Biomed. Life Sci. 848(1):79-87.
8. Vanderlaan M, et al. (2015) Hamster phospholipase B-like 2 (PLBL2): A host-cell protein
impurity in therapeutic monoclonal antibodies derived from Chinese hamster ovary cells.
Bioprocess. Int. 13(4):18-55.
9. de Zafra CLZ, Quarmby V, Francissen K, Vanderlaan M, & Zhu-Shimoni J (2015) Host
cell proteins in biotechnology-derived products: A risk assessment framework. Biotechnol.
Bioeng. 112(11):2284-2291.
10. Levy NE, Valente KN, Lee KH, & Lenhoff AM (2016) Host cell protein impurities in
chromatographic polishing steps for monoclonal antibody purification. Biotechnol. Bioeng.
113(6):1260-1272.
11. Levy NE, Valente KN, Choe LH, Lee KH, & Lenhoff AM (2014) Identification and
characterization of host cell protein product-associated impurities in monoclonal antibody
bioprocessing. Biotechnol. Bioeng. 111(5):904-912.
12. Aboulaich N, et al. (2014) A novel approach to monitor clearance of host cell proteins
associated with monoclonal antibodies. Biotechnol. Prog. 30(5):1114-1124.
13. Tait AS, Hogwood CEM, Smales CM, & Bracewell DG (2012) Host cell protein dynamics
in the supernatant of a mAb producing CHO cell line. Biotechnol. Bioeng. 109(4):971-982.
14. Hogwood CEM, Tait AS, Koloteva-Levine N, Bracewell DG, & Smales CM (2013) The
dynamics of the CHO host cell protein profile during clarification and protein A capture in
a platform antibody purification process. Biotechnol. Bioeng. 110(1):240-251.
15. Krawitz DC, Forrest W, Moreno GT, Kittleson J, & Champion KM (2006) Proteomic
studies support the use of multi-product immunoassays to monitor host cell protein
impurities. Proteomics 6(1):94-110.
16. Bomans K, et al. (2013) Identification and monitoring of host cell proteins by mass
spectrometry combined with high performance immunochemistry testing. PLoS One
8(11):11.
17. Tarrant RDR, Velez-Suberbie ML, Tait AS, Smales CM, & Bracewell DG (2012) Host cell
protein adsorption characteristics during protein a chromatography. Biotechnol. Prog.
28(4):1037-1044.
18. Berrill A, Ho SV, & Bracewell DG (2010) Product and contaminant measurement in
bioprocess development by SELDI-MS. Biotechnol. Prog. 26(3):881-887.
19. Doneanu CE, et al. (2012) Analysis of host-cell proteins in biotherapeutic proteins by
comprehensive online two-dimensional liquid chromatography/mass spectrometry. mAbs
4(1):24-44.
20. Farrell A, et al. (2015) Quantitative host cell protein analysis using two dimensional data
independent LC-MS^E. Anal. Chem. 87(18):9186-9193.
21. Zhang QC, et al. (2014) Comprehensive tracking of host cell proteins during monoclonal
antibody purifications using mass spectrometry. mAbs 6(3):659-670.
22. Schenauer MR, Flynn GC, & Goetze AM (2012) Identification and quantification of host
cell protein impurities in biotherapeutics using mass spectrometry. Anal. Biochem.
428(2):150-157.
23. Valente KN, Lenhoff AM, & Lee KH (2015) Expression of difficult-to-remove host cell
protein impurities during extended Chinese hamster ovary cell culture and their impact on
continuous bioprocessing. Biotechnol. Bioeng. 112(6):1232-1242.
24. Valente KN, Schaefer AK, Kempton HR, Lenhoff AM, & Lee KH (2014) Recovery of
Chinese hamster ovary host cell proteins for proteomic analysis. Biotechnol. J. 9(1):87-99.
25. Doneanu CE, et al. (2015) Enhanced detection of low-abundance host cell protein (HCP)
impurities in high-purity monoclonal antibodies down to 1 ppm using ion mobility mass
spectrometry coupled with multidimensional liquid chromatography. Anal. Chem.
87(20):10283-10291.
26. Lewis NE, et al. (2013) Genomic landscapes of Chinese hamster ovary cell lines as
revealed by the Cricetulus griseus draft genome. Nat. Biotechnol. 31(8):759-765.
27. Dorfer V, et al. (2014) MS Amanda, a universal identification algorithm optimized for high
accuracy tandem mass spectra. J. Proteome Res. 13(8):3679-3684.
28. Tabb DL, Fernando CG, & Chambers MC (2007) MyriMatch: Highly accurate tandem
mass spectral peptide identification by multivariate hypergeometric analysis. J. Proteome
Res. 6(2):654-661.
29. Kim S & Pevzner PA (2014) MS-GF+ makes progress towards a universal database
search tool for proteomics. Nat. Commun. 5:5277.
30. Vaudel M, et al. (2015) PeptideShaker enables reanalysis of MS-derived proteomics data
sets. Nat. Biotechnol. 33(1):22-24.
31. Mellacheruvu D, et al. (2013) The CRAPome: a contaminant repository for affinity
purification-mass spectrometry data. Nat. Methods 10(8):730-736.
32. Jiang L, He L, & Fountoulakis M (2004) Comparison of protein precipitation methods for
sample preparation prior to proteomic analysis. J. Chromatogr. A 1023(2):317-320.
33. Bodzon-Kulakowska A, et al. (2007) Methods for samples preparation in proteomic
research. J. Chromatogr. B Analyt. Technol. Biomed. Life Sci. 849(1-2):1-31.
34. Liu HB, Sadygov RG, & Yates JR (2004) A model for random sampling and estimation of
relative protein abundance in shotgun proteomics. Anal. Chem. 76(14):4193-4201.
35. Chapman JD, Goodlett DR, & Masselon CD (2014) Multiplexed and data-independent
tandem mass spectrometry for global proteome profiling. Mass Spectrom. Rev. 33(6):452-470.
36. Joucla G, et al. (2013) Cation exchange versus multimodal cation exchange resins for
antibody capture from CHO supernatants: Identification of contaminating Host Cell
Proteins by mass spectrometry. J. Chromatogr. B Analyt. Technol. Biomed. Life Sci.
942:126-133.
37. Farrell A, et al. (2015) Quantitative host cell protein analysis using two dimensional data
independent LC-MS^E. Anal. Chem. 87(18):9186-9193.
Copyrights