Master’s in Proteomics and Bioinformatics Shortcut Shotgun IEF · 2007-06-15 · High-throughput...


Master’s in Proteomics and Bioinformatics

Shortcut Shotgun IEF

Alexis Chauvet

Biomedical Proteomics Research Group

Department of Structural Biology and Bioinformatics

Geneva University

Project Director: Prof. Denis Hochstrasser

Group Leader: Pierre Lescuyer

Supervisors: Alireza Vaezzadeh

Prof. Jacques Deshusses


Abstract

Recently, the high-throughput identification of proteins by digestion of the whole proteome, combined with multi-dimensional separation techniques, has matured and is now referred to as “shotgun” proteomics. In the approach considered here, isoelectric focusing (IEF) is used for the first-dimension separation, which is beneficial owing to its high loading capacity, high resolving power, broad dynamic range and high reproducibility. A total protein extract (often purified by precipitation) is tryptically digested and purified before being focused on IPG strips as the first dimension. Reverse-phase chromatography is used for the second dimension. Once focusing is complete, the strips are cut into tens of fractions and the peptides are extracted before being further separated by LC and analysed by data-dependent acquisition in tandem mass spectrometry. The resulting spectra are submitted to identification and analysis using powerful bioinformatics tools.

The goal of this study was to further develop a shortcut approach to the original “shotgun isoelectric focusing” pipeline. After IEF, the peptides are transferred from the IPG strip onto a PVDF membrane using a transfer technique based on capillarity. The membrane is then covered with matrix and scanned directly in a mass spectrometer. Acquisition is performed all along the membrane to generate an MS image representing the peptide distribution. The ultimate goal was to analyze this image to determine how the membrane should be fractionated, so that extracting the peptides for the second dimension yields the greatest number of identifications. Another goal was to perform tandem MS directly on the PVDF membrane, in an attempt to bypass the second dimension (LC) and render the process high-throughput.

Acknowledgments

I am greatly indebted to Alireza Vaezzadeh for accepting me on his journey through the shotgun isoelectric focusing world. He has had endless patience to teach me and assist me along the way. He possesses a rare faculty of leadership mixed with friendship, which makes it a pleasure to work with him. I have learned from him not only scientifically but also personally, and continue to do so. I would like to express my gratitude to Professor D. Hochstrasser for taking the time to supervise my work and answer all my questions. I would like to thank the whole BPRG group for their help and support. Special thanks go to Alexandre Heinard and Loïc Dayon for helping me run my samples on the Q-TOF. I am very grateful to Professor J. Deshusses and Dr. P. Lescuyer for their day-to-day advice and supervision. I would also like to thank the whole MSight team for spending hours developing specific tools for us. Special thanks go to Daniel Walther and Sébastien Catherinet for listening to my endless complaints and solving my problems, and I warmly thank Patricia Palagi and Pierre-Alain Binz, who provided invaluable help with the ASMS poster. I am also grateful to René Demellayer and his team from the Ecole d’Ingénieur de Genève for helping us develop the sarcophagi.


Table of contents

1. INTRODUCTION
1.1 “Shortcut” shotgun isoelectric focusing
1.2 Proteomics
1.2.1 Protein separation
1.2.1.1 Electrophoretic separation
1.2.1.1.1 SDS-PAGE
1.2.1.1.2 Isoelectric focusing
1.2.1.1.3 Staining
1.2.1.1.3.1 Coomassie staining
1.2.1.1.3.2 Silver staining
1.2.1.1.4 One-dimensional versus two-dimensional SDS-PAGE
1.2.1.2 Transblot
1.2.1.3 Liquid chromatography
1.2.2 Mass spectrometry
1.2.2.1 Sample preparation for mass spectrometry
1.2.2.1.1 Sample clean-up
1.2.2.1.2 Trypsin
1.2.2.2 MALDI-TOF MS
1.2.2.3 Tandem mass spectrometry
1.2.2.4 Other mass spectrometry techniques
1.2.2.4.1 Electrospray ionization
1.2.2.4.2 Quadrupoles and ion traps
1.2.3 Protein identification
1.2.3.1 Peptide mass fingerprinting
1.2.3.2 De novo sequencing
1.2.3.3 LC-MS/MS
1.2.3.4 Bioinformatics
1.2.3.4.1 Data analysis
1.2.3.4.3 Proteomics tools
1.3 Shotgun IEF
1.3.1 Concepts
1.3.2 Shortcut shotgun IEF
1.3.3 MS imaging and quantitation


2. MATERIALS AND METHODS
2.1 Reagents and chemicals
2.2 Sample preparation
2.2.1 Growth conditions and time point
2.2.2 Chloroform precipitation
2.2.3 Digestion protocol
2.2.4 Purification
2.3 Electrophoresis
2.3.1 IPG-IEF
2.3.2 SDS
2.4 Transblot
2.4.1 Centrifugation
2.4.2 Capillarity
2.5 Peptide extraction and purification
2.5.1 From an IPG strip
2.5.2 From a PVDF membrane
2.6 Mass spectrometry
2.6.1 MS imaging
2.6.2 LC-MS/MS
2.6.2.1 TOF/TOF
2.6.2.2 Q-TOF
2.7 Data analysis
2.7.1 Protein identification
2.7.2 Visualisation
3. RESULTS AND DISCUSSION
3.1 Transfer
3.1.1 Centrifugation
3.1.1.1 Results
3.1.1.2 Discussion
3.1.2 Capillarity
3.1.2.1 Results
3.1.2.2 Discussion


3.2 Capture membrane
3.2.1 Results
3.2.2 Discussion
3.3 Strip versus membrane fractionation
3.3.1 Results
3.3.2 Discussion
3.4 Visualisation
3.4.1 Theoretical versus practical MS imaging
3.4.1.1 Results
3.4.1.2 Discussion
3.4.2 MSight developments
3.4.2.1 Results
3.4.2.2 Discussion
3.5 Direct tandem MS
3.5.1 Results
3.5.2 Discussion
4. CONCLUSION AND OUTLOOK
5. REFERENCES
6. APPENDICES
6.1 Identifications of fractionation
6.1.1 From the IPG strip
6.1.1.1 Summary of the proteins identified
6.1.1.2 Peptide matches
6.1.2 From the PVDF membrane
6.1.2.1 Summary of the proteins identified
6.1.2.2 Peptide matches
6.2 Mascot result of direct MS/MS


List of abbreviations:

3D-IT Three-dimensional ion traps

3Q Triple-quadrupole

AcN Acetonitrile

BA Ammonium bicarbonate

BSA Bovine serum albumin

CA Carrier ampholytes

CBB Coomassie brilliant blue

CHAPS 3-{(3-Cholamidopropyl)dimethylammonio}-1-propanesulfonate

CHCA α-cyano-4-hydroxycinnamic acid

CID Collision induced dissociation

Da Dalton

DTE 1,4-dithioerythritol

EIG Ecole d’Ingénieur de Genève

ESI Electrospray ionization

eV Electron volt

FA Formic acid

FRET Fluorescence resonance energy transfer

FT-ICR Fourier-transform ion-cyclotron resonance

HPLC High-performance liquid chromatography

Hz Hertz

ICAT™ Isotope-coded affinity tagging

ID Internal diameter

IEF Isoelectric focusing

IPG Immobilised pH gradient

IT Ion trap

kV Kilovolt

kVh Kilovolt × hour

LC Liquid chromatography

LIMS Laboratory information management system

m/z Mass-to-charge ratio

MALDI Matrix-assisted laser desorption/ionisation

MRI Magnetic resonance imaging

MS Mass spectrometry

MS/MS Tandem mass spectrometry

MSI Mass spectrometric imaging

MW Molecular weight

NIRF Near-infrared fluorescence

NMR Nuclear magnetic resonance

OD Optical density

ORF Open reading frame

PCR Polymerase chain reaction

pI Isoelectric point

PMF Peptide mass fingerprinting

ppm Parts per million

PTM Post-translational modifications

PVDF Polyvinylidene difluoride

Q Quadrupole


RPLC Reverse-phase liquid chromatography

rpm Revolutions per minute

S/N Signal-to-noise ratio

SCX Strong cation exchange

SDS-PAGE Sodium dodecyl sulfate polyacrylamide gel electrophoresis

SIMS Secondary ion mass spectrometry

SSIEF Shortcut shotgun isoelectric focusing

TFA Trifluoroacetic acid

TIC Total ion current

TLC Thin layer chromatography

TOF Time-of-flight

TrEMBL Translation of European molecular biology laboratory

W Watt

UV Ultraviolet


List of figures

Figure 1. 1-DE SDS-PAGE and 2-DE PAGE of Staphylococcus aureus (strain N315)

Figure 2. Transfer by capillarity

Figure 3. Schematic of MALDI-TOF process and instrument

Figure 4. The shotgun IEF workflow

Figure 5. The sarcophagus developed with the help of the EIG

Figure 6. Different transfer times for capillarity transfer

Figure 7. Reproducibility comparison between a transfer by centrifugation and a transfer by capillarity

Figure 8. Scanned image of the membranes and IPG strips after transfer

Figure 9. Comparison of the number of identifications by extraction from an IPG strip compared with an extraction from a PVDF membrane

Figure 10. Fractionation of the PVDF membrane after MS imaging

Figure 11. Superposition of an MSight image and a theoretical digestion of the proteins

Figure 12. The theoretical tryptic peptides of S. aureus N315

Figure 13. Improvements on the image treatment using the MSight software

Figure 14. BioMap image of the membrane

Figure 15. MS image using the 4700 MALDI-TOF/TOF

List of tables

Table 1. Proteins used to make a pool of standard proteins for initial experiments

Table 2. Comparison of the number of identifications between theoretical and practical digestion


1. INTRODUCTION

1.1 “Shortcut” shotgun isoelectric focusing

High-throughput techniques are required for effective simultaneous analysis of multiple protein samples. A high-throughput proteomic analysis must fulfil three essential functions: it should identify the proteins present in a complex sample in a fast and reproducible manner, permit quantitation for comparisons between different samples, and characterize possible protein modifications. To date, few efficient high-throughput workflows exist that can be applied to the analysis of the complex mixtures typically found in clinical samples.

The shotgun isoelectric focusing workflow couples two high-performance technologies: shotgun proteomics and isoelectric focusing (IEF). Shotgun proteomics, in which a tryptic digest of a complex proteome sample is analyzed directly by either mono- or multidimensional liquid chromatography tandem mass spectrometry (LC-MS/MS), has gained general acceptance in the proteomics world. Such an approach can be facilitated by the use of multidimensional protein identification technology (MudPIT), which incorporates multidimensional high-pressure liquid chromatography, tandem mass spectrometry and powerful database-searching algorithms [1]. IEF, on the other hand, separates proteins or peptides on the basis of the amphoteric properties that determine their isoelectric point (pI). IEF has one of the highest resolving powers among biochemical separation techniques. Shotgun IEF was first mentioned in 2004, when Stephenson et al. described the use of immobilised pH gradients (IPG) as the first dimension in a shotgun proteomic approach [2]. In recent years, a few groups have worked on the development of the shotgun IEF workflow. Much progress has been achieved, particularly in the Clinical Proteomics Group (CPG) at the Geneva University Hospital.
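The tryptic digestion on which shotgun proteomics relies can be illustrated in silico. The following is a minimal sketch, not the tool used in this work: it assumes the common simplification that trypsin cleaves after lysine (K) or arginine (R) except when the next residue is proline (P). The function name `tryptic_digest` and its parameters are hypothetical.

```python
import re


def tryptic_digest(sequence, missed_cleavages=0, min_length=1):
    """Return the peptides of an in silico tryptic digest.

    Assumption: cleavage occurs after K or R unless the next residue is P.
    """
    # Zero-width split points: preceded by K or R, not followed by P.
    fragments = [f for f in re.split(r"(?<=[KR])(?!P)", sequence) if f]
    peptides = []
    for i in range(len(fragments)):
        # Join up to `missed_cleavages` additional neighbouring fragments.
        last = min(i + missed_cleavages, len(fragments) - 1)
        for j in range(i, last + 1):
            peptide = "".join(fragments[i:j + 1])
            if len(peptide) >= min_length:
                peptides.append(peptide)
    return peptides
```

For example, `tryptic_digest("MKWVTFISLLR")` yields the two peptides `MK` and `WVTFISLLR`, whereas the K-R-P motif in `AKRPGK` is cleaved only after the first K.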
In the shotgun IEF workflow, proteins are digested into peptides before being separated by IEF. The strips are then cut into several tens of fractions and the focused peptides are extracted from the gel before being separated in a second dimension by liquid chromatography (LC). The samples are then analyzed by mass spectrometry and the resulting data are processed using bioinformatics tools. Shotgun IEF is a powerful, semi-automated high-throughput method for the simultaneous analysis of thousands of proteins processed under identical conditions; it takes advantage of the properties of IEF, such as its high resolving power, loading capacity and reproducibility. Shotgun proteomics also has many advantages, such as increased throughput and speed owing to automated data acquisition, and better digestion of the sample owing to the easier application of detergents and salts.

Although such a workflow provides promising results, some steps, such as LC-MS/MS, remain very time-consuming and labour-intensive. We therefore developed a shortcut version of the original workflow, based on transferring the focused peptides onto a solid porous support, such as a PVDF membrane, for direct MS imaging or for fractionation and peptide extraction. The membrane is then scanned with a mass spectrometer in order to create an image representing the position of the focused peptides. The fraction size is determined by visually analyzing the relative amount of peptide spots present on the MS image.


Peptides are then extracted from the membrane and the shotgun IEF workflow is resumed. Another way to shorten the workflow could be to analyze the membrane directly by tandem mass spectrometry, thus bypassing the time-consuming liquid chromatography step. The present study aimed to further develop and optimize the parameters of the various steps in the shotgun IEF workflow. Particular attention was paid to shortening some procedures and to improving throughput and practicability. While the whole process was studied, the following aspects were investigated in particular:

• optimization of the transfer step by comparison of transfer methods
• comparison of different capture membranes
• comparison of fractionation and extraction from the strip or from the membrane
• comparison of theoretical versus practical MS imaging for the visualization step

All experiments were conducted on samples containing a pool of standard proteins as a model. The new developments were then applied to the analysis of membrane proteins of the bacterium Staphylococcus aureus.
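In this work the fraction size was chosen visually from the MS image. Purely as an illustration of the underlying idea (narrow fractions where peptides are dense, wide fractions where they are sparse), the sketch below splits an intensity profile into fractions carrying similar shares of the total signal. It is a hypothetical automated alternative, not the procedure used here: the function name, the one-dimensional `profile` input and the equal-signal criterion are all assumptions.

```python
def equal_signal_boundaries(profile, n_fractions):
    """Return cut indices so that each fraction holds a similar share of signal.

    profile: summed ion intensity at successive positions along the membrane
             (hypothetical input, e.g. one value per acquisition position).
    A cut index c means the cut falls between positions c - 1 and c.
    """
    total = sum(profile)
    if total <= 0 or n_fractions < 1:
        raise ValueError("need positive total signal and at least one fraction")
    target = total / n_fractions
    cuts = []
    cumulative = 0.0
    for index, intensity in enumerate(profile):
        cumulative += intensity
        # At most one cut per position, so that no fraction is empty.
        if len(cuts) < n_fractions - 1 and cumulative >= target * (len(cuts) + 1):
            cuts.append(index + 1)
    return cuts
```

With a profile such as `[1, 1, 1, 1, 4, 4, 1, 1, 1, 1]` and four fractions, the two intense central positions each become a fraction of their own, while the weaker flanks are pooled into wider fractions.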

1.2 Proteomics

In the last century, the idea of having genomes completely sequenced was a mere dream. It is now a reality. Fuelled by the ever-growing body of DNA sequence information, a new fundamental concept called proteomics (literally the PROTEin complement of a genOME, or the large-scale analysis of proteins) was proposed by Marc Wilkins [3]. It quickly became one of the most important disciplines for characterizing and identifying gene function. It is also a very powerful tool for building functional linkages between protein molecules and for providing insight into the mechanisms of biological processes in a high-throughput mode. It should thus help considerably to unravel the biochemical and physiological mechanisms of complex diseases at the molecular level.

The term proteome was first proposed in 1994, during the first congress “From Genome to Proteome” in Siena. It was coined by analogy with genomics to describe the set of proteins encoded by the genome [4]. This large-scale study of protein functions and structures has matured considerably and now covers not only all the proteins in any given cell, but also the set of all protein isoforms and modifications. It also encompasses their function, sub-cellular localization and interactions, as well as the structural description of proteins and their higher-order complexes. In short, it is involved in almost everything 'post-genomic'. The field of proteomics is particularly important because most diseases are expressed at the level of protein activity. Consequently, proteomics seeks to establish directly the involvement of specific proteins in disease, leading to the identification of new targets that can be used to diagnose and treat diseases. In summary, proteomics permits the investigation of thousands of proteins in a given state and at a given time in any sample, through in vivo, in vitro (wet lab) and in silico (dry lab) experiments.

1.2.1 Protein separation

The most difficult task in protein analysis is coping with the multitude of diverse properties inherent to each protein: isoelectric point (pI), molecular weight (MW), etc.


A particularly problematic property is the hydrophobicity of certain proteins, especially membrane proteins. Another obstacle is the variation in protein concentrations, which spans more than 12 orders of magnitude in body fluids and up to 7 in cells. No single method can therefore provide a full analysis of a complex sample, especially given this range of concentrations and the absence, for proteins, of an amplification process comparable to the polymerase chain reaction (PCR). For these reasons, the critical step in proteomic studies is the use of efficient pre-fractionation, concentration and separation techniques before the characterization step.

1.2.1.1 Electrophoretic separation

Two essential techniques are currently used to separate proteins according to their physico-chemical properties:

• Isoelectric focusing (IEF): separation as a function of the pI of proteins
• SDS-PAGE: separation as a function of the MW of proteins

Each of these techniques can be used on its own. However, when coupled as two-dimensional gel electrophoresis (2D-GE), they become much more efficient in terms of separation and can be applied to the analysis of hundreds to thousands of proteins in the same sample.
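The first of these properties, the pI, is the pH at which the net charge of a protein or peptide is zero, and it can be estimated from the amino acid sequence. The sketch below sums Henderson-Hasselbalch charge contributions and finds the zero crossing by bisection. The pKa values are approximate, illustrative assumptions (published tables differ), the function names are hypothetical, and effects such as post-translational modifications or neighbouring residues are ignored.

```python
# Approximate pKa values, chosen for illustration only (assumption).
POSITIVE_PKA = {"K": 10.8, "R": 12.5, "H": 6.5}
NEGATIVE_PKA = {"D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}
N_TERM_PKA = 8.6
C_TERM_PKA = 3.6


def net_charge(sequence, ph):
    """Net charge at a given pH, summing each ionisable group separately."""
    charge = 1.0 / (1.0 + 10 ** (ph - N_TERM_PKA))    # free N-terminus
    charge -= 1.0 / (1.0 + 10 ** (C_TERM_PKA - ph))   # free C-terminus
    for residue in sequence:
        if residue in POSITIVE_PKA:
            charge += 1.0 / (1.0 + 10 ** (ph - POSITIVE_PKA[residue]))
        elif residue in NEGATIVE_PKA:
            charge -= 1.0 / (1.0 + 10 ** (NEGATIVE_PKA[residue] - ph))
    return charge


def isoelectric_point(sequence, tolerance=1e-4):
    """Estimate the pI by bisection between pH 0 and pH 14.

    Bisection works because the net charge decreases monotonically with pH.
    """
    low, high = 0.0, 14.0
    while high - low > tolerance:
        mid = (low + high) / 2.0
        if net_charge(sequence, mid) > 0:
            low = mid    # still positively charged: the pI lies at higher pH
        else:
            high = mid
    return (low + high) / 2.0
```

Under these assumptions a lysine-rich peptide is assigned a high pI and an aspartate-rich peptide a low pI, which is the property IEF exploits to separate them.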

1.2.1.1.1 SDS-PAGE

The separation of proteins according to their MW is probably the oldest [5, 6] and the most widely used separation technique. To make the proteins migrate according to their molecular mass only, they are mixed with an anionic detergent such as sodium dodecyl sulfate (SDS), with which they form negatively charged micelles. The SDS interacts with the backbone of the previously unfolded and denatured protein at a ratio of 1.4 grams of SDS per gram of protein, thus giving the same amount of SDS per protein on a weight basis. The actual separation requires a polyacrylamide matrix obtained by copolymerization of acrylamide and a cross-linker such as piperazine diacrylamide. The concentration of this solution determines the pore size of the gel. This is why the technique is called SDS-PAGE: sodium dodecyl sulfate polyacrylamide gel electrophoresis. Furthermore, the cross-linker can be distributed homogeneously along the migration axis or form a gradient [7], although the latter is much more difficult to obtain. The gradient is such that large pores are found at the beginning of the gel, facilitating the entrance of the proteins. The pores then tighten in the direction of the electric field, allowing a wider range of proteins to be resolved.

1.2.1.1.2 Isoelectric focusing

Isoelectric focusing (IEF) is a separation technique for proteins or polypeptides based on the amphoteric properties that determine their pI. When migration takes place in a liquid medium with a pH gradient, the net charge of each protein changes as it moves along the gradient until it becomes zero. At this point the protein has reached its pI and no longer migrates in the electric field. The anode region is at a lower pH than the cathode, and the pH range is chosen so that the pI values of the proteins to be separated fall within it. The pH gradient is established by adding carrier ampholytes (CA).


These ampholytes are a mixture of engineered molecules which have particular pK and establish a pH gradient when mixed. These ampholytes can be found commercially, spanning either a wide pH range (e.g. pH 3 - 10) or narrow range (e.g. pH 5.3 - 6.5). Initially, the CAs were simply added to the buffer. Nowadays immobilized pH gradients (IPG's) are used, where thousands of ampholytes are linked to an acrylamide or an agarose gel to form a continuum of pIs thus creating a pH gradient. Therefore, when in an electric field, proteins become focused into sharp stationary bands with each protein positioned at a point corresponding to its pI in the pH gradient. The current use of IEF is as an analytical technique for the complexity or purity assessment of protein samples. However, as with SDS-PAGE, if the necessary quantity of pure protein is only of a few micrograms, even analytical scale electro-focusing may be sufficient to prepare it. Analytical IEF is carried out either in vertical polyacrylamide rod gels (especially as the first-dimensional separation of two-dimensional gel electrophoresis) or in horizontal polyacrylamide or agarose slab gels. Horizontal IEF is now the most widely used technique for electro-focusing. Thin layers of polyacrylamide or agarose gel containing carrier ampholytes chosen to give the correct pH range can be made in the laboratory or found commercially, dried up for storage and mounted on glass or plastic plates for easier manipulations. The samples can be applied to the gel by rehydration using a reswelling tray [8]. The advantage of such strips is that the polymerized CA stay in the gel and do not contaminate the samples. The focusing conditions depend on the size of the strip, the pore size of the gel, as well as on the pH gradient. Therefore, agarose gels would be chosen in preference to polyacrylamide gels for the separation of larger proteins because of their larger pore size. 
Gel thickness also affects the focusing: thin gels allow shorter running times (since higher voltages can be used) and improved resolution. The biggest advantage of IEF is its excellent resolution: it can resolve proteins that differ by as little as 0.01 pI unit [9]. In other words, proteins differing by a single net charge can be separated. Another advantage is the high loading capacity of IPG strips, which allows large amounts of sample to be loaded for the detection of low-abundance proteins.
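To illustrate how a protein's focusing position follows from its sequence, the sketch below estimates a pI by searching for the pH at which the net charge is zero, using the Henderson-Hasselbalch equation. It is only an approximation: the pK values are illustrative assumptions (published sets differ), and the function names `net_charge` and `isoelectric_point` are hypothetical helpers, not part of any tool used in this work.

```python
# Approximate pK values for ionisable groups (illustrative assumptions;
# published tables differ and real pKs depend on the local environment).
PKA_POSITIVE = {"Nterm": 9.0, "K": 10.5, "R": 12.5, "H": 6.0}
PKA_NEGATIVE = {"Cterm": 3.1, "D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}


def net_charge(sequence, ph):
    """Net charge of a polypeptide at a given pH (Henderson-Hasselbalch)."""
    charge = 0.0
    # Groups carrying +1 when protonated
    for group, pka in PKA_POSITIVE.items():
        count = 1 if group == "Nterm" else sequence.count(group)
        charge += count / (1.0 + 10 ** (ph - pka))
    # Groups carrying -1 when deprotonated
    for group, pka in PKA_NEGATIVE.items():
        count = 1 if group == "Cterm" else sequence.count(group)
        charge -= count / (1.0 + 10 ** (pka - ph))
    return charge


def isoelectric_point(sequence, lo=0.0, hi=14.0, tol=1e-4):
    """Bisection search for the pH at which the net charge is zero."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if net_charge(sequence, mid) > 0:
            lo = mid  # still positively charged: pI lies at higher pH
        else:
            hi = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    for seq in ("KKKK", "DDDD", "ACDEFGHIKLMNPQRSTVWY"):
        print(seq, round(isoelectric_point(seq), 2))
```

The bisection works because the net charge decreases monotonically as the pH increases, which is also why each species stops migrating at a single position in the gradient.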

1.2.1.1.3 Staining

Separating proteins according to their physico-chemical properties is important for all proteomic studies, but the separation would be useless without protein visualization, which directly influences detection and characterization. Visualization is also an important step in sample preparation, as it provides a quality control. The best way to locate the proteins on a gel is to stain it using metallic ions (silver, zinc) or organic dyes such as Coomassie brilliant blue (CBB), SYPRO or Amido-Black. Other methods are also available to detect spots, such as radio-isotope labelling, antibody labelling, immuno-fluorescence, UV light absorbance and affinity tagging. Each method has its advantages and weak points, but only organic dye and metallic ion staining will be discussed here, as they are the only ones relevant to shotgun IEF, in particular Coomassie brilliant blue R250 (CBR250) and silver staining.

1.2.1.1.3.1 Coomassie staining

Like all organic dyes, CBR250 works by binding to the basic residues of the polypeptide chain, such as lysine, arginine or histidine. This binding results from the interaction of the dye's sulphonate groups with the side chains of these residues, such as the ε-amino group of lysine. As the proteins are folded into their tertiary structure, staining occurs at the basic sites on the surface of the protein and is directly proportional to their number [10]. Hydrophobicity also plays a role in the amount of staining, as hydrophobic interactions occur between the aromatic groups of the stain and the hydrophobic parts of the protein [11]. In extreme cases the staining can be due to aggregation of the dye on a single basic position at the surface of the protein [12]. Coomassie brilliant blue is considered a relatively sensitive dye: its limit of detection is 50-100 ng, but for quantitative purposes it is limited to 20 µg. The advantages of this staining method are that it is quite reproducible and shows relatively good linearity, so it can be used for relative quantitation given a fairly accurate calibration curve and a good densitometer. It cannot, however, be used for absolute protein quantitation, because of the large variability in staining intensities observed for different proteins [13]. Furthermore, CBR250 staining can be coupled to mass spectrometry for protein characterization, as the gel is easy to destain.

1.2.1.1.3.2 Silver staining

Silver staining is a widely used protein visualization technique, although it is considered a denaturing method. The mechanism underlying this stain is not fully understood, but it relies on the ability of the carboxylic and sulfhydryl groups of proteins to bind silver ions, which are then reduced to metallic silver. This reduction is visible to the eye as a brownish-black metallic stain at the position of the focused protein on the gel. Silver is a much more sensitive stain than CBB: proteins remain visible at amounts of 1-10 ng when glutaraldehyde is used as a sensitizer and cross-linking agent [14] that binds together the amino groups of the proteins. Despite this higher sensitivity, staining with glutaraldehyde yields mixed and unidentifiable peptides [15] and is incompatible with mass spectrometry. To be compatible, the staining must be carried out without glutaraldehyde [16, 17]. Compared to CBB, the major drawback of silver staining is that the contrast of the gel image is very low, which sometimes makes it difficult to localize some proteins. Furthermore, it is impossible to perform any type of quantitation by colorimetry or densitometry.

1.2.1.1.4 One-dimensional versus two-dimensional SDS-PAGE

As previously stated, the two most commonly used separation techniques are IEF and SDS-PAGE, both of which draw their strength from the physico-chemical properties of the proteins. Each method is powerful on its own, but combining them into a two-dimensional separation drastically increases the resolving power, so that thousands rather than hundreds of proteins can be resolved in a single experiment, allowing the major proteins in a sample to be isolated. Hence, the proteins of related samples whose levels are to be compared can each be separated in a single run. By convention, the first dimension is the IEF, which separates the proteins according to their isoelectric point (net charge) and corresponds to the x axis. The second dimension, corresponding to the y axis, is the SDS-PAGE, which separates the proteins according to their MW. The result is an array of protein spots, or map. Like a geographical map, a 2D gel can be read to determine the x and y coordinates of each protein. Compared with a standard 1D gel, each band of the latter is spread out in the second dimension, revealing all the proteins contained in that band (Figure 1).

Figure 1. 1-DE SDS-PAGE (left) and 2-DE PAGE (right) of Staphylococcus aureus (strain N315) (courtesy of Swiss-2D-PAGE)

The technique is particularly powerful in comparative proteomics, when comparing related samples such as healthy versus diseased tissue. Comparative 2D-PAGE can also be used to search for proteins with similar expression under the same set of conditions (these may have related functions) and to identify proteins produced in response to drug therapy (these may be responsible for drug-related side effects). In combination with mass spectrometry, the proteins can also be identified. Despite the great effectiveness of 2D gels in the analysis of complex biological samples [18], their major drawbacks remain their cost and the fact that they are rather time-consuming. Another major problem is the difficulty of obtaining good gel-to-gel reproducibility for diagnostic purposes.

1.2.1.2 Transblot

Although gel separation of proteins is efficient and powerful, the analysis of the proteins after migration and the storage of the gels remain major concerns. When trapped in a gel matrix such as agarose or polyacrylamide, the proteins are poorly accessible, so interactions with the chemical reagents used to detect, analyse or modify them are limited. Another problem is that PAGE gels are fragile and break easily. They can also dry out, making protein extraction very difficult and hazardous, even though it has been demonstrated that proteins could be successfully identified from a gel after 8 years of dried storage [19]. A good solution is to extract the proteins from the gel matrix and transfer them to a solid support. Though many transfer methods exist, only a few surfaces are capable of capturing the proteins. These are called capture membranes, the most widely used being polyvinylidene difluoride (PVDF) and nitrocellulose membranes. Capture membranes are made of a porous matrix that lets the transfer buffer ions pass through but binds the proteins to its surface through hydrophobic interactions, ionic bonds or covalent bonds. Three techniques can be used to carry out this transfer: tank electrophoresis, semi-dry electrophoresis and diffusion. The two electrophoresis techniques are the most widely used because they are active transfer processes, in which an electric field drives the proteins/peptides from the gel to the capture membrane.

Nevertheless, the third method is the one used for SSIEF. Diffusion is considered a passive process and is based on the diffusion technique used for DNA or RNA blotting [20-22]. The gel is laid directly onto the capture membrane, which itself lies on a blotting paper soaked with transfer buffer. The whole stack is then placed on dry absorbent paper so that the buffer migrates and, by capillarity, blots the peptides and proteins from the gel onto the membrane (Figure 2). This transfer can be done either by upward migration of the buffer or by downward capillarity, taking advantage of gravity alone to pull the sample out of the gel. The technique usually needs a few hours to give a satisfactory transfer rate, but it can be accelerated by placing the whole stack in a centrifuge [23, 24].

Figure 2. Transfer by capillarity. The IPG strip is laid directly onto the capture membrane. The whole is placed onto blotting paper soaked with transfer buffer to allow passive transfer of the peptides using the gravitational force and diffusion.

Like a gel, the capture membrane can be stained in a quick step before further investigation, in order to visualise the transferred proteins and their general pattern. The most widely used dye for membrane staining is Amido-Black, but it is not as sensitive as silver staining.

1.2.1.3 Liquid chromatography

In liquid chromatography (LC), the separation is, as in IEF and PAGE, based on the physico-chemical properties of the proteins. Different techniques exist, each exploiting a different property of the analytes: ionic charge, MW, hydrophobicity, etc. Ion exchange chromatography [25-27] takes advantage of the ionic charge of the proteins and is particularly good for studying post-translational modifications. Size exclusion chromatography [28, 29] exploits the MW of proteins and can also be used as a sample desalting step [22] in purification processes. Reverse-phase liquid chromatography (RPLC) [30-34], which exploits the hydrophobicity of each protein or polypeptide, is the most frequently used in laboratories. The idea behind this technique is to separate the proteins and peptides by elution in a gradient of organic solvent. The side chains of the amino acids determine the hydrophobicity of the polypeptide chain. Peptides rich in hydrophilic residues (Arg, Tyr, Lys …) require low concentrations of organic solvent to be eluted, whereas peptides rich in hydrophobic residues (Ile, Leu, Pro, Val, Trp …) are released only at much higher concentrations.

Like the basic gel separation techniques, RPLC can be used in multidimensional chromatography (MudPIT). Several liquid chromatography methods can be applied in series to improve protein separation, for example RPLC as a desalting step after ion exchange LC [30]. Furthermore, RPLC has the advantage of being compatible with mass spectrometry, since the eluting solvent does not interfere with MALDI-MS [31] or ESI-MS [25]. A multitude of techniques is thus available for the separation of proteins, to be chosen according to the amount of sample and its complexity. They all tend towards fully integrated methods.

1.2.2 Mass spectrometry

Mass spectrometry (MS) is a complex but powerful analysis technique which has matured over many years and has now surpassed and replaced the Edman sequencing procedure [35]. It is considered an indispensable tool in most areas of proteomics and is used on a daily basis in three major biotechnological fields. First, MS is employed in the identification of proteins. Second, it is used for the characterization and quality control of recombinant proteins and other macromolecules. Finally, it is used for the characterisation of post-translational modifications. All MS instruments produce ions and separate them according to their mass-to-charge ratio (m/z). This separation is achieved by generating an electric or magnetic field inside the instrument. The instrumentation consists of three essential components: an ion source, where molecules are ionized; a mass analyser, where ions are separated according to their m/z ratio; and a detector, where the ions collide at the end of their journey. The MS data are then recorded as numerous spectra, displaying ion abundance as a function of m/z, and processed with a large panel of bioinformatics tools for the visualisation, identification and characterisation of proteins. Many different methods have been tried for all three components, but only a few have been retained. There are two preferred ionisation sources: electrospray ionisation (ESI) and matrix-assisted laser desorption/ionization (MALDI). Although both produce gas-phase ions, which is the basis of mass spectrometry, each source is based on a very different principle. ESI introduces the sample into the mass spectrometer in a liquid phase under atmospheric pressure, producing a constant stream of ions. MALDI ionizes molecules from a solid phase, typically deposited on a metal plate under vacuum, by irradiating them with a pulsed laser. 
Although larger proteins/peptides can be analyzed by ESI-MS, the electrospray source is much less tolerant of salts and detergents. The analyser is the part where the separation takes place, in an electric and/or magnetic field produced to deflect ions from their original trajectory. Although the geometry of the analysers has been continuously improved to increase resolution, all rely on one of three principles: separation based on the time-of-flight (TOF) of ions accelerated in a flight tube; separation by an electric field created by four parallel rods, the quadrupole (Q); and the selective separation of ions retained by an electric or magnetic field in an ion trap. As a MALDI source produces short pulses of ions, it is preferably coupled to a TOF analyser, whereas an ESI source is usually coupled to a quadrupole or ion trap analyser. Furthermore, analysers can be coupled together to perform tandem mass spectrometry (MS/MS), provided the selected peptides are fragmented before they enter the second analyser.

1.2.2.1 Sample preparation for mass spectrometry

1.2.2.1.1 Sample clean-up

The advantage of MS is that it can analyse any and all components of a sample equally well. This is also a disadvantage, as any chemical present in the sample can potentially be ionized and detected. All non-volatile, non-biological material in the sample, such as the chemical agents used to prepare the proteins and peptides for analysis, can therefore produce a signal that interferes with that of the proteins and peptides. It is thus generally better to process the sample through a clean-up step in order to remove certain reagents. There are three categories of such components. Salts and buffers are mainly metal salts, such as K+ or Na+, but can also be charged organic molecules; they compete with the peptides and massively suppress the signal. Chaotropic agents, such as urea or guanidine hydrochloride, behave much like salts and buffers. Detergents are charged, zwitterionic or non-ionic reagents, such as CHAPS, Triton X-100, NP-40 or polyethyleneglycols, which can become highly charged in the MS source and suppress the analyte signal. ESI is very sensitive to all these compounds, and sample clean-up by liquid chromatography is considered mandatory. A number of groups simultaneously developed methods for the clean-up of small quantities of peptide mixtures [36-39]. These approaches all use reverse-phase material to bind the peptides, so that salts, buffers and other polar gel-related contaminants can be washed away or at least considerably reduced in concentration.

1.2.2.1.2 Trypsin

The direct identification of uncleaved proteins is limited. Current MS-based proteomic strategies rely primarily on the digestion of gel-separated proteins into peptides. Protein bands are excised from the gel and subjected to reduction, alkylation and enzymatic digestion using a sequence-specific protease such as trypsin. The objective of these steps is to obtain sufficient enzymatic or chemical cleavage to extract peptides from the gel in a form that is directly compatible with MS analysis, as extracting whole proteins from the gel is much more difficult and labour-intensive. Trypsin (EC 3.4.21.4) is considered a highly specific endoproteinase of the serine peptidase family, since it predominantly cleaves the peptide bond between the carboxyl group of arginine or lysine and the amino group of the adjacent amino acid, even though unspecific cleavages do occur. The cleavage rate is lower when the Lys and Arg residues are next to acidic amino acids, and cleavage does not normally occur when they are followed by proline. Trypsin also has advantages over other proteases: it generates limited autolysis products and produces peptides typically between 500 and 2500 Da, a range compatible with MS analysis. Because trypsin does auto-digest to some extent, the trypsin-derived peptides can be used for internal mass calibration, but they can also suppress ionization or obscure target peptides by overlapping with them if they are in large excess over the peptides of interest.
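The cleavage rule described above (cleavage after Lys or Arg, but not when the next residue is Pro) can be sketched in a few lines. This is a simplified illustration only: the function name `tryptic_digest` and the `missed_cleavages` parameter are hypothetical, and the reduced cleavage rate next to acidic residues and the unspecific cleavages mentioned in the text are not modelled.

```python
def tryptic_digest(sequence, missed_cleavages=0):
    """In silico tryptic digest of a one-letter amino acid sequence.

    Cleaves C-terminal to K or R unless the following residue is P.
    Peptides spanning up to `missed_cleavages` uncleaved sites are included.
    """
    # Step 1: split the sequence at every allowed cleavage site
    fragments, start = [], 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        if residue in "KR" and next_residue != "P":
            fragments.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        fragments.append(sequence[start:])  # C-terminal peptide

    # Step 2: join neighbouring fragments to simulate missed cleavages
    peptides = []
    for i in range(len(fragments)):
        for extra in range(missed_cleavages + 1):
            if i + extra < len(fragments):
                peptides.append("".join(fragments[i:i + extra + 1]))
    return peptides


if __name__ == "__main__":
    # The R-P bond is left intact, so "RPGGR" stays in one piece
    print(tryptic_digest("AAKRPGGRCCK"))
    print(tryptic_digest("AAKRPGGRCCK", missed_cleavages=1))
```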

1.2.2.2 MALDI-TOF MS

In 1988, Karas and Hillenkamp [40] proposed a new strategy for the analysis of biological molecules with a MW greater than 10 kDa. MALDI-MS analysis is performed on samples that have been mixed with a matrix, resulting in co-crystallisation of the analyte.

When the laser is fired at the sample, the matrix absorbs the energy carried by the laser and gas-phase ions of the peptides are produced along with matrix-related ions. Such matrices are advantageous as they protect the analytes from the high-energy source (UV laser). The crystallisation step is crucial in the MS process, as the speed of solvent evaporation has a direct influence on the MALDI-MS signal. The compounds most frequently used as matrix are α-cyano-4-hydroxycinnamic acid (CHCA) for peptide mixtures below 5 kDa [41], and 2,5-dihydroxybenzoic acid (gentisic acid) and trans-3,5-dimethoxy-4-hydroxycinnamic acid (sinapinic acid) for protein analysis [42]. Figure 3 shows a schematic of the instrument combining the MALDI source and the TOF analyser. Samples are deposited onto a metal plate, the MALDI target, typically capable of holding between 1 and 192 analyte spots.

Figure 3. Schematic of the MALDI-TOF process and instrument. (A) A sample co-crystallized with the matrix is irradiated by a laser beam, leading to sublimation and ionization of the peptides. (B) About 100-500 ns after the laser pulse, a strong acceleration field is switched on (delayed extraction), which imparts a fixed kinetic energy to the ions produced by the MALDI process. These ions travel down a flight tube and are reflected in a mirror, or reflector, to correct for initial energy differences. The mass-to-charge ratio is related to the time it takes an ion to reach the detector (time-of-flight (TOF)); the lighter ions arrive first. The ions are detected by a detector such as a channeltron electron multiplier.

The relevant spots are then irradiated by a laser pulse, generating a short burst of gas-phase ions. The ions are accelerated to a fixed kinetic energy and travel down a flight tube. Small ions have a higher velocity and reach the detector before larger ions, producing the TOF spectrum. Hundreds of laser shots are fired and averaged to produce the final MALDI spectrum. MALDI-MS instruments typically achieve a mass accuracy in the range of a few parts per million (ppm). Furthermore, the amount of peptide material that needs to be deposited on the target to produce a good signal is typically in the picomole to femtomole range.
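The relation between flight time and m/z can be made concrete with a small calculation. If every ion receives the same kinetic energy z·e·U, then z·e·U = ½·m·v² and the drift time over a tube of length L is t = L·sqrt(m / (2·z·e·U)). The sketch below assumes an idealised linear, field-free tube; the 20 kV acceleration voltage and 1 m tube length are illustrative values rather than the settings of any instrument used here, and delayed extraction and the reflector are ignored.

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19  # coulomb
DALTON = 1.66053906660e-27           # kilogram


def flight_time(mz, accel_voltage=20e3, tube_length=1.0):
    """Drift time in seconds of an ion with the given m/z (Da per charge).

    Idealised linear TOF: all ions gain the kinetic energy z*e*U,
    so v = sqrt(2*e*U / (m/z)) and t = L / v.
    """
    velocity = math.sqrt(2 * ELEMENTARY_CHARGE * accel_voltage / (mz * DALTON))
    return tube_length / velocity


if __name__ == "__main__":
    for mz in (500.0, 1000.0, 2000.0, 4000.0):
        print(f"m/z {mz:7.1f}: {flight_time(mz) * 1e6:6.2f} microseconds")
```

Because the flight time scales with the square root of m/z, doubling the mass lengthens the flight time by a factor of about 1.41, which is why lighter ions arrive first.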

1.2.2.3 Tandem mass spectrometry

Although MS technology is very powerful and essential for all proteomics studies, it provides only the m/z ratio of each peptide and no information about its sequence. For this reason, the peptides need to be fragmented so that the sequence can be reconstructed. Tandem mass spectrometry, generally termed MS/MS, is the linking of two or more mass analyzers in the same spectrometer. The basic principle of MS/MS is to select, one m/z at a time, a given ion formed in the ion source, called the precursor ion, and to introduce this precursor into a chamber for fragmentation, usually by collision with a neutral, inert gas such as argon. This is called collision induced dissociation (CID). Once fragmentation has taken place, the fragments pass through a second analyzer and the product ions are detected. This is therefore a powerful way of confirming the identity of certain compounds and determining the structure of unknown species [43]. The principle of CID is to deposit energy onto the mass-selected peptide, which is normally stable, in order to break its peptide bonds by particle collision and thus fragment the peptide into pieces of different lengths. During the collision, kinetic energy from the colliding particles is converted into internal energy, causing dissociation into fragment ions. The charged species, i.e. the precursor, generally has a high translational energy, whereas the neutral species, i.e. the collision gas, has a much lower energy. To achieve good dissociation, the necessary energy can be provided by a single collision at high energy or by multiple collisions at lower energy. High energy fragmentation can be obtained by MALDI-TOF/TOF, whereas low energy fragmentation is obtained in quadrupole, Q-TOF and ion trap instruments. Depending on the arrangement of the analyzers and the collision cell, tandem mass spectrometry can be done in two different ways: tandem in time and tandem in space. 
Tandem in time means that the precursor selection, the dissociation and the fragment separation take place in the same space. Such MS/MS can be performed by instruments such as 3D-IT or FT-ICR. Tandem in space means that the precursor selection, the dissociation and the fragment separation, take place in different sections. This can be performed in instruments like the triple-quadrupole (3Q), the Q-TOF, the TOF/TOF or the IT-TOF.

1.2.2.4 Other mass spectrometry techniques

1.2.2.4.1 Electrospray ionization

Electrospray was developed by Fenn and his co-workers in the late eighties and is nowadays the most widely used technique in proteomics and analytical chemistry [44]. ESI sources exist in different configurations optimized for different flow rates, giving rise to various techniques such as microspray [45] or nanospray [46]. Unlike MALDI, in ESI-MS the sample enters the source in a liquid phase. It generally comes from a capillary and is usually in an acidic buffer to favour protonation. A high electric potential is applied to the analyte flow at the exit of the capillary, while a much lower potential above ground is applied to the rest of the interface. An excess of positive charge builds up on the surface of the liquid exiting the capillary, giving rise to coulombic repulsion. The surface tension is overcome by this repulsion, as well as by the intensity of the surrounding electric field, so that the surface of the liquid begins to expand. This leads to the formation of a so-called Taylor cone from which, if the voltage is high enough, a fine jet of liquid is produced at the tip of the capillary. The liquid jet is not stable and breaks down into droplets carrying a high charge density on their surface.

The droplets undergo multiple cycles of “Coulombic explosions”, eventually yielding desolvated analyte molecules. Whereas MALDI produces mostly singly charged ions, ESI is conducive to the formation of multiply charged ones. This is an important advantage of ESI: as mass spectrometers measure the m/z ratio, much larger molecules can be observed with an instrument having a relatively low mass range. Another advantage over MALDI is that, being a continuous ionisation method operating in a liquid phase and under atmospheric pressure, ESI can be directly coupled to analyte separation by liquid chromatography or capillary electrophoresis, so that molecules can be analyzed as they elute from a column. The only drawback compared to MALDI is that once the sample has passed through the spray it is consumed, and no additional analysis is possible. Separating analytes in time can greatly increase the dynamic range of the analysis and allow direct detection of low-abundance components. Therefore, in an LC-MS set-up, sample clean-up, separation and concentration can all be achieved in a single step.
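The benefit of multiple charging can be illustrated numerically. Assuming that each charge comes from one added proton (consistent with the acidic buffer mentioned above), an analyte of neutral mass M carrying z charges is observed at m/z = (M + z × 1.007276) / z. The sketch below lists this series for a protein; the function name `mz_series` and the example mass of 16 950 Da are illustrative assumptions.

```python
PROTON_MASS = 1.007276  # Da


def mz_series(neutral_mass, max_charge):
    """m/z of the [M + zH]z+ ions for z = 1 .. max_charge."""
    return {z: (neutral_mass + z * PROTON_MASS) / z
            for z in range(1, max_charge + 1)}


if __name__ == "__main__":
    # Hypothetical 16 950 Da protein: the singly charged ion lies far outside
    # a 2000 m/z mass range, but the higher charge states fall inside it.
    for z, mz in mz_series(16950.0, 20).items():
        flag = "within range" if mz <= 2000 else ""
        print(f"z = {z:2d}  m/z = {mz:10.3f}  {flag}")
```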

1.2.2.4.2 Quadrupoles and ion traps

As previously stated, the analyser of choice for a MALDI source is the TOF. Electrospray, which produces a constant ion flow, is most often combined with quadrupole and ion trap MS. In a quadrupole, ions are separated by the electric field created by four parallel rods. Ideal rods have a hyperbolic cross-section, but they are often cylindrical for economic reasons. Opposite rods are electrically interconnected. The quadrupole acts as a mass filter through which only ions of a certain m/z ratio are allowed to pass. The filtering is effected by applying an oscillating electric field between the rods. Inside this field, ions describe complex trajectories and only those with stable trajectories travel along the quadrupole and reach the detector. As this oscillating field is varied, ions are sequentially ejected one m/z at a time. Such an analyser can also provide both MW and structural analysis of biological macromolecules. For the acquisition of MS/MS spectra in an ion trap, a single precursor ion is selected by ejecting all ions of higher and lower m/z. The trapped ion is then energised and made to collide with a neutral gas in the collision cell. Other MS instruments exist, such as three-dimensional ion traps (3D-IT), linear ion traps, Fourier-transform ion-cyclotron resonance (FT-ICR) [47] and Orbitrap [48]. The first two are based on the same principle as the ion trap, but the last two are based on much more complicated physics.

1.2.3 Protein identification

What makes a protein unique is its primary amino acid sequence, even if sequences with small deviations still refer to the same protein. Initially, identifying an unknown protein required determining its complete sequence, for example by Edman sequencing. Today, as many genomes have been fully sequenced, the identification of proteins is considerably simplified if the protein being analyzed is in a sequence database. In this case the identity is established by simple correlation between experimental data and the sequence database. Furthermore, the increasing speed of genome sequencing, associated with powerful gene-detection algorithms that permit the theoretical prediction of protein sequences, makes it possible to perform high throughput proteomic analysis. In April 2007, 578 complete virtual proteomes were available, divided into 45 eukaryote proteomes and 533 microbial proteomes (eukaryote: http://www.ebi.ac.uk/proteome/ and microbial: http://www.expasy.org/sprot/hamap).

Furthermore, the protein knowledgebase SwissProt contains 264’492 annotated protein entries (Release 52.3 of 17-Apr-2007, http://www.expasy.org/sprot/) while the translated nucleotide sequence database TrEMBL contains 4’269’768 protein sequences (Release 35.3 of 17-Apr-2007, http://www.expasy.org/sprot/). Although the amino acid sequence of each protein is specific, its mass is not: it can shift significantly due to post-translational modifications (PTMs). However, if a protein is treated with an enzyme that cleaves it at specific predefined sites, such as trypsin, the masses of the resulting peptides are highly specific, even though a few are affected by PTMs. Mass spectrometry is the method of choice for measuring the masses of these peptides, as it is fast and provides precise values; the level of precision depends on the instrumentation used. Two types of MS data have been used for protein identification by correlation with sequence databases:

• The accurate mass of peptides (to within 5-10 ppm) derived by specific cleavage of the isolated protein

• CID spectra from individual peptides isolated after proteolysis of the target protein

Whereas the peptide masses are used for protein identification by peptide mass fingerprinting (PMF), CID spectra are used for MS/MS protein identification methods. Furthermore, if the sequence of the protein of interest is not contained in a sequence database it has to be determined by de novo sequencing, which is slower, less sensitive and requires more operator input.

1.2.3.1 Peptide mass fingerprinting

Along with the development of mass spectrometry, which allows the accurate measurement of peptide masses, a number of groups [49-52] independently described algorithms for correlating the collection of peptide masses generated by digestion of a pure protein with sequence databases. This technique now uses quite sophisticated algorithms and is referred to as PMF. It can be described simply as a method for identifying proteins within a sequence database, using an algorithm to match a set of peptide masses, generated experimentally from the protein of interest with specific cleavage reagents (either chemical or enzymatic), against theoretical peptide masses calculated from each sequence entry in the database. This match is performed using in silico (i.e. virtual, by computer) cleavage, as if the database sequences had been cleaved with the same specificity as the reagent used in the experiment. A ranking score is then calculated to provide a measure of the fit between the observed and the expected peptide masses. A great number of search programs and algorithms can be found on the internet, and links to most of them are available on the Matrix Science web page (http://www.matrixscience.com/links.html). The analysis of the results of PMF experiments is not always straightforward. In most cases, some input peptide masses will not match the expected peptide masses of the highest ranking result. Mass accuracy and cleavage specificity of the endopeptidase are major requirements for PMF. The most commonly used peptidase for peptide-mass searching is trypsin because it is highly specific (see section 1.2.2.1.2).

However, like most proteolytic enzymes, trypsin occasionally cleaves at other sites or does not cleave quantitatively. While so-called “missed-cleavage” sites are often included as a parameter in PMF search programs, non-specific cleavages are more difficult to predict and to accommodate in the search algorithms. In addition, PTMs and artefactual modifications can greatly complicate the MS spectra and therefore the protein identification.
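The PMF procedure described above (in silico tryptic cleavage, theoretical mass calculation, matching and ranking) can be illustrated with a minimal Python sketch. It makes several assumptions that are not part of the original text: the function names are hypothetical, the observed masses are taken to be neutral monoisotopic masses, the tolerance is expressed in ppm, and the ranking score is a plain count of matched masses rather than the scoring scheme of any real search engine.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # mass of H2O added to the summed residues


def tryptic_digest(sequence, missed_cleavages=0):
    """Cleave after K or R, except when the next residue is P."""
    pieces, start = [], 0
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and next_aa != "P":
            pieces.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        pieces.append(sequence[start:])
    # Join neighbouring pieces to model missed-cleavage sites.
    peptides = []
    for i in range(len(pieces)):
        last = min(i + missed_cleavages, len(pieces) - 1)
        for j in range(i, last + 1):
            peptides.append("".join(pieces[i:j + 1]))
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def count_matches(observed, theoretical, tol_ppm=50.0):
    """Number of observed masses within tol_ppm of a theoretical mass."""
    return sum(
        any(abs(m_obs - m_th) / m_th * 1e6 <= tol_ppm for m_th in theoretical)
        for m_obs in observed
    )


def rank_proteins(observed, database, missed_cleavages=1, tol_ppm=50.0):
    """Rank database entries by number of matched masses (toy score)."""
    scores = []
    for name, seq in database.items():
        theo = [peptide_mass(p) for p in tryptic_digest(seq, missed_cleavages)]
        scores.append((count_matches(observed, theo, tol_ppm), name))
    return sorted(scores, reverse=True)


toy_db = {"protA": "AAKPGRCC", "protB": "GGKWWR"}  # invented sequences
observed = [peptide_mass("AAKPGR"), peptide_mass("CC")]
ranking = rank_proteins(observed, toy_db)
```

In this toy example the K-P bond of protA is not cleaved, so its digest yields AAKPGR and CC, and protA ranks first. Non-specific cleavages and modifications, which the text identifies as the difficult cases, are deliberately not modelled.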

1.2.3.2 De novo sequencing

Although it is an efficient technique in most cases, PMF characterizes each peptide by only one attribute, its mass value, which is often not sufficient for unequivocal protein identification. A single mass value does not reveal much about the protein or peptide sequence. However, other attributes, such as the ions from internal sequences of the peptides obtained by successive MS fragmentation, may give better hints for protein identification. If a sample is analyzed by MS/MS and yields multiple good-quality spectra that still fail to match any database entry, de novo sequencing can be used. Algorithms similar to those of PMF are also used for MS/MS ion search. All proteins contained in a database are digested in silico to find the matching parent peaks. These theoretical parent peaks are then fragmented in silico and compared to experimental patterns [53]. Correlation of the theoretical and experimental fragments determines the discrimination score.
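The in silico fragmentation step of an MS/MS ion search can be sketched as follows. This is an illustration only: it assumes singly charged b- and y-type ions from CID, and it uses a simple shared-peak count as a stand-in for the discrimination score, which is not the scoring of any particular program. All function names are hypothetical.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def fragment_ions(peptide):
    """Return m/z lists of singly charged b and y ions.

    b ions are listed from b1 upwards; y ions are listed from the
    longest fragment down to y1 (same cleavage order as the b ions).
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    for i in range(1, len(masses)):
        b_ions.append(sum(masses[:i]) + PROTON)
        y_ions.append(sum(masses[i:]) + WATER + PROTON)
    return b_ions, y_ions


def shared_peak_count(experimental, theoretical, tol=0.3):
    """Count experimental peaks lying within tol (Da) of a theoretical ion."""
    return sum(
        any(abs(e - t) <= tol for t in theoretical) for e in experimental
    )


b, y = fragment_ions("GAK")
score = shared_peak_count([58.03, 147.11, 300.0], b + y)
```

For a tryptic peptide ending in lysine, the y1 ion appears at m/z 147.11, which is why the last element of `y` is close to that value.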

1.2.3.3 LC-MS/MS

Liquid chromatography coupled to tandem mass spectrometry, termed LC-MS/MS, is a powerful method for analyzing proteins and peptides. This technique combines the efficient separation of biological materials by LC with the sensitive identification of the individual components by tandem MS. Complicated mixtures containing thousands of proteins can therefore be analyzed directly, despite the large differences in magnitude between the concentrations of the analytes present in the sample. LC-MS/MS can itself be coupled to 1-D or 2-D electrophoresis, co-immunoprecipitation or other purification techniques for better results. Although LC is more readily coupled to an ESI source, it is also possible to deposit the sample on a target plate for MALDI analysis. In typical LC-MS/MS experiments, the analyte is eluted from a reverse-phase column in order to separate the peptides by hydrophobicity. It is then ionized and transferred with high efficiency into the mass spectrometer for analysis. A large amount of data regarding each individual species in a complex sample can be generated. The ion current for each scan can be summed and plotted as a function of time, giving the total ion current (TIC). The TIC can be seen as the sum of the separate ion currents carried by the different ions contributing to the spectrum. Any individual peptide can be sequenced without further purification by simply isolating the eluted peptide, fragmenting it by CID and obtaining the MS/MS spectrum. A large number of peptides can thus be sequenced in a single MS/MS run. Most of the time, the mass spectrometer is programmed to perform a single scan to determine the peptide masses and then to choose, in a data-dependent manner, the three to ten most intense peptides and create a precursor list for further fragmentation.
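The two operations described above, summing the ion current of each scan into a TIC trace and picking the most intense peptides of a survey scan as precursors, can be sketched in a few lines of Python. The data layout (a scan as a list of (m/z, intensity) pairs with a retention time) and the function names are assumptions made for this illustration, and the numbers are invented.

```python
def total_ion_current(peaks):
    """Sum of all intensities in one scan."""
    return sum(intensity for _, intensity in peaks)


def tic_trace(scans):
    """TIC as a function of retention time: [(time, tic), ...]."""
    return [(rt, total_ion_current(peaks)) for rt, peaks in scans]


def select_precursors(peaks, top_n=3):
    """Data-dependent choice: m/z of the top_n most intense peaks."""
    ranked = sorted(peaks, key=lambda p: p[1], reverse=True)
    return [mz for mz, _ in ranked[:top_n]]


# Two invented survey scans: (retention time in min, [(m/z, intensity)]).
scans = [
    (0.0, [(500.3, 10.0), (600.4, 5.0)]),
    (1.0, [(500.3, 100.0), (700.2, 300.0), (800.1, 50.0), (900.5, 20.0)]),
]
trace = tic_trace(scans)
precursors = select_precursors(scans[1][1], top_n=3)
```

Real acquisition software applies further criteria when building the precursor list; only the intensity ranking mentioned in the text is shown here.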

1.2.3.4 Bioinformatics

1.2.3.4.1 Data analysis

Each MS run produces raw MS spectra. These raw data can be “cleaned” by pre-processing algorithms.

A spectrum is composed of three parts: the baseline, the noise and the actual peptide signal represented by peaks. In order to get the highest peptide signal, it is important to reduce the baseline and the background noise, i.e. to increase the signal-to-noise ratio (S/N). Once the spectrum is processed, it is necessary to produce a peak list that will be used for identification purposes. A few corrections are possible in order to facilitate and improve peak detection. Peak centroiding is the means of determining the main mass and intensity value for monoisotopic peaks and reducing correlated values to one [54]. The peak with the lowest mass in an isotopic cluster is frequently referred to as the monoisotopic peak. This is an important step, as the failure to detect a relevant peak can hinder the correct identification of a protein. However, if the sensitivity is too high and too many false peaks are taken into consideration, erroneous and redundant database matches may cause false identifications. As databases grow bigger and bigger, this also increases the search duration. All instruments come with their own software, but in some cases it is necessary to rewrite it in order to integrate it into automated high-throughput identification systems.

Another important factor in the correct identification of proteins is the choice of the database against which the data will be compared. It will vary with the needs and the origin of the sample. There are many different data banks; therefore, in 2002, the EBI, the SIB and the PIR Protein Sequence Database united their efforts in a consortium to create a single universal protein database providing comprehensive, fully classified, richly and accurately annotated protein sequence information, with extensive cross-references and query interfaces [55]: the UniProt Knowledgebase. It is the fusion of Swiss-Prot and TrEMBL and can be found at http://www.expasy.uniprot.org/. The main advantage of Swiss-Prot over the other databases is that most of its sequences are manually curated, it has minimal redundancy and it is updated almost daily. TrEMBL is a computer-annotated supplement of Swiss-Prot that contains all the translations of EMBL nucleotide sequence entries.

1.2.3.4.3 Proteomics tools

As stated in section 1.2.3.1, PMF is based on the comparison of theoretically expected peptide masses with the experimental set of peptide masses. This comparison can be done by submitting a peak list to software that performs the theoretical digestion and compares the two sets of data. The proteins can then be ranked, for example, according to the number of matched peptides or to the percentage of sequence coverage. More sophisticated scoring algorithms also exist that include more complex parameters, such as the peptide length or the mass accuracy. Other software integrates MS/MS data for the identification, exploiting the peptide sequence instead of only the mass, as in PMF. Two such programs were used for this study:

Mascot (http://www.matrixscience.com/) [56] This search engine uses a probability-based scoring algorithm to identify proteins by searching sequence databases with mass spectrometry data. Many MS instruments use software that exports the data in a format compatible with Mascot [57]. This program can be used for PMF purposes or for identification from MS/MS data.

Phenyx (http://www.phenyx-ms.com/) This software platform was developed by GeneBio in collaboration with the Swiss Institute of Bioinformatics (SIB). It was designed for high-throughput MS data analysis and dynamic results assessment for the identification and characterization of proteins and peptides, incorporating a probabilistic and flexible scoring system based on the OLAV algorithm [58].

Furthermore, dozens of tools have been developed for the analysis of protein or peptide sequences, covering identification and characterization with PMF or MS/MS data as well as similarity searches, pattern and profile searches, post-translational modification prediction, topology prediction, primary structure analysis, secondary structure prediction, tertiary structure treatment, sequence alignment, phylogenetic analysis, etc. An exhaustive list can be found at http://www.expasy.org/tools/. The following were used in the present study:

Compute pI/Mw (http://www.expasy.org/tools/pi_tool.html) [59] This program calculates the theoretical MW and pI of peptide sequences. It is based on a linear distribution of pI and does not take into account the effects of adjacent amino acids. Stephenson et al. developed a new algorithm that takes into account the effect of adjacent amino acids up to ± 3 residues away from a charged aspartic or glutamic acid, as well as effects at the C-terminus, and applies a correction factor to the pK values of the charged amino acids [60].

InSilicoSpectro (http://www.insilicospectro.vital-it.ch/) [61] This is an open-source Perl program that can be used for protein sequence digestion and theoretical peptide and fragment mass determination. It can also be used for pI estimation and peptide retention time prediction or mass list file format conversions.
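To make the idea of a theoretical pI concrete, the following Python sketch estimates it as the pH at which the net charge of a peptide is zero, using the Henderson-Hasselbalch relation and a bisection search. It is not the algorithm of Compute pI/Mw, InSilicoSpectro or Stephenson et al.: the pK values are approximate, illustrative assumptions, the function names are hypothetical, and, like the simple approach described above, the sketch ignores the effects of adjacent amino acids.

```python
# Illustrative pK values (assumed for this sketch, not taken from any tool).
PK_POSITIVE = {"Nterm": 7.5, "K": 10.0, "R": 12.0, "H": 5.98}
PK_NEGATIVE = {"Cterm": 3.55, "D": 4.05, "E": 4.45, "C": 9.0, "Y": 10.0}


def net_charge(peptide, ph):
    """Net charge of a peptide at a given pH (Henderson-Hasselbalch)."""
    positive = 1.0 / (1.0 + 10 ** (ph - PK_POSITIVE["Nterm"]))
    negative = 1.0 / (1.0 + 10 ** (PK_NEGATIVE["Cterm"] - ph))
    for aa in peptide:
        if aa in PK_POSITIVE:
            positive += 1.0 / (1.0 + 10 ** (ph - PK_POSITIVE[aa]))
        elif aa in PK_NEGATIVE:
            negative += 1.0 / (1.0 + 10 ** (PK_NEGATIVE[aa] - ph))
    return positive - negative


def isoelectric_point(peptide, low=0.0, high=14.0, tol=1e-4):
    """Bisection on pH: the net charge decreases monotonically with pH."""
    while high - low > tol:
        mid = (low + high) / 2.0
        if net_charge(peptide, mid) > 0:
            low = mid
        else:
            high = mid
    return (low + high) / 2.0


acidic_pi = isoelectric_point("DDDD")
basic_pi = isoelectric_point("KKKK")
```

A context-dependent method such as the one attributed to Stephenson et al. would adjust the pK of each charged residue according to its neighbours before solving for the zero-charge pH.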

Once conclusive results have been obtained, regardless of the method, their validation is crucial. Indeed, identifications have to be thoroughly checked for consistency and then compared with similar results obtained by different methods and databases. A very effective way to deal with great amounts of data in validation processes is the use of a laboratory information management system (LIMS) [62-64]. Such a management system is very useful as it provides a relatively user-friendly application for central data storage and validation facilities, sample tracking and reporting. Furthermore, the LIMS can be linked to visualization and data mining utilities.

1.3 Shotgun IEF

The identification of proteins from complex biological samples is, in most current proteomics pipelines, rather time-consuming and labour-intensive. Most workflows are composed of three steps. First comes the separation of the proteins present in the sample. Then the data are acquired by mass spectrometry. In the last step, identification is attempted by matching specific properties of the peptides with theoretical attributes obtained from protein sequence databases. Many efforts have been made to reduce the complexity of the workflows and to automate the identification of the proteins present. Shotgun proteomics takes advantage of multidimensional HPLC coupled with MS/MS techniques to allow the automated collection of huge quantities of data. Furthermore, the acquisition of these enormous datasets has led to the development of powerful bioinformatics tools capable of quickly and effectively handling the ever-growing flow of shotgun proteomics data [65, 66].

Unfortunately, a major drawback of a shotgun approach is that, as proteins are first digested and only the peptides are analyzed later, the correspondence between protein and peptide is lost, making the interpretation of the results much more complicated and challenging. Indeed, a frequent problem is that a peptide may come from different proteins present in the sample or, conversely, that peptides from a single protein may be found in different fractions, leading to ambiguities during data analysis. In an attempt to automate the process and to reduce the time frame, various robotisation processes have been developed. For instance, Traini et al. reported the development of a prototype robotic system used to image and excise a few hundred protein spots from a stained PVDF membrane. The protein samples were then enzymatically digested with a commercial handling system. Automatic acquisition of mass spectra was carried out by MALDI-TOF-MS and automatic data analysis was performed with database interrogation software [67]. Other approaches were devised to reduce the sample handling as well as the sample size in MALDI-TOF-MS applications. Ogorzalek Loo and his team tried to measure protein masses directly from thin-layer IEF gels [68]. McComb and co-workers performed mass spectrometry on samples that had been deposited onto non-porous polyurethane membranes [69]. The advantage of using such supports for MS purposes is that it is possible to acquire spectra separated only by a distance in the micrometer range (µm) by shooting directly onto the membrane with the MALDI-TOF laser. With such acquisitions it is possible to scan the whole surface longitudinally and thus create an image by concatenating the spectra and observing the intensities of each MS signal, thus localizing peptides. In 1991, Hochstrasser et al. came up with a new technique called the Molecular Scanner [70]. In this concept, the time-consuming step of spot excision from the gel is bypassed and the delicate sample handling is avoided, as samples are transferred electrophoretically through a tryptic membrane, coupling the digestion and the transfer steps. This concept was further developed by Binz et al. as a way to automate proteomic research and to display proteome images [71]. It was not until 2004 that Cargile and his team, as part of Stephenson’s group, suggested using immobilised pH gradient (IPG) IEF as a first dimension in shotgun proteomics [2]. These were the beginnings of shotgun IEF. In addition, they developed accurate pI prediction tools [60] to filter the acquired data for more accurate peptide/protein identification. Many efforts have been made to increase the throughput of the shotgun IEF pipeline, but steps such as the digestion, the fractionation and the LC remain delicate and time-consuming.

1.3.1 Concepts

The shotgun IEF methodology is based on the separation of the peptides by IEF prior to the usual liquid chromatographic separation of the classical shotgun approaches. In the integrated shotgun IEF workflow (Figure 4), proteins are digested with trypsin and purified before being loaded onto the IPG strips by overnight rehydration [8]. The first separation dimension, i.e. focusing, takes place in an IPGphor focusing unit (GE Healthcare). As soon as the focusing is achieved, the strips are cut into a certain number of fractions and each fraction is placed in a separate Eppendorf tube. The peptides are then extracted from the polyacrylamide gel by successive washes in acidic solvents. Once all peptides are extracted, they are separated in a second dimension by RPLC. After LC, each fraction is eluted directly onto a MALDI target using a home-made spotting robot linked to the autosampler of the LC. The next step is tandem mass spectrometry using a 4700 Applied Biosystems MALDI TOF/TOF instrument.

Once the data acquisition is achieved, the MS/MS spectra are submitted to powerful bioinformatics tools such as Phenyx or Mascot (see section 1.2.3.4.3) for protein identification and data analysis. Alternatively, when the extracted samples are purified, ESI-MS/MS connected online to the LC could be used instead of MALDI-MS/MS for greater automation.

Figure 4. The shotgun IEF workflow. Proteins are first digested into peptides before being focused by IEF on IPG strips. The strips are then cut into tens of fractions and peptides are extracted. Peptides are then eluted directly onto a MALDI target by RPLC with an LC spotting robot. Samples are scanned in a 4700 MALDI-TOF/TOF. The acquired data is then analyzed and proteins identified.

Shotgun IEF can be separated into five distinct parts:

• Sample preparation to digest proteins and purify peptides
• IEF, where the peptides are separated according to their pI
• Fractionation of the IPG strip and extraction of the peptides from the gel
• LC-MS/MS
• Protein identification and data analysis

1.3.2 Shortcut shotgun IEF

The separation by RPLC is a long and meticulous process, as each fraction has to be individually eluted onto the MALDI target. The robotized matrix deposition also remains a delicate step. Furthermore, the fractionation is inaccurate, as the fraction size does not depend on the peptide distribution. A representation of this distribution would help to distribute the peptides more evenly between the fractions, simplifying the LC and the identification steps. Such a representation can be obtained by MS imaging (see section 1.3.3). Shortcut shotgun IEF (SSIEF) is a variation of the regular shotgun IEF workflow. After IEF on the IPG strips, the peptides are transferred to a solid porous support such as a PVDF membrane, using a passive transfer method (see section 1.2.1.2). Once the transfer is finished, the membranes are dried and attached onto a MALDI target.

CHCA matrix is then deposited onto the surface by a robot before the target is placed into a MALDI instrument. The data are processed with bioinformatics tools in order to concatenate the spectra and create a mass spectrometry image (MSI). Such an image allows visualisation of the peptide distribution along the membrane. It is then possible to determine precisely where the membrane should be cut in order to have a relatively equal amount of peptides per fraction. Once the fraction size has been manually determined and the membrane cut accordingly, the peptides are extracted from the membrane and the regular shotgun IEF workflow is resumed. An alternative would be to perform direct MS/MS on the membranes. This would solve the problem of fractionation and drastically increase the throughput, as it avoids the time-consuming RPLC step. Once the peptides are transferred onto a PVDF membrane, the target is introduced into a TOF/TOF instrument.
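The image-guided fractionation described above, cutting the membrane so that each fraction holds a roughly equal amount of peptide signal, can be expressed as a short calculation. The Python sketch below is a hypothetical automation of a step that the text describes as manual: it assumes that a signal value (for example the TIC) is available for each acquisition position along the membrane, and it places a cut each time the cumulative signal passes the next equal share of the total. The function name and the data are invented.

```python
def equal_signal_cuts(positions, intensities, n_fractions):
    """Positions after which to cut so fractions hold ~equal signal.

    positions   -- acquisition positions along the membrane (e.g. in mm)
    intensities -- signal measured at each position (e.g. the TIC)
    Returns n_fractions - 1 cut positions (fewer if the signal is so
    concentrated that several cuts would fall on the same position).
    """
    total = sum(intensities)
    if n_fractions < 1 or total <= 0:
        raise ValueError("need at least one fraction and some signal")
    cuts = []
    running = 0.0
    for pos, inten in zip(positions, intensities):
        running += inten
        target = total * (len(cuts) + 1) / n_fractions
        if len(cuts) < n_fractions - 1 and running >= target:
            cuts.append(pos)
    return cuts


positions = [0, 1, 2, 3, 4, 5, 6, 7]
uniform = equal_signal_cuts(positions, [1] * 8, 4)
skewed = equal_signal_cuts(positions, [1, 1, 1, 1, 4, 4, 4, 4], 2)
```

With a uniform signal the cuts are evenly spaced, whereas with most of the signal at one end of the membrane the single cut moves towards that end, which is the behaviour the image-based approach is meant to achieve.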

1.3.3 MS imaging and quantitation

The visualisation of the molecules in a sample can greatly influence the search for treatments against diseases. It is a very important step in a workflow involving multidimensional data, as such data can be analyzed visually, which can reveal important features otherwise missed by detection software [72]. Molecular imaging processes are therefore becoming more and more important. Such techniques include magnetic resonance imaging (MRI), nuclear magnetic resonance (NMR) and X-ray crystallography. These methods are mainly used for structural imaging, but a number of emerging technologies give better insight into the distribution of the analyte under specific conditions rather than into its 3D structure. Indeed, molecular imaging can provide essential information for biomedical research, such as the follow-up of the modulation of a target during treatment, the determination of active sites on target drugs and their activity, as well as the distribution of the drug and the metabolites before and after treatment. It can also help in the validation of interactomic data. Many techniques involve tagging of the target, such as isotope-coded affinity tagging (ICAT™) [73], and fluorescence imaging, such as near-infrared fluorescence (NIRF) [74] or fluorescence resonance energy transfer (FRET) [75, 76]. Other methods involve chemiluminescence or immunohistochemistry [77]. These provide promising new possibilities in the analysis of conformational changes in composition and structure in model organisms and in humans. The major drawback of such techniques, despite being powerful methods allowing in vivo imaging, is that they require the use of specific reporter molecules that interact with the analyte of interest and reveal its presence or absence. The application of such molecules in biomedical research is unfortunately limited, as their use is highly invasive, expensive and time-consuming. Furthermore, they can only be applied to known proteins and do not permit the imaging of unknown molecules. In the late 1990s, Caprioli and co-workers developed the MSI techniques [77, 78]. Such methods can solve many of the problems inherent to molecular imaging, as the content of the sample does not have to be known prior to analysis. Furthermore, MSI can surpass immunohistochemistry in terms of sensitivity [77] and is therefore developing rapidly [79-82]. The principle is to shoot with the MALDI laser directly onto a solid surface containing the analyte of interest, with an interval of about 250 nm between each shot, and to concatenate the spectra to create an image of the sample with the TIC of each spectrum. For such acquisitions, the MALDI-MS instruments have to be equipped with software capable of creating and storing the data for such images. Furthermore, once the raw data are obtained, they have to be processed using bioinformatics tools to obtain an image that can be visualised and analyzed.
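The concatenation of spectra into an image, as described above, can be sketched in Python. The example assumes a line scan stored as a list of (position, peak list) pairs, where each peak is an (m/z, intensity) pair; this layout, the function names and the bin width are assumptions made for illustration and do not reflect the file format of any instrument or of MSight or BioMap.

```python
def tic_profile(spectra):
    """TIC of each spectrum, in acquisition order along the surface."""
    return [sum(inten for _, inten in peaks) for _, peaks in spectra]


def binned_image(spectra, mz_start, mz_stop, bin_width):
    """2D image: one row per laser position, one column per m/z bin."""
    n_bins = int((mz_stop - mz_start) / bin_width)
    image = []
    for _, peaks in spectra:
        row = [0.0] * n_bins
        for mz, inten in peaks:
            if mz_start <= mz < mz_stop:
                # Clamp so rounding cannot index past the last bin.
                idx = min(int((mz - mz_start) / bin_width), n_bins - 1)
                row[idx] += inten
        image.append(row)
    return image


# Invented line scan: (position, [(m/z, intensity), ...]).
spectra = [
    (0.0, [(1000.2, 5.0), (1500.7, 2.0)]),
    (0.25, [(1000.3, 7.0)]),
]
profile = tic_profile(spectra)
image = binned_image(spectra, 1000.0, 2000.0, 500.0)
```

Each row of `image` corresponds to one laser position, so plotting the rows in order gives a picture of how each mass range is distributed along the scanned surface.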

Images obtained from high-throughput MS contain information that remains hidden when looking at a single spectrum at a time, but can be revealed by MSI. The SIB developed a program that is capable of such tasks:

MSight (http://www.expasy.org/MSight/) [83] Created in 2005 by the Proteome Informatics Group (PIG) in the SIB, MSight was initially developed for the image processing of LC-MS datasets for ESI-MS. Now it can read MS and MS/MS files generated by the majority of mass spectrometers and display them in a user-friendly interface, allowing an easy navigation through large amounts of data. Furthermore, it permits the comparison of several sets of data for differential analysis purposes.

It is also possible to create an image of the analyte with MSight and compare it with a theoretical image obtained by in silico digestion and calculation of the theoretical pI and MW. This can help to match peptides in order to align the images for comparison purposes, as the images of two samples will not superimpose exactly. Another possibility is to use the matching option of MSight, provided the same peptides are identifiable on each image. Furthermore, in vivo imaging is becoming increasingly important for the development, optimization and modelling of new drugs. It is possible to quantify their activity and effect by acquiring multi-parametric or dynamic imaging data. Novartis developed a powerful tool for the analysis of such complex data:

BioMap: (http://www.maldi-msi.org/) This image analysis platform allows efficient and flexible modelling and quantification of imaging data in pharmacological research and development. It provides a common visualization and storage platform, which can be used for visualization of data, based on multi-planar reconstruction allowing extraction of arbitrary slices from a 3D-volume.

For SSIEF, BioMap can be used to visualise the distribution of a defined peptide by creating a total ion image of the whole membrane. This can provide useful information on the diffusion of the peptides during the transfer and differential comparison between different samples.

2. MATERIALS AND METHODS

2.1 Reagents and chemicals

All the reagents were of standard quality, except for the acetonitrile (Fluka), which was of HPLC quality. Water was purified with Millipore’s MilliQ system, or LiChrosolv® water (Merck) was used. Products were purchased from the following companies:

Applied Biosystems (Framingham, MA, USA)
Applied Microbiology Inc (Tarrytown, MA, USA)
BioRad (Hercules, CA, USA)
Difco (Detroit, MI, USA)
Fluka (Buch, Switzerland)
GE Healthcare (Piscataway, NJ, USA)
Merck (Darmstadt, Germany)
Millipore (Bedford, MA, USA)
Schleicher & Schuell (Dassel, Germany)
Sigma Aldrich (St. Louis, MO, USA)
Waters (Milford, MA, USA)

Bovine serum albumin (BSA), porcine trypsin, bovine carbonic anhydrase, bovine β-casein, bovine β-lactoglobulin, rabbit phosphorylase b, α-cyano-4-hydroxycinnamic acid (CHCA), acetonitrile (AcN), formaldehyde (37%), trifluoroacetic acid (TFA), 1,4-dithioerythritol (DTE), ammonium bicarbonate (BA), iodoacetamide and Tris were purchased from Sigma-Aldrich. SDS-PAGE precast gels 4-20% Tris-HCl, ampholines (3.5-7 and 3-10), Sequi-Blot™ 0.2 µm pore size PVDF membranes and molecular mass markers came from BioRad. Ethanol, formic acid (FA), high boiling-point petroleum ether, acetic acid, glycine and SDS came from Fluka. Chloroform, methanol, saccharose and urea were provided by Merck. Immobiline™ DryStrips and PlusOne DryStrip Cover Fluid paraffin oil were purchased from GE Healthcare. Mueller Hinton broth came from Difco and the hydrolytic enzyme lysostaphin (Ambicin) was purchased from Applied Microbiology Inc.

2.2 Sample preparation

2.2.1 Growth conditions and time point

To obtain protein extracts, S. aureus strain N315 [84] was grown in Mueller Hinton broth (MHB; 200 ml in a 1000-ml flask) with agitation at 37 °C, as previously described [85]. When the post-exponential phase was reached (OD540nm = 6, corresponding to 2-3 x 10^9 cells/ml), cells were chilled on ice and harvested by centrifugation at 8’000 x g for 5 minutes at 4 °C. For the preparation of total protein extracts, 20 ml culture aliquots were washed in 1.1 M saccharose-containing buffer [86] and then suspended in 2 ml aliquots of the same buffer containing 50 µg/ml of the hydrolytic enzyme lysostaphin for 10 minutes at 37 °C. For the preparation of membrane extracts, protoplasts were recovered after centrifugation (30 minutes at 8’000 x g) and hypo-osmotic shock was applied in the presence of 10 µg/ml DNase I (Fluka) to decrease the viscosity of the medium. Membrane pellets were obtained after ultracentrifugation at 110’000 x g for 50 minutes in a Beckman Optima TLX (Beckman Coulter Int’l S.A., Nyon, Switzerland).

2.2.2 Chloroform precipitation

To delipidise the protein extracts, samples were evaporated in a speed-vac and resolubilized in 100 µl of 50 mM BA pH 8.5 per mg of crude protein extract. 1 ml of a chloroform/methanol (2:1, v:v) solvent was added, and the mixture was thoroughly vortexed and placed on ice before being centrifuged at 4 °C for 15 minutes at 14’000 rpm. The lower phase containing the CHCl3 was carefully extracted and the supernatant was resuspended in 300 µl of MeOH, thoroughly vortexed and centrifuged at 4 °C for 20 minutes at 14’000 rpm. The supernatant was then extracted and the pellet placed in the speed-vac to discard the remaining MeOH. The proteins were then resuspended in 300 µl of 50 mM BA, and 2 µl were diluted in 8 µl H2O for the SDS-PAGE control gel.

2.2.3 Digestion protocol

Reduction, alkylation and digestion took place in a domestic microwave oven (FUNAI, Hamburg, Germany) with a maximum output power of 850 W and a frequency of 50 Hz, but the oven was set on reduced power (~175 W) for all steps. The samples were placed in 1.5 ml Eppendorf tubes in a home-made holder placed in a beaker containing 500 ml of water at 25 °C, and each irradiation lasted 6 minutes, which resulted in a temperature gradient from 25 to ~55 °C. Per mg of protein, the reduction was done by adding 100 µl of 45 mM DTE and the alkylation by adding 120 µl of 100 mM iodoacetamide. After the alkylation the samples were set on ice to cool down for better digestion. Trypsin was added for digestion at a protease-to-protein ratio of 1:10. In the case of proteins that are difficult to digest, such as membrane proteins, a double digestion was done, with a ratio of ~1:16 the first time and of 1:25 the second. 7 µl of the peptides were taken and diluted in 3 µl H2O for the control gel.

2.2.4 Purification

After digestion, peptides were concentrated and desalted using an Oasis HLB 1 cc 10 mg solid-phase extraction cartridge (Waters). 500 µl of 0.1 % FA were added to the sample and the pH was verified with pH paper (pH 0-14, Merck). If the pH was not around 2-3, 1 to 5 µl of pure FA were added. The column was first washed with 1 ml of 0.1 % TFA 60 % AcN and then equilibrated with 1 ml of 0.1 % FA. The sample was passed slowly, washed with 1 ml of 0.1 % FA and eluted in 700 µl of 0.1 % TFA 60 % AcN. The samples were then evaporated to dryness in a speed-vac, resuspended in 50 µl H2O and re-evaporated to ensure that all FA was discarded.

2.3 Electrophoresis

2.3.1 IPG-IEF

After purification, samples were re-suspended in 100 µl of rehydration buffer. Immobiline™ DryStrip 7 cm strips, pH 3-10 NL or pH 4-7 (GE Healthcare), were rehydrated overnight using the Reswelling Tray (GE Healthcare). Isoelectric focusing was performed on an Ettan IPGphor II system (GE Healthcare). Paper wicks (GE Healthcare) soaked in 145 µl H2O were used for the connection between the strips and the electrodes, and the whole was covered with 100 ml of DryStrip Cover Fluid paraffin oil (GE Healthcare). The focusing was done under the following conditions: 5-minute step at 100 V, 30-minute linear gradient from 100 V to 500 V, 30-minute linear gradient from 500 V to 1000 V, 30-minute step at 1000 V, 30-minute linear gradient from 1000 V to 5000 V and step at 5000 V up to a total of 9 kVh.

The temperature was set at 15 °C and the current was limited to 60 µA per strip.
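As a worked example, the volt-hours accumulated by the focusing programme, and the time needed at 5000 V to reach 9 kVh, can be estimated with the short Python calculation below. It rests on two assumptions that are not stated in the protocol: that the 9 kVh total is cumulative over all steps, and that the set voltages are actually reached (i.e. the 60 µA current limit does not hold the voltage down). The function names are hypothetical.

```python
# (start voltage in V, end voltage in V, duration in min) for each
# programmed phase before the final 5000 V step.
PROGRAM = [
    (100, 100, 5),     # step at 100 V
    (100, 500, 30),    # linear gradient
    (500, 1000, 30),   # linear gradient
    (1000, 1000, 30),  # step at 1000 V
    (1000, 5000, 30),  # linear gradient
]


def volt_hours(start_v, end_v, minutes):
    """Vh of a step or linear gradient: mean voltage x time in hours."""
    return (start_v + end_v) / 2.0 * minutes / 60.0


def final_step_minutes(program, final_v, target_vh):
    """Vh accumulated so far and minutes needed at final_v to reach target."""
    accumulated = sum(volt_hours(s, e, m) for s, e, m in program)
    remaining = max(target_vh - accumulated, 0.0)
    return accumulated, remaining / final_v * 60.0


accumulated_vh, minutes_at_5000 = final_step_minutes(PROGRAM, 5000, 9000)
```

Under these assumptions the first five phases contribute about 2533 Vh, so the final step at 5000 V would last roughly 78 minutes; if the current limit is reached, the real run takes longer.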

Rehydration Buffer:
4 M Urea (Merck)
0.2-0.5 % Pharmalyte 3.5-10 or 4-7 (Amersham)
100 µl bromophenol blue (Fluka)
LiChrosolv® H2O to 10 ml

2.3.2 SDS-PAGE

The proteins and peptides were solubilised in 10 µl of Laemmli buffer and reduced by heating at 95 °C for 5 minutes. A volume of 20 µl was loaded in each well of a precast 4-20 % Tris-HCl gradient gel (BioRad). All SDS-PAGE gels were run on a Miniprotean II BioRad SDS-PAGE System and the separation took place at a constant voltage of 200 V for about 30 minutes in 1 l of running buffer. Once the run was finished, MS-compatible silver staining was done as described by Allard et al. [87].

Laemmli Buffer:
2% SDS (Fluka)
0.025% bromophenol blue
10% Glycerol (Merck)
Trizma Base 50 mM, pH 6.8 (Sigma)
0.5% v/v β-mercaptoethanol (Millipore)
MilliQ H2O to 100 ml

Running Buffer:
Trizma Base 100 mM
Glycine 100 mM (Fluka)
SDS 1% (v/v)
MilliQ H2O to 1 l

2.4 Transblot

Passive transfer was performed either by centrifugation or by capillarity. In both cases the same transfer buffer was used. Filter papers (Schleicher & Schuell) and 0.2 µm PVDF membranes (BioRad) were used.

Transfer Buffer (10X):
Trizma Base pH 8.3, 125 mM
Glycine 960 mM
SDS 0.1%

2.4.1 Centrifugation

For each IPG strip, 2 filter papers of 7 x 1 cm were cut, soaked in 30 ml of transfer buffer for 10 minutes and then thoroughly blotted. A 7 x 1 cm PVDF membrane was first soaked in methanol for 10 minutes and then rehydrated by immersion in the transfer buffer. As soon as the focusing was finished, the two blotted papers were put into a home-made sarcophagus (Figure 5), the PVDF membrane was placed on top of them and the IPG strip was placed face-down onto the PVDF membrane. The sarcophagus was hermetically closed with its lid and placed onto a 96-well plate holder of the Sorvall Heraeus Megafuge 1.0 R centrifuge (Thermo Fisher Scientific, Waltham, MA, USA). The centrifuge was run at 2700 x g for 99 minutes at 10 °C.

Figure 5. The sarcophagus developed with the help of the EIG. View from (A) the top, (B) the side and (C) the lid.

2.4.2 Capillarity

The capillary transfer was done as shown in Figure 2. Three filter papers of 10 x 8 cm were cut; one was soaked in the transfer buffer for 10 minutes and then thoroughly blotted. For each IPG strip, an 8 x 1 cm PVDF membrane was first soaked in methanol for 10 minutes and then rehydrated by total immersion in the transfer buffer. Two 100 x 50 cm pieces of commercially available cellophane film were cut and laid flat in a cross shape. A 20 x 20 cm glass plate with the 2 dry filter papers in the middle was placed in the centre of the cross. As soon as the focusing was finished, the soaked filter paper was placed on top of the two dry ones, the PVDF membranes were placed onto the filter papers (up to 4 membranes per paper) and each IPG strip was placed in the centre of its membrane. The sandwich was closed by carefully placing a second 20 x 20 cm glass plate on top and the whole was made hermetic by folding each arm of the cross over it. A 1.25 kg weight was very carefully placed on top in the middle and the transfer was run for 120 minutes at room temperature. The pressure on the strips was about 16 g/cm².
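The quoted pressure of about 16 g/cm² can be checked with a short calculation. The sketch below assumes that the 1.25 kg weight is spread evenly over the 10 x 8 cm filter paper area and that the mass of the top glass plate is ignored; the function name is hypothetical.

```python
def contact_pressure_g_per_cm2(weight_g, paper_length_cm, paper_width_cm):
    """Return the load per unit area in grams per square centimetre.

    Assumes the weight is distributed evenly over the filter paper area.
    """
    area_cm2 = paper_length_cm * paper_width_cm
    return weight_g / area_cm2


# Values from the protocol: 1.25 kg weight, 10 x 8 cm filter papers.
pressure = contact_pressure_g_per_cm2(1250.0, 10.0, 8.0)
print(f"{pressure:.1f} g/cm2")  # 15.6 g/cm2

# A lighter weight or larger papers lower the load per square centimetre.
lighter = contact_pressure_g_per_cm2(1000.0, 10.0, 8.0)
print(f"{lighter:.1f} g/cm2")  # 12.5 g/cm2
```

Under these assumptions the result, 15.6 g/cm², agrees with the rounded value given in the text.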

2.5 Peptide extraction and purification

The protocol for the extraction of peptides was slightly different depending on the nature of the matrix.

2.5.1 From an IPG strip

Once the focusing was finished, the strip was washed 3 times for 20 seconds in high boiling-point petroleum ether to remove the paraffin oil. During fractionation, each fraction was put into a 0.5 ml Eppendorf tube already containing 100 µl of 0.1% TFA, vortexed and put onto the agitator for 30 minutes. The 100 µl were removed and placed in a clean Eppendorf tube. This was repeated twice for 20 minutes and the final volume, i.e. 300 µl, was frozen. The peptides were then purified on an Oasis 96-Well µElution Plate (Waters). The plate was washed and equilibrated with 200 µl of 0.1% FA 60% AcN and then with 200 µl of 0.1% FA. Samples were passed slowly through the plate, washed with 200 µl of 0.1% FA 5% AcN and eluted in 2 x 50 µl of 0.1% FA 60% AcN.

2.5.2 From a PVDF membrane

At the end of the transfer, the membranes were scanned on a Voyager DE-STR MALDI-TOF (Applied Biosystems) and the membrane was fractionated according to the image obtained. Each fraction was put into a 0.5 ml Eppendorf tube containing 100 µl of 0.1% TFA 50% AcN, vortexed for 5 minutes and put onto the agitator for 20 minutes. The 100 µl were removed and placed in a clean Eppendorf tube. This was repeated twice for 20 minutes; only the second time was the extraction done without AcN. The final volume, i.e. 300 µl, was evaporated and the peptides were resuspended in 300 µl of 0.1% TFA. The peptides were then purified on an Oasis 96-Well µElution Plate (Waters) using the same protocol as for the IPG strip fractions described above.

2.6 Mass spectrometry

2.6.1 MS imaging

After transfer, the capture membrane was first briefly rinsed in pure water and then fixed with double-sided tape (3M) to a MALDI target containing no wells (modified by Applied Biosystems). The MALDI-TOF matrix was applied with a home-made spotting robot. Mass spectra were acquired on a Voyager DE-STR MALDI-TOF mass spectrometer (Applied Biosystems) equipped with a 337 nm UV nitrogen laser, a delayed extraction device and an acquisition rate of 20 Hz. The acquisition was performed with an acceleration voltage of 20 kV, a grid of 63% and a delayed extraction time of 180 nanoseconds. The mass range was defined from 800 to 3000 Da with a low-mass gate fixed at 800 Da. A blank target was selected as the plate file in the Voyager 4.3 acquisition software. The exact position of the membranes on the plate was determined by defining the margins of each one. A spot set was created to define the area to be scanned. As the diameter of the laser on the membrane was about 50 µm, the acquisitions were spaced by 250 µm in order to obtain a good representation of the distribution of the peptides without generating too much data. For each spot, 100 spectra were accumulated. In the “Automated” section of the method set-up, the spot file was selected, the number of spectra to be acquired was set equal to the number of points in the spot set file and the “Save All Spectra” option was selected, saving all the spectra from the points defined in the spot file into a single “.dat” file. Once the data were collected, they were transferred to MSight (SIB, Switzerland) for visualization.

MALDI-TOF matrix: 10 mg/ml of CHCA in 70% MeOH-1% TFA 10 mM NH4H2PO4
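To give an idea of the amount of data produced by the acquisition settings above, the following minimal sketch lists acquisition positions along the long axis of a membrane at 250 µm spacing. It assumes a single line of spots along an 8 cm membrane; the function name is hypothetical and the output is not the Voyager spot set file format.

```python
def line_scan_positions(length_mm, step_um=250):
    """Return acquisition positions (in µm) along the membrane's long axis."""
    length_um = int(round(length_mm * 1000))
    return list(range(0, length_um + 1, step_um))


positions = line_scan_positions(80)        # 8 cm membrane (assumed length)
n_spots = len(positions)
n_accumulated = n_spots * 100              # 100 spectra accumulated per spot
linear_coverage = 50 / 250                 # ~50 µm laser spot every 250 µm

print(n_spots)                             # 321
print(n_accumulated)                       # 32100
print(f"{linear_coverage:.0%}")            # 20%
```

With these assumptions, roughly one fifth of the membrane length is actually sampled by the laser, which is the compromise between resolution and data volume described in the text.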

For direct MS/MS imaging on the PVDF membranes, a thin gold layer was sputtered onto the membrane using a SCD 040 Sputter Coater (Balzers Union AG, Balzers, Liechtenstein) under an argon pressure of 0.05 mbar, until a layer about 5-10 nm thick was reached. Special care was taken to ensure that the surface of the membrane was electrically connected to the MALDI target. For this, a thicker layer (about 15-20 nm) was deposited from the borders of the membranes over the tape and the plate. The remaining surface was covered during this procedure. The MALDI target was introduced into the 4700 MALDI-TOF/TOF (Applied Biosystems) and a template was created according to the position of the membranes on the plate. After tuning, MS spectra were acquired every 250 µm. Each MS spectrum was visualized individually and the precursor masses were manually selected at random positions for MS/MS.

After the MS/MS analysis, a peak list was created for each fraction with the peak-to-Mascot software embedded in 4700 Explorer 2.0, using these settings: mass range from 60 to the precursor mass minus 20, minimum S/N of 0.5 and a maximum of 200 peaks per precursor.
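The peak list settings above amount to a simple filter on each MS/MS spectrum. The sketch below is an illustration only, not the instrument software: the tuple layout and the function name are hypothetical, and keeping the most intense peaks when more than 200 remain is an assumption.

```python
def filter_peaks(peaks, precursor_mz, min_mz=60.0, precursor_margin=20.0,
                 min_sn=0.5, max_peaks=200):
    """Filter one MS/MS spectrum with the peak list settings quoted above.

    peaks: list of (mz, intensity, signal_to_noise) tuples (hypothetical layout).
    """
    upper = precursor_mz - precursor_margin
    kept = [p for p in peaks if min_mz <= p[0] <= upper and p[2] >= min_sn]
    # Assumption: when too many peaks remain, keep the most intense ones.
    kept.sort(key=lambda p: p[1], reverse=True)
    kept = kept[:max_peaks]
    # Return the surviving peaks in ascending m/z order.
    return sorted(kept, key=lambda p: p[0])


spectrum = [
    (50.0, 10.0, 5.0),     # below the lower mass limit
    (175.1, 300.0, 12.0),
    (500.3, 40.0, 0.3),    # below the minimum S/N
    (800.4, 120.0, 6.0),
    (1190.0, 80.0, 4.0),   # within 20 of the precursor
]
filtered = filter_peaks(spectrum, precursor_mz=1200.0)
print(filtered)  # [(175.1, 300.0, 12.0), (800.4, 120.0, 6.0)]
```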

2.6.2 LC-MS/MS

2.6.2.1 TOF/TOF

After extraction and purification, samples were resuspended in 20 µl of solvent A and 5 µl of the peptide solution of each fraction was loaded on a 10 cm long home-made column with an ID of 100 µm, packed with C18 reverse phase (YMC-ODS-AQ200, Michrom BioResources, CA, USA). The elution gradient, from 4% to 38% solvent B in solvent A (Solvent A: 3% AcN, 0.1% FA; Solvent B: 95% AcN, 0.1% FA), was developed in 40 minutes and samples were eluted directly onto a MALDI target plate using a home-made spotting robot. MALDI-TOF/TOF matrix was then applied and allowed to dry in a speed-vac. Peptides were analyzed in MS and MS/MS mode using a 4700 MALDI-TOF/TOF, with a Nd:YAG laser at 355 nm operating at a repetition rate of 200 Hz. 800 consecutive laser shots were accumulated for MS and 1500 for MS/MS. For the CID, argon was used at a gas pressure of 4-8 x 10⁻⁷ torr. Data-dependent MS/MS analysis was performed automatically on the 10 most intense ions from each MS spectrum. External calibration with lysozyme C (EC 3.2.1.17) was done in MS and MS/MS (m/z 1753.6) when judged necessary. After the MS/MS analysis, a peak list was created as explained above.

MALDI-TOF/TOF matrix: 5 mg/ml of CHCA in 50% ACN-0.1% TFA 10 mM NH4H2PO4
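Because both solvents contain acetonitrile, the percentage of solvent B is not the same as the acetonitrile content of the mobile phase. The sketch below converts one into the other for the 4% to 38% B gradient over 40 minutes described above. It assumes a linear gradient, which the text does not state explicitly; the function names are hypothetical.

```python
def percent_b(t_min, start_b, end_b, duration_min):
    """Percentage of solvent B at time t, assuming a linear gradient."""
    t = min(max(t_min, 0.0), duration_min)  # clamp to the programmed range
    return start_b + (end_b - start_b) * t / duration_min


def percent_acn(b, acn_in_a, acn_in_b):
    """Acetonitrile content (%) of the mixed mobile phase."""
    return acn_in_a * (1 - b / 100) + acn_in_b * b / 100


# Solvent A: 3% AcN, solvent B: 95% AcN, 4% to 38% B in 40 minutes.
acn_start = percent_acn(percent_b(0, 4, 38, 40), 3, 95)
acn_mid = percent_acn(percent_b(20, 4, 38, 40), 3, 95)
acn_end = percent_acn(percent_b(40, 4, 38, 40), 3, 95)
print(f"{acn_start:.1f} {acn_mid:.1f} {acn_end:.1f}")  # 6.7 22.3 38.0
```

Under the linearity assumption, the peptides are therefore eluted between roughly 7% and 38% acetonitrile.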

2.6.2.2 Q-TOF

After extraction and purification, samples were dissolved in 100 µl of H2O, 0.1% TFA and 10 µl were loaded for LC-MS/MS analysis. A guard column (µ-precolumn cartridge, C18-PepMap 100, 5 µm, 100 Å) was connected using a switching valve directly to an analytical column (75 µm ID, 15-17 cm long) packed in-house with 5 µm, 200 Å Magic C18 AQ (Michrom BioResources Inc.). After washing for 3 minutes at 30 µl/min with H2O, 0.1% TFA, a gradient from 10 to 50% solvent B in solvent A (Solvent A: 2% AcN, 0.1% FA; Solvent B: 80% AcN, 0.085% FA) was developed over 50 minutes at a flow rate of 200 nl/min. The concentration of solvent B was increased to 96% before returning to the starting conditions for re-equilibration of the column. The eluate was sprayed directly into the nanoESI source of a Q-TOF 1 (Micromass, Manchester, UK) with a spray voltage of 2.0-2.5 kV. Data-dependent acquisition was used to automatically select 3 precursors for MS/MS from each MS spectrum (m/z range 300-2000). MS/MS spectra were acquired with a collision energy of 35 eV and a gas pressure (argon) of 1.38 bar in the collision cell. Data acquisition and processing were performed using MassLynx 3.5 software.

2.7 Data analysis

2.7.1 Protein identification

For the LC-MS/MS with S. aureus samples, peak lists of all fractions of the same strip or membrane were merged before database searching with Phenyx (GeneBio, Switzerland).

Searching was performed against a home-made database containing non-redundant predicted ORFs from genome-sequenced strain N315 with 90% homology with other strains [84]. On the Phenyx submission webpage, MALDI-TOF/TOF was selected as the instrument type. The taxonomy selected was other Firmicutes. Two search rounds were selected, both with trypsin as the proteolytic enzyme, oxidized methionine as variable modification and carbamidomethylation of cysteine as fixed modification. In the second round, deamidation was also selected as variable modification. In the first round, one missed cleavage with normal cleavage mode was selected, whereas in the second round three missed cleavages with half-cleaved mode were selected. “Turbo” was selected only in the first round, with a tolerance of 0.4 Da, a coverage of more than 0.2 and a, b and y ion series. The minimum peptide length allowed was 5 amino acids. Parent ion tolerance was 1 Da in the first round and 0.4 Da in the second. The acceptance criteria were slightly lowered in the second round search (1st round: AC score 8.0, peptide Z.score 6.5 and p-value 1.0E-7; 2nd round: AC score 8.0, peptide Z.score 6.0 and p-value 1.0E-7).

For the Q-TOF data, every individual peak list was submitted to Phenyxbeta. Searching was performed against the same home-made database of non-redundant predicted ORFs from strain N315 [84]. On the Phenyx submission webpage, ESI-Q-TOF was selected as the instrument type and the root taxonomy was selected. A single search round was selected with trypsin (KR_noP) as the proteolytic enzyme, oxidized methionine as variable modification, carbamidomethylation of cysteine as fixed modification and one missed cleavage with normal cleavage mode. “Turbo” with a tolerance of 0.4 Da, a coverage of more than 20 % and a ion series was chosen. The minimum peptide length allowed was 6 amino acids. Parent ion tolerance was set to 0.4 Da. The acceptance criteria were: AC score 7.0, peptide Z.score 6.0 and p-value 1.0E-7.

For direct MS/MS on protein standards, the peak lists obtained from the pool were submitted to Mascot (MatrixScience, USA). Searching was performed against the UniProtSP database. On the Mascot submission webpage, MALDI-TOF/TOF was selected as the instrument type. The taxonomy selected was other Mammalia. Trypsin was selected as the proteolytic enzyme, oxidized methionine as variable modification and carbamidomethylation of cysteine as fixed modification. Two missed cleavages were selected, as well as monoisotopic mass values. The peptide mass tolerance was set to ± 2 Da and the fragment mass tolerance to ± 1 Da.
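The acceptance criteria listed above can be collected in one place to make the difference between search rounds easier to see. The sketch below is illustrative only: it assumes that the AC score and peptide Z.score act as minimum thresholds and the p-value as a maximum, and the class and function names are hypothetical rather than part of Phenyx.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Acceptance:
    min_ac_score: float        # protein-level score threshold (assumed minimum)
    min_peptide_zscore: float  # peptide-level threshold (assumed minimum)
    max_p_value: float         # peptide-level threshold (assumed maximum)


# Values quoted in the text.
TOF_TOF_ROUND_1 = Acceptance(8.0, 6.5, 1.0e-7)
TOF_TOF_ROUND_2 = Acceptance(8.0, 6.0, 1.0e-7)
Q_TOF = Acceptance(7.0, 6.0, 1.0e-7)


def peptide_passes(zscore, p_value, criteria):
    """True if a peptide match satisfies the peptide-level thresholds."""
    return (zscore >= criteria.min_peptide_zscore
            and p_value <= criteria.max_p_value)


# A hypothetical match with Z.score 6.2 fails round 1 but passes round 2.
print(peptide_passes(6.2, 1.0e-8, TOF_TOF_ROUND_1))  # False
print(peptide_passes(6.2, 1.0e-8, TOF_TOF_ROUND_2))  # True
```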

2.7.2 Visualisation

For the data obtained from the Voyager DE-STR MALDI-TOF mass spectrometer, a single “.dat” file containing all spectra was imported into the MSight software. Each spectrum was automatically concatenated to its neighbours using the “Concatenate Images” function, thus creating an image with the m/z ratio on the x axis and the number of the spectrum, corresponding to its position on the membrane and therefore to the pI, on the y axis. Once the image was obtained, it was “cleaned” by using the “Remove Background from Image” function, as well as the “Normalise Images using the TIC” function to harmonise the spectra. Once these functions had been applied, the contrast of the image was also adjusted. For the data obtained from the 4700 MALDI-TOF/TOF, 48 separate images were created for each LC fraction (48 spots per fraction). Once all were open, the “Create Image (merge subfolders and concatenate…)” function was used. Once all spectra from each fraction had been merged, all the fractions were concatenated, thus obtaining a 2D image with the m/z ratio on the x axis and the pI on the y axis, the retention time having been collapsed during the merge. This gave an MS image of the strip or membrane from which the fractions originated.

Furthermore, the “Add MS/MS annotation to Image…” function was used to mark on the image the origin of the precursor masses used for identification. For direct MS/MS, the MS spectra were individually opened, “cleaned” and concatenated, before being normalized. The MS/MS spectra were then added to the image. For the comparison of the distribution obtained by MSI with a theoretical peptide distribution, the peptide masses were calculated using the Compute pI/Mw tool from ExPASy (SIB, Switzerland). The pIs were calculated either by using InSilicoSpectro from Vital-IT (SIB, Switzerland) or the new prediction algorithm from Cargile et al. [60]. A plot of MW against pI was made in Excel for the theoretical representation. For BioMap imaging, the MALDI MS Imaging software (Novartis) was used on the Voyager DE-STR MALDI-TOF. Once the membrane area was defined, the instrument was set to acquire every 250 µm vertically and horizontally, thus creating an image of the whole membrane (about 2500 spectra). The MS parameters were the same as for normal MS imaging (see section 2.6.1). The data were then imported into the BioMap 3.7.5.2 software to visualise the total ion image of the membrane and to select various peptides for localization.
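The principle of stacking spectra into an image and normalising each one by its total ion current (TIC) can be sketched in a few lines. This is not the MSight implementation: it assumes that all spectra have already been binned on a common m/z axis, and the function names are hypothetical.

```python
def tic_normalise(spectrum):
    """Scale intensities so that they sum to 1 (division by the TIC)."""
    tic = sum(spectrum)
    if tic == 0:
        return list(spectrum)  # leave empty spectra unchanged
    return [intensity / tic for intensity in spectrum]


def build_image(spectra):
    """Stack spectra in acquisition order.

    Rows follow the position on the membrane (pI axis),
    columns follow the m/z bins.
    """
    width = len(spectra[0])
    if any(len(s) != width for s in spectra):
        raise ValueError("all spectra must share the same m/z binning")
    return [tic_normalise(s) for s in spectra]


# Three toy spectra with three m/z bins each.
image = build_image([[1, 1, 2], [0, 0, 0], [2, 6, 2]])
for row in image:
    print(row)
```

After normalisation every non-empty row sums to 1, so that spectra acquired on spots with very different overall signal can be compared on the same intensity scale.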

3. RESULTS AND DISCUSSION

First developments were done using fluorescent markers (prototype products from GE-Healthcare). Markers with a pI of 3.95-4.00 or 4.55-4.60 were used; 3 µl of marker were mixed with 97 µl of rehydration buffer and focused as stated in section 2.3.1. The rehydration and focusing were done in the dark to preserve the fluorescence, and the strips were scanned with the Typhoon 9400 scanner (Amersham Biosciences). Once optimal parameters were obtained, they were applied to MS imaging of a pool of 5 different proteins for further investigation (Table 1):

Protein             SwissProt ID   AC number   Quantity    MW (kDa)
BSA                 ALBU_BOVIN     P02769      2.5 mg/ml   69
Carbonic Anhydrase  CAH2_BOVIN     P00921      2.5 mg/ml   28
β-casein            CASB_BOVIN     P02666      2.5 mg/ml   25
β-lactoglobulin     LACB_BOVIN     P02754      2.5 mg/ml   18
Phosphorylase b     PYGM_RABIT     P00489      2.5 mg/ml   97

Table 1. Proteins used to make a pool of standard proteins for initial experiments

80 µl of each protein solution was taken to constitute the pool of standard proteins, giving a total of 1 mg of protein in 400 µl of BA. When theoretically digested using the InSilicoSpectro software (Vital-IT, SIB, Switzerland), the pool yielded 364 tryptic peptides. The developments tested with the standard protein model were then applied to total protein extracts or membrane protein extracts of S. aureus strain N315.
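A theoretical tryptic digestion of the kind mentioned above can be illustrated with a minimal sketch. It is not the InSilicoSpectro code and is not expected to reproduce the count of 364 peptides, which depends on the sequences and settings used. It assumes the usual trypsin rule (cleavage after K or R, except before P); the function name and the example sequence are hypothetical.

```python
def tryptic_digest(sequence, missed_cleavages=0):
    """Cleave after K or R, except when the next residue is P."""
    if not sequence:
        return []
    # Cut positions, expressed as the index of the residue after each cut.
    cuts = [0]
    for i, residue in enumerate(sequence[:-1]):
        if residue in "KR" and sequence[i + 1] != "P":
            cuts.append(i + 1)
    cuts.append(len(sequence))

    peptides = []
    for start in range(len(cuts) - 1):
        # Join up to `missed_cleavages` extra neighbouring fragments.
        last = min(start + 1 + missed_cleavages, len(cuts) - 1)
        for end in range(start + 1, last + 1):
            peptides.append(sequence[cuts[start]:cuts[end]])
    return peptides


print(tryptic_digest("MKRPAAKG"))
# ['MK', 'RPAAK', 'G']
print(tryptic_digest("MKRPAAKG", missed_cleavages=1))
# ['MK', 'MKRPAAK', 'RPAAK', 'RPAAKG', 'G']
```

In the toy sequence, the R followed by P is not cleaved, which is why "RPAAK" stays in one piece.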

3.1 Transfer

3.1.1 Centrifugation

3.1.1.1 Results

In the first transfer experiments, transfer by centrifugation was done by introducing the transfer sandwich into a plastic bag and sealing it hermetically, before fixing it onto the 96-well plate support placed on the rotor of the centrifuge. This gave poor and very irregular results, as the handling was very difficult: it was particularly hard to avoid putting pressure on the sandwich, which makes the IPG strip stick to the capture membrane. These results led to the development of the sarcophagi, with the kind cooperation of the Geneva school of engineering (EIG, Geneva, Switzerland), to optimize the transfer and reduce sample handling (Figure 5). For every test, the IPG strip was scanned with a Typhoon 9400 scanner (Amersham Biosciences) before and after the transfer. The capture membrane was also scanned for the same purpose. The fluorescence was visualized and quantified using the ImageQuant software (Amersham Biosciences). The different centrifuges were compared first. A comparison of centrifugation performed in a speed-vac and in a Sorvall Heraeus Megafuge 1.0 R centrifuge (Thermo Fisher Scientific, MA, USA) showed no difference in diffusion or transfer rate between the two. The Heraeus Megafuge 1.0 R centrifuge was therefore chosen for availability reasons. Once this was established, different centrifugation speeds were tested.

The transfer was done under 3 different conditions: at 2700 x g for 99 minutes, at 4000 x g for 99 minutes, and at 300 x g for 60 minutes followed by 2700 x g for 39 minutes. The results show that the speed affects neither the transfer rate nor the diffusion of the marker. Furthermore, the orientation of the sandwich in the centrifuge, either horizontal or vertical, did not affect these parameters either. Further tests were performed to investigate the effect on the transfer of the paraffin oil used during IEF. Removing the paraffin oil before centrifugation increased the diffusion, so the IPG strips were not washed in transfer buffer before being placed onto the capture membrane. Tests were also made to determine the number of filter papers that needed to be soaked in transfer buffer to provide the best transfer. A single filter paper, even when it was not blotted, was not sufficient to provide an effective transfer. It is therefore necessary to soak the two filter papers and blot them by pressing them between two paper towels before using them for the transfer. Special care must be taken during the scan with the Typhoon before the transfer so that the IPG strip does not dry out; otherwise the gel will stick to the capture membrane during the transfer. The sarcophagi were then compared pair-wise with the “old” method of transfer in plastic bags. The results show that the transfer rate of the two methods is similar but that there is less diffusion with the sarcophagi. Additionally, the manipulations using the sarcophagi are much easier and more reproducible.

3.1.1.2 Discussion

The transfer is a very important step in the shortcut shotgun IEF workflow. Indeed, if it is done correctly, it will produce an exact copy of the position of the peptides on the IPG strip on a surface that can be more easily manipulated than the IPG strip. Furthermore, such a support can theoretically conserve the position of the peptides and can be used for further analysis such as MSI, and can then be treated in order to extract the peptides from its matrix for LC-MS/MS and protein identification. The transfer by centrifugation is a clever way of exploiting the centrifugal and gravitational forces to transfer samples from an IPG strip. In fact, a transfer using an electrical field as in a western blot [88] is not possible for such samples, because the plastic backing of the strip blocks the field. For these reasons, we simply put the samples into a centrifuge and let the peptides seep out of the IPG strip. The parameters obtained thanks to the fluorescent markers were tested on the SSIEF workflow with a pool of standard proteins. Even though good images were obtained by centrifugation at 2700 x g for 99 minutes, using two soaked but blotted filter papers and without washing the IPG strip before the transfer, the variability between the different images was too large to conclude that the transfer was efficient. The rate of transfer does not change significantly when the sarcophagi are used, but the diffusion is lessened, which makes them more appropriate for SSIEF. The easier manipulations also make it a better technique, as there is less chance of pressing on the strip by mistake and causing the gel to stick onto the capture membrane. When the experimental image was compared to the theoretical one obtained by computing the pI and MW of the peptides, it was too difficult to align the images properly to obtain a significant match. Such variability is probably due to handling difficulties encountered during the insertion and extraction of the capture membrane and the IPG strip from the slots in the sarcophagi.

3.1.2 Capillarity

3.1.2.1 Results

To date, only DNA, RNA and in a few cases proteins have been transferred by capillarity [21, 22]. As an initial optimization step, capillary transfer was tested with fluorescent markers. The first capillary transfer experiment was an over-night transfer (about 18 hours) to see which of the down → up or up → down transfers was the most efficient. In the first option, the IPG strips were placed face up on the bottom glass plate. The capture membrane was then placed on top, followed by the filter papers, the top glass plate and the 1.25 kg weight (see Figure 2). Only the filter paper in direct contact with the membrane was soaked with transfer buffer. In this order, the peptides were allowed to follow the liquid flow from bottom to top. In the second option, the set-up was in the reverse order, with the IPG strips at the top facing down onto the capture membrane, the filter papers and the bottom glass plate. The top glass plate was gently placed directly onto the strips. Both set-ups were wrapped in cellophane film to close the sandwich hermetically and thus prevent drying, before the weight was set on top. The up → down transfer gave less diffusion than the other and was therefore chosen for future experiments.

Figure 6. Different transfer times for capillarity transfer. Fluorescent markers are rehydrated into the IPG strip and focused. After IEF they are scanned with a Typhoon 9400 scanner and set to transfer. Another scan of the strips and the capture membranes is done after the transfer. (A) IPG strips after the IEF. (B) IPG strips after (1) 2 hours (2) 5 hours and (3) over-night transfer. (C) PVDF membranes after (1) 2 hours (2) 5 hours and (3) over-night transfer. (D) Graph of the intensity of the fluorescent marker on the PVDF membrane after transfer.

Shorter transfer times were evaluated in order to reduce the time frame and increase the throughput of the SSIEF pipeline. After the focusing step, strips were transferred under the same conditions, but for 2 hours, 5 hours and over-night. The transfer rate did not change significantly between the 2 hour, 5 hour and over-night transfers (Figure 6, D). Moreover, Figure 6 shows that there is almost no marker left on the IPG strip after the over-night transfer. The diffusion is also the same for the 2 hour and the over-night transfers.

For high-throughput purposes, the 2 hour transfer was chosen, as the diffusion is the same and the transfer rate is only slightly, and not significantly, lower. Capillarity and centrifugation were compared by mixing the fluorescent marker with the pool of standard proteins. The differences in diffusion and transfer rate could be observed on the Typhoon scanner as well as on the MALDI-TOF using the MSight software. The Typhoon scanner revealed that capillarity gave better results overall than centrifugation, even when the centrifugation was done with the sarcophagi. A reproducibility test using a pool of standard proteins confirmed these results (Figure 7). A visual comparison of the two images shows that there is much more signal (more spots) for capillarity, meaning that the transfer rate was higher. The close-up shows that the spots cover fewer MS spectra for capillarity, which means that the peptides are less diffused. A further analysis by 3D imaging of the spots confirmed this and showed that the spots are not only more numerous for the capillarity transfer but also more intense.

Figure 7. Reproducibility comparison between a transfer by centrifugation and a transfer by capillarity. Images were obtained by MALDI-TOF acquisition visualized by MSight software (SIB). (A) We notice that on the capillary image, there is more signal, which means that there was a better transfer rate than for the centrifugation. On the close-up, we can see that for the centrifugation, the spots are vertically longer, which means that the diffusion was greater. (B) A 3D image of the spots in the close up shows that the spots are less diffused and more intense in the capillarity transfer.

Another observation was that some gel always stuck to the membrane during centrifugation and had to be removed by scraping the membrane before MS acquisition was possible. This observation was tested further and it was shown that the capillarity transfer considerably reduced the amount of stuck gel (Figure 8).

Figure 8. Scanned image of the membranes and IPG strips after transfer by (A) capillarity and (B) by centrifugation. Circled is the gel stuck to the membrane.

3.1.2.2 Discussion

As previously stated, the transfer step is essential in the SSIEF pipeline, as it determines the quality of the MS image that guides the fractionation of the membrane for optimal extraction and identification of the proteins present in the complex sample of interest. The transfer by capillarity is based on the same principle as the one by centrifugation: the gravitational force and passive diffusion by capillarity. Its advantage over centrifugation is that it does not need any particular instrumentation to be efficient and can therefore be done in any laboratory and under any conditions. We showed that it is not necessary to wait over-night for an efficient transfer. This allows it to be integrated into the shortcut shotgun IEF workflow as a high-throughput technique for the efficient transfer of peptides from an IPG strip to a capture membrane. It was also shown that capillarity coupled to the gravitational force alone is enough, and that multiplying this force reduced the efficiency of the transfer and increased the diffusion. This may also be due to the fact that, during centrifugation, the force applied on the sandwich is not necessarily exactly perpendicular to the IPG strip and the membrane, as the weight of the sarcophagus and the support is 846.8 g (support: 708.46 g and sarcophagus: 138.34 g), which is not negligible, even on a rotor of 22.5 cm spinning at 2700 x g. By contrast, the force in the capillarity transfer is perfectly perpendicular, so the membrane will provide a faithful copy of the IPG strip. Furthermore, the gel sticking to the membrane after the transfer is a great obstacle for the mass spectrometry, because it disturbs the matrix deposition as well as the MS acquisition and therefore the imaging. The transfer by capillarity gave less gel sticking than the centrifugation. This is probably due to the lower force applied during transfer by capillarity compared to centrifugation.
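The orders of magnitude involved in the centrifugation set-up can be estimated with a short calculation. The sketch below uses the standard relation RCF = 1.118 x 10^-5 x r x rpm² (r in cm) and assumes that the quoted 22.5 cm is the rotor radius, which the text does not specify; the function names are hypothetical.

```python
import math


def effective_load_kgf(mass_g, rcf):
    """Apparent weight (kilogram-force) of a mass under a given RCF."""
    return mass_g / 1000.0 * rcf


def rpm_for_rcf(rcf, radius_cm):
    """Rotor speed for a given RCF, from RCF = 1.118e-5 * r(cm) * rpm^2."""
    return math.sqrt(rcf / (1.118e-5 * radius_cm))


# Support (708.46 g) plus sarcophagus (138.34 g) at 2700 x g.
load = effective_load_kgf(708.46 + 138.34, 2700)
# Assumption: 22.5 cm is the rotor radius.
rpm = rpm_for_rcf(2700, 22.5)

print(f"{load:.0f} kgf")  # 2286 kgf
print(f"{rpm:.0f} rpm")   # 3276 rpm
```

Under these assumptions the assembly behaves as if it weighed more than two tonnes during the run, which illustrates why even a small misalignment of the force could disturb the sandwich.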

3.2 Capture membrane

3.2.1 Results

As every capture membrane has different characteristics, the first test was to compare PVDF (BioRad), Immobilon (Sigma-Aldrich) and Nitrocellulose (Schleicher & Schuell) membranes in a centrifugation transfer with fluorescent markers.

A fourth membrane was added to the set: a PVDF membrane with a thin layer of gold (~10 nm) deposited on the back to slow down the flow rate. The best transfer was obtained with the PVDF membrane, but the difference with or without gold was not significant. Another unfortunate effect was that a large amount of gel stuck to the membranes. For this reason, a cellulose acetate membrane with a pore size of 0.45 µm was inserted between the IPG strip and the PVDF membrane. The cellulose acetate membrane does not bind the peptides and is also very thin and porous. It was expected to allow the peptides to pass through onto the PVDF membrane without any gel sticking. Unfortunately the transfer rate was much lower and the diffusion was higher. Further tests were made with gold coating to see whether the diffusion could be decreased by slowing down the flow of transfer buffer during the transfer, by blocking the pores with deposited gold. PVDF and Immobilon membranes were tested with and without a layer of ~10 nm of gold. Slowing down the flow rate did not reduce the diffusion; instead, the transfer rate was reduced by 10%. During these tests, a second, slightly larger membrane was added behind the first, to test whether the first one retained all of the marker or let a certain amount pass through. The results show that in all cases no marker was transferred onto the second membrane. The role of the pore size of the capture membrane was tested to see whether it has an effect on the diffusion. 0.2 µm PVDF membranes (BioRad) and 0.1 µm PVDF membranes (Millipore) were compared by centrifugation and by capillarity. The results show that, once again, capillarity gave more signal and less diffusion than centrifugation. The markers on the 0.1 µm PVDF membranes seemed to be fixed only on the surface of the membrane and not to have penetrated into the matrix. Furthermore, the MSight images of the capillarity transfer of a pool of standard proteins onto 0.1 µm and 0.2 µm PVDF membranes showed that, even though very intense spots were present, the signals were more intense with the 0.2 µm pores.

3.2.2 Discussion

The capture membrane is a determining piece of the SSIEF puzzle. If the quality of the support used to capture the peptides is not optimal, the peptides will simply pass through and be lost in the filter papers. Another important property is the capacity of the capture membrane to bind the peptides. If too great a quantity of peptides is loaded onto the IPG strip and the capture membrane cannot retain them all, there will be significant diffusion, as the peptides will be transferred, over-saturate the membrane and leak around. The peptides with lower affinity would be lost, resulting in a peptide distribution that does not reflect their real presence. Moreover, if the capture membrane does not retain the peptides, the transfer is pointless. The capacity of a membrane to capture the peptides determines the efficiency and the quality of the transfer. The membrane has to hold the peptides, but it must also be possible to extract them with a simple and quantitative procedure. Each membrane is composed of a different matrix, such as polyvinylidene difluoride, polyethylene, nitrocellulose, acetate, etc., with different properties, such as hydrophobicity and the number of active sites, which will influence the recovery of the peptides. The PVDF membrane was selected for the SSIEF from a panel of tested membranes for the capture of the peptides transferred from the IPG strip. A few years ago, it was reported that a polyethylene membrane was better for the transfer of RNA [21], but unfortunately we were not able to obtain a sample of this product to test it on the transfer of peptides from a complex sample. The pore size is also an important factor.


If the pores are too large, the peptides will not be fixed to the matrix and many peptides will be lost. Conversely, if the pores are too small, the buffer will not penetrate the membrane and the rate of transfer will be reduced, increasing the diffusion. The surface might also become rapidly saturated, which likewise results in lateral diffusion. Results show that the membranes with a pore size of 0.1 µm show greater diffusion than those of 0.2 µm. Furthermore, the peptides can easily be extracted from the 0.2 µm membrane, so they do not have to remain at the surface of the membrane. As diffusion needs to be as limited as possible in the pipeline, a pore size of 0.2 µm was retained as optimal for the SSIEF. The peptides are extracted from the IPG strip because the liquid contained inside the strip is pushed out by pressure and capillary force. The added weight creates a pressure of about 16 grams per cm², helping to push the peptides out of the strip. Care must be taken not to add too much weight, as this will squeeze the gel onto the membrane. One must also be careful not to let the membrane dry, otherwise the flow will stop and the peptides will no longer penetrate. This is why the transfer is done under sealed conditions. The flow, although induced by very small forces, can be controlled to optimize the fixation of the peptides to the membrane. Indeed, if the flow is too great, the peptides will not have time to stick and will simply pass through. On the other hand, if the flow is too low, the peptides will not be pulled out of the IPG gel. The flow rate can be adjusted by changing the pressure applied to the strip, that is, by modifying the weight or the size of the filter papers so as to reduce or increase the weight per square centimetre. Such tests were not done in the present study, as the recovery of the peptides by the PVDF membrane seemed sufficient for the SSIEF.
Further investigations should be made for low-abundance proteins.
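As a rough worked example of the weight adjustment discussed above, the following sketch converts the stated load of about 16 g per cm² into pascals and estimates the weight needed to reach a target load on a given contact area. The function names and the 10 cm² contact area are illustrative assumptions, not values from this study.

```python
STANDARD_GRAVITY = 9.80665  # m/s^2


def load_to_pascal(grams_per_cm2):
    """Convert a load expressed in g/cm^2 into a pressure in Pa."""
    # g -> kg (divide by 1000), cm^2 -> m^2 (multiply by 10 000)
    kg_per_m2 = grams_per_cm2 / 1000.0 * 10_000.0
    return kg_per_m2 * STANDARD_GRAVITY


def weight_for_load(grams_per_cm2, contact_area_cm2):
    """Mass in grams to place on the stack to reach the target load."""
    return grams_per_cm2 * contact_area_cm2


if __name__ == "__main__":
    # Load reported in the text: about 16 g per cm^2.
    print(f"{load_to_pascal(16):.0f} Pa")
    # Hypothetical contact area of 10 cm^2 (assumption for illustration).
    print(f"{weight_for_load(16, 10)} g")
```

Under these assumptions, 16 g per cm² corresponds to roughly 1.6 kPa; halving the contact area at constant weight would double the load per square centimetre.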

3.3 Strip versus membrane fractionation

3.3.1 Results

To compare the number of identifications obtained by extracting peptides from a strip with the number obtained by extracting them from a PVDF membrane, total membrane extracts of Staphylococcus aureus strain N315 were focused on IPG strips, and the peptides from one strip were then transferred onto a PVDF membrane. The peptides were extracted from the two supports for MS analysis and comparison.

Strip                          High     Average  Low      Total
Unique theoretical proteins    507      1074     530      2403
Unique practical proteins      176      192      264      308
Common proteins                132      116      44       292
% of theoretical identified    48.20    25.88    53.66    12.81

Membrane                       High     Average  Low      Total
Unique theoretical proteins    530      1099     537      2403
Unique practical proteins      141      159      213      250
Common proteins                109      91       37       237
% of theoretical identified    39.12    21.01    43.55    10.40

Table 2. Comparison of the number of identifications between theoretical and practical digestion. The numbers of identifications from the strip and from the membrane are compared to the numbers of theoretically high, average and low abundant proteins of S. aureus N315.


The Q-TOF was used for LC-MS/MS analysis. Unfortunately, the format of the raw data obtained with the Q-TOF is not currently compatible with MSight, so it was not possible to generate an image of the sample analysed by Q-TOF-MS and illustrated in Figure 9; an image could, however, be created from the data initially obtained with the MALDI-TOF/TOF (Figure 9, A). On the Q-TOF, the first results showed that, with the regular 40-minute gradient used for the MALDI-TOF/TOF-MS, too much sample was injected, causing sample overloading, so we changed the gradient to 50 minutes (see section 2.6.2.2) and obtained fairly good chromatograms. With this new gradient, and using Phenyx, we identified 308 proteins from the IPG strip extraction and 250 proteins from the PVDF membrane extraction (Annexe 1). If we compare the results obtained with the Q-TOF with the theoretical numbers of high, average and low abundant proteins of S. aureus strain N315, we can see that in both cases we identified more low abundant proteins than high or average abundant ones. Furthermore, we identified 12.81% of the total proteins from the strip, compared with 10.40% from the PVDF membrane (Table 2). These results need to be confirmed and the experiment repeated in triplicate, but the preliminary results indicate a loss of identifications of about 20% when the peptides are transferred onto the PVDF membrane and extracted from it.
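The arithmetic behind the overall percentages and the approximate loss can be checked with a short sketch. It uses only the totals reported in the text and in Table 2 (2403 theoretical proteins, 308 identified from the strip, 250 from the membrane); the function names are illustrative.

```python
TOTAL_THEORETICAL = 2403   # theoretical proteins of S. aureus N315 (Table 2)
STRIP_IDENTIFIED = 308     # proteins identified from the IPG strip
MEMBRANE_IDENTIFIED = 250  # proteins identified from the PVDF membrane


def percent_identified(identified, theoretical):
    """Share of the theoretical proteome that was identified, in percent."""
    return 100.0 * identified / theoretical


def relative_loss(reference, observed):
    """Loss of identifications relative to the reference, in percent."""
    return 100.0 * (reference - observed) / reference


if __name__ == "__main__":
    strip = percent_identified(STRIP_IDENTIFIED, TOTAL_THEORETICAL)
    membrane = percent_identified(MEMBRANE_IDENTIFIED, TOTAL_THEORETICAL)
    loss = relative_loss(STRIP_IDENTIFIED, MEMBRANE_IDENTIFIED)
    print(f"strip: {strip:.2f}%  membrane: {membrane:.2f}%  loss: {loss:.1f}%")
```

The relative loss computed this way is close to 19%, consistent with the loss of about 20% quoted above; the strip percentage comes out at 12.82% when rounded, so the 12.81% in Table 2 appears to be the same ratio truncated.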

Figure 9. Comparison of the number of identifications obtained by extraction from an IPG strip with those obtained by extraction from a PVDF membrane. Analysis was done by LC-MS/MS on a Q-TOF. (A) MSight image. The 48 MS spectra of each fraction were merged into a single spectrum. Each band on the image represents one such spectrum. All fractions were then concatenated into a single image. (B) All tryptic peptides are plotted on a graph and every fraction is represented by a different colour. (C) The peptide distribution as a function of the number of peptides per fraction and (D) as a function of the pI. (E) Venn diagram summarising the number of unambiguously identified proteins from the IPG strip and from the PVDF membrane.
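The construction of the image in panel (A) can be illustrated with a minimal sketch. It assumes that every spectrum has already been binned on a common m/z axis and that merging means summing intensities bin by bin; both are assumptions made for illustration, and MSight's actual procedure may differ. The function names are hypothetical.

```python
def merge_spectra(spectra):
    """Sum intensities bin by bin across the spectra of one fraction.

    Each spectrum is a list of intensities on a shared m/z binning
    (assumption: the binning was done beforehand).
    """
    n_bins = len(spectra[0])
    if any(len(spectrum) != n_bins for spectrum in spectra):
        raise ValueError("all spectra must share the same m/z binning")
    return [sum(spectrum[i] for spectrum in spectra) for i in range(n_bins)]


def build_image(fractions):
    """Return one merged spectrum (band) per fraction, in fraction order."""
    return [merge_spectra(spectra) for spectra in fractions]


if __name__ == "__main__":
    # Two toy fractions with three m/z bins each (made-up intensities).
    fractions = [
        [[1, 2, 3], [0, 1, 0]],
        [[4, 0, 0], [1, 1, 1]],
    ]
    for band in build_image(fractions):
        print(band)
```

In the figure, each fraction would contribute 48 spectra rather than two, and each resulting row would be drawn as one band of the image.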