MAPAS: a tool for predicting membrane-contacting protein surfaces



NATURE METHODS | VOL.5 NO.2 | FEBRUARY 2008 | 119

CORRESPONDENCE

To the editor: Many important biological processes, from serum phospholipid metabolism to amyloid disease, involve the formation of protein-membrane complexes. Tools for identifying membrane-contacting features in a protein structure are therefore valuable. However, few algorithmic approaches to predicting membrane-contacting surfaces have been reported¹,².

We developed a program and web-based tool called MAPAS (membrane-associated-proteins assessment; http://cancer-tools.sdsc.edu/MAPAS/pro2.html). MAPAS uses a set of algorithmic scoring functions to predict whether a given protein structure can form strong membrane contacts and to define the regions of the protein surface that most likely form such contacts (Supplementary Methods online). The MAPAS input window (Supplementary Fig. 1 online) accepts Protein Data Bank (PDB) identifiers or a pasted file in PDB format.
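For readers preparing input locally, the following is a minimal sketch of reading residue positions from PDB-format text. It relies only on the fixed-column layout of ATOM records in the PDB file format. The function name `read_ca_atoms` and the choice of one C-alpha atom per residue are illustrative assumptions and are not taken from MAPAS.

```python
def read_ca_atoms(pdb_text):
    """Return (residue_name, chain, residue_number, (x, y, z)) per C-alpha atom.

    Hypothetical helper: MAPAS itself works with solvent-accessible surfaces,
    which this sketch does not compute.
    """
    residues = []
    for line in pdb_text.splitlines():
        # ATOM records are fixed-width; the atom name occupies columns 13-16.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            res_name = line[17:20].strip()      # columns 18-20
            chain = line[21]                    # column 22
            res_num = int(line[22:26])          # columns 23-26
            xyz = (
                float(line[30:38]),             # x, columns 31-38
                float(line[38:46]),             # y, columns 39-46
                float(line[46:54]),             # z, columns 47-54
            )
            residues.append((res_name, chain, res_num, xyz))
    return residues
```

Slicing by column rather than splitting on whitespace matters because adjacent PDB fields can run together without a separating space.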

The MAPAS algorithm is based on the assumption that membrane-contacting protein surfaces have a specific distribution of membranephilic surface residues in a plane. This planar region would contact the membrane (the explicit assumption is that, on the scale of proteins, the cell membrane can be considered a plane). These residues must provide the binding energy needed to keep the protein at the membrane surface. MAPAS (i) identifies the planar surfaces that encompass a given protein, and (ii) scores them according to their membranephilic properties. To provide a measure of membranephilicity, we estimated the relative tendency of individual residues to bind to a phospholipid bilayer. We calculated scoring functions using a semi-empirical approach based on steered molecular dynamics (Supplementary Figs. 2–4 and Supplementary Table 1 online) and Poisson-Boltzmann calculations (Supplementary Methods).

MAPAS accepts a protein’s three-dimensional structure as input and identifies all planes encompassing the protein structure (Fig. 1a). For each plane, it then determines which residues lie within a layer of a given thickness (Supplementary Fig. 5 online) and sorts the planar protein surfaces by their membranephilic character. The output window displays rotatable three-dimensional representations of submitted proteins with their possible membrane-contacting surfaces indicated (see, for example, Supplementary Figs. 6 and 7 online).
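The plane-scanning idea can be sketched as follows. This is not the MAPAS implementation: the golden-spiral sampling of plane orientations, the default layer thickness, the function names and the per-residue score table passed in by the caller are all assumptions made for illustration. The published scores come from steered molecular dynamics and Poisson-Boltzmann calculations and are not reproduced here, and solvent accessibility is ignored.

```python
import math

def sphere_directions(n):
    """Roughly uniform unit vectors on a sphere (golden-spiral sampling)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    dirs = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n
        r = math.sqrt(1.0 - z * z)
        dirs.append((r * math.cos(golden_angle * i),
                     r * math.sin(golden_angle * i), z))
    return dirs

def rank_planes(residues, scores, thickness=5.0, n_dirs=200):
    """Rank tangent planes by the summed score of residues near each plane.

    residues: list of (residue_name, (x, y, z)).
    scores:   residue_name -> membranephilicity (hypothetical values).
    Returns (total_score, plane_normal, residue_indices), best plane first.
    """
    ranked = []
    for d in sphere_directions(n_dirs):
        # Project every residue onto the plane normal.
        proj = [sum(a * b for a, b in zip(d, xyz)) for _, xyz in residues]
        top = max(proj)  # the plane with normal d that just touches the protein
        # Residues within `thickness` of that plane form the candidate layer.
        layer = [i for i, p in enumerate(proj) if top - p <= thickness]
        total = sum(scores.get(residues[i][0], 0.0) for i in layer)
        ranked.append((total, d, layer))
    ranked.sort(key=lambda item: item[0], reverse=True)
    return ranked
```

The first entry of the returned list is the orientation whose surface layer carries the highest summed score under the supplied table, which corresponds to the sorting step described above.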

We validated the performance of MAPAS with several known membrane-contacting proteins (Fig. 1b and Supplementary Tables 2 and 3 online). MAPAS can predict membrane-contacting proteins, membrane-associated proteins and the membrane-contacting surfaces of proteins, including transmembrane proteins (Supplementary Discussion online).

Nevertheless, as with all prediction programs, MAPAS can yield false-positive and false-negative predictions. One possible source of error is that the coordinates of some proteins listed in the PDB as membrane-contacting do not include the membrane-contacting regions, either because these regions are disordered or because they were engineered out of the protein to permit crystallization. Another problem is the relatively small area of membrane contact found in some proteins. Our tests show that MAPAS is reliable when the number of membrane-contacting residues is at least five (data not shown). With fewer residues in the membrane-contacting zone, the statistical error increases.
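The residue-count caveat can be applied as a simple filter over candidate surfaces. The sketch below assumes each candidate is a pair of a score and a list of residue indices. That data layout and the function name are hypothetical; only the threshold of five residues comes from the text.

```python
def reliable_surfaces(candidates, min_residues=5):
    """Keep candidate surfaces with at least `min_residues` contact residues.

    candidates: list of (score, residue_indices); layout assumed for illustration.
    """
    return [c for c in candidates if len(c[1]) >= min_residues]
```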

Note: Supplementary information is available on the Nature Methods website.

ACKNOWLEDGMENTS

Supported by the US National Institutes of Health (AG18440 to I.F.T. and E.M., AI057695 to V.K. and S.K.N.) and the Department of Energy (INCITE grant). We are grateful for computational support on IBM BlueGene computers at the San Diego Supercomputer Center and at the Argonne National Laboratory.

Yuriy Sharikov¹, Ross C Walker¹, Jerry Greenberg¹, Valentina Kouznetsova², Sanjay K Nigam²,³, Mark A Miller¹, Eliezer Masliah⁴,⁵ & Igor F Tsigelny¹,⁶

¹San Diego Supercomputer Center, ²Department of Medicine, ³Department of Pediatrics, ⁴Department of Neuroscience, ⁵Department of Pathology, ⁶Department of Chemistry and Biochemistry, University of California, San Diego, 9500 Gilman Drive, La Jolla, California 92093, USA. e-mail: [email protected]

1. Lomize, A., Pogozheva, I., Lomize, M. & Mosberg, H. Protein Sci. 15, 1318–1333 (2006).

2. Scott, D., Diez, G. & Goldmann, W. Theor. Biol. Med. Model. 3, 17–20 (2006).

[Figure 1a: flowchart. Box labels: Protein structure file in PDB format; Table of membrane-association scores defined by steered molecular dynamics; Estimation of solvent-accessible surfaces of the residues of studied protein; Construction of a set of planes encompassing studied protein; Calculation of membrane-association score for the residues included in these planes; Selection of membrane-association scores for these planes; Calculation of membrane-association ‘asymmetry’ for studied protein; Prediction of membrane-associated plane of studied protein; Prediction: is the studied protein membrane-associated?]

[Figure 1b: scatter plot of MAS (vertical axis, 0–90) against MRS (horizontal axis, 0–6). Legend: membrane-contacting proteins; random proteins.]

Figure 1 | The MAPAS algorithm workflow and performance. (a) The solvent-accessible surface of each residue of the input protein is calculated, a set of planes encompassing the entire protein is constructed, and membrane-association asymmetry scores and membrane-contact scores for these planes are then calculated using the table of membrane-association scores defined by a semi-empirical method. Finally, membrane-associated proteins and the membrane-associated surfaces of these proteins are predicted. (b) Membranephilic area scores (MAS) and membranephilic residue scores (MRS) define membrane-contacting proteins. Most membrane-related proteins cluster separately from random non-membrane-contacting proteins. If a protein has MRS > 3 or MAS > 60%, there is a high probability that it has a true membrane-contacting region. Given the limitations of each scoring method, we suggest that users select proteins with high MAS and MRS, and then refine the predictions by considering K_mpha, the coefficient of ‘membranephilic asymmetry’ (see Supplementary Methods for definitions).
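The thresholds in the caption translate into a short decision rule, sketched below. The function names are hypothetical. Treating "high MAS and MRS" as both scores exceeding the same cutoffs is an assumption, since the caption does not quantify "high". K_mpha is omitted because its definition is given only in the Supplementary Methods.

```python
def likely_membrane_contacting(mrs, mas_percent, mrs_cutoff=3.0, mas_cutoff=60.0):
    """Caption rule: MRS > 3 or MAS > 60% suggests a true contact region."""
    return mrs > mrs_cutoff or mas_percent > mas_cutoff

def both_scores_high(mrs, mas_percent, mrs_cutoff=3.0, mas_cutoff=60.0):
    """Stricter shortlist: both scores above the cutoffs.

    Assumption: "high" is read as exceeding the same cutoffs as above.
    """
    return mrs > mrs_cutoff and mas_percent > mas_cutoff
```

Under this reading, a protein with MRS = 3.5 and MAS = 20% passes the first test but not the stricter shortlist, so it would be a candidate for further refinement rather than a confident call.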

© 2008 Nature Publishing Group http://www.nature.com/naturemethods