Download - Protein sequence and structure relationship ARMA spectral ... · brane proteins have also been proposed (Argos et al., 1982; Engelmanet al., 1986). It is physically understandable

Transcript
Page 1: Protein sequence and structure relationship ARMA spectral ... · brane proteins have also been proposed (Argos et al., 1982; Engelmanet al., 1986). It is physically understandable

Biophysical Journal Volume 66 June 1994 2092-2106

Protein Sequence and Structure Relationship ARMA Spectral Analysis:Application to Membrane Proteins

Shaojian Sun* and R. Parthasarathy**Department of Pharmaceutical Chemistry, University of California-San Francisco, San Francisco, California 94118, and tBiophysicsDepartment, Roswell Park Cancer Institute, Buffalo, New York 14263 USA

ABSTRACT If it is assumed that the primary sequence determines the three-dimensional folded structure of a protein, then theregular folding patterns, such as a-helix, 1-sheet, and other ordered patterns in the three-dimensional structure must correspondto the periodic distribution of the physical properties of the amino acids along the primary sequence. An AutoRegressive MovingAverage (ARMA) model method of spectral analysis is applied to analyze protein sequences represented by the hydrophobicityof their amino acids. The results for several membrane proteins of known structures indicate that the periodic distribution ofhydrophobicity of the primary sequence is closely related to the regular folding patterns in a protein's three-dimensional structure.We also applied the method to the transmembrane regions of acetylcholine receptor a subunit and Shaker potassium channelfor which no atomic resolution structure is available. This work is an extension of our analysis of globular proteins by a similarmethod.

INTRODUCTION

Membrane proteins play an important role in modulating theactivities of many cellular functions. Despite considerable ef-forts, very few membrane proteins have yielded crystals thatdiffract x rays to high resolution. A large number of predictivealgorithms for the secondary structural topology of membraneproteins have been proposed (see Fasman, 1989; Fasman andGilbert, 1990 for reviews), including statistical methods (Chou-Fasman, 1978; Gamier et al., 1978; Biou et al., 1988), Kyte-Doolittle hydropathy plots (Kyte and Doolittle, 1982), hydro-phobic moment analysis (Eisenberg et al., 1984), hydrophobicityanalysis (von Heijne 1991; Jdhnig 1989), and direct Fouriertransformation-based hydrophobicity analysis (Finer-Mooreet al., 1989). New hydrophobicity scales developed for mem-brane proteins have also been proposed (Argos et al., 1982;Engelman et al., 1986).

It is physically understandable that the periodic distributionof physical properties of amino acids (such as hydropho-bicity, charge, dipole moment, etc.) along the primary se-quence may be one of the important factors determin-ing the regular folding patterns, such as a-helix, 13-sheet, andother ordered patterns, in the three-dimensional structure ofa protein.A primary sequence of a protein of length N amino acids

can be represented by the physicochemical properties of

Receivedfor publication 17September 1993 and infinalform 29 December1993.Address reprint requests to Shaojian Sun, Department of PharmaceuticalChemistry, Laurel Heights Campus, University of California-San Fran-cisco, 3333 California Street, Room 102, San Francisco, CA 94118. Tel.:415-476-8910; Fax: 415-476-1508; E-mail: [email protected].© 1994 by the Biophysical Society0006-3495/94/06/2092/15 $2.00

individual amino acid along the chain,

P 1, P2 01 2

P2 P2

1 2PM PM

1P', P',i i 1P2 P2

9

i i+lPM PM

pNp1

N

P2

()NPM

i E[1,N]

where i is the index for the sequence number, and p' is themth kind physical property of the amino acid at ith positionalong the primary sequence. The physicochemical propertyPm can be the hydrophobicity index, charge density, volume,etc., of an amino acid. We assume that the ordered foldingpatterns observed in the protein crystal structures, such assecondary and tertiary structures in most of proteins, shouldbe characterized, to a large extent, by the periodic distribu-tion of these quantities along their primary sequences. Thequestion then is how to compute the hidden periodicity ofthese physicochemical properties from a given protein se-quence. Instead of working with a vectorial physicochemicalquantity in the computation of periodicity, the simplestapproach is to represent a primary sequence by a scalarsequence, for example, the hydrophobicity sequence of aprotein. The hydrophobic interaction has been identified asone of the most important interactions in determining the

2092

Page 2: Protein sequence and structure relationship ARMA spectral ... · brane proteins have also been proposed (Argos et al., 1982; Engelmanet al., 1986). It is physically understandable

Protein Sequence and Structure Relationship

three-dimensional structures of proteins (Kauzmann, 1959;Dill, 1990). Although similar method can be developed inparallel for vectorial computation, we will only focus on thecomputational method for the scalar representation of proteinprimary sequence.

Spectral analysis methods compute the frequency domainrepresentation of sequential signals. The peaks in the fre-quency domain spectrum correspond to the periodic oscil-lations of signal components in the original sequence. DirectFourier analysis of the hydrophobicity sequence of a proteinto reveal the possible regular folding patterns has been em-ployed by several authors (McLachlan and Stewart, 1976;Finer-Moore et al., 1989; Sun and Parthasarathy, 1990). Thismethod usually produces low resolution spectra because thehydrophobicity sequence of a protein is commonly a noisysequence (Sun, 1993). Although filtering methods can beused in the direct Fourier analysis to partially remedy thissequence noise problem, it is well know that the computedspectra may be distorted by the addition of filters. Furthermore,the direct Fourier transform analysis uses a windowed samplingmethod, assuming zero data outside the window and, therefore,the unestimated autocorrelation sequence values outside thewindow are implicitly zero, normally an unrealistic assumption.In doing so, both the resolution and reliability of the estimatedspectrum are sacrificed (Sun, 1993).

In a recent study, we proposed (Sun and Parthasarathy,1993) an AutoRegressive Moving Average (ARMA) modelmethod (Marple, 1987; Jenkins and Watts, 1968; Kay, 1987)to analyze the periodic distribution of hydrophobicities alongthe primary sequences of proteins. The ARMA modelmethod of spectral estimation does not make a direct Fouriertransform of the original hydrophobicity sequence data.Rather, it uses the hydrophobicity sequence to construct amathematical model. The model is required to approximatethe distribution that generated the observed hydrophobicitysequence data. The frequency-domain spectrum of a givensequence is computed from the parameters in the fittedARMA model. It is well known that (Marple, 1987; Jenkinsand Watts, 1968; Kay, 1987) ARMA-computed spectra havebetter resolution than classical periodograms or correlogramsbased on direct Fourier transforms. In this method of spectralanalysis, the random component in the hydrophobicity se-quence is filtered out automatically. Unlike spectral estima-tion by either periodogram or correlogram, the ARMAmethod overcomes this problem, because the power spectraldensity function of the sequence is totally described in termsof the model parameters and the variance of the white noiseprocess (see Theory). Application of this spectral analysismethod to a set of representative globular and fibrous proteinswith different folding patterns (Sun and Parthasarathy, 1993;Sun, 1993) demonstrated that this method can produce spectraof much higher resolution than the direct Fourier transformmethod. We have found that there is a pronounced correlationbetween the ordered folding patterns in the three-dimensionalstructure of a protein and the computed ARMA spectrum of itshydrophobicity sequence. In the present study, we demonstratethat similar results hold for membrane proteins as well.

THEORY

In this section, we will define anARMA process and providethe formulae for the power spectral density function ex-pressed in terms of the ARMA model parameters. We willexplain further how the model parameters and model orderscan be chosen properly for a given set of data (in our casethe hydrophobicity sequence). We refer readers to severalstandard textbooks (Marple, 1987; Jenkins and Watts, 1968;Kay, 1987) for further technical details.

Let x[n], n = 1, . .. , N denote the hydrophobicity indexof a protein primary sequence of length N amino acids ac-cording to a chosen hydrophobicity scale. This sequence canbe modeled mathematically by a filter linear difference equa-tion (Marple, 1987):

p q

x[n] = -E a[k]x[n - k] + , b[k]u[n - k]k=l k=O

= I h[k]u[n -k],k=O

(1)

where p, q are called the autoregressive order and movingaverage order, respectively, a[k], k E [1, p] and b[k], k E [1,q] are the autoregressive coefficients and moving averagecoefficients, respectively (the assumption b[O] = 1 can bemade without loss of generality because the input u[n] canalways be scaled to account for any filter gain). u[i] is a whitenoise sequence of zero mean and variance pw. (Obviouslyh[k] = 0 for k < 0). The model parameters p, q, a[k], k E[1, p], b[k], k E [1, q], and pw are determined by the givenhydrophobicity sequence derived from protein primary se-quence. This ARMA model equation, with the determinedmodel parameters, describes an infinite data process that hasall of the characteristics of the finite observed data sequence(the hydrophobic sequence in our case). The ARMA powerspectrum is computed from this model equation rather thandirectly computed from the original hydrophobicity se-quence (as it would be in the direct Fourier transformmethod). This is why the ARMA spectrum can achieve amuch higher spectral resolution than the direct Fourier trans-form method.To compute the power spectral density function for the

ARMA model equation, the system function H(z) betweenthe input and the output,

B(z)(z) A(z)'

has to be used, wherep

A(z) = 1 + I a[k]z k,k=l

q

B(z) = 1 + I b[k]z k,k=l

H(z) = 1 + Y h[k]z-kk=l

(2)

(3)

(4)

(5)

The z-transform (Oppenheim and Willsky, 1983) of the out-put sequence x[n] is related to the z-transform of the input

Sun and Parthasarathy 2093

Page 3: Protein sequence and structure relationship ARMA spectral ... · brane proteins have also been proposed (Argos et al., 1982; Engelmanet al., 1986). It is physically understandable

Volume 66 June 1994

random process u[n] by

PX(z) = P..(z)H(z)H*(l/z*)B(z)B*(l/z*)

- A(z)A(*(l/z*).

using Eq. 1 to yield

(6) r. [i] = e{u[n + i]x*[n]}

We will assume further that the sequence u[i] is a white noiseprocess of zero mean and variance p,, so that PUU(z) = p..The ARMA power spectral density function, PAsmA(f) isobtained by substituting z = exp(i2nfT) into Eq. 6 and scal-ing by the sample interval T (T = 1 in our case):

)= (f) 2 eH(f)bbHeq(f )= Tp- ( HpfPAm(f)= Tp' AUf) ep(f)aa~e()

in which the polynomials are defined as

p

A(f ) = 1 + E a[k]exp(-i2iifkT)k=1

q

B(f) = 1 + I b[k]exp(-i2'rfkT)k=1

and

1

exp(i2ifT)epff) = . ,

exp(i2'nfpT) /

1

exp(i2fT)eq(f)= I *

exp(i2rfqT) /

a[l]a=

(a[p]

/1

(b[l]b==

b[q] /

= u[n + i](u* [n] + Y. h*[k]u*[n k])

( ( ~~~ ~~k=l)

= ru.1i] + Y h*[k]ruu[i + k].k=l

Assuming u[k] is a white noise process, then

(7)r. li] = Pw

pwh*[-i]

for i >0for i = 0.

for i <0

The final relationship between the ARMA parameters andthe autocorrelation sequence of the hydrophobicity sequence

(8) x[n] is

rxw[m]

(9)

TheARMApower spectral density function is evaluated overthe range - 1/2T ' f 5 1/2T, and is symmetric for the realhydrophobicity sequence.

The model parameters in the ARMA(p, q) model Eq. 1 are

related to the autocorrelation sequence of the known hydro-phobic sequence of the given protein. Multiplying Eq. 1 byx*[n - m] and taking the expectation of the resulting equa-

tion, we have

p

E{x[n]x* [n - m]} =- a[k]E{x[n - k]x*[n - m]}k=l

(10)

+I b[k]E{u[n - k]x*[n -m],k=O

or

P q

r. [m] = - a[k]r.[m - k] + I b[k]r. [m - k]. (11)k=1 k=O

The cross correlation r.[i] between the input and the outputcan be expressed in terms of the h[k] parameters of Eq. 5 by

rf*I-m] form < 0

P q- a[k]rj[m - k] +pIb[klh*[k-m]q

k=l k=m

for 0 < m ' q

~P

-E a[k]r.. [m -k] for m > qk=l

(14)

where h[O] = 1 has been used.It is easily seen that the autoregressive parameters of the

ARMA model are related by a set of linear equations to theautocorrelation sequence of the known hydrophobicitysequence. Eq. 14 may be evaluated for the p lag indicesq + 1 ' m ' q + p, for example, and grouped into the matrixexpression

/ r.x[q] r,,J[q -1] ... rxq -p + 1]\rjq + 1] r.[q] * r,,jq-p + 2]

Krq+p-1] rj[q + p-2] ... r.[q]

(15)(a[l] r,,jq + 1]a[2] r,[q + 2]

a[p] rjq+p])

These equations are called ARMA Yule-Walker normalequations. All of the diagonal elements are equal(Toeplitz form). This indicates that the autoregressiveparameters may be found separately for the moving av-

erage parameters as the solution of the simultaneous Eq.15. Unfortunately, the moving average parameters of an

(12)

2094 Biophysical Joumal

Page 4: Protein sequence and structure relationship ARMA spectral ... · brane proteins have also been proposed (Argos et al., 1982; Engelmanet al., 1986). It is physically understandable

Sun and Parthasarathy Protein Sequence and Structure Relationship 29

ARMA model cannot be found simultaneously with theimpulse response coefficients h[k], as indicated by Eq. 14,resulting in a nonlinear relationship with the autocorre-lation sequence.

Simultaneous estimate of the AR and MA parameters ofan ARMA model is difficult and computationally intensive.The AR and the MA parameters are typically estimated sepa-rately. Usually, the AR parameters are estimated first, andthen used to construct an inverse filter to apply to the originaldata. The residuals out of this filter should be representativeof a moving average process, to which an MA parameterestimator can be applied. There are three main steps involvedin this ARMA parameter estimation: 1) Estimate the autore-gressive (AR) parameters by the least-squares modifiedYule-Walker method; 2) filter the original data sequencewith the estimated AR parameters; 3) estimate the movingaverage (MV) parameters from the filtered residuals. Theselection of the autoregressive orders and moving averageorders, p and q, is determined by the Akaike informationcriterion (Akaike 1974).

AR parameter a~kJ estimation

From Eq. 14, the autocorrelation sequence for an ARMA(p,q) model satisfies the relationship

p

r. [n] =-I apEk]r.. [n - k] (16)

for n > q. If the autocorrelation sequence are known ex-actly, then the p relations for q + 1 --: n -< q + p can beset up as a set of simultaneous equations and solved for theAR parameters, as indicated by the Yule-Walker normalequations in (15). However, in practice, only finite datasamples are available, so the autocorrelation estimatesmust be substituted for the unknown autocorrelation se-quence. A least-squares modified Yule-Walker method hasbeen developed (Mehra, 1971; Porat and Fridlander,1985). The method uses more than p equations for lagsgreater than q and minimizes a squared error to fit the pparameters. Assuming autocorrelation estimates F,4n] fromlag 0 to lagM are calculated, where M is the largest lag in-dex that can be accurately estimated, the M - q equations(such that M - q > p)

p

=x-n] ap[k]Fxx[n - k] ± E[n] (17)k= 1

for q + 1 -< n -< M are formed. E[n] is the estimation er-ror. The sum of squared error,

Al M

n-q+i n-q+l

the normal equations to be solved are

a[k] / 0)where Tpis a rectangular Toeplitz matrixtion estimates

fxxjq + 1] . .. fxx [q -p + 1

Tp FXXJM -P] . xxq+

fXXIM ... XXIM- P]

of a

(19)

iutocorrela-

(20)

Filter the original data sequenceOnce the AR parameters are estimated by solving Eq. 19, thenext step is to use them to construct an inverse filter. Thenthe reverse filter is applied to obtain the parameters for theMA process. The filter system function is

p

A(z)=1±+ 4k]Z-kk-i

(21)

in which the d[k] are the estimated AR parameters. TheARMA system function is B(z)/A(z), so that

B~A(zY), B(z). (22)

Therefore, passing the measured data sample through thefilter with the system functionA(z) will yield an approximatemoving average process at the filter output. The filtered se-quence of length N - p is given by the convolution

p

z[n] = x[n] + E ci[m]x[n -m]rni

forp + 1 --- n -S N.

MA parameter b[k] estimation

(23)

The most obvious approach to estimate the MA parameterswould be to solve the nonlinear Eq. 14 for P = 0 using theautocorrelation estimates made from the original sequence.Unfortunately, the solutions are very involved (Box andJenkins, 1970). However, there is an alternative methodbased on a high-order AR approximation to the MA processthat uses only linear operations (Kay, 1987; Marple, 1987).An MA(q) process has a system function

q

B(z) = 1 + I b[k]z kkil

(24)

is minimized with respect to the p AR parameters a[k], and AnRo)prcsthtieqvantoaMAqpoeshs

Sun and Parthasarathy 2095

An AR(oo) process that is equivalent to a MA(q) process has

Page 5: Protein sequence and structure relationship ARMA spectral ... · brane proteins have also been proposed (Argos et al., 1982; Engelmanet al., 1986). It is physically understandable

Volume 66 June 1994

a system function of the form 1IA.(z), where

00

Ax(z) = 1 + E a[k]z k (25)k=-

Therefore, we have

B(z)Ao(Z) = 1. (26)

An inverse z-transform of the Eq. 26 yields a convolutionrelation of the MA parameters with the AR parameters

q 1, ifm =O;a[m] + b[n]a[m - n] = 8[m] = o, ifm#°* (27n=1

where, by definition, a[0] = 1 and a[k] = 0, for k < 0. Thus,the moving average parameters can be determined from theparameters of an equivalent infinite-order AR model by solv-ing any subset of q equations formed by (27). In practice, onecan calculate a high-orderAR(M) (M >> q) estimate from thefiltered residual sequence given by (23). From the high-orderAR(M) parameters estimates 1, am[l], * m[M], one canconstruct

q

Em= aM[m] + E b[n][m -n]. (28)n=1

So the estimates ofMA parameters are obtained by the mini-mization of the squared error variance

Pq = E EMA12/M (29)0m-M+q

Eq. 29 is solved by a least-squares technique described in thedetermination of the AR parameters.

Determination of the model orders (p, q)

Selecting the order for an ARMA model is not a simpleprocedure. One of the most frequently used techniques isAkaike information criterion (AIC) (Akaike, 1974). The AICdetermines the model order by minimizing an informationtheoretic function

AIC(p, q) = N ln(ppq) + 2(p + q), (30)

in which Ppq is an estimate of the input white noise varianceof the assumed ARMA(p, q) model, andN is the number ofsampled data. One could also check the whiteness of theresidual output of the inverse filter with system functionB(z)/A(z) using the estimated AR and MA parameters. Theestimated autocorrelation of the residual sequence should beapproximately zero except at lag zero.

SPECTRAL ANALYSIS FOR MEMBRANEPROTEINS

The ARMA spectral analysis method presented in Theory

primary sequence as the only input. The periodicities of thehydrophobicity distribution along the primary sequence arethen calculated from the ARMA model equation with themodel parameters computed from the original hydrophobic-ity sequence. The ARMA spectral peaks correspond to thedominant periodic hydrophobicity distribution along the pri-mary sequence. These dominant periodic hydrophobicitydistributions correspond to possible regular folding patterns,such as a-helix, f3 strand, and other ordered folds.

Application of the ARMA method of spectral analysis toglobular proteins with different folding patterns (Sun and

) Parthasarathy, 1993) reveals the following typical featuresthat seem to appear in the membrane proteins studied here.

1) Global folding patterns: low frequency, 0.10-0.05 cycle/residue (10-20 residue/cycle), peaks in the spectral den-sity function correspond to the large scale orders in thefolded structures.

2) The a helix peak: the spectral peaks around 0.30-0.25cycle/residue (3.3 to 4.0 residues/cycle) and theirhigher harmonics correspond to a possible a-helixconformation.

3) The 3 strand peak: the spectral peaks are typically greaterthan 0.35 cycle/residue (around 2.0 to 2.7 residues/cycle).

4) A randomly shuffled sequence of known protein se-quences has no significant spectral peaks.

In this section, we will examine the computed ARMA powerspectral density function (PSDF) for hydrophobicity se-quences of several membrane proteins, and compare theirspectral characteristics with their known structures. We willalso discuss the structural implications of the computedARMA PSDF for the transmembrane regions of the acetyl-choline receptor a subunit and the Shaker potassium channelprotein. In this study, the hydrophobicity scale of Rose et al.(1985) has been used. The Rose scale is derived from the sol-vent accessibility of amino acids in 23 x-ray crystallography-determined protein structures.

Melittin

The crystal structure of melittin (Terwilliger and Eisenberg,1982), a 26-residue membrane bound protein, shows a benta-helical structure. The bending appears at the middle of thehelix due to a proline residue. Each of the helical sectioncontains about 12 residues. Fig. 1 A is a stereo backbone plotof the x-ray crystallography-determined structure of melittin.Fig. 1 B plots the computed ARMA power spectral densityfunction (PSDF) for the melittin primary sequence. Themelittin PSDF has an a-helix peak at 0.277 cycle/residue(3.61 residue/cycle), indicating that melittin adopts ana-helical conformation. There are also two small peaks in themelittin PSDF plot. One peak at around 0.075 cycle/residue(13 residue/cycle), with a peak amplitude about 6.7% of themajor one, corresponds to the segment length of the twora-helices. This is indeed true in the crystal structure as one

2096 Biophysical Journal

requires the hydrophobicity sequence derived from a protein

Page 6: Protein sequence and structure relationship ARMA spectral ... · brane proteins have also been proposed (Argos et al., 1982; Engelmanet al., 1986). It is physically understandable

Protein Sequence and Structure Relationship

Melittin

dency in this C-terminal segment. Fig. 2 D shows the PSDFfor the sequence segment 227-301. There are four peaks. Thetwo major peaks at 0.465 and 0.353 cycle/residue (2.15 and2.83 residue/cycle) indicate that this segment is a 3-sheetstructure. The peak at 0.059 cycle/residue (17 residue/cycle)suggests an ordered folding pattern at about 17 residues inlength. This pattern can been seen clearly in the three-dimensional crystallographic structure of porin. Fig. 2E plotsthe ARMA PSDF for the sequence segment correspondingthe loop region (74-116) in the three-dimensional structureof porin. There is a clear peak at 0.394 cycle/residue (2.54residue/cycle), indicating a /-sheet-like structure. The crys-tal structure of this segment shows it is indeed a /3-sheet likeloop, running against the inner wall of the /-sheet barrel ofporin.

Colicin

Colicins are antibiotic proteins produced by Eschericia coli,active against sensitive E. coli and closely related bacteria.The largest group of colicins comprises those that can formvoltage-dependent channels in membranes, thereby destroy-ing the the cell's energy potential. The pore-forming activityseems to be located at the carboxy terminus, which comprisesabout 200 residues (389-592 in the primary sequence of co-licin A). This fragment is soluble in aqueous media, but nev-ertheless spontaneously inserts itself into lipid membranes.The crystal structure of this fragment shows (Parker et al.,1989) that colicin forms 10 a-helices (Fig. 3 A).

l The ARMA PSDF (Fig. 3 B) for this sequence fragment0.0 0.1 0.2 0.3 0.4 0.5 (389-592) shows that it is exclusively a-helical. There is

only one strong peak in the PSDF at 0.263 cycle/residuesFrequency (Cyc/rsidues) (3.80 residue/cycle), which is a typical a-helix peak. We also

computed the ARMA PSDF for the sequence fragment 389-(A ) Melittin crystal structure (backbone); (B) ARNIA PSDF. optdteAM SFfrtesqecrget39

433, which corresponds to helices one and two in the crystalstructure. The PSDF (Fig. 3 C) for this smaller sequencefragment shows a very strong a-helix peak at 0.273 cycle/

early ina Fig. 1roundA.4Wehavle/norexplanation fresidue (3.66 residue/cycle), indicating that this smaller frag-nall peak at around 0.4 cycle/residue. ment is in an a-helical conformation. A very minor peak at

0.1 cycle/residues (10 residue/cycle), which can hardly beseen in the linear scale (about 1% of the major peak), indi-cates a very week hydrophobicity oscillation along this

found in the outer membranes of gramnegative smaller fragment.bacteria, mitochondria, and chloroplasts. They form chan-nels for small hydrophilic molecules that are in most casesweakly ion-selective. Porins are trimeric and consist pre-dominantly of /3-pleated sheet structures determined by x-raycrystallographic diffraction (Weiss et al., 1991) (Fig. 2 A).

Fig. 2 B shows the computed ARMA power spectral den-sity function (PSDF) for the entire porin primary sequence.There is a very strong and sharp peak at the frequency 0.375cycle/residue (2.67 residue/cycle), which indicates the pre-dominant content of /3-sheet. This is in agreement with thecrystal structure of porin. The ARMA spectral analysismethod can also be applied to a part of a protein primarysequence. Fig. 2 C shows the PSDF for the porin sequencefragment 1-41. It has a predominant peak at 0.5 cycle/residue(2.0 residue/cycle) indicating a strong 3-sheet structure ten-

Bacteriorhodopsin

The structure of bacteriorhodopsin was first elucidated byhigh resolution electron microscopy of two-dimensionalcrystals (Herderson and Unwin, 1975; Henderson et al.,1990). The 3-A resolution-refined structure shows that bac-teriorhodopsin contains seven transmembrane a-helices(Fig. 4 A). The ARMA PSDF for the bacteriorhodopsin se-

quence (Fig. 41B) shows two spectral peaks at 0.10 and 0.256cycle/residue (10.0 and 3.9 residue/cycle), respectively. Themajor peak at 3.9 residue/cycle indicates that bacteriorho-dopsin forms mostly a-helical structure. The minor peak ataround 10 residue/cycle corresponds to a typical hydropho-bicity periodicity in these a-helical segments. If only the

A

B

4

3

C

0-

FIGURE 1

can see clsecond sn

Porin

Porins are

Sun and Parthasarathy 2097

2 -

1

Page 7: Protein sequence and structure relationship ARMA spectral ... · brane proteins have also been proposed (Argos et al., 1982; Engelmanet al., 1986). It is physically understandable

Biophysical Journal

A

PorinB

Porin, 1-46C

0.0 0.1 0.2 03 0.4 0.5

Frquency (Cyd&mkkas)

Porin, 227-301D

10.0 -

5.0 -

1.0 -

0.5 -

0.1

15 -

10 -

5-

0-

0.0 0.1 02 0.3 0.4 0.5

Frequency (Cyceuurek )

I

0.0 0.1 02 0.3 0.4 0.5

Frequetcy (Cys/edes)

Porin, 75-116E

0.0 0.1 0.2 0.3 OA 0.5

Frequecy (Cy oofedue)

FIGURE 2 (A) Porin crystal structure (backbone); (B)ARMAPSDF for the entire Porin sequence; (C)ARMA PSDF for Porin sequence 1-46; (D)ARMAPSDF for Porin sequence 227-301; (E) ARMA PSDF for Porin sequence 75-116.

sequence segments of the transmembrane portion are used inthe PSDF computation, we obtain similar spectral charac-teristics (Fig. 4 C), in which the two peaks are at 0.11 and0.277 cycle/residue (9.0 and 3.6 residue/cycle). The ampli-tude of the minor peak is only about 2.5% of the major peak

amplitude. The a-helical peak of the transmembrane se-

quences appears much stronger (plotted in logarithmic scale)than the a-helical peak for the entire sequence. The spectraldensity function clearly suggests that the transmembraneportion of bacteriorhodopsin is an a-helical structure.

1000.0

500.0 -

100.0 -

50.0

10.0

5.0

1.0

0.5

3.0

2.5 -

1.5 -

1.0 -

0.5 -

Volume 66 June 19942098

I

Page 8: Protein sequence and structure relationship ARMA spectral ... · brane proteins have also been proposed (Argos et al., 1982; Engelmanet al., 1986). It is physically understandable

Protein Sequence and Structure Relationship

A

Colicin, 389-59r& Colicin, 389-433

5DO0. -

100.0 -

50.0 -

0 10.0. -

5.0 -

1.0 -

0.5 -

0.1 -

0.0 0.1 0.2 0.3 0.4 0.5

C

0.0 0.1 02 0.3 0.4 0.5

Fiequey. (Cydl oeidze)

FIGURE 3 (A) Colicin crystal structure (backbone, transmembrane portion);PSDF for Colicin sequence 389-433.

Photosynthetic reaction center L, M, and HdomainsThe photosynthetic reaction center is a complex of integralmembrane proteins and cofactors (Deisenhofer et al., 1985)that use light energy to transport electrons across the mem-brane. The reaction center from the purple bacterium Rho-dopseudomonas viridis consists of four protein subunits: L,M, H, and c-type cytochromes with four covalently linkedhaem groups (Prince et al., 1976). The structure of photo-synthetic reaction center was determined by Deisenhoferet al. (1984, 1985). Consisting of mainly a-helical structures,subunits L and M form the major transmembrane portion ofthis protein complex. The amino-terminal helical segment ofH subunit forms the only transmembrane portion of H sub-unit. Using the primary sequences of L, M, and H subunits,we calculated their corresponding ARMA spectral densityfunctions. The results from the spectral analysis are in goodagreement with the crystal structure.

Fig. 5 A is the stereo backbone plot for the L subunit. Ithas five transmembrane helices as well as several other non-transmembrane helical segments. Fig. SB shows the ARMAPSDF function for the entire L subunit sequence. The spec-trum has a sharp peak at 0.256 cycle/residue (3.9 residue/cycle), indicating that the L subunit contains predominantly

Frequencyt(ycbhokdum)

(B) ARMA PSDF for the transmembrane portion of Colicin; (C) ARMA

a-helical elements. We also plotted the spectrum (Fig. 5 C)for the transmembrane sequence segment L80-111. TheARMA spectrum for this segment clearly shows that it is ana-helical structure. The spectrum peaks at 0.278 cycle/residue (3.6 residue/cycle). This transmembrane segmentcorresponds to the second helix from the left in Fig. 5 A.

Fig. 6A is the stereo backbone plot for theM subunit. Likethe L subunit, it also has five transmembrane helices andseveral other nontransmembrane helical segments. Fig. 6 Bshows the spectral density function for the entire M subunitsequence. The spectrum has a strong peak at 0.263 cycle/residue (3.8 residue/cycle), suggesting predominant a-helixcontent in the M subunit. Fig. 6 C shows theARMA spectrumfor the sequence segment M113-166. It has a sharp peak at0.244 cycle/residue (4.1 residue/cycle), suggesting an

a-helical structure for this transmembrane segment. In Fig.6 A, the segment Ml 13-166 corresponds to the second andthe third transmembrane helices from the right.The subunitH contains mainly sheet and extended struc-

tures. It has only one transmembrane segment, which isa-helical in the crystal structure (Fig. 7 A). Fig. 7 B showsthe the ARMA PSDF function for the entire H subunit se-

quence. There are three peaks in the spectrum at 0.0763cycle/residue (13.1 residue/cycle), 0.20 cycle/residue (5.0

30

25

15

10

5

0i

2099Sun and Parthasarathy

Page 9: Protein sequence and structure relationship ARMA spectral ... · brane proteins have also been proposed (Argos et al., 1982; Engelmanet al., 1986). It is physically understandable

Biophysical Journal

A

BacteriorhodopsinB

0.0 0.1 0.2 0.3 0.4 0.5

Frequency (Cycle/reedues)

Bacteriorhodopsin (H)

100.0

50.0

0co0L

10.0

5.0

1.0

0.5

0.0 0.1 02 0.3 0.4 0.5

Frequency (Cycb/reskes)

FIGURE 4 (A) Bacteriorhodopsin crystal structure (backbone, transmembrane portion); (B) ARMA PSDF for the entire Bacteriorhodopsin sequence; (C)ARMA PSDF for Bacteriorhodopsin transmembrane portion sequence.

residue/cycle) and 0.385 cycle/residue (2.6 residue/cycle),respectively. The major peak at 0.385 cycle/residue (2.6 resi-due/cycle) and the peak of its higher harmonics at 0.20 cycle/residue (5.0 residue/cycle) seem to suggest that the H subunitcontains predominantly f3 sheet and extended structures. Thepeak at 13.1 residue/cycle suggests an average folding lengthscale. The ARMA spectrum for the transmembrane segmentH11-38 (Fig. 7 C) shows a strong a-helix peak at 0.294residue/cycle (3.4 residue/cycle). This is in full agreementwith the crystal structure of the H subunit (the long helix atthe bottom of Fig. 7A). Fig. 7D shows the ARMA spectrumfor the sequence segment H124-211, a peripheral part of theH subunit on the cytoplasmic side of the membrane. Thereis a strong and major peak at 0.50 residue/cycle (2.0 residue/cycle) in the spectrum, suggesting a 8 sheet structure. Twosmall peaks (less than 5% of the major one in amplitude) are

also shown in the spectrum. The peak at 0.385 cycle/residue(2.6 residue/cycle) suggests (3-sheet conformation. This se-

quence analysis result agrees with the crystal structure verywell. The segment H124-211 corresponds to the ,8-sheet re-

gion in the top of Fig. 7 A.OurARMA spectral analyses of the primary sequences of

the photosynthetic reaction center complex are consistentwith the crystal structures of these proteins.

Nicotinic acetylcholine receptor a subunit

The nicotinic acetylcholine receptor is a cation-selective,ligand-gated ion channel involved in signal transduction atthe chemical synapse. It is composed of five homologousmembrane-spanning subunits (a, at, (3, y, and 8). The se-

quence homology among different species is very high foreach subunit. There is no high-resolution structure availablefor this receptor at the present time. It was proposed, basedon the hydrophobicity plot and on the direct Fourier trans-form analysis (Stroud and Finer-Moore, 1985; Stroud et al.,1990; Galzi et al., 1991), that the transmembrane portion ofeach subunit has four transmembrane helices. Using the ARMAmodel, we have computed the spectra for these transmembranesequence segments in the a-subunit (human). Here we assume

that previous studies have correctly located the transmembraneregion in the primary sequence of the a-subunit.

Fig. 8 A shows the ARMA PSDF for the entire humanacetylcholine receptor a-subunit. It indicates the presence ofboth a-helix and 3-sheet structures. Both of the characteristicspectral peaks are found in the PSDF. Fig. 8 B-E showthe ARMA PSDF for M1(231-257), M2(263-281), M3-(297-318), and M4(429-447) transmembrane segments,respectively. There are two peaks in the PSDF for the Ml

20 -

15 -

0U,

10 -

5

0 -1

Volume 66 June 19942100

Page 10: Protein sequence and structure relationship ARMA spectral ... · brane proteins have also been proposed (Argos et al., 1982; Engelmanet al., 1986). It is physically understandable

Protein Sequence and Structure Relationship

A

Photosynthetic reaction center L

0.0 0.1 0.2 0.3 0.4 0.5

Photosynthetic Reaction Center L80-1 11C

40 -

30-

20 -

10

00

0.0 0.1 029 03 0.4 0.5

Frequency (Cycie/ieeiduee) Frquency (Cycisededues)

FIGURE 5 (A) Photosynthetic Reaction Center L subunit, crystal structure (backbone); (B) ARMA PSDF for the entire L sequence; (C) ARMA PSDFfor transmembrane segment L80-111.

(Fig. 8 B). The major peak at 0.50 cycle/residue (2.0 residue/cycle) indicates that this segment may be a (3-sheet structure.The minor peak at 0.217 cycle/residue (4.6 residue/cycle)suggests a higher harmonics of a typical 13-sheet peak. ThePSDF (Fig. 8 C) for the M2 region has a major peak at 0.50cycle/residue (2.0 residue/cycle), suggesting a 3-sheet struc-ture. The minor peak (log scale in vertical) at 0.0476 cycle/residue (21.0 residue/cycle) suggests a structural motiflength around 21 residues in the M2 region. The PSDF forthe M3 region (Fig. 8D) also indicates that this segment maybe a 13-sheet structure. There is a strong major peak at 0.50cycle/residue (2.0 residue/cycle) in the spectrum. The minorpeak at 0.185 cycle/residue (5.4 residue/cycle) hydropho-bicity periodicity may corresponding to an internal foldinglength within the possible 3-sheet structure. It is interestingto note that Ml and M2 have similar spectral shapes. ThePSDF for the M4 region (Fig. 8 E) is very different from thatof Ml, M2, and M3. It has a strong major peak at 0.313cycle/residue (3.20 residue/cycle), a significant a-helicalpeak, suggesting an a-helical structure structure for this seg-ment. The above results, based on the ARMA spectral analy-sis of the primary sequence, suggest that the transmembraneportion of the a-subunit has three 13-sheet regions and one

a-helix region. These conclusions are different from the pre-

dictions based on the hydrophobicity plot and the windoweddirect Fourier transform analysis (Stroud and Finer-Moore,1985; Stroud et al., 1990), in which all the transmembranesegments have been assigned a-helical structures. However,our analysis seems to be in agreement with two recent ex-

perimental studies. Results using cysteine substitution (Aka-bas et al., 1992) in the M2 region indicate that the M2 regionprobably forms a strand. Unwin has successfully carriedout an electron microscopy analysis (Unwin, 1993) of ace-

tylcholine receptor. He concluded, based on a 9 A resolutionEM structure, that there is only one transmembrane helix inthe a-subunit, and that other three-transmembrane segmentsare in the 1-sheet conformation. Unwin (1993) has assignedM2 to be the transmembrane helix. Our study suggests thatM4 is the transmembrane helical segment.

Potassium channel

Voltage-gated potassium channels are integral membraneproteins that are fundamentally involved in the generation ofbioelectric signals such as nerve impulses (Miller, 1991).There is presently no experimentally determined structural

B

1000

100

10

Sun and Parthasarathy 2101

Page 11: Protein sequence and structure relationship ARMA spectral ... · brane proteins have also been proposed (Argos et al., 1982; Engelmanet al., 1986). It is physically understandable

Biophysical Journal

A

Photosynthetic reaction center MB

-_1- 1 1

0.0 0.1 02 03 0.4 0.5

Photosynthetic Reaction Center Ml 13-166

25 -

15

10

5

0-

c

0.0 0.1 02 03 OA 0.5

Frequency (Cycleduskdkes) Frequency (Cydeiduee)

FIGURE 6 (A) Photosynthetic Reaction Center M subunit, crystal structure (backbone); (B) ARMA PSDF for the entire M sequence; (C) ARMA PSDFfor transmembrane segment M113-166.

information about these proteins. The hydrophobicity plotfor the Shaker potassium channel (Tempel et al., 1987) ini-tially suggested that there were six transmembrane helices(S1-S6) in this protein (Jan and Jan, 1989). Recent experi-ments suggest that there is one more transmembrane region

between the transmembrane helices S5 and S6 (Miller, 1991)(Fig. 9 A).

In this section, we will compute the spectral density func-tion for all of the transmembrane sequence segments (thelocations of the transmembrane sequence segments in Fig. 9A are assumed to be approximately correct) and compare our

results with the previously proposed model (Miller, 1991).Fig. 9 B is the ARMA PSDF function plot for the entire

sequence of the Shaker potassium channel. It has a sharp andstrong peak at 3.23 residue/cycle, a strong indication that thisprotein contains predominantly a-helical structures. This isin agreement with the atomic model recently proposed(Durell and Guy, 1992). Fig. 9 C plots the PSDF function forthe S1 sequence segment. The spectrum shows a very sharppeak at 0.294 cycle/residue (3.4 residue/cycle), suggesting ana-helical structure. The spectrum for the S2 (Fig. 9D) shows

both the j3-sheet peak at 0.50 cycle/residue (2.0 residue/cycle), the major spectral peak, and the a-helix peak at 0.30cycle/residue (3.33 residue/cycle) (about 10% of the majorpeak in amplitude). This suggests that the current S2 locationin the sequence (279-300) is at a position of spatial structuraltransition. Indeed, if we plot the ARMA spectral for the se-

quence segment 290-305 (Fig. 9 E), we find that this se-

quence segment has the characteristic spectrum of a f3 sheetor an extended conformation because it has a sharp peak at0.476 cycle/residue (2.1 residue/cycle). On the other hand,the spectrum for the sequence segment 275-290 (Fig. 9 F)indicates that this segment may be an a-helical structure,because there is a sharp peak at 0.27 cycle/residue (3.7 resi-due/cycle). This analysis suggests that either the current S2location in the primary sequence has to be shifted towardN-terminal side for about 10 residues for it to be a helicalregion or it has to be shifted toward C-terminal in the se-

quence to have a (3-strand conformation to avoid structuretransition in the membrane. The spectral plots for S3 (peakat 0.294 cycle/residue (3.4 residue/cycle)), S4 (peak at 0.33cycle/residue (3.0 residue/cycle)), S5 (peak at 0.244 cycle/

100.0

50.0

i 10.0

5.0

1.0

0.5

Volume 66 June 19942102

Page 12: Protein sequence and structure relationship ARMA spectral ... · brane proteins have also been proposed (Argos et al., 1982; Engelmanet al., 1986). It is physically understandable

Protein Sequence and Structure Relationship

A

Photosynthetic reaction center HB

0.0 0.1 02 0.5 0.4 0.5

F-w ))

1000

100

10

Photosynthetic reaction center H 11-38C

0.0 0.1 02 03 0.4 02

-aPN)

Photosynthetic reaction center H 124-211

00.0 -

10.0

5o.

1.0

0.5

D

0.0 0.1 02 03 OA 0.5

F: )(Cyodd-)

FIGURE 7 (A) Photosynthetic Reaction Center H subunit, crystal structure (backbone); (B) ARMA PSDF for the entire H sequence; (C) ARMA PSDFfor transmembrane segment H11-38; (D) ARMA PSDF for H124-211.

residue (4.1 residue/cycle)), and S6 (peak at 0.313 cycle/residue (3.2 residue/cycle)) show that these four transmem-brane segments have the characteristic spectra of a-helicalstructures. The PSDF function for the transmembrane se-

quence segment 430-450 has a sharp peak at 0.385 cycle/residue (2.6 residue/cycle), suggesting strongly a (3-sheetconformation.The ARMA sequence spectral analysis suggests that the

Shaker potassium channel contains predominantly a-helicalstructures. The transmembrane segments S1, S3, S4, S5, S6may be transmembrane helices. The current S2 locationseems to be a structural transition region (from helix tosheet). Spectral analysis suggests that a 10-residue shift ofthe S2 location will make it a helical or a sheet structure inthe membrane, depending on the direction of the shift in theprimary sequence. The suggested pore-forming region, se-

quence segment 430-450, has the spectral characteristics ofa 3-sheet structure. A (-sheet pore conformation has alsobeen proposed recently by a model at the atomic level(Bogusz et al., 1992).

CONCLUSION

The ARMA spectral analysis of the primary sequences ofseveral membrane proteins shows a clear correlation betweenthe periodicity of the hydrophobicity distribution along theamino acid primary sequences ofmembrane proteins and theordered folding patterns in their three-dimensional struc-tures. The assessment of the secondary structure elements in

the studied membrane proteins (Melittin, Porin, Colicin,Bacteriorhodopsin, Photosynthetic reaction center) agrees

with their crystal structures. We also analyzed the sequence

spectra for the transmembrane regions of acetylcholine re-

ceptor a-subunit and the Shaker potassium channel. Our re-

sults on the acetylcholine receptor a-subunit are differentfrom the predictions derived from the direct Fourier analysis(Stroud and Finer-Moore, 1985; Stroud et al., 1990). How-ever, they are consistent with recent mutation experiments(Akabas et al., 1992) and low resolution EM studies (Unwin,1993). The ARMA spectral analysis for the transmembraneregions of the Shaker potassium channel protein confirms themodel proposed recently by other methods (Miller, 1991).The power spectral density functions computed byARMA

method, in general, have much better resolution than thosecomputed by direct Fourier transforms of the original se-

quences (Marple, 1987; Jenkins and Watts, 1968; Kay,1987). Better resolution has also been demonstrated forhydrophobicity-represented protein sequences analyses (Sunand Parthasarathy, 1993; Sun, 1993). This is because theARMA model equation, along with the model parameters,describes an infinite length of data sequence that has thecharacteristics of the original finite data sequence. In doingthis ARMA model fitting for the original sequence data, thenoise level in the estimated spectrum is much reduced andthe resolution of the computed spectrum is enhanced. Thismethod can be used not only to analyze the hydrophobicitysequence but also to reveal periodic distributions of otherphysical properties (or combinations of physical properties)

1.6

1.4-

12

1.0

0.8

0.6

LI~~~~~ ~~~~~~~~~~~~~~~~~~ I

i

Sun and Parthasarathy 2103

Page 13: Protein sequence and structure relationship ARMA spectral ... · brane proteins have also been proposed (Argos et al., 1982; Engelmanet al., 1986). It is physically understandable

Ach alpha (Human)A

0.0 0.1 02 0.3 0.4 0.5

Frequency (Cyce/residuee)

Ach alpha (Human) M2C

Biophysical Journal

12 -

10 -

8 -

6 -

4 7

2 -

0 -

25 -

20 -

15 -

0

10 -

5 -

0 -

Volume 66 June 1994

Ach alpha (Human) MlB

0.0 0.1 02 0.3 0.4 0.5

Frequency (CysreSike)

Ach alpha (Human) M3D

0.0 0.1 02 0.3

Frequeny (Cycle/reiduee)

0.0 0.1 02 0.3

Frequency (Cycle/e8ikdues)

Ach alpha (Human) M4

8 -

6 -

4 -

2 -

E

0.0 0.1 02 0.3 0.4 0.5

Frequency (Cycle/residues)

FIGURE 8 ARMA PSDF for entire AcH a subunit, Ml, M2, M3, and M4 regions.

2104

10 -

5-~

a

0 -

5.000 -

1.000 -

0.500 -

a

0.100 -

0.050 -

0.010 -

0.005 -

0.4 0.5

II I I~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-

0.4 0.5

Page 14: Protein sequence and structure relationship ARMA spectral ... · brane proteins have also been proposed (Argos et al., 1982; Engelmanet al., 1986). It is physically understandable

Sun and Parthasarathy Protein Sequence and Structure Relationship 2105

As, S, SI S. S S.

in h

Shaker Potaaaium Channel Shaker Potassium Channel SI

Shaker Poteassium Channel S2 Shaker Potaaaium Channel 290-305D E

F

.~~~~~~~~~~~~~~~~~~~~~~~~~.

00OA .1 GI O0J O 0 2 0 OA 0.5

Shaker Potaaaium Channel 275-290 Shaker Potaaalum Channel S3

F_______tC ___ G t

Shaker Potassium Channel S4 Shaker Potassium Channel S5

DM E

Shaker Potaaaium Channel 430-450 Shaker Potaaaium Channel S6J_ _ _ _ _ K_ _ _ _ _

°° 01 °2 ° OJ OJ ~~ ~ ~~~~~~ ~ ~ ~ ~~~~010O.02 O J

FIGURE 9 (A) Current model of potassium channel topology (afterMiller 1991); (B-K) ARMA PSDF.

along the protein primary sequence, as long as a scale ofphysical properties is given. A scale that correctly combinesseveral major physicochemical properties of amino acids willlikely yield spectra that have even better correlation with thethree-dimensional structures of proteins.A single constant value hydrophobicity representation of

an amino acid may not be good enough to characterize itsinteraction with the inhomogeneous protein environmentaround it. The variety of available hydrophobicity scales(Cornette et al., 1987) may be a reflection of this assessment.Different scales reflect the different behaviors of amino acidresidues in inhomogeneous environments due to variations intheir structural and physicochemical properties. Therefore,different hydrophobicity scales may result in quantitativelydifferent (although qualitatively similar) computed spectrafor a given primary sequence. Scales that better characterizethe folded protein environments will generate ARMA PSDFsthat correspond better to the actual three-dimensional struc-tures (Sun, 1993). In general, if two scales have a high cor-relation coefficient, they will generate quantitatively similarspectra for the same protein sequence.The current ARMA power spectral analysis method for

computing the periodic distribution of physicochemicalproperties of amino acids along the primary sequence can beused to predict the secondary structure propensity of a givensequence. Different from the common methods of secondarystructure prediction (Chou-Fasman, 1978; Gamier et al.,1978) that use local statistical sequence-structure informa-tion based on the secondary structures elements in knownproteins, the ARMA power spectral analysis method doesnot require any statistical information that may be biasedby the limited sampling of known protein structures. Oneof the obvious disadvantages of the spectral analysismethod is that it is not sensitive enough to detect thestarting and ending points of the secondary structure el-ements in a protein sequence, because it operates in thefrequency domain. It seems to us that this method may beused to supplement the common methods of secondarystructure prediction.

We thank Dr. Sarina Bromberg for her critical reading and editorial assis-tance with the manuscript. S. Sun is most grateful to Professor Ken Dill fordiscussions and suggestions.S. Sun is partially supported by National Institutes of Health and Office ofNaval Research grants to Professor Ken Dill.

REFERENCESAkabas, M. H., D. A. Stauffer, M. Xu, and A. Karlin. 1992. Acetylcholine

receptor channel structure probed in cysteine-substitution mutants. Sci-ence. 258:307-310.

Akaike, H. 1974. A new look at the statistical model identification. IEEETrans. Autom. Control. AC-19, 716-723.

Argos, P., J. K. Mohana-Rao, and P. A. Hargrave. 1982. Structural pre-diction of membrane bound proteins. Eur. J. Biochem. 128:565-75.

Biou, V., J. F. Gibrat, J. M. Levin, B. Robson, and J. Gamier. 1988. Sec-ondary structure prediction: combination of three different methods.Protein Eng. 2:185-91.

Bogusz, S., A. Boxer, and D. D. Busath. 1992. An SS1-SS2 beta-barrel structurefor the voltage-activated potassium channel. Protein Eng. 5:285-93.

Page 15: Protein sequence and structure relationship ARMA spectral ... · brane proteins have also been proposed (Argos et al., 1982; Engelmanet al., 1986). It is physically understandable

2106 Biophysical Journal Volume 66 June 1994

Box, G. E. P., and G. M. Jenkins. 1970. Time Series Analysis, Forecastingand Control. Holden-Day Inc., San Francisco.

Chou, P. Y., and G. D. Fasman. 1978. Prediction of the secondary structureof proteins from their amino acid sequence. Adv. EnzymoL 47:45-148.

Cornette, J. L., K. B. Cease, H. Margalit, J. L. Spouge, J. A. Berzofsky, C.DeLisi. 1987. Hydrophobicity scales and computational techniques fordetecting amphipathic structures in proteins. J. Mol. Biol. 195:659-85.

Deisenhofer, J., 0. Epp, K. Miki, R. Huber, and H. Michel. 1984. X-raystructure analysis of a membrane protein complex. Electron density mapat 3 A resolution and a model of the chromophores of the photosyntheticreaction center from Rhodopseudomonas viridis. J. Mol. Biol. 180:385-98.

Deisenhofer, J., 0. Epp, K. Miki, R. Huber, and H. Michel. 1985. Structureof the protein subunits in the photosynthetic reaction center of Rhodo-pseudomonas viridis at 3-A resolution. Nature. 318:618-624.

Dill, K. A. 1990. Dominant forces in protein folding. Biochemistry. 29:7133-55.

Durell, S. R., and H. R. Guy. 1992. Atomic scale structure and functionalmodels of voltage-gated potassium channels. Biophys. J. 62:238-47.

Eisenberg, D., R. M. Weiss, and T. C. Terwilliger. 1984. The hydrophobicmoment detects periodicity in protein hydrophobicity. Proc. Natl. Acad.Sci. USA. 81:140-144.

Engelman, D. M., T. A. Steitz, and A. Goldman. 1986. Identifying nonpolartransbilayer helices in amino acid sequences ofmembrane proteins.AnnuRev. Biophys. Biophys. Chem. 15:321-53.

Fasman, D. G. 1989. Identification of membrane proteins and soluble pro-tein secondary structural elements, domain structure, and packing ar-rangements by Fourrier-transform amphipathic analysis. In Principles ofProtein Structure and Protein Folding. D. G. Fasman, editor. PlenumPress, New York.

Finer-Moore, J., F. Bazan, J. Rubin, and R. M. Stroud. 1989. Principles ofProtein Stucture and Protein Folding. D. G. Fasman, editor. Plenum Press,New York.

Galzi, J. L., F. Revah, A. Bessis, and J. P. Changeux. 1991. Functionalarchitecture of the nicotinic acetylcholine receptor: from electric organ tobrain. Annu. Rev. Pharmacol. Toxicicol. 31:37-72.

Gamier, J., D. J. Osguthorpe, and B. J. Robson. 1978. Analysis of theaccuracy and implications of simple method for predicting the secondarystructure of globular protein. Mol. BioL 120:97-120.

Henderson, R., J. M. Baldwin, T. A. Ceska, F. Zemlin, E. Beckmann, andK. H. Downing. 1990. Model for the structure of bacteriorhodopsin basedon high-resolution electron cryo-microscopy. J. Mol. Biol. 213:899-929.

Henderson, R., and P. N. T. Unwin. 1975. Three-dimensional model ofpurple membrane obtained by electron microscopy. Nature. 257:28-32.

Jihnig, F. 1989. Structure prediction for membrane proteins. In Principlesof Protein Structure and Protein Folding. D. G. Fasman, editor. PlenumPress, New York.

Jan, Y. N., and L. Y. Jan. 1989. Voltage-sensitive ion channels. Cell. 56:13-25.

Jenkins, G. M., and D. G. Watts. 1968. Spectral analysis and its application.Holden-Day, Inc., San Francisco, CA.

Kauzmann, W. 1959. Some factors in the interpretation of protein dena-turation. Adv. in Prot. Chem. 14:1-63.

Kay, S. M. 1987. Modem Spectral Estimation. Prentice-Hall, Inc., Engle-wood Cliffs, NJ.

Kyte, J., and R. F. Doolittle. 1982. A simple method for displaying thehydropathic character of a protein. J. MoL Biol. 157:105-32.

Marple, S. L., Jr. 1987. Digital Spectral Analysis with Applications.Prentice-Hall, Inc., Englewood Cliffs, NJ.

McLachlan, A. D., and M. Stewart. 1976. The 14-fold periodicity in alpha-tropomyosin and the interaction with actin. J. MoL Biol. 103:271-98.

Mehra, R. K. 1972. On-line identification of linear dynamic systemswith applications to Kalman filtering. IEEE Trans. Autom. Control. Vol.AC-16. 12-21.

Miller, C. 1991. Annus mirabilis of potassium channels. Science.252:1092-6.

Oppenheim, A. V., and A. S. Willsky. 1983. Signal and Systems. Prentice-Hall Inc., Englewood Cliffs, NJ.

Parker, M. W., F. Pattus, A. D. Tucker, and D. Tsernoglou. 1989. Structureof the membrane-pore-forming fragment of colicin A. Nature. 337:93-6.

Porat, B., and B. Fridlander. 1985. Asymptotic analysis of the bias of themodified Yule-Walker estimator. IEEE Trans. Autom. Control. Vol AC-30. 765-767.

Prince, R. C., J. C. Leight, and L. Dutton. 1976. Thermodynamic propertiesofthe reaction center ofRhodopseudomonas viridis. In vivo measurementof the reaction center bacteriochlorophyl-primary acceptor intermediaryelectron carrier. Biochim. Biophys. Acta. 440:622-36.

Rose, G. D., A. R. Geselowitz, G. J. Lesser, R. H. Lee, and M. H. Zehfus.1985. Hydrophobicity of amino acid residues in globular proteins.Science. 229:834-8.

Stroud, R. M., and J. Finer-Moore. 1985. Acetylcholine receptor structure,function, and evolution. J. Annu. Rev. Cell Biol. 1:317-51.

Stroud, R. M., M. P. McCarthy, and M. Shuster. 1990. Nicotinic acetyl-choline receptor superfamily of ligand-gated ion channels. Biochemistry.29:11009-11023.

Sun, S. 1993. Protein structure prediction: power spectral analysis and re-duced representation models. Ph.D. thesis. Department of BiophysicalScience, State University of New York at Buffalo, Buffalo, NY.

Sun, S., and R. Parthasarathy. 1990. Preceedings of Converging Approachesin Computational Biology. Albany Conference. September.

Sun, S., and R. Parthasarathy. 1993. Periodic Hydropathy profile of primarysequence implies periodic structural patterns in protein: power spectralanalysis. In press.

Tempel, B. L., D. M. Papazian, T. L. Schwarz, Y. N. Jan, and L. Y. Jan.1987. Sequence of a probable potassium channel component encoded atShaker locus of Drosophila. Science. 237:770-5.

Terwilliger, T. C., and D. Eisenberg. 1982. The structure of melittin.I. Structure determination and partial refinement. J. Biol. Chem. 257:6010-22.

Unwin, N. 1993. Nicotinic acetylcholine receptor at 9 A resolution. J. Mol.Biol. 229:1101-24.

von Heijne, G. 1991. Membrane protein structure prediction. Hydropho-bicity analysis and the positive-inside rule. J. Mol. BioL 225:487-94.

Weiss, M. S., A. Kreusch, E. Schiltz, U. Nestel, W. Welte, J. Weckesser,and G. E. Schulz. 1991. The structure of porin from Rhodobacter cap-sulatus at 1.8 A resolution. FEBS Let. 280:379-82.