Evolution of spider silks: conservation and diversification of the C-terminus

Insect Molecular Biology (2006) 15(1), 45–56
© 2006 The Royal Entomological Society, Blackwell Publishing Ltd
doi: 10.1111/j.1365-2583.2005.00606.x

R. J. Challis*, S. L. Goodacre† and G. M. Hewitt†

* IEB, University of Edinburgh, King’s Buildings, West Mains Road, Edinburgh, UK; † Centre for Ecology, Evolution and Conservation, School of Biological Sciences, University of East Anglia, Norwich, UK

Received 27 May 2005; accepted following revision 1 August 2005. Correspondence: Dr S. Goodacre, School of Biological Sciences, University of East Anglia, Norwich, NR4 7TJ. Tel.: +44-1603-593 853; fax: +44-1603-592 250; e-mail: s.goodacre.uea.ac.uk.

Abstract

Analysis of DNA sequences coding for the C-terminus of spider silk proteins from a range of spiders suggests that many silk C-termini share a common origin, and that their physical properties have been highly conserved over several hundred million years. These physical properties are compatible with roles in protein synthesis, silk function and in recruiting accessory proteins. Phylogenetic relationships among different silk genes suggest that any recombination has been insufficient to homogenize the different types of silk gene, which appear to have evolved independently of one another. The types of nucleotide substitutions that have occurred suggest that selection may have operated differently in the various silk lineages. Amino acid sequences of flagelliform silk C-termini differ substantially from the other types of spider silk studied, but they are expected to have very similar physical properties and may perform a similar function.

Keywords: Araneae; C-terminus; sequence evolution; silk; spider.

Introduction

Spider silks are multimeric, modular fibres that are of considerable interest to biotechnologists because of their unique physical properties (Jin & Kaplan, 2003). Silks are classified according to the gland in which they are produced and by the spinning apparatus of the spider. Those that are woven by spiders with a cribellum (‘cribellate’ silks) are extremely fine and typically achieve stickiness through van der Waals interactions and hydrophobic nodes in their protein sequence (Hawthorn & Opell, 2002). In contrast, the thicker silks of spiders without a cribellum (‘ecribellate’ silks) are optimized for strength and elasticity and are secreted with a coating of an aqueous silk protein to achieve stickiness.

Cribellate silks have been isolated from only a few species, including the primitive Mygalomorph spider Euagrus chisoseus and the Haplogyne Plectreurys tristis (Fig. 1, Gatesy et al., 2001). Much more attention has focused on a range of ecribellate silks found in derived Entelegyne spiders, particularly the Araneoidea (see Table 1), perhaps because these possess the most impressive physical properties. Two classes of ecribellate spidroin silks (MaSp1 and MaSp2) have been isolated from a range of Araneoid spiders. These are produced by the major ampullate gland and are used to form a two component fibre for use as a dragline (Hinman & Lewis, 1992). MaSp1 forms crystalline regions of β-sheet (Hayashi & Lewis, 2000) that have the tensile strength of Kevlar (Gosline et al., 1999) whilst retaining 35% extensibility (Hayashi & Lewis, 1998); whereas MaSp2 has a high proline content and contains sequence motifs that confer elasticity (Hayashi & Lewis, 2000).

Further studies of spiders within the family Araneoidea have identified an additional class of very different silk fibre, which is produced by flagelliform glands. Flagelliform gland silks (flags) have slightly lower tensile strength than the MaSp silks but have many times the extensibility (up to 200%, Vollrath & Edmonds, 1989). This is central to their mechanism of prey capture because it enables webs to arrest the motion of flying organisms without breaking (Hayashi & Lewis, 1998). Extensibility of flagelliform silk is enhanced through interaction with an aqueous aggregate gland silk that coalesces to form sticky droplets (reviewed in Vollrath, 1999), which may serve to hydrate the flagelliform silk (Shao & Vollrath, 1999). A final class of spidroin silk that has been identified in the Araneoidea is spun from the minor ampullate gland. These silks (MiSps) are used in web construction and have lower extensibility than the flagelliform silks whilst retaining similar tensile strength (Gosline et al., 1999).

Silk proteins typically consist of a non-repetitive N-terminus, a highly repetitive (‘repeat’) region and a non-repetitive




    C-terminus. To date, most research has focused on the link between sequence motifs in the repeat region and the physical properties of silk (see reviews by Hayashi et al., 1999 and Craig & Reikel, 2002). The N-terminus has a role in transport as it encodes a signal peptide (Hayashi & Lewis, 1998) but the role of the C-terminus is unclear. Kerkham et al. (1991) proposed that the C-terminus is important in maintaining the aqueous state of silks prior to extrusion. Beckwitt & Arcidiacono (1994) found the C-terminal sequence of spider silk to be highly conserved and proposed a further

    Figure 1. Simplified morphological phylogeny of the Araneae (based on Coddington & Levi, 1991). Bold type indicates families considered in this study. Numbers in parentheses denote the number of species of each family with silk sequence data available on GenBank. Also indicated are the nodes calibrated by fossil evidence: Rosamygale, 240 Mya (Selden & Gall, 1992) and Macryphantes, 125 Mya (Selden, 1990).

    Table 1. Accession numbers of spider silk sequences in this study. All sequences are from mRNA apart from those indicated by *, which are from genomic DNA

    Family          Species                    Protein   Accession number   Reference
    Dipluridae      Euagrus chisoseus          Fib1      AF350271           Gatesy et al. (2001)
    Plectreuridae   Plectreurys tristis        Fib1      AF350281           Gatesy et al. (2001)
    Plectreuridae   Plectreurys tristis        Fib2      AF350282           Gatesy et al. (2001)
    Plectreuridae   Plectreurys tristis        Fib3      AF350283           Gatesy et al. (2001)
    Plectreuridae   Plectreurys tristis        Fib4      AF350284           Gatesy et al. (2001)
    Araneidae       Araneus bicentarius        MaSp2*    U20328             Hinman & Lewis (1992)
    Araneidae       Araneus diadematus         MaSp1     U47854             Guerette et al. (1996)
    Araneidae       Araneus diadematus         MaSp2     U47856             Guerette et al. (1996)
    Araneidae       Araneus diadematus         MiSp      U47853             Guerette et al. (1996)
    Araneidae       Argiope aurantia           MaSp2*    AF350263           Gatesy et al. (2001)
    Araneidae       Argiope trifasciata        Flag      AF350264           Gatesy et al. (2001)
    Araneidae       Argiope trifasciata        MaSp1     AF350266           Gatesy et al. (2001)
    Araneidae       Argiope trifasciata        MaSp2     AF350267           Gatesy et al. (2001)
    Araneidae       Gasteracantha mammosa      MaSp2*    AF350272           Gatesy et al. (2001)
    Tetragnathidae  Nephila clavipes           Flag*     AF218621           Hayashi & Lewis (2000)
    Tetragnathidae  Nephila clavipes           MaSp1     U20329             Hinman & Lewis (1992)
    Tetragnathidae  Nephila clavipes           MiSp      AF027736           Colgin & Lewis (1998)
    Tetragnathidae  Nephila madagascariensis   MaSp1*    AF350277           Gatesy et al. (2001)
    Tetragnathidae  Nephila madagascariensis   MaSp2*    AF350278           Gatesy et al. (2001)
    Tetragnathidae  Nephila senegalensis       MaSp1*    AF350279           Gatesy et al. (2001)
    Tetragnathidae  Nephila senegalensis       MaSp2*    AF350280           Gatesy et al. (2001)
    Tetragnathidae  Tetragnatha kauaiensis     MaSp1*    AF350285           Gatesy et al. (2001)
    Tetragnathidae  Tetragnatha versicolor     MaSp1*    AF350286           Gatesy et al. (2001)
    Theridiidae     Latrodectus geometricus    MaSp1     AF350273           Gatesy et al. (2001)
    Pisauridae      Dolomedes tenebrosus       AmSp1     AF350269           Gatesy et al. (2001)
    Pisauridae      Dolomedes tenebrosus       AmSp2     AF350270           Gatesy et al. (2001)


    role in signalling. These functions are not incompatible and both could require sequence conservation.

    Similarities among different spider silk genes suggest that they share a common ancestor (reviewed in Craig & Reikel, 2002), but the evolutionary relationships among functional homologues are unclear. It is thought that many of the genes in this family have evolved through gene duplications (Beckwitt & Arcidiacono, 1994). Functional relationships are further complicated by the existence of duplicate silk glands, spigots and spinnerets (Coddington & Levi, 1991).

    In this study, we use sequence data for different silk types from 16 species distributed across six spider families. Sequences are from either genomic or cDNA and the species included in the study come from both basal and terminal clades within the Araneae (Fig. 1). For clarity, we use the nomenclature of Gatesy et al. (2001), with the addition of the abbreviation Fib for the ecribellate fibroins and AmSp for the ampullate gland spidroin of Dolomedes tenebrosus.

    We combine phylogenetic analysis of silk sequences with prediction of secondary structure and physical properties to investigate the evolution of the C-terminus of spider silk.

    Results

    Sequence conservation

    There were 26 DNA sequences on GenBank for which the C-terminal silk sequence was available: five cribellate fibroins and 21 ecribellate spidroins/fibroins. The total length of each sequence varied since all are partial gene sequences with variable repeat length, repeat number and C-terminal length. Greater sequence conservation was observed at the C-terminus. There is a particularly conserved region at a QALLE amino acid sequence motif (Fig. 2), at which the majority of sequences share greater than 50% identity (Dayhoff similarity matrix; Dayhoff et al., 1978). When the entire C-terminus is considered, the similarity is lower, but most silks share at least 30% identity. The flag C-termini are the most highly diverged. They do not have a complete QALLE motif and share as little as 23% sequence identity with other silks.
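The pairwise identity figures quoted above can be illustrated with a minimal sketch. It assumes two sequences taken from the same alignment and counts plain identities over columns where both have a residue; it does not apply the Dayhoff similarity weighting mentioned in the text, and other denominator conventions exist. The function name and the toy fragments are hypothetical and are not taken from Fig. 2.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percentage of identical residues over columns where both sequences have a residue."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Hypothetical aligned fragments, for illustration only (not real silk sequences).
toy_a = "SRLSSPQALLEAVS"
toy_b = "SRLASP-ALVEALS"
print(f"{percent_identity(toy_a, toy_b):.1f}% identity")
```

Excluding gapped columns keeps the value comparable between sequences of different length, which matters here because the C-terminal length varies among the partial gene sequences.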

    Phylogenetic analysis

    Phylogenetic analyses were performed on the entire silk C-terminal data set (198 bp, 66 amino acids) and repeated with

    Figure 2. ClustalW alignment of C-terminal amino acid sequences, shaded to indicate similarities (grey) and identities (black). The region of highly conserved sequence about a QALLE motif, which corresponds to a region of predicted α-helix in all silk types, is also shown.


    the highly diverged flagelliform silk sequences removed. Nucleotide-based trees constructed using both maximum likelihood and Bayesian methods are shown in Fig. 3 (estimated proportion of invariant sites = 0.22/0.19 and α shape parameter = 1.66/1.69 for maximum likelihood/Bayesian trees, respectively). When the highly diverged flag sequences were removed from the analysis, those relationships among remaining silks that were well-supported in the previous analysis were found to remain the same (data not shown). Trees constructed by the neighbour-joining method had the same overall topology as those constructed by maximum likelihood (data not shown). In all trees the ecribellate silks (AmSp, MaSp and MiSp) of the derived Entelegyne spiders (Fig. 1) cluster separately from the cribellate silks of the basal genera Euagrus and Plectreurys. This relationship has high support (posterior probability = 1.00) in the Bayesian but not in the maximum likelihood (ML) or neighbour joining (NJ) trees, although the overall topology is similar in each case. The highly diverged flagelliform silks cluster most closely to Plectreurys Fib4 but this relationship is not strongly supported by any method of tree estimation.

    Within the MaSp/MiSp silk group there are few well-supported nodes in the ML or NJ trees but strong support in the Bayesian tree for the following: AmSp and MiSp silks cluster separately from the MaSp silks and MaSp silks cluster in several, well-supported paraphyletic groups, with strong support for several terminal groupings consisting of either MaSp1 or MaSp2 silks but not both. The single exception is MaSp1 of Araneus diadematus, which falls within a well-supported group containing MaSp2 silks of other species.

    Maximum likelihood and Bayesian analysis of amino acid sequences (JTT model of substitution) are shown in Fig. 4. Well supported terminal groups in these trees were also well supported by analysis of nucleotide sequences (Fig. 3).

    Sequence evolution

    Tests for recombination made using Recpars (Hein, 1993; with any gaps in the alignment removed) inferred between 1 and 5 recombination events within the phylogeny when the recombination:substitution cost was set at 1.5 : 1. When all sequences were included, at least one recombination event

    Figure 3. (a) Unrooted maximum likelihood tree (198 base pairs) of C-termini. Rate matrix, proportion of invariant sites (0.22) and α (1.66) estimated from an initial neighbour-joining tree. Numbers indicate the support for individual branches from 100 bootstrap replicates (values above 70 shown). (b) Unrooted phylogeny constructed using a Bayesian approach (computed with MRBAYES using 4 chains of 1 000 000 generations after a burn in time of 100 000 generations), estimating the proportion of invariant sites (0.19) and α (1.69). Probabilities for each branch are given. Tests for substitution rate heterogeneity among branches labelled 1–4 are described in Table 2.


    was inferred until the ratio was set at > 6 : 1. Several transition:transversion costs were used (0.1 : 1, 0.5 : 1 and 1 : 1) and found not to affect the final threshold at which no recombination events were inferred. The analysis was repeated with Fib, Flag, MiSp or MaSp sequences removed. At least one recombination event was inferred in each case, apart from when the Fib silks were excluded, leaving only Flag, MaSp, MiSp and AmSp silks. Similar results were obtained using the DSS approach: two recombination events were inferred when all sequences were included in the analysis (F84 model of nucleotide substitution, 1000 bootstrap replicates, threshold = 0.95), and at least one event when Flag, MiSp or MaSp sequences were removed from the alignment, but none when the Fib silks were excluded.

    Estimates of ω (the dN/dS ratio) were made based upon the tree topology estimated by Bayesian analysis (Fig. 3). The estimated value of ω was 0.088 when a single value was assumed across all sites and branches within the tree (Model = 0, NSites = M0). Parameters estimated assuming different categories of ω are given in Table 2. No sites had an estimated ω > 1 but the observed data under the model NSites = M3 (discrete categories of ω) were found to be significantly more likely by LRT (P < 0.0001) than under NSites = M0 (all sites assumed to have the same ratio).

    When a beta distribution with a free ratio of ω where ω can exceed 1 was assumed (NSites = M8), there was no significant increase in likelihood when compared with a beta distribution with all values < 1 (NSites = M7). A non-significant likelihood ratio in this case might simply reflect a sensitivity to the number of sequences included in the analysis and the level of sequence divergence (as reviewed by Bielawski & Yang, 2003) but this explanation cannot be evaluated without adding new data to the analysis. Estimates of ω when the MaSp silks were analysed separately similarly found no site classes with ω > 1 although the model assuming several discrete classes of ω (NSites = M3) was more likely than the nested model (NSites = M0), which assumes no rate heterogeneity (P < 0.0001).

    Estimates of ω allowing for different values for individual branches within the tree (indicated in Fig. 3b) show that when models allowing for a difference in substitution ratio between lineages are assumed, a significantly higher likelihood of the data is observed than when only one set of ratios is assumed for all branches (Table 2). Estimates allowing four different classes of ω with a different set of ratios in a specified lineage gave estimates of ω > 1 for one class in the Flag lineage (ω > 13, P = 0.002) and one class in each of (1) the major ampullate, (2) major ampullate/minor ampullate and (3) Fib lineages, although in these cases no significant increase in likelihood of the data was observed over the null model.

    The branch-sites analysis was repeated twice with one of the most highly diverged sequences, Fib and Flag, removed. With the Fib sequences removed, estimates of ω for one class within the major ampullate lineage were again found to be > 1 (estimated ω = 6.692) and the data were found to

    Figure 4. (a) Unrooted maximum likelihood analysis tree of C-termini amino acid sequences (66 amino acids) calculated using MOLPHY (JTT substitution matrix; majority rule consensus of the 50 trees produced is shown; branches supported in more than 50% (= 25) of trees are shown). (b) Unrooted Bayesian analysis of amino acid sequences (JTT substitution matrix; chain number, burn in time and branch probabilities as for Fig. 3b. Majority rule consensus shown).


    be significantly more likely than when assuming one rate for all branches (LRT 2Δℓ = 8.16, P = 0.017, 2 d.f.). When the Flag sequences were removed, estimates of ω for one class within the Fib lineage and MaSp lineages were also > 1 (ω = 7.726 and 5.426, respectively) but the data were not significantly more likely than under the null model.
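The likelihood ratio tests reported above can be reproduced approximately with a short sketch. It assumes that the Likelihood column of Table 2 holds log-likelihoods (entered here as negative numbers), that the test statistic is twice the log-likelihood difference between nested models, and that it is compared with a chi-square distribution on 2 d.f. The function names are hypothetical, and the closed-form tail probability used here is valid only for even degrees of freedom.

```python
import math


def lrt_statistic(lnl_null: float, lnl_alt: float) -> float:
    """Twice the log-likelihood gain of the alternative over the nested null model."""
    return 2.0 * (lnl_alt - lnl_null)


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail chi-square probability; this closed form holds for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form needs a positive, even df")
    if x <= 0:
        return 1.0
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k  # builds (x/2)^k / k!
        total += term
    return math.exp(-half) * total


# Branch-sites model for the Flag lineage against the two-category null (values from Table 2).
stat = lrt_statistic(lnl_null=-3847.84, lnl_alt=-3841.80)
print(f"statistic = {stat:.2f}, P = {chi2_sf_even_df(stat, 2):.3f}")

# Statistic quoted in the text for the analysis with Fib sequences removed.
print(f"P = {chi2_sf_even_df(8.16, 2):.3f}")
```

Under these assumptions the two calls give values close to the P = 0.002 and P = 0.017 quoted in Table 2 and in the text, which suggests the reported tests follow this standard form.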

    The mean dS and dN estimates for all MaSp sequences were 1.53 and 0.23, respectively. Similarly, the mean dS and dN estimates within all MaSp1 sequences were 1.27 and 0.23 (28 comparisons), within MaSp2 were 1.41 and 0.20 (21 comparisons) and between MaSp1 and MaSp2 were 1.42 and 0.26 (56 comparisons). Several intraspecific dS/dN estimates were possible as follows: 1.49/0.16 (MaSp1/MaSp2 Nephila madagascariensis); 1.41/0.31 (MaSp1/MaSp2 Araneus diadematus); 1.39/0.09 (MaSp1/MaSp2 Argiope trifasciata); 1.09/0.15 (MaSp1/MaSp2 Nephila senegalensis). Similarly, the dS and dN estimates between the two MiSp silks of Araneus diadematus and Nephila clavipes were 0.99 and 0.52, with intraspecific dS and dN estimates for MiSp/MaSp1 (or MaSp2) of 1.88 and 0.62 (1.86 and 0.92) for Araneus diadematus and 1.27 and 0.39 for N. clavipes. McDonald-Kreitman (1991) tests (implemented in the program of Rozas et al., 2003) comparing interspecific substitution ratios (MaSp1/MaSp1 and MaSp2/MaSp2) with intraspecific ratios (same species MaSp1 vs. MaSp2) found no significant departures from neutrality (P > 0.05 for each comparison).
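As a rough guide to what these means imply, the sketch below divides each reported mean dN by the corresponding mean dS. This is only an approximation: a ratio of means is not the same as the mean of pairwise ratios, and it is not the codon-model estimate of ω given elsewhere in the paper. The dictionary labels and the function name are hypothetical.

```python
# Mean (dS, dN) estimates as reported in the text for the MaSp comparisons.
mean_estimates = {
    "all MaSp": (1.53, 0.23),
    "within MaSp1": (1.27, 0.23),
    "within MaSp2": (1.41, 0.20),
    "MaSp1 vs MaSp2": (1.42, 0.26),
}


def dn_ds_ratio(ds: float, dn: float) -> float:
    """Ratio of non-synonymous to synonymous divergence."""
    if ds <= 0:
        raise ValueError("dS must be positive")
    return dn / ds


ratios = {label: dn_ds_ratio(ds, dn) for label, (ds, dn) in mean_estimates.items()}
for label, ratio in ratios.items():
    print(f"{label}: dN/dS = {ratio:.2f}")
```

All four ratios fall well below 1, in line with the low ω estimates and the constraint on the C-terminus described in the text.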

    Physical properties and function

    A hydrophobicity profile of a representative silk gene C-terminus with the upstream final repeat units (MaSp1 of N. senegalensis, 250 amino acids), is given in Fig. 5(a). The gene shows an oscillating pattern of hydrophobic and hydrophilic regions. All silks, with the exception of the flagelliform types, show a similar pattern (data not shown) and all silks (including the flagelliform types) have a peak in hydrophobicity at the C-terminus. The C-terminus peak in hydrophobicity is greater than that at any point in the repetitive region of silk genes and is similar to that of the silk N-terminus, which is known to act as a signal peptide in protein transport.

    Hydrophobicity profiles for the C-terminal 90 amino acids of all silk genes (the most hydrophobic region of the entire silk gene) were also very similar (Fig. 5b). The exception was the profile of silk from E. chisoseus (shown in bold). This sequence contained an additional hydrophobic peak region of greater than two units, 48 residues upstream of the peak found in all silks (Fig. 5b). The height of the two peaks is identical and they have a similar hydrophobicity profile but less than 20% identity.
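A sliding-window profile of the kind plotted in Fig. 5 can be sketched as follows. The sketch uses the Kyte and Doolittle hydropathy scale and the 13-residue scan window stated in the figure caption; the function name and the example sequence are hypothetical, and the software actually used to draw the figure may treat window edges differently.

```python
from typing import List

# Kyte and Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence: str, window: int = 13) -> List[float]:
    """Mean hydropathy of every full window; one value per window start position."""
    seq = sequence.upper()
    if window < 1 or window > len(seq):
        raise ValueError("window must be between 1 and the sequence length")
    # A non-standard residue (for example X) raises KeyError here.
    scores = [KYTE_DOOLITTLE[residue] for residue in seq]
    running = sum(scores[:window])
    profile = [running / window]
    for i in range(window, len(scores)):
        running += scores[i] - scores[i - window]  # slide the window by one residue
        profile.append(running / window)
    return profile


# Hypothetical sequence for illustration only (not a real silk C-terminus).
toy = "GAGQGGYGGLGSQGAGRGGLGGQGAGAAAAAAGGSRLSSPQALLEAVSALVSIL"
profile = hydropathy_profile(toy)
peak_start = max(range(len(profile)), key=profile.__getitem__)
print(f"most hydrophobic window starts at residue {peak_start + 1}")
```

Averaging over a window smooths residue-level noise, so a peak reflects a hydrophobic stretch rather than a single residue; a shorter window gives a more jagged profile.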

    The region of high hydrophobicity at the 3′ end of the C-terminus found in all silks (Fig. 5b) corresponds to the QALLE region of high sequence conservation (Fig. 2). In AmSp, MaSp and MiSp silks this region has two hydrophobic

    Table 2. Estimates of ω under different models of heterogeneity across sites or branches within the tree topology (indicated on Fig. 3) and likelihood ratio tests (LRTs) of nested models

    Model                                               Parameter estimates                               Likelihood   2Δℓ, LRT

    All sequences included
    One ω across branches and sites
      Model = 0 NSites = 0                              ω (per branch) = 0.088                            −3874.79
    One ω across branches, rate variation among sites
      Model = 0 NSites = 3 (discrete classes)           ω (site classes) = 0.050, 0.124, 0.245            −3846.70     56.18, P < 0.0001
      Model = 0 NSites = 7 (beta distribution)          p = 2.75, q = 26.77                               −3850.14
      Model = 0 NSites = 8 (beta + free ratio of ω)     p0 = 1, p = 2.75, q = 26.77                                    NS
                                                        (p1 = 0, ω = 2.60)

    MaSp1 & 2 sequences only                                                                                           2 d.f.
    One ω across branches, rate variation among sites
      Model = 0, NSites = 0                             ω (per branch) = 0.063                            −1703.95
      3 (discrete classes)                              ω (site classes) = 0.029, 0.130, 0.440            −1679.78     48.34, P < 0.0001
      7 (beta distribution)                             p = 0.92, q = 11.68                               −1682.84
      8 (beta + free ratio of ω)                        p0 = 1, p = 0.82, q = 11.69                       −1682.84     NS
                                                        p1 = 0, ω = 2.73

    All sequences included
    One ω across branches, 2 site categories
      Model = 0 NSites = 3 (2 categories)               ω (site classes) = 0.054, 0.165                   −3847.84
    Branch-sites model (variation across sites and branches)
      Model = 2 NSites = 3
      ω1 ≠ ω2 = ω3 = ω4                                 ω1 (site classes) = 0.052, 0.163, 13.420          −3841.80     12.08, P = 0.002
                                                        ω2,3,4 (site classes) = 0.052, 0.163
      ω2 ≠ ω1 = ω3 = ω4                                 ω2 (site classes) = 0.162, 0.524, 96.766          −3847.26     NS
                                                        ω1,3,4 (site classes) = 0.162, 0.524
      ω3 ≠ ω1 = ω2 = ω4                                 ω3 (site classes) = 0.053, 0.0165, 19.247         −3847.87     NS
                                                        ω1,2,4 (site classes) = 0.053, 0.165
      ω4 ≠ ω1 = ω2 = ω3                                 ω4 (site classes) = 0.053, 0.170, 4.019           −3845.74     NS
                                                        ω1,2,3 (site classes) = 0.053, 0.170


    Figure 5. Kyte and Doolittle mean hydrophobicity profiles (Scan window = 13 residues) of: (a) Nephila senegalensis MaSp1 silk final repeat units and C-terminus (250 amino acids) (b) C-terminus (90 amino acids) of all silk genes (Euagrus chisoseus shown in bold) (c) Hydrophobicity plot of conserved QALLE region (21 amino acids) showing the difference between flagelliform (Nephila clavipes Flag, grey line) and non-flagelliform (Nephila senegalensis MaSp1, black line) silks.


    maxima corresponding to the 8th and 12th residues of the conserved QALLE motif (usually leucine and serine). In Flagelliform silks there are also two hydrophobic maxima, but these are three, as opposed to four, residues apart, in line with the 9th and 12th residues of the conserved region. The difference between Flag and non-Flag silks in this region is illustrated in Fig. 5(c) by comparing the conserved QALLE region (21 amino acids) of MaSp1 of N. senegalensis with the Flag from N. clavipes.

    Comparison of silk sequences with proteins of known structure using the 3D-PSSM server suggests that the highly conserved C-terminal silk QALLE motifs form α-helices (Fig. 2). All sequences have at least one additional region upstream of the conserved QALLE motif that is likely to form an α-helix (data not shown). In E. chisoseus Fib1 this region lies within the region of its first hydrophobic peak upstream of the QALLE motif.

    Discussion

    Evolution of spider silks: sequence conservation and diversification

    Previous studies have shown that the sequence motifs in the repeat region of silk genes are highly conserved (Gatesy et al., 2001). The present study demonstrates that there is also a high degree of conservation of the non-repetitive C-terminus of silk proteins in terms of primary amino acid sequence, predicted secondary structure and physical properties. This similarity exists between species that are thought to have diverged up to 240 million years ago (Fig. 1, Selden & Gall, 1992) and among silk proteins that have a wide range of physical properties. These traits are thought to be conferred largely by repeated amino acid motifs encoded by regions upstream of the C-terminus (Hayashi & Lewis, 2000). The degree of similarity among physical properties and predicted secondary structure of silk C-termini suggests that they perform a common function and that their evolution is likely to be constrained by selection against mutations that disrupt this function.

    Silk genes contain many GC-rich regions that are potential recombination hotspots (Hayashi & Lewis, 2000) and it is possible that they have evolved through a complex mode of evolution involving both gene conversion and recombination, which would have obscured their true origins. In accordance with this prediction, the null hypothesis that there has been no recombination is rejected for the entire dataset. However, the null hypothesis is not rejected when Fib sequences are excluded from the analysis. Furthermore, MiSp silks cluster separately from their MaSp counterparts in phylogenetic analyses and silks of the same species cluster according to gene in most cases (e.g. Nephila MaSp1 and MaSp2 genes).

    The observation that many silks cluster according to type, rather than according to species, suggests that their evolution may be explained better by a birth-and-death process involving gene duplication and loss of function (Nei, 1969), than by a model of concerted evolution where recombination homogenizes genes post duplication. Under such a process, genes are expected to cluster by gene or duplication order rather than by species, low levels of sequence homogeneity are expected between different genes (particularly at non-coding sites) and there will be evidence of gene loss or pseudo-gene formation (as discussed by Nei et al., 1997). In contrast, if recombination has had the greater influence then sequence homogeneity between genes within species is expected to be high and they may cluster together in phylogenetic analyses.

    It is not possible to assess the rate of gene loss within the spider silk family, since much of the work so far has involved identification of expressed genes through isolation of their mRNA. However, the phylogenetic relationships observed among silks of the same species presented in this analysis are inconsistent with the sole explanation of recombination under a model of concerted evolution and comparisons between intra- and interspecific dS values also support this view for the following reason: concerted evolution is expected to result in homogenization of different silk types and hence decrease intraspecific values of dS, regardless of any potentially countervailing effects of selection, which is expected to have a greater effect on dN. However, in spider silks, intraspecific values of dS between silk types (e.g. 1.49, 1.41, 1.39) are of a similar magnitude or greater than those calculated between species (1.27 for all MaSp1 sequences; 1.41 for MaSp2 and 1.53 for the combined data set of MaSp1 and 2). Furthermore, McDonald-Kreitman tests find no significant departure from neutrality; such a departure is expected if dS has been reduced to a greater extent than the associated dN.

    Hydrophobicity profiles point to a potentially complex mode of evolution of spider silks that could also involve duplication of regions within genes. The hydrophobicity profile of the C-terminus of what is thought to be the most basal species in this study, E. chisoseus Fib1, has two peaks, 48 residues apart, both of which are predicted to form α-helical regions. The presence of two regions with the same predicted secondary structure raises several important points about the evolution of spider silks. They may have arisen from a replication error, such as those thought to drive the evolution of spider silks (Beckwitt et al., 1998). However, convergence through selection for similar physical properties is a plausible alternative explanation, given the low amino acid identity of the two peaks. It is not possible to establish from this dataset whether the two peaks arose in the E. chisoseus lineage or if they represent an ancestral state, in which case they could be present in other basal lineages. More sequences, particularly from primitive spider lineages, would help to answer this intriguing question.

    Likelihood estimates of dN/dS ratios on a site-by-site and branch-by-branch basis using different models suggest that there is variable selective pressure on different amino acids within spider silks, that the ratio is not uniform throughout the tree and that there may be positive selection at some sites within particular lineages. Evidence for positive selection within particular types of silk, such as those of the ampullate class, which form an intrinsic part of the web architecture, is interesting because speciation has been found in at least one case to be more rapid in web-building than non-web-building spiders (Gillespie, 1999). However, it is important to emphasize that tests for selection are based on the assumption of no recombination and this null hypothesis is rejected when Fib sequences are included in the analyses (although the number of recombination events inferred is lower than that predicted to significantly increase type I errors; Anisimova et al., 2003).

    dN/dS ratios were re-estimated with the Fib sequences removed from the dataset (and similarly with the highly diverged Flag sequences removed) and the same trend was observed in the ampullate silk lineage. However, we cannot be certain that there has been no recombination among the remaining sequences within our data set because tests for recombination can themselves be confounded by positive selection at individual amino acid positions, or by heterogeneity in branch lengths within a phylogeny. Furthermore, they may not be sufficiently sensitive to detect rare or ancient recombination events. As a result we cannot rule out inter- or intragenic recombination, although our data certainly indicate that recombination has not overwhelmed the effects of gene duplication and independent diversification of different genes, and our analyses point to variable selection within some regions of the C-terminus (which could be the result of differences in functional constraint) and a heterogeneous pattern of branchwise dN/dS substitution ratios.

    Evolution of spider silks: origin and function

    The origin of the Araneoid ecribellate silks as a whole remains uncertain. If we accept the use of the silk from the basal Mygalomorph species, E. chisoseus, as a suitable outgroup then the results of the amino acid analyses are consistent with the MaSp, AmSp and MiSp silks being the most derived (Fig. 4). Bayesian analysis of nucleotide sequences supports this hypothesis, but there is poor resolution in the maximum likelihood tree (Fig. 3). It is important to emphasize that, broadly speaking, the same relationships among silks are recovered using all methods of analysis, but the inconsistency in confidence estimates deserves some comment. Discrepancies between the support given by posterior probabilities (Bayesian analyses) and bootstrap values (maximum likelihood analyses) are not unexpected (Douady et al., 2003) and are likely to reflect sensitivities of either (or both) measures to assumptions in the different models applied. Similar sensitivities could also account for differences in estimated branch length, which are greater for the Flag silks in the maximum likelihood analysis than when using either of the other methods, but the analyses are unlikely to improve until additional silk sequences can be included.

    It is interesting to note that sequence divergence within the four cribellate silks, which are from a single species, P. tristis, is greater than that within the entire MaSp1 and 2 group in both ML and Bayesian analyses. This diversity might be peculiar to P. tristis, or it could reflect generally higher levels of sequence divergence among cribellate silks. Increased sequence divergence is expected to lower the rate of recombination through removing recombination hotspots, or simply by shortening homologous regions required for recombination to occur. As such it is noteworthy that it was those tests that included the Fib sequences that rejected the null hypothesis of no recombination.

    The flagelliform silks have the lowest sequence identity to other sequences and show above average sequence divergence on the tree of C-terminal regions (Fig. 3). Despite this divergence, and the different physical properties of flagelliform silks as a whole, the predicted physical properties of flagelliform C-termini are not dissimilar to the other types studied. There are two possible hypotheses to explain similar physical properties despite considerable amino acid sequence divergence: either (1) ancestral amino acid sequence similarity to other silks has been obscured by a large number of mutations or (2) the C-termini of flagelliform and non-flagelliform silks do not share a common ancestor and similarities in terms of predicted physical properties between flagelliform and other silk C-termini are the result of selection for similar physical properties.

    Flagelliform silks are thought to have originated when the Araneoid spiders split from their sister group, with MaSp silks having evolved somewhat earlier at the divergence of the Araneomorphae. In contrast with this hypothesized origin, the flagelliform silk sequences appear to cluster more closely to the primitive cribellate silks than to the MaSp silks in phylogenetic analyses. This placement may be explained by flagelliform silks being derived from primitive silks and not from the MaSp lineage. However, an entirely independent, non-homologous origin of the flagelliform silk C-terminus could also explain the apparently elevated substitution ratios along the Flag lineage and the high degree of sequence divergence between flagelliform and non-flagelliform silks (with long-branch attraction accounting for the basal position in the tree; Philippe & Laurent, 1998). Non-homology of the Flag silks would violate assumptions upon which the methods for estimating all substitution ratios are based. Therefore, although an independent origin seems unlikely, it is important to highlight the fact that our analyses were repeated with this lineage completely excluded and that similar trends were found.

    Silk C-termini show the same degree of hydrophobicity as the silk protein N-terminal signal peptides of Nephila spp. (data not shown) and it is possible that the C-termini are similarly involved in signalling. This supports the previous suggestion made by Beckwitt & Arcidiacono (1994) that this region might be required for correct protein transport and synthesis, although transport is unlikely to be the only function since recent work on major ampullate spidroins confirms that C-termini are indeed present in the mature silk thread (Sponner et al., 2004). The highly conserved hydrophobic regions in the C-termini (shown in Fig. 5b) might, for example, be required for recruiting accessory proteins such as chaperonins, in order to facilitate correct protein folding (reviewed in Hartl & Hayer-Hartl, 2002).

    Diversification in both the C-terminus and upstream regions and the consequent changes in physical properties may be driven by differences in how the silk is used. As an example, bolas spiders (Mastophora spp.) have evolved an exceptionally strong MaSp silk from which to hang a viscid droplet (bolus), which attracts prey, rather than weaving the orb-shaped web that is characteristic of many other members of the MaSp weaving family (Cartan & Miyashita, 2000). This particular case illustrates the link between the ability to evolve a new type of silk and the evolution of differences in behaviour and ecology, which may themselves be associated with other processes such as speciation (Gillespie, 1999).

    While experimental work is essential to further develop hypotheses relating to structure and function of spider silks, our study of the C-terminus has given some insight into the evolution of this gene family. Analysis of the physical properties of the C-termini is also informative about silk fibre formation itself: flagelliform and major ampullate silks represent the known extremes of extensibility and tensile strength, and within the ampullate silks, upstream MaSp1 and MaSp2 regions have very different amino acid compositions, yet all C-termini are predicted to behave in a similar manner. If this region is present in the mature protein and is important in protein transport and folding as predicted, the degree of sequence conservation suggests that a single process may one day be used for the commercial production of silks with diverse properties.

    Experimental procedures

    Silk sequences

    Sequences were retrieved from GENBANK (Table 1) and aligned in BioEdit v. 5.0.9 (Hall, 1999) using ClustalW (Thompson et al., 1994).

    Sequence evolution

    Phylogenetic analyses of nucleotide sequences (198 unambiguously aligned base pairs/66 amino acids) were performed using several methods: maximum likelihood (Felsenstein, 1981), neighbour-joining and a probability-based, Bayesian approach.

    Maximum likelihood trees were constructed in PAUP* v. 4.0b (Swofford, 1999) using the general time-reversible (GTR) model (Lanave et al., 1984). The GTR rate matrix, base frequencies, the proportion of invariant sites and the shape parameter (α) for the gamma distribution that describes heterogeneity across sites were all estimated by likelihood using an iterative procedure based on an initial simple neighbour-joining tree. Parameter values were estimated from this initial neighbour-joining tree using likelihood. These parameters were then used to make a new neighbour-joining tree, and the parameters re-estimated by likelihood from this new tree. The process was repeated until no further improvement in likelihood of the neighbour-joining tree was observed. The final parameter estimates were used to construct a tree by maximum likelihood. The phylogeny was rooted on E. chisoseus on the basis of its basal position within the Araneae based upon morphological data (Fig. 1). Tree searching involved a heuristic procedure with tree-bisection-reconnection branch swapping. Bootstrap resampling (100 replicates; Felsenstein, 1985) was used to assign support for particular branches within the tree. Neighbour-joining trees were constructed using MEGA 2.1 (Kumar et al., 2001) using the Tajima-Nei (1984) model of nucleotide substitution.
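
    The resampling step of the bootstrap (Felsenstein, 1985) can be sketched in Python as follows. This is only a minimal illustration of how alignment columns are drawn with replacement to build each pseudo-replicate; it does not reproduce the tree searches carried out in PAUP*. The toy alignment, the taxon labels and the function names are all hypothetical.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one pseudo-replicate: columns drawn with replacement."""
    n_sites = len(next(iter(alignment.values())))
    # Draw as many column indices as there are sites, with replacement.
    columns = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {
        name: "".join(seq[c] for c in columns)
        for name, seq in alignment.items()
    }


def bootstrap_replicates(alignment, n_replicates=100, seed=0):
    """Generate n_replicates resampled alignments (100 as in the text)."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]


# Toy alignment with made-up sequences (NOT silk data).
toy_alignment = {
    "taxon_A": "ATGGCTTCAGGA",
    "taxon_B": "ATGGCATCTGGA",
    "taxon_C": "ATGGCCAGTGGT",
}

replicates = bootstrap_replicates(toy_alignment, n_replicates=100, seed=1)
print(len(replicates), "replicates of",
      len(replicates[0]["taxon_A"]), "sites each")
```

    A tree would then be estimated from each replicate, and the proportion of replicate trees containing a given branch taken as its bootstrap support.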

    A probability-based, Bayesian approach to tree construction was carried out using MRBAYES (Huelsenbeck & Ronquist, 2001). This package uses a Metropolis-coupled Markov chain Monte Carlo algorithm to allow the running of multiple chains. A run of four chains for 1 000 000 generations with a burn-in time of 100 000 generations was carried out to ensure Markov chain convergence. A general time-reversible model of nucleotide substitution was used allowing for rate heterogeneity across sites, with a proportion of sites allowed to be invariant.

    Maximum likelihood analysis of amino acid sequences using the Jones, Taylor and Thornton (JTT; Jones et al., 1992) substitution matrix was performed using the program MOLPHY v. 2.3 (Adachi & Hasegawa, 1996). Fifty bootstrap replicates were used to assign support for individual nodes within the tree. Bayesian analysis of amino acid sequences was also carried out using the JTT matrix with the same burn-in and run parameters as before.

    Tests for detecting recombination events based upon a phylogenetic approach were carried out using the program Recpars (Hein, 1993). Phylogenies with and without recombination events were evaluated against one another by comparing their total costs using a range of recombination to substitution costs (the recommended ratio is 1.5 : 1; Wiuf et al., 2001). A further test for recombination was made using the DSS (difference in sum of squares) approach (McGuire & Wright, 2000; F84 distance measure used) as implemented in the program TOPALI (Milne, Husmeir, McGuire & Wright, 2003, 04).

    Estimates of ω, the parameter describing non-synonymous/synonymous (dN/dS) amino acid substitution ratios, were made by maximum likelihood using the program codeml in the software package PAML (Yang, 1997). The method allows codon bias and variable substitution rates to be incorporated into the analysis (Yang & Bielawski, 2000), which is essential given the AT bias of third codon positions in spider silk (Xu & Lewis, 1990; Hayashi & Lewis, 1998). Estimates were made based upon a given tree topology, with the following sets of criteria: (i) assuming a single value of ω across all branches and sites in the tree (Model = 0, NSites = M0); (ii) allowing for heterogeneity in ω among codons within the tree (Model = 0, NSites = M3, M7 or M8); (iii) allowing a different value of ω along a specified branch in the tree (Yang & Nielsen, 2002) whilst at the same time allowing four different classes of ω for amino acid positions (using Model = 2, NSites = M3).

    The tests described can theoretically detect the small number of sites (or branches) for which ω > 1 even when ω < 1 for the majority of sites. NSites M3 assumes discrete categories with different ω ratios where ω can exceed 1; M7 assumes a continuous beta distribution of ω across sites, where the distribution can take a variety of shapes and where ω ≤ 1; and M8 assumes the same continuous beta distribution of ω as M7, but with the addition of one extra site class that has a free ω ratio estimated from the data. M8 is known to suffer from localized optima and was therefore run several times using different starting values and the highest likelihood score taken.

    Likelihood ratio tests for selection on individual branches were performed by comparing the likelihoods of Model = 2, NSites = M3 (no. categories = 4) with the nested Model = 0, NSites = M3 (no. categories = 2). Similarly, tests for selection on individual amino acids rather than branches were performed by comparing the nested models NSites = M0 (all sites have the same ω) with NSites = M3 (3 discrete categories of ω) according to Nielsen & Yang (1998), and the model M7 (beta distribution of ω with values always ≤ 1) with nested model M8 (beta distribution with an additional class of ω that can exceed 1). Twice the difference between the log-likelihoods of these models was compared with the χ² distribution (1 or 2 degrees of freedom for branch and site models, respectively).
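
    The arithmetic of such a likelihood ratio test can be sketched in Python as below. The log-likelihood values are invented for illustration and are not results from this study; the function names are hypothetical, and only the 1 and 2 degree-of-freedom cases named in the text are handled, using the closed-form upper tails of the χ² distribution.

```python
import math


def chi2_upper_tail(statistic, df):
    """P(X >= statistic) for a chi-square variable with df = 1 or 2."""
    if statistic <= 0:
        return 1.0
    if df == 1:
        return math.erfc(math.sqrt(statistic / 2.0))
    if df == 2:
        return math.exp(-statistic / 2.0)
    raise ValueError("this sketch only handles df = 1 or df = 2")


def likelihood_ratio_test(lnl_null, lnl_alternative, df):
    """Return (2 * delta lnL, P value) for a pair of nested models."""
    statistic = 2.0 * (lnl_alternative - lnl_null)
    return statistic, chi2_upper_tail(statistic, df)


# Hypothetical log-likelihoods (NOT values from this study).
lnl_m7 = -2350.0  # null model: beta distribution, omega <= 1
lnl_m8 = -2346.0  # alternative: beta plus one extra free omega class

statistic, p_value = likelihood_ratio_test(lnl_m7, lnl_m8, df=2)
print(f"2 * delta lnL = {statistic:.2f}, P = {p_value:.4f}")
```

    A small P value would favour the more parameter-rich model, here the one that allows a class of sites with ω > 1.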

    Predicted structure, physical properties and function

    Hydrophobicity of amino acid sequences was predicted using Kyte and Doolittle mean hydrophobicity profiles (Kyte & Doolittle, 1982) in BioEdit v. 5.0.9 (Hall, 1999). This technique was consistent over a range of scan window sizes (5–20 residues) and gave results comparable with other scales, such as the Parker HPLC (Parker et al., 1986) and the Eisenberg scales (Eisenberg et al., 1984).
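
    A mean hydrophobicity profile of this kind can be sketched in Python as a sliding-window average over the Kyte and Doolittle (1982) hydropathy values. This is a minimal illustration rather than the BioEdit implementation; the function name, the default window size and the example peptide (which is not a silk sequence) are assumptions made for the sketch.

```python
# Kyte & Doolittle (1982) hydropathy index for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=9):
    """Mean hydropathy for every full window along the sequence."""
    sequence = sequence.upper()
    if window < 1 or window > len(sequence):
        raise ValueError("window must be between 1 and the sequence length")
    scores = [KYTE_DOOLITTLE[residue] for residue in sequence]
    return [
        sum(scores[start:start + window]) / window
        for start in range(len(sequence) - window + 1)
    ]


# Made-up peptide for illustration only (NOT a silk C-terminus).
example = "MSALAVLLGIASSNDKEGRT"

# Window sizes within the 5-20 residue range mentioned in the text.
for size in (5, 9, 20):
    profile = hydropathy_profile(example, window=size)
    print(size, [round(value, 2) for value in profile])
```

    Positive values mark hydrophobic stretches; comparing the profiles across window sizes gives a rough check that the position of a peak does not depend on the window chosen.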

    Secondary structure prediction was performed using the 3D-PSSM server (Fischer et al., 1999; Kelley et al., 1999, 2000). This web-based program compares input query protein sequences with an extensive database (fold library) of proteins of known structure. Similarities between known and query sequences are used to predict secondary structures of the latter. Estimates of confidence in predicted secondary structure elements, based upon the similarity between query and known sequences, are calculated by the program (shown as E-values). Only regions with more than 95% confidence in their predicted structure are shown in this study.

    Acknowledgements

    The authors are grateful to Dr Brent Emerson, Amy Crowther and Dr Alison Surridge for critically reading the manuscript. This work was supported by the University of East Anglia and by a BBSRC grant to Prof. Hewitt.

    References

    Adachi, J. and Hasegawa, M. (1996) MOLPHY, Version 2.3: Programs for Molecular Phylogenetics Based on Maximum Likelihood. Tokyo: Institute of Statistical Mathematics.

    Anisimova, M., Nielsen, R. and Yang, Z. (2003) Effect of recombination on the accuracy of the likelihood method for detecting positive selection at amino acid sites. Genetics 164: 1229–1236.

    Beckwitt, R. and Arcidiacono, S. (1994) Sequence conservation in the C-terminal region of spider silk proteins (spidroin) from Nephila clavipes (Tetragnathidae) and Araneus bicentenarius (Araneidae). J Biol Chem 269: 6661–6663.

    Beckwitt, R., Arcidiacono, S. and Stote, R. (1998) Evolution of repetitive proteins from Nephila clavipes (Tetragnathidae) and Araneus bicentenarius (Araneidae). Insect Biochem Mol Biol 28: 121–130.

    Bielawski, J.P. and Yang, Z. (2003) Maximum likelihood methods for detecting adaptive evolution after gene duplication. J Theoret Func Genomics 3: 201–212.

    Cartan, K.C. and Miyashita, T. (2000) Extraordinary web and silk properties of Cyrtarachne (Araneae, Araneidae): a possible link between orb-webs and bolas. Biol J Linnean Soc 71: 219–235.

    Coddington, J.A. and Levi, H.W. (1991) Systematics and evolution of spiders (Araneae). Annu Rev Ecol Syst 22: 565–592.

    Craig, C.L. and Reikel, C. (2002) Comparative architecture of silks, fibrous proteins and their encoding genes in insects and spiders. Comparative Biochem Physiol 133: 493–507.

    Dayhoff, M.O., Schwartz, R.M. and Orcutt, B.C. (1978) A model of evolutionary change in proteins. Matrices for detecting distant relationships, pp. 345–358. In: Dayhoff, M.O., ed. Atlas of Protein Sequence and Structure, Vol. 5. National Biomedical Research Foundation, Washington DC.

    Douady, C.J., Delsuc, F., Boucher, Y., Doolittle, W.F. and Douzery, E.J.P. (2003) Comparison of Bayesian and maximum likelihood bootstrap measures of phylogenetic reliability. Mol Biol Evol 20: 248–254.

    Eisenberg, D., Schwarz, E., Komaromy, M. and Wall, R. (1984) Analysis of membrane and surface protein sequences with the hydrophobic moment plot. J Mol Biol 179: 125–142.

    Felsenstein, J. (1981) Evolutionary trees from DNA sequences: a maximum likelihood approach. J Mol Evol 17: 368–376.

    Felsenstein, J. (1985) Confidence limits on phylogenies: an approach using the bootstrap. Evolution 39: 783–791.

    Fischer, D., Barret, C., Bryson, K., Elofsson, A., Godzik, A., Jones, D., Karplus, K.J., Kelley, L.A., Maccallum, R.M., Pawłowski, K., Rost, B., Rychlewski, L. and Sternberg, M.J. (1999) CAFASP-1: Critical Assessment of Fully Automated Structure Prediction Methods. Proteins: Structure, Function Genet Supplement 3: 209–217.

    Gatesy, J., Hayashi, C., Motriuk, D., Woods, J. and Lewis, R. (2001) Extreme diversity, conservation, and convergence of spider silk fibroin sequences. Science 291: 2603–2605.

    Gillespie, R.G. (1999) Comparison of rates of speciation in web-building and non-web-building groups within a Hawaiian spider radiation. J Arachnol 27: 79–85.

    Gosline, J.M., Guerette, P.A., Ortlepp, C.S. and Savage, K.N. (1999) The mechanical design of spider silks: from fibroin sequence to mechanical function. J Exp Biol 202: 3295–3303.

    Guerette, P.A., Ginzinger, D.G., Weber, B.H.F. and Gosline, J.M. (1996) Silk properties determined by gland-specific expression of a spider fibroin gene family. Science 272: 112–115.

    Hall, T.A. (1999) BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucleic Acids Symposium Series, 41, 95–98.

    Hartl, F.U. and Hayer-Hartl, M. (2002) Molecular chaperones in the cytosol: from nascent chain to folded protein. Science 295: 1852–1858.

    Hawthorn, A.C. and Opell, B.D. (2002) Evolution of adhesive mechanisms in cribellar spider prey capture thread: evidence for van der Waals and hygroscopic forces. Biol J Linnean Soc 77: 1–8.

    Hayashi, C.Y. and Lewis, R.V. (1998) Evidence from flagelliform silk cDNA for the structural basis of elasticity and modular nature of spider silks. J Mol Biol 275: 773–784.

    Hayashi, C.Y. and Lewis, R.V. (2000) Molecular architecture and evolution of a modular spider silk protein gene. Science 287: 1477–1479.

    Hayashi, C.Y., Shipley, N.H. and Lewis, R.V. (1999) Hypotheses that correlate the sequence, structure, and mechanical properties of spider silk proteins. Int J Biol Macromolecules 24: 271–275.

    Hein, J.J. (1993) A heuristic method to reconstruct the history of sequences subject to recombination. J Mol Evol 20: 402–411.

    Hinman, M.B. and Lewis, R.V. (1992) Isolation of a clone encoding a second dragline silk fibroin. J Biol Chem 267: 19320–19324.

    Huelsenbeck, J.P. and Ronquist, F. (2001) MRBAYES: Bayesian inference of phylogenetic trees. Bioinformatics 17: 754–755.

    Jin, H.-J. and Kaplan, D.L. (2003) Mechanism of silk processing in insects and spiders. Nature 424: 1057–1061.

    Jones, D.T., Taylor, W.R. and Thornton, J.M. (1992) The Rapid Generation of Mutation Data Matrices from Protein Sequences. CABIOS, 8, 275–282.

    Kelley, L.A., MacCallum, R.M. and Sternberg, M.J.E. (1999) Recognition of remote protein homologies using three-dimensional information to generate a position specific scoring matrix in the program 3D-PSSM, pp. 218–225. In: Istrail, S., Pevzner, P. and Waterman, M., eds. RECOMB 99, Proceedings of the Third Annual Conference on Computational Molecular Biology. The Association for Computing Machinery, New York.

    Kelley, L.A., MacCallum, R.M. and Sternberg, M.J.E. (2000) Enhanced Genome Annotation using Structural Profiles in the Program 3D-PSSM. J Mol Biol 299: 499–520.

    Kerkham, K., Viney, C., Kaplan, D. and Lombardi, S. (1991) Liquid crystallinity of natural silk secretions. Nature 349: 596–598.

    Kumar, S., Tamura, K., Jakobsen, I.B. and Nei, M. (2001) MEGA2: Molecular Evolutionary Genetics Analysis software. Arizona State University, Tempe, Arizona, USA.

    Kyte, J. and Doolittle, R.F. (1982) A simple method for displaying the hydrophobic character of a protein. J Mol Biol 157: 105–142.

    Lanave, C., Preparata, G., Saccone, C. and Serio, G. (1984) A new method for calculating evolutionary substitution rates. J Mol Evolution 20: 86–93.

    McDonald, J.H. and Kreitman, M. (1991) Adaptive protein evolution at the Adh locus in Drosophila. Nature 351: 652–654.

    McGuire, G. and Wright, F. (2000) TOPAL 2.0: Improved Detection of Mosaic Sequences within Multiple Alignments. Bioinformatics 16: 130–134.

    Nei, M. (1969) Gene duplication and nucleotide substitution in evolution. Nature 221: 40–42.

    Nei, M., Gu, X. and Sitnikova, T. (1997) Evolution by the birth-and-death process in multigene families of the vertebrate immune system. National Academy of Sciences Colloquium: Genetics and the Origin of Species: From Darwin to Molecular Biology 60 Years After Dobzhansky.

    Nielsen, R. and Yang, Z. (1998) Likelihood models for detecting positively selected amino acid sites and applications to the HIV-1 envelope gene. Genetics 148: 929–936.

    Parker, J.M.R., Guo, D. and Hodges, R.S. (1986) New hydrophilicity scale derived from High-Performance Liquid Chromatography peptide retention data: correlation of predicted surface residues with antigenicity and X-ray-derived accessible sites. Biochemistry 25: 5425–5432.

    Philippe, H. and Laurent, J. (1998) How good are deep phylogenetic trees? Curr Opin Genet Dev 8: 616–623.

    Rozas, J., Sánchez-DelBarrio, J.C., Messeguer, X. and Rozas, R. (2003) DnaSP, DNA polymorphism analyses by the coalescent and other methods. Bioinformatics 19: 2496–2497.

    Selden, P.A. (1990) Lower Cretaceous spiders from the Sierra-de-Montsech, North-east Spain. Palaeontology 33: 257–285.

    Selden, P.A. and Gall, J.C. (1992) A Triassic Mygalomorph spider from the Northern Vosges, France. Palaeontology 35: 211–235.

    Shao, Z. and Vollrath, F. (1999) The effect of solvents on the contraction and mechanical properties of silks. Polymer 40: 1799–1806.

    Sponner, A., Unger, E., Grosse, F. and Weisshart, A. (2004) Conserved C-termini of spidroins are secreted by the major ampullate glands and retained in the silk thread. Biomacromolecules 5: 840–845.

    Swofford, D.L. (1999) PAUP*. Phylogenetic Analysis Using Parsimony (*and Other Methods), Version 4. Sinauer Associates, Sunderland, Massachusetts.

    Tajima, F. and Nei, M. (1984) Estimation of evolutionary distance between nucleotide sequences. Mol Biol Evol 1: 269–285.

    Thompson, J.D., Higgins, D.G. and Gibson, T.J. (1994) CLUSTALW: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position specific gap penalties and weight matrix choice. Nucl Acids Res 22: 4673–4680.

    Vollrath, F. (1999) Biology of spider silk. Int J Biol Macromolecules 24: 81–88.

    Vollrath, F. and Edmonds, D. (1989) Modulation of the mechanical properties of spider silk by coating with water. Nature 340: 305–307.

    Wiuf, C., Christensen, T. and Hein, J. (2001) A simulation study of the reliability of recombination detection methods. Molecular Biology and Evolution 18: 1929–1939.

    Xu, M. and Lewis, R.V. (1990) Structure of a protein superfiber: spider dragline silk. Proc Natl Acad Sci United States America 87: 7120–7124.

    Yang, Z. (1997) PAML: a Program Package for Phylogenetic Analysis by Maximum Likelihood. CABIOS, 13, 555–556.

    Yang, Z. and Bielawski, J.P. (2000) Statistical methods for detecting molecular adaptation. Trends Ecol Evol 15: 496–503.

    Yang, Z. and Nielsen, R. (2002) Codon-substitution models for detecting molecular adaptation at individual sites along specific lineages. Mol Biol Evol 19: 908–917.