100,000 protein structures for the biologist

4
meeting review nature structural biology • volume 5 number 12 • december 1998 1029 Recently, some 200 structural biologists from both academia and industry gath- ered in Avalon, New Jersey to explore the science and organization of the new field of structural genomics 1 . The aim of the structural genomics project is to deliver structural information about most pro- teins. While it is not feasible to determine the structure of every protein by experi- ment, useful models can be obtained by fold assignment and comparative model- ing for those protein sequences that are related to at least one known protein structure. Structure determination of 10,000 properly chosen proteins should result in useful three-dimensional models for hundreds of thousands of other pro- tein sequences. In other words, structural genomics will put each protein within a comparative modeling distance of a known protein structure (Eaton Lattman, Johns Hopkins University). Knowing the structures of biological macromolecules is almost always useful for predicting, interpreting, modifying and designing their functions. The high- throughput, coordinated structure deter- mination efforts should be more efficient for obtaining 10,000 structures than the current distributed and largely uncoordi- nated efforts in individual structural biol- ogy laboratories. Structural genomics will build on the human genome project and will be integrated with the functional genomics project. Here, I review the recent meeting in Avalon. Due to space restrictions, the review omits many of the presentations. However, a complete list of speakers may be found on the meeting web site 1 . The structural genomics project will deliver improved methods and a process for high-throughput protein structure determination. The process involves: (i) selection of the target proteins or domains, (ii) cloning, expression, and purification of the targets, (iii) crystalliza- tion and structure determination by X-ray crystallography or by nuclear magnetic res- onance (NMR) spectroscopy, and (iv) archiving and annotation of the new struc- tures. Several pilot projects in structural genomics that implement all of these steps are already underway (Table 1). It is feasible to determine at least 10,000 protein struc- tures within the next five years. No new dis- coveries or methods are needed for this task. The average cost per structure is expected to decrease significantly from ~$200,000 per structure to perhaps less than $20,000 per structure. These savings will be achieved by the economies of scale, new or improved methods, and their paral- lelization in most of the steps in the process. Target selection The targets for structure determination generally will be individual domains rather than multi-domain proteins (Chris Sander, Millenium Information). Such targets will be easier to determine by X-ray crystallog- raphy or NMR spectroscopy than the more flexible multi-domain proteins, although it would be beneficial to determine whole proteins whenever possible. Target selection will begin by pairwise comparison of all protein sequences, followed by clustering into groups of domains. The targets for the structural genomics project will be the rep- resentatives of the groups without struc- turally defined members. The groups of domains should contain members that share at least ~30% sequence identity (30- seq families), rather than those that corre- spond to superfamiles or fold families. This relatively high granularity is needed because of the limitations in the current methods for comparative modeling and for inferring function from structure. Errors in the comparative models become relatively small as the sequence identity increases above 30% (Andrej ˘ Sali, Rockefeller University) and function is generally con- served within families (John Moult, Center for Advanced Research in Biotechnology, Rockville, Maryland); Christine Orengo, University College, London). Given the clustering into the 30-seq families, the total number of targets for experimental struc- ture determination has been estimated to be on the order of 10,000. As an alternative, it has also been suggested that 100,000 protein structures need to be determined by experi- ment (David Eisenberg, UCLA); this would allow calculation of models with higher accuracy than is possible with only 10,000 known structures. After a list of the 30-seq families is obtained, the representatives will be picked and prioritized. A variety of dif- ferent criteria likely will be used for this task; for example, size of the family, biologi- cal knowledge about the family, distribu- tion of the members among various organisms, the pharmaceutical relevance, the likelihood of a successful structure determination, and so forth . A preliminary analysis suggests that among recent protein structures deter- mined in the hope of discovering a new fold, more than two-thirds actually have exhibited an already known fold (Steven Brenner, Stanford). While this unveils weaknesses in the fold assignment meth- ods, it is not a waste of resources. In prac- tical terms, the structure of a protein is worth determining if we do not know it, for whatever reason. In fact, it may even be an advantage when the newly determined protein is related to other known struc- tures because such structure is likely to be more informative about its function than in the case of structural ‘orphans’. Given a fixed number of experimentally determined protein structures, the quality and the number of models for the rest of the proteins will be influenced strongly by the methods for fold assignment and comparative protein structure modeling. The impact of expected improvements in these methods may be equivalent to sever- al years of the experimental protein struc- ture output (Andrej ˘ Sali). A number of different approaches to fold assignment have been described, applied, and evaluat- ed (Ron Levy, Rutgers University; George Rose, John Hopkins University; Barry Honig, Columbia University; Manfred Sippl, CAME, Salzburg; David Eisenberg; Eugene Koonin, NCBI; Steven Brenner). The methods that rely on multiple 100,000 protein structures for the biologist Andrej ˘ Sali Structural genomics promises to deliver experimentally determined three-dimensional structures for many thousands of protein domains. These domains will be carefully selected, so that the methods of fold assignment and comparative protein structure modeling will result in useful models for most other protein sequences. The impact on biology will be dramatic. © 1998 Nature America Inc. • http://structbio.nature.com © 1998 Nature America Inc. • http://structbio.nature.com

Transcript of 100,000 protein structures for the biologist

Page 1: 100,000 protein structures for the biologist

meeting review

nature structural biology • volume 5 number 12 • december 1998 1029

Recently, some 200 structural biologistsfrom both academia and industry gath-ered in Avalon, New Jersey to explore thescience and organization of the new fieldof structural genomics1. The aim of thestructural genomics project is to deliverstructural information about most pro-teins. While it is not feasible to determinethe structure of every protein by experi-ment, useful models can be obtained byfold assignment and comparative model-ing for those protein sequences that arerelated to at least one known proteinstructure. Structure determination of10,000 properly chosen proteins shouldresult in useful three-dimensional modelsfor hundreds of thousands of other pro-tein sequences. In other words, structuralgenomics will put each protein within acomparative modeling distance of aknown protein structure (Eaton Lattman,Johns Hopkins University).

Knowing the structures of biologicalmacromolecules is almost always usefulfor predicting, interpreting, modifyingand designing their functions. The high-throughput, coordinated structure deter-mination efforts should be more efficientfor obtaining 10,000 structures than thecurrent distributed and largely uncoordi-nated efforts in individual structural biol-ogy laboratories. Structural genomicswill build on the human genome projectand will be integrated with the functionalgenomics project. Here, I review therecent meeting in Avalon. Due to spacerestrictions, the review omits many of thepresentations. However, a complete list ofspeakers may be found on the meetingweb site1.

The structural genomics project willdeliver improved methods and a processfor high-throughput protein structuredetermination. The process involves:(i) selection of the target proteins ordomains, (ii) cloning, expression, andpurification of the targets, (iii) crystalliza-tion and structure determination by X-raycrystallography or by nuclear magnetic res-onance (NMR) spectroscopy, and (iv)

archiving and annotation of the new struc-tures. Several pilot projects in structuralgenomics that implement all of these stepsare already underway (Table 1). It is feasibleto determine at least 10,000 protein struc-tures within the next five years. No new dis-coveries or methods are needed for thistask. The average cost per structure isexpected to decrease significantly from~$200,000 per structure to perhaps lessthan $20,000 per structure. These savingswill be achieved by the economies of scale,new or improved methods, and their paral-lelization in most of the steps in the process.

Target selectionThe targets for structure determinationgenerally will be individual domains ratherthan multi-domain proteins (Chris Sander,Millenium Information). Such targets willbe easier to determine by X-ray crystallog-raphy or NMR spectroscopy than the moreflexible multi-domain proteins, although itwould be beneficial to determine wholeproteins whenever possible. Target selectionwill begin by pairwise comparison of allprotein sequences, followed by clusteringinto groups of domains. The targets for thestructural genomics project will be the rep-resentatives of the groups without struc-turally defined members. The groups ofdomains should contain members thatshare at least ~30% sequence identity (30-seq families), rather than those that corre-spond to superfamiles or fold families.

This relatively high granularity is neededbecause of the limitations in the currentmethods for comparative modeling and forinferring function from structure. Errors inthe comparative models become relativelysmall as the sequence identity increasesabove 30% (Andrej S̆ali, RockefellerUniversity) and function is generally con-served within families (John Moult, Centerfor Advanced Research in Biotechnology,Rockville, Maryland); Christine Orengo,University College, London). Given theclustering into the 30-seq families, the totalnumber of targets for experimental struc-ture determination has been estimated to be

on the order of 10,000. As an alternative, ithas also been suggested that 100,000 proteinstructures need to be determined by experi-ment (David Eisenberg, UCLA); this wouldallow calculation of models with higheraccuracy than is possible with only 10,000known structures. After a list of the 30-seqfamilies is obtained, the representatives willbe picked and prioritized. A variety of dif-ferent criteria likely will be used for thistask; for example, size of the family, biologi-cal knowledge about the family, distribu-tion of the members among variousorganisms, the pharmaceutical relevance,the likelihood of a successful structuredetermination, and so forth .

A preliminary analysis suggests thatamong recent protein structures deter-mined in the hope of discovering a newfold, more than two-thirds actually haveexhibited an already known fold (StevenBrenner, Stanford). While this unveilsweaknesses in the fold assignment meth-ods, it is not a waste of resources. In prac-tical terms, the structure of a protein isworth determining if we do not know it,for whatever reason. In fact, it may even bean advantage when the newly determinedprotein is related to other known struc-tures because such structure is likely to bemore informative about its function thanin the case of structural ‘orphans’.

Given a fixed number of experimentallydetermined protein structures, the qualityand the number of models for the rest ofthe proteins will be influenced strongly bythe methods for fold assignment andcomparative protein structure modeling.The impact of expected improvements inthese methods may be equivalent to sever-al years of the experimental protein struc-ture output (Andrej S̆ali). A number ofdifferent approaches to fold assignmenthave been described, applied, and evaluat-ed (Ron Levy, Rutgers University; GeorgeRose, John Hopkins University; BarryHonig, Columbia University; ManfredSippl, CAME, Salzburg; David Eisenberg;Eugene Koonin, NCBI; Steven Brenner).The methods that rely on multiple

100,000 protein structures for the biologistAndrej S̆ali

Structural genomics promises to deliver experimentally determined three-dimensional structures for manythousands of protein domains. These domains will be carefully selected, so that the methods of fold assignmentand comparative protein structure modeling will result in useful models for most other protein sequences. Theimpact on biology will be dramatic.

© 1998 Nature America Inc. • http://structbio.nature.com©

199

8 N

atu

re A

mer

ica

Inc.

• h

ttp

://s

tru

ctb

io.n

atu

re.c

om

Page 2: 100,000 protein structures for the biologist

meeting review

sequence information, such as PSI-BLAST, appear surprisingly sensitive indetecting remote relationships (LiisaHolm, EMBL-EBI, Cambridge; EugeneKoonin; Steven Brenner).

Cloning, expression and purificationObtaining protein samples for crystal-lization trials and NMR measurementslikely will be the bottleneck in the struc-tural genomics process. Although thecurrent methods are sufficiently power-ful to begin the project, advances areexpected in parallelization and automa-tion of the existing methods for cloning,expression, and purification of proteins.In addition, entirely new technologiesmay arise, such as an in vitro system thatalready has allowed researchers to pro-duce proteins up to concentrations of6 mg ml–1 (Shigeyuki Yokoyama, TokyoUniversity).

Crystallization and X-raycrystallographyStephen Burley (Rockefeller University)described how to test protein samples fortheir propensity to form useful crystals.Crystals have a higher chance of beingformed when the protein species in solu-tion is conformationally homogeneous.This can be evaluated by dynamic lightscattering, as well as by limited proteoly-sis combined with mass spectroscopy.For example, candidates that aremonodisperse under standard aqueousconditions have a probability of at least75% to yield crystals suitable for X-raystudies, in comparison to 10% for poly-disperse samples.

Wayne Hendrickson (ColumbiaUniversity) described several pivotaladvances that have matured only recentlyand form the basis of crystallographicanalysis at a markedly accelerated level.These include: undulator sources at syn-chrotrons, CCD detectors, cryo-crystallog-raphy, MAD phasing analysis, and

selenomethionyl proteins2. It is now possi-ble to sustain a data collection throughputof four structures per day on a single undu-lator beam line. Assuming 180 operationaldays per year and that only one out of threedata sets results in a useful final structure,this gives a conservative estimate ofapproximately 360 structures per year persingle undulator beam line. This meansthat only several dedicated undulated beamlines are needed over the course of fiveyears to achieve the goal of determining10,000 new protein structures.

Tom Terwilliger (LANL) has developedthe SOLVE computer program forobtaining electron density maps of pro-teins directly from multiwavelength ormultiple isomorphous replacement X-ray diffraction data. An optimization of ascoring function replaces a complex rule-based process followed manually by crys-tallographers. This system has alreadyallowed crystallographers to obtainimages of proteins within hours of theend of X-ray data collection. The proce-dure has been successfully applied toMAD data with up to 26 selenium sitesper derivative.

Sung-Hou Kim (University ofCalifornia, Berkeley) described resultsfrom the first structural genomics pilotproject, focusing on the Methanococcusjanaschii genome (Table 1). He presentedthe structures of two proteins, an ATPdependent molecular switch with a newATP binding motif, and a nucleotidepyrophosphatase, with a new fold. Foreach protein, the type of a ligand boundto it, and thus its biochemical function,were predicted from the protein structurealone. The predictions were subsequentlyconfirmed by direct experiment. Theseresults support the idea that proteinfunction can be anticipated from struc-tural information alone and thus thathigh-throughput structure determina-tion will make a valuable contribution tobiology.

Nuclear magneticresonance spectroscopyKurt Wüthrich (ETH,Zürich), Gaetano Monte-lione and Gerhardt Wagner(Harvard Medical School)described recent andprospective advances inNMR spectroscopy that willallow structure determina-tion of larger proteins aswell as increase the accura-cy and speed of the analysis.These advances include

1030 nature structural biology • volume 5 number 12 • december 1998

partial deuteration of protein samples,new magnets (up to 1 GHz), supercon-ducting probes, the TROSY detectionmethod, triple resonance methods for res-onance assignment, spatial informationfrom residual dipolar coupling, and soft-ware for resonance assignments and struc-ture determination. For example,Montelione and colleagues have devel-oped the program AUTOASSIGN to per-form automated backbone assignments ina few minutes of CPU time for proteinssmaller than 200 residues. This tool, inte-grated with the program AUTO_STRUC-TURE for automatic analysis of NOESYdata and structure generation, decreasedthe time needed for structure refinementof a Z-domain from several months toseveral hours. If the NMR data collectionfor a 120 residue protein takes 4–5 weekson a 600 MHz spectrometer and if auto-mated data processing takes 1 day, 120structures can be determined in 1 year by12 machines (Montelione), approximatelyat a cost of using one synchrotron beamline. By using higher field 800 MHz to 1GHz magnets and other sensitivity-enhancement methods, this throughputcould be significantly increased.

Anticipating the usefulness of NMR forstructural genomics, Shigeyuki Yokoyamaof Tokyo University announced theJapanese Genomic Sciences Center thatwill include a large NMR facility. The cen-ter will also support computational analy-sis of the human genome, sequencing andmapping of the mouse genome, and col-laborations with high-throughput proteincrystallography efforts elsewhere in Japan.The center will have 20 NMR instrumentsby 2001 and will be open to internationalcollaboration. A 1 GHz magnet is beingconstructed in collaboration with theTsukuba laboratory, the engineers of themagnets used for the bullet train.

NMR spectroscopy has become increas-ingly more useful and efficient. However,given the synchrotrons, X-ray crystallog-raphy is likely to retain an edge inthroughput, accuracy, maximal proteinsize, and possibly cost per protein struc-ture determination. Nevertheless, NMRspectroscopy will be indispensable formaximizing the benefits of structuralgenomics because of its unique ability toobtain information about the internaldynamics of a protein on multiple timescales as well as about its ligand bindingproperties. For example, Kurt Wüthrichdescribed the importance of flexible tailsfor the function of several proteins.Another example is the routine use ofchemical shift perturbation to enumerate

Table 1 List of pilot projects in structural genomics1

Pilot project organizers OrganismS. Burley, A. S̆ali, J. Sussman S. cerevisiaeA. Edwards M. thermoautotrophicumD. Eisenberg, T. Terwilliger P. aerophilumS.-H. Kim M. janaschiiG. Montelione, S. Anderson MetazoaJ. Moult H. influenzaeS. Yokoyama T. Thermophilus, HB8

1These projects were described at the meeting. Each project gen-erally involves several independent laboratories. Only speakers atthe meeting are listed in the first column. Their addresses can befound at http://lion.cabm.rutgers.edu/bioinformatics_meeting/.

© 1998 Nature America Inc. • http://structbio.nature.com©

199

8 N

atu

re A

mer

ica

Inc.

• h

ttp

://s

tru

ctb

io.n

atu

re.c

om

Page 3: 100,000 protein structures for the biologist

meeting review

binding sites on a protein for a thousandligands per day from a library of 15,000compounds, with up to 1 mM sensitivityin the dissociation constant (Steven Fesik,Abbott Laboratories). Also potentially fea-sible is NMR structure determination ofmembrane protein structures, which areexceptionally difficult to crystallize.

DatabasesSeveral different databases will be usefulin facilitating the structural genomics pro-ject (Table 2). These databases will be usedfor target selection, coordination of theexperimental efforts, archiving of the newstructures, dissemination of the results,and for performing new research with theold and new data. Helen Berman ofRutgers University described a newmacromolecular structure database thatwill replace the Brookhaven Protein DataBank3. Liisa Holm, Christine Orengo, andSteven Brenner described CATH, FSSP,and SCOP databases respectively. Thesedatabases classify protein structures in ahierarchical manner, generally consistingof families (proteins that share at least~30% sequence identity), superfamilies(proteins that are probably related bydivergent evolution but can have very lit-tle sequence similarity), and fold families(proteins that share similar folds).Orengo pointed out a great need for anexhaustive, systematic and computerfriendly database of protein functionalannotations. Such a database is necessaryfor the development and use of the meth-ods that correlate structure to function.Andrej S̆ali described ModBase, a largedatabase of comparative protein structuremodels. This database is calculated withprogram Modeller and currently containsprotein models for five genomes. It willcontinue to grow with the availability ofnew protein structures and sequences, aswell as the improvements in the modelingsoftware. Steven Brenner introducedPRESAGE, a database that stands tobecome the central nervous system forcoordinating the efforts in structuralgenomics. PRESAGE includes entries forproteins; each entry contains informationabout the protein’s homologs, current sta-tus of structure determination, links tothree-dimensional models, and so forth.The entries in PRESAGE will be con-tributed by the community at large.Biologists will use it as a portal for access-ing the most up-to-date informationabout proteins of interest. For example,PRESAGE would have eliminated theduplicated structure determination ofelongation factor 5A by two structural

nature structural biology • volume 5 number 12 • december 1998 1031

genomics pilot projects (Sung-Hou Kim;Tom Terwilliger).

Using the structuresDevelopment of the databases and soft-ware will be necessary to integrate rawstructural information and structure-derived results with other information,such as genetic and biochemical analysis,gene expression patterns, mutationalanalysis, binding data from protein chips,interaction data from two-hybrid systems,metabolic pathway information, and soforth (Chris Sander). These tools will thenallow researchers to address efficiently ahost of new questions. Some such studies,illustrating but the tip of the iceberg, havealready been performed and weredescribed at the meeting. For example,questions about the distribution of differ-ent structures among the genomes, andthe implications for models of evolution,have been asked by Mark Gerstein (YaleUniversity) and Eugene Koonin. In addi-tion, David Eisenberg focused on under-standing the extreme thermostability of

proteins in thermophilic organisms interms of the amino acid composition andstructural features of the proteins. Thestudy was based on a comparison of pro-teins that have homologs in both ther-mophilic and mesophilic organisms andwhich have homologs of known structure.It appears that thermostability of proteinsoriginates in part from salt bridges andfrom deletions of loop regions.

Jeff Skolnick (Scripps ResearchInstitute) described a method for predict-ing protein function based on structureprediction and three-dimensional motifidentification. This method could pushthe detection of protein function muchfurther into the twilight zone of sequenceidentity. First, the tertiary structures of thesequences of interest are predicted eitherab initio or by threading. Then, the result-ing models are screened using a library ofthree-dimensional descriptors of proteinactive sites, termed ‘fuzzy functionalforms’. If the geometry and residue typesin the predicted structures match a three-dimensional motif, then the protein is

Table 2 Structural genomics web sites

Pilot ProjectsDOE http://proi3.lanl.gov/structural_genomics/New York http://genome5.bio.bnl.gov/Proteome/CARB/TIGR http://s2f.carb.nist.govDatabasesNCBI http://www.ncbi.nlm.nih.gov/PDB http://www.pdb.bnl.gov/MSD http://www.rcsb.org/DALI http://www2.ebi.ac.uk/dali/CATH http://www.biochem.ucl.ac.uk/bsm/cath/SCOP http://scop.mrc-lmb.cam.ac.uk/scop/PRESAGE http://presage.stanford.eduModBase http://guitar.rockefeller.edu/modbase/GeneCensus http://bioinfo.mbb.yale.edu/genomeFold assignmentPhD http://www.embl-heidelberg.de/predictprotein/predictprotein.htmlTHREADER http://globin.bio.warwick.ac.uk/”jones/threader.html123D http://www-lmmb.ncifcrf.gov/”nicka/123D.htmlUCLA-DOE http://www.doe-mbi.ucla.edu/people/frsvr/frsvr.htmlPROFIT http://lore.came.sbg.ac.at/Comparative modelingCOMPOSER http://www-cryst.bioc.cam.ac.uk/CONGEN http://www.cabm.rutgers.edu/”brucDRAGON http://www.nimr.mrc.ac.uk/”mathbio/a-aszodi/dragon.htmlModeller http://guitar.rockefeller.edu/modeller/modeller.htmlPrISM http://honiglab.cpmc.columbia.edu/SWISS-MODEL http://www.expasy.ch/swissmod/SWISS-MODEL.htmlWHAT IF http://www.sander.embl-heidelberg.de/vriend/MiscellaneousTarget selection http://structuralgenomics.org/Funding NIH http://www.nih.gov/nigms/Funding NSF http://www.nsf.gov/home/grants.htmPedro’s list http://www.public.iastate.edu/”pedro/research_tools.htmlRoberto’s list http://guitar.rockefeller.edu/”roberto/tools/tools.html

© 1998 Nature America Inc. • http://structbio.nature.com©

199

8 N

atu

re A

mer

ica

Inc.

• h

ttp

://s

tru

ctb

io.n

atu

re.c

om

Page 4: 100,000 protein structures for the biologist

Every year, we are reminded of the powerof the influenza virus, especially whensevere and deadly outbreaks occur — suchas last season’s Hong Kong ‘bird flu’ inwhich 6 of the 18 confirmed cases inhumans were fatal. What makes oneinfluenza virus more virulent than anoth-er? In the case of the Hong Kong virus, onefactor was probably a variation in itshemagglutinin (HA) precursor protein,which had a five residue insertion.Hemagglutinin, a glycoprotein that is heldin the viral membrane by a C-terminaltransmembrane sequence, is required forviral membrane fusion during infection.Two events must occur for hemagglutininto become active: (i) the precursor protein,HA0, must be cleaved to create two disul-fide linked subunits HA1 and HA2 and (ii) cleaved HA must undergo a low pH-induced conformational change in endo-somes. The new N-terminus of HA2 is thefusion peptide that becomes embedded inthe target membrane, leading to infection.The pH-induced conformational changeprojects the fusion peptide to the end of along coiled coil, placing it close to the HAC-terminal transmembrane domain andthus bringing the viral and the target

membranes into close proximity. The five-residue insertion in the Hong Kong HA0hemagglutinin variant occurred at theHA0 cleavage site. Why would such amutant lead to higher virulence?Researchers have found that HA0 variantswith insertions in this region are more sus-ceptible to proteases, leading to a greaterproportion of cleaved HA molecules andhence a higher infectivity.

Hemagglutinin is a logical drug targetbecause of its key role in the success of aninfection. Researchers could aim to devel-op drugs that would block the cleavagestep or prevent the conformationalchange. Useful information for suchendeavors is provided by a recent structureof the HA0 precursor (Chen, J. et al. Cell95, 409-417; 1998), which shows the con-formation of the intact cleavage site. A

comparison with the previously deter-mined cleaved HA structure suggests thatproteolysis may be playing a direct role insetting the low pH trigger for the confor-mational change — in addition to playingmore obvious roles in exposing the fusionpeptide and giving conformational flexi-bility to the protein. In the HA0 precursor,the cleavage site (left image, between thedots) is exposed on a surface loop. Afterproteolysis, the new N-terminus of HA2fills a hole in the protein (arrow), buryingseveral ionizable Asp and His residues thatwere previously exposed in the cavity(right image). In theory, this could set asensitive trigger: protonation of the Hisresidue at low pH could severely destabi-lize the structure and lead to the dramaticconformational change that is necessaryfor infection. TS

meeting review

predicted to have the specified molecularfunction. The idea and its power relativeto local sequence pattern searches wereillustrated by searching for proteins witha disulfide oxidoreductase active site inthree bacterial genomes.

Impact of structural genomicsSuggestions about how best to proceedwith the structural genomics project werediscussed. Ideas ranging from continuing‘research as usual’ to establishing largecenters for structural genomics wereexplored. The pilot projects will helpshape the full scale effort. It is likely thatrelatively large centers will provide theinfrastructure for X-ray crystallographyand NMR spectroscopy, allow the econo-my of scale, foster collaboration, and alsoensure that the technical and repetitivejobs are not performed by students andpostdocs in the academic laboratories.Centers will make it possible for academic

and pharmaceutical researchers to obtainprotein samples and research informa-tion, such as purification protocols andcrystallization conditions. Such an infra-structure will enable the individual labo-ratories to focus on more challengingresearch problems, including functionalcharacterization of their favorite proteins,structure determination of disease targets,large proteins and protein complexes,study of post-translational modifications,and systemic questions involving net-works of proteins. Structural biologists inthe pharmaceutical industry will increasetheir impact on the drug discovery processbecause they will be in a position to deter-mine the structures of many more pro-tein–ligand complexes, at a lower cost andfaster, using the structures, samples andprotocols from the high-throughputstructure determination. The impact ofstructural genomics on the structural andfunctional characterization of proteins, as

1032 nature structural biology • volume 5 number 12 • december 1998

well as on our understanding of themachinery of life and its evolution, willsurely result in profusion of new drug tar-gets and better drugs. Structural genomicswill refocus biology, just as the advent ofof high throughput DNA sequencingmachines freed researchers to focus onmore elaborate experiments rather thanon accumulating sequence data.

Andrej S̆ali is with the Laboratories ofMolecular Biophysics, The RockefellerUniversity, New York, New York 10021,USA. email: [email protected]

1. Structure-based functional genomics, October 4–7,Avalon, New Jersey (1998). http://lion.cabm.rutgers.edu/bioinformatics_meeting/. The meetingwas organized by Gaetano Montelione, StephenAnderson, Edward Arnold, and Ann Stock of theCenter for Advanced Biotechnology and Medicine atRutgers University (CABM), with the backing of theNew Jersey Commission on Science and Technology,CABM, the Merck Genome Research Institute, andthe Burroughs Wellcome Fund.

2. Synchrotron supplement. Nature Struct. Biol. 5,614–656 (1998).

3. Editorial, Nature Struct. Biol. 5, 925–926 (1998).

Fight the flu

picture story

Adapted from Chen et al.

© 1998 Nature America Inc. • http://structbio.nature.com©

199

8 N

atu

re A

mer

ica

Inc.

• h

ttp

://s

tru

ctb

io.n

atu

re.c

om