Integrating Diverse Data for Structure Determination of...

ANRV345-BI77-19 ARI 28 April 2008 15:26

Integrating Diverse Data forStructure Determination ofMacromolecular AssembliesFrank Alber,1,3 Friedrich Forster,1

Dmitry Korkin,1,4 Maya Topf,2 and Andrej Sali11Departments of Biopharmaceutical Sciences and Pharmaceutical Chemistry,and California Institute for Quantitative Biosciences, University of Californiaat San Francisco, California 94158-2330; email: [email protected]; [email protected] of Crystallography, Birkbeck College, University of London, London WC1E7HX, United Kingdom; email: [email protected] addresses: Molecular and Computational Biology, Departmentof Biological Sciences, University of Southern California, Los Angeles,California 90089-2910; email: [email protected] Institute and Department of Computer Science, University of Missouriat Columbia, Missouri 65211; email: [email protected]

Annu. Rev. Biochem. 2008. 77:443–77

First published online as a Review in Advance onMarch 4, 2008

The Annual Review of Biochemistry is online atbiochem.annualreviews.org

This article’s doi:10.1146/annurev.biochem.77.060407.135530

Copyright c© 2008 by Annual Reviews.All rights reserved

0066-4154/08/0707-0443$20.00

Key Words

architecture, assembly, complex, configuration, hybrid methods,restraints

AbstractTo understand the cell, we need to determine the macromolecularassembly structures, which may consist of tens to hundreds of com-ponents. First, we review the varied experimental data that char-acterize the assemblies at several levels of resolution. We then de-scribe computational methods for generating the structures usingthese data. To maximize completeness, resolution, accuracy, preci-sion, and efficiency of the structure determination, a computationalapproach is required that uses spatial information from a varietyof experimental methods. We propose such an approach, definedby its three main components: a hierarchical representation of theassembly, a scoring function consisting of spatial restraints derivedfrom experimental data, and an optimization method that generatesstructures consistent with the data. This approach is illustrated bydetermining the configuration of the 456 proteins in the nuclearpore complex (NPC) from baker’s yeast. With these tools, we arepoised to integrate structural information gathered at multiple lev-els of the biological hierarchy—from atoms to cells—into a commonframework.

443

Click here for quick links to Annual Reviews content online, including:

• Other articles in this volume• Top cited articles• Top downloaded articles• Our comprehensive search

FurtherANNUALREVIEWS

Ann

u. R

ev. B

ioch

em. 2

008.

77:4

43-4

77. D

ownl

oade

d fr

om a

rjou

rnal

s.an

nual

revi

ews.

org

by U

nive

rsity

of

Que

ensl

and

on 0

3/23

/09.

For

per

sona

l use

onl

y.

ANRV345-BI77-19 ARI 28 April 2008 15:26

NPC: nuclear porecomplex

Configuration:component positionsand the presence ofinteractions

Resolution: for adensity map,resolution is theminimum distancebetween two pointsat which they can bedistinguished

Contents

INTRODUCTION. . . . . . . . . . . . . . . . . 444Assemblies as Functional Modules

of the Cell . . . . . . . . . . . . . . . . . . . . 444SOURCES OF SPATIAL

INFORMATION . . . . . . . . . . . . . . . . 446X-ray Crystallography

and NMR Spectroscopy . . . . . . . 447Electron Microscopy . . . . . . . . . . . . . 448Small-Angle X-Ray Scattering . . . . 449Proteomics Methods

and Mass Spectrometry . . . . . . . . 449Labeling Techniques. . . . . . . . . . . . . . 450Biochemical and Biophysical

Methods . . . . . . . . . . . . . . . . . . . . . . 451COMPUTATIONAL METHODS

FOR ASSEMBLY STRUCTUREDETERMINATION . . . . . . . . . . . . 451Template-Based Modeling . . . . . . . . 451Protein-Protein Docking . . . . . . . . . 453Comparative Patch Analysis . . . . . . . 454Structure Characterization

from Density Maps . . . . . . . . . . . . 455Structure Characterization from

Small-Angle X-ray Scattering . . 459COMPREHENSIVE DATA

INTEGRATION BYSATISFACTION OFSPATIAL RESTRAINTS . . . . . . . . 460Theory and Method . . . . . . . . . . . . . . 460Structural Characterization of

the Nuclear Pore Complex . . . . . 466CONCLUSIONS. . . . . . . . . . . . . . . . . . . 470

INTRODUCTION

Assemblies as Functional Modulesof the Cell

Macromolecular assemblies consist of non-covalently interacting macromolecular com-ponents, such as proteins and nucleic acids.They vary widely in size and play crucial rolesin most cellular processes (1). Many assem-blies are composed of tens and even hun-dreds of individual components. For exam-

ple, the nuclear pore complex (NPC) of ∼456proteins regulates macromolecular transportacross the nuclear envelope (NE); the ri-bosome consists of ∼80 proteins and ∼15RNA molecules and is responsible for proteinbiosynthesis.

Need for assembly structures. A compre-hensive characterization of the structures anddynamics of biological assemblies is essen-tial for a mechanistic understanding of thecell (2–5). Even a coarse characterization ofthe configuration of macromolecular com-ponents in a complex (Figure 1) helps toelucidate the principles that underlie cellu-lar processes, in addition to providing a nec-essary starting point for a higher-resolutiondescription.

Scope. Complete lists of the macromolecu-lar components of biological systems are be-coming available (6). However, the identifica-tion of complexes between these componentsis a nontrivial task. This difficulty arises partlyfrom the multitude of component types andthe varying life spans of the complexes (7).The most comprehensive information aboutbinary protein interactions is available for theSaccharomyces cerevisiae proteome, consistingof ∼6200 proteins. This data has been gener-ated by methods such as the yeast two-hybridsystem (8, 9) and affinity purifications coupledwith mass spectrometry (10–12). The lowerbound on binary protein interactions in yeasthas been estimated to be ∼30,000 (7), corre-sponding to the average of ∼9 protein part-ners per protein, although not necessarily allat the same time. The number of higher-ordercomplexes in yeast is estimated to be ∼800 onthe basis of affinity purification experiments(10–13). The human proteome may have anorder of magnitude more complexes than theyeast cell, and the number of different com-plexes across all relevant genomes may be sev-eral times larger still. Therefore, there maybe thousands of biologically relevant macro-molecular complexes in a few hundred keycellular processes whose stable structures and

444 Alber et al.

Ann

u. R

ev. B

ioch

em. 2

008.

77:4

43-4

77. D

ownl

oade

d fr

om a

rjou

rnal

s.an

nual

revi

ews.

org

by U

nive

rsity

of

Que

ensl

and

on 0

3/23

/09.

For

per

sona

l use

onl

y.

ANRV345-BI77-19 ARI 28 April 2008 15:26

Co

mp

os

itio

nS

toic

hio

me

try

Inte

rac

tio

nty

pe

sC

om

po

nen

t in

tera

cti

on

sC

om

po

ne

nt

co

nfi

gu

rati

on

Mo

lecu

lar

arc

hit

ec

ture

Ps

eu

do

-a

tom

ics

tru

ctu

re

Ato

mic

str

uc

ture

Dy

na

mic

pro

ce

ss

es

Sp

ati

al

& t

em

po

ral

info

rma

tio

n

NM

R s

pect

rosc

opy

FR

ET

spe

ctro

scop

yC

ryo-

ET

Sp

ati

al

info

rma

tio

n

Com

para

tive

mod

elin

gA

b in

itio

pred

ictio

nC

ompa

rativ

e pa

tch

anal

ysis

Com

puta

tiona

l doc

king

Ele

ctro

n sp

in r

eson

ance

Bio

info

rmat

ics

Cry

o-E

MC

ryo-

ET

SA

XS

H/D

exc

hang

eF

ootp

rintin

g

X-r

ay c

ryst

allo

grap

hyN

MR

spe

ctro

scop

y

No

n-s

pa

tia

l in

form

ati

on

Ove

rlay

assa

ysA

ffini

ty p

urifi

catio

nY

east

two-

hybr

idG

enet

ic in

tera

ctio

nsB

ioin

form

atic

sQ

uant

itativ

e im

mun

oblo

tting

PC

AS

urfa

ce p

lasm

on r

eson

ance

Cal

orim

etry

Mas

s sp

ectr

omet

ry

Imm

uno-

EM

FR

ET

spe

ctro

scop

yE

lect

ron

mic

rosc

opy

Sym

met

ry in

form

atio

nM

ass

spec

trom

etry

Co

mp

os

itio

n

Co

py

nu

mb

ers

Inte

rac

tio

np

art

ne

rsIn

tera

cti

on

ins

tan

ce

sP

os

itio

ns

an

d t

ime

Ato

mic

p

osit

ion

sR

es

idu

ep

osit

ion

sC

om

po

ne

nt

po

sit

ion

an

do

rien

tati

on

Co

mp

on

en

t p

osit

ion

Co

mp

on

en

t ty

pe

sC

om

po

nen

t in

sta

nces

Figu

re1

Stru

ctur

alin

form

atio

nab

outa

nas

sem

bly.

Var

ied

expe

rim

enta

lmet

hods

can

dete

rmin

eth

eco

pynu

mbe

rs(s

toic

hiom

etry

)and

type

s(c

ompo

sitio

n)of

the

com

pone

nts

(whe

ther

orno

tcom

pone

nts

inte

ract

with

each

othe

r),p

ositi

ons

ofth

eco

mpo

nent

s,an

dth

eir

rela

tive

orie

ntat

ions

.Im

port

antly

,som

em

etho

dsid

entif

yon

lyco

mpo

nent

type

san

ddo

notd

istin

guis

hbe

twee

ndi

ffer

enti

nsta

nces

ofa

com

pone

ntof

the

sam

ety

pew

hen

mor

eth

anon

eco

pyof

itis

pres

enti

nth

eas

sem

bly.

Oth

erm

etho

dsdo

iden

tify

spec

ific

inst

ance

sof

aco

mpo

nent

.Int

egra

tion

ofda

tafr

omva

ried

met

hods

gene

rally

incr

ease

sth

eac

cura

cy,e

ffici

ency

,and

cove

rage

ofst

ruct

ure

dete

rmin

atio

n.A

bbre

viat

ions

:Cry

o-E

T,c

ryo-

elec

tron

tom

ogra

phy;

EM

,ele

ctro

nm

icro

scop

y;FR

ET

,fluo

resc

ence

reso

nanc

een

ergy

tran

sfer

;H/D

,hyd

roge

n/de

uter

ium

;PC

A,

prot

ein-

frag

men

tcom

plem

enta

tion

assa

y;SA

XS,

smal

l-an

gle

X-r

aysc

atte

ring

.

www.annualreviews.org • Integrative Structure Determination 445

Ann

u. R

ev. B

ioch

em. 2

008.

77:4

43-4

77. D

ownl

oade

d fr

om a

rjou

rnal

s.an

nual

revi

ews.

org

by U

nive

rsity

of

Que

ensl

and

on 0

3/23

/09.

For

per

sona

l use

onl

y.

ANRV345-BI77-19 ARI 28 April 2008 15:26

Electronmicroscopy (EM):includes single-particle EM, electrontomography, andelectroncrystallography andprovides spatial mapsof assemblies atpseudoatomic orlower resolutions

SAXS: small-angleX-ray scattering

FRET: fluorescenceresonance energytransfer

Accuracy:difference betweenthe determinedstructure and theactual nativestructure

Precision:variability amongstructural solutionsconsistent with thedata

ET: electrontomography

transient interactions are yet to be character-ized (1, 14).

Difficulties. Compared to structure deter-mination of the individual components, how-ever, structural characterization of macro-molecular assemblies is usually more difficultand represents a major challenge in structuralbiology (2, 3). For example, X-ray crystallog-raphy is limited by the difficulties of growingsuitable crystals and building molecular mod-els into large unit cells; nuclear magnetic reso-nance (NMR) spectroscopy is limited by size;electron microscopy (EM), affinity purifica-tion, yeast two-hybrid system, calorimetry,footprinting, chemical cross-linking, small-angle X-ray scattering (SAXS), and fluo-rescence resonance energy transfer (FRET)spectroscopy are limited by low resolutionof the corresponding structural information;and computational protein structure model-ing and docking are limited by low accuracy.

Integrative approach. These shortcomingscan be minimized by simultaneous considera-tion of all available information about a givenassembly (Figure 1) (3, 15–17). This infor-mation may vary greatly in terms of its accu-racy and precision and includes data from bothexperimental methods and theoretical consid-erations, such as those listed above. The in-tegration of structural information about anassembly from various sources can only beachieved by computational means. In this re-view, we focus on the computational aspectsof data integration.

Chapter outline. We begin by reviewingthe types of spatial information generatedby experimental and computational meth-ods that have allowed structural biology toshift its focus from individual proteins tolarge assemblies. Such data include atomic andresidue positions from X-ray crystallographyand NMR spectroscopy, shape and density foran assembly from EM and SAXS, as well ascomponent proximities and interactions from

proteomics methods, mass spectrometry, andlabeling techniques.

Next, we review computational methodsthat generate models of assembly structureson the basis of given information. In partic-ular, we focus on advances in computationalmethods for comparative modeling of com-plexes and for docking atomic structures ofproteins to other proteins. We also reviewmethods for the fitting of component struc-tures into density maps, typically determinedby cryo-EM and cryo-electron tomography(cryo-ET). In addition, we outline computa-tional methods that use data from SAXS ex-periments in solution.

Finally, we offer a perspective on generat-ing macromolecular assemblies that are con-sistent with all available information from ex-perimental methods, physical theories, andstatistical preferences extracted from biolog-ical databases. Such an integrative system, inprinciple, achieves higher completeness, reso-lution, accuracy, precision, and efficiency thana structure characterization using any of theindividual types of data alone (3, 18). We illus-trate this approach by its application to the de-termination of the configuration of 456 pro-teins in the yeast NPC (18, 19).

SOURCES OF SPATIALINFORMATION

Various experimental methods produce differ-ent types of structural information (Figure 1).This information differs in terms of the spatialfeatures it restrains as well as in resolution, ac-curacy, and quantity. The stoichiometry andcomposition of protein components in an as-sembly can be determined by methods such asquantitative immunoblotting and mass spec-trometry. The positions of the componentscan be elucidated by cryo-EM and labelingtechniques. Whether or not components in-teract with each other can be measured by theyeast two-hybrid system and affinity purifica-tion. Relative orientations of components andinformation about interacting residues can beinferred from cryo-EM, hydrogen/deuterium

446 Alber et al.

Ann

u. R

ev. B

ioch

em. 2

008.

77:4

43-4

77. D

ownl

oade

d fr

om a

rjou

rnal

s.an

nual

revi

ews.

org

by U

nive

rsity

of

Que

ensl

and

on 0

3/23

/09.

For

per

sona

l use

onl

y.

ANRV345-BI77-19 ARI 28 April 2008 15:26

(H/D) exchange, OH radical footprinting,and chemical cross-linking. At the highest res-olution, information about the atomic struc-tures of components and their interactions canbe determined by X-ray crystallography andNMR spectroscopy. Importantly, some meth-ods do not distinguish between different in-stances of a component of the same type, re-sulting in ambiguity when more than one copyof the component is present in the assembly(e.g., proteomics methods, including the yeasttwo-hybrid system and affinity purification).Structures can be described at different levelsof resolution, including the component con-figuration (specifying component positionsand the presence of interactions), the molec-ular architecture (specifying the components’configuration and relative orientations), pseu-doatomic models (specifying atomic positionswith errors larger than the size of an atom),and atomic structures (specifying atomic po-sitions with precision smaller than the size ofan atom).

X-ray Crystallographyand NMR Spectroscopy

X-ray crystallography has been the most pro-lific technique for the structural analysis ofproteins and protein complexes and is still the“gold standard” in terms of accuracy and res-olution. X-ray crystallography measures thestructure factor amplitudes and approximatephases for a crystal sample. Together with amolecular mechanics force field, this informa-tion is used in an optimization process that canresult in an atomic structure of the assembly(20, 21).

NMR spectroscopy allows determinationof atomic structures of increasingly large sub-units and even complexes in solution undernear-native conditions (22, 23). Data fromNMR spectroscopy include upper distancebounds between pairs of atoms and dihe-dral angle values of certain groups of atoms.In combination with a molecular mechan-ics force field, this information can result inan atomic structure of the protein through

Moleculararchitecture:components’configuration andrelative orientations

Pseudoatomicmodels: atomicpositions with errorslarger than the sizeof an atom

an optimization process (24). NMR spec-troscopy is also increasingly used to deter-mine the interacting surfaces of protein com-ponents in complexes from chemical shiftperturbations (25) and residue dipolar cou-pling (26). Such information can be combinedwith computational docking to obtain approx-imate structures of protein complexes (seeProtein-Protein Docking below). It is partic-ularly useful that NMR spectroscopy meth-ods can be applied to weak and transient pro-tein complexes, which are difficult to studyby other structural methods (27, 28). Forinstance, transient encounter complexes inprotein-protein associations can be visualized(29). Moreover, in-cell NMR spectroscopyprovides a means of analyzing the structuralchanges that accompany protein interactionsin vivo and at atomic resolution (30).

Number of structures. The number ofstructures of macromolecular assembliessolved by X-ray crystallography or NMRspectroscopy is still relatively small. In theProtein Data Bank (PDB) (31), there are ap-proximately 5000 binary interfaces with lessthan 30% sequence identity to each other. Itwill likely be many years before we have acomplete repertoire of high-resolution struc-tures for the hundreds of binary and higher-order complexes in a typical cell. This discrep-ancy is due mainly to the difficult productionof sufficient quantities of the sample and itscrystallization.

Utility of atomic structures. Atomic struc-tures of protein complexes provide templatesfor the comparative modeling (32) of proteincomplexes (see Comparative modeling, be-low). They are also used to derive statisticalpotentials of mean force (33, 34) that are use-ful for the generation and assessment of pro-tein complexes by computational docking (seeProtein-Protein Docking, below). Several at-tempts have been made to classify proteincomplexes (35) and protein-protein interfaces(33) on the basis of their atomic structures,providing a basis for analysis of the physical


Ann

u. R

ev. B

ioch

em. 2

008.

77:4

43-4

77. D

ownl

oade

d fr

om a

rjou

rnal

s.an

nual

revi

ews.

org

by U

nive

rsity

of

Que

ensl

and

on 0

3/23

/09.

For

per

sona

l use

onl

y.

ANRV345-BI77-19 ARI 28 April 2008 15:26

Atomic models:atomic positionswith precisionsmaller than the sizeof an atom

principles, function, and evolution of interac-tions between proteins.

Electron Microscopy

The different variants of EM are electroncrystallography (36, 37), single-particle EM(38, 39), and ET (40–42).

Electron crystallography. Electron crystal-lography requires macromolecules to bearranged in two-dimensional crystals (typi-cally for membrane proteins) (36, 37) or he-lical fibers (often for proteins involved infilaments) (43). The resolution obtained inelectron crystallography is frequently suffi-cient to trace the protein backbone (<4.5 A)or at the very least to obtain pseudoatomicmodels by fitting component atomic struc-tures into the map (∼5–10 A) (44). However,the technique is not used often owing to diffi-culties in obtaining periodic arrays and to hightechnical demands.

Single-particle EM. Single-particle EM canbe applied to a dried and heavy-metal-stainedsample (negative stain EM) or to a hydratedand frozen sample (cryo-EM). Although neg-ative stain EM can only determine the en-velope of an assembly, cryo-EM also deter-mines the whole electron-optical density dis-tribution (38). Imaging by single-particle EMrequires neither large quantities of the sam-ple nor the sample in a crystalline form (38,39, 45). Single-particle EM is a powerful toolfor the investigation of macromolecular as-sembly structures that exist in different con-formational states (46) or for those whoseX-ray structure determination is difficult.An assembly typically needs to weigh above∼250 kDa. Typical resolutions are currentlyin the intermediate range (5–15 A) (47). Thecryo-EM density maps are particularly usefulwhen combined with atomic-resolution struc-tures of the components, as reviewed in thesection Structure Characterization from Den-sity Maps, below (48). For example, fittingof atomic structures and models of proteins

and nucleic acids into cryo-EM maps has re-sulted in quasi-atomic models of viral subunitassemblies (48, 49), ribosomes and ribosome-interacting proteins (46), and various other as-semblies (50). The number of single-particlereconstructions deposited in the Electron Mi-croscopy Database is 446 ( January 3, 2008)(51), indicating that single-particle EM is in-creasingly becoming a standard method instructural biology.

Cryo-ET. Cryo-ET can be used to obtainthree-dimensional structures of pleomorphicobjects such as whole cells (40–42). Thetremendous potential of cryo-ET lies in visu-alizing assemblies in an unperturbed cellularcontext (52). Prokaryotic and thin eukaryoticcells can be imaged in toto, and recent ad-vances in sectioning of vitrified samples makeit possible to gain insights into thicker cells,such as tissue cells (53). The current achiev-able resolution is in the range of 5–10 nm(54). For some macromolecules, higher res-olutions can be obtained by averaging puta-tively identical particles. Cryo-ET is particu-larly attractive for studying membrane-boundcomplexes (55). Images of retroviral envelopeprotein complexes in situ were obtained atapproximately 30-A resolution, where rigid-body fitting is applicable, albeit with low ac-curacy (56). Lower resolution, but invaluable,insights have been obtained into important as-semblies such as the NPC (57–59).

Cryo-ET can potentially characterizetransient interactions by imaging them in anunperturbed environment. Prerequisite to ananalysis of proximities of macromolecules aremethods to systematically determine atlases ofstable assemblies. The problem of introduc-ing electron-dense labels noninvasively favorsthe identification of assemblies on the basis ofknown structural signatures (i.e., using tem-plate matching, see below) (60). Studies us-ing phantom cells (i.e., liposomes with knowncontent) have indicated that this approach tolocating large assemblies (MW > 1 MDa) isfeasible (61); recently, the first ribosomal atlas

448 Alber et al.

Ann

u. R

ev. B

ioch

em. 2

008.

77:4

43-4

77. D

ownl

oade

d fr

om a

rjou

rnal

s.an

nual

revi

ews.

org

by U

nive

rsity

of

Que

ensl

and

on 0

3/23

/09.

For

per

sona

l use

onl

y.

ANRV345-BI77-19 ARI 28 April 2008 15:26

of the whole Spiroplasma melliferum cell wasdetermined (62).

Small-Angle X-Ray Scattering

Small-angle scattering of X-rays (SAXS) andneutrons is another biophysical method thatcan provide low-resolution information aboutthe shape of an assembly (63). These tech-niques study purified proteins and complexesin solution. In SAXS, the molecule’s rotation-ally averaged scattering pattern is measuredas a function of spatial frequency, typicallyto 1–3-nm resolution. This spectrum can bereadily transformed into a radial distributionfunction, which is essentially a histogram ofall pairwise distances of the atoms in an as-sembly weighted by their respective atomicnumbers. Because of rotational averaging, theinformation content of a SAXS spectrum isdramatically reduced compared to a diffrac-tion pattern in X-ray crystallography or evena density map from EM. A conservative esti-mate of the information content is given bythe Whittaker-Shannon sampling criterion,which specifies the number of independentsampling points, N, as a function of resolu-tion, r, of the dataset and the diameter, D, ofthe macromolecule under scrutiny: N = 2Dr .For a particle with diameter of 100 A and a res-olution of 20 A, this criterion yields N equal to10. In comparison, if we apply this criterion toan EM map at the same resolution, we obtainNEM equal to 4190. Nevertheless, SAXS canprovide important shape information aboutproteins and assemblies in the size range of50–250 kDa, which are not amenable to cryo-EM and NMR spectroscopy. In addition, theease of altering solution conditions makesSAXS ideal for studying differences betweenvaried conformational states of the same as-sembly (64). Examples where atomic quater-nary structure models could be obtained usingSAXS in conjunction with atomic structuresof fragments include the Ras activator son ofsevenless (65) and the different nucleotide-bound conformations of the ATPase GspE(66).

Proteomics Methodsand Mass Spectrometry

A variety of proteomics methods produce spa-tial information at relatively low resolutionwhose use for structure determination is gen-erally not straightforward.

Spatial information from proteomics. In-formation about the presence of a binary pro-tein interaction can be translated into an up-per bound on the distance between the twocorresponding components in higher-ordercomplexes. Therefore, even when the detailsabout their binding interface are not avail-able, the distance between the componentcentroids can still be restrained. Moreover,the protein composition of copurified com-plexes can reveal the proximity of a group ofproteins without revealing the underlying bi-nary interaction network. Nevertheless, suchdata implies the presence of a minimum num-ber of protein interactions, so that all compo-nents are connected to each other. Any pre-dicted assembly structure must be consistentwith such connectivity data. It is, in fact, pos-sible to impose an appropriate spatial restraintthat enforces such a connectivity conditionduring an optimization process (see below)(19, 67).

Ambiguity. A distinctive feature ofproteomics-based data is that proteininteractions cannot be unambiguously as-signed to distinct pairs of protein instancesif multiple copies of one or both proteinsare present in the assembly. This multiplicityalways applies if the assembly is built fromidentical symmetry units. Moreover, it is, inprinciple, also possible that not all in vitrodetected interactions are present in a givencomplex at the same time. It is thereforenot always easy to define what constitutesa unique biologically active complex on thebasis of binary protein interaction data alone.To incorporate spatial information from pro-teomics methods, these spatial ambiguitieshave to be considered (see below) (19, 67).


Ann

u. R

ev. B

ioch

em. 2

008.

77:4

43-4

77. D

ownl

oade

d fr

om a

rjou

rnal

s.an

nual

revi

ews.

org

by U

nive

rsity

of

Que

ensl

and

on 0

3/23

/09.

For

per

sona

l use

onl

y.

ANRV345-BI77-19 ARI 28 April 2008 15:26

Binary interactions from varied meth-ods. Methods for detecting binary proteininteractions include the yeast two-hybridsystem (68, 69), protein fragment comple-mentation (70), a combination of phagedisplay with other techniques (71), pro-tein arrays (72), calorimetry (73), and solid-phase detection by surface plasmon reso-nance (74, 75). Physical protein interactionshave also been inferred from genetic inter-actions through reduced activity or lethalityof double-knockout mutant yeast strains (76).Because of the relatively low resolution ofsome of these biochemical characterizationsand their relatively high false-positive rates,care is needed in their interpretation. Forexample, assessing the biochemically derivedinteraction sets against known structures ofcomplexes identified potential sources of sys-tematic errors in interaction discovery, suchas indirect interactions in yeast two-hybridsystems, obstruction of interfaces by molec-ular labels, and artificial promiscuity in thedetected interactions (77, 78).

Higher-order interactions from affinitypurification. An affinity purification exper-iment combines the purification of proteincomplexes with the identification of their indi-vidual components by mass spectrometry (79).The proximity between the identified com-ponents is established because they are di-rectly or indirectly associated with the taggedbait protein (10–12). The method can alsobe applied to characterize protein-DNA (80)and protein-RNA complexes (81). Several dif-ferent complex purification strategies exist(79).

A particularly powerful variant uses a sin-gle fluorescent affinity tag that allows visu-alization of the target protein in live cells,followed by rapid extraction and detectionof interacting macromolecular partners (82).This method determined the localizationand specific interactions of viral proteinswith host-cell interaction partners at differ-ent stages during infection (83). With the ad-vance of fast purification strategies, weak and

transient interactions in complexes will alsobe studied.

A powerful complement of affinity pu-rification is electrospray ionization massspectrometry (84). This technique separatesunique intact complexes and their fragmentsby their charge-to-mass ratios, which allowsthe determination of their composition andstoichiometry.

Although each affinity purification exper-iment on its own is relatively uninformativeabout the structure of the parent assembly,considerable synergy exists when a set of in-dependent affinity purifications represents as-sembly fragments that vary in size and over-lap in their composition. In fact, a relativelymodest set of affinity purifications that par-tially overlap in their composition can be suf-ficient for the identification of the configu-ration of the protein components in an as-sembly, especially if combined with other in-dependent data from proteomic and labelingmethods (67). For instance, the combinationof a large set of affinity purification data hascontributed to the identification of the pro-tein configuration and interaction map of theNPC (see below) (18, 19). In other studies,the generation of fragment complexes by massspectrometry has revealed the subunit archi-tectures of the 19S proteasome (85) and othercomplexes, such as the yeast exosome and thetryptophan RNA-binding attenuation protein(TRAP) complex (84).

Labeling Techniques

Several labeling techniques can be used to de-termine the approximate positions of proteincomponents in an assembly (86). The idea isto tag the protein component of interest witha probe, which can then be detected by EMor other methods.

Antibody labels in immuno-EM. Inimmuno-EM, the recognized label is anantibody, which is typically conjugated tonanometer-sized gold beads to enhance thevisibility in EM images (86). The gold-labeledantibody binds either to a primary antibody

450 Alber et al.

Ann

u. R

ev. B

ioch

em. 2

008.

77:4

43-4

77. D

ownl

oade

d fr

om a

rjou

rnal

s.an

nual

revi

ews.

org

by U

nive

rsity

of

Que

ensl

and

on 0

3/23

/09.

For

per

sona

l use

onl

y.

ANRV345-BI77-19 ARI 28 April 2008 15:26

directed against an epitope in the protein ofinterest or to a Protein A tag that is fusedwith the protein (87). Usually, many imagesof an assembly with the labeled protein aresuperposed to obtain a distribution of thegold particles for a more accurate positioningof the tagged protein. The localization ofproteins using gold-tagged beads is usuallylimited by the relatively large variability ofgold particle positions, typically because oferrors in EM image alignment and linker pro-tein flexibility. Nevertheless, the experimentis informative about the assembly structure ifthe corresponding error bars are still smallerthan the dimension of the assembly. Forinstance, immunogold labeling in combina-tion with transmission EM determined theorganization of the NPC components alongtwo principal axes of the nuclear pore (87)as well as the distribution of proteins in thep97-Ufd1-Npl4 complex (88).

Other labels. The choice of labels is not lim-ited to antibodies. For instance, histidine tagscan be detected using NiNTA-conjugatedgold particles (88), and proteins can also beidentified by interacting proteins that are co-valently bound to beads of gold (59).

Labeling in single-particle EM. In single-particle EM analysis, the precision is oftensufficiently high to detect the label withoutthe use of gold beads. The label is locatedby comparison of the labeled and unlabeleddensities. This approach was used to deter-mine the molecular architectures of numer-ous complexes, including the CCT chaper-onin (89). It also allowed the identification ofconformational variations among splicesoso-mal complexes (90).

Biochemical and BiophysicalMethods

There are a variety of biochemical and bio-physical methods that can be used to deriveinformation about the relative position as wellas the relative orientation of the components

in a larger complex. For example, site-directedmutagenesis can identify residues mediat-ing the interaction (91). Various forms ofchemical footprinting (92, 93) and hydrogen-deuterium exchange (94) can identify surfacesburied upon complex formation. Proximitiesof labeled groups on interacting proteins canbe detected by chemical cross-linking (95–97) and FRET spectroscopy (98–100); for in-stance, the protein organization of the yeastspindle pole body was established to a largeextent with distances from FRET experiments(98). A method for obtaining long-range dis-tance restraints in protein complexes is pulseddipolar spin resonance spectroscopy that pro-vides the separation distance of two specifi-cally placed spins within a protein complex.It has been used for the rigid-body refine-ment of the protein components in complexes(101), such as the Escherichia coli chemosome(102).

COMPUTATIONAL METHODSFOR ASSEMBLY STRUCTUREDETERMINATION

The experimental data about a structure, de-scribed above, must be converted to an ex-plicit structural model through computation.We now describe such computational meth-ods. We focus on the type of information theyuse to calculate the assembly structures, ratherthan on how they calculate them. The meth-ods reviewed in this section use one or twodominant types of information and do not aimto combine explicitly many different types ofinformation.

Template-Based Modeling

Template-based modeling methods rely onknown structures of homologous complexes.Such methods include comparative modelingand threading.

Comparative modeling. It may be possi-ble to model a protein assembly using stan-dard comparative modeling techniques (32,


Ann

u. R

ev. B

ioch

em. 2

008.

77:4

43-4

77. D

ownl

oade

d fr

om a

rjou

rnal

s.an

nual

revi

ews.

org

by U

nive

rsity

of

Que

ensl

and

on 0

3/23

/09.

For

per

sona

l use

onl

y.

ANRV345-BI77-19 ARI 28 April 2008 15:26

?

Homolo

gy

Target-template alignment Restrained dockingScoring

Comparative modeling Protein docking Comparative patch analysis

Methods

Unboundcomponents

Predictedcomplex

HomologyHom

ologyHom

ology

Figure 2Three computational methods for modeling structures of protein complexes. Comparative modelingbuilds a model of a complex by using a known structure of a similar complex as a template. Proteindocking can be applied when no structure of a similar assembly is known because it relies on searchingthrough possible complex configurations and assessing them by geometrical and physicochemicalcomplementarities. Comparative patch analysis is a hybrid of protein docking and comparative modeling;it restrains docking to only refined interaction modes suggested by structurally defined interactionsbetween each of the complex components, or their homologs, with any other protein (138).

33, 103–105) (Figure 2). The requirementis the availability of structural templates thatcan be reliably aligned to the sequences ofthe target assembly. Such templates may coverthe entire assembly or a sufficiently overlap-ping set of its fragments. A comparative modelof an assembly can be assessed with a varietyof different energy and scoring functions, in-cluding empirical statistical potentials that aredesigned to score component interactions andare derived from binary interfaces of knownstructures (33, 77). Comparative modeling as-sumes that homologous subunits constitut-ing the target and template assemblies formequivalent interactions (33, 103, 104). Indeed,interaction modes between proteins of thesame fold tend to be structurally similar [inter-action root-mean-square deviation (iRMSD)≤ 10 A] when the sequence identity is above

∼30% (106). Below this cutoff, the structuresof protein interfaces may be different.

Threading. Binary interfaces that are dis-tantly related to template structures canbe modeled with MULTIPROSPECTOR,which uses threading of individual protein se-quences onto a library of structurally definedinteractions (107). The individual sequencesare then scored on the basis of how well theyfit the proposed folds as well as on the inter-face between them (107).

Both types of approaches have been ap-plied to study large collections of sequencesand interactions (103, 104, 108). The applica-bility of template-based modeling is limitedto protein assemblies whose homologs arecurrently in the PDB (31) and Protein Qua-ternary Structure (109) databases. In a recent

452 Alber et al.

Ann

u. R

ev. B

ioch

em. 2

008.

77:4

43-4

77. D

ownl

oade

d fr

om a

rjou

rnal

s.an

nual

revi

ews.

org

by U

nive

rsity

of

Que

ensl

and

on 0

3/23

/09.

For

per

sona

l use

onl

y.

ANRV345-BI77-19 ARI 28 April 2008 15:26

study, 3387 binary and 1234 higher-orderprotein complexes could be predicted for S.cerevisiae (33).

Protein-Protein Docking

The structure of a binary protein complexcan be predicted by computational proteindocking if atomic structures of its compo-nents are available from experiments or mod-eling (110) (Figure 2). Unlike template-basedmodeling, protein docking can also be ap-plied when structures of homologous assem-blies are unknown and can thus predict novelbinding modes. Protein docking relies on aglobal search of a large set of possible as-sembly configurations, maximizing geomet-rical and physicochemical complementaritiesbetween the pair of constituting components(111–115). Although the vast majority of pro-tein docking methods are applied to proteinassemblies of two components, an approachof docking multiple components was recentlyproposed (116). Protein docking methodsvary in component representation, scoring ofconfigurations, and optimization protocols.

Rigid and flexible docking. Most proteindocking methods treat components as rigidbodies (111, 112, 115, 117). Other methodsincorporate side-chain and backbone flexibil-ities of the component residues (113, 114).However, these methods are usually compu-tationally expensive because the search spaceof possible assembly configurations is signifi-cantly increased.

CAPRI. Docking methods are systematicallyassessed every two years through blind trialsin the Critical Assessment of PRediction ofInteractions (CAPRI) (118). At the meetingin 2005, 2 out of the 30 participating groupspredicted 8 out of 9 assemblies with acceptableaccuracy (118). One group was able to predictfour target assemblies with high accuracy (atleast 50% of native contacts, ligand backboneRMSD ≤ 5 A, interface backbone RMSD≤1 A). Even if the docking methods are not

sufficiently accurate to predict whether or nottwo proteins actually interact with each other,they can, in many cases, correctly identify theinteracting surfaces between two structurallydefined components.

Challenges. The low accuracy of computa-tional protein docking is usually due to the(a) conformational differences in the boundand unbound states of assembly subunits, (b)limitations in the sampling of relevant con-figurations, or (c) difficulty of discriminatingthe native-like configurations from the largenumber of nonnative alternatives (119, 120).As a result, a typical docking method producesan ensemble of candidate solutions, and it isoften difficult to select the native-like mode.

Restrained docking. Varied experimentalinformation about component interactions inan assembly can be used to increase the ac-curacy of protein docking (27). These meth-ods incorporate the additional data eitherafter model building as a filter or duringmodel building to bias the search. The ex-perimental data can provide information atthe atomic level, e.g., chemical shift perturba-tion in NMR spectroscopy (121, 122); residuelevel, e.g., hydrogen/deuterium exchange (94,123); site-directed mutagenesis (91); lower-resolution levels, e.g., chemical cross-linking(124, 125); and residue dipolar coupling inNMR spectroscopy (26, 126).

Many protein docking methods use experi-mental data at the assessment stage to excludesome of the candidate solutions that are in-consistent with the data. Such programs in-clude Hex (127), GRAMM (128), and Roset-taDOCK (91) for use in combination withmutagenesis data (91, 127); DOT (94) for usetogether with hydrogen/deuterium exchangedata; BIGGER (121), AUTODOCK (129),and FTDOCK (130), which employ NMRchemical shift perturbation; and others (26,124).

Another group of methods incorporatesthe experimental data directly into thescoring function to be optimized or the


Ann

u. R

ev. B

ioch

em. 2

008.

77:4

43-4

77. D

ownl

oade

d fr

om a

rjou

rnal

s.an

nual

revi

ews.

org

by U

nive

rsity

of

Que

ensl

and

on 0

3/23

/09.

For

per

sona

l use

onl

y.

ANRV345-BI77-19 ARI 28 April 2008 15:26

optimization protocol. The additional pseu-doenergy term penalizes violations of the ex-perimental data (131, 132). For instance, theprogram HADDOCK can use ambiguous in-teraction restraints implied by chemical shiftperturbations from NMR experiments or mu-tagenesis data (27, 133); weighted geometricdocking uses experimental data to favor cer-tain areas of the subunit surfaces during therotation-translation scan (134, 135). Anothermethod, TREEDOCK, limits the configu-

PDZ3

SH3LinkerN CGK

a

PSD-95 RAT (fragment)

533301 415 430 713

b

Figure 3Two predicted binding modes of the core fragment of rat PSD-95 (138).PSD-95 consists of three domains, the PDZ3 domain (blue), the SH3domain (red), and the GK domain ( yellow). The gray spheres mimic theresidues of the interdomain linker between PDZ3 and SH3. (a) Thedomain architecture of the PSD-95 core fragment. (b) The two predictedconfigurations of PSD-95 (138).

rational search using pairs of anchor atoms,which require contact between them (136).

Comparative Patch Analysis

The locations of protein-binding sites on pro-teins are often conserved in evolution, irre-spective of the folds of their binding part-ners (Figure 2) (137). This feature is exploitedin comparative patch analysis, which is ahybrid of protein docking and comparativemodeling. The method restrains computa-tional docking to binding sites that are con-served within families of homologous do-mains (Figure 2) (138). To determine theconserved binding sites, the following strat-egy is applied. First, for each subunit in a bi-nary complex, a set of protein-binding sitesof its homologs represented in the PIBASEdatabase of structurally defined interfaces isidentified (139). Second, these binding sitesare mapped onto the partner subunit surfaceusing structure-based alignments between thesubunit and each of its homologs. Third, allpairs of the mapped binding sites are used asstarting points for restrained docking to ob-tain candidate models of the binary complex.This ensemble of models is then ranked us-ing a measure of geometric complementarityand a statistical potential score. Comparativepatch analysis has a greater applicability thancomparative modeling and a higher accuracythan protein docking (138).

Application. Comparative patch analysiswas used to model the tertiary structure of thecore fragment of rat PSD-95 (Figure 3) (138).PSD-95 is a key protein in the postsynapticdensity that serves as a structural scaffoldfor other signaling proteins. Although thestructures of its five individual domains havebeen solved, the complete structure of PSD-95 has not been determined. In addition,structures of neither the PDZ-SH3 northe PDZ-GK homologs are available, ren-dering comparative modeling inapplicable.Moreover, computational protein docking

454 Alber et al.

Ann

u. R

ev. B

ioch

em. 2

008.

77:4

43-4

77. D

ownl

oade

d fr

om a

rjou

rnal

s.an

nual

revi

ews.

org

by U

nive

rsity

of

Que

ensl

and

on 0

3/23

/09.

For

per

sona

l use

onl

y.

ANRV345-BI77-19 ARI 28 April 2008 15:26

results are ambiguous, generating anensemble of complexes without any predom-inant binding modes. By contrast, each ofthe subunit families is known to repeatedlyutilize a small number of binding sites fordifferent protein interactions, indicatingthat comparative patch analysis may beuseful. Comparative patch analysis of thePSD-95 core fragment suggests two alternateconfigurations, which potentially correspondto the different functional forms of PSD-95(Figure 3). Thus, this finding providesa possible structural explanation for theexperimentally observed cooperative foldingtransitions in PSD-95 and its homologs.

Comparative modeling, protein docking,and comparative patch analysis can benefitfrom including not only local information onprotein-protein interfaces, but also global in-formation about the overall shape of an as-sembly, as discussed next.

Structure Characterizationfrom Density Maps

Structures of macromolecular assemblies, or-ganelles, and even whole cells can be charac-terized by density maps derived from single-particle EM, electron crystallography, andcryo-ET. If structural models of componentsare available at a resolution higher than that ofthe map, it is usually helpful to fit these mod-els into the map (Figure 4). Here, we focuson various fitting and segmentation methods,first for single-particle EM and electron crys-tallography and then for cryo-ET.

Segmentation. Interpretation of an EM mapusually begins by identifying different struc-tural units (e.g., secondary structure ele-ments, domains, nucleic acids, proteins) inthe density map by means of segmentationtechniques (47). The size of the segmentedunits depends on the resolution of the map.For example, at 5–12-A resolution, secondarystructure segments can be seen (47). Thesegmentation is usually performed manually,

Density fitting:fitting of acomponent into adensity map tomaximize the overlapbetween thecomponent and themap

with the aid of visualization tools such asAmira (http://www.tgs.com/) and Chimera(140). In addition, automated segmentationmethods have been developed recently forboth cryo-EM and cryo-ET maps (141, 142).Methods for assigning structural units in agiven density map include skeletonization aswell as identification of α-helices and β-sheets(47). If the component folds are known, theidentified units can be useful in detecting thepositions of the components in the map. Oth-erwise, when the folds are unknown, the iden-tified units can help in predicting the compo-nent folds (143).

Pseudoatomic structures. In many cases,atomic-resolution models of the componentsare often available from experiment or pre-diction (144). By fitting these models into thecorresponding density map at better than ap-proximately 20-A resolution (145), a pseu-doatomic interpretation of the map can beobtained (48, 146). The utility of fittingcomponent structures into a cryo-EM mapis demonstrated by a detailed pseudoatomicmodel of the mammalian 80S ribosome at8.7-A resolution (186) (Figure 5).

Rigid fitting. In some cases, the fitting ofa component into a given map can be per-formed manually using interactive visualiza-tion tools such as Chimera (140). However,automated computational methods decreasethe level of subjectivity as well as increasethe accuracy and efficiency (145). Fitting ofcomponent structures is usually designed tooptimize a similarity score between the com-ponent and the density map (e.g., the cross-correlation coefficient) as a function of itstranslation and rotation relative to the map(rigid fitting). Many methods for cryo-EMdensity fitting have been developed, includ-ing SITUS (147), COAN (148), EMFIT(149), DOCKEM (150), FOLDHUNTER(151), URO (152), CHARMM (153), Mod-EM (146), and ADP-EM (153a). To im-prove the positioning of components, one


Ann

u. R

ev. B

ioch

em. 2

008.

77:4

43-4

77. D

ownl

oade

d fr

om a

rjou

rnal

s.an

nual

revi

ews.

org

by U

nive

rsity

of

Que

ensl

and

on 0

3/23

/09.

For

per

sona

l use

onl

y.

ANRV345-BI77-19 ARI 28 April 2008 15:26

Secondary structureassignment

Fitting of all known folds

Fold assignment bybioinformatics

Componentsequence

Cryo-EM map

Build multiplecomparative models

Component structure known?

Template detected?

Build multipleab initio models

Rigid fitting

Component structure

Best fit different from map?

Real-spacerefinement

Normal modeanalysis

Explore superfamilyvariability

Yes

No

Yes

No

No

Yes

456 Alber et al.

Ann

u. R

ev. B

ioch

em. 2

008.

77:4

43-4

77. D

ownl

oade

d fr

om a

rjou

rnal

s.an

nual

revi

ews.

org

by U

nive

rsity

of

Que

ensl

and

on 0

3/23

/09.

For

per

sona

l use

onl

y.

ANRV345-BI77-19 ARI 28 April 2008 15:26

E-tRNA

Novelproteins

Expansionsegments

Isu

Sec61TRAP

LD

Isussu

Core rRNAs

Known proteins

tRNA

ssu

a b

Figure 5A model of the cytoplasmic 80S ribosome on the basis of an integrated protocol of comparative proteinstructure modeling, ab initio RNA modeling, density fitting, and density map analysis (186). To constructthe model, thousands of comparative models of mammalian proteins were calculated and fitted into thecryo-EM map at 8.7-A resolution using Mod-EM (146). The models with the highest combination ofcross-correlation and statistical potential (34) scores were selected (161). Conserved mammalian corerRNA and ab initio models of expansion segments were fitted and refined in the density using RSRef(159). In addition, the resolution of the map in combination with the final model enabled theidentification of the approximate positions of 20 novel proteins (without a homolog in bacteria), manycontaining rod-like features corresponding to α-helices (Figure 4). As a result, it was possible to identifyunique interactions between mammalian proteins and expansion segments and obtain insights intoconformational changes during translation. The final model is shown in a front view within the densitymap (a) and on its own (b). The E-site tRNA is shown in red between the small (ssu) and large subunits(lsu) of the ribosome. This specimen contained a native ER channel (magenta) comprised of Sec61 andthe tryptophan RNA-binding attenuation protein (TRAP), with a prominent lumenal domain (LD). Thesubunit rRNAs and conserved proteins are color coded. The novel proteins (spheres and rods) andexpansion segments (red helices) are also included.

can use additional experimental (e.g., labelingby gold) and computational information,e.g., statistical potentials (34) and geometric

complementarity between domains (154; K.Lasker, M. Topf, A. Sali, & H. Wolfson, un-published information).

←−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−−Figure 4Analysis of a density map from single-particle EM. If the atomic structure of the component is notknown, bioinformatics methods for fold recognition can be applied to its sequence. These methods canalso be combined with analysis of its density to identify structural features, such as secondary structuresegments (47) or the complete fold (156). If a template fold has been detected, comparative modeling canbe used to obtain a structural model; if the fold is not known, ab initio modeling can be used instead (155).Next, the component structure, experimentally determined or modeled, is rigidly fitted in the densitymap. For models, multiple models can be fitted, and the one that fits the density best is selected (146,158). Finally, if there are differences between the fitted model and the map, flexible fitting can be appliedto improve the fit by modifying the conformation of the component structure (145, 157, 185, 187).


Ann

u. R

ev. B

ioch

em. 2

008.

77:4

43-4

77. D

ownl

oade

d fr

om a

rjou

rnal

s.an

nual

revi

ews.

org

by U

nive

rsity

of

Que

ensl

and

on 0

3/23

/09.

For

per

sona

l use

onl

y.

ANRV345-BI77-19 ARI 28 April 2008 15:26

Conformational variations. A commonproblem in fitting is that the isolated com-ponent structure may be in a differentconformational state than in the intactassembly structure. These conformationaldifferences can originate from the varied con-ditions under which the isolated componentand assembly structures were determinedas well as from errors in the experimentalmethods (such as crystal packing and noise).Common conformational differences areshear and hinge movements of domains andsecondary structure elements, as well as loopdistortions and movements. Furthermore,when an experimentally determined structureof the component is unavailable, the useof structure prediction methods to obtaincomponent models (155) can introduceadditional errors, such as misassignment ofsecondary structure elements to incorrectsequence regions and their shifts in spacecaused by target-template misalignment incomparative modeling (144).

Fitting multiple conformations. The sim-plest approach to consider conformationalvariations of component structures in fitting isto generate a set of different conformations, fiteach of them into the density map, and selectthe top ranking conformation. This approachrelies on a high correlation between the ac-curacy of a model and the cross-correlationscore between the model and the correspond-ing density map (146). There are several dif-ferent approaches to generating such mod-els. First, candidate conformations can becalculated by exploring the structural vari-ability within the fold superfamily of a com-ponent (Figure 4); for example, a numberof alternative comparative models, which arebased on different templates, can be fitted intothe map, and the best fitting one is selected(MODELLER/Mod-EM) (146). Second, ifthe component’s fold is unknown, candidatemodels for fitting can correspond to represen-tatives of all known domain folds (Spi-EM)(156). Third, varied models can be createdthrough rigid-body transformations of sec-

ondary structure segments guided by a princi-pal component analysis of structurally alignedprotein domains in the target’s superfamily(S-flexfit) (157). Finally, a large number ofmodels can also be produced by ab initio pre-diction on the basis of protein sequence alone(ROSETTA/FOLDHUNTER) (158).

Flexible fitting. The efficiency of conforma-tional search is increased by considering thefit between the component and the map dur-ing the sampling. This goal can be achievedby optimizing the conformation of the com-ponent simultaneously with its position andorientation in the cryo-EM map while ideallymaintaining correct stereochemistry. Suchflexible fitting methods are similar to crys-tallographic refinement programs, except thatthey generally refine rigid bodies consisting ofa number of atoms (e.g., secondary structuresegments and domains) instead of individualatoms.

Several flexible fitting methods have beendeveloped, utilizing different sampling andscoring schemes. For example, the real-spacerefinement programs RSRef (145, 159) andFlex-EM (187) rely on standard optimiza-tion methods, including conjugate gradients,molecular dynamics with simulated anneal-ing, and Monte Carlo sampling. Another ap-proach to flexible fitting attempts to improvethe efficiency of conformational sampling isby a normal mode analysis of the componentstructure (160).

Certain large conformational changes canbe efficiently sampled by Moulder-EM (161),a genetic algorithm protocol that gener-ates comparative models through iterative se-quence alignment, model building, model fit-ting, and model assessment. Conformationalsampling arises from the iterative changes inthe alignment on which the model is based.The fitness function of this genetic algo-rithm combines the cross-correlation scorebetween the model and the map with anatomic distance-dependent statistical poten-tial for model assessment (34).

458 Alber et al.

Ann

u. R

ev. B

ioch

em. 2

008.

77:4

43-4

77. D

ownl

oade

d fr

om a

rjou

rnal

s.an

nual

revi

ews.

org

by U

nive

rsity

of

Que

ensl

and

on 0

3/23

/09.

For

per

sona

l use

onl

y.

ANRV345-BI77-19 ARI 28 April 2008 15:26

Cellular atlas. Compared to electron crys-tallography and single-particle EM densities,cryo-ET maps are of lower resolution. How-ever, cryo-ET can be used to reconstruct den-sity maps of large cellular volumes, even wholecells. Such tomograms can be used to iden-tify locations of assemblies in the cell (162).This task, also known as template matching, ischallenging owing to a low signal-to-noise ra-tio, varying contrast throughout tomograms,and missing structure factors because of thelimited tilt range (“missing wedge effect”)(141, 162). The most common and generalmolecular detection algorithm is a locally nor-malized, matched filter, introduced for rigid-body fitting (150, 163). It was modified toaccount for the missing-wedge effect and ap-plied to tomograms (61, 62, 164). These stud-ies demonstrated that it is feasible to identifylarge macromolecular complexes (>500 kDa)within tomograms with high fidelity (61). De-tection will be greatly facilitated by future in-strumental advances in cryo-ET that will im-prove the resolution of the tomograms to theexpected limit of approximately 2 nm (45).

Structure Characterization fromSmall-Angle X-ray Scattering

SAXS provides an approximate radial distri-bution function of a macromolecule in so-lution. For structure determination, addi-tional information is needed because the ra-dial distribution function alone is relativelyuninformative about the details of molecularstructure. We summarize different methodsfor integrating SAXS data into computationalmodeling of macromolecules.

SAXS data as a filter. Similarly to othertypes of experimental information, SAXS datacan be used as a filter for a set of models gener-ated independently by other methods. At theprotein domain level, simulations have indi-cated that SAXS spectra can be used to chooseclose-to-native models from different com-parative models (165). In quaternary struc-ture determination, experimental SAXS spec-

tra have been employed to choose one of anumber of quaternary structure arrangementsthat resulted from computational docking oftwo assembly components to each other (65).Furthermore, SAXS spectra have been usedto verify the accuracy of coarse-grained sim-ulations of lipoproteins (166).

SAXS data in optimization. SAXS data canalso be a term in a scoring function that is op-timized to obtain a model consistent with thedata. The first approaches to optimize modelson the basis of SAXS data relied on represent-ing macromolecular surfaces using sphericalharmonics (167). However, this representa-tion has a relatively low resolution and led tothe development of alternative methods. Ow-ing to the sparseness of SAXS data, virtually allsubsequently developed methods aimed to in-tegrate additional information into structuredetermination.

Coarse-grained approaches represent themacromolecule as an assembly of beads on agrid (168–170). This representation enforcesan overall mass by using a required numberof beads and potential geometrical symmetryby symmetric sampling. In addition, compact-ness of the models is ensured by restricting thesampling to the vicinity of a compact initialmodel (168, 169) or by including appropri-ate terms into the scoring function (170, 171).Higher-resolution modeling approaches rep-resent a protein as a chain of beads rather thana grid (171).

If high-resolution structural information isavailable for some parts of the protein, theconformational sampling can be focused onlyon the undefined segments. One approachhas been developed to approximate the struc-ture of missing loops in structures derivedby X-ray crystallography using coarse-grainedbeads (172). Another approach was developedto determine the spatial arrangement of do-mains of known structure and the structuresof their connecting linkers (173). Here, theelements of a known structure are kept rigid,and their translations and rotations are opti-mized using a simulated annealing protocol.


Ann

u. R

ev. B

ioch

em. 2

008.

77:4

43-4

77. D

ownl

oade

d fr

om a

rjou

rnal

s.an

nual

revi

ews.

org

by U

nive

rsity

of

Que

ensl

and

on 0

3/23

/09.

For

per

sona

l use

onl

y.

ANRV345-BI77-19 ARI 28 April 2008 15:26

Completeness ofstructuralcoverage: thefraction of thestudied assembly thatis represented in itsmodel

If flexible linkers are present, they are repre-sented as beads that tether the domains.

Heterogeneous samples. All SAXS model-ing methods described above assume that theprotein is present in only one conformationunder the conditions examined. If multipleconformations are present, the SAXS spec-trum of the sample is a weighted average ofthe SAXS spectra of each conformation. Re-cently, methods have been proposed to fit anensemble of models to a given spectrum (174).

Integration of SAXS data with other in-formation. The recent renaissance of SAXSis to a large extent the result of efforts on inte-grating SAXS with other structural informa-tion from additional complementary sources(64). This integration is necessary becausethe information content of SAXS data islow, given the number of degrees of free-dom that typically need to be determinedand even more so for heterogeneous samples.For example, the SAXS data of proteinsor smaller complexes can be considered si-multaneously with corresponding cryo-EMmaps (175). Another approach, which in-cluded SAXS data into molecular dynam-ics simulations, has shown promising resultsfor simulated examples (176). Recently, SAXSspectra have been incorporated into a pro-tocol for structure determination by NMRspectroscopy (177). The SAXS data containglobal information on the protein that is com-plementary to the short-range restraints fromNMR spectroscopy and, hence, significantlyincrease the accuracy of models for multido-main proteins compared to models based onNMR spectra only.

To generalize further, we have incor-porated the use of SAXS data into ourprogram, the Integrated Modeling Platform(IMP), for the modeling of proteins and theirassemblies by satisfaction of spatial restraints(http://salilab.org/imp; 167a; F. Forster, B.Webb, K.A. Krukenberg, H. Tsuruta, D.A.Agard, & A. Sali, unpublished information).We have used the SAXS-based restraint in

conjunction with restraints on the rigidity ofdomains, steric clashes, stereochemistry, andan atomic distance-based statistical potential(34) (Figure 6). By thorough sampling ofthe configurational space and subsequentclustering of low scoring solutions, we aimto enumerate models that are consistent withthe given SAXS data. The integration ofadditional information reduces the ambiguityof such models.

COMPREHENSIVE DATAINTEGRATION BYSATISFACTION OFSPATIAL RESTRAINTS

Detailed structural characterization of assem-blies is often difficult by any single existingexperimental or computational method. Wesuggest that this barrier can be overcome byhybrid approaches that integrate data fromdiverse biochemical and biophysical experi-ments as well as computational methods. Thisinformation may vary greatly in terms of itsresolution, accuracy, and quantity. Here, weoutline an approach for generating structuresof macromolecular assemblies that are consis-tent with all available information from exper-imental methods, physical theories, and sta-tistical preferences extracted from biologicaldatabases. Such an integrative system will helpmaximize efficiency, resolution, accuracy, pre-cision, and completeness of the structural cov-erage of macromolecular assemblies.

Theory and Method

We begin this section by describing the un-derlying theory and methods of our hybridapproach to characterizing macromolecularassembly structures. Then, we highlight anexample, the structure determination of theNPC (2, 17–19).

Formalization of the problem. The com-plete process of structure determination canbe seen as a potentially iterative series of foursteps, including data generation by experi-ments, data translation into spatial restraints,

460 Alber et al.

Ann

u. R

ev. B

ioch

em. 2

008.

77:4

43-4

77. D

ownl

oade

d fr

om a

rjou

rnal

s.an

nual

revi

ews.

org

by U

nive

rsity

of

Que

ensl

and

on 0

3/23

/09.

For

per

sona

l use

onl

y.

ANRV345-BI77-19 ARI 28 April 2008 15:26

105

107

108

106

105

107

108

106

a b

c

q(Å –1)

l(q)

q(Å–1)

l(q)

q(Å–1)

l(q)

q(Å–1)

l(q)

105

107

108

106

105

107

108

106

I II

Native

I II

III IVNative

Native

0.50.40.30.20.100.50.40.30.20.10

0.50.40.30.20.10 0.50.40.30.20.10

III IV

Native

Figure 6Modeling influenza hemagglutinin using SAXS. (a) Native hemagglutinin consists of two domains (blueand red) (PDB code 1a0d). The SAXS spectrum of this structure was simulated with the addition of somewhite noise, which typically occurs in experimental data. (b) We approximated the hemagglutinindomains by their structures in the postcleavage forms (2viu). (c) Models were obtained by optimizationfrom 600 different initial “seeds.” The optimized models were subsequently clustered into four majorgroups. The models with the lowest SAXS penalty (χ2 of the experimental data and the SAXS spectrumof the model) from each cluster (top) and the corresponding SAXS spectra (bottom) are shown. The modelfrom cluster III has the lowest SAXS score (χ2 = 0.84 compared to 5.13, 3.86, and 3.68 for the modelsfrom clusters I, II, and IV, respectively) and is closest to the native state in terms of its Cα RMSD (2.7 Acompared to 13.6, 16.4, and 14.6 A). However, the differences between the cluster scores are small,demonstrating the problem of ambiguity in modeling an assembly structure on the basis of its SAXSspectrum. Abbreviations: I(q), scattering intensity; native, simulated spectrum, which was used as theinput for modeling; q[A−1], spatial frequency in A−1.

calculation of an ensemble of structures by sat-isfaction of spatial restraints, and an analysisof the ensemble. The structural characteriza-tion part of the process can be expressed as

an optimization problem (Figure 7). In thisview, models that are consistent with the in-put information are calculated by optimizinga scoring function. The three components of


Ann

u. R

ev. B

ioch

em. 2

008.

77:4

43-4

77. D

ownl

oade

d fr

om a

rjou

rnal

s.an

nual

revi

ews.

org

by U

nive

rsity

of

Que

ensl

and

on 0

3/23

/09.

For

per

sona

l use

onl

y.

ANRV345-BI77-19 ARI 28 April 2008 15:26

Nativestructure

Starting model Calculatedmodel

DRMS ~ 1.3 nmSpatial restraints

Optimization

28affinity purifications

14position

restraints

Assemblyshape

Shaperestraints

Assemblysymmetry

Symmetryrestraints

14positions

36distancerestraints

28connectivity

restraints

36cross-links

7S-values

7proteinshapes

Immuno-EM EM EM Cross-linking Affinitypurifications

Ultra-centrifugation

Figure 7Characterization of an assembly configuration on the basis of data simulated from a known nativestructure (67). The simulated data include protein positions (e.g., from immuno-EM), assembly shape(e.g., from EM), relative proximity of components (e.g., from cross-linking and affinity purification). Thedata is translated into spatial restraints that are then summed to obtain a scoring function. A randomstarting structure is optimized by a combination of conjugate gradients and molecular dynamics withsimulated annealing to minimize violations of all restraints. The listed data were sufficient to identify thecoarse relative position of each protein (i.e., the protein configuration). To illustrate the possibility ofusing different representations for different proteins, a protein is represented either by an X-ray structureor by a single sphere that best reproduces its hydrodynamic properties determined by ultracentrifugation.Abbreviations: DRMS, distance root-mean-square difference between the protein centroids in thedetermined model and the native structure; EM, electron microscopy.

this approach are (a) a representation of themodeled assembly, (b) a scoring function con-sisting of the individual spatial restraints, and(c) optimization of the scoring function to ob-tain all possible models that satisfy the inputrestraints.

Representation. The modeled structure isrepresented by a hierarchy of particles, de-fined by their positions and other proper-ties (Figure 7). For a protein assembly, thehierarchy can include atoms, atomic groups,amino acid residues, secondary structure seg-ments, domains, proteins, protein subcom-plexes, symmetry units, and the whole assem-bly. The coordinates and properties of parti-

cles at any level are calculated from those atthe highest resolution level. Different parts ofthe assembly can be represented at differentresolutions to reflect the input informationabout the structure (Figure 7). Moreover, dif-ferent representations can also apply to thesame part of the system. For example, affinitypurification may indicate proximity betweentwo proteins, and cross-linking may indicatewhich specific residues are involved in the in-teraction.

Scoring function. The most important as-pect of structure characterization is to ac-curately capture all experimental, physical,and statistical information about the modeled

462 Alber et al.

Ann

u. R

ev. B

ioch

em. 2

008.

77:4

43-4

77. D

ownl

oade

d fr

om a

rjou

rnal

s.an

nual

revi

ews.

org

by U

nive

rsity

of

Que

ensl

and

on 0

3/23

/09.

For

per

sona

l use

onl

y.

ANRV345-BI77-19 ARI 28 April 2008 15:26

structure. This objective is achieved by ex-pressing knowledge of any kind as a scoringfunction whose global optimum correspondsto the native assembly structure (34). Onesuch function is a joint probability densityfunction (pdf) of the Cartesian coordinates ofall assembly proteins, given the available in-formation, I, about the system, p(C |I ), whereC = (c 1, c 2, . . . , c n) is the list of the Carte-sian coordinates, c i , of the n component pro-teins in the assembly. The joint pdf, p, givesthe probability density that a component, i,of the native configuration is positioned veryclose to c i , given the information, I, we wishto consider in the calculation. In general, Imay include any structural information fromexperiments, physical theories, and statisticalpreferences. For example, when I reflects onlythe sequence and the laws of physics underthe conditions of the canonical ensemble, thejoint pdf corresponds to the Boltzmann dis-tribution. If I also includes a crystallographicdataset sufficient to define the native structureprecisely, the joint pdf is a Dirac delta functioncentered on the native atomic coordinates.

The complete joint pdf is generally un-known but can be approximated as a productof pdfs, p f , that describe individual assemblyfeatures (e.g., distances, angles, interactions,or relative orientations of proteins):

p(C |I ) =∏

f

p f (C |I f ).

The scoring function F(C ) is then defined asthe logarithm of the joint pdf:

F (C) = − ln∏

f

p f (C |I f ) =∑

f

r f (C).

For convenience, we refer to the logarithm ofa feature pdf as a restraint, rf , and the scoringfunction is therefore a sum of the individualrestraints.

Restraints. A restraint, rf , can in principlehave any functional form. However, it is con-venient if ideal solutions consistent with thedata correspond to values of 0 and valueslarger than 0 correspond to a violated re-

straint; for example, a restraint is frequentlya harmonic function of the restrained feature.

Restrained features. The restrained fea-tures, in principle, include any structuralaspect of an assembly, such as contacts, prox-imity, distances, angles, chirality, surface, vol-ume, excluded volume, shape, symmetry, andlocalization of particles and sets of particles.

Translating data into restraints. A keychallenge is to accurately express the inputdata and their uncertainties in terms of theindividual spatial restraints. An interpretationof the data in terms of a spatial restraintgenerally involves identifying the restrainedcomponents (i.e., structural interpretation)and the possible values of the restrained fea-ture implied by the data. For instance, theshape, density, and symmetry of a complexor its subunits may be derived from X-raycrystallography and EM (38); upper distancebounds on residues from different proteinsmay be obtained from NMR spectroscopy(22) and chemical cross-linking (95); protein-protein interactions may be discovered by theyeast two-hybrid system (178) and calorime-try (73); two proteins can be assigned tobe in proximity if they are part of an iso-lated subcomplex identified by affinity purifi-cation in combination with mass spectrom-etry (79). Increasingly, important restraintswill be derived from pairwise molecular dock-ing (118), statistical preferences observed instructurally defined protein-protein interac-tions (139), and analysis of multiple sequencealignments (179).

Conditional restraints. If structural inter-pretation of the data is ambiguous (i.e., thedata cannot be uniquely assigned to specificcomponents), only “conditional restraints”can be defined. For example, when there ismore than one copy of a protein per as-sembly, a yeast two-hybrid system indicatesonly which protein types but not which in-stances interact with each other. Such am-biguous information must be translated into


Ann

u. R

ev. B

ioch

em. 2

008.

77:4

43-4

77. D

ownl

oade

d fr

om a

rjou

rnal

s.an

nual

revi

ews.

org

by U

nive

rsity

of

Que

ensl

and

on 0

3/23

/09.

For

per

sona

l use

onl

y.

ANRV345-BI77-19 ARI 28 April 2008 15:26

a conditional restraint that considers all al-ternative structural interpretations of the data(Figure 8). The selection of the best alterna-tive interpretation is then achieved as part ofthe structure optimization process.

Optimization methods. Structures can begenerated by simultaneously minimizing theviolations of all restraints, resulting in con-figurations that minimize the scoring func-tion F. It is crucial to have access to multi-ple optimization methods to choose one thatworks best with a specific scoring functionand representation. Optimization methodsimplemented in IMP currently include con-jugate gradients, quasi-Newton minimiza-tion, and molecular dynamics, as well asmore sophisticated schemes, such as self-guided Langevin dynamics, the replica ex-change method, and exact inference (beliefpropagation) (K. Lasker, M. Topf, A. Sali, &H. Wolfson, unpublished information); all ofthese methods can refine positions of the in-dividual particles as well as treat subsets ofparticles as rigid bodies.

Outcomes. There are three possible out-comes of the calculation. First, if only a singlemodel satisfies all input information, there isprobably sufficient data for prediction of theunique native state. Second, if different mod-els are consistent with the input information,the data are insufficient to define the singlenative state, or there are multiple native struc-tures. If the number of distinct models is small,the structural differences between the modelsmay suggest additional experiments to narrowdown the possible solutions. Third, if no mod-els satisfy all input information, the data ortheir interpretation in terms of the restraintsare incorrect.

Analysis. In general, a number of differentconfigurations may be consistent with the in-put restraints. The aim is to obtain as manystructures as possible that satisfy all inputrestraints. To comprehensively sample suchstructural solutions consistent with the data,

independent optimizations of randomly gen-erated initial configurations need to be per-formed until an ensemble of structures sat-isfying the input restraints is obtained. Theensemble can then be analyzed in terms ofassembly features, such as the protein posi-tions, contacts, and configuration. These fea-tures can generally vary among the individ-ual models in the ensemble. To analyze thisvariability, a probability distribution of eachfeature can be calculated from the ensemble.Of particular interest are the features that arepresent in most configurations in the ensem-ble and have a single maximum in their prob-ability distribution. The spread around themaximum describes how precisely the featurewas determined by the input restraints. Whenmultiple maxima are present in the feature dis-tribution at the precision of interest, the inputrestraints are insufficient to define the singlenative state of the corresponding feature (orthere are multiple native states).

Predicting accuracy. Assessing the accuracyof a structure is important and difficult.The accuracy of a model is defined as thedifference between the model and the na-tive structure. Therefore, it is impossible toknow with certainty the accuracy of the pro-posed structure without knowing the real na-tive structure. Nevertheless, our confidencecan be modulated by five considerations:(a) self-consistency of independent experi-mental data; (b) structural similarity amongall configurations in the ensemble that sat-isfy the input restraints; (c) simulations wherea native structure is assumed, correspondingrestraints are simulated from it, and the re-sulting calculated structure is compared withthe assumed native structure; (d ) confirma-tory spatial data that were not used in the cal-culation of the structure (e.g., criterion similarto the crystallographic free R-factor (180) canbe used to assess both the model accuracy andthe harmony among the input restraints); and(e) patterns emerging from a mapping of in-dependent and unused data on the structurethat are unlikely to occur by chance (18, 19).

464 Alber et al.

Ann

u. R

ev. B

ioch

em. 2

008.

77:4

43-4

77. D

ownl

oade

d fr

om a

rjou

rnal

s.an

nual

revi

ews.

org

by U

nive

rsity

of

Que

ensl

and

on 0

3/23

/09.

For

per

sona

l use

onl

y.

ANRV345-BI77-19 ARI 28 April 2008 15:26

Minimal spanning treeSpanning trees

Selectinteraction

Selectinteraction

interface

a b

cOb

Oa

Oa O

aOa

Oa

Oa

I II III

...

Figure 8Conditional restraint (19). As an example, shown is a conditional restraint on protein contacts derivedfrom a single affinity purification experiment that identified 4 protein types ( yellow, blue, red, green),obtained from an assembly containing a single copy of the yellow, blue, and red protein and two copies ofthe green protein (18, 19, 67). (a) A single protein is represented by either one bead (blue and greenproteins) or two beads ( yellow and red proteins); alternative interactions between proteins are indicated bydifferent edges. (b) Protein contacts are selected in a decision tree-like evaluation process by operatorfunctions Oa and Ob. Red vertical lines indicate restraints that encode a protein contact; thick verticallines are a subset of restraints that are selected for contribution to the final value of the conditionalrestraint, whereas dotted vertical lines indicate restraints that are not selected. Also shown are spanningtrees of a “composite graph.” (c) The composite graph is a fully connected graph that consists of nodesfor all identified protein types (square nodes) and edges for all pairwise interactions between protein types(left of the Ob operator); edge weights correspond to violations of interaction restraints and quantify howconsistent the corresponding interaction is with the current assembly structure. A “spanning tree” is agraph with the smallest possible number of edges that connect all nodes; a subset of 4 out of 16 spanningtrees is indicated to the right of the Ob operator. The “minimal spanning tree” is the spanning tree withthe minimal sum of edge weights (i.e., restraints violations). The sample affinity purification implies thatat least three of the following six possible types of interactions must occur: blue-red, blue-yellow,blue-green, red-green, red-yellow, and yellow-green. In addition, (i ) the three selected interactions mustform a spanning tree of the composite graph; (ii ) each type of interaction can involve either copy of thegreen protein; and (iii ) each protein can interact through any of its beads. These considerations can beencoded through a tree-like evaluation of the conditional restraint. At the top level, all possible bead-beadinteractions between all protein copies are clustered by protein types. Each alternative bead interactioncan be restrained by a restraint corresponding to a harmonic upper bound on the distance between thebeads; these are termed “optional restraints” because only a subset is selected for contribution to the finalvalue of the conditional restraint. Next, an operator function (Oa) selects only the least violated optionalrestraint from each interaction type, resulting in six restraints (thick red lines) at the middle level of thetree. Finally, a minimal spanning tree operator (Ob) finds the minimal spanning tree corresponding to thecombination of three restraints that are most consistent with the affinity purification (thick red lines). Thewhole restraint evaluation process is executed at each optimization step on the basis of the currentconfiguration, thus resulting in possibly different subsets of selected optional restraints at each step (19).


Ann

u. R

ev. B

ioch

em. 2

008.

77:4

43-4

77. D

ownl

oade

d fr

om a

rjou

rnal

s.an

nual

revi

ews.

org

by U

nive

rsity

of

Que

ensl

and

on 0

3/23

/09.

For

per

sona

l use

onl

y.

ANRV345-BI77-19 ARI 28 April 2008 15:26

Advantages. The integrative approach tostructure determination has several advan-tages. It benefits from the synergy among theinput data, minimizing the drawback of in-complete, inaccurate, and/or imprecise datasets (although each individual restraint maycontain little structural information, the con-current satisfaction of all restraints derivedfrom independent experiments may drasti-cally reduce the degeneracy of structural so-lutions). It can potentially produce all struc-tures that are consistent with the data, notjust one. The variation among the struc-tures, consistent with the data, allows an as-sessment of the sufficiency of the data andthe precision of the representative structure.Finally, this approach makes the process ofstructure determination more efficient by in-dicating what measurements would be mostinformative.

Structural Characterization ofthe Nuclear Pore Complex

Using the approach outlined above, we de-termined the native configuration of proteinsin the yeast NPC (18, 19). These NPCs arelarge (∼50 MDa) proteinaceous assembliesspanning the NE, where they function as thesole mediators of bidirectional macromolecu-lar exchange between the nucleoplasmic andcytoplasmic compartments in all eukaryotes(181). EM images of the yeast NPC at ∼200-Aresolution revealed that the nuclear poreforms a channel by stacking two similar rings,each one consisting of eight radially arranged“half-spoke” units (182). The yeast NPC isbuilt from multiple copies of 30 different pro-teins, totaling ∼456 proteins (called nucleo-porins or nups).

Although low-resolution EM has providedvaluable insights into the overall shape of theNPC, the spatial configuration of its com-ponent proteins and the detailed interactionnetwork between them were unknown. A de-scription of the NPC’s structure was neededto understand its function and assembly as

well as to provide clues to its evolutionaryorigins. Owing to its size and flexibility, de-tailed structural characterization of the com-plete NPC assembly has proven to be extraor-dinarily challenging. Further compoundingthe problem, atomic structures have only beensolved for domains covering ∼5% of the pro-tein sequence (183).

To determine the protein configuration ofthe NPC, we collected a large and diverse setof biophysical and biochemical data. The datawere derived from six experimental sources(Figure 9).

1. Quantitative immunoblotting experi-ments determined the stoichiometry ofall 30 nups in the NPC.

2. Hydrodynamics experiments providedinformation about the approximate ex-cluded volume and the coarse shape ofeach nup.

3. Immuno-EM provided a coarse local-ization for each nup along two principalaxes of the NPC.

4. An exhaustive set of affinity purificationexperiments determined the composi-tion of 77 NPC complexes.

5. Overlay experiments determined fivedirect binary nup interactions.

6. Symmetry considerations and the di-mensions of the NE were extractedfrom cryo-EM. Moreover, bioinformat-ics analysis provided information aboutthe position of transmembrane helicesfor the three integral membrane nups.These data were translated into spatialrestraints on the NPC (Figure 9).

The relative positions and proximities ofthe NPC’s constituent proteins were then pro-duced by satisfying these spatial restraints,using the approach described above and il-lustrated in Figure 10. Optimization relieson conjugate gradients and molecular dy-namics with simulated annealing. It startswith a random configuration of proteinsand then iteratively moves these proteins soas to minimize violations of the restraints(Figure 10). To comprehensively sample all

466 Alber et al.

Ann

u. R

ev. B

ioch

em. 2

008.

77:4

43-4

77. D

ownl

oade

d fr

om a

rjou

rnal

s.an

nual

revi

ews.

org

by U

nive

rsity

of

Que

ensl

and

on 0

3/23

/09.

For

per

sona

l use

onl

y.

ANRV345-BI77-19 ARI 28 April 2008 15:26

Z

R

Z

R

Z

R

Z

R

Pro

tein

co

nta

cts

Pro

tein

co

nfi

gu

rati

on

Pro

tein

po

sit

ion

c da

Der

ive

the

stru

ctur

e fr

om th

e en

sem

ble.

Ass

ess

the

stru

ctur

e.

Ultr

acen

trifu

gatio

nQ

uant

itativ

eim

mun

oblo

tting

Ove

rlay

assa

yE

lect

ron

mic

rosc

opy

Imm

uno-

elec

tron

mic

rosc

opy

Bio

info

rmat

ics

&m

embr

ane

frac

tiona

tion

Ele

ctro

nm

icro

scop

y m

ap

10,6

15 g

old

part

icle

s30

pro

tein

se

quen

ces

Affi

nity

purif

icat

ion

30 S

-val

ues

1 S

-val

ue30

rel

ativ

eab

unda

nces

75 c

ompo

site

s7

cont

acts

Pro

tein

shap

eC

ompl

exsh

ape

Pro

tein

stoi

chio

met

ryP

rote

inco

nnec

tivity

inco

mpo

site

s

Pro

tein

cont

acts

NP

Csy

mm

etry

Nuc

lear

en

velo

peex

clud

ed

volu

me

Pro

tein

loca

lizat

ion

Nuc

lear

enve

lope

surf

ace

loca

lizat

ion

Pro

tein

ex

clud

edvo

lum

e

1680

rest

rain

ts1

rest

rain

t30

pro

tein

s45

6 co

pies

3344

res

trai

nts

208

rest

rain

ts87

6 be

ads

3648

rest

rain

ts19

2 r

estr

aint

s18

65pr

otei

n be

ads

Pro

duce

an

ense

mbl

e of

sol

utio

ns th

atsa

tisfy

the

inpu

t res

trai

nts,

sta

rtin

g fr

omm

any

rand

om c

onfig

urat

ions

.O

ptim

izat

ion

Ens

embl

e an

alys

is

Dat

a ge

nera

tion

bD

ata

trans

latio

nin

to s

patia

lre

stra

ints

Figu

re9

Det

erm

inin

gth

ear

chite

ctur

eof

the

NP

Cby

inte

grat

ing

spat

ialr

estr

aint

sfr

ompr

oteo

mic

data

(19)

.Fir

st,(

a)va

riou

sex

peri

men

ts(b

lack

)gen

erat

est

ruct

ural

data

(red

).Se

cond

(b),

the

data

and

theo

retic

alco

nsid

erat

ions

are

expr

esse

das

spat

ialr

estr

aint

s(bl

ue).

Thi

rd(c

),an

ense

mbl

eof

stru

ctur

also

lutio

nsth

atsa

tisfy

the

data

isob

tain

edby

min

imiz

ing

the

viol

atio

nsof

the

spat

ialr

estr

aint

s,st

artin

gfr

omm

any

diff

eren

tran

dom

confi

gura

tions

.Fou

rth

(d),

the

ense

mbl

eis

clus

tere

din

tose

tsof

dist

inct

solu

tions

asw

ella

san

alyz

edin

term

sof

prot

ein

posi

tions

,con

tact

s,an

dco

nfigu

ratio

n.


Ann

u. R

ev. B

ioch

em. 2

008.

77:4

43-4

77. D

ownl

oade

d fr

om a

rjou

rnal

s.an

nual

revi

ews.

org

by U

nive

rsity

of

Que

ensl

and

on 0

3/23

/09.

For

per

sona

l use

onl

y.

ANRV345-BI77-19 ARI 28 April 2008 15:26

1 2 3 4 5 6a

0.60

0.52

0.44

0.36

0.68

b

Av

era

ge

co

nta

ct

sim

ilari

ty

1010 108 106 104 102 0

Score Score

2000 40000

Nu

mb

er

of

Co

nfi

gu

rati

on

s

300

200

100

0

c

12

3 45

6

Figure 10Calculation of the NPC bead structure by satisfaction of spatial restraints. (a) Representation of theoptimization process as it progresses from an initial random configuration to an optimal solution. (b) Thegraph shows the relationship between the score (a measure of the consistency between the configurationand the input data) and the average contact similarity. The contact similarity quantifies how similar twoconfigurations are in terms of the number and types of their protein contacts; two proteins are consideredto be in contact when they are sufficiently close to one another given their size and shape. The averagecontact similarity at a given score is determined from the contact similarities between the lowest scoringconfiguration and a sample of 100 configurations with that given score. Error bars indicate standarddeviation. Representative configurations at various stages of the optimization process from left (verylarge scores) to right (with a score of 0) are shown above the graph; a score of 0 indicates that all inputrestraints have been satisfied. As the score approaches zero, the contact similarity increases, showing thatthere is only a single cluster of closely related configurations that satisfies the input data. (c) Distributionof configuration scores demonstrates that our sampling procedure finds configurations consistent withthe input data. These configurations satisfy all the input restraints within the experimental error (19).

possible structural solutions that are consis-tent with the data, we obtained an ensembleof 1000 independently calculated structuresthat satisfied the input restraints (Figure 10c).After superposition of these structures, theensemble was converted into the probabil-ity of finding a given protein at any point inspace (i.e., the localization probability). Theresulting localization probabilities yielded asingle pronounced maximum for almost ev-ery protein, demonstrating that the inputrestraints define a single NPC architecture

(Figure 11). The average standard deviationfor the separation between neighboring pro-tein centroids is 5 nm. Given that this levelof precision is less than the diameter of manyproteins, our map is sufficient to determinethe relative position of proteins in the NPC.Although each individual restraint may con-tain little structural information, the concur-rent satisfaction of all restraints derived fromindependent experiments drastically reducesthe degeneracy of the structural solutions(Figure 12).

468 Alber et al.

Ann

u. R

ev. B

ioch

em. 2

008.

77:4

43-4

77. D

ownl

oade

d fr

om a

rjou

rnal

s.an

nual

revi

ews.

org

by U

nive

rsity

of

Que

ensl

and

on 0

3/23

/09.

For

per

sona

l use

onl

y.

ANRV345-BI77-19 ARI 28 April 2008 15:26

FG nups

Inner rings Membrane rings

Linker nupsOuter rings

5 nmCytoplasm

Nucleoplasm

Figure 11Localization of major substructures and their component proteins in the NPC. The proteins arerepresented by their localization volumes and have been colored according to their classification into fivedistinct substructures on the basis of their location and functional properties: the outer rings in yellow, theinner rings in purple, the membrane rings in brown, the linker nups in blue and pink, and the FG nups(for which only the structured domains are shown) in green. The pore membrane is shown in gray (18).

Our structure (Figure 11) reveals that halfof the NPC is made of a core scaffold, which isstructurally analogous to vesicle coating com-plexes. This scaffold forms an interlaced net-work that coats the entire curved surface ofthe NE within which the NPC is embedded.The selective barrier for transport is formedby large numbers of proteins with disorderedregions that line the inner face of the scaf-fold. The NPC consists of only a few struc-tural modules. These modules resemble eachother in terms of the configuration of theirhomologous constituents. The architecture ofthe NPC thus appears to be based on the hier-

archical repetition of the modules that likelyevolved through a series of gene duplicationsand divergences. Thus, the determination ofthe NPC configuration in combination withthe fold prediction (183, 184) of its constituentproteins can provide clues to the ancient evo-lutionary origins of the NPC.

In the future, we envision combiningcryo-ET, proteomics, cross-linking, cryo-EMof subcomplexes, and experimentally de-termined or modeled atomic structures ofthe individual subunits to obtain a pseu-doatomic model of the whole NPC assemblyin action.


Ann

u. R

ev. B

ioch

em. 2

008.

77:4

43-4

77. D

ownl

oade

d fr

om a

rjou

rnal

s.an

nual

revi

ews.

org

by U

nive

rsity

of

Que

ensl

and

on 0

3/23

/09.

For

per

sona

l use

onl

y.

ANRV345-BI77-19 ARI 28 April 2008 15:26

Immuno-EM UltracentrifugationNup stoichiometyNPC symmetry

NE pore volume Overlay assaysAffinity purifications

+ + +

Protein positions

Figure 12Synergy between varied datasets results in increased precision of structure determination. The proteinsare increasingly localized by the addition of different types of synergistic experimental information. As anexample, each panel illustrates the localization of 16 copies of Nup192 in the ensemble of nuclear porecomplex (NPC) structures generated, using the datasets indicated below. The smaller the volume (red ),the better localized is the protein. The NPC structure is therefore essentially “molded” into shape by thelarge amount of experimental data (19). Abbreviations: NE, nuclear envelope; Nup, nucleoporin protein.

CONCLUSIONS

There is a wide spectrum of experimental andcomputational methods for identification andstructural characterization of macromolecularcomplexes. The data from these methods needto be combined through integrative compu-tational approaches to achieve higher resolu-tion, accuracy, precision, completeness, andefficiency than any of the individual meth-ods. New methods must be capable of gen-erating possible alternative models consistentwith information from various sources, such

as stoichiometry, interaction data, similarityto known structures, docking results, and low-resolution images.

Structural biology is a great unifying dis-cipline of biology. Thus, structural charac-terization of many protein complexes willbridge the gaps between genome sequencing,functional genomics, proteomics, and systemsbiology. The goal seems daunting, but theprize will be commensurate with the effortinvested, given the importance of molecularmachines and functional networks in biologyand medicine.

SUMMARY POINTS

1. To understand the cell, we need to determine the structures of macromolecular as-semblies, many of which consist of tens and even hundreds of components.

2. A variety of experimental methods exists that generates structural information aboutassemblies, from atomic-resolution data to coarse descriptions of the component ar-rangement in the complex.

3. To maximize the completeness, accuracy, and resolution of the structural determi-nation, a computational approach is needed that can use spatial information from avariety of experimental methods.

4. The complete process of structure determination can be seen as a potentially iterativeseries of four steps, including data generation by experiments, data interpretation interms of spatial restraints, calculation of an ensemble of structures by satisfaction ofspatial restraints, and an analysis of the ensemble.

470 Alber et al.

Ann

u. R

ev. B

ioch

em. 2

008.

77:4

43-4

77. D

ownl

oade

d fr

om a

rjou

rnal

s.an

nual

revi

ews.

org

by U

nive

rsity

of

Que

ensl

and

on 0

3/23

/09.

For

per

sona

l use

onl

y.

ANRV345-BI77-19 ARI 28 April 2008 15:26

5. The structure calculation part of this process is conveniently expressed as an optimiza-tion problem, and the solution requires three main components: the representationof an assembly, a scoring function, and optimization.

6. The power of the integrative approach is illustrated by its use of the proteomic datato define the configuration of proteins in large assemblies, such as the NPC.

FUTURE ISSUES

1. Information from experimental and theoretical sources in terms of individual spatialrestraints needs to be quantified.

2. Individual restraints should be combined into an accurate scoring function.

3. Thorough sampling schemes for finding good solutions to a scoring function areneeded.

4. Methods for a comprehensive analysis of the ensemble of models consistent with thedata can be developed.

5. Accurate methods for predicting the likely accuracy of the input data and the corre-sponding structure are needed.

6. Robust, efficient, user friendly, and generally applicable computer software for calcu-lating assembly structures on the basis of varied datasets should be developed.

7. Descriptions of the structures as well as dynamics of both stable and transient com-plexes are needed.

DISCLOSURE STATEMENT

The authors are not aware of any biases that might be perceived as affecting the objectivity ofthis review.

ACKNOWLEDGMENTS

We are grateful to Michael P. Rout, Brian T. Chait, John Aitchison, Chris Akey, Wah Chiu,David Agard, Wolfgang Baumeister, Joachim Frank, Fred Davis, M.S. Madhusudhan, Min-yiShen, Michael Kim, Keren Lasker, Daniel Russell, Javier Velazquez-Muriel, Bret Peterson,and Ben Webb for many discussions about structure characterization by satisfaction of spatialrestraints. We are also thankful to Svetlana Dokudovskaya, Liesbeth Veenhoff, Whenzu Zhang,Julia Kipper, Damien Devos, Adisetyantari Suprapto, Orit Karni-Schmidt, and RosemaryWilliams for their contribution to the determination of the NPC structure. F.F. has beenfunded by a Human Frontier Science Organization long-term fellowship. M.T. is grateful fora MRC career development award. We also acknowledge support from the Sandler FamilySupporting Foundation, NIH/NCRR U54 RR022220, NIH R01 GM54762, Human Fron-tier Science Program, NSF IIS 0705196, and NSF EIA-0324645. And we are grateful forcomputer hardware gifts from Ron Conway, Mike Homer, Intel, Hewlett-Packard, IBM,and Netapp.


Ann

u. R

ev. B

ioch

em. 2

008.

77:4

43-4

77. D

ownl

oade

d fr

om a

rjou

rnal

s.an

nual

revi

ews.

org

by U

nive

rsity

of

Que

ensl

and

on 0

3/23

/09.

For

per

sona

l use

onl

y.

ANRV345-BI77-19 ARI 28 April 2008 15:26

LITERATURE CITED

1. Alberts B. 1998. Cell 92:291–942. Sali A, Kuriyan J. 1999. Trends Cell Biol. 9:M20–243. Sali A, Glaeser R, Earnest T, Baumeister W. 2003. Nature 422:216–254. Sali A. 2003. Structure 11:1043–475. Robinson C, Sali A, Baumeister W. 2007. Nature 450:973–826. Aebersold R, Mann M. 2003. Nature 422:198–2077. Russell RB, Alber F, Aloy P, Davis FP, Korkin D, et al. 2004. Curr. Opin. Struct. Biol.

14:313–248. Uetz P, Giot L, Cagney G, Mansfield TA, Judson RS, et al. 2000. Nature 403:623–279. Ito T, Tashiro K, Muta S, Ozawa R, Chiba T, et al. 2000. Proc. Natl. Acad. Sci. USA

97:1143–4710. Collins SR, Kemmeren P, Zhao XC, Greenblatt JF, Spencer F, et al. 2007. Mol. Cell

Proteomics 6:439–5011. Krogan NJ, Cagney G, Yu H, Zhong G, Guo X, et al. 2006. Nature 440:637–4312. Gavin AC, Aloy P, Grandi P, Krause R, Boesche M, et al. 2006. Nature 440:631–3613. Devos D, Russell RB. 2007. Curr. Opin. Struct. Biol. 17:370–7714. Abbott A. 2002. Nature 417:894–9615. Harris ME, Nolan JM, Malhotra A, Brown JW, Harvey SC, Pace NR. 1994. EMBO J.

13:3953–6316. Malhotra A, Harvey SC. 1994. J. Mol. Biol. 240:308–4017. Alber F, Eswar N, Sali A. 2004. In Practical Bioinformatics, ed. JM Bujnicki, pp. 73–96.

Germany: Springer-Verlag18. Alber F, Dokudovskaya S, Veenhoff L, Zhang W, Kipper J, et al. 2007. Nature 450:695–

70119. Alber F, Dokudovskaya S, Veenhoff L, Zhang W, Kipper J, et al. 2007. Nature 450:683–9420. Adams PD, Pannu NS, Read RJ, Brunger AT. 1999. Acta Crystallogr. D Biol. Crystallogr.

55(Part 1):181–9021. Cramer P, Bushnell DA, Fu J, Gnatt AL, Maier-Davis B, et al. 2000. Science 288:640–4922. Fiaux J, Bertelsen EB, Horwich AL, Wuthrich K. 2002. Nature 418:207–1123. Bonvin AM, Boelens R, Kaptein R. 2005. Curr. Opin. Chem. Biol. 9:501–824. Rieping W, Habeck M, Nilges M. 2005. Science 309:303–625. Zuiderweg ER. January 2002. Biochemistry 41:1–726. McCoy MA, Wyss DF. 2002. J. Am. Chem. Soc. 124:2104–527. van Dijk AD, Boelens R, Bonvin AM. 2005. FEBS J. 272:293–31228. Vaynberg J, Qin J. 2006. Trends Biotechnol. 24:22–2729. Tang C, Iwahara J, Clore GM. 2006. Nature 444:383–8630. Burz DS, Dutta K, Cowburn D, Shekhtman A. 2006. Nat. Methods 3:91–9331. Berman HM, Battistuz T, Bhat TN, Bluhm WF, Bourne PE, et al. 2002. Acta Crystallogr.

D Biol. Crystallogr. 58:899–90732. Marti-Renom MA, Stuart AC, Fiser A, Sanchez R, Melo F, Sali A. 2000. Annu. Rev.

Biophys. Biomol. Struct. 29:291–32533. Davis FP, Braberg H, Shen MY, Pieper U, Sali A, Madhusudhan MS. 2006. Nucleic Acids

Res. 34:2943–5234. Shen MY, Sali A. 2006. Protein Sci. 15:2507–2435. Levy ED, Pereira-Leal JB, Chothia C, Teichmann SA. 2006. PLoS Comput. Biol. 2:e15536. Kuhlbrandt W, Williams KA. 1999. Curr. Opin. Chem. Biol. 3:537–4337. Fujiyoshi Y. 1998. Adv. Biophys. 35:25–80

472 Alber et al.

Ann

u. R

ev. B

ioch

em. 2

008.

77:4

43-4

77. D

ownl

oade

d fr

om a

rjou

rnal

s.an

nual

revi

ews.

org

by U

nive

rsity

of

Que

ensl

and

on 0

3/23

/09.

For

per

sona

l use

onl

y.

ANRV345-BI77-19 ARI 28 April 2008 15:26

38. Frank J. 2006. Three-Dimensional Electron Microscopy of Macromolecular Assemblies.Oxford: Oxford Univ. Press

39. van Heel M, Gowen B, Matadeen R, Orlova EV, Finn R, et al. 2000. Q. Rev. Biophys.33:307–69

40. Frank J, ed. 2006. Electron Tomography: Methods for Three-dimensional Visualization of Struc-tures in the Cell. New York: Springer. 455 pp.

41. Lucic V, Forster F, Baumeister W. 2005. Annu. Rev. Biochem. 74:833–6542. McIntosh R, Nicastro D, Mastronarde D. 2005. Trends Cell Biol. 15:43–5143. Yonekura K, Maki-Yonekura S, Namba K. 2003. Nature 424:643–5044. Fleishman SJ, Ben-Tal N. 2006. Curr. Opin. Struct. Biol. 16:496–50445. Henderson R. 2004. Q. Rev. Biophys. 37:3–1346. Mitra K, Frank J. 2006. Annu. Rev. Biophys. Biomol. Struct. 35:299–31747. Chiu W, Baker ML, Jiang W, Dougherty M, Schmid MF. 2005. Structure 13:363–7248. Rossmann MG, Morais MC, Leiman PG, Zhang W. 2005. Structure 13:355–6249. Johnson JE, Chiu W. 2007. Curr. Opin. Struct. Biol. 17:237–4350. Orlova EV, Saibil HR. 2004. Curr. Opin. Struct. Biol. 14:584–9051. Tagari M, Newman R, Chagoyen M, Carazo JM, Henrick K. 2002. Trends Biochem. Sci.

27:58952. Medalia O, Weber I, Frangakis AS, Nicastro D, Gerisch G, Baumeister W. 2002. Science

298:1209–1353. Al-Amoudi A, Norlen LP, Dubochet J. 2004. J. Struct. Biol. 148:131–3554. Grunewald K, Desai P, Winkler DC, Heymann JB, Belnap DM, et al. 2003. Science

302:1396–9855. Forster F, Hegerl R. 2007. Methods Cell Biol. 79:741–6756. Roux KH, Taylor KA. 2007. Curr. Opin. Struct. Biol. 17:244–5257. Beck M, Forster F, Ecke M, Plitzko JM, Melchior F, et al. 2004. Science 306:1387–9058. Stoffler D, Feja B, Fahrenkrog B, Walz J, Typke D, Aebi U. 2003. J. Mol. Biol. 328:119–3059. Beck M, Lucic V, Forster F, Baumeister W, Medalia O. 2007. Nature 449: 611–1560. Bohm J, Frangakis AS, Hegerl R, Nickell S, Typke D, Baumeister W. 2000. Proc. Natl.

Acad. Sci. USA 97:14245–5061. Frangakis AS, Bohm J, Forster F, Nickell S, Nicastro D, et al. October 2002. Proc. Natl.

Acad. Sci. USA 99:14153–5862. Ortiz JO, Forster F, Kurner J, Linaroudis AA, Baumeister W. 2006. J. Struct. Biol.

156:334–4163. Koch MH, Vachette P, Svergun DI. 2003. Q. Rev. Biophys. 36:147–22764. Nagar B, Kuriyan J. 2005. Structure 13:169–7065. Sondermann H, Nagar B, Bar-Sagi D, Kuriyan J. 2005. Proc. Natl. Acad. Sci. USA

102:16632–3766. Yamagata A, Tainer JA. 2007. EMBO J. 26:878–9067. Alber F, Kim MF, Sali A. 2005. Structure 13:435–4568. Parrish JR, Gulyas KD, Finley RL Jr. 2006. Curr. Opin. Biotechnol. 17:387–9369. Rual JF, Venkatesan K, Hao T, Hirozane-Kishikawa T, Dricot A, et al. 2005. Nature

437:1173–7870. Michnick SW, Ear PH, Manderson EN, Remy I, Stefan E. 2007. Nat. Rev. Drug Discov.

6:569–8271. Landgraf C, Panni S, Montecchi-Palazzi L, Castagnoli L, Schneider-Mergener J, et al.

2004. PLoS Biol. 2:E1472. MacBeath G, Schreiber SL. 2000. Science 289:1760–63


Ann

u. R

ev. B

ioch

em. 2

008.

77:4

43-4

77. D

ownl

oade

d fr

om a

rjou

rnal

s.an

nual

revi

ews.

org

by U

nive

rsity

of

Que

ensl

and

on 0

3/23

/09.

For

per

sona

l use

onl

y.

ANRV345-BI77-19 ARI 28 April 2008 15:26

73. Lakey JH, Raggett EM. 1998. Curr. Opin. Struct. Biol. 8:119–2374. Nedelkov D, Nelson RW. 2003. Trends Biotechnol. 21:301–575. Piehler J. 2005. Curr. Opin. Struct. Biol. 15:4–1476. Collins SR, Miller KM, Maas NL, Roguev A, Fillingham J, et al. 2007. Nature 446:806–1077. Aloy P, Russell RB. 2002. Proc. Natl. Acad. Sci. USA 99:5896–90178. Fields S. 2005. FEBS J. 272:5391–9979. Bauer A, Kuster B. 2003. Eur. J. Biochem. 270:570–7880. Tackett AJ, Dilworth DJ, Davey MJ, O’Donnell M, Aitchison JD, et al. 2005. J. Cell Biol.

169:35–4781. Krogan NJ, Peng WT, Cagney G, Robinson MD, Haw R, et al. 2004. Mol. Cell 13:225–3982. Cristea IM, Williams R, Chait BT, Rout MP. 2005. Mol. Cell Proteomics 4:1933–4183. Cristea IM, Carroll JW, Rout MP, Rice CM, Chait BT, MacDonald MR. 2006. J. Biol.

Chem. 281:30269–7884. Sharon M, Robinson CV. 2007. Annu. Rev. Biochem. 76:167–9385. Sharon M, Taverner T, Ambroggio XI, Deshaies RJ, Robinson CV. 2006. PLoS Biol.

4:e26786. Hainfeld JF, Powell RD. 2000. J. Histochem. Cytochem. 48:471–8087. Rout MP, Aitchison JD, Suprapto A, Hjertaas K, Zhao Y, Chait BT. 2000. J. Cell Biol.

148:635–5188. Pye VE, Beuron F, Keetch CA, McKeown C, Robinson CV, et al. 2007. Proc. Natl. Acad.

Sci. USA 104:467–7289. Martin-Benito J, Grantham J, Boskovic J, Brackley KI, Carrascosa JL, et al. 2007. EMBO

Rep. 8:252–5790. Golas MM, Sander B, Will CL, Luhrmann R, Stark H. 2005. Mol. Cell 17:869–8391. Sivasubramanian A, Chao G, Pressler HM, Wittrup KD, Gray JJ. 2006. Structure 14:401–

1492. Mohd-Sarip A, van der Knaap JA, Wyman C, Kanaar R, Schedl P, Verrijzer CP. 2006.

Mol. Cell 24:91–10093. Guan JQ, Almo SC, Reisler E, Chance MR. 2003. Biochemistry 42:11992–200094. Anand GS, Law D, Mandell JG, Snead AN, Tsigelny I, et al. 2003. Proc. Natl. Acad. Sci.

USA 100:13264–6995. Trester-Zedlitz M, Kamada K, Burley SK, Fenyo D, Chait BT, Muir TW. 2003. J. Am.

Chem. Soc. 125:2416–2596. Seebacher J, Mallick P, Zhang N, Eddes JS, Aebersold R, Gelb MH. 2006. J. Proteome

Res. 5:2270–8297. Sinz A. 2006. Mass Spectrom. Rev. 25:663–8298. Muller EG, Snydsman BE, Novik I, Hailey DW, Gestaut DR, et al. 2005. Mol. Biol. Cell

16:3341–5299. Truong K, Ikura M. 2001. Curr. Opin. Struct. Biol. 11:573–78

100. Yan Y, Marriott G. 2003. Curr. Opin. Chem. Biol. 7:635–40101. Bhatnagar J, Freed JH, Crane BR. 2007. Methods Enzymol. 423:117–33102. Park SY, Borbat PP, Gonzalez-Bonet G, Bhatnagar J, Pollard AM, et al. 2006. Nat. Struct.

Mol. Biol. 13:400–7103. Aloy P, Bottcher B, Ceulemans H, Leutwein C, Mellwig C, et al. 2004. Science 303:2026–

29104. Pieper U, Eswar N, Davis FP, Braberg H, Madhusudhan MS, et al. 2006. Nucleic Acids

Res. 34:D291–95105. Cockell SJ, Oliva B, Jackson RM. 2007. Bioinformatics 23:573–81

474 Alber et al.

Ann

u. R

ev. B

ioch

em. 2

008.

77:4

43-4

77. D

ownl

oade

d fr

om a

rjou

rnal

s.an

nual

revi

ews.

org

by U

nive

rsity

of

Que

ensl

and

on 0

3/23

/09.

For

per

sona

l use

onl

y.

ANRV345-BI77-19 ARI 28 April 2008 15:26

106. Aloy P, Ceulemans H, Stark A, Russell RB. 2003. J. Mol. Biol. 332:989–98107. Lu L, Lu H, Skolnick J. 2002. Proteins 49:350–64108. Lu L, Arakaki AK, Lu H, Skolnick J. 2003. Genome Res. 13:1146–54109. Henrick K, Thornton JM. 1998. Trends Biochem. Sci. 23:358–61110. Tovchigrechko A, Wells CA, Vakser IA. 2002. Protein Sci. 11:1888–96111. Bordner AJ, Gorin AA. 2007. Proteins 68:488–502112. Schneidman-Duhovny D, Inbar Y, Nussinov R, Wolfson HJ. 2005. Proteins 60:224–31113. Fernandez-Recio J, Abagyan R, Totrov M. 2005. Proteins 60:308–13114. Wiehe K, Pierce B, Mintseris J, Tong WW, Anderson R, et al. 2005. Proteins 60:207–13115. Kozakov D, Brenke R, Comeau SR, Vajda S. 2006. Proteins 65:392–406116. Inbar Y, Benyamini H, Nussinov R, Wolfson HJ. 2005. Phys. Biol. 2:S156–65117. Man-Kuang Cheng T, Blundell TL, Fernandez-Recio J. 2007. Proteins 68:503–15118. Mendez R, Leplae R, Lensink MF, Wodak SJ. 2005. Proteins 60:150–69119. Kowalsman N, Eisenstein M. 2007. Bioinformatics 23:421–26120. Tobi D, Bahar I. 2005. Proc. Natl. Acad. Sci. USA 102:18908–13121. McKenna S, Moraes T, Pastushok L, Ptak C, Xiao W, et al. 2003. J. Biol. Chem.

278:13151–58122. Walters KJ, Lech PJ, Goh AM, Wang Q, Howley PM. 2003. Proc. Natl. Acad. Sci. USA

100:12694–99123. Law D, Hotchko M, Ten Eyck L. 2005. Proteins 60:302–7124. Chu F, Shan SO, Moustakas DT, Alber F, Egea PF, et al. 2004. Proc. Natl. Acad. Sci. USA

101:16454–59125. Schulz DM, Ihling C, Clore GM, Sinz A. 2004. Biochemistry 43:4703–15126. Matsuda T, Ikegami T, Nakajima N, Yamazaki T, Nakamura H. 2004. J. Biomol. NMR

29:325–38127. Gaboriaud C, Juanhuix J, Gruez A, Lacroix M, Darnault C, et al. 2003. J. Biol. Chem.

278:46974–82128. Azuma Y, Renault L, Garcia-Ranea JA, Valencia A, Nishimoto T, Wittinghofer A. 1999.

J. Mol. Biol. 289:1119–30129. Sachchidanand, Lequin O, Staunton D, Mulloy B, Forster MJ, et al. 2002. J. Biol. Chem.

277:50629–35130. Dobrodumov A, Gronenborn AM. 2003. Proteins 53:18–32131. Dominguez C, Bonvin AM, Winkler GS, van Schaik FM, Timmers HT, Boelens R. 2004.

Structure 12:633–44132. Eriksson MA, Roux B. 2002. Biophys. J. 83:2595–609133. Tomaselli S, Ragona L, Zetta L, Assfalg M, Ferranti P, et al. 2007. Proteins 69:177–91134. Ben-Zeev E, Zarivach R, Shoham M, Yonath A, Eisenstein M. 2003. J. Biomol. Struct.

Dyn. 20:669–76135. Ben-Zeev E, Eisenstein M. 2003. Proteins 52:24–27136. Fahmy A, Wagner G. 2002. J. Am. Chem. Soc. 124:1241–50137. Korkin D, Davis FP, Sali A. 2005. Protein Sci. 14:2350–60138. Korkin D, Davis FP, Alber F, Luong T, Shen MY, et al. 2006. PLoS Comput. Biol. 2:e153139. Davis FP, Sali A. 2005. Bioinformatics 21:1901–7140. Goddard TD, Huang CC, Ferrin TE. 2007. J. Struct. Biol. 157:281–87141. Frangakis AS, Forster F. 2004. Curr. Opin. Struct. Biol. 14:325–31142. Baker ML, Yu Z, Chiu W, Bajaj C. 2006. J. Struct. Biol. 156:432–41143. Zhou ZH, Baker ML, Jiang W, Dougherty M, Jakana J, et al. 2001. Nat. Struct. Biol.

8:868–73


Ann

u. R

ev. B

ioch

em. 2

008.

77:4

43-4

77. D

ownl

oade

d fr

om a

rjou

rnal

s.an

nual

revi

ews.

org

by U

nive

rsity

of

Que

ensl

and

on 0

3/23

/09.

For

per

sona

l use

onl

y.

ANRV345-BI77-19 ARI 28 April 2008 15:26

144. Eswar N, Webb B, Marti-Renom MA, Madhusudhan MS, Eramian D, et al. 2006. InCurrent Protocols Bioinformatics, ed. AD Baxevanis, GA Petsko, LD Stein, GD Stormo,Suppl. 15:5.6.1–30. New York: Wiley

145. Fabiola F, Chapman MS. 2005. Structure 13:389–400146. Topf M, Baker ML, John B, Chiu W, Sali A. 2005. J. Struct. Biol. 149:191–203147. Wriggers W, Milligan RA, McCammon JA. 1999. J. Struct. Biol. 125:185–95148. Volkmann N, Hanein D. 1999. J. Struct. Biol. 125:176–84149. Rossmann MG. 2000. Acta Crystallogr. D Biol. Crystallogr. 56(Part 10):1341–49150. Roseman AM. 2000. Acta Crystallogr. D Biol. Crystallogr. 56:1332–40151. Jiang W, Baker ML, Ludtke SJ, Chiu W. 2001. J. Mol. Biol. 308:1033–44152. Navaza J, Lepault J, Rey FA, Alvarez-Rua C, Borge J. 2002. Acta Crystallogr. D Biol.

Crystallogr. 58:1820–25153. Wu X, Milne JL, Borgnia MJ, Rostapshov AV, Subramaniam S, Brooks BR. 2003. J.

Struct. Biol. 141:63–76153a.Garzon JI, Kovacs J, Abagyan R, Chacon P. 2007. Bioinformatics 23:427–33154. Halperin I, Ma B, Wolfson H, Nussinov R. 2002. Proteins 47:409–43155. Baker D, Sali A. 2001. Science 294:93–96156. Velazquez-Muriel JA, Sorzano CO, Scheres SH, Carazo JM. 2005. J. Mol. Biol. 345:759–

71157. Velazquez-Muriel JA, Valle M, Santamaria-Pang A, Kakadiaris IA, Carazo JM. 2006.

Structure 14:1115–26158. Baker ML, Jiang W, Wedemeyer WJ, Rixon FJ, Baker D, Chiu W. 2006. PLoS Comput.

Biol. 2:e146159. Chen JZ, Furst J, Chapman MS, Grigorieff N. 2003. J. Struct. Biol. 144:144–51160. Tama F, Brooks CL. 2006. Annu. Rev. Biophys. Biomol. Struct. 35:115–33161. Topf M, Baker ML, Marti-Renom MA, Chiu W, Sali A. 2006. J. Mol. Biol. 357:1655–68162. Frangakis AS, Rath BK. 2006. In Electron Tomography, ed. J Frank, pp. 401–16. New York:

Springer Verlag163. Roseman AM. 2003. Ultramicroscopy 94:225–36164. Rath BK, Hegerl R, Leith A, Shaikh TR, Wagenknecht T, Frank J. 2003. J. Struct. Biol.

144:95–103165. Zheng W, Doniach S. 2005. Protein. Eng. Des. Sel. 18:209–19166. Shih AY, Denisov IG, Phillips JC, Sligar SG, Schulten K. 2005. Biophys. J. 88:548–56167. Stuhrmann H. 1970. Acta Crystallogr. A 26:297–306167a.Krukenberg KA, Forster F, Rice LM, Sali A, Agard DA. 2008. Structure. In press168. Chacon P, Moran F, Diaz JF, Pantos E, Andreu JM. 1998. Biophys. J. 74:2760–75169. Walther D, Cohen FE, Doniach S. 2000 . J. Appl. Crystallogr. 33:350–63170. Svergun DI. 1999. Biophys. J. 76:2879–86171. Svergun DI, Petoukhov MV, Koch MH. 2001. Biophys. J. 80:2946–53172. Petoukhov MV, Eady NA, Brown KA, Svergun DI. 2002. Biophys. J. 83:3113–25173. Petoukhov MV, Svergun DI. 2005. Biophys. J. 89:1237–50174. Bernado P, Mylonas E, Petoukhov MV, Blackledge M, Svergun DI. 2007. J. Am. Chem.

Soc. 129:5656–64175. Tidow H, Melero R, Mylonas E, Freund SM, Grossmann JG, et al. 2007. Proc. Natl. Acad.

Sci. USA 104:12324–29176. Wu Y, Tian X, Lu M, Chen M, Wang Q, Ma J. 2005. Structure 13:1587–97177. Grishaev A, Wu J, Trewhella J, Bax A. 2005. J. Am. Chem. Soc. 127:16621–28178. Phizicky E, Bastiaens PI, Zhu H, Snyder M, Fields S. 2003. Nature 422:208–15

476 Alber et al.

Ann

u. R

ev. B

ioch

em. 2

008.

77:4

43-4

77. D

ownl

oade

d fr

om a

rjou

rnal

s.an

nual

revi

ews.

org

by U

nive

rsity

of

Que

ensl

and

on 0

3/23

/09.

For

per

sona

l use

onl

y.

ANRV345-BI77-19 ARI 28 April 2008 15:26

179. Valencia A, Pazos F. 2002. Curr. Opin. Struct. Biol. 12:368–73180. Brunger AT. 1993. Acta Crystallogr. D Biol. Crystallogr. 49:24–36181. Lim RY, Fahrenkrog B. 2006. Curr. Opin. Cell Biol. 18:342–47182. Yang Q, Rout MP, Akey CW. 1998. Mol. Cell 1:223–34183. Devos D, Dokudovskaya S, Williams R, Alber F, Eswar N, et al. 2006. Proc. Natl. Acad.

Sci. USA 103:2172–77184. Devos D, Dokudovskaya S, Alber F, Williams R, Chait BT, et al. 2004. PLoS Biol. 2:e380185. Tama F, Miyashita O, Brooks CL 3rd. 2004. J. Mol. Biol. 337:985–99186. Chandramouli P, Topf M, Menetret J, Eswar N, Gutell R, et al. 2008. Structure. In press187. Topf M, Lasker K, Webb B, Wolfson H, Chiu W, Sali A. 2008. Structure 16:295–307


Ann

u. R

ev. B

ioch

em. 2

008.

77:4

43-4

77. D

ownl

oade

d fr

om a

rjou

rnal

s.an

nual

revi

ews.

org

by U

nive

rsity

of

Que

ensl

and

on 0

3/23

/09.

For

per

sona

l use

onl

y.

AR345-FM ARI 7 May 2008 14:43

Annual Review ofBiochemistry

Volume 77, 2008Contents

Prefatory Chapters

Discovery of G Protein SignalingZvi Selinger � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � �1

Moments of DiscoveryPaul Berg � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � 14

Single-Molecule Theme

In singulo Biochemistry: When Less Is MoreCarlos Bustamante � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � 45

Advances in Single-Molecule Fluorescence Methodsfor Molecular BiologyChirlmin Joo, Hamza Balci, Yuji Ishitsuka, Chittanon Buranachai,and Taekjip Ha � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � 51

How RNA Unfolds and RefoldsPan T.X. Li, Jeffrey Vieregg, and Ignacio Tinoco, Jr. � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � 77

Single-Molecule Studies of Protein FoldingAlessandro Borgia, Philip M. Williams, and Jane Clarke � � � � � � � � � � � � � � � � � � � � � � � � � � � � �101

Structure and Mechanics of Membrane ProteinsAndreas Engel and Hermann E. Gaub � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � �127

Single-Molecule Studies of RNA Polymerase: Motoring AlongKristina M. Herbert, William J. Greenleaf, and Steven M. Block � � � � � � � � � � � � � � � � � � � �149

Translation at the Single-Molecule LevelR. Andrew Marshall, Colin Echeverría Aitken, Magdalena Dorywalska,and Joseph D. Puglisi � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � �177

Recent Advances in Optical TweezersJeffrey R. Moffitt, Yann R. Chemla, Steven B. Smith, and Carlos Bustamante � � � � � �205

Recent Advances in Biochemistry

Mechanism of Eukaryotic Homologous RecombinationJoseph San Filippo, Patrick Sung, and Hannah Klein � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � �229

v

Ann

u. R

ev. B

ioch

em. 2

008.

77:4

43-4

77. D

ownl

oade

d fr

om a

rjou

rnal

s.an

nual

revi

ews.

org

by U

nive

rsity

of

Que

ensl

and

on 0

3/23

/09.

For

per

sona

l use

onl

y.

AR345-FM ARI 7 May 2008 14:43

Structural and Functional Relationships of the XPF/MUS81Family of ProteinsAlberto Ciccia, Neil McDonald, and Stephen C. West � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � �259

Fat and Beyond: The Diverse Biology of PPARγ

Peter Tontonoz and Bruce M. Spiegelman � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � �289

Eukaryotic DNA Ligases: Structural and Functional InsightsTom Ellenberger and Alan E. Tomkinson � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � �313

Structure and Energetics of the Hydrogen-Bonded Backbonein Protein FoldingD. Wayne Bolen and George D. Rose � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � �339

Macromolecular Modeling with RosettaRhiju Das and David Baker � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � �363

Activity-Based Protein Profiling: From Enzyme Chemistryto Proteomic ChemistryBenjamin F. Cravatt, Aaron T. Wright, and John W. Kozarich � � � � � � � � � � � � � � � � � � � � � �383

Analyzing Protein Interaction Networks Using Structural InformationChristina Kiel, Pedro Beltrao, and Luis Serrano � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � �415

Integrating Diverse Data for Structure Determinationof Macromolecular AssembliesFrank Alber, Friedrich Förster, Dmitry Korkin, Maya Topf, and Andrej Sali � � � � � � � �443

From the Determination of Complex Reaction Mechanismsto Systems BiologyJohn Ross � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � �479

Biochemistry and Physiology of Mammalian SecretedPhospholipases A2

Gerard Lambeau and Michael H. Gelb � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � �495

Glycosyltransferases: Structures, Functions, and MechanismsL.L. Lairson, B. Henrissat, G.J. Davies, and S.G. Withers � � � � � � � � � � � � � � � � � � � � � � � � � � �521

Structural Biology of the Tumor Suppressor p53Andreas C. Joerger and Alan R. Fersht � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � �557

Toward a Biomechanical Understanding of Whole Bacterial CellsDylan M. Morris and Grant J. Jensen � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � �583

How Does Synaptotagmin Trigger Neurotransmitter Release?Edwin R. Chapman � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � �615

Protein Translocation Across the Bacterial Cytoplasmic MembraneArnold J.M. Driessen and Nico Nouwen � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � �643

vi Contents

Ann

u. R

ev. B

ioch

em. 2

008.

77:4

43-4

77. D

ownl

oade

d fr

om a

rjou

rnal

s.an

nual

revi

ews.

org

by U

nive

rsity

of

Que

ensl

and

on 0

3/23

/09.

For

per

sona

l use

onl

y.

AR345-FM ARI 7 May 2008 14:43

Maturation of Iron-Sulfur Proteins in Eukaryotes: Mechanisms,Connected Processes, and DiseasesRoland Lill and Ulrich Mühlenhoff � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � �669

CFTR Function and Prospects for TherapyJohn R. Riordan � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � �701

Aging and Survival: The Genetics of Life Span Extensionby Dietary RestrictionWilliam Mair and Andrew Dillin � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � �727

Cellular Defenses against Superoxide and Hydrogen PeroxideJames A. Imlay � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � �755

Toward a Control Theory Analysis of AgingMichael P. Murphy and Linda Partridge � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � �777

Indexes

Cumulative Index of Contributing Authors, Volumes 73–77 � � � � � � � � � � � � � � � � � � � � � � � �799

Cumulative Index of Chapter Titles, Volumes 73–77 � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � �803

Errata

An online log of corrections to Annual Review of Biochemistry articles may be foundat http://biochem.annualreviews.org/errata.shtml

Contents vii

Ann

u. R

ev. B

ioch

em. 2

008.

77:4

43-4

77. D

ownl

oade

d fr

om a

rjou

rnal

s.an

nual

revi

ews.

org

by U

nive

rsity

of

Que

ensl

and

on 0

3/23

/09.

For

per

sona

l use

onl

y.

Integrating Diverse Data for Structure Determination of...

Documents

Transcript of Integrating Diverse Data for Structure Determination of...