Prof Shoba Ranganathan Dept. of Chemistry and Biomolecular Sciences, Macquarie University, Sydney,...

Post on 25-Dec-2015

229 views 0 download

Transcript of Prof Shoba Ranganathan Dept. of Chemistry and Biomolecular Sciences, Macquarie University, Sydney,...

Prof Shoba RanganathanProf Shoba Ranganathan

Dept. of Chemistry and Biomolecular Sciences, Dept. of Chemistry and Biomolecular Sciences, Macquarie University, Sydney, Australia &Macquarie University, Sydney, Australia &

Dept of Biochemistry, Yong Loo Lin School of MedicineDept of Biochemistry, Yong Loo Lin School of MedicineNational University of SingaporeNational University of Singapore

(shoba@bic.nus.edu.sg)(shoba@bic.nus.edu.sg)

Biomolecular Modeling:Biomolecular Modeling: building a 3D protein building a 3D protein

structure from its sequencestructure from its sequence

Why protein structure?Why protein structure? In the factory of the living cell, proteins are the

workers, performing a variety of tasks

Each protein adopts a particular folding pattern that determines its functionThe 3D structure of a protein brings

into close proximity residues that are far apart in the amino acid sequence

How does a protein fold?How does a protein fold? Most newly synthesized proteins fold

without assistance! Ribonuclease A: denatured protein

could refold and recover its activity (C. Anfinsen -1966)“Structure implies function”

The amino acid sequence encodes the protein’s structural information

1. Understanding Protein Structure

2. A Quick Overview of Sequence Analysis

3. Finding a Structural Homologue

4. Template Selection

5. Aligning the Query Sequence to Template Structure(s)

6. Building the Model

The basicsThe basics Proteins are linear heteropolymers: one or more

polypeptide chains Repeat units: 20 amino acid residues

Range from a few 10s-1000s Three-dimensional shapes (“folds”)

adopted vary enormously Experimental methods: X-ray

crystallography, electron microscopy and NMR (nuclear magnetic resonance)

The (L-)amino acidThe (L-)amino acid

N

R

C

O

O

C

+

-

Amino

Carboxylate

Side chain = H,CH3,…

Backbone

The peptide bondThe peptide bond

Coplanar atomsCoplanar atoms

Levels of protein structureLevels of protein structure

Zeroth: amino acid composition Primary

This is simply the order of covalent linkages along the polypeptide chain, i.e. the sequence itself

Levels of protein structureLevels of protein structure

Secondary Local organization of the protein backbone: -

helix, -strand (which assemble into -sheets), turn and interconnecting loop

Ramachandran / phi-psi plotRamachandran / phi-psi plot

-helix (right

handed)

-sheet

-helix (left handed)

Levels of protein structureLevels of protein structure

Tertiary packing of secondary

structure elements into a compact spatial unit

“Fold” or domain – this is the level to which structure prediction is currently possible

Levels of protein structureLevels of protein structure

Quaternary Assembly of homo- or

heteromeric protein chains

Usually the functional unit of a protein, especially for enzymes

Structural classesStructural classes

All- (helical) All- (sheet)

(parallel -sheet)

Structural classesStructural classes

(antiparallel -sheet)

Structural informationStructural information

Protein Data Bank: maintained by the Research Collaboratory for Structural Bioinformatics http://www.rcsb.org/pdb > 45,744 structures of proteins Also contains structures of DNA,

carbohydrates, protein-DNA complexes and numerous small ligand molecules.

The PDB dataThe PDB data

Text files Each entry is identified by a unique 4-

letter code: say 1emg 1emg entry

Header information Atomic coordinates in Å (1 Ångstrom

= 1.0e-10 m)

PDB Header detailsPDB Header details identifies the molecule, any modifications, date of

release of PDB entry

organism, keywords, method Authors, reference, resolution if X-ray structure Sequence, x-reference to sequence databases

HEADER GREENFLUORESCENT PROTEIN 12-NOV-98 1EMG TITLE GREEN FLUORESCENT PROTEIN (65-67 REPLACED BY CRO, S65T TITLE 2 SUBSTITUTION, Q80R) COMPND MOL_ID: 1; COMPND 2 MOLECULE: GREEN FLUORESCENT PROTEIN; COMPND 3 CHAIN: A; COMPND 4 ENGINEERED: YES; COMPND 5 MUTATION: 65 - 67 REPLACED BY CRO, S65T SUBSTITUTION, Q80R COMPND 6 SUBSTITUTION; COMPND 7 BIOLOGICAL_UNIT: MONOMER

The data itselfThe data itself

ATOM 1 N SER A 2 29.089 9.397 51.904 1.00 81.75 ATOM 2 CA SER A 2 27.883 10.162 52.185 1.00 79.71ATOM 3 C SER A 2 26.659 9.634 51.463 1.00 82.64 ATOM 4 O SER A 2 26.718 8.686 50.686 1.00 81.02 ATOM 5 CB SER A 2 28.039 11.660 51.932 1.00 75.59ATOM 6 OG SER A 2 27.582 12.038 50.639 1.00 43.28-------ATOM 1737 CD1 ILE A 229 39.535 21.584 52.346 1.00 41.62TER 1738 ILE A 229

Coordinates for each heavy (non-hydrogen) atom from the first residue to the last

Any ligands (starting with HETATM) follow the biomacromolecule

O of water molecules (also HETATM) at the end

Structural FamiliesStructural Families SCOP - Structural Classification Of

Proteins http://scop.mrc-lmb.cam.ac.uk/scop

FSSP – Family of Structurally Similar Proteins http://www.ebi.ac.uk/dali/fssp/

CATH – Class, Architecture, Topology, Homology http://www.biochem.ucl.ac.uk/bsm/cath

Structure comparison factsStructure comparison facts

Proteins adopt a limited number of topologies. Homologous sequences show very

similar structures, with strong conservation in secondary structural elements: variations in non-conserved regions.

In the absence of sequence homology, some folds are preferred by vastly different sequences.

Structure comparison factsStructure comparison facts The “active site” (a collection of functionally

critical residues) is remarkably conserved, even when the protein fold is different. Structural models (especially those based

on homology) provide insights into possible function for new proteins.

Implications for protein engineering ligand/drug design, function assignment of genomic data.

Visualizing PDB informationVisualizing PDB information RASMOL: most popular, available for all platforms (Sayle et al, 2005) http://www.bernstein-plus-sons.com/software/rasmol

DeepView Swiss-PDBViewer: from Swiss-Prot (Guex & Peitsch, 1997) http://tw.expasy.org/spdbv/

Chemscape Chime Plug-in: for PC and Mac http://www.mdli.com/products/framework/chemscape

PyMOL: Very good, available for all platforms (DeLano, W.L. The PyMOL Molecular Graphics System, 2002) http://pymol.sourceforge.net

RASMOL views - SH2 domain RASMOL views - SH2 domain

All-atom model Space-filling model

Atom colors: N O C S

C Trace Ribbon

Rainbow coloring: N to C Coloring: by structural units

RASMOL views – 1sha RASMOL views – 1sha

Homologous foldsHomologous folds Hemoglobin and

erythrocruorin: 31% sequence identity

Analogous foldsAnalogous folds Hemoglobin and

phycocyanin: 9% sequence identity

Surface PropertiesSurface PropertiesCro repressor –

DNA complex Basic residues

in blue Acidic residues

in red

Mapping Functional RegionsMapping Functional Regions

Immunoglobulin light chain - dimer

Hydrophobhic residues in magenta

Hydrophilic and charged residues in cyan

1. Understanding Protein Structure

2. A Quick Overview of Sequence Analysis

3. Finding a Structural Homologue

4. Template Selection

5. Aligning the Query Sequence to Template Structure(s)

6. Building the Model

Siblings and CousinsSiblings and Cousins Siblings or homologues: sequences with at least

30% sequence identity over an alignment length of at least 125 residues and conservation of function.

Cousins or paralogues: < 30% identity but with conservation of function

Both show structural conservation Homologues located using a database search tool

such as BLAST (free webserver): http://www.ncbi.nlm.nih.gov/BLAST

Paralogues require a more sensitive method such as PSI-BLAST

Multiple Sequence AlignmentMultiple Sequence AlignmentFinding the best way to match the residues ofrelated sequences Identical residues must be lined up The rest should be arranged, based on

observed substitution in protein families chemical similarity charge similarity

Where it is impossible to get the residues to line up, the biological concept of insertion/deletion in invoked: the ‘gap’ in alignments

MSA MethodsMSA Methods CLUSTALW / CLUSTALX (Thompson et al, 1997):

freely available for all platforms and one of the best alignment programs

http://www-igbmc.u-strasbg.fr/BioInfo/ClustalX/Top.html

MAXHOM (Sander & Schneider, 1991): alignment based on maximum homology; available via the PredictProtein webserver, free for academics

http://cubic.bioc.columbia.edu/predictprotein/

MALIGN (Johnson et al, 1994): freely available UNIX program, based on the structural alignment of protein families

http://www.abo.fi/fak/mnf/bkf/research/johnson/software.html

Alignment ChecksAlignment Checks Conservation of functionally important residues:

e.g. the catalytic triad (Asp-Ser-His) that are essential for serine proteinase activity

Line up of structurally important residues: e.g. cysteines forming disulfide bonds

Overall, maximizing the alignment of “like” residues

Completely conserved residues usually indicate some conserved structural or functional role, especially buried charges

Sequence Motifs & PatternsSequence Motifs & Patterns From the analysis of the alignment of

protein families Conserved sequence features, usually

associated with a specific function PROSITE (Hulo et al, 2006) database for

protein “signature” patterns: http://www.expasy.ch/prosite

Aligned Sequence FamiliesAligned Sequence Families From alignments of homologous

sequences: PRINTS PRODOM:

http://www.toulouse.inra.fr/prodom.html

From Hidden Markov Model based methods: PFAM: http://www.sanger.ac.uk/Pfam

Protein DomainsProtein Domains Most proteins are composed of structural subunits

called domains A domain is a compact unit of protein structure,

usually associated with a function. It is usually a “fold” - in the case of monomeric

soluble proteins. A domain comprises normally only one protein

chain: rare examples involving 2 chains are known.

Domains can be shared between different proteins: like a LEGO block

Protein ArchitecturesProtein Architectures Beads-on-a-string: sequential location: tyrosine-

protein kinase receptor TIE-1 (immunoglobulin, EGF, fibronectin type-3 and protein kinase).

Domain insertions: “plugged-in” - pyruvate kinase (1pyk)

SMART: smart.embl-heidelberg.deSimple Modular Architecture Retrieval Tool

Dissection into DomainsDissection into Domains A sequence, usually > 125 residues should

be routinely checked to see how many domains are present.

Conserved Domain Architecture Retrieval Tool (CDART) uses information in Pfam and SMART to assign domains along a sequence

E.g. NP_002917 shows similarity to G-protein regulators:

1. Understanding Protein Structure

2. A Quick Overview of Sequence Analysis

3. Finding a Structural Homologue

4. Template Selection

5. Aligning the Query Sequence to Template Structure(s)

6. Building the Model

Structural HomologuesStructural Homologues

BLASTP vs. PDB database or PSI-BLAST: look for 4-character PDB ID E < 0.005

Domain coverage: at least 60% coverage is recommended

Gaps: we don’t want them. Choose between: few gaps and reasonable similarity scores or lots of gaps and high similarity scores?

Small Proteins: Disulfide bondsSmall Proteins: Disulfide bonds BLAST-type methods may not locate

homologues, if Conserved Domain search is not turned on.

Are the Cys residues conserved? Gaps: where are they on the structure?

gnl|Pfam|pfam00095, wap, WAP-type (Whey Acidic Protein) four-disulfide core'.

CD-Length = 46 residues, 100.0% aligned Score = 43.9 bits (102), Expect = 1e-06

Q:49 KAGFCPWNLLQMISSTGPCPMKIECSSDRECSGNMKCCNVDCVMTCTPP 97 D: 1 KPGVCPWVSISE---AGQCLELNPCQSDEECPGNKKCCPGSCGMSCLTP46

Metal-binding domainsMetal-binding domains

C2H2 Zinc Finger 2 Cys & 2 His binding to

Zinc Not detected even by CD-

search in BLAST Detected by Pfam &

SMART Sequence Pattern:#-X-C-X(1-5)-C-X3-#-X5-#-

X2-H-X(3-6)-[H/C]

Structure Prediction MethodsStructure Prediction Methods Secondary Structure Prediction: identify local

structural elements such as helices, strands and loops.

> 75% accuracy achievable PredictProtein or PHD

http://cubic.bioc.columbia.edu/pp/ PSIPRED

http://bioinf.cs.ucl.ac.uk/psipred/ SSPro

http://promoter.ics.uci.edu/BRNN-PRED/

Folds from Secondary Folds from Secondary Structure PredictionsStructure Predictions

Assembling SSEs into folds is a combinatorial problem

Current methods depend on available structural data for mapping predictions: FORREST

http://abs.cit.nih.gov/foresst/foresst.html TOPITS from the PHD server

http://cubic.bioc.columbia.edu/pp

Tertiary Structure PredictionTertiary Structure Prediction Fold recognition/Threading: < 20% identity

typically Best results obtained by combining several

database search and knowledge-based tools: 3D-PSSM

http://www.sbg.bio.ic.ac.uk/~3dpssm/ FUGUE

http://www-cryst.bioc.cam.ac.uk/fugue/

1. Understanding Protein Structure

2. A Quick Overview of Sequence Analysis

3. Finding a Structural Homologue

4. Template Selection

5. Aligning the Query Sequence to Template Structure(s)

6. Building the Model

One or many templates?One or many templates? Sequence similarity: extract template

sequences and align with query: select the most similar structure

Completeness: Missing data? REMARK 465 MISSING RESIDUES REMARK 465 THE FOLLOWING RESIDUES WERE NOT LOCATED IN THE REMARK 465 EXPERIMENT. (M=MODEL NUMBER; RES=RESIDUE NAME; C=CHAIN REMARK 465 IDENTIFIER; SSSEQ=SEQUENCE NUMBER; I=INSERTION CODE.) REMARK 465 REMARK 465 M RES C SSSEQI REMARK 465 MET A 1 REMARK 465 THR A 230

REMARK 470 M RES CSSEQI ATOMS REMARK 470 GLU A 5 OE2 REMARK 470 GLU A 6 CG CD OE1 OE2 REMARK 470 GLU A 17 OE1

One or many templates?One or many templates? X-ray or NMR?:

Lowest resolution X-ray structure X-ray and then NMR NMR average over assembly

One or many?: Structure alignment of C atoms If 2 templates are very close, keep only one Keep templates that provide new information

Many templatesMany templates Sequence alignment from structure

comparison of templates (SSA) can be different from a simple sequence alignment (SA).

For model building, 1. align templates structurally

2. extract the corresponding SSA

1. Understanding Protein Structure

2. A Quick Overview of Sequence Analysis

3. Finding a Structural Homologue

4. Template Selection

5. Aligning the Query Sequence to the Template Structure(s)

6. Building the Model

Query - Template AlignmentQuery - Template Alignment >40% identity: any alignment method is OK Below this, checks are essential.

Collect close sequence homologues (about 10) and align to query to get MSA (multiple sequence alignment)

Collect several structural templates (at least 5) and align them using structure comparison methods: extract the SSA (structural sequence alignment)

Align MSA to SSA using profile alignment Extract query and selected template(s) from the

final alignment – QTA.

QTA ChecksQTA Checks Residue conservation checks

Functional regions Patterns/motifs conserved?

Indels Combine gaps separated by few residues

Editing the alignment Move gaps from secondary structures to

loops Within loops, move gaps to loop ends, i.e.

turnaround point of backbone

QTA ChecksQTA Checks Residue conservation checks

Functional regions Patterns/motifs conserved?

Indels Combine gaps separated by few residues

Editing the alignment Move gaps from secondary structures to

loops Within loops, move gaps to loop ends, i.e.

turnaround point of backbone

Visual Inspection of IndelsVisual Inspection of Indels 2-residue

deletion from sequence alignment

End-of-loop 2-residue deletion

1. Understanding Protein Structure

2. A Quick Overview of Sequence Analysis

3. Finding a Structural Homologue

4. Template Selection

5. Aligning the Query Sequence to Template Structure(s)

6. Building the Model

Input for Model BuildingInput for Model Building

Query sequence Template structure

Template sequence Query-template sequence alignment

Methods AvailableMethods Available

1. WHATIF (Vriend G, 1990) : High quality models where template is

available Indels not modelled

Side chain rotamers In silico mutations In silico disulfide bond creation

Methods AvailableMethods Available

2. SWISS-MODEL (Schwede et al, 2003) : Automatic modeling mode with multiple

templates Query + template input High Homology situations DeepView for input file creation

Methods AvailableMethods Available3. MODELLER (Sali & Blundell, 1993) :

High quality models Sequence alignment Structure analysis/alignment Multiple templates Multiple chains Ligand/cofactor present

4. ESyPred3D (uses MODELLER): QTAs from several methods & neural networks http://www.fundp.ac.be/urbm/bioinfo/esypred/

Methods AvailableMethods Available5. ICM (Ruben et al, 1994) :

High quality models Loop modelling

Multiple templates not possible Sequence/Structure alignment/analysis Ab initio peptide modeling Secondary structure prediction

6. Geno3D (Combet et al, 2002) : Automated modelling Distance geometry used for loops http://geno

Methods AvailableMethods Available

7. 3D-JIGSAW (Bates et al, 2001) : Automatic modeling mode Interactive user mode to select templates Multiple templates Multidomain protein modeling

Methods AvailableMethods Available

8. CPH-MODELS (Lund et al, 1997) : Fully automated FASTA search for templates Not validated

Automatic or Manual Mode?Automatic or Manual Mode? Automatic: High homology

Manual Medium/Low homology Template from structure prediction Multiple templates Multiple chains Ligand present

How good is the model?How good is the model?

Structural Quality Analysis

PROCHECK (Laskowski et al, 1993) :

WHATIF (Vriend G, 1990) : ERRAT (Colovos & Yeates, 1993) :

Improving ill-defined regionsImproving ill-defined regions

Iterative model building Rebuild or anneal bad regions Check/edit alignment and rebuild

Molecular dynamics and/or Monte Carlo simulations Compute intensive Input files need to be set up Optional

Molecular Modeling ProtocolMolecular Modeling Protocol Resources required

The query sequence   Personal computer with internet

connectivity RASMOL/DeepView for PDB structure

visualization CLUSTALX sequence alignment software Access to a UNIX workstation MODELLER/ICM – UNIX software WHATIF – UNIX/PC software PROCHECK – UNIX or Windows software

MM Protocol – Input FilesMM Protocol – Input Files

Minimum requirement Query sequence Template structure

Template sequence Query-Template alignment

Ex 1. High Homology CaseEx 1. High Homology Case

Human SOX9 WT - homologous to SRY (PDB: 1HRY) - 49% identity

S9WT: ..AGAACAATGG.. highest SOXCORE: ..GCAACAATCT.. least Mutants (campomelic dysplasia):

F12L: No DNA binding H65Y: Minimal binding P70R: altered specificity; no SOXCORE A19V: near WT but normal binding

1. SOX9 Models1. SOX9 Models

WT & P70R models built

C overlay:

WT-SRY ~ 0.72 Å J. Biol. Chem. 274

(1999) 24023

SOX9 & P70Rbased on SRY

Ex 1. SOX9 Ex 1. SOX9 + DNA Models+ DNA Models

Observed disease-linked mutations mapped

Other residues in DNA-binding groove determined

SOX9-WT

SOX9-P70R

SRY

Ex 2. Low Homology SituationEx 2. Low Homology Situation Pigments from reef-building corals: similar to

Pocilloporin fluoresce under UV and visible radiation similar to the Green Fluorescent Protein -

GFP (19.6% identity) contain ‘QYG’ instead of ‘SYG’ in GFP, as

proposed fluorophore

2. Alignment of POC4 & GFP2. Alignment of POC4 & GFP

2. POC4 Model2. POC4 Model Barrel ends open C-ter not included -sheet OK ‘QYG’ fits the site! 26 residues within 5Å

of QYG (only 19 in GFP)

Increased thermal stability

UV protection

Ex 3. Small Disulfide-bonded Ex 3. Small Disulfide-bonded Protein: Complement Factor HProtein: Complement Factor H

20 tandem homologous units = SCRs (short consensus repeat) or “sushi” regions

Each SCR is ~ 60 aa: conserved Y, P, G 2 disulfide bridges:1-3 & 2-4 Linkers of 3-8 aa

Heparin binding SCRs: 7 (high affinity) & 20

Previous SCRs required for activity: minimum constructs are fH67 and fH18-20

C1

C3

C2C4

N-ter

C-ter

HV loop

3. Sequence Alignment of Close 3. Sequence Alignment of Close Functional HomologuesFunctional Homologues

hfH 385 C L R K C Y F P Y L E N G Y N Q N H G R K F V Q G K S I D V A

fHR-3 83 C L R K C Y F P Y L E N G Y N Q N Y G R K F V Q G N S T E V A

bfH 292 C L R Q C I F N Y L E N G H N Q H R E E K Y L Q G E T V R V H

mfH 385 C V R K C V F H Y V E N G D S A Y W E K V Y V Q G Q S L K V Q

Consensus * : * : * * * : * * * . . : : * * : : *

hfH 416 C H P G Y A L P K A - Q T T V T C M E N G W S P T P R C I R 444

fHR-3 114 C H P G Y G L P K V R Q T T V T C T E N G W S P T P R C I R 143

bfH 323 C Y E G Y S L Q N D - Q N T M T C T E S G W S P P P R C I R 351

mfH 416 C Y N G Y S L Q N G - Q D T M T C T E N G W S P P P K C I R 444

Consensus * : * * . * : * * : * * * . * * * * . * : * * *

Site A Site d Site B Site c

3. Templates for fH SCRs 6-73. Templates for fH SCRs 6-7 hfH SCRs 15&16 (fH1516; PDB ID: 1HFH) Vaccinia virus complement control protein

domains 3&4 (vcp34; PDB ID: 1VVC)

Orientations differ considerably Vcp34 28% identical to hfH67 compared to

hfH1516 (25%) !

hfH15 hfH16

vcp3vc

p4

hfH15 hfH16

vcp3

vcp4

3. Query-Templates alignment3. Query-Templates alignment

3. hfH67 model3. hfH67 model

hfH67

hfH1516

Sialic acid

Heparin

disaccharide repeat

3. Locating residues for 3. Locating residues for mutation from modelmutation from model

Arg-404Arg-387

Lys-405 His

-402

Lys-410

Lys-388SCRs 6&7 SCRs 15&16

Pacific Symposium of Biocomputing 2000, 5:155Pacific Symposium of Biocomputing 2000, 5:155

Ex 4: Protein EngineeringEx 4: Protein Engineering Thermolysin-like

protease unstable at high temperatures (> 40 ºC unlike trypsin)

Homology Model built G8 & N60 suited for

disulfide bond Double Mutant

functional at 92.5 ºC

J Mansfeld et al. Extreme Stabilization of a Thermolysin-like Protease by an Engineered Disulfide Bond” J. Biol. Chem. 1997 272: 11152-11156.

Ex 5. Multiple chains: Human Ex 5. Multiple chains: Human Hand, Foot & Mouth Disease Hand, Foot & Mouth Disease Virus capsidVirus capsid 2000 outbreak of

HFMD in Singapore: thousands of children affected – 4 deaths (The Lancet, 2000, 356, 1338)

Major etiological agent: EV71 (enterovirus group)

Neurological complications

5. EV71 genome structure5. EV71 genome structureCapsid Replication

VP4 VP2 VP3 VP1 2A 2B 2C 3A,B 3C 3D

5’ AUG 3’ UAGVPg

PolyA

EV71 specific primersPan-enterovirus primers

95/94% (RNA) homology coxsackievirus A16/B3 Only 1% difference between neurovirulent and

non-neuro virulent isolates Most variations in non-capsid regions Within capsid regions, VP1 shows maximum

variability relative to other Evs Differences in capsid region 1: VP1 & VP2, 2: VP3

5. Picornaviridae5. PicornaviridaeIcosahedralCapsid

5. Template hunt5. Template hunt

BLASTP against PDB sequences VP1: 3 templates

1BEV 38.7% (bovine enterovirus) 1EAH 36.5% (poliovirus type 2 strain Lansing) 1FPN 38.0% (human rhinovirus serotype 2)

VP2: 1BEV 56.7% VP3: 1BEV 54.9% VP4: 1BEV* 50.0%

5. Fixing the VP1 Alignment5. Fixing the VP1 Alignment

Structural alignment of templates: using VAST (Gibrat, Madej, & Bryant, 1996)

Extract corresponding sequence alignment

Match HFMDV VP1 to aligned templates using profile alignment in CLUSTALW

5. VP1 alignment to templates5. VP1 alignment to templates

VP11BEV1 14 Q A A G A L V A G T S T S T H S V A T D S T P A L Q A A E T G A T S T A R D E S M I E T R T I V P T H G I H E T S V E S F F G R S S L V G M1EAH1 24 - - - A N N L P D T Q S S G P A H S - K E T P A L T A V E T G A T N P L V P S D T V Q T R H V I Q K R T R S E S T V E S F F A R G A C V A I1FPN1 15 - - - - L V V P N I N S S N P T T S - N S A P A L D A A E T G H T S S V Q P E D V I E T R Y V Q T S Q T R D E M S L E S F L G R S G C I H EEV711 23 A L P A P T G Q N T Q V S S H R L D T G E V P A L Q A A E I G A S S N T S D E S M I E T R C V L N S H S T A E T T L D S F F S R A G L V G E

. . * . . * * * * . * * : . . . : : * * : . : * : : : * * : . * . . :

1BEV1 84 P L L A T - - - - - - G T S I T H W R I D F R E F V Q L R A K M S W F T Y M R F D V E F T I I A T S S - T G Q N V T T E Q H T T Y Q V M Y V1EAH1 90 I E V D N D - - - - - S K L F S V W K I T Y K D T V Q L R R K L E F F T Y S R F D M E F T F V V T S N Y T D A N N G H A L N Q V Y Q I M Y I1FPN1 80 S K L E V T L A N Y N K E N F T V W A I N L Q E M A Q I R R K F E L F T Y T R F D S E I T L V P C I S A L - - - S Q D I G H I T M Q Y M Y VEV711 93 I D L P L E - G T T N P N G Y A N W D I D I T G Y A Q M R R K V E L F T Y M R F D A E F T F V A C T P - - - - - T G E V V P Q L L Q Y M F V

: : * * . * : * * . . * * * * * * * : * : : * * : :

1BEV1 147 P P G A P V P S N Q D S F Q W Q S G C N P S V F A D T D G P P A Q F S V P F M S S A N A Y S T V Y D G Y A R F M - - - D T - - - D P D R Y G1EAH1 161 P P G A P I P G K W N D Y T W Q T S S N P S V F Y T Y G A P P A R I S V P Y V G I A N A Y S H F Y D G F A K V P L A G Q A S T E G D S L Y G1FPN1 147 P P G A P V P N S R D D Y A W Q S G T N A S V F W Q H G Q A Y P R F S L P F L S V A S A Y Y M F Y D G Y D E - - - - - - - - - - Q D Q N Y GEV711 157 P P G A P K P E S R E S L A W Q T A T N P S V F V K L T D P P A Q V S V P F M S P A S A Y Q W F Y D G Y P T F G - - - E H K Q E K D L E Y G

* * * * * * . : . * * : . * . * * * . . : . * : * : : . * . * * . * * * : * *

1BEV1 211 I L P S N F L G F M Y F R T L E D - - - A A H Q V R F R I Y A K I K H T S C W I P R A P R Q A P Y K K R Y N L V F S - - G - D S D R I C S N1EAH1 231 A A S L N D F G S L A V R V V N D H N P T K L T S K I R V Y M K P K H V R V W C P R P P R A V P Y Y G P - G V D Y K - - D - G L A P - L P G1FPN1 207 T A N T N N M G S L C S R I V T E K H I H K V H I M T R I Y H K A K H V K A W C P R P P R A L E Y T R A H R T N F K I E D R S I Q T A I V TEV711 224 A C P N N M M G T F S V R T V G S S - K S K Y P L V V R I Y M R M K H V R A W I P R P M R N Q N Y L F K A N P N Y A - - G N S I K P T G T S

* : * : * : . * : * : * * . * * * . * * : . .

1BEV1 275 R A S L T S Y1EAH1 296 K - G L T T Y1FPN1 277 R P I I T T AEV711 291 R T A I T T -

: : * :

281301283296

strandsheliceshelices

Pocket-factor binding residues

5. Model building steps5. Model building steps Build all 4 capsid proteins (VP1-VP4)

together to ensure 3D fit Use 1BEV alone for VP2-VP4 For VP1: use aligned 1BEV, 1EAH,

1FPN Check model

5. Round 1: 5. Round 1: VP1VP1,,VP2VP2,,VP3VP3,,VP4VP4

Clip hanging ends

Re-position problem loops:

adjust gaps in alignment

Build again

5. Round 2: Pentamer Check5. Round 2: Pentamer Check

Loops look OK Build pentamer Publish…. Oops: clash in

pentamer assembly. Go back

5. Close encounters of the 35. Close encounters of the 3rdrd Kind Kind

Build only VP3 pentamer

N-terminus of each VP3 hydrogen-bonded

Also, in BEV, Asp-Lys ion pair

First 25 aa overlay v. well

5. Fourth foray: build with 5 VP3s5. Fourth foray: build with 5 VP3s

Only first 50aa of the other 4 VP3s included

Model resulted in knots due to insufficient refinement cycles

However, VP3 pentameric region OK

5. Fifth 5. Fifth and final and final attemptattempt

5. Canyon Pit and Antigenic Sites5. Canyon Pit and Antigenic Sites

Cardiovirus

Neurovirulent Polio (mouse)

Poliovirussites

5. Putative antigenic sites5. Putative antigenic sites

VP2VP1

5. HFMDV Conclusions 5. HFMDV Conclusions Unique surface loops identified for

Immunodiagnostic assays Vaccine design Antibodies being generated

Canyon pit: depth is similar to BEV Mapping the antigenic regions of other related enteroviruses

on the HFMDV surface: specific VP1 and VP2 sites buried Sunita Singh, Vincent T. K. Chow, C. L. Poh, M. C. Phoon:

Dept. of Microbiology, NUS Applied Bioinformatics, Vol 1, issue 1, 43-52: invited research

article