Computational Virology

Marcella A. McClure, Ph.D. Department of Microbiology and the Center for Computational Biology Montana State University, Bozeman MT [email protected] Computational Virology Lectures in Bioinformatic Studies on the Evolution Structure and Function of RNA-based Life Forms


Page 1: Computational Virology

Marcella A. McClure, Ph.D.

Department of Microbiology and the Center for Computational Biology

Montana State University, Bozeman MT

[email protected]

Lectures in Computational Virology

Bioinformatic Studies on the Evolution Structure and Function of RNA-based Life Forms

Page 2: Computational Virology

McCLURE LAB OF BIOINFORMATICS

Living on the edge of statistics!

Page 3: Computational Virology

What is Bioinformatics?

Bioinformatics is the creation of new knowledge from existing data. This type of research takes place in silico and includes the development and testing of the software tools necessary to analyze the data.

McClure, 2000

Page 4: Computational Virology

The Practice of Bioinformatics is an interplay between knowledge of empirically derived data, bioinformatic tools and human decision making. Exactly which information and tools are to be accessed is dependent on the nature of the question of interest.

McClure, 2000

Page 5: Computational Virology

Recent and Current Projects

1) Potential Multiple Endonuclease Functions and a Ribonuclease H Encoded in Retroposon Genomes

2) Hypothesis: The Reverse Transcriptase Domain Shares Common Ancestry with the RNA-dependent RNA polymerase of both positive and negative-stranded RNA viruses: a Test of Protein Motif Finding Methods.

3) A Functional Genomics Challenge: the Transcription/Replication Complex of the Order Mononegavirales.

Rabies, measles and Ebola viruses belong to the order Mononegavirales; they are true RNA-based life forms that have no DNA stage. To date, little has been learned about the distribution of functions within, or the actual structure of, the replication/transcription complex (three proteins and the RNA template). The goal is to elucidate potential regions and residues of protein:protein interactions of the replication/transcription complex without structural information. The studies will proceed along three paths: prediction of disorder; determination of compensatory mutation; and the assessment of evolutionary dynamics. Correlation of the results of these methods will provide high-probability candidates for the protein:protein contacts.

Page 6: Computational Virology

Recent and Current Projects, cont.

4) Mapping of All Genomic Retroid Agents: Prototype Human Genome.

The Retroid Agents (e.g., HIV, Hepatitis B, retrotransposons, etc.) encode the reverse transcriptase, thereby providing the interface for the transfer of genetic information from RNA-based to DNA-based replication systems. The goal of this project is to identify, classify and map all Retroid Agents of a specific genome. The Genome Parsing Suite is the prototype software that not only identifies and classifies these agents, but also determines Retroid genome boundaries, architecture, and gene complement, and assesses the host environment of each agent. These data are then used to create a browseable database that will be available for display through the UCSC Genome Browser. The creation of this database is necessary for hypothesis testing regarding the roles that Retroid Agents play in the reproduction, development, evolution and disease processes in Eukaryotes, including humans.

Page 7: Computational Virology

Summary Lecture I

1) Introduction to RNA-based life forms

2) Methods to test the hypothesis.

3) Testing the hypothesis.

4) Predicting protein contacts.

Page 8: Computational Virology

The World of Viruses

[Figure: classification of viruses. DNA viruses: ssDNA, dsDNA. RNA viruses: ssRNA (+ ssRNA, - ssRNA), dsRNA. Polymerases labeled: RdDp, RdRp, host Pol II.]

Does the RT domain of the RdDp share common ancestry with the RdRp of negative and positive polarity, single-stranded viruses?

Page 9: Computational Virology

Paramyxoviridae

Filoviridae

Retroviridae

Picornaviridae

Rhabdoviridae

Page 10: Computational Virology

Retroid Agents

[Figure: flow of genetic information between DNA, RNA and protein.]

DNA: replication by DNA-dependent DNA polymerase (all cellular systems & most DNA viruses)

DNA to RNA: transcription (snRNAs, ribozymes, tRNA, rRNA)

RNA to protein: translation (PROTEIN SYNTHESIS)

RNA to DNA: reverse transcriptase mediated replication or transposition (Retroid Agents: retroviruses, retrotransposons, pararetroviruses, retroposons, retroplasmids, retrointrons, and retrons)

RNA: replication by RNA-dependent RNA polymerase (RNA viruses, e.g., Ebola, rabies, influenza, polio)

McClure, 2000

Page 11: Computational Virology

Mononegavirales

“OLD” FOES
rabies (Rhabdoviridae)
measles, RSV, mumps (Paramyxoviridae)

“EMERGING” THREATS
Ebola, Marburg (Filoviridae)
equine morbillivirus, Nipah virus (Paramyxoviridae)

MODEL AGENT
vesicular stomatitis virus (Rhabdoviridae)

Page 12: Computational Virology

Roles of Retroid Agents:

1) Disease:
   a) retroviruses:
      1) exogenous infectious: HIV, HTLV
      2) endogenous associations: breast cancer, testicular tumors, insulin dependent diabetes, multiple sclerosis, rheumatoid arthritis, schizophrenia and systemic lupus erythematosus
   b) LINEs insertional mutagenesis:
      1) Hemophilia A
      2) muscular dystrophies; Duchenne and Fukuyama-congenital type
      3) X-linked disorders; Alport Syndrome-Diffuse Leiomyomatosis and Chronic Granulomatous Disease
2) Regulation of cellular genes and reproduction
3) Telomere maintenance
4) Repair of broken dsDNA
5) Exchange of genetic information among and between organisms

Page 13: Computational Virology

Plus-strand RNA Virus Families and Human Diseases

Togaviridae - Rift Valley Fever

Flaviviridae - Dengue Fever virus, West Nile virus

Coronaviridae - Infectious Bronchitis

Caliciviridae - Hepatitis E virus

Picornaviridae - Human poliovirus, Hepatitis A

Page 14: Computational Virology

MMLV Genome: 5’LTR GAG RdDp (PRO RT/RH INT) ENV 3’LTR

Paramyxoviridae Genome: N P/C/V M F HN RdRp

Rhabdoviridae Genome: N P M G RdRp

Filoviridae Genome: N VP35 VP40 G VP30 VP24 RdRp

Picornaviridae Genome: VPg L P4 P2 P3 P1 2A 2B 2C 3A 3B 3C 3D Poly(A) (RdRp labeled)

Page 15: Computational Virology

[Figure: VSV Transcription and VSV Replication. Labeled components: L, P, N, NP, leader, and the 3' and 5' ends of the template. Annotations: CO-ASSEMBLY?, read through.]

Page 16: Computational Virology

RNA Template

Page 17: Computational Virology

Replication

Page 18: Computational Virology

Model of a poliovirus polymerase-dsRNA complex based on the structure of HIV-1 RT complexed to dsDNA (Huang et al., 1998).

[Figure panels: HIV-1 Reverse Transcriptase; Poliovirus Polymerase; Poliovirus Polymerase Oligomer; Model of a poliovirus polymerase-dsRNA complex]

Page 19: Computational Virology

Basic Strategy

1) Search Databases

2) Annotation and Preparation of Sequences

3) Multiple Alignment of Sequences

4) Refined Multiple Alignment

5) Analysis of Multiple Alignment

McClure, 2000

Page 20: Computational Virology

McClure, 2000

Biological Patterns

“Whether randomness can be measured is a difficult problem. One cannot judge the absence of pattern without specifying which pattern, and what is a pattern to you may not be a pattern to me.”

Page 21: Computational Virology

What is an ordered series of motifs (OSM)?

An OSM, which may span hundreds of residues, is defined as a set of conserved or semi-conserved motifs (1-9 contiguous amino acid residues) found in the same arrangement relative to one another in all sequences of a protein family. The amino acids of these patterns are involved in catalysis or structural integrity. The spacing between motifs, or motif intervening regions (MIRs), can be highly variable, reflecting the regions of a protein that are less restricted by functional or structural constraints. MIRs may evolve more rapidly and be more subject to insertion/deletion events and duplications than the OSM.

Why is OSM identification important?

The OSM of a protein family can be used to predict function. The identification of an OSM common among protein sequences with as little as 8% amino acid identity has led to successful prediction of function. If a multiple alignment method (be it global or local) cannot correctly identify the highly conserved residues of a given sequence that are critical for function and structure, then it is of little value.

McClure 2002
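The OSM definition can be illustrated with a minimal Python sketch: motifs must occur in a fixed order, while the motif intervening regions (MIRs) are free to vary in length. The function names `find_osm` and `mir_lengths` and the two regular-expression patterns are illustrative assumptions, not the method used in these lectures; the example sequence is the HV04 row from the sequence comparison slide. The search takes the first hit for each motif and does not backtrack, which is a simplification.

```python
import re

def find_osm(sequence, motif_patterns):
    """Locate an ordered series of motifs (OSM) in one protein sequence.

    Each pattern is searched for only downstream of the previous hit, so
    the motifs must appear in the given order; spacing is unconstrained.
    Returns a list of (start, end) spans, or None if any motif is missing.
    """
    spans = []
    position = 0
    for pattern in motif_patterns:
        match = re.compile(pattern).search(sequence, position)
        if match is None:
            return None
        spans.append(match.span())
        position = match.end()
    return spans

def mir_lengths(spans):
    """Lengths of the motif intervening regions between consecutive motifs."""
    return [nxt[0] - cur[1] for cur, nxt in zip(spans, spans[1:])]

# Hypothetical patterns, loosely modelled on two conserved RT motifs.
RT_PATTERNS = [r"[LM]PQG", r"Y[MV]DD"]

sequence = "GIRYQYNVLPQGWKGSPAIFQSSMTKILDPFRRDNPELEICQYMDDLYVGSDLPLTEHRKRIELLREHLY"
spans = find_osm(sequence, RT_PATTERNS)
print(spans, mir_lengths(spans))
```

A sequence in which the same motifs occur in the wrong order returns `None`, which is the property that separates an OSM from an unordered collection of motif hits.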

Page 22: Computational Virology

LEVELS of SEQUENCE COMPARISONS

> 25% IDENTICAL = HOMOLOGY
HT01 PGNNPVFPVKKANGTWRFIHDLRATNSLTIDLSSSSPGPPDLSSLPTTLAHLQTIDLKDAFFQIPLPKQF
HT02 PGNNPVFPVKKANGTWRFIHDLRATNSLTIDLSSSSPGPPDLSSLPTTLAHLQTIDLKDAFFQIPLPKQF
HT11 PGNNPVFPVKKPNGKWRFIHDLRATNAITTTLTSPSPGPPDLTSLPTALPHLQTIDLTDAFFQIPLPKQY
BL01 PGNNPVFPVRKPNGAWRFVHDLRATNALTKPIPALSPGPPDLTAIPTHPPHIICLDLKDAFFQIPVEDRF
BL02 PGNNPVFPVRKPNGAWRFVHDLRVTNALTKPIPALSPGPPDLTAIPTHLPHIICLDLKDAFFQIPVEDRF

< 25% IDENTICAL + OSM = HOMOLOGY
HV04 GIRYQYNVLPQGWKGSPAIFQSSMTKILDPFRRDNPELEICQYMDDLYVGSDLPLTEHRKRIELLREHLY
SV22 GIRYQFNCLPQGWKGSPTIFQNTAANILEEIKRHTPGLEIVQYMDDLWLASDHDETRHNQQVDIVRKMLL
BL01 HRRFAWRVLPQGFINSPALFERALQEPLRQVSAAFSQSLLVSYMDDILYASPTEEQRS.QCYQALAARLR
SP01 GKQYCWTRLPQGFLNSPALFTADAVDLLKEVP.N.....VQVYVDDIYLSHDNP.HEHIQQLEKVFQILL
MM29 TTQYTWTQLPQRFKNSPTIFGEALARDLQKFPTRDLGCVLLQYVDDLLLGHPTAVGWP.REQMLYSGTWR

< 25% IDENTICAL - OSM = NO HOMOLOGY
IN10 DHH..MLEKLLVKHFQDQSFIDLYWKMVKAGYVEFDKDKSSMIGVPQGGIASPMLSNLVLNELDEFVQNI
CO01 VKH..TFIRILMSVVVDQD.LELEQMDVKTAFLHGELEEELYMEQPEGFIS.............EDGKNK
NL04 IEAGQSAMRFRRTNGRDNRFLLVVSMDVKNAFNTASWQAIATALQMKGVPAG..............LQRI
BA01 LKR............VGNK.KVFSKFDLKSGFHQVAMAEESIPWTAFWVPQG..................
HB78 FYHIPISPAAVPHLLVGSP..GLERFNTCLSYSTHNRNDSQLQTMHNLCTRH..............VYSS

McClure, 2000
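The identity thresholds on this slide depend on how percent identity is counted. The sketch below shows one possible convention for two pre-aligned sequences; the function name `percent_identity` and the choice to skip gapped columns are assumptions, and other conventions (for example, dividing by the full alignment length) give different values. The two example rows are HT01 and BL01 from the first block of the slide.

```python
def percent_identity(seq_a, seq_b, gap_chars=".-"):
    """Percent identical residues between two aligned, equal-length sequences.

    Columns where either sequence has a gap are skipped (an assumption;
    other conventions divide by the full alignment length).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    identical = 0
    for a, b in zip(seq_a, seq_b):
        if a in gap_chars or b in gap_chars:
            continue
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0

ht01 = "PGNNPVFPVKKANGTWRFIHDLRATNSLTIDLSSSSPGPPDLSSLPTTLAHLQTIDLKDAFFQIPLPKQF"
bl01 = "PGNNPVFPVRKPNGAWRFVHDLRATNALTKPIPALSPGPPDLTAIPTHPPHIICLDLKDAFFQIPVEDRF"
print(round(percent_identity(ht01, bl01), 1))
```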

Page 23: Computational Virology

Example of local subsequences or OSM

I II III IV V VI

HT13 pvkKa-- t- IDLkdaf - LPQG-fk qYMDDIll shGL-- kFLGqii
NVV0 ikk K--- ti LDIgday - LPQG-wk -YMDDIyi qyGFM- kWLGfel
SFV1 pvp Kp-- tt LDLtngf - LPQG-fl aYVDDIyi naGYVv eFLGfni
HERVC pvp Kp-- tc LDLkdaf - LPQR-fk qYVDDLll tvGIRc cYLGfti
GMG1 mvr Ka-- tk VDVraaf - CPFG-la aYLDDIli --GLN- kYLGfiv
GM17 v-p Kkqd tt IDLakgf - MPFG-lk vYLDDIiv --NLK- tFLG-hv
MDG1 lvp Kksl sc LDLmsgf - LPFG-lk lYMDDLvv --NLK- tYLG-hk
MORG vvr Kk-- tt MDLqngf - APFG-fk lYMDDIiv --GLK- hFLG-hi
CAT1 lvd Kpkd eq MDVktaf k SLYG-lk lYVDDMli --EMK- rILGidi
CMC1 tit Krpe hq MDVktaf k AIYG-lk lYVDDVvi --- KR- hFIGiri
CST4 ftk Krng t- LDInhaf k ALYG-lk vYVDDCvi inKLK- dILGmdl
C1095 fnr Krdg tq LDIssay k SLYG-lk lFVDDMil itTLKk dILGlei
NDM0 mih Kt-- af LDIqqaf g VPQGsvl tYADDTav tsGL-- kYLGitl
NL13 lip Kp-- s- IDAekaf g TRQGcpl lFADDMiv vsGYK- kYLGiql
NLOA fip Ka-- af LDIegaf g CPQGgvl gYADDIvi evGLN- kYLGvi-
NTC0 vlr Kp-- am LDGrnay g VRQGmvl aYLDDVtv alGIE- rVLGagv
ICD0 eip Kp-- vd IDIk-gf g TPQGgil rYADDFki kqWLKv dFLGfkl
IAG0 fkk Kt-- ie GDIks-f g VPQGgii rYADDWlv -qELKi -FLGvnl
ICS0 wip Kp-- ld ADIsk-c g TPQGgvi rYADDFvi emGLE- nFLGfnv
IPL0 yip Ks-- le ADIr-gf g VPQGgpi rYADDFvv -rGLV- dFVGfnf

The six motifs of the reverse transcriptase (RT) comprising the ordered series of motifs (OSM) involved in enzymatic function are indicated by roman numerals (I-VI). The bold and capitalized letters represent the core amino acids of each motif that are highly conserved among all RT sequences. Abbreviations on the left side bar indicate the names of different types of Retroid agents. Adapted from Hudak and McClure, 1999.

McClure, 2000

Page 24: Computational Virology

HYPOTHESIS: The Reverse Transcriptase domain of the RNA-dependent DNA Polymerase shares common ancestry with the RNA-dependent RNA Polymerase of the Order Mononegavirales and Plus Strand RNA viruses.

[Figure: polymerase domains compared. Labels: RdDp (RT, RH); RdRp of Mononegavirales; RdRp of Plus strand viruses. Motifs shown: FADDM, GDNQ, GDD.]

Page 25: Computational Virology

Strategy for Assessing Protein Sequence Homology

Protein Sequence Data

SEQUENCE COMPARISON

>30% identical = homology (Support for homology: Statistical tests)

<30% identical: MOTIF DETECTION

OSM present = functionally equivalent = likely homologue (Support for homology: Gene order and size, common function)

OSM absent = unlikely homologue

Functional identification, Phylogenetic analysis, Structural prediction

McClure, 2000
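The decision path of this strategy can be written as a small Python function. This is only a sketch of the flowchart: the function name `assess_homology` and the returned labels are illustrative, and treating a value of exactly 30% as falling through to motif detection is an assumption, since the slide only distinguishes ">30%" from "<30%".

```python
def assess_homology(percent_identity, osm_present, identity_threshold=30.0):
    """Follow the decision path of the homology assessment strategy.

    Above the identity threshold, sequence comparison alone supports
    homology; otherwise the call rests on whether the OSM is detected.
    """
    if percent_identity > identity_threshold:
        return "homology"
    if osm_present:
        return "likely homologue"
    return "unlikely homologue"

# Hypothetical inputs: a distant pair with and without a detected OSM.
print(assess_homology(18.0, osm_present=True))
print(assess_homology(18.0, osm_present=False))
```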

Page 26: Computational Virology

Comparison of HMM/SAM to Classical Multiple Alignment Methods

Program    A   Globin Motifs                                     Kinase Motifs
               1(7)      2(5)      3(5)      4(5)      5(3)      1(6)      2(1)      3(1)      4(9)      5(3)      6(3)      7(8)      8(1)
AMULT      NW  100       100       100       100       100       100       83        92        100       100       100       100       100
CLUSTAL V  WL  100       92        100       100       100       100       92        50+42     100       100       100       100       58+42
DFALIGN    NW  100       100       100       100       100       100       100       100       100       100       100       100       100
GENALIGN   CW  67+25     100       100       67+17     67+25     100       42+33     83        100       100       100       50+50     67+25
MULTAL     NW  100       90        100       100       100       100       58+17     50+33     100       100       58+42     100       100
PIMA       SW  100       100       100       100       100       100       92        92        100       100       100       100       100
PRALIGN    CW  67        33+17+17  33+25+17  33+17+17  67+17     100       42+42     33+17     33        42+33     42+33     33        33
SAM        BW  100       100       100       100       100       100       100       100       100       100       100       100       100

Program    A   Protease Motifs               Ribonuclease H Motifs
               1(3)      2(5)      3(3)      1(3)      2(1)      3(3)      4(5)
AMULT      NW  92        58        83        92        58+17     50+17     25+17+17
CLUSTAL V  WL  100       50+25     25+25     100       75        58+17     33+25+17
DFALIGN    NW  100       70+30     100       100       100       83        100
GENALIGN   CW  92        42+25     25+17+17  83+17     58        33+17+17  33+25+17
MULTAL     NW  83        33+25     50+25     75+17     58+17+17  50+25     83
PIMA       SW  100       25+17     25+17     83        75        33+17+17  42+33+17
PRALIGN    CW  33+33     17+17     25+25+17  75        33+33     33+17     17
SAM        BW  100       92        100       100       83        75        83

Column A lists the algorithm employed by each method: NW = Needleman-Wunsch, WL = Wilbur-Lipman, CW = consensus word, SW = Smith-Waterman and BW = Baum-Welch.

Values above columns indicate the number of motifs and values in parentheses indicate the number of residues in each motif. Values in columns indicate the percentage of sequences in which the motif is correctly identified.

McClure, 2000

Page 27: Computational Virology

Experimental Design for Testing Motif Detection Methods

Bench Mark Sequences: Biologically informative markers; Sequence length distribution; Evolutionary distribution; Set size

Methods: Appropriateness; Availability; Assumptions; Limitations; User specific parameters

Parameter Range Tests; Types of Test Data

Evaluate Results for Correct Identification of Biologically Informative Marker

Method(s) that Accurately Identify Biologically Informative Marker

Test hypothesis: RdRp share common ancestry with RdDp (RdRp and RdDp sequences)

Page 28: Computational Virology

Search Databases: Sequence, Literature, Structural, Other??

Data: Retrieve, Annotate, Manage

Analyze Data:
Multiple Alignment of Sequences
OSM/MIR Determination
2D and 3D Modeling
Phylogenetic Reconstruction
Gene and Genome Architecture

Determine Methodological Limitations

McClure, 2001

Page 29: Computational Virology

Motif-detection Programs

Blockmaker
Matchbox
Meme
Pima
Pralign
SAM

Page 30: Computational Virology

Motif Detection Programs

PROGRAM     ALGORITHM (a)  MATRIX (b)  INDEL PENALTY (c)  RUN TIME  USER SPECIFICATIONS (d)
                                                                    # MOTIFS  WIDTH  # SEQUENCES
BLOCKMAKER  Motifj         PAM 250     none               ~1m       N         N      Ne
ITERALIGN   SI             PAM 250     C                  ~1h40m    N         Y      Y
MATCHBOX    Scanning       BLOSUM 62   none               ~45m      N         N      Ni
MEME        MM/EM          PAM 250     none               ~2m       Y         Y      Y
PIMA        SW             AACH        I + E              ~2m       N         N      Ni
PROBE       SW+G+GA        PAM 250     I + E              ~2h30m    N         N      Y
SAM         BW             none        none               ~2h20m    N         N      Ni

(a) Algorithms are:
SI = Symmetric-Iterative protocol
MM = Mixture Model that uses (EM) Expectation Maximization
SW = Smith-Waterman
G = Gibbs Sampling
GA = Genetic Algorithm
BW = Baum-Welch

(b) Matrices are:
PAM = point accepted mutation as defined by Dayhoff
BLOSUM = sum of conserved blocks as defined by Henikoff
AACH = Amino Acid Cluster Hierarchy (patgen, class 1; and class 2) as defined by R. Smith

(c) The insertion/deletion penalties are:
C = constant
I + E = initial + extension

(d) User specific parameters are:
# MOTIFS = number of motifs to be detected
WIDTH = width of motifs to be detected
# SEQUENCES = number of sequences that contain the motif
N = user cannot specify
Ne = user cannot specify and program excludes sequences
Ni = user cannot specify, but program automatically includes all sequences
Y = user can specify, but it is not required

McClure, 2002

Page 31: Computational Virology

Sequence Length, Percent Identity and Distance Values of Globin, Kinase, Aspartic Acid Protease, Ribonuclease H and Reverse Transcriptase Test Sequence Sets

DATA SET  SEQUENCE LENGTH    PERCENT IDENTITY   DISTANCE
          Range     Average  Range     Average  Range        Average
GLOB12    141-153   147      14-84     30       9.1-174.8    109.1
KIN12     255-340   273      16-44     26       71.0-170.4   130.0
PRO12     98-160    127      9-72      20       27.5-205.8   169.2
RH12      126-158   141      9-41      19       100.2-237.6  176.1
RT20      297-412   348      11-40     20       70.5-205.7   163.7

GLOB174   115-161   145      10-99     39       0.1-204.7    85.8
KIN186    246-409   286      9-99      28       1.3-212.1    130.9
PRO114    97-150    108      7-99      28       0.1-282.9    146.8
RH169     122-246   144      5-99      25       0.1-283.0    160.0
RT178     288-434   347      10-99     25       0.1-230.4    153.2

The range and average sequence length, percent identity, and distance value are given for each data set.

Data sets are either small (12-20 sequences), representing a rather smooth distribution of the entire sequence collection, or large (114-186 sequences), randomly selected from the entire sequence collection. The large data sets more accurately represent the unequal distribution of sequence relationships encountered in real data.

Percent identity is the percentage of identical amino acid residues among all sequence pairs.

Distance (D) is a measure of difference between all sequence pairs that takes into account the probability of amino acid substitutions and the ease of converting from one codon to another; D = -ln[(S_real - S_random)/(S_identical - S_random)] x 100, where S = similarity score.

McClure 2002
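The distance formula given above translates directly into code. In this minimal sketch the function name `distance` and the example scores are hypothetical; only the formula itself comes from the slide. The guard reflects the fact that the logarithm is undefined when the real score does not exceed the random score.

```python
import math

def distance(s_real, s_random, s_identical):
    """D = -ln[(S_real - S_random) / (S_identical - S_random)] x 100."""
    ratio = (s_real - s_random) / (s_identical - s_random)
    if ratio <= 0:
        raise ValueError("real score must exceed the random score")
    return -math.log(ratio) * 100

# Hypothetical similarity scores for one sequence pair.
print(round(distance(s_real=50.0, s_random=0.0, s_identical=100.0), 1))
```

Identical sequences give a distance of 0, and the value grows without bound as the real score approaches the random score.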

Page 32: Computational Virology

Summary of small data set analysis

Program     GLOB(12)  KIN(12)  PRO(12)  RT(20)  RH(12)  AVG
BLOCKMAKER  80        63       53       31      31      52
ITERALIGN   98        94       22       49      23      57
MATCHBOX    38        85       61       67      37      58
MEME        90        96       67       93      73      84
PIMA        98        99       55       71      87      82
PROBE       93        95       81       94      83      89

Scores are reported as the percentage of sequences in which motifs were correctly identified. Values in parentheses are the number of sequences in each data set.
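The scoring used in this summary can be sketched as follows. Both function names are hypothetical, and "correctly identified" is simplified here to "the predicted motif start equals the reference start", which is an assumption about the evaluation. The AVG values in the table are consistent with an unweighted mean of the five data set scores rounded to the nearest integer, which is what `row_average` computes.

```python
def percent_correct(predicted_starts, reference_starts):
    """Percentage of sequences whose motif start matches the reference.

    Both arguments map a sequence identifier to the 0-based start of the
    motif in that sequence; a missing prediction counts as incorrect.
    """
    correct = sum(
        1
        for seq_id, start in reference_starts.items()
        if predicted_starts.get(seq_id) == start
    )
    return 100.0 * correct / len(reference_starts)

def row_average(scores):
    """Unweighted mean of per-data-set scores, rounded to an integer."""
    return round(sum(scores) / len(scores))

# MEME row of the small data set table: GLOB, KIN, PRO, RT, RH.
print(row_average([90, 96, 67, 93, 73]))
```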

Page 33: Computational Virology

 

 

Summary of Large Data Set Analysis

Program  GLOB(174)  KIN(186)  PRO(114)  RT(178)  RH(169)  AVG
PIMA     43         46        69        47       43       50
         12         35        19        16       22       21
MEME     85         97        87        84       76       86
PROBE    98         98        91        85       93       93

Two sets of scores are reported for the results of testing the PIMA method. In each case this method finds two subsets of alignments with the OSM correctly identified, but fails to merge these two into a final multiple alignment. Scores are reported as percentages of sequences in which the OSM is correctly identified. Values in parentheses are the number of sequences in each data set.

Page 34: Computational Virology

HYPOTHESIS: The Reverse Transcriptase domain of the RNA-dependent DNA Polymerase shares common ancestry with the RNA-dependent RNA Polymerase of the Order Mononegavirales and Plus Strand RNA viruses.

[Figure: polymerase domains compared. Labels: RdDp (RT, RH); RdRp of Mononegavirales; RdRp of Plus strand viruses. Motifs shown: FADDM, GDNQ, GDD.]

Page 35: Computational Virology

MEME Output

Datasets        1     2     3     4     5     6     Parameters
Pol16           N/A   75%   100%  100%  63%   N/A   mod oops, nmotifs = 20
RT16            94%   100%  100%  100%  100%  100%  mod oops, nmotifs = 20
L16             N/A   100%  100%  100%  88%   N/A   mod oops, nmotifs = 20
RT16.Pol16      N/A   0%    69%   100%  0%    N/A   mod oops, nmotifs = 20
L16.Pol16       N/A   0%    0%    94%   0%    N/A   mod oops, nmotifs = 20
L16.RT16        N/A   0%    0%    0%    0%    N/A   mod oops, nmotifs = 20
L16.RT16.Pol16  N/A   0%    0%    94%   0%    N/A   mod oops, nmotifs = 20

PROBE Output

Datasets        1     2     3     4     5     6     Parameters
Pol16           N/A   57%   100%  100%  75%   N/A   default
RT16            100%  100%  100%  100%  100%  100%  default
L16             N/A   100%  100%  100%  100%  N/A   default
RT16.Pol16      N/A   0%    88%   100%  0%    N/A   default
L16.Pol16       N/A   0%    0%    94%   0%    N/A   default
L16.RT16        N/A   0%    0%    0%    0%    N/A   default
L16.RT16.Pol16  N/A   0%    0%    100%  0%    N/A   default

Page 36: Computational Virology

A Functional Genomics Approach to Inferring Amino Acid Contacts Among the L, P and N proteins of the Replication/Transcription Complex of the Order Mononegavirales

1) Protein disorder
   A) Low hydrophobicity and high mean net charge are good indicators of natively unfolded proteins
   B) Predictors of Natural Disordered Regions (PONDR): utilizes neural networks to distinguish disordered from ordered regions

2) Evolutionary Dynamic Approaches
   A) Intermolecular compensatory mutations (Pazos and Valencia)
      1) predicting interacting partners
      2) detecting correlated mutations between two interacting proteins
      3) extending to three interacting partners
   B) Evolutionary-Structure Function (ESF) (Simon and Sidow): determines the number of amino acid replacements given a fixed phylogenetic topology, ranking constrained regions
   C) Intramolecular compensatory mutations (Pollack): calculates likelihood estimates allowing for rate variation and robustly discriminates coevolution of intra-sites versus random effects

3) Use experimental results to model and validate expectations

4) Test the predicted structure for the Ebola

New work
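Item 1A, low hydrophobicity combined with high mean net charge, can be sketched as a simple charge-hydropathy calculation. Everything specific here is an assumption rather than the lectures' own procedure: the use of the Kyte-Doolittle scale rescaled to 0..1, the counting of K/R as +1 and D/E as -1, and the linear boundary <R> = 2.785<H> - 1.151 commonly attributed to Uversky and colleagues. This is not PONDR, which uses neural networks.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
# Simplified side-chain charges near neutral pH (assumption).
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}

def mean_hydrophobicity(seq):
    """Mean Kyte-Doolittle hydropathy, rescaled to the range 0..1."""
    return sum((KYTE_DOOLITTLE[aa] + 4.5) / 9.0 for aa in seq) / len(seq)

def mean_net_charge(seq):
    """Absolute net charge per residue."""
    return abs(sum(CHARGE.get(aa, 0) for aa in seq)) / len(seq)

def looks_natively_unfolded(seq):
    """True when the sequence falls on the high-charge, low-hydropathy
    side of the assumed boundary <R> = 2.785 <H> - 1.151."""
    boundary = 2.785 * mean_hydrophobicity(seq) - 1.151
    return mean_net_charge(seq) > boundary

# Toy sequences: a highly charged stretch and a hydrophobic stretch.
print(looks_natively_unfolded("EEEEEEEEEE"), looks_natively_unfolded("IIIIIIIIII"))
```

A whole-protein average like this hides local disorder, so in practice the two quantities would be computed over a sliding window.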

Page 37: Computational Virology

[Figure: VSV Transcription and VSV Replication. Labeled components: L, P, N, NP, leader, and the 3' and 5' ends of the template. Annotations: CO-ASSEMBLY?, read through.]

Page 38: Computational Virology

Paramyxoviridae Genome (Sendai): N P/C/V M F HN RdRp

Rhabdoviridae Genome (VSV): N P M G RdRp

Page 39: Computational Virology

N, P and L Proteins

[Figure: maps of the L, N and P proteins of Sendai virus and VSV. L protein: Sendai 1-2228, VSV 1-2109, motifs I-VI. N protein: Sendai 1-524, VSV 1-422. P protein: Sendai 1-568, VSV 1-265. Annotated features: PPBS, NPBS, LPBS, RNA-BS, PCS, RSR, RES, MT, oligomerization domain, GTP binding, region required for replication.]

Page 40: Computational Virology
Page 41: Computational Virology

N, P and L sequences

Predict regions of disorder

Multiple Alignment

Inter-CM analysis

Evolutionary Dynamics Analysis

Update Mononegavirales Sequence and Literature Database

Calculate H/R, PONDR

Phylogenetic reconstruction

ESF-analysis

Intra-CM analysis

Annotated N, P, L protein maps with ALL information regarding positions of experimentally determined functions and interactions
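The Inter-CM analysis step looks for alignment columns in two proteins that change together across viruses. The sketch below uses mutual information between columns as a simple stand-in; it is a swapped-in technique chosen for brevity, not the correlated-mutation method of Pazos and Valencia named in the lectures. The function names and the toy alignments are hypothetical, and rows in the two alignments are assumed to be paired by virus.

```python
import math
from collections import Counter

def column(alignment, index):
    """Residues found at one column of an alignment (list of strings)."""
    return [seq[index] for seq in alignment]

def mutual_information(col_a, col_b):
    """Mutual information, in bits, between two alignment columns."""
    n = len(col_a)
    count_a = Counter(col_a)
    count_b = Counter(col_b)
    count_ab = Counter(zip(col_a, col_b))
    mi = 0.0
    for (a, b), joint in count_ab.items():
        p_ab = joint / n
        p_a = count_a[a] / n
        p_b = count_b[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi

# Toy alignments; row i of each comes from the same (hypothetical) virus.
n_alignment = ["AK", "AK", "GE", "GE"]
p_alignment = ["DL", "DL", "RL", "RL"]

covarying = mutual_information(column(n_alignment, 0), column(p_alignment, 0))
invariant = mutual_information(column(n_alignment, 0), column(p_alignment, 1))
print(covarying, invariant)
```

Raw mutual information is inflated by shared phylogeny and small sample sizes, which is one reason the pipeline pairs this step with phylogenetic reconstruction.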

Page 42: Computational Virology

Dr. Marcella McClure, P.I. (Marcie)

Dustin Lee, M.S., Bioinformatics Programmer

Brad Crowther, B.S., Bioinformatician I/Lab Manager

Aaron Juntunen, Undergraduate programmer

Dr. Ruth Angeletti Hogue, Adjunct Professor (visiting from Albert Einstein College of Medicine)

Kelly Burningham, Undergraduate