Identification of an informative and accurate region of the HCV genome for phylogenetic analyses...

Identification of an informative and accurate region of the HCV genome for phylogenetic analyses. Patrik Medstrand, patrik.medstrand@med.lu.se, Clinical Virology, Department of Laboratory Medicine Malmö, Lund University, Sweden. 11th annual conference of new Visby network, Vilnius, April 25-27, 2014

Page 1: Identification of an informative and accurate region of the HCV genome for phylogenetic analyses Patrik Medstrand patrik.medstrand@med.lu.se Clinical Virology.

Identification of an informative and accurate region of the HCV genome for phylogenetic analyses

Patrik Medstrand, patrik.medstrand@med.lu.se

Clinical Virology, Department of Laboratory Medicine Malmö

Lund University, Sweden

11th annual conference of new Visby network, Vilnius, April 25-27, 2014

Page 2:

AIMS

• To investigate the dynamics of the HCV epidemic in the Baltic region using genetic and epidemiological data (date and site of sample collection, information on potential “risk group”)

• To initiate a joint project among participants in the network (part of the proposal to SI)

Page 3:

Phylogenetics define genetic relationships

We will study the genetic relationship among HCV strains that are circulating in the Baltic region.

We could address:

Relationship among HCV strains in the region (cities, countries, risk groups, general population)

This can be done by “classical” phylogenetic studies, where groups (“clusters”) are defined.

Page 4:

Cluster definitions = need statistics

To define relationships (clusters; epidemiological links defined by a common ancestral HCV strain) we need to use statistics.

Limited information will obscure the identification of true relationships.

Information or power in phylogenetic inference is represented by genetic information in terms of nucleotide sequences.

This is similar to any statistical comparison: groups that are too small, or information that is too limited, do not allow us to draw robust conclusions.
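One way to make "information" concrete is to count the alignment columns that can actually discriminate between groupings. The sketch below is only an illustration, not part of the presented analysis: it counts parsimony-informative sites in a made-up toy alignment and in a shorter subset of it. The function name `informative_sites` and the toy sequences are hypothetical.

```python
from collections import Counter


def informative_sites(alignment):
    """Count parsimony-informative columns in equal-length sequences.

    A column is counted when at least two different nucleotides each
    occur in at least two sequences. Gaps and ambiguity codes are ignored.
    """
    count = 0
    for column in zip(*alignment):
        freqs = Counter(base for base in column if base in "ACGT")
        # Number of distinct nucleotides seen in two or more sequences.
        shared = sum(1 for n in freqs.values() if n >= 2)
        if shared >= 2:
            count += 1
    return count


# Hypothetical toy alignment (not real HCV data).
toy = [
    "ACGTACGTAC",
    "ACGTACGTAT",
    "ACTTACGAAT",
    "ACTTACGAAC",
]

full = informative_sites(toy)                     # all 10 columns
subset = informative_sites([s[:5] for s in toy])  # first 5 columns only
print(full, subset)
```

In this toy case the shorter subset retains fewer informative columns than the full alignment, which is the intuition behind comparing subgenomic regions with near full-length genomes.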

Page 5:

More might be better

[Figure: a 1 kb subset compared with the complete 9 kb genome]

Page 6:

From phylogenetics to phylodynamics

[Figure panels: unrooted (genetic relationship) versus rooted and with time estimates]

Colors: geographic locations, risk groups, etc.

NEW INSIGHTS

Page 7:

Using phylodynamic analysis, we investigated the extent to which the HCV epidemics in three metropolitan areas of Sweden were linked or separate. We found evidence for one early introduction (Western Europe to Gothenburg in 1958; panel A) and rapid dissemination (from Gothenburg to Stockholm and Malmö 1965-1968; panels B-C), whereas the later epidemic (after 1975) was characterized by HCV strains that were introduced from regions outside Sweden (Western Europe and USA; panel D), indicating limited epidemic links within Sweden during this later time period (Jerkeman et al., manuscript in preparation). Panel E: exponential growth from ~1960-1980.

[Figure: panels A-E; time labels 1960, 1965, 1968, 1980]

Phylodynamics can inform about migrations and growth of the epidemic, both historical and more recent.

Similar studies can be performed on other “traits”, such as risk groups.

Page 8:

Goal of present study

1. Identify phylogenetically informative genome regions that

• Allow identification of a reasonable number of correct clusters

• Allow reconstruction of the “true” phylogeny

in comparison to the phylogeny reconstructed from near full-length HCV genomes

2. Establish a convenient PCR and sequencing protocol

Page 9:

Genome regions

[Figure: sequence similarity (axis 0.4-1) and number of sequences with country and year info (axis 0-140) along the genome, with the regions E1-E2, P7, P7-NS2, NS5A, NS5B, NS5A-NS5B and NS5Bsh marked]

Page 10:

Data set and Methods

• 143 near full-length HCV 1a genomes (polyprotein region) were obtained from the Los Alamos HCV database

• The data set was used to create 7 subsets representing 7 subgenomic regions

• ML trees were constructed with Garli v2.0 under the GTR+I+G substitution model

• Branch support was estimated using the Shimodaira-Hasegawa (SH) test as implemented in PhyML

• False positive branches were defined as branches with statistical support (SH > 0.9) in ML-trees of subgenomic regions, that were absent in the ML-tree obtained from the polyprotein region (“true” tree)

• Accuracy (topology testing) of the phylogenies obtained from subgenomic regions was inferred by statistical comparison to the “true” tree using the SH-test implemented in TreePuzzle and Consel
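The false-positive definition above can be sketched as a comparison of branch sets. This is a minimal illustration under stated assumptions, not the pipeline used in the study: each branch is represented by the split of taxa it induces, supports are given as plain numbers, and the names `bipartition`, `false_positive_branches` and the six-taxon example are hypothetical.

```python
def bipartition(clade, all_taxa):
    """Canonical form of a branch: the smaller side of the split it induces."""
    clade = frozenset(clade)
    other = frozenset(all_taxa) - clade
    # Ties in size are broken alphabetically so the result is deterministic.
    return min(clade, other, key=lambda side: (len(side), sorted(side)))


def false_positive_branches(sub_tree, true_tree, all_taxa, threshold=0.9):
    """Return (supported, false_positive) sets of splits.

    sub_tree:  dict mapping a clade (tuple of taxon names) to its SH support
    true_tree: iterable of clades present in the reference ("true") tree
    A supported branch (support > threshold) is a false positive when its
    split is absent from the reference tree.
    """
    true_splits = {bipartition(c, all_taxa) for c in true_tree}
    supported = {
        bipartition(c, all_taxa)
        for c, support in sub_tree.items()
        if support > threshold
    }
    return supported, supported - true_splits


# Hypothetical six-taxon example (not real HCV data).
taxa = {"a", "b", "c", "d", "e", "f"}
reference = [("a", "b"), ("c", "d"), ("a", "b", "c", "d")]
subgenomic = {("a", "b"): 0.95, ("c", "e"): 0.93, ("e", "f"): 0.50}

supported, fp = false_positive_branches(subgenomic, reference, taxa)
print(len(supported), len(fp))
```

Here the branch grouping c with e is supported in the subgenomic tree but absent from the reference tree, so it is counted as a false positive. The branch grouping e with f is present in the reference tree but falls below the 0.9 threshold, so it is not counted as supported.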

Page 11:

Branch support

Region (length)          Supported branches (N)   FP (%)   True supported branches (N)
Polyprotein (9036 bp)    75                       -        75
E1-E2 (1236 bp)          28                       39       17
P7 (933 bp)              18                       23       14
P7-NS2 (1455 bp)         22                       22       17
NS5A-NS5B (2934 bp)      35                       37       22
NS5A (1272 bp)           26                       15       22
NS5B (1688 bp)           28                       46       15
NS5B-sh (640 bp)         8                        65       3
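The relation between the rows of the branch-support table can be sketched with a little arithmetic. The formula below is an assumption, not something stated on the slide: FP (%) is taken as the share of supported branches that are not true supported branches. It reproduces the tabulated value for E1-E2 and NS5A, but for some regions the recomputed figure differs by a point or two from the slide, so the slide values should be treated as authoritative.

```python
# Counts copied from the branch-support table on this slide.
supported = {
    "E1-E2": 28, "P7": 18, "P7-NS2": 22, "NS5A-NS5B": 35,
    "NS5A": 26, "NS5B": 28, "NS5B-sh": 8,
}
true_supported = {
    "E1-E2": 17, "P7": 14, "P7-NS2": 17, "NS5A-NS5B": 22,
    "NS5A": 22, "NS5B": 15, "NS5B-sh": 3,
}


def fp_percent(n_supported, n_true):
    """Assumed formula: percentage of supported branches that are not true."""
    return 100.0 * (n_supported - n_true) / n_supported


fp = {
    region: fp_percent(supported[region], true_supported[region])
    for region in supported
}

# Region with the lowest recomputed false-positive rate.
lowest = min(fp, key=fp.get)

for region, value in fp.items():
    print(f"{region}: {value:.1f}%")
print("lowest:", lowest)
```

Under this assumed formula, NS5A has the lowest recomputed rate, in line with the conclusions drawn later in the deck.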

Page 12:

Topology support

Region (length)          Branches (N)   p-value
E1-E2 (1236 bp)          27             <0.01
P7 (933 bp)              35             <0.01
P7-NS2 (1455 bp)         38             <0.01
NS5A-NS5B (2934 bp)      37             0.06
NS5A (1272 bp)           39             0.06
NS5B (1688 bp)           21             <0.01
NS5B-sh (640 bp)         12             <0.01

Branches (N): branches in subgenomic tree supported in true tree. p-value: topology difference of subgenomic and true tree.

Page 13:

Conclusions

• The 1272-bp region of NS5A displayed the lowest FP-rate compared to other subgenomic regions analyzed

• The NS5A and NS5A-NS5B trees conformed to the topology of the true tree. In total, 39 NS5A branches, out of a total of 75 branches, were shared with the true tree. Among those, 22 branches had statistical support.

• The NS5A region represents a reasonable trade-off in phylogenetic accuracy and information compared with full-length genome sequencing, and may be suitable for phylogenetic and phylodynamic studies of HCV

• The preliminary findings shown here will be confirmed using other HCV subgenotypes and methods

• PCR protocols will be established and shared with network members

Page 14:

ACKNOWLEDGEMENT

HCV study

Lund University, Sweden: Anders Widell, Per Björkman, Anna Jerkeman, Marianne Alanko, Vilma Molnegren, Joakim Esbjörnsson

Thanks to Anders and Joakim for presenting this!

Too bad that I couldn’t come to Vilnius, but I am looking forward to seeing you all soon!

Page 15:
Page 16:

Alternative regions

[Figure: sequence similarity (axis 0.4-1) and number of sequences with country and year info (axis 0-140) along the genome, with the regions E1-E2, P7, P7-NS2, NS5A, NS5B, NS5A-NS5B and NS5Bsh marked]