Identification of an informative and accurate region of the HCV
genome for phylogenetic analyses
Patrik [email protected]
Clinical Virology, Department of Laboratory Medicine Malmö
Lund University, Sweden
11th Annual Conference of the New Visby Network, Vilnius, April 25-27, 2014
AIMS
• To investigate the dynamics of the HCV epidemic in the Baltic region using genetic and epidemiological data (date and site of sample collection, information on potential “risk group”)
• To initiate a joint project among participants in the network (part of the proposal to SI)
Phylogenetics defines genetic relationships
We will study the genetic relationship among HCV strains that are circulating in the Baltic region.
We could address:
Relationships among HCV strains in the region (cities, countries, risk groups, general population)
This can be done by “classical” phylogenetic studies, in which groups (“clusters”) are defined.
Cluster definitions need statistics
To define relationships (clusters, i.e. epidemiological links defined by a common ancestral HCV strain), we need to use statistics.
Limited information will obscure the identification of true relationships.
In phylogenetic inference, information (statistical power) is provided by the genetic information contained in the nucleotide sequences.
As in any statistical comparison, groups that are too small or information that is too limited do not allow us to draw robust conclusions.
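A minimal sketch of the point that clusters need statistical support, assuming a cluster is simply a clade whose branch support exceeds a threshold (0.9 here, mirroring the SH > 0.9 cut-off used in the methods). Real cluster definitions often add further criteria, such as a maximum genetic distance within the cluster; the `Node` class, the helper names and the example tree are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Node:
    """Node of a rooted tree; leaves carry a name, internal nodes carry children."""
    name: Optional[str] = None
    support: Optional[float] = None  # support of the branch leading to this node
    children: List[Node] = field(default_factory=list)


def leaf_names(node: Node) -> Set[str]:
    """Collect the names of all leaves below (and including) this node."""
    if not node.children:
        return {node.name}
    names: Set[str] = set()
    for child in node.children:
        names |= leaf_names(child)
    return names


def supported_clusters(root: Node, threshold: float = 0.9) -> List[Set[str]]:
    """Return the leaf sets of internal (non-root) nodes whose support exceeds threshold."""
    clusters: List[Set[str]] = []
    stack = list(root.children)  # the root itself is never reported as a cluster
    while stack:
        node = stack.pop()
        if node.children:
            if node.support is not None and node.support > threshold:
                clusters.append(leaf_names(node))
            stack.extend(node.children)
    return clusters


# Hypothetical five-strain tree: only the (A, B) grouping passes the threshold
example_tree = Node(children=[
    Node(support=0.95, children=[Node("A"), Node("B")]),
    Node(support=0.60, children=[Node("C"), Node("D")]),
    Node("E"),
])
clusters = supported_clusters(example_tree)
```

The (C, D) grouping may well be a real epidemiological link, but with support of 0.60 the data cannot back it up, which is exactly the situation that limited sequence information produces.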
More might be better
Figure: subset of the genome (1 kb) vs. complete genome (9 kb)
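A rough illustration of why more sequence can help: the sampling error of a simple p-distance (the proportion of differing sites between two aligned sequences) shrinks with the number of sites compared, with standard error sqrt(p(1 - p)/L) under a binomial model. This is a textbook approximation, not the method used in the study; the 10% divergence and the helper names are hypothetical.

```python
import math


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences, skipping gap sites."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = diffs = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # ignore alignment gaps
        compared += 1
        diffs += a != b
    return diffs / compared


def p_distance_se(p: float, n_sites: int) -> float:
    """Binomial standard error of a p-distance estimated from n_sites sites."""
    return math.sqrt(p * (1 - p) / n_sites)


# Hypothetical divergence of 10% between two strains, sequenced over 1 kb or 9 kb
se_1kb = p_distance_se(0.10, 1_000)
se_9kb = p_distance_se(0.10, 9_000)
print(f"1 kb: +/-{se_1kb:.4f}   9 kb: +/-{se_9kb:.4f}")
```

Because the standard error scales with 1/sqrt(L), going from 1 kb to 9 kb makes it three times smaller for the same true divergence.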
From phylogenetics to phylodynamics
Figure: unrooted tree (genetic relationships) vs. rooted tree with time estimates; colors indicate geographic locations, risk groups, etc.
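One simple way to move from a rooted tree towards time estimates is root-to-tip regression: regress each sequence's distance from the root on its sampling date; the slope estimates the substitution rate and the x-intercept estimates the date of the root. The slides do not say which dating method was used, and phylodynamic studies typically rely on Bayesian molecular-clock methods, so treat this as a simpler, swapped-in technique. The function name, sampling years and distances below are hypothetical, chosen to lie exactly on a line.

```python
from typing import Sequence, Tuple


def root_to_tip_regression(dates: Sequence[float], distances: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of root-to-tip distance against sampling date.

    Returns (rate, root_date): the slope (substitutions/site/year) and the
    x-intercept, i.e. the date at which the fitted distance reaches zero.
    """
    n = len(dates)
    mean_t = sum(dates) / n
    mean_d = sum(distances) / n
    sxx = sum((t - mean_t) ** 2 for t in dates)
    sxy = sum((t - mean_t) * (d - mean_d) for t, d in zip(dates, distances))
    rate = sxy / sxx
    intercept = mean_d - rate * mean_t
    root_date = -intercept / rate
    return rate, root_date


# Hypothetical sampling years and root-to-tip distances from a rooted tree
dates = [2000, 2005, 2010, 2012]
distances = [0.050, 0.055, 0.060, 0.062]
rate, root_date = root_to_tip_regression(dates, distances)
```

With these inputs the fitted rate is 0.001 substitutions/site/year and the root falls in 1950; with real data, the scatter around the line indicates how much temporal signal the sequences carry.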
NEW INSIGHTS
Using phylodynamic analysis, we investigated the extent to which the HCV epidemics in three metropolitan areas of Sweden were linked or separate. We found evidence for one early introduction (from Western Europe to Gothenburg in 1958; panel A) and rapid dissemination (from Gothenburg to Stockholm and Malmö in 1965-1968; panels B-C), whereas the later epidemic (after 1975) was characterized by HCV strains introduced from regions outside Sweden (Western Europe and USA; panel D), indicating limited epidemic links within Sweden during this later period (Jerkeman et al., manuscript in preparation). Panel E: exponential growth from about 1960 to 1980.
Figure: panels A-D, introductions and spread of HCV; panel E, epidemic growth over time (1960, 1965, 1968, 1980)
Phylodynamics can inform about migrations and growth of the epidemic, both historical and more recent.
Similar studies can be performed on other “traits”, such as risk groups.
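Panel E describes exponential growth. Under the simple assumption N(t) = N(0)·e^(rt), two points on the growth curve give the growth rate r = ln(N2/N1)/(t2 - t1) and the doubling time ln 2 / r. The 100-fold increase used below is a hypothetical input, not a value read from panel E, and the helper names are illustrative.

```python
import math


def growth_rate(n_start: float, n_end: float, years: float) -> float:
    """Per-year exponential growth rate r, assuming N(t) = N(0) * exp(r * t)."""
    return math.log(n_end / n_start) / years


def doubling_time(r: float) -> float:
    """Years needed for an exponentially growing quantity to double."""
    return math.log(2) / r


# Hypothetical example: a 100-fold increase between 1960 and 1980
r = growth_rate(1.0, 100.0, 1980 - 1960)
td = doubling_time(r)
```

A 100-fold increase over 20 years corresponds to r of about 0.23 per year and a doubling time of about 3 years.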
Goal of present study
1. Identify phylogenetically informative genome regions that
• Allow identification of a reasonable number of correct clusters
• Allow reconstruction of the “true” phylogeny
in comparison with the phylogeny reconstructed from near-full-length HCV genomes
2. Establish a convenient PCR and sequencing protocol
Genome regions
Figure: sequence similarity (0.4-1) and number of sequences with country and year information (0-140) for the regions E1-E2, P7, P7-NS2, NS5A, NS5B, NS5A-NS5B and NS5B-sh
Data set and Methods
• 143 near-full-length HCV subtype 1a genomes (polyprotein region) were obtained from the Los Alamos HCV database
• The data set was used to create 7 subsets representing 7 subgenomic regions
• Maximum-likelihood (ML) trees were constructed with Garli v2.0 under the GTR+I+G substitution model
• Branch support was estimated using the Shimodaira-Hasegawa (SH) test as implemented in PhyML
• False positive (FP) branches were defined as branches with statistical support (SH > 0.9) in ML trees of subgenomic regions that were absent from the ML tree obtained from the polyprotein region (the “true” tree)
• Accuracy (topology testing) of phylogenies obtained from subgenomic regions was assessed by statistical comparison with the “true” tree using the SH test implemented in TreePuzzle and Consel
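The false-positive definition above can be expressed as a comparison of tree bipartitions (splits). This sketch assumes both trees have already been parsed (e.g. from Newick files) into splits over the same taxa, with support values for the subgenomic tree, and it assumes FP (%) means false positives as a percentage of supported branches. The function names and the six-taxon example are hypothetical.

```python
from typing import Dict, FrozenSet, Iterable, Tuple


def canonical_split(clade: Iterable[str], all_taxa: FrozenSet[str]) -> FrozenSet[str]:
    """Key a bipartition by the side lacking a fixed reference taxon, so the same
    unrooted branch gets the same key whichever side it was written as."""
    side = frozenset(clade)
    reference = min(all_taxa)  # any fixed taxon works; the alphabetically first is used
    return all_taxa - side if reference in side else side


def false_positive_rate(
    sub_branches: Dict[FrozenSet[str], float],
    true_branches: Iterable[FrozenSet[str]],
    all_taxa: FrozenSet[str],
    threshold: float = 0.9,
) -> Tuple[int, int, float]:
    """Return (supported, false_positives, FP %) for one subgenomic tree.

    sub_branches maps each internal branch (taxa on one side) to its support;
    true_branches lists the internal branches of the "true" tree.
    """
    true_set = {canonical_split(b, all_taxa) for b in true_branches}
    supported = [b for b, s in sub_branches.items() if s > threshold]
    false_pos = [b for b in supported if canonical_split(b, all_taxa) not in true_set]
    pct = 100.0 * len(false_pos) / len(supported) if supported else 0.0
    return len(supported), len(false_pos), pct


taxa = frozenset("ABCDEF")
# Hypothetical "true" tree ((A,B),(C,D),(E,F)) as its three internal splits
true_tree = [frozenset("AB"), frozenset("CD"), frozenset("EF")]
# Hypothetical subgenomic tree ((A,B),(C,E),(D,F)); the A,B split is given by its other side
sub_tree = {frozenset("CDEF"): 0.95, frozenset("CE"): 0.93, frozenset("DF"): 0.50}
result = false_positive_rate(sub_tree, true_tree, taxa)
```

Here two branches pass SH > 0.9; the (C, E) branch does not exist in the true tree, so one of the two supported branches is a false positive (50%).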
Branch support
Region (length)        Supported branches (N)  FP (%)  True supported branches (N)
Polyprotein (9036 bp)  75                      -       75
E1-E2 (1236 bp)        28                      39      17
P7 (933 bp)            18                      23      14
P7-NS2 (1455 bp)       22                      22      17
NS5A-NS5B (2934 bp)    35                      37      22
NS5A (1272 bp)         26                      15      22
NS5B (1688 bp)         28                      46      15
NS5B-sh (640 bp)       8                       65      3

Topology support
Region (length)        Branches in subgenomic tree supported in true tree (N)  Topology difference of subgenomic and true tree (p-value)
E1-E2 (1236 bp)        27                                                      <0.01
P7 (933 bp)            35                                                      <0.01
P7-NS2 (1455 bp)       38                                                      <0.01
NS5A-NS5B (2934 bp)    37                                                      0.06
NS5A (1272 bp)         39                                                      0.06
NS5B (1688 bp)         21                                                      <0.01
NS5B-sh (640 bp)       12                                                      <0.01
Conclusions
• The 1272-bp region of NS5A displayed the lowest FP rate (15%) of the subgenomic regions analyzed
• The topologies of the NS5A and NS5A-NS5B trees did not differ significantly from the true tree (p = 0.06). In total, 39 NS5A branches, of a total of 75 branches, were shared with the true tree; of these, 22 branches had statistical support.
• The NS5A region represents a trade-off between phylogenetic accuracy/information and sequence length compared with full-length genome sequencing, and may be suitable for phylogenetic and phylodynamic studies of HCV
• The preliminary findings shown here will be confirmed using other HCV subtypes and methods
• PCR protocols will be established and shared with network members
ACKNOWLEDGEMENT
HCV study, Lund University, Sweden: Anders Widell, Per Björkman, Anna Jerkeman, Marianne Alanko, Vilma Molnegren, Joakim Esbjörnsson
Thanks to Anders and Joakim for presenting this!
Too bad that I couldn’t come to Vilnius, but I am looking forward to seeing you all soon!
Alternative regions
Figure: sequence similarity (0.4-1) and number of sequences with country and year information (0-140) for the regions E1-E2, P7, P7-NS2, NS5A, NS5B, NS5A-NS5B and NS5B-sh