Evolution 2012
Assembly of repetitive DNA from genome survey sequencing: Lessons from grasses and applications to non-model systems
Kate L Hertweck (NESCent) and J. Chris Pires (U of Missouri)
Image sources: mobilebotanicalgardens.org, Sandwalk.blogspot.com
Genome sequencing, large genomes and evolution
Kate Hertweck, Repetitive DNA assembly
● Genome sequencing is becoming a routine laboratory procedure.
● The first step in genome analysis is masking repetitive elements (REs), which may comprise a large portion of a genome.
● Digging through everyone's genomic junk sounds pretty fun!
● What determines genome size? Why and how?
● Methods in large genome de novo assembly of next-gen data are improving (Schatz et al 2010)
● Sanger sequencing in Fritillaria indicates highly divergent TEs (Ambrozova et al 2011)
● Low-coverage Illumina sequencing in barley identifies both genes and novel repeats (Wicker et al 2008)
● Estimation of genome size and TE content in maize and relatives is accurate with very short paired-end reads (Tenaillon et al 2011)
Transposable elements are relevant to evolution
● Direct: TE movement can disrupt gene function
● Links between TEs and adaptation/speciation?
● Indirect: Increases in genome size
● Many historical hypotheses about relationships between genome size and life history (complexity, mean generation time, habitat/environment/climate, growth form)
● Physical-mechanical effects of nuclear size and mass
● How does TE proliferation affect plant diversification?
Our data
● Illumina (80-120 bp single end), 6 taxa per lane
● GSS: Genome Survey Sequences
● Assembled plastomes, mtDNA genes, and nrDNA genes from less than 10% of the GSS data!
● Poaceae (family of grasses, model system)
● Medium-sized genomes
● well-annotated library of repeats
● Asparagales (order of petaloid monocots, non-model system)
● Very large genomes
● discovery of novel repeats
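As a rough illustration of how shallow survey sequencing of this kind can be, expected depth follows the usual relation coverage = (number of reads × read length) / genome size. The sketch below is not from the talk: the function name and the example inputs (5 million reads of 100 bp against a 1.3 Gb genome) are hypothetical and chosen only to show the arithmetic.

```python
def expected_coverage(n_reads: int, read_length_bp: int, genome_size_bp: float) -> float:
    """Average sequencing depth: total sequenced bases divided by genome size."""
    if genome_size_bp <= 0:
        raise ValueError("genome size must be positive")
    return n_reads * read_length_bp / genome_size_bp


# Hypothetical inputs, not values reported in the talk.
depth = expected_coverage(n_reads=5_000_000, read_length_bp=100, genome_size_bp=1.3e9)
print(f"{depth:.2f}x")  # well below 1x for these inputs
```

At depths below 1x, single-copy regions are mostly unsampled, while high-copy sequences such as organellar genomes and abundant repeats can still be covered many times over.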
Methodological approaches
1. Sequence assembly:
● Ab initio repeat construction: use raw sequence reads to build pseudomolecules or ancestral sequences
● De novo sequence assembly: standard genome assembly methods, screen resulting scaffolds (MSR-CA)
2. Annotation method:
● Motif searching
● Reference library: current RepBase, 3110 repeats, 98.7% are from grasses (RepeatMasker and CENSOR)
Class I: Retrotransposons (LTR, LINE, SINE, ERV, SVA)
Class II: DNA transposons (TIR, Crypton, Helitron, Maverick)
See my iEvoBio talk about TE databasing and ontology!
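To make the reference-library annotation step more concrete, here is a minimal sketch of summarizing annotation hits by repeat class. It is not the authors' pipeline. It assumes hits arrive in the whitespace-delimited layout of a RepeatMasker `.out` file (three header lines, then one hit per line, with query begin/end in columns 6 and 7 and the repeat class/family in column 11). The sample hits and the function name `masked_bp_by_class` are invented for illustration.

```python
from collections import defaultdict

# Hypothetical excerpt laid out like a RepeatMasker .out file; all values are invented.
SAMPLE_OUT = """\
   SW   perc perc perc  query       position in query    matching   repeat        position in repeat
score   div. del. ins.  sequence    begin end   (left)   repeat     class/family  begin  end   (left)  ID

 1200   10.5  1.0  0.5  scaffold_1      1   900   (100) + Gypsy-1   LTR/Gypsy         1   900    (50)   1
  800   15.2  2.0  1.0  scaffold_2     10   510   (490) C Copia-7   LTR/Copia      (20)   500      1    2
  300   20.0  0.0  0.0  scaffold_3      5   205     (0) + hAT-3     DNA/hAT           1   200    (10)   3
"""


def masked_bp_by_class(out_text: str) -> dict:
    """Sum annotated base pairs per top-level repeat class (e.g. LTR, DNA)."""
    totals = defaultdict(int)
    for line in out_text.splitlines():
        fields = line.split()
        # Hit lines start with an integer score; header and blank lines do not.
        if len(fields) < 11 or not fields[0].isdigit():
            continue
        begin, end = int(fields[5]), int(fields[6])
        repeat_class = fields[10].split("/")[0]
        totals[repeat_class] += end - begin + 1
    return dict(totals)


print(masked_bp_by_class(SAMPLE_OUT))
```

This simple tally double-counts overlapping hits on the same scaffold, so it should be read as a starting point rather than an abundance estimate.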
TE assembly and annotation results: Poaceae
Taxon    Genome size (Mb)  # reads  # scaffolds  Repeat scaffolds  % LTRs  % Copia  % Gypsy  % SINEs  % LINEs  % DNA TEs
rice     389               3.8      2376         1718              72      21       48       0.2      4.4      18
sorghum  735               5.3      2248         2255              67      21       46       N/A      2.9      26
maize    2045              5.1      1324         1197              77      21       56       N/A      1.9      18
● Previous research: Good TE annotations and copy number estimates in all genomes
● Our results:
● Recovery of all extant superfamilies
● High sequence similarity between scaffolds and reference sequences
● Full length LINEs, SINEs, LTRs; fragmented examples of all
● Abundance estimation is problematic
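One common way to quantify "sequence similarity" between a scaffold and a reference repeat is percent identity over a pairwise alignment. The sketch below assumes the two sequences are already aligned (equal length, gaps written as `-`) and counts identical columns over all aligned columns. The function name and the toy sequences are hypothetical, and the talk does not state which similarity measure was used.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of alignment columns with identical residues.

    Columns where both sequences have a gap are ignored; columns with a gap
    in only one sequence count as mismatches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == gap and b == gap)
    ]
    if not columns:
        raise ValueError("alignment has no informative columns")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


# Toy alignment: 9 columns, 7 identical.
print(round(percent_identity("ACGT-ACGT", "ACGTTACCT"), 2))
```

Other conventions exist (for example, excluding gap columns from the denominator), so identity values are only comparable when computed the same way.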
REs in Core Asparagales
[Figure: clade labels Xanthorrhoeaceae, Agapanthaceae, Asparagaceae]
● Reference library is highly diverged from scaffolds to be annotated (much lower sequence similarity)
● Caution in interpreting results
● Large scaffolds of some TEs
● Many small scaffolds of many TE superfamilies
● Comparisons of sister clades
Image sources: ag.arizona.edu, Naturehills.com
Very large genomes in Core Asparagales
[Figure: repeat composition per taxon; clade labels Xanthorrhoeaceae, Agapanthaceae, Asparagaceae; legend: % Copia LTRs, % Gypsy LTRs, % LINEs, % DNA TEs, other (RC, satellite, low complexity, simple repeats)]
Allioideae: Allium, 12.9 Gb, 5.1 billion reads, 1858 scaffolds
Amaryllidoideae: Scadoxus, 21.6 Gb, 6 billion reads, 1336 scaffolds
Closely related lineages have different results
[Figure: repeat composition per taxon; clade labels Xanthorrhoeaceae, Agapanthaceae, Asparagaceae; legend: % Copia LTRs, % Gypsy LTRs, % LINEs, % DNA TEs, other (RC, satellite, low complexity, simple repeats)]
Aphyllanthoideae: Aphyllanthes, 2.7 billion reads, 436 scaffolds
Agavoideae: Hosta, 4.7 billion reads, 1084 scaffolds*
Small genomes contain variation
[Figure: repeat composition per taxon; clade labels Xanthorrhoeaceae, Agapanthaceae, Asparagaceae; legend: % Copia LTRs, % Gypsy LTRs, % LINEs, % DNA TEs, other (RC, satellite, low complexity, simple repeats)]
Lomandroideae: Lomandra, 1.1 Gb, 4.7 billion reads, 1491 scaffolds
Asparagoideae: Asparagus, 1.3 Gb, 5 billion reads, 1977 scaffolds
Nolinoideae: Sansevieria, 1.2 Gb, 4.9 billion reads, 835 scaffolds
Example: LTR from Hosta
So what?
● Assembly of consensus sequences of TEs from very low coverage sequence data, even without a close reference library
● Improve annotation (and assembly) by building a library of lineage-specific TEs
● Other parameters for genomic comparisons
● Abundance estimates
● Characterize genetic diversity within each element
● Comparative biology of TEs
● Does TE proliferation contribute to diversification or shifts in rates of molecular evolution?
● Are there common patterns between TEs and life history trait evolution?
Acknowledgements
J. Chris Pires lab (U of Missouri): Dustin Mayfield, Pat Edger
NESCent (National Evolutionary Synthesis Center): Allen Roderigo, Karen Cranston
www.nescent.org
Twitter: k8lh
Google+: [email protected]
Asparagales results
Taxon          Genome size (Gb)  # reads (billions)  Total scaffolds  Nuclear scaffolds  % LTRs  % Copia  % Gypsy  % LINEs  % DNA TEs
Hosta          N/A               4.7                 1084             601                52      6        46       0.5      4
Agapanthus     10.2              1.3                 438              176                70      32       40       1.7      3
Lomandra       1.1               4.7                 1491             532                68      29       39       7.9      6
Sansevieria    1.2               4.9                 835              280                67      27       39       4.3      6
Asparagus      1.3               5.0                 1977             646                67      35       32       0.5      10
Scadoxus       21.6              6.0                 1336             493                73      24       49       0.2      4
Allium         12.9              5.1                 1858             539                65      22       44       0.6      10
Ledebouria     8.6               4.1                 2481             771                66      35       32       0.4      5
Haworthia      14.9              4.6                 1360             481                75      30       45       0.8      3
Aphyllanthes   N/A               2.7                 436              248                51      24       23       1.2      10
Dichelostemma  9.1               3.9                 1706             584                75      38       37       0.2      7
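For readers who want to compare taxa using the table above, here is a small sketch that computes the Gypsy-to-Copia ratio per taxon from the % Copia and % Gypsy columns. The percentages are copied from the table. The ratio itself, the variable names, and the function name are illustrative choices, not an analysis presented in the talk.

```python
# (taxon, % Copia, % Gypsy), copied from the Asparagales results table.
ROWS = [
    ("Hosta", 6, 46),
    ("Agapanthus", 32, 40),
    ("Lomandra", 29, 39),
    ("Sansevieria", 27, 39),
    ("Asparagus", 35, 32),
    ("Scadoxus", 24, 49),
    ("Allium", 22, 44),
    ("Ledebouria", 35, 32),
    ("Haworthia", 30, 45),
    ("Aphyllanthes", 24, 23),
    ("Dichelostemma", 38, 37),
]


def gypsy_to_copia(rows):
    """Return {taxon: % Gypsy / % Copia} for each row."""
    return {taxon: gypsy / copia for taxon, copia, gypsy in rows}


ratios = gypsy_to_copia(ROWS)
# Print taxa from the most Gypsy-dominated to the most Copia-dominated.
for taxon, ratio in sorted(ratios.items(), key=lambda item: item[1], reverse=True):
    print(f"{taxon:<14}{ratio:5.2f}")
```

Because the slides caution that abundance estimation is problematic, ratios like these describe the annotated scaffolds only and should not be read as genome-wide copy-number estimates.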