Glanville fritillary butterfly genomics and genetics
Rainer Lehtonen, PhD, Genomics and genetics project leader, Metapopulation Research Group, Department of Biological and Environmental Sciences, University of Helsinki


Page 1: Glanville fritillary butterfly genomics and genetics

Rainer Lehtonen, PhD, Genomics and genetics project leader
Metapopulation Research Group
Department of Biological and Environmental Sciences, University of Helsinki

Page 2: Glanville fritillary butterfly – genomics and genetics

- Background
- Genome project
- Genome assembly >> Panu Somervuo
- Some NGS applications
- Conclusions

Page 3: Glanville fritillary as a model

The Glanville fritillary is an internationally recognized metapopulation model system in ecological and evolutionary studies

Studied since 1991 in the Åland Islands in Finland

Data available from different populations:
- Fragmented landscape vs. continuous
- Isolated vs. metapopulation
- Large vs. small
- Same vs. different population history

Field studies, indoor & outdoor cage + laboratory experiments, controlled crosses, molecular studies

Page 4: Collaborative genome project

SEQUENCE DATA PRODUCTION

DNA (+RNA) SAMPLES

QC + ASSEMBLY

ASSEMBLY VALIDATION (ref g)

ANNOTATION + PUBLICATION

GENOME ANALYSIS

VARIATION IN THE GENOME

GENETIC TOOLS

INSTITUTE OF BIOTECH, KAROLINSKA INSTITUTE

INSTITUTE OF BIOTECHNOLOGY

INSTITUTE OF BIOTECH, DEP COMPUTER SCI

INSTITUTE OF BIOTECH, DEP COMPUTER SCI

EBI, ENSEMBL GENOMES

EBI, OTHER GENOME PROJECTS

INSTITUTE OF BIOTECH, DEP COMPUTER SCI

FIMM, BIOMEDICUM HKI, INSTITUTE OF BIOTECH, ILLUMINA INC.

Page 5: Reference genome + variation

ESTs
REF GENOME
GENOME ANNOTATION
DATA FROM OTHER SOURCES
NEXT-GEN SEQUENCING: 454, SOLiD3, SOLEXA
REF DNA + RNA SAMPLES
GENOME ASSEMBLY
NEXT-GEN RE-SEQUENCING: SOLiD4/SOLEXA
CROSSES/POP POOLS/INDS
MAPPING TO REF GENOME
VARIATION
GENETIC MAP (MARKER LOCATIONS)
GENETIC VARIATION, GENE EXPRESSION
PLATFORM FOR LARGE SCALE TARGETED GENOTYPING
GENOTYPING OF LARGE POPULATION SAMPLES (>50K)
EST ASSEMBLY

Page 6: Variation & other next-gen data

Heliconius Genome Meeting, 25.-26.3.2010

Sample | Aim | Platform | Read type | Read length | Runs to be done
RNA, pool used in RNAseq | Gene start sites; gene 5' variation | SOLiD4 | Pair-end | 50+25 | 1/4
Amp DNA, 4 crosses | Construction of genetic map | SOLiD4 | Single read, RAD tag library | 50+25 | 3
Amp DNA, pool ~30 ind | SNPs & other genetic variation | SOLiD4 | Pair-end | 50+25 | 1
RNA, pooled pop samples from 5+1 pop | Variation in 5+2 pop; SNPs in ESTs, expression | SOLiD4 | Pair-end | 50+25 | 1(-2)
DNA from selected individuals | Pgi & flanking genes + Sdhd, Hsp70 | Sure-Select + 454; Sanger seq | Single read | 400 | 1/4

Page 7: Deep re-sequencing

RAD-tag (Restriction-site Associated DNA), also known as "deep sequencing of a reduced representation library"

Example: Construction of a high-density genetic map:
* 4 controlled Spain-Finland crosses
* Parents and 50 individuals from each family to be sequenced

A genetic (linkage) map defines the order of and distance between markers, based on the recombination frequency in meiosis (1 cM = 1% recombination frequency)
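The map-distance convention on this slide can be sketched in a few lines. This is only an illustration of the 1 cM = 1% rule quoted above; the function name and the offspring counts are hypothetical, and the linear rule is reasonable only for closely linked markers.

```python
def map_distance_cm(n_recombinant: int, n_total: int) -> float:
    """Map distance between two markers in centimorgans.

    Uses the slide's convention that 1 cM corresponds to 1% recombinant
    offspring. Counts are per marker pair within one family.
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if not 0 <= n_recombinant <= n_total:
        raise ValueError("n_recombinant must be between 0 and n_total")
    return 100.0 * n_recombinant / n_total


# Hypothetical example: 5 recombinants among the 50 offspring of one cross.
distance = map_distance_cm(5, 50)
print(f"{distance:.1f} cM")
```

With 50 offspring per family, the smallest non-zero distance that a single family can resolve is one recombinant in 50, which is why several crosses are combined.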

SureSelect (Agilent) target enrichment + deep sequencing with 454

Example: Population comparison of Pgi + flanking genes (+ some others) in a sample of 24 individuals or pools

Page 8: Genetic map with RAD-tag NGS

150-200 bp pair-end library

50 bp seq, 25 bp seq

SNP1, SNP2

Baird NA et al., PLoS ONE 2008

Now: 500M reads, 50 bp each

Page 9: RAD-tagging in Glanville fritillary

Average fragment size (kb)
Enzyme | 454 Glanville gContigs | Heliconius
NcoI | 13.3 | 14
XhoI | 11.5 | 4
EcoRI | 4.5 | 2

Mappable reads
• Restriction site > 250 bp from the end of a gContig
• Targets = 2x sites
• 454-Newbler assembly: 320 Mbp (out of a ~550 Mbp genome) in 220K contigs (>500 bp)
• Expected number of SNPs 1/300 bp, read length 50-25 bp

Enzyme | Site | #sites | #mappable | #exp | #SNPs
NcoI* | ccatgg | 24,064 | 38,880 | 48,128 | 12,032
XhoI | ctcgag | 27,788 | 45,925 | 55,576 | 13,894
EcoRI | gaattc | 70,474 | 117,293 | 140,948 | 35,237
BspHI* | tcatga | 66,967 | 110,731 | 133,934 | 33,483
NdeI | catatg | 73,629 | 121,628 | 147,258 | 36,814

* The most probable combination gives ~45,000 SNPs
• Reads have to be unique
• 10-20x coverage/individual (>~5000x on average)
• Heavy data filtering needed; probably only 30-50% of the data is usable
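A rough read budget follows from the figures on this slide. The sketch below is back-of-the-envelope arithmetic only, under stated assumptions: one read covers one target, each of the 4 crosses contributes 2 parents plus 50 offspring, and the targets are the mappable NcoI + BspHI counts from the table. The function name is hypothetical.

```python
def reads_required(n_individuals: int, n_targets: int,
                   coverage: float, usable_fraction: float) -> float:
    """Raw reads needed so that the usable reads reach `coverage` per target
    in every individual (assumes one read per target per unit of coverage)."""
    if not 0 < usable_fraction <= 1:
        raise ValueError("usable_fraction must be in (0, 1]")
    return n_individuals * n_targets * coverage / usable_fraction


# Assumption: 4 crosses x (2 parents + 50 offspring).
individuals = 4 * (2 + 50)
# Mappable targets for NcoI + BspHI, taken from the table.
targets = 38_880 + 110_731

best_case = reads_required(individuals, targets, coverage=10, usable_fraction=0.5)
worst_case = reads_required(individuals, targets, coverage=20, usable_fraction=0.3)
print(f"{best_case / 1e6:.0f}M to {worst_case / 1e6:.0f}M raw reads")
```

Under these assumptions the range spans several hundred million to a few billion reads, so the answer is very sensitive to the usable fraction and the coverage goal.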

In silico restriction analysis made by Panu Somervuo, MRG
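An in silico restriction analysis of this kind can be sketched as follows. This is not the script used for the slide; it is a minimal illustration that assumes a target is mappable when more than 250 bp of contig lies on that side of the site, and that 75 bp (50 + 25) are read per target. The function name and the toy contig are hypothetical. The enzymes listed all have palindromic sites, so only one strand is scanned.

```python
import re


def count_rad_targets(contigs, site, min_flank_bp=250,
                      read_bp=75, snp_spacing_bp=300):
    """Count restriction sites, RAD targets and expected SNPs in contigs.

    Each site gives two targets (one per flank). A target counts as
    mappable when the flank on that side is longer than `min_flank_bp`
    (an assumed reading of the slide's 250 bp rule).
    """
    site = site.upper()
    # Zero-width lookahead so overlapping occurrences are also counted.
    pattern = re.compile(f"(?={re.escape(site)})")
    n_sites = 0
    n_mappable = 0
    for seq in contigs:
        seq = seq.upper()
        for match in pattern.finditer(seq):
            n_sites += 1
            left_flank = match.start()
            right_flank = len(seq) - (match.start() + len(site))
            n_mappable += int(left_flank > min_flank_bp)
            n_mappable += int(right_flank > min_flank_bp)
    n_targets = 2 * n_sites
    return {
        "sites": n_sites,
        "mappable": n_mappable,
        "targets": n_targets,
        "expected_snps": n_targets * read_bp / snp_spacing_bp,
    }


# Toy contig: one NcoI site with a long left flank and a short right flank.
toy_contig = "a" * 300 + "ccatgg" + "a" * 100
result = count_rad_targets([toy_contig], "ccatgg")
print(result)
```

Applying the same expected-SNP formula to the NcoI row of the table, 48,128 targets × 75 bp / 300 bp gives 12,032, which matches the last column.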

Page 10: Targeted enrichment + resequencing

Max 55K 120-mer oligos

Glanville fritillary butterfly SureSelect target enrichment (10x tiling):
• To identify "lethal" haplotypes associated with a known homozygous genotype
• To define the structure and variation of the hypervariable Pgi gene
• To design tag-SNPs for large-scale genotyping
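The bait budget implied by these numbers can be estimated with simple arithmetic. The sketch assumes that "10x tiling" means every target base is covered by 10 overlapping 120-mer baits, that is, a 12 bp step between bait starts; that interpretation and the function names are assumptions, not taken from the slide.

```python
import math


def baits_needed(target_bp: int, bait_bp: int = 120, tiling: int = 10) -> int:
    """Number of baits needed to tile one contiguous target region."""
    step = bait_bp / tiling  # distance between consecutive bait starts
    if target_bp <= bait_bp:
        return 1
    return math.ceil((target_bp - bait_bp) / step) + 1


def max_target_bp(max_baits: int = 55_000, bait_bp: int = 120,
                  tiling: int = 10) -> int:
    """Approximate total target size that fits in the bait budget."""
    return max_baits * bait_bp // tiling


print(max_target_bp())        # total target capacity in bp
print(baits_needed(30_000))   # baits for a hypothetical 30 kb region
```

Under this reading, the 55K oligo limit corresponds to roughly 660 kb of target sequence at 10x tiling.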

Page 11: Uneven coverage

[Figure: bar chart of reads and bases (kbp) per sample for the 24 Cinxia Sure Select samples; axis from 0 to 35 000. Labeled samples (Tas_pooli_Cinxia Sure Select): TCMID_3 (F3), TCMID_51 (A1), TCMID_53 (C1), TCMID_55 (E1), TCMID_57 (G1), TCMID_59 (A2), TCMID_61 (B3), TCMID_63 (1), TCMID_65 (3), TCMID_67 (5), TCMID_69 (E3), TCMID71.]

Per-sample values from the figure:
Reads | Bases (kbp)
30 753 | 11 546
31 488 | 12 197
7 998 | 3 128
11 072 | 4 343
20 699 | 8 204
13 346 | 5 236
10 540 | 4 144
4 568 | 1 791
9 164 | 3 587
7 520 | 2 829
9 863 | 3 581
1 131 | 444
11 687 | 4 494
9 362 | 3 613
12 959 | 4 983
13 717 | 5 361
16 644 | 6 499
9 214 | 3 621
9 780 | 3 708
20 851 | 7 718
17 110 | 6 324
22 316 | 7 960
21 122 | 7 774
14 731 | 5 468

Cinxia Sure Select: reads (total 337 635), bases kbp (total 128 555 kbp)

Hypothesis-driven sampling: compare samples (24) from different populations with different tag-SNP genotype frequencies
> Hardy-Weinberg equilibrium
> Hardy-Weinberg disequilibrium
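Whether a sample is in Hardy-Weinberg equilibrium at a biallelic tag-SNP is commonly checked with a chi-square goodness-of-fit test. The sketch below shows that standard test; it is not necessarily the method used in the project, and the genotype counts are hypothetical.

```python
import math


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int):
    """Chi-square test (1 degree of freedom) for Hardy-Weinberg equilibrium.

    Takes observed counts of the three genotypes at a biallelic locus and
    returns (chi-square statistic, p-value).
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes given")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:
        raise ValueError("locus is monomorphic; test is undefined")
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: a complete lack of heterozygotes.
chi2, p_value = hwe_chi_square(50, 0, 50)
print(chi2, p_value)
```

A strong deficit of one homozygous class relative to expectation would show up as a large statistic here. For small samples an exact test is usually preferred over the chi-square approximation.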

¼ 454 Titanium run: 444-12 197 kb/sample = 15-406x coverage

Figure by Pia Laine, Institute of Biotechnology, University of Helsinki
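The coverage range quoted above is sequenced bases divided by target size. The slide does not state the target size; the 30 kb used below is inferred from the quoted numbers (444 kb for 15x, 12 197 kb for 406x) and should be treated as an assumption. The function name is hypothetical.

```python
def fold_coverage(sequenced_kb: float, target_kb: float,
                  on_target_fraction: float = 1.0) -> float:
    """Average fold coverage of the target region.

    `on_target_fraction` scales the result when only part of the data
    maps to the target; the default of 1.0 counts all sequenced bases.
    """
    if target_kb <= 0:
        raise ValueError("target_kb must be positive")
    return sequenced_kb * on_target_fraction / target_kb


ASSUMED_TARGET_KB = 30.0  # inferred from the slide's numbers, not stated

lowest = fold_coverage(444, ASSUMED_TARGET_KB)
highest = fold_coverage(12_197, ASSUMED_TARGET_KB)
print(f"{lowest:.1f}x to {highest:.1f}x")
```

Passing an `on_target_fraction` below 1.0 gives the effective coverage when a share of the reads falls outside the target region.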

Page 12: How well does SureSelect work?

Data from Agilent

Our very preliminary result: ~40% of the data comes from the target

Page 13: Comparison of haplotypes

Sampsa Hautaniemi, Marko Laakso, Sirkku Karinen, Rainer Lehtonen

Page 14: Message

- Whole genome sequencing is doable for a "non-genome" oriented research group
- Most work is on data filtering and analysis
- Tools for data management and analysis are under strong development
- Down-stream efforts need to be compatible with available genome data