Workshop Biodiversity and Biobank – FAPESP / ABC / SI
FAPESP, São Paulo, Brazil, 16 August 2017
Large-scale biodiversity genomics
initiatives in Brazil:
From DNA barcodes to whole genome
sequences
Eduardo Eizirik
Faculdade de Biociências, PUCRS, Brazil
Instituto Pró-Carnívoros, Brazil
Brazil
Area: 8,515,767 km²
Population: 190,732,694
Megadiverse country
- Estimated to harbor ~1.8 million spp.
(10-17% of the world’s total)
[Lewinsohn & Prado 2005]
- Known fauna:
~9k vertebrates (711 mammals,
1,900 birds, 732 non-avian
reptiles, 973 amphibians, 3,133
continental fish, 1,376 marine
fish)
Map: wikipedia.org
Applications of DNA Barcoding in
Biodiversity Conservation
1. Gathering data on components of native biodiversity;
1.1. Baseline data (e.g. community composition and
dynamics, geographic distribution, trophic interactions).
1.2. Monitoring of biodiversity in impacted areas.
2. Gathering data on threats to native biotas.
- e.g. invasive species, pathogens, wildlife trafficking.
3. Helping to enforce actions aimed at curbing threats to
biodiversity.
- e.g. wildlife forensic analyses.
E. Eizirik, CBOL regional mtg. Brazil, 2007
The effort towards Large-scale DNA barcoding of
Brazilian biodiversity
Initial Proposal (2005):
Beginning of the Brazilian DNA barcoding network
- Large-scale inventory of Brazilian biodiversity
- Multi-center project designed in 2005 to boost taxonomic
research in Brazil using the DNA barcode concept as a catalyst to
integrate field collections, museum-based biodiversity research,
genome center networks and bioinformatics advances.
- 6 museums, 14 Centers of Molecular Biodiversity, ~300 people
Large-scale inventorying of Brazilian biodiversity (2005) Sampling strategy
15 sites:
-10,000 samples/site:
- fish
- amphibians
- reptiles (incl. birds)
- mammals
- spiders
- Leguminosae
Phase 1:
~US$ 3,000,000
2010-2014
2010 – BrBOL launched
>100 participating
groups
~500 people involved
- All major Brazilian museums
and natural history collections.
- Bioinformatics center
- Biomedical institution
- Agricultural research agency
The two largest museums did
not have molecular biology
laboratories.
- Implemented in both via
funding from the BrBOL
network.
Projeto ‘Tetrapoda’ - BrBOL
Barcoding Tetrapoda:
4 sub-groups
Amphibians (C. Haddad)
Reptiles (H. Zaher)
Birds (C. Miyaki)
Mammals (E. Eizirik)
Participating labs
Mammals
Birds
Non-avian reptiles
Amphibians
DNA barcoding of Brazilian tetrapods (2010-2014)
25 participating
institutions
> 100 people involved
4 Major Natural
history collections in
Brazil
French Guiana joined BrBOL
Group                 Barcoded individuals   Barcoded species
Amphibians                           5,100                450
Non-avian reptiles                   2,608                816
Birds                                3,508              1,253
Mammals                              2,122                344
Total                               13,338              2,863
DNA barcoding of Brazilian tetrapods
Results: DNA Barcode (COI) library construction (2010-2015)
DNA barcoding of Atlantic Forest amphibians
(M. Lyra, C. Haddad)
~3,800 individuals/vouchers
- 386 species
(71% of the known
species).
- 59/63 genera
Geographic distribution of mammal DNA barcodes
Large-scale DNA barcoding in Brazil
Good news:
1. We got started and scratched the surface
2. An unprecedented community of biodiversity scientists has
been assembled and integrated in Brazil.
3. There is capacity in the country to move forward.
Challenges ahead:
1. Securing continuous, large-scale funding.
2. Improving governance and organizational structure.
3. Scaling up and speeding up to tackle the magnitude of the task
and the pace of habitat loss in the country.
Leopardus geoffroyi
Scinax granulatus
Dietary analyses of wild cats using DNA barcodes of prey items
CPCN Pró-Mata, PUCRS, Brazil
Vriesea platynema
Aechmea gamosepala
Meta-barcoding of Atlantic Forest bromeliad
tank waters
Biodiversity of the mineral province of Carajás (G. Oliveira)
DNA barcodes
Genomes
Metagenomes
Canga plants
Cave invertebrates
Plant barcodes
[Figure: chart comparing barcode record counts in BOLD and ITV for Carajás; y-axis from 0 to 3,500]
Circoniscus sp.
Cave invertebrates: 1,074 specimens
Circoniscus: 1 BOLD, 70 ITV
Guilherme Oliveira – Environmental Genomics @ITV
Whole-genome sequencing of Brazilian biodiversity
The Jaguar Genome
Project
www.jaguargenome.org
Origin: 2011
Brazilian Congress of Genetics
ConGen course in the Pantanal
- Unique features among the Panthera
- Phenotypic diversity (size, coloration)
The jaguar (Panthera onca)
Galetti et al. 2013. Science
Conservation Issues
The Jaguar Genome
Project
www.jaguargenome.org
• Consortium of Brazilian Institutions
PUCRS, ESALQ/USP, FIOCRUZ,
IPC, CENAP, IDSM, IOP, UFSJD,
ZMQB
• Collaborators in 6 different countries
- USA
- Russia
- Ireland
- Portugal
- Spain
- Argentina
SEQUENCING ASSEMBLY ANNOTATION
Vagalume (Sorocaba Zoo)
100 bp
180 bp
3 kb
8 kb
3 lanes
1 lane
1 lane
2,660,456,270 reads
94x coverage
Sequencing the jaguar genome
• De novo assembly
• ALLPATHS
• 156,436 Contigs
• 7,521 Scaffolds
• 2.4 Gb
• Contig N50: 28.6 kb
• Scaffold N50: 1.52 Mb
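For readers unfamiliar with the N50 figures quoted above, the following is a minimal sketch of how the statistic is usually computed: sort the sequences from longest to shortest and report the length at which the running total first reaches half of the assembly size. The function name and the toy lengths are illustrative assumptions; they are not the jaguar assembly data.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating length.
    for length in sorted(lengths, reverse=True):
        running += length
        # N50 is the length at which the running total reaches 50% of the assembly.
        if 2 * running >= total:
            return length


# Toy lengths (invented for illustration only).
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))
```

The same function applies to contigs and to scaffolds; only the input list changes.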
• Maker2
• De novo prediction of 25,451 coding genes
• Validation of 96% with RNA-seq data (6
tissues) and other empirical evidence.
• Annotation of repetitive elements (e.g. 344,251
microsatellites), ncRNA and numts (nuclear
insertions of mtDNA)
Comparative genomics of genus Panthera:
Henrique Figueiró, PUCRS
Gang Li, William Murphy
Texas A&M University
Species tree
• Estimated based on 100-kb genomic windows, as well as
gene-by-gene data sets (13,183 shared loci).
lion (Panthera leo)
leopard (Panthera pardus)
jaguar (Panthera onca)
snow leopard (Panthera uncia)
tiger (Panthera tigris)
• Extensive genealogical discordance in the Panthera, with remarkable spatial
variation across their genomes.
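One simple way to summarize genealogical discordance of this kind is to tally how often each tree topology is recovered across genomic windows, separately for autosomes and the X chromosome. The sketch below assumes each window has already been assigned a topology label by some upstream phylogenetic analysis; the function name, the labels and the toy input are hypothetical and do not reproduce the project's pipeline or results.

```python
from collections import Counter, defaultdict


def topology_frequencies(windows):
    """Tally topology frequencies per chromosome class.

    windows: iterable of (chrom_class, topology_label) pairs,
             one pair per genomic window.
    Returns {chrom_class: {topology_label: fraction_of_windows}}.
    """
    counts = defaultdict(Counter)
    for chrom_class, topology in windows:
        counts[chrom_class][topology] += 1
    return {
        chrom_class: {t: n / sum(ctr.values()) for t, n in ctr.items()}
        for chrom_class, ctr in counts.items()
    }


# Toy input (invented): four autosomal windows and two X-linked windows.
toy_windows = (
    [("autosome", "tree1")] * 3
    + [("autosome", "tree2")]
    + [("X", "tree2")] * 2
)
for chrom_class, freqs in topology_frequencies(toy_windows).items():
    print(chrom_class, freqs)
```

Plotting these fractions along chromosome coordinates, rather than pooling them, is what would reveal spatial variation across the genome.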
[Figure: Tree 1 (species tree), Tree 2 and Tree 3, each relating lion, leopard, jaguar, tiger, snow leopard and cat; shown for autosomes and the X chromosome on a time axis in years (Mya)]
P1    P2    P3    D         s.e.     Z
PLE   PON   PUN   -0.2235   0.0043   -52.5
PPA   PON   PUN   -0.1804   0.0042   -43.3
PPA   PLE   PUN    0.0567   0.0034    16.7
PLE   PON   PTI   -0.1739   0.0055   -31.6
PPA   PON   PTI   -0.1379   0.0051   -27.2
PPA   PLE   PTI    0.0482   0.0037    13.1
PTI   PUN   PON   -0.1200   0.0040   -30.0
PTI   PUN   PPA   -0.0742   0.0028   -26.8
PTI   PUN   PLE   -0.0685   0.0026   -26.0
PPA   PLE   PON    0.0569   0.0048    11.8
P. leo, P. tigris, P. onca, P. pardus, P. uncia
ABBA-BABA test
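As background for the D, s.e. and Z columns, here is a minimal sketch of Patterson's D statistic with a delete-one block jackknife. It assumes the common convention D = (ABBA - BABA) / (ABBA + BABA) and equal-sized genomic blocks (an unweighted jackknife); the slides do not state which convention or block scheme was used, and the block counts below are invented for illustration.

```python
import math


def d_statistic(abba, baba):
    """Patterson's D from counts of ABBA and BABA site patterns."""
    if abba + baba == 0:
        raise ValueError("no informative sites")
    return (abba - baba) / (abba + baba)


def block_jackknife(blocks):
    """Delete-one block jackknife for D, assuming equal-sized blocks.

    blocks: list of (abba, baba) counts, one pair per genomic block.
    Returns (D, standard_error, Z) with Z = D / standard_error.
    """
    n = len(blocks)
    if n < 2:
        raise ValueError("need at least two blocks")
    total_abba = sum(a for a, _ in blocks)
    total_baba = sum(b for _, b in blocks)
    d_all = d_statistic(total_abba, total_baba)
    # Recompute D with each block left out in turn.
    leave_one_out = [
        d_statistic(total_abba - a, total_baba - b) for a, b in blocks
    ]
    mean_loo = sum(leave_one_out) / n
    variance = (n - 1) / n * sum((d - mean_loo) ** 2 for d in leave_one_out)
    se = math.sqrt(variance)
    if se == 0:
        raise ValueError("zero jackknife variance; Z is undefined")
    return d_all, se, d_all / se


# Toy per-block counts (invented for illustration only).
toy_blocks = [(10, 20), (12, 18), (8, 22), (11, 19)]
d, se, z = block_jackknife(toy_blocks)
print(f"D = {d:.4f}, s.e. = {se:.4f}, Z = {z:.1f}")
```

Under this convention, a D far from zero relative to its standard error (a large |Z|) indicates an imbalance between the two discordant site patterns, which is the signal used to infer gene flow.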
[Figure: three alternative genealogies relating lion, leopard, jaguar, tiger, snow leopard and cat]
Outlier window test
Screen for divergence time outliers indicating introgressed segments
Intriguing overlap of pathways enriched in both introgressed segments and genes with signatures of positive selection
Design of a customized Panthera exome capture array
• 19,000 genes = 36 Mb
• Nimblegen Capture Kit
• Annotation
• 113 jaguar individuals sequenced in 2 HiSeq lanes
• Average coverage of 8x
• 160,000 SNPs – 67,000 SNPs after filtering
• 30 individuals used to assess intra-specific
variation in genes with signatures of introgression
Assessment of intra-specific variation
Collaboration with
Rasmus Nielsen
(UC Berkeley)
Coalescent simulations to test for positive selection in genes bearing signatures of inter-species introgression
Ongoing fronts – Jaguar Genome Project
- Continued analyses of the exome data set.
- Analysis of a GBS data set covering multiple
Brazilian regions.
- Analysis of whole genomes from additional
individuals:
- Guatemala (Chromium – 10xG)
- Arizona (Discovar)
- Amazonia, Atlantic Forest, Caatinga
(Illumina HiSeqX).
Other felid genome sequencing projects
Puma
(Puma concolor)
Pampas cat
(Leopardus colocolo)
Ocelot
(Leopardus pardalis)
Margay
(Leopardus wiedii)
Collaborations with:
Beth Shapiro (UCSC)
Priscilla Villela (EcoMol)
Luiz Coutinho (ESALQ/USP)
Greg Barsh (HudsonAlpha)
Bill Murphy (Texas A&M)
Chris Kaelin (Stanford)
Pedro Galetti (UFSCar)
Warren Johnson (Smithsonian)
Klaus Koepfli (Smithsonian)
Other ongoing genome sequencing projects
Maned wolf
(Chrysocyon brachyurus)
Bush dog
(Speothos venaticus)
Neotropical otter
(Lontra longicaudis)
Humpback whale
(Megaptera novaeangliae)
South American fur seal
(Arctocephalus australis)
Luis Claudio Marigo
South American foxes
(Lycalopex spp.)
Fernanda Valdez
Cristine Trinca
Henrique Figueiró
Marina Favarini
Tiago Ferraz
Maísa Bertoldo
Paulo Chaves
Flavia Tirelli
Fernanda Michalski
Anne Schmidt-Küntzel
Maria Eduarda Appel
Bromeliad meta-barcoding
Taiz Simão
Laura Utz
Adriana Giongo
Eric Tripplett
Renata Medina
Acknowledgments - Barcodes
BrBOL
Claudio Oliveira
Guilherme Oliveira
Cristina Miyaki
Fabricio Santos
Aristóteles Góes
Mariana Oliveira
Paulo Buckup
Ana Maria Espin
Fernando Monteiro
Jorge Porto
Hussam Zaher
Célio Haddad
Mariana Lyra
Felipe Grazziotin
Alexandre Aleixo
Marcelo Weksler
Benoit de Thoisy
Camila Ribas
Pedro C. Estrela
Yuri Leite
Larissa Oliveira
Alexandre Percequillo
Aleksey Komissarov
Rodrigo Teixeira
Adauto Nunes
Leandro Silveira
Fernando Azevedo
Emiliano Ramalho
Graham Hughes
Oliver Ryder
Greg Barsh
Chris Kaelin
Marta Svartman
Acknowledgments - Genome Sequencing
Henrique Figueiró
Fernanda Trindade
Maíra Rodrigues
Lucas G. Silva
Vera de Ferran
Sarah Santos
Cristine S. Trinca
Daniel Kantek
Laura Heidtmann
Gustavo Lorenzana
Support: CNPq, CAPES, FAPERGS,
FAPESP, Tetrapak
William Murphy
Gang Li
Rasmus Nielsen
Steve O’Brien
Guilherme Oliveira
Luiz Coutinho
Agostinho Antunes
Emma Teeling
Toni Gabaldón
Patricia Saragueta
Ronaldo G. Morato
Robert Wayne
Klaus Koepfli
Bridgett VonHoldt
Warren Johnson
Sandro Bonatto
C. Scott Baker
Larissa Oliveira