Virus Hunting in French Guiana

Post on 23-Aug-2014

description

Lab meeting presentation about my work doing viral metagenomics in French Guiana.

Rat by Francisca Arévalo from The Noun Project. Bat by Adam Heller from The Noun Project.

Transcript of Virus Hunting in French Guiana

Virus Hunting in French Guiana

Nacho Caballero

French Guiana

Rodents

Bats

Leishmania

Capture

Isolate viral particles

Extract RNA

Sequence

Estimated read coverage

[Plot: % reads with coverage smaller than x, for rodents and bats]
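The curve on this slide is a cumulative distribution: for each coverage value x, the percentage of reads whose estimated coverage is smaller than x. A minimal sketch of that calculation follows, assuming the per-read coverage estimates are already available as a list of numbers. The function name `percent_reads_below` and the example values are hypothetical, not taken from the talk.

```python
from bisect import bisect_left


def percent_reads_below(coverages, x):
    """Return the percentage of reads whose estimated coverage is smaller than x."""
    if not coverages:
        raise ValueError("need at least one coverage value")
    ordered = sorted(coverages)
    # bisect_left gives the number of values strictly less than x.
    below = bisect_left(ordered, x)
    return 100.0 * below / len(ordered)


# Hypothetical per-read coverage estimates, for illustration only.
example = [1, 2, 2, 5, 10, 20, 20, 50]

# One (x, percentage) point per threshold; plotting these gives the curve.
curve = [(x, percent_reads_below(example, x)) for x in (2, 10, 100)]
```

A curve that rises quickly at small x means most reads come from poorly covered sequence.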

How can we estimate the coverage without a reference genome?

[Figure labels: Read, K-mers]

[Figure: k-mer counts along reads: 1111111; 78 1081136]

Median k-mer count ≈ read coverage
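The idea above can be sketched in a few lines: count every k-mer across all reads, then take the median count of the k-mers inside one read as that read's coverage estimate. This is a minimal sketch, assuming exact counting in memory and ignoring reverse complements. The reads, the value of `K`, and the helper names are hypothetical.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Return all overlapping substrings of length k, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def count_kmers(reads, k):
    """Count how often each k-mer occurs across all reads."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def estimated_coverage(read, counts, k):
    """Estimate a read's coverage as the median count of its k-mers.

    Raises statistics.StatisticsError if the read is shorter than k.
    """
    return median(counts[km] for km in kmers(read, k))


# Toy data: the first sequence was "sequenced" three times, the last one once.
reads = ["ACGTACGGTA", "ACGTACGGTA", "ACGTACGGTA", "TTGCAAGCTC"]
K = 4
counts = count_kmers(reads, K)

high = estimated_coverage(reads[0], counts, K)  # read seen three times
low = estimated_coverage(reads[3], counts, K)   # read seen once
```

The median is used rather than the mean so that a few unusually rare or unusually common k-mers in a read do not dominate the estimate.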

k-mers make it possible to align without a reference

Problem: each sequencing error introduces up to k erroneous k-mers
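A small illustration of why one error is so costly: every k-mer that overlaps the wrong base changes, so a substitution in the middle of a read can create k new k-mers, while one at the very end creates only one. This is a minimal sketch with a made-up read; the function names are hypothetical.

```python
def kmer_set(seq, k):
    """Return the set of k-mers contained in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def erroneous_kmers(true_read, observed_read, k):
    """Return the k-mers present in the observed read but not in the true one."""
    return kmer_set(observed_read, k) - kmer_set(true_read, k)


K = 4
true_read = "ACGTACGGTATTGC"

# Substitution in the middle of the read (position 7, G -> C):
# four different k-mers overlap that position.
middle_error = true_read[:7] + "C" + true_read[8:]
new_from_middle = erroneous_kmers(true_read, middle_error, K)

# Substitution at the last base (C -> A): only one k-mer overlaps it.
end_error = true_read[:-1] + "A"
new_from_end = erroneous_kmers(true_read, end_error, K)
```

Each of these erroneous k-mers is typically seen only once, which inflates the number of distinct k-mers that downstream tools have to handle.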

Over a threshold, additional reads are redundant

[Figure: k-mer counts along a read: 5555535]

Solution: digital normalization reduces redundancy and errors
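The redundancy half of this idea can be sketched as a single pass over the reads: estimate each read's coverage from the k-mers of the reads kept so far, and keep the read only if that estimate is still under a cutoff. This is a simplified sketch, assuming exact in-memory counts in a `Counter`, where real tools use more memory-efficient counting structures. The cutoff, the value of k, and the reads are hypothetical.

```python
from collections import Counter
from statistics import median


def kmers(seq, k):
    """Return all overlapping substrings of length k, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def digital_normalization(reads, k, cutoff):
    """Keep a read only while its median k-mer count is below the cutoff."""
    counts = Counter()  # k-mer counts over the reads kept so far
    kept = []
    for read in reads:
        read_kmers = kmers(read, k)
        if not read_kmers:
            continue  # read shorter than k: nothing to estimate from
        if median(counts[km] for km in read_kmers) < cutoff:
            kept.append(read)
            counts.update(read_kmers)
        # Otherwise the read is redundant and is dropped.
    return kept


# Toy data: one sequence seen ten times, another seen twice.
abundant = "ACGTACGGTA"
rare = "TTGCAAGCTC"
reads = [abundant] * 10 + [rare] * 2

kept = digital_normalization(reads, k=4, cutoff=3)
```

Here the abundant sequence is capped at three copies while both copies of the rare one survive. Dropping redundant reads also drops whatever sequencing errors they carried, which is where the reduction in errors comes from.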

Assembly (SPAdes)

Alignment (BLAST)

Taxonomy (NCBI)

Problem: 67% of contigs in rodent dataset (serum) align to human sequences

Night-heron coronavirus HKU19 (1 Kb)
Simian hemorrhagic fever virus (300 bp)
Equine arteritis virus (3.7 Kb)
Possum nidovirus
Rodent hepacivirus
Chipmunk parvovirus
Theiler's disease-associated virus
Reticuloendotheliosis virus
Mosquito VEM Anellovirus SDBVL A
Porcine reproductive and respiratory syndrome virus
Dragonfly-associated circular virus 1
Gemycircularvirus 3
Rodent pegivirus
Cyclovirus PK5510
Hypericum japonicum associated circular DNA virus

Pig stool associated circular ssDNA virus (1 Kb)
Avian gyrovirus 2
Torque teno sus virus 1a
Mosquito VEM virus SDBVL G
Turdivirus 3

Problem: 92% of contigs in bat dataset (droppings) don’t align to anything in NCBI

Lymphocytic choriomeningitis virus (7 Kb)
Hepatitis C virus
Amphotropic murine leukemia virus
Murid herpesvirus 1
Mosquito VEM Anellovirus SDBVL A
Rat retrovirus SC1
Mason-Pfizer monkey virus (retrovirus)
Eidolon helvum parvovirus 2
Periplaneta fuliginosa densovirus (also a parvovirus)
Moloney murine sarcoma virus
Sclerotinia sclerotiorum hypovirulence associated DNA virus 1

Problem: 95% of contigs in rodent dataset 2 (serum, spleen) align to mouse sequences
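Percentages like the ones on these "Problem" slides can be derived from alignment output by counting how many contigs have at least one hit. The sketch below assumes BLAST tabular output, where the first tab-separated column is the query (contig) ID. The contig names and hit lines are made up, and the function does not look at taxonomy, so it only separates "aligns to something" from "aligns to nothing".

```python
def percent_with_hits(contig_ids, blast_tabular_lines):
    """Return the percentage of contigs that have at least one alignment.

    Assumes each non-comment line is tab-separated and starts with the query ID.
    """
    contig_ids = set(contig_ids)
    if not contig_ids:
        raise ValueError("need at least one contig")
    hit_ids = set()
    for line in blast_tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        query_id = line.split("\t")[0]
        if query_id in contig_ids:
            hit_ids.add(query_id)
    return 100.0 * len(hit_ids) / len(contig_ids)


# Hypothetical contigs and hits, for illustration only.
contigs = ["contig_1", "contig_2", "contig_3", "contig_4"]
hits = [
    "contig_1\tsubject_A\t98.5\t300",
    "contig_1\tsubject_B\t91.0\t250",  # second hit for the same contig
    "contig_3\tsubject_C\t94.0\t1000",
]

aligned = percent_with_hits(contigs, hits)
unaligned = 100.0 - aligned
```

Restricting the count to hits against a particular host would require filtering the hit lines by the taxonomy of the subject sequence first.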

7 out of 10 samples contained more than 1 Kb of Leishmania RNA virus (94% identity)

5 Kb genome

Lessons

Assume that 50% of your samples are going to fail

Design a small experiment, then iterate

Come up with excuses to learn