![Page 1: Bioinformatics for high-throughput DNA sequencing Gabor Marth Boston College Biology New grad student orientation Boston College September 8, 2009.](https://reader036.fdocuments.in/reader036/viewer/2022062421/56649d2f5503460f94a06ca2/html5/thumbnails/1.jpg)
Bioinformatics for high-throughput DNA sequencing
Gabor Marth, Boston College Biology
New grad student orientation, Boston College, September 8, 2009
DNA sequence variations
The Human Genome Project has determined a reference sequence of the human genome
However, every individual is unique, and is different from others at millions of nucleotide locations
Why do we care about variations?
underlie phenotypic differences
cause inherited diseases
allow tracking ancestral human history
Human genetic variation
The first “famous” genomes
Genome sequencing
[Figure labels: ~1 Mb, ~100 Mb, >100 Mb, ~3,000 Mb]
New sequencing technologies…
Next-gen sequencing – a revolution
[Figure: bases per machine run (1 Mb to 100 Gb) versus read length (10 bp to 1,000 bp)]
ABI capillary sequencer
Roche/454 pyrosequencer (100-400 Mb in 200-450 bp reads)
Illumina/Solexa, AB/SOLiD sequencers (10-30 Gb in 25-100 bp reads)
The re-sequencing informatics pipeline
(i) base calling
(ii) read mapping
(iii) SNP and short INDEL calling
(iv) SV calling
(v) data viewing, hypothesis generation
[Figure labels: REF, IND, GigaBayes]
Tools
2. Read mapping
Read mapping is like a jigsaw puzzle…
…you get the pieces…
…and they give you the picture on the box
Big and unique pieces are easier to place than others…
The MOSAIK read mapping program
• Reads from repeats cannot be uniquely mapped back to their true region of origin
Michael Strömberg (Wan-Ping Lee)
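The jigsaw analogy and the repeat problem can be illustrated with a toy exact-match mapper. This is a minimal sketch and not MOSAIK's algorithm: the reference string, the reads, and the function names `find_all` and `classify_read` are made up for illustration, and real mappers also tolerate mismatches and gaps.

```python
def find_all(reference: str, read: str) -> list:
    """Return every 0-based position where `read` occurs exactly in `reference`."""
    positions = []
    start = reference.find(read)
    while start != -1:
        positions.append(start)
        # Step one base forward so overlapping occurrences are also found.
        start = reference.find(read, start + 1)
    return positions


def classify_read(reference: str, read: str) -> str:
    """Label a read as 'unmapped', 'unique', or 'repeat' by its hit count."""
    hits = find_all(reference, read)
    if not hits:
        return "unmapped"
    return "unique" if len(hits) == 1 else "repeat"


# Hypothetical reference: the motif ACGTACGT appears twice (a repeat),
# with unique sequence between and after the two copies.
REFERENCE = "ACGTACGT" + "TTGCA" + "ACGTACGT" + "GGATC"

for read in ["ACGTACGT", "GTTTGCAA", "CCCCCC"]:
    print(read, find_all(REFERENCE, read), classify_read(REFERENCE, read))
```

In this toy case the read that lies entirely inside the repeated motif has two equally good placements, while the read that extends from the repeat into the neighboring unique sequence has only one.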
SNP discovery
GigaBayes
Marth et al. Nature Genetics 1999
Quinlan et al. in prep.
(Amit Indap, Wen Fung Leong)
Structural variation discovery
Navigation bar
Fragment lengths in selected region
Depth of coverage in selected region
Stewart et al. in prep. (Deniz Kural, Jiantao Wu)
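The two tracks named on this slide, fragment length and depth of coverage, can each be computed from mapped read pairs. The sketch below is an illustration only and does not represent the lab's SV-calling method: the `(start, end)` fragment tuples, the function names, and all numbers are hypothetical.

```python
def depth_of_coverage(fragments, region_start, region_end):
    """Count, for each position in [region_start, region_end), how many
    fragments (half-open [start, end) intervals) overlap it."""
    depth = [0] * (region_end - region_start)
    for start, end in fragments:
        # Clip each fragment to the selected region before counting.
        lo = max(start, region_start)
        hi = min(end, region_end)
        for pos in range(lo, hi):
            depth[pos - region_start] += 1
    return depth


def fragment_lengths(fragments):
    """Length of each fragment, i.e. the distance spanned by a read pair."""
    return [end - start for start, end in fragments]


# Hypothetical mapped fragments on a 20-base region.
FRAGMENTS = [(0, 6), (2, 8), (4, 10), (5, 19), (14, 20)]

print(depth_of_coverage(FRAGMENTS, 0, 20))
print(fragment_lengths(FRAGMENTS))
```

In this made-up data one fragment spans far more of the reference than the others, and depth drops across the stretch it bridges. A reduced depth combined with an unusually long fragment is the kind of signal that can point to a candidate deletion.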
Sequence alignment viewers
Huang et al. Genome Research 2008 (Derek Barnett)
Data mining
Mutational profiling in deep 454 data
• Pichia stipitis is a yeast that efficiently converts xylose to ethanol (biofuel production)
• one specific mutagenized strain had especially high conversion efficiency
• the goal was to determine which mutations caused this phenotype
• we analyzed 10 runs of 454 data (~3 million reads, ~20x coverage of the 15 Mb genome)
Pichia stipitis reference sequence
• found 39 mutations
• informatics analysis in < 24 hours (including manual checking of all candidates)
Image from JGI web site
Smith et al. Genome Research 2008
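The coverage figure quoted on this slide follows from the standard relation: coverage = (number of reads × mean read length) / genome size. The sketch below checks that arithmetic. The mean read length of 100 bp is an assumption chosen to be consistent with the slide's numbers (it is not stated in the source), and the function name is illustrative.

```python
def fold_coverage(num_reads: float, mean_read_length_bp: float, genome_size_bp: float) -> float:
    """Average depth of coverage: total sequenced bases divided by genome size."""
    return num_reads * mean_read_length_bp / genome_size_bp


NUM_READS = 3_000_000          # ~3 million 454 reads (from the slide)
GENOME_SIZE_BP = 15_000_000    # ~15 Mb Pichia stipitis genome (from the slide)
ASSUMED_READ_LENGTH_BP = 100   # assumption, not given in the source

coverage = fold_coverage(NUM_READS, ASSUMED_READ_LENGTH_BP, GENOME_SIZE_BP)
print(f"~{coverage:.0f}x coverage")
```

Under that read-length assumption the result matches the ~20x quoted on the slide; a different mean read length would scale the estimate proportionally.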
SNP calling in short-read coverage
C. elegans reference genome (Bristol, N2 strain)
Pasadena, CB4858 (1 ½ machine runs)
Bristol, N2 strain (3 ½ machine runs)
• the goal was to evaluate the Solexa/Illumina technology for the complete resequencing of large model-organism genomes
• 5 runs (~120 million Illumina reads) were collected by Washington Univ.
SNP
• we found 45,000 SNPs with a very high validation rate
Hillier et al. Nature Methods 2008
Current focus
1000 Genomes Project
• data quality assessment
• project design (# samples, depth of read coverage)
• read mapping
• SNP calling
• structural variation discovery
SV discovery in autism
deletion
amplification
Lab
People
Resources
• computer cluster (72 servers)
• 128 GB RAM server
• ~200 TB disk space
• 2 R01 grants (NHGRI/NIH)
• 1 R21 grant (NIAID/NIH)
• a BC RIG grant
• 2 RC2 grants (NHGRI/NIH) starting September 2009
Collaborations
Baylor HGSC
Wash. U. GSC
Genome Canada
UBC GSC
Cornell
UC Davis
UCSF
NCBI @ NIH
NCI @ NIH
Marshfield Clinic
UCLA
Pfizer
Graduate student rotations
• Looking for new graduate students
• Spots are available for all three rotations
• Lots of projects
• Caveat: you need to be able to program…
• Check us out at: http://bioinformatics.bc.edu/marthlab/
• If you are interested, please talk to me