Next Generation Sequencing and its data analysis challenges
Background
Alignment and Assembly
Applications: Genome, Epigenome, Transcriptome
References
Cell 2013, 155:27.
Cell 2013, 155:39.
Annu. Rev. Plant Biol. 2009, 60:305.
Annu. Rev. Genomics Hum. Genet. 2009, 10:135.
Curr. Opin. Biotechnology, 24:22.
Nat. Biotech. 2009, 25:195.
Nat. Methods 2009, 6:S6.
Nat. Rev. Genet. 2009, 10:669.
Nat. Rev. Genet. 2010, 11(1):31-46.
Genomics 2010, 95(6):315-27.
This lecture covers the opportunities and challenges, not detailed statistical techniques. The material is drawn from the review articles listed under References.
Background
“Method of the year” 2007 by Nature Methods.
The name:
“Next generation sequencing”
“Deep sequencing”
“High-throughput sequencing”
“Second-generation sequencing”
The key characteristics:
Massively parallel sequencing: the amount of data from a single run ~ the amount of data from the Human Genome Project.
The reads are short: ~ a few hundred bases / read.
Background
Potential impact:
The “$1000 genome” will become reality very soon
Genome sequencing will become a regular medical procedure.
Personalized medicine
Predictive medicine
Ethical issues
For statisticians:
Data mining using hundreds of thousands of genomes
Finding rare SNPs/mutations associated with diseases
New methods to analyze epigenomics/transcriptomics data
Finding interventions to improve life quality
Background
The companies use different techniques. We use Illumina’s as an example. (http://seqanswers.com/forums/showthread.php?t=21)
An incomplete list of some common platforms.
BMC Genomics 2012, 13:341
Background
Advantages:
Fast and cost effective.
No need to clone DNA fragments.
Drawbacks:
Short read length (platform dependent).
Some platforms have trouble with identical repeats.
Non-uniform confidence in base calling within reads. Data are less reliable near the 3’ end of each read.
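The falling confidence toward the 3’ end is commonly handled by quality trimming. A minimal sketch, assuming FASTQ-style Phred+33 quality strings and a cutoff of Q20; both the encoding and the cutoff are assumptions for illustration, not something the slides specify:

```python
def phred_scores(quality, offset=33):
    """Decode a quality string into integer Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality]


def error_probability(q):
    """Phred definition: Q = -10 * log10(P_error)."""
    return 10 ** (-q / 10)


def trim_3prime(seq, quality, min_q=20):
    """Drop bases from the 3' end until one reaches min_q (hypothetical cutoff)."""
    scores = phred_scores(quality)
    end = len(seq)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], quality[:end]


read = "ACGTACGTAC"
qual = "IIIIIIII#!"   # toy read: 'I' = Q40, '#' = Q2, '!' = Q0
trimmed_seq, trimmed_qual = trim_3prime(read, qual)
print(trimmed_seq, trimmed_qual)
```

Only the trailing low-quality bases are removed here; real trimmers often use sliding windows instead of a single-base rule.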
Background
What deep sequencing can do:
Background
Nat Methods. 2009 Nov;6(11 Suppl):S2-5.
Sequence the genome of a person? --- Alignment
Can rely on the existing human genome as a blueprint.
Align the short reads onto the existing human genome.
Need a few-fold coverage to cover most regions.
Sequence a whole new genome? --- Assembly
Overlaps are required to construct the genome.
The reads are short, so ~30-fold coverage is needed.
If one run yields 3G bases, ~30 runs are needed for a new genome similar in size to the human genome.
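The coverage arithmetic can be checked with a short sketch. The run count follows directly from the numbers on the slide. The covered-fraction formula is the Lander-Waterman idealization (reads placed uniformly at random), which is an assumption added here and only roughly holds for real libraries:

```python
import math


def runs_needed(genome_size, fold_coverage, bases_per_run):
    """Runs required so that total sequenced bases = coverage * genome size."""
    return math.ceil(fold_coverage * genome_size / bases_per_run)


def expected_fraction_covered(fold_coverage):
    """Idealized P(a base is covered at least once) = 1 - exp(-c)."""
    return 1.0 - math.exp(-fold_coverage)


GENOME = 3_000_000_000      # ~3G bases, human-sized genome
PER_RUN = 3_000_000_000     # 3G bases per run, as on the slide

print("runs for 30-fold coverage:", runs_needed(GENOME, 30, PER_RUN))
for c in (1, 3, 5, 30):
    print(f"{c}-fold -> expected fraction covered {expected_fraction_covered(c):.4f}")
```

Under this idealization a few-fold coverage already touches most bases, which is why alignment to a reference needs far less data than assembly, where reads must also overlap one another.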
Alignment and Assembly
Hash table-based alignment. Similar to BLAST in principle.
(1) Find potential locations.
(2) Local alignment.
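The two steps can be sketched as follows. This is a toy illustration, not any particular aligner: the function names, the seed length k=4, and the mismatch limit are assumptions, and step (2) is simplified to an ungapped comparison in place of a true local alignment.

```python
from collections import Counter, defaultdict


def build_index(reference, k):
    """Hash table: k-mer -> list of start positions in the reference."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def candidate_starts(read, index, k):
    """Step (1): each seed hit votes for the read start position it implies."""
    votes = Counter()
    for offset in range(len(read) - k + 1):
        for pos in index.get(read[offset:offset + k], ()):
            start = pos - offset
            if start >= 0:
                votes[start] += 1
    return votes


def mismatches(read, reference, start):
    """Step (2), simplified: ungapped comparison at one candidate location."""
    window = reference[start:start + len(read)]
    if len(window) < len(read):
        return None  # read would run off the end of the reference
    return sum(a != b for a, b in zip(read, window))


def align(read, reference, index, k, max_mismatches=2):
    """Return (start, mismatch count) of the best candidate, or None."""
    best = None
    for start, _ in candidate_starts(read, index, k).most_common():
        mm = mismatches(read, reference, start)
        if mm is not None and mm <= max_mismatches and (best is None or mm < best[1]):
            best = (start, mm)
    return best


reference = "TTGACCGTAAGCTAGGCTTACGGATCCA"   # toy reference
index = build_index(reference, k=4)
read = "AGCTAGGATTACG"                       # reference[9:22] with one substitution
print(align(read, reference, index, k=4))
```

The index is built once per reference and reused for every read, which is what makes the hash-table approach practical for millions of short reads.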
Alignment and Assembly
From read to graph:
Alignment and Assembly
de Bruijn graph assembly
Red: read error.
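A minimal sketch of going from reads to a de Bruijn graph, and of how a read error shows up as a low-count side branch. The reads, k=4, and the count threshold are toy assumptions; real assemblers add many refinements (bubble popping, paired-end information) that are not shown.

```python
from collections import Counter, defaultdict


def kmer_counts(reads, k):
    """Count every k-mer across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def de_bruijn(counts, min_count=1):
    """Nodes are (k-1)-mers; each retained k-mer is an edge prefix -> suffix."""
    graph = defaultdict(set)
    for kmer, n in counts.items():
        if n >= min_count:
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def walk(graph, start):
    """Follow unambiguous edges from start; stop at a branch, dead end, or cycle."""
    contig, node, seen = start, start, {start}
    while len(graph.get(node, ())) == 1:
        (nxt,) = graph[node]
        if nxt in seen:
            break
        contig += nxt[-1]
        seen.add(nxt)
        node = nxt
    return contig


reads = ["ATGGCGT", "GGCGTGC", "CGTGCAA", "ATGGCGT", "GGCGTGC", "CGTGCAA",
         "GGCTTGC"]   # last read carries a hypothetical sequencing error (G -> T)
counts = kmer_counts(reads, k=4)
noisy = de_bruijn(counts)                 # error k-mers create a side branch
clean = de_bruijn(counts, min_count=2)    # k-mers seen only once are dropped
print(walk(noisy, "ATG"))
print(walk(clean, "ATG"))
```

With the error k-mers present, the walk stops at the branch point; after dropping k-mers seen only once, the full toy sequence is recovered.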
Whole genome/exome/transcriptome sequencing
Genomics
Whole genome sequencing detects all variants (SNP alleles, rare variants, mutations)
Could be associated with disease:
Rare variants (burden testing by collapsing by gene)
De novo mutations (need family tree)
Rare Mendelian disorders
Structural variants in cancer
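Burden testing by collapsing can be illustrated with a small sketch: rare variants are collapsed into a per-gene carrier indicator, and carrier counts are compared between cases and controls. Fisher's exact test is used here as one simple choice; the slides do not name a test, and all sample, variant, and gene names below are hypothetical.

```python
from math import comb


def collapse_by_gene(genotypes, variant_gene):
    """genotypes: {sample: set of rare variant ids} -> {gene: set of carrier samples}."""
    carriers = {}
    for sample, variants in genotypes.items():
        for v in variants:
            carriers.setdefault(variant_gene[v], set()).add(sample)
    return carriers


def fisher_one_sided(a, b, c, d):
    """P(X >= a) for the 2x2 table [[a, b], [c, d]] under the hypergeometric null."""
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    tail = sum(comb(row1, x) * comb(n - row1, col1 - x) for x in range(a, upper + 1))
    return tail / comb(n, col1)


def burden_test(genotypes, variant_gene, cases):
    """Per-gene p-value for an excess of rare-variant carriers among cases."""
    controls = set(genotypes) - cases
    results = {}
    for gene, carr in collapse_by_gene(genotypes, variant_gene).items():
        a = len(carr & cases)       # carriers among cases
        c = len(carr & controls)    # carriers among controls
        results[gene] = fisher_one_sided(a, len(cases) - a, c, len(controls) - c)
    return results


# Toy data: four cases (c1..c4) and four controls (k1..k4).
genotypes = {"c1": {"v1"}, "c2": {"v2"}, "c3": {"v3"}, "c4": {"v1", "v4"},
             "k1": {"v4"}, "k2": set(), "k3": set(), "k4": set()}
variant_gene = {"v1": "GENE_A", "v2": "GENE_A", "v3": "GENE_A", "v4": "GENE_B"}
results = burden_test(genotypes, variant_gene, cases={"c1", "c2", "c3", "c4"})
for gene, p in sorted(results.items()):
    print(gene, round(p, 4))
```

No single variant in GENE_A is common among the cases, yet the collapsed carrier count separates cases from controls; that pooling is the point of the approach.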
Medical Genomics
Nature Reviews Genetics 11, 415
Example: Extreme-case sequencing to find rare variants associated with a disease.
Medical Genomics
Example: Cancer genome
Epigenomics
http://www.roadmapepigenomics.org/
ChIP-Seq
Purpose: identify which parts of the DNA sequence bind to a certain protein.
Transcription factor (Regulome)
Modified histone (Epigenome)
Overall ChIP-Seq workflow
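After alignment, the analysis step of the workflow amounts to finding regions where ChIP reads pile up. A rough sketch, assuming a matched control (input) sample, fixed 100-base bins, and a 4-fold cutoff; all three are illustrative assumptions, and real peak callers use proper statistical models:

```python
from collections import Counter


def bin_counts(read_starts, bin_size):
    """Count aligned read start positions per fixed-width genomic bin."""
    return Counter(pos // bin_size for pos in read_starts)


def enriched_bins(chip_starts, control_starts, bin_size=100, min_fold=4.0, pseudocount=1.0):
    """Return (bin_start, fold) for bins where ChIP reads exceed the scaled control."""
    chip = bin_counts(chip_starts, bin_size)
    ctrl = bin_counts(control_starts, bin_size)
    # Scale the control to the ChIP library size so depths are comparable.
    scale = len(chip_starts) / max(len(control_starts), 1)
    hits = []
    for b in sorted(chip):
        fold = (chip[b] + pseudocount) / (ctrl[b] * scale + pseudocount)
        if fold >= min_fold:
            hits.append((b * bin_size, round(fold, 2)))
    return hits


chip_reads = [105, 110, 120, 130, 140, 150, 160, 170, 510, 900]     # toy positions
control_reads = [20, 115, 300, 520, 610, 750, 880, 905, 950, 990]   # toy positions
print(enriched_bins(chip_reads, control_reads))
```

The pseudocount keeps bins with no control reads from producing a division by zero.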
ChIP-Seq
Before deep sequencing, the same information was obtained using arrays in place of sequencing (ChIP-chip).
ChIP-Seq
Different kinds of profiles in different applications.
Elongation
Silencing
ChIP-Seq
Example of active gene chromatin pattern found by ChIP-Seq.
Initiation site
Elongation
ChIP-Seq
RNA-Seq
Deep sequencing provides more information about each mRNA
RNA-Seq
Finding novel exons.
Splicing? (Short reads could be an issue.)
RNA-Seq
Gene expression profiling – to replace arrays?
Exon-specific abundance.
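Read counts depend on both feature length and sequencing depth, so abundance is usually reported in normalized units. A small sketch using RPKM (reads per kilobase per million mapped reads); the slides do not name a unit, so RPKM is an assumption here, and the exon lengths and counts are made up:

```python
def rpkm(read_count, feature_length_bp, total_mapped_reads):
    """Reads per kilobase of feature per million mapped reads."""
    return read_count * 1e9 / (feature_length_bp * total_mapped_reads)


# Hypothetical exons of one gene: (name, length in bp, mapped read count).
exons = [("exon1", 200, 400), ("exon2", 1000, 2000), ("exon3", 500, 50)]
total_mapped = 10_000_000   # hypothetical library size

for name, length, count in exons:
    print(name, rpkm(count, length, total_mapped))
```

In this toy gene the first two exons have equal normalized abundance despite a five-fold difference in raw counts, while the third is much lower, the kind of exon-specific signal that could point to alternative splicing.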
RNA-Seq
Sequencing small RNA.
RNA-Seq
Quantification of miRNA and de novo detection of miRNAs.
MicroRNA: 21-23 nucleotides in length.
Regulate gene expression by complementary binding.
Derived from non-coding RNAs that form a stem-loop structure.
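Complementary binding can be illustrated by searching an mRNA for sites that pair with part of the miRNA. The sketch below assumes perfect Watson-Crick pairing with the "seed" (nucleotides 2-8 of the miRNA), a common simplification that is not stated on the slides; the sequences are toy examples.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna):
    """Reverse complement of an RNA string (Watson-Crick pairs only)."""
    return rna.translate(COMPLEMENT)[::-1]


def seed_sites(mirna, mrna, seed_start=1, seed_end=8):
    """0-based positions in mrna that pair perfectly with miRNA nucleotides 2-8."""
    target = reverse_complement(mirna[seed_start:seed_end])
    sites = []
    i = mrna.find(target)
    while i != -1:
        sites.append(i)
        i = mrna.find(target, i + 1)
    return sites


mirna = "UGAGGUAGUAGGUUGUAUAGUU"        # toy 22-nt miRNA
mrna = "AAGCUACCUCAGGGACUACCUCAUU"      # toy mRNA fragment
print(len(mirna), seed_sites(mirna, mrna))
```

Real target prediction also weighs site context and conservation, which is one reason directly probing targets by sequencing is attractive.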
RNA-Seq
Directly probe mRNA targets of miRNA.