Heng Li and Richard Durbin ∗ Members of this presentation: Yunji Wang Sree Devineni Zhen Gao.
Fast and accurate short read alignment with Burrows–Wheeler transform
Heng Li and Richard Durbin∗
Members of this presentation: Yunji Wang, Sree Devineni, Zhen Gao
Motivation
The first generation of hash table-based methods (e.g. MAQ):
• Are slow
• Do not support gapped alignment
Suffix array interval
The positions of all occurrences of a substring fall within a single interval of the suffix array (see the figure on the right).
e.g. The suffix array interval of pattern “go” is [1, 2]. What about “og”?
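The interval for “go” can be reproduced with a small backward-search sketch over the BWT of “googol”. This is only an illustration under stated assumptions, not BWA's implementation: the names `build_index` and `sa_interval` are made up here, the suffix array is built by naive sorting, and `$` is assumed to be a sentinel that sorts before every other symbol.

```python
def build_index(text):
    """Build the suffix array, BWT, and C / O tables for `text` (must end with '$')."""
    # Naive construction: sort suffix start positions by the suffix text.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    # BWT symbol = character preceding each suffix (text[-1] is '$' when i == 0).
    bwt = "".join(text[i - 1] for i in sa)
    alphabet = sorted(set(text) - {"$"})
    # C[a]: number of non-sentinel symbols in text that are smaller than a.
    C = {a: sum(1 for ch in text if ch != "$" and ch < a) for a in alphabet}
    # O[a][i]: occurrences of a in bwt[0:i], so O[a][0] == 0.
    O = {a: [0] for a in alphabet}
    for ch in bwt:
        for a in alphabet:
            O[a].append(O[a][-1] + (ch == a))
    return sa, bwt, C, O


def sa_interval(pattern, C, O, n):
    """Return the inclusive suffix array interval (k, l) of `pattern`, or None."""
    k, l = 0, n - 1  # interval of the empty pattern: every suffix
    for a in reversed(pattern):  # backward search: extend the match to the left
        if a not in C:
            return None
        k = C[a] + O[a][k] + 1
        l = C[a] + O[a][l + 1]
        if k > l:
            return None
    return k, l


text = "googol$"
sa, bwt, C, O = build_index(text)
print(bwt)                                  # lo$oogg
print(sa_interval("go", C, O, len(text)))   # (1, 2)
print(sa_interval("og", C, O, len(text)))   # (4, 4)
```

Each step only consults the `C` and `O` tables, so the cost of locating the interval depends on the pattern length rather than on the length of the reference.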
Prefix trie and inexact string matching
Prefix trie of string “GOOGOL”
The dashed line shows how to find string ‘LOL’ (1 mismatch allowed)
What about “LOG”?
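The bounded-mismatch search over the prefix trie can be sketched as a recursive walk that extends the match one symbol to the left and spends one unit of the mismatch budget whenever the symbol differs from the query. This is a simplified sketch, assuming substitutions only (no gaps) and none of the pruning used by the real tool; `build_index` and `inexact_search` are hypothetical names chosen for this example.

```python
def build_index(text):
    """Suffix array plus C / O tables for `text` (must end with the sentinel '$')."""
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)
    alphabet = sorted(set(text) - {"$"})
    C = {a: sum(1 for ch in text if ch != "$" and ch < a) for a in alphabet}
    O = {a: [0] for a in alphabet}  # O[a][i]: occurrences of a in bwt[0:i]
    for ch in bwt:
        for a in alphabet:
            O[a].append(O[a][-1] + (ch == a))
    return sa, C, O


def inexact_search(text, pattern, max_mismatches):
    """Sorted start positions where `pattern` occurs with at most `max_mismatches` substitutions."""
    sa, C, O = build_index(text)
    hits = set()

    def recur(i, z, k, l):
        if z < 0:  # mismatch budget exhausted
            return
        if i < 0:  # whole pattern consumed: every suffix in [k, l] is a hit
            hits.update(sa[k:l + 1])
            return
        for b in C:  # try every symbol as the next character to the left
            k2 = C[b] + O[b][k] + 1
            l2 = C[b] + O[b][l + 1]
            if k2 <= l2:  # this branch exists in the prefix trie
                recur(i - 1, z if b == pattern[i] else z - 1, k2, l2)

    recur(len(pattern) - 1, max_mismatches, 0, len(text) - 1)
    return sorted(hits)


print(inexact_search("GOOGOL$", "LOL", 1))  # [3]  ("GOL" at position 3)
print(inexact_search("GOOGOL$", "LOG", 1))  # [1]  ("OOG" at position 1)
```

Branches whose interval becomes empty are abandoned immediately, which is what keeps the walk from exploring the whole trie.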
Conclusions
The authors implemented the Burrows-Wheeler Alignment tool (BWA), which is based on the BWT. Thus, it:
• Is fast
• Reduces memory use
• Allows gaps
REFERENCES
Heng Li and Richard Durbin (2009) Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics, 25(14), 1754–1760.
CS 6293: Advanced Topics: Current Bioinformatics
A probabilistic framework for aligning paired-end RNA-seq data
Members of this presentation: Yunji Wang, Sree Devineni, Zhen Gao
A probabilistic framework for aligning paired-end RNA-seq data
• Current Biology Method
• Align RNA-seq reads to the reference genome rather than to a transcript database.
Current Biology Problem
• A single read consists of 35-100 consecutive nucleotides of a fragment of an mRNA transcript.
• However, the expected size of mRNA fragments is around 182 bp.
• The paired-end read (PER) protocol sequences the two ends of a size-selected fragment of an mRNA (double the length of a single read).
Problem of PER fragment alignment
• Problem:
The expected distance between the two end reads within the transcript fragment is known as the mate-pair distance.
The distance between the two ends when aligned to the genome can be quite different from the mate-pair distance, because introns that lie between the ends are spliced out of the transcript.
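A small worked example makes the gap between the two distances concrete. The exon coordinates and read positions below are invented for illustration (half-open intervals on a hypothetical gene), and the helper names are not from the paper.

```python
def to_transcript_coord(exons, pos):
    """Map a genomic position to its offset along the spliced transcript.

    `exons` is a list of half-open (start, end) genomic intervals in order.
    """
    offset = 0
    for start, end in exons:
        if start <= pos < end:
            return offset + (pos - start)
        offset += end - start  # skip the whole exon; introns contribute nothing
    raise ValueError(f"position {pos} is not inside any exon")


def genomic_and_transcript_distance(exons, pos_a, pos_b):
    """Return (distance on the genome, distance along the transcript)."""
    genomic = pos_b - pos_a
    transcript = to_transcript_coord(exons, pos_b) - to_transcript_coord(exons, pos_a)
    return genomic, transcript


# Hypothetical two-exon gene with a 4800 bp intron between the exons.
exons = [(100, 200), (5000, 5100)]
print(genomic_and_transcript_distance(exons, 150, 5030))  # (4880, 80)
```

Here the two ends are 4880 bp apart on the genome but only 80 bp apart in the transcript, which is the value that should be compared with the expected mate-pair distance.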
Problem of PER fragment alignment
Current Tools
• TopHat: reports the closest end alignment for a PER.
• SpliceMap: considers PERs with ends mapped within 400 000 bp on the genome.
Method-Step 1
• Mapping the individual reads
Method-Step 2
• Graphical model
Probabilistic framework
• Splice graph, G={V,E}
• Nodes - individual nucleotides
• Directed edge types
✔ Connect adjacent nodes
✔ Skip around the spliced-out portion of the genome
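The splice graph described above can be sketched as a forward-only directed graph in which each nucleotide points to its neighbour and each splice junction adds a skip edge. The sketch below is a toy under stated assumptions: coordinates are small integers, junctions are given as hypothetical (donor, acceptor) pairs with donor < acceptor, and `path_lengths` is a name invented for this example. It lists the possible path lengths between two end positions so they can be compared with an expected mate-pair distance.

```python
def path_lengths(n_nodes, junctions, source, target):
    """Return the set of path lengths (in edges) from `source` to `target`.

    Nodes are nucleotides 0..n_nodes-1. Every node has an edge to the next
    node, and each (donor, acceptor) junction adds a skip edge.
    """
    succ = {i: [i + 1] for i in range(n_nodes - 1)}
    succ[n_nodes - 1] = []
    for donor, acceptor in junctions:
        succ[donor].append(acceptor)

    # All edges point forward, so lengths can be filled in from target backwards.
    lengths = {target: {0}}
    for node in range(target - 1, source - 1, -1):
        lengths[node] = {
            1 + d
            for nxt in succ[node]
            if nxt in lengths  # ignore edges that jump past the target
            for d in lengths[nxt]
        }
    return lengths[source]


# Toy region of 20 nucleotides with two alternative junctions leaving node 4.
candidates = path_lengths(20, [(4, 10), (4, 15)], source=2, target=17)
print(sorted(candidates))  # [5, 10, 15]

expected_distance = 9  # hypothetical expected mate-pair distance
print(min(candidates, key=lambda d: abs(d - expected_distance)))  # 10
```

Picking the path whose length is closest to the expected distance is a simplification; the framework in the slides scores alignments by likelihood rather than by a nearest-length rule.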
Estimation of alignments
(Maximize the likelihood of PERs over all the putative alignments.)
EM continued...
Method-Step 3
• Expectation-maximization algorithm
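The slides do not spell out the model, so the following is only a generic expectation-maximization toy, not the likelihood used in the paper. It assumes each fragment comes with a list of candidate alignment labels (hypothetical names such as "A" or "B") and estimates one weight per label: the E-step splits each fragment across its candidates in proportion to the current weights, and the M-step re-estimates the weights from those fractional assignments.

```python
def em_assign(candidates, n_iter=50):
    """Estimate a weight per candidate alignment label with a simple EM loop.

    `candidates` holds one non-empty list of distinct labels per fragment.
    """
    labels = sorted({c for cands in candidates for c in cands})
    theta = {c: 1.0 / len(labels) for c in labels}  # start from uniform weights
    for _ in range(n_iter):
        counts = {c: 0.0 for c in labels}
        # E-step: fractional assignment of each fragment to its candidates.
        for cands in candidates:
            total = sum(theta[c] for c in cands)
            for c in cands:
                counts[c] += theta[c] / total
        # M-step: weights become the normalised expected counts.
        theta = {c: counts[c] / len(candidates) for c in labels}
    return theta


# Two fragments fit only "A"; two are ambiguous between a pair of candidates.
fragments = [["A"], ["A"], ["A", "B"], ["B", "C"]]
theta = em_assign(fragments)
for label in sorted(theta):
    print(label, round(theta[label], 3))
```

In this toy the unambiguous fragments pull weight toward "A", which in turn shifts the ambiguous fragments, illustrating how information from the whole set of alignments influences each individual assignment.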
Discussion
• Proposed a probabilistic framework to predict the alignment of each PER fragment to a reference genome.
• The framework maximizes the likelihood of all PER alignments through a splice graph model.
• Advantage: higher coverage and specificity than aligning the end reads of the PERs alone.
• Capable of detecting trans-chromosome and trans-strand gene fusion events.
Advantages
• First, the fragment alignments significantly increase coverage of the transcriptome.
Reason: A PER contains almost double the information of a single read.
• Second, it has higher specificity than the junctions in the individual end reads.
Reason: The EM algorithm uses the information from the entire set of end read alignments.
Advantages
• Third, the splice graph accurately captures the alternative paths between the two end reads, and the expected mate-pair distance can effectively disambiguate them.
Thank you