Short read alignment BNFO 601
Date posted: 22-Dec-2015
Short read alignment
• Input:
– Reads: short DNA sequences, usually up to 100 base pairs (bp), produced by a sequencing machine
• Reads are fragments of a longer DNA sequence present in the sample given as input to the machine
• Usually number in the millions
– Genome sequence: a reference DNA sequence much longer than the read length
Short read alignment
• Applications:
– Genome assembly
– RNA splicing studies
– Gene expression studies
– Discovery of new genes
– Discovery of cancer-causing mutations
Short read alignment
• Two approaches:
– Hashing-based algorithms
• BFAST
• SHRIMP
• MAQ
• STAMPY (statistical alignment)
– Burrows-Wheeler transform
• Bowtie
• BWA
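The two approaches above can be illustrated with a toy sketch. It is not how BFAST, MAQ, Bowtie, or BWA are actually implemented: the function names, the single-seed strategy, the mismatch threshold, and the naive suffix-array construction are all assumptions chosen for brevity. The first half hashes every k-mer of the reference and verifies candidate positions; the second half builds a Burrows-Wheeler index and finds exact matches by backward search.

```python
from collections import defaultdict


def build_kmer_index(reference, k):
    """Hash every k-mer of the reference to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index


def hash_align(read, reference, index, k, max_mismatches=1):
    """Seed with the read's first k-mer, then verify each candidate position."""
    hits = []
    for pos in index.get(read[:k], []):
        window = reference[pos:pos + len(read)]
        if len(window) < len(read):
            continue  # read would run off the end of the reference
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches <= max_mismatches:
            hits.append((pos, mismatches))
    return hits


def bwt_index(reference):
    """Build suffix array, C table and occurrence counts (naive construction)."""
    text = reference + "$"  # "$" sorts before A, C, G, T
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)  # character preceding each suffix
    alphabet = sorted(set(text))
    # c[ch] = number of characters in text strictly smaller than ch
    c, total = {}, 0
    for ch in alphabet:
        c[ch] = total
        total += text.count(ch)
    # occ[ch][i] = occurrences of ch in bwt[:i]
    occ = {ch: [0] * (len(bwt) + 1) for ch in alphabet}
    for i, b in enumerate(bwt):
        for ch in alphabet:
            occ[ch][i + 1] = occ[ch][i] + (b == ch)
    return sa, c, occ


def bwt_exact_match(read, sa, c, occ):
    """Backward search: narrow a suffix-array interval one character at a time."""
    lo, hi = 0, len(sa)
    for ch in reversed(read):
        if ch not in c:
            return []  # character never occurs in the reference
        lo = c[ch] + occ[ch][lo]
        hi = c[ch] + occ[ch][hi]
        if lo >= hi:
            return []  # interval is empty: no exact match
    return sorted(sa[i] for i in range(lo, hi))
```

With the reference "ACGTACGTTAGC" and k = 4, hash_align for the read "ACGTTA" returns [(4, 0)], and bwt_exact_match for "ACGT" returns [0, 4]. Real aligners use several seeds per read, handle gaps, and store the index in compressed form so that it fits a whole genome in memory.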
Short read alignment
Empirical performance:
• Simulated data:
– Extract random substrings of fixed length with random mutations and gaps
– Realign back to reference genome
• Real data:
– Paired reads: two ends of the same molecule
– Count the number of paired reads whose two ends align within 500 to 10,000 bases of each other
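Both evaluations can be sketched in a few lines. This is only an illustration: the error rates, the 5-base tolerance, the function names, and the use of None for an unaligned read are assumptions, not values from the course. The aligner itself is left out; the functions take whatever positions it reports.

```python
import random


def simulate_read(reference, length, sub_rate=0.02, indel_rate=0.01, rng=random):
    """Pick a random substring and apply random substitutions and gaps."""
    start = rng.randrange(len(reference) - length + 1)
    read = []
    for base in reference[start:start + length]:
        r = rng.random()
        if r < sub_rate:
            # substitution: replace with a different base
            read.append(rng.choice([b for b in "ACGT" if b != base]))
        elif r < sub_rate + indel_rate / 2:
            continue  # deletion: base is dropped from the read
        elif r < sub_rate + indel_rate:
            read.append(base)
            read.append(rng.choice("ACGT"))  # insertion: extra base in the read
        else:
            read.append(base)
    return start, "".join(read)


def simulated_accuracy(true_starts, aligned_starts, tolerance=5):
    """Fraction of simulated reads realigned within `tolerance` bases of origin."""
    correct = sum(
        a is not None and abs(a - t) <= tolerance
        for t, a in zip(true_starts, aligned_starts)
    )
    return correct / len(true_starts)


def count_concordant_pairs(pair_positions, min_dist=500, max_dist=10000):
    """Count pairs whose two ends align min_dist to max_dist bases apart."""
    count = 0
    for pos1, pos2 in pair_positions:
        if pos1 is None or pos2 is None:
            continue  # at least one end did not align
        if min_dist <= abs(pos1 - pos2) <= max_dist:
            count += 1
    return count
```

Because of inserted and deleted bases, a simulated read can be slightly longer or shorter than the substring it was drawn from. For real data no true position is known, so the paired-read count serves as a proxy for accuracy: a pair whose ends land far apart or on unrelated positions suggests that at least one end was misaligned.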