Comparison of Genomic DNA to cDNA Alignment Methods
Miguel Galves and Zanoni Dias
Institute of Computing – Unicamp – Campinas – SP – Brazil
{miguel.galves,zanoni}@ic.unicamp.br
Scylla Bioinformatics – Campinas – SP – Brazil
{miguel,zanoni}@scylla.com.br
Agenda
Introduction
Problem
Aligners
Data set
Subsets
Evaluation Methods
Results: Exact Alignments
Results: EST Alignments
Running Time Comparison
Conclusions
Introduction
Identifying genes in uncharacterized DNA sequences is one of the greatest challenges in genomics
EST-to-DNA alignment is one of the most common methods
ESTs are key to understanding the inner workings of an organism
– Humans have between 30000 and 35000 genes
– Alternative splicing plays an important role in diversity
Problem
[Figure: splicing of an mRNA (exons and introns) into mature mRNA. Labels: mRNA, Mature mRNA, Exon, Intron. Sequence fragments shown: CCCGGGAAACGAAUAU, CCUCUCACCCGGGA, CUUGG]
Problem: How to solve it?
Classic algorithms
– Dynamic programming
Heuristic-based algorithms
– Multi-step
– Based on other tools such as BLAST and local alignments
Aligners
Java implementation of global and semi-global aligners
– Affine gap penalty function
– Linear space
– Global algorithm by Miller and Myers (1988)
– Semi-global based on the global algorithm
Heuristic-based algorithms
– sim4, Spidey and est_genome
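To make the classic approach concrete, here is a minimal sketch of a semi-global alignment score with an affine gap penalty. It is not the authors' Java code: it keeps only two rows of the dynamic programming matrices, so it returns the score in linear space but does not recover the alignment itself, which is what the Miller and Myers divide-and-conquer technique adds. The function name, the reading of a scoring tuple as (match, mismatch, gap open, gap extend), and the choice to leave unaligned genomic flanks free of charge are all assumptions made for illustration.

```python
NEG = float("-inf")


def semi_global_score(cdna, genomic, match=1, mismatch=-2,
                      gap_open=-1, gap_extend=0):
    """Best score aligning all of `cdna` inside `genomic` (hypothetical helper).

    Assumption: a gap of length k scores gap_open + k * gap_extend, and
    genomic bases before and after the aligned region cost nothing.
    """
    n = len(genomic)
    # Row 0: any genomic prefix may be skipped for free.
    prev_m = [0] * (n + 1)    # last column aligns two bases
    prev_x = [NEG] * (n + 1)  # last column is a gap in the cDNA (intron-like)
    prev_y = [NEG] * (n + 1)  # last column is a gap in the genomic sequence
    for i in range(1, len(cdna) + 1):
        cur_m = [NEG] * (n + 1)
        cur_x = [NEG] * (n + 1)
        cur_y = [NEG] * (n + 1)
        # Column 0: the first i cDNA bases face one long gap.
        cur_y[0] = gap_open + i * gap_extend
        for j in range(1, n + 1):
            s = match if cdna[i - 1] == genomic[j - 1] else mismatch
            cur_m[j] = max(prev_m[j - 1], prev_x[j - 1], prev_y[j - 1]) + s
            # Open a new gap (pay gap_open) or extend the current one.
            cur_x[j] = max(cur_m[j - 1] + gap_open,
                           cur_y[j - 1] + gap_open,
                           cur_x[j - 1]) + gap_extend
            cur_y[j] = max(prev_m[j] + gap_open,
                           prev_x[j] + gap_open,
                           prev_y[j]) + gap_extend
        prev_m, prev_x, prev_y = cur_m, cur_x, cur_y
    # Last row: any genomic suffix may be skipped for free.
    return max(max(prev_m), max(prev_x), max(prev_y))
```

For example, `semi_global_score("ACGTTT", "GGACGCCCCCCCCTTTGG")` treats the run of C bases as a single intron-like gap between two perfectly matched exons.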
Data Set
Human genome database
– Based on FASTA and GenBank flat-format files from the NCBI repository
Filtering criteria
– Genes, mRNAs and CDS with /pseudo tag
– mRNAs without any CDS
– Genes without any mRNA
– CDS matching wrong patterns
23124 genes and 27448 mRNAs stored in the database
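The filtering step can be sketched as follows, assuming the listed criteria describe entries that were discarded. The record layout (dictionaries with `pseudo`, `mrnas` and `cds` keys), the function name and the order in which the rules are applied are hypothetical, and the "CDS matching wrong patterns" rule is left out because the slides do not say what the patterns are.

```python
def filter_genes(genes):
    """Drop pseudo entries, mRNAs without CDS, and genes without mRNAs.

    `genes` is a list of dicts in a hypothetical layout:
    {"id": ..., "pseudo": bool, "mrnas": [{"id": ..., "pseudo": bool,
                                           "cds": [{"id": ..., "pseudo": bool}]}]}
    """
    kept = []
    for gene in genes:
        if gene.get("pseudo"):
            continue  # gene carries the /pseudo tag
        mrnas = []
        for mrna in gene.get("mrnas", []):
            if mrna.get("pseudo"):
                continue  # mRNA carries the /pseudo tag
            cds = [c for c in mrna.get("cds", []) if not c.get("pseudo")]
            if not cds:
                continue  # mRNA without any usable CDS
            mrnas.append({**mrna, "cds": cds})
        if not mrnas:
            continue  # gene without any remaining mRNA
        kept.append({**gene, "mrnas": mrnas})
    return kept
```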
Subsets
Subset 1: 66 genes from chromosome Y with fewer than 100000 bases
Subset 2: 50 complete genes from chromosome Y with fewer than 100000 bases
Subset 3: 8056 complete genes from all chromosomes with fewer than 100000 bases
Subset 4: 493 artificial ESTs based on complete genes from chromosome 6 with fewer than 100000 bases
Evaluation methods
Number of gaps introduced in the aligned gene sequence
Delta exons
Base similarity percentage
Mismatch percentage
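The slides do not give formulas for these measures, so the sketch below is only one plausible reading of three of them. It assumes the alignment is given as two gapped rows of equal length, counts gap characters in the gene row, and expresses matches and mismatches as a percentage of the cDNA bases. Delta exons is omitted because it needs the annotated exon structure. All names are hypothetical.

```python
def alignment_metrics(gene_row, cdna_row):
    """Gap count, base similarity and mismatch percentage of one alignment.

    Assumed definitions (not taken from the slides):
    - gene_gaps: number of '-' characters in the aligned gene row
    - similarity_pct: matching columns / cDNA bases * 100
    - mismatch_pct: mismatching columns / cDNA bases * 100
    """
    if len(gene_row) != len(cdna_row):
        raise ValueError("aligned rows must have the same length")
    cdna_len = sum(1 for c in cdna_row if c != "-")
    if cdna_len == 0:
        raise ValueError("cDNA row contains no bases")
    matches = mismatches = 0
    for g, c in zip(gene_row, cdna_row):
        if g == "-" or c == "-":
            continue  # gap columns count as neither match nor mismatch
        if g == c:
            matches += 1
        else:
            mismatches += 1
    return {
        "gene_gaps": gene_row.count("-"),
        "similarity_pct": 100.0 * matches / cdna_len,
        "mismatch_pct": 100.0 * mismatches / cdna_len,
    }
```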
Experimental method
Using subsets 1 and 2, two scoring systems (out of 15 previously defined) and one alignment strategy were chosen:
– Semi-global aligner
– Scoring systems (1, -2, -1, 0) and (1, -2, -10, 0)
The classic semi-global aligner was compared to sim4, Spidey and est_genome on both subsets 3 and 4
Results: Exact Alignments
Extra Gap

| Strategy | Avg | SD | % Score 0 |
|---|---|---|---|
| SG(1, -2, -1, 0) | 0.00 | 0.00 | 100.00% |
| SG(1, -2, -10, 0) | 0.00 | 0.00 | 100.00% |
| sim4 | 1.11 | 1.63 | 54.56% |
| est_genome | 16.99 | 21.49 | 27.84% |
| Spidey | 0.15 | 1.39 | 97.43% |
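The three columns of the results tables can be read as summary statistics over all alignments of a subset. A minimal sketch, assuming that "% Score 0" is the share of alignments whose measure equals exactly 0 and that SD is the population standard deviation (the slides state neither); the function name is hypothetical.

```python
import statistics


def summarize(values, target=0):
    """Average, standard deviation and share of values equal to `target`."""
    values = list(values)
    if not values:
        raise ValueError("no values to summarize")
    return {
        "avg": statistics.mean(values),
        "sd": statistics.pstdev(values),  # assumption: population SD
        "pct_at_target": 100.0 * sum(v == target for v in values) / len(values),
    }
```

For the base similarity table, the same function would be called with `target=100`.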
Results: Exact Alignments
Delta Exons

| Strategy | Avg | SD | % Score 0 |
|---|---|---|---|
| SG(1, -2, -1, 0) | 0.00 | 0.00 | 100.00% |
| SG(1, -2, -10, 0) | 0.01 | 0.07 | 99.91% |
| sim4 | -0.01 | 0.20 | 97.46% |
| est_genome | -0.14 | 0.30 | 76.79% |
| Spidey | -4.04 | 3.10 | 0.00% |
Results: Exact Alignments
Base Similarity

| Strategy | Avg | SD | % Score 100% |
|---|---|---|---|
| SG(1, -2, -1, 0) | 99.89% | 0.49% | 53.56% |
| SG(1, -2, -10, 0) | 99.89% | 0.49% | 53.49% |
| sim4 | 99.39% | 1.34% | 22.79% |
| est_genome | 53.83% | 35.00% | 18.11% |
| Spidey | 80.34% | 36.49% | 44.25% |
Results: Exact Alignments
Mismatch Percentage

| Strategy | Avg | SD | % Score 100% |
|---|---|---|---|
| SG(1, -2, -1, 0) | 0.00% | 0.00% | 100.00% |
| SG(1, -2, -10, 0) | 0.01% | 0.03% | 99.47% |
| sim4 | 0.17% | 0.21% | 36.68% |
| est_genome | 1.19% | 1.26% | 21.55% |
| Spidey | 0.15% | 0.98% | 90.65% |
Results: EST Alignments
[Figures: two slides of EST alignment results; the charts are not preserved in this transcript]
Running Time Comparison
| Aligner | EST-to-DNA (sec/alignment) | mRNA-to-DNA (sec/alignment) |
|---|---|---|
| sim4 | 0.013 | 0.170 |
| Spidey | 0.066 | 0.140 |
| est_genome | 0.640 | 3.400 |
| Semi-global | 0.670 | 5.170 |
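To put these timings in proportion, the short sketch below divides each aligner's time by the fastest time in the same column. The numbers are copied from the slide; the dictionary layout and the function name are only illustrative.

```python
# Seconds per alignment, as reported on the slide: (EST-to-DNA, mRNA-to-DNA).
TIMES = {
    "sim4": (0.013, 0.170),
    "Spidey": (0.066, 0.140),
    "est_genome": (0.640, 3.400),
    "Semi-global": (0.670, 5.170),
}


def slowdown_vs_fastest(times, column):
    """Ratio of each aligner's time to the fastest one in the given column."""
    fastest = min(t[column] for t in times.values())
    return {name: t[column] / fastest for name, t in times.items()}


est_ratios = slowdown_vs_fastest(TIMES, 0)   # column 0: EST-to-DNA
mrna_ratios = slowdown_vs_fastest(TIMES, 1)  # column 1: mRNA-to-DNA
```

With these figures the semi-global aligner is roughly 52 times slower than sim4 on EST-to-DNA alignments and roughly 37 times slower than Spidey on mRNA-to-DNA alignments.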
Conclusions
The classic semi-global algorithm produces good results
– Running time is a problem, although it can be improved
sim4 produces the best results among the external software tested
Thanks