Analyses of ORFans in microbial and viral genomes
![Page 1: Analyses of ORFans in microbial and viral genomes](https://reader035.fdocuments.in/reader035/viewer/2022062408/5681451d550346895db1df8b/html5/thumbnails/1.jpg)
Analyses of ORFans in microbial and viral genomes
Journal club presentation on Mar. 14
Albert Yu
![Page 2: Analyses of ORFans in microbial and viral genomes](https://reader035.fdocuments.in/reader035/viewer/2022062408/5681451d550346895db1df8b/html5/thumbnails/2.jpg)
ORFan
Definition: an ORF with no detectable sequence similarity to other ORFs in the database considered
Nearly all genomes have ORFans (in different percentages)
The more genomes are sequenced, the more ORFans are found
Most are annotated as hypothetical proteins of unknown function (no experimental evidence)
![Page 3: Analyses of ORFans in microbial and viral genomes](https://reader035.fdocuments.in/reader035/viewer/2022062408/5681451d550346895db1df8b/html5/thumbnails/3.jpg)
ORFan (continued)
More data…
real, functional proteins
3D structure
conserved in closely related species (Ka/Ks)
Origin of ORFans?
Viral genome Microbial genome?
Viral laterally transferred genes (especially phages)
![Page 4: Analyses of ORFans in microbial and viral genomes](https://reader035.fdocuments.in/reader035/viewer/2022062408/5681451d550346895db1df8b/html5/thumbnails/4.jpg)
Viral genome Microbial genome
![Page 5: Analyses of ORFans in microbial and viral genomes](https://reader035.fdocuments.in/reader035/viewer/2022062408/5681451d550346895db1df8b/html5/thumbnails/5.jpg)
Question: the origin of ORFans
Test hypothesis: ORFans have been acquired through lateral gene transfer from viruses
To find homologs to these microbial ORFans within the virus sequence database
![Page 6: Analyses of ORFans in microbial and viral genomes](https://reader035.fdocuments.in/reader035/viewer/2022062408/5681451d550346895db1df8b/html5/thumbnails/6.jpg)
Genome-wide quantitative study
• BLASTP
• 277 microbial genomes
• 1456 viral genomes
• H(g): the number of genomes having at least one homolog of ORFan g
• U(g): uniqueness: the genomic distance between the genomes with ORFan g
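A minimal sketch of how H(g) and U(g) could be computed from BLASTP hits. The function names and the distance table are hypothetical. H(g) counts the ORFan's own genome, which matches H=1 for an ORF with no homolog elsewhere. The slide does not say how the genomic distance is aggregated over several genomes, so taking the largest pairwise distance is an assumption.

```python
from itertools import combinations


def h_value(own_genome, hit_genomes):
    """H(g): number of distinct genomes with a homolog of g, own genome included."""
    return len({own_genome, *hit_genomes})


def u_value(own_genome, hit_genomes, distance):
    """U(g): ASSUMED to be the largest pairwise distance among genomes carrying g.

    `distance` maps frozenset({genome_a, genome_b}) to a genomic distance
    (a hypothetical lookup table, not something given in the slides).
    """
    genomes = sorted({own_genome, *hit_genomes})
    if len(genomes) < 2:
        return 0.0  # only one genome carries g, so there is no distance to measure
    return max(distance[frozenset(pair)] for pair in combinations(genomes, 2))


# Toy example with three made-up genomes A, B, C.
toy_distance = {
    frozenset(("A", "B")): 0.05,
    frozenset(("A", "C")): 0.60,
    frozenset(("B", "C")): 0.62,
}
print(h_value("A", ["A", "B"]), u_value("A", ["B"], toy_distance))
```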
![Page 7: Analyses of ORFans in microbial and viral genomes](https://reader035.fdocuments.in/reader035/viewer/2022062408/5681451d550346895db1df8b/html5/thumbnails/7.jpg)
Classification of ORFans
• Singleton: no homolog anywhere
H=1, BLASTP=1
• Paralogous: homologs in the same genome
H=1, BLASTP>1
• Orthologous: homologs in very closely related microbial genomes
H>1, U <= 0.1 (cutoff chosen by observation)
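The three classes can be written as one decision rule. This is a sketch only: `classify_orf` and its parameter names are hypothetical, and `blastp_hits` is assumed to count the ORF's hit to itself, so that BLASTP=1 means "no other hit".

```python
def classify_orf(h, blastp_hits, u=None, u_cutoff=0.1):
    """Classify an ORF from H, the number of BLASTP hits, and U.

    h           -- H(g), number of genomes with a homolog (own genome included)
    blastp_hits -- number of BLASTP hits, self-hit included (assumption)
    u           -- U(g); only needed when h > 1
    u_cutoff    -- 0.1, the empirical cutoff quoted on the slide
    """
    if h < 1 or blastp_hits < 1:
        raise ValueError("h and blastp_hits must be at least 1")
    if h == 1:
        # Found in a single genome: singleton if alone, paralogous otherwise.
        return "singleton" if blastp_hits == 1 else "paralogous"
    if u is not None and u <= u_cutoff:
        # Shared only among very closely related genomes.
        return "orthologous"
    return "non-ORFan"


print(classify_orf(1, 1))          # singleton
print(classify_orf(1, 3))          # paralogous
print(classify_orf(4, 4, u=0.05))  # orthologous
print(classify_orf(4, 4, u=0.50))  # non-ORFan
```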
![Page 8: Analyses of ORFans in microbial and viral genomes](https://reader035.fdocuments.in/reader035/viewer/2022062408/5681451d550346895db1df8b/html5/thumbnails/8.jpg)
The U-value for all ORFs in prokaryote genomes
In total:
ORFs: 818906
ORFans: 110186
S (singleton): 64324 (7.8%)
P (paralogous): 10419 (1.3%)
O (orthologous): 35443 (4.3%)
[Figure labels: 0.64; S or P; O]
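The percentages on this slide are shares of all 818906 ORFs, not of the ORFans. A short check, using only the counts quoted above (the function name is hypothetical):

```python
def orfan_shares(total_orfs, class_counts):
    """Return each class count as a percentage of all ORFs."""
    return {name: 100.0 * n / total_orfs for name, n in class_counts.items()}


total_orfs = 818906
class_counts = {"S": 64324, "P": 10419, "O": 35443}

total_orfans = sum(class_counts.values())  # 110186, the ORFan total on the slide
shares = orfan_shares(total_orfs, class_counts)

for name, pct in shares.items():
    print(f"{name}: {pct:.2f}% of all ORFs")
print(f"S + P: {shares['S'] + shares['P']:.1f}% of all ORFs")
```

The S share comes out near 7.85%, so the 7.8% on the slide looks truncated rather than rounded.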
![Page 9: Analyses of ORFans in microbial and viral genomes](https://reader035.fdocuments.in/reader035/viewer/2022062408/5681451d550346895db1df8b/html5/thumbnails/9.jpg)
• ORFans-VH%(OVH): % of ORFans having homologs in viruses (0% ~ 63.8%)
• Non-ORFans-VH%(NOVH): % of non-ORFans having homologs in viruses (4.1% ~ 18.2%)
• The strength of the hypothesis = the difference between these two VH%
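A sketch of the two percentages for one genome. The input format (one pair of booleans per ORF) and the function name are hypothetical, and reading "strength" as OVH minus NOVH is an assumption about what the slide means.

```python
def vh_percentages(orfs):
    """Compute (OVH, NOVH, OVH - NOVH) for one genome.

    orfs -- iterable of (is_orfan, has_viral_homolog) boolean pairs,
            one pair per ORF (hypothetical input format).
    """
    # is_orfan -> [ORFs with a viral homolog, all ORFs in that class]
    counts = {True: [0, 0], False: [0, 0]}
    for is_orfan, has_viral_homolog in orfs:
        counts[is_orfan][1] += 1
        if has_viral_homolog:
            counts[is_orfan][0] += 1

    def pct(with_homolog, total):
        return 100.0 * with_homolog / total if total else 0.0

    ovh = pct(*counts[True])
    novh = pct(*counts[False])
    return ovh, novh, ovh - novh


# Toy genome: 2 ORFans (1 with a viral homolog), 4 non-ORFans (1 with one).
toy_genome = [(True, True), (True, False),
              (False, True), (False, False), (False, False), (False, False)]
print(vh_percentages(toy_genome))  # (50.0, 25.0, 25.0)
```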
![Page 10: Analyses of ORFans in microbial and viral genomes](https://reader035.fdocuments.in/reader035/viewer/2022062408/5681451d550346895db1df8b/html5/thumbnails/10.jpg)
Percentages of microbial ORFs with homologs in viruses
Red: OVH
Blue: NOVH
24 phylogenetic clades
Bacteria
Archaea
Firmicutes
Gamma-proteobacteria
![Page 11: Analyses of ORFans in microbial and viral genomes](https://reader035.fdocuments.in/reader035/viewer/2022062408/5681451d550346895db1df8b/html5/thumbnails/11.jpg)
The average % of OVH and NOVH in various groups
148
66
63
10% vs 9%
8.5% vs 2.7%
6.6% vs 0.8%
![Page 12: Analyses of ORFans in microbial and viral genomes](https://reader035.fdocuments.in/reader035/viewer/2022062408/5681451d550346895db1df8b/html5/thumbnails/12.jpg)
Conclusion
• Most OVH << NOVH: current evidence supporting the hypothesis is weak
• Firmicutes and Gamma-proteobacteria have the highest number of homologs in viruses (viral database is biased)
Viral database bias
1456 viruses
280 phages (109: Gamma; 102: Firmicutes; 69: others)
Sampling?
![Page 13: Analyses of ORFans in microbial and viral genomes](https://reader035.fdocuments.in/reader035/viewer/2022062408/5681451d550346895db1df8b/html5/thumbnails/13.jpg)
Viral genome Microbial genome
![Page 14: Analyses of ORFans in microbial and viral genomes](https://reader035.fdocuments.in/reader035/viewer/2022062408/5681451d550346895db1df8b/html5/thumbnails/14.jpg)
• 277 microbial genomes
• 1456 viruses
All-virus-DB: 43566 ORFs
• 280 phages (20%)
Phage-DB: 18368 ORFs (42%)
ORFans:
all-virus: 13078 (30%) (vs. all-virus-DB); 8200 (vs. all nr, env-nr)
all-phage: 6765 (vs. all-virus-DB); 7047 (vs. phage-DB)
![Page 15: Analyses of ORFans in microbial and viral genomes](https://reader035.fdocuments.in/reader035/viewer/2022062408/5681451d550346895db1df8b/html5/thumbnails/15.jpg)
Some characteristics of ORFans
• Bacterial ORFans are shorter than non-ORFans on average
• Bacterial ORFans have significantly lower GC3 content than non-ORFans
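GC3 is the G+C fraction at third codon positions. A minimal sketch of the calculation, assuming the input is an in-frame coding sequence starting at the first base of the first codon (the function name is hypothetical):

```python
def gc3_percent(cds):
    """Percentage of G or C at the third position of each complete codon."""
    seq = cds.upper()
    third_bases = seq[2::3]  # indices 2, 5, 8, ...: third base of each codon
    if not third_bases:
        raise ValueError("sequence is shorter than one codon")
    gc = sum(base in "GC" for base in third_bases)
    return 100.0 * gc / len(third_bases)


# Codons ATG GCC AAA have third bases G, C, A, so two of three are G or C.
print(f"{gc3_percent('ATGGCCAAA'):.1f}%")
```

A trailing partial codon is ignored, because its third position does not exist.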
![Page 16: Analyses of ORFans in microbial and viral genomes](https://reader035.fdocuments.in/reader035/viewer/2022062408/5681451d550346895db1df8b/html5/thumbnails/16.jpg)
The length of Viral ORFans and non-ORFans
Length: Non-ORFans > ORFans
![Page 17: Analyses of ORFans in microbial and viral genomes](https://reader035.fdocuments.in/reader035/viewer/2022062408/5681451d550346895db1df8b/html5/thumbnails/17.jpg)
Length: ORFans < non-ORFans
GC3%: ORFans < non-ORFans
![Page 18: Analyses of ORFans in microbial and viral genomes](https://reader035.fdocuments.in/reader035/viewer/2022062408/5681451d550346895db1df8b/html5/thumbnails/18.jpg)
The number of ORFs per genome in 1456 viruses
Focusing on phage: higher %
![Page 19: Analyses of ORFans in microbial and viral genomes](https://reader035.fdocuments.in/reader035/viewer/2022062408/5681451d550346895db1df8b/html5/thumbnails/19.jpg)
Growth in the number of phage ORFans (consistent)
Drop to 0?
Keep increasing
38.4%
![Page 20: Analyses of ORFans in microbial and viral genomes](https://reader035.fdocuments.in/reader035/viewer/2022062408/5681451d550346895db1df8b/html5/thumbnails/20.jpg)
• Each microbial species is a host for at least 10 phage species, so phage diversity is at least 10 times higher than microbial diversity
• Only 280 phage genomes in database (low phage sampling)
![Page 21: Analyses of ORFans in microbial and viral genomes](https://reader035.fdocuments.in/reader035/viewer/2022062408/5681451d550346895db1df8b/html5/thumbnails/21.jpg)
Fewer than 5 phages
Virus sampling bias between and within groups
![Page 22: Analyses of ORFans in microbial and viral genomes](https://reader035.fdocuments.in/reader035/viewer/2022062408/5681451d550346895db1df8b/html5/thumbnails/22.jpg)
The H-value percentages for all phage ORFs and prokaryotic ORFs
Prokaryotes: 9.1% ORFans, 11.3% ortho
Phages: 38.4% ORFans, 32.4% ortho
![Page 23: Analyses of ORFans in microbial and viral genomes](https://reader035.fdocuments.in/reader035/viewer/2022062408/5681451d550346895db1df8b/html5/thumbnails/23.jpg)
The H-value percentages of phage ORFs
![Page 24: Analyses of ORFans in microbial and viral genomes](https://reader035.fdocuments.in/reader035/viewer/2022062408/5681451d550346895db1df8b/html5/thumbnails/24.jpg)
• Phage non-ORFans: 11212 in total; 7150 (63.8%) have prokaryotic homologs; 4397 of those (61.5%) are prophage
• Phage ORFans: 7047 in total; 1317 (18.7%) have prokaryotic homologs; 589 of those (44.7%) are prophage
• All phage ORFs: 18248 in total; 8467 (46.4%) have prokaryotic homologs; 4987 of those (58.9%) are prophage
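The two percentages in each row use different denominators: the prophage count is divided by the number of prokaryotic homologs, and the prokaryotic-homolog count is divided by the row total. A quick check with the counts quoted above (the dictionary keys and function name are hypothetical labels):

```python
def homolog_percentages(prophage, prokaryotic_homologs, total):
    """Return (prophage % of prokaryotic homologs, prokaryotic homologs % of total)."""
    return (round(100.0 * prophage / prokaryotic_homologs, 1),
            round(100.0 * prokaryotic_homologs / total, 1))


# (prophage, prokaryotic homologs, total) as given on the slide
rows = {
    "phage non-ORFans": (4397, 7150, 11212),
    "phage ORFans": (589, 1317, 7047),
    "all phage ORFs": (4987, 8467, 18248),
}

for label, counts in rows.items():
    prophage_pct, homolog_pct = homolog_percentages(*counts)
    print(f"{label}: {prophage_pct}% prophage, {homolog_pct}% with prokaryotic homologs")
```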