Uploaded by curtis-oconnor
Why Manual Genome Annotation?
Even the best gene predictors and genome annotation pipelines rarely exceed accuracies of 80% at the exon level, meaning that most gene annotations contain at least one mis-annotated exon. (Yandell and Ence, 2012, Nature Reviews)
Automated annotation is often not good enough for genes you really care about!
Yandell and Ence, 2012, Nature Reviews: http://www.yandell-lab.org/publications/pdf/euk_genome_annotation_review.pdf
Different lines of evidence go into modern gene annotation pipelines:
1. Computational prediction (open reading frames, etc.)
2. Evidence-based prediction (ESTs, RNA-seq, etc.)
3. Homology-based prediction (BLAST, etc.)
These are synthesized into a consensus gene annotation, which may still be wrong!
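To make "computational prediction (open reading frames)" concrete, here is a minimal sketch of an ORF scanner. It is an illustration, not how any particular pipeline works: it only looks at the forward strand, only in the three reading frames, and the `min_codons` cutoff is a hypothetical parameter. Real gene predictors also model splice sites, codon usage and the reverse strand.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 30) -> list[tuple[int, int, int]]:
    """Return (frame, start, end) for forward-strand ORFs.

    `start` is the 0-based index of the ATG and `end` is the index just past
    the stop codon, so seq[start:end] is the ORF including its stop.
    `min_codons` (hypothetical cutoff) counts codons from the ATG up to,
    but not including, the stop.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        # Step through complete codons only.
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None
    return orfs


# Toy sequence: one short ORF (ATG AAA TTT TAG) in the third frame.
print(find_orfs("CCATGAAATTTTAGCC", min_codons=1))
```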
Bees (Order Hymenoptera, Family Apidae)
Western Honey Bee (Apis mellifera)
Common Eastern Bumble Bee (Bombus impatiens)
Buff-Tailed Bumble Bee (Bombus terrestris)
Dwarf Asian Honey Bee (Apis florea)
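Reaction catalyzed by cytochrome P450 monooxygenase enzymes:
$\mathrm{NADPH} + \mathrm{H}^{+} + \mathrm{O_2} + \mathrm{R{-}H} \rightarrow \mathrm{NADP}^{+} + \mathrm{H_2O} + \mathrm{R{-}OH}$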
Classification (example: CYP3A4)
CYP = cytochrome P450
3 = family (>40% amino acid sequence identity)
A = subfamily (>55% amino acid sequence identity)
4 = isoenzyme
*15A-B = allele
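The naming scheme above can be unpacked mechanically. The sketch below assumes a simplified name pattern (family digits, one or more subfamily letters, gene digits, optional `*allele`) and ignores pseudogene suffixes and other extensions. The identity thresholds are the ones on the slide, used here only as a rough guide; actual names are assigned by nomenclature curators, not by a single identity cutoff. `parse_cyp` and `identity_rule_of_thumb` are hypothetical helper names.

```python
import re
from typing import NamedTuple, Optional

# CYP + family number + subfamily letter(s) + gene number + optional *allele.
# Simplified: pseudogene suffixes and other extensions are not handled.
CYP_NAME = re.compile(r"^CYP(\d+)([A-Z]+)(\d+)(?:\*(\w+))?$")


class CypName(NamedTuple):
    family: str
    subfamily: str
    gene: str
    allele: Optional[str]


def parse_cyp(name: str) -> CypName:
    """Split a P450 name such as 'CYP3A4*15A' into its nomenclature parts."""
    match = CYP_NAME.match(name.strip())
    if match is None:
        raise ValueError(f"not a recognised CYP name: {name!r}")
    return CypName(*match.groups())


def identity_rule_of_thumb(percent_identity: float) -> str:
    """Map protein sequence identity onto the slide's family/subfamily thresholds."""
    if percent_identity > 55:
        return "same subfamily"
    if percent_identity > 40:
        return "same family"
    return "different family"


# Subfamilies can have two letters, as in the bee genes used later (CYP6AS3).
for name in ["CYP3A4*15A", "CYP9R1", "CYP6AS3", "CYP4G11"]:
    print(name, parse_cyp(name))
```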
Chemical signalling? (pheromone synthesis and breakdown)
Detoxication (toxin and pesticide metabolism)
Hormone synthesis (highly conserved orthologs) + detoxication
Organism                  P450s   Food / environment
Nasonia vitripennis          92   fly pupae
Apis mellifera               46   nectar and pollen / homeostatic nest
Anopheles gambiae           106   blood and detritus / standing water
Drosophila melanogaster      85   rotting fruit
Tribolium castaneum         131   seeds
P450 counts by clan (Mito = mitochondrial clan):
Organism                  P450s   Mito   CYP2   CYP3   CYP4
Drosophila melanogaster      85     11      6     36     32
Apis mellifera               46      6      8     28      4
Nasonia vitripennis          87      6      7     45     29
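The clan table is easier to compare as proportions. This short sketch copies the counts from the table, checks that each row's clans add up to its total, and converts them to fractions. `CLAN_COUNTS` and `clan_fractions` are illustrative names, not from the original slides.

```python
# P450 counts per clan, copied from the table above.
CLAN_COUNTS = {
    "Drosophila melanogaster": {"total": 85, "Mito": 11, "CYP2": 6, "CYP3": 36, "CYP4": 32},
    "Apis mellifera": {"total": 46, "Mito": 6, "CYP2": 8, "CYP3": 28, "CYP4": 4},
    "Nasonia vitripennis": {"total": 87, "Mito": 6, "CYP2": 7, "CYP3": 45, "CYP4": 29},
}


def clan_fractions(counts: dict[str, int]) -> dict[str, float]:
    """Fraction of an organism's P450s in each clan, after checking the row sum."""
    clans = {clan: n for clan, n in counts.items() if clan != "total"}
    if sum(clans.values()) != counts["total"]:
        raise ValueError("clan counts do not add up to the total")
    return {clan: n / counts["total"] for clan, n in clans.items()}


for organism, counts in CLAN_COUNTS.items():
    shares = ", ".join(f"{clan} {frac:.1%}" for clan, frac in clan_fractions(counts).items())
    print(f"{organism}: {shares}")
```

With these numbers, CYP4 makes up about 9% of the honey bee's P450s (4 of 46), compared with about 38% in D. melanogaster (32 of 85).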
Repeats
Intron splice sites are highly conserved
P450s: ~500 amino acids (~1,500 nucleotides of coding sequence)
Highly conserved heme-binding site (cysteine)
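These two features give a quick sanity check for a candidate gene model. The sketch below searches for the commonly cited P450 heme-binding consensus F-x-x-G-x-R-x-C-x-G (the C is the heme-binding cysteine) and checks that the protein is roughly 500 residues long. The length window of 450 to 600 and both function names are hypothetical choices; some genuine P450s deviate from the consensus, so a miss is a flag for closer inspection, not proof of a bad model.

```python
import re

# Commonly cited heme-binding consensus F-x-x-G-x-R-x-C-x-G.
HEME_MOTIF = re.compile(r"F..G.R.C.G")
CYS_OFFSET = 7  # index of the cysteine within the 10-residue motif


def heme_cysteine_positions(protein: str) -> list[int]:
    """1-based positions of the motif cysteine in a protein sequence."""
    return [m.start() + CYS_OFFSET + 1 for m in HEME_MOTIF.finditer(protein.upper())]


def plausible_full_length_p450(protein: str, min_len: int = 450, max_len: int = 600) -> bool:
    """Hypothetical check: roughly 500 aa long and carries the heme-binding motif."""
    return min_len <= len(protein) <= max_len and bool(heme_cysteine_positions(protein))


toy = "M" + "A" * 430 + "FSAGKRICLG" + "A" * 50  # synthetic, not a real P450
print(len(toy), heme_cysteine_positions(toy), plausible_full_length_p450(toy))
```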
Basic Annotation Rules
CDS start: amino acid M (methionine), nucleotide codon ATG
CDS stop: amino acid * (stop), nucleotide codons TAA/TAG/TGA
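The start and stop rules can be turned into a small checker for a spliced coding sequence. This is a sketch of the basic rules only (start with ATG, end with a stop, whole number of codons, no premature stops); `check_cds` is a hypothetical helper, and passing it does not mean a gene model is correct.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def check_cds(cds: str) -> list[str]:
    """Return the basic-rule problems found; an empty list means all rules pass."""
    cds = cds.upper()
    problems = []
    if len(cds) % 3 != 0:
        problems.append(f"length {len(cds)} is not a multiple of 3")
    if not cds.startswith("ATG"):
        problems.append("does not start with ATG")
    # Split into complete codons, ignoring any trailing partial codon.
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    if not codons or codons[-1] not in STOP_CODONS:
        problems.append("does not end with a stop codon (TAA/TAG/TGA)")
    internal = [i + 1 for i, codon in enumerate(codons[:-1]) if codon in STOP_CODONS]
    if internal:
        problems.append(f"internal stop codon(s) at codon position(s) {internal}")
    return problems


print(check_cds("ATGAAATAG"))     # toy CDS that follows the rules
print(check_cds("ATGTAAAAATAG"))  # toy CDS with a premature stop
```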
Translation Frames
Frame 1, Frame 2, Frame 3
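The three forward reading frames can be generated directly from the standard genetic code. The sketch below builds the codon table from the usual TCAG ordering and translates a sequence starting at offsets 0, 1 and 2; the function names are illustrative. Translating the reverse complement the same way gives the other three frames of a six-frame translation.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS))


def translate(seq: str) -> str:
    """Translate complete codons; '*' marks a stop, 'X' an unrecognised codon (e.g. with N)."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def three_frames(seq: str) -> dict[str, str]:
    """Forward-strand translations starting at offsets 0, 1 and 2."""
    return {f"Frame {offset + 1}": translate(seq[offset:]) for offset in range(3)}


for frame, protein in three_frames("ATGGAATAG").items():
    print(frame, protein)
```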
Figure sources: http://en.wikipedia.org/wiki/File:Exon_and_Intron_classes.png and http://doc.goldenhelix.com/SVS/latest/_images/splice_site_diagram.png
Intron splice sites: the GT-AG rule (introns usually begin with GT at the donor site and end with AG at the acceptor site)
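The GT-AG rule alone yields a list of candidate introns, as in this sketch that pairs every GT with every downstream AG inside a length window. The `min_len` and `max_len` defaults are hypothetical; the rule matches far more positions than real introns, which is exactly why annotators confirm splice sites with RNA-seq, EST or homology evidence.

```python
def candidate_introns(seq: str, min_len: int = 40, max_len: int = 5000) -> list[tuple[int, int]]:
    """Return 0-based, half-open (start, end) spans that begin with GT and end with AG.

    Only spans whose length falls within [min_len, max_len] are kept
    (hypothetical default window).
    """
    seq = seq.upper()
    donors = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "GT"]
    acceptors = [i for i in range(len(seq) - 1) if seq[i:i + 2] == "AG"]
    introns = []
    for d in donors:
        for a in acceptors:
            end = a + 2  # include the AG in the intron
            if min_len <= end - d <= max_len:
                introns.append((d, end))
    return introns


toy = "AAGT" + "C" * 10 + "AGAA"  # synthetic sequence with one GT...AG stretch
print(candidate_introns(toy, min_len=4))
```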
Regular-expression find and replace that puts a space after every character:
Find: "(\w)"   Replace with: "\1 "
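The same find and replace works outside a text editor. This sketch applies it with Python's `re.sub`; the second function is a variation not taken from the slides that spaces the sequence in groups of three instead, which lines codons up for reading in one frame.

```python
import re


def space_out(seq: str) -> str:
    """Same find/replace as above: each word character becomes itself plus a space."""
    return re.sub(r"(\w)", r"\1 ", seq)


def space_codons(seq: str) -> str:
    """Variation: a space after every complete group of three characters."""
    return re.sub(r"(\w{3})", r"\1 ", seq)


print(space_out("ATGGAA"))
print(space_codons("ATGGAATAG"))
```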
'GT' intron donor site: 1 nucleotide ("G") of the next codon remains at the end of exon 1 = phase 1 intron
'AG' intron acceptor site: 2 nucleotides ("AA") come before the first full codon of exon 2
Combine the "AA" with the "G" from exon 1 to make the codon "GAA" for glutamic acid (E)
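The phase arithmetic in this example can be checked in a few lines. The sketch assumes exon 1 starts on a codon boundary (it begins with the ATG), so the intron phase is simply the number of coding bases before the intron modulo 3; the exon sequences are hypothetical toy values chosen to reproduce the "G" + "AA" = "GAA" case. In GFF3 terms, the second CDS segment would carry phase 2, the number of bases to skip before its first full codon.

```python
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS))


def intron_phase(coding_bases_before_intron: int) -> int:
    """0 = between codons, 1 = after the 1st base of a codon, 2 = after the 2nd."""
    return coding_bases_before_intron % 3


def junction_codon(exon1: str, exon2: str) -> str:
    """Rebuild the codon split by the intron; '' for a phase 0 intron.

    Assumes exon1 starts on a codon boundary (e.g. it begins with ATG).
    """
    phase = intron_phase(len(exon1))
    if phase == 0:
        return ""
    return exon1[-phase:] + exon2[:3 - phase]


exon1 = "ATGGCTG"   # hypothetical: ATG GCT + 1 leftover base "G"
exon2 = "AATGGTAA"  # hypothetical: "AA" completes the split codon
codon = junction_codon(exon1, exon2)
spliced = exon1 + exon2
protein = "".join(CODON_TABLE[spliced[i:i + 3]] for i in range(0, len(spliced), 3))
print(intron_phase(len(exon1)), codon, CODON_TABLE[codon], protein)
```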
This start looks good!
Jamboree!
Search for homologs using one of these Apis mellifera genes from the GenBank protein database (e.g. CYP9R1 AND Apis mellifera):
CYP9R1, CYP6AS3, CYP6BD1, CYP6AQ1, CYP4G11
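The same protein-database search can be expressed as an NCBI E-utilities `esearch` request. This sketch only builds the URL (it does not fetch anything) using the slide's query style; the `retmax` default of 20 and the function name are assumptions. Field tags such as `Apis mellifera[Organism]` can make the query stricter.

```python
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"
GENES = ("CYP9R1", "CYP6AS3", "CYP6BD1", "CYP6AQ1", "CYP4G11")


def protein_search_url(gene: str, organism: str = "Apis mellifera", retmax: int = 20) -> str:
    """esearch URL for a query like 'CYP9R1 AND Apis mellifera' in the protein database."""
    term = f"{gene} AND {organism}"
    return f"{ESEARCH_URL}?{urlencode({'db': 'protein', 'term': term, 'retmax': retmax})}"


for gene in GENES:
    print(protein_search_url(gene))
```

Opening one of these URLs returns XML listing matching protein IDs; those IDs can then be passed to `efetch` with `rettype=fasta` to retrieve the sequences.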
Use BLASTP to find predicted homologs in the NCBI "nr" database. Select one of the following bees for the Organism:
Apis florea, Bombus impatiens, Bombus terrestris, Megachile rotundata
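For repeated searches, the same BLASTP setup (program blastp, database nr, one organism) can be written as a request to NCBI's BLAST URL API. This sketch only builds the submission URL; it assumes the URL API's `CMD=Put`, `PROGRAM`, `DATABASE`, `QUERY` and `ENTREZ_QUERY` parameters, and the query sequence is a placeholder. Check NCBI's usage guidelines before scripting many searches.

```python
from urllib.parse import urlencode

BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"
BEES = ("Apis florea", "Bombus impatiens", "Bombus terrestris", "Megachile rotundata")


def blastp_put_url(protein: str, organism: str) -> str:
    """Build a 'Put' (submit) request: blastp against nr, limited to one bee species."""
    if organism not in BEES:
        raise ValueError(f"expected one of {BEES}, got {organism!r}")
    params = {
        "CMD": "Put",
        "PROGRAM": "blastp",
        "DATABASE": "nr",
        "QUERY": protein,
        "ENTREZ_QUERY": f"{organism}[Organism]",  # restrict hits to this organism
    }
    return f"{BLAST_URL}?{urlencode(params)}"


query = "MKV"  # placeholder: paste the Apis mellifera protein sequence here
print(blastp_put_url(query, "Bombus impatiens"))
```

The response to a Put request contains a request ID (RID), which is then used with `CMD=Get` to retrieve the results.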
Copy and paste verified amino acid sequences (FASTA formatted) into a text file:
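If the verified sequences are collected in a script rather than by hand, a small FASTA writer and reader keep the file format consistent. This is a minimal sketch: headers and sequences below are hypothetical toy values, sequences are wrapped at 60 characters (a common but arbitrary choice), and duplicate headers are not handled beyond the later one overwriting the earlier.

```python
def write_fasta(records: dict[str, str], width: int = 60) -> str:
    """Format {header: sequence} as FASTA text, wrapping sequences at `width`."""
    lines = []
    for header, seq in records.items():
        lines.append(f">{header}")
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


def read_fasta(text: str) -> dict[str, str]:
    """Parse FASTA text into {header: sequence}; lines before the first '>' are ignored."""
    records: dict[str, str] = {}
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


# Hypothetical headers and toy sequences, not real database entries.
verified = {"Bombus_impatiens_CYP9R1_candidate": "M" + "A" * 70, "toy_2": "MKV"}
print(write_fasta(verified))
```

The returned text can be saved with, for example, `pathlib.Path("verified_p450s.fasta").write_text(...)`, where the file name is only an example.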