Post on 02-Jan-2016
Sequence-based Similarity Module (BLAST & CDD only)
& Horizontal Gene Transfer Module
(Ortholog Neighborhood & GC content only)
Phylogenetic tree of Bacteria
Recall: Planctomycetes are one of the GEBA genomes, representing an under-represented phylum within domain Bacteria
GEBA: Genomic Encyclopedia of Bacteria & Archaea
Insert Figure 1 from Handelsman (2004) Microbiol. Mol. Biol. Rev. 68: 669-685.
Recent phylogenetic analysis using 23S rRNA gene supports the monophyletic grouping and branch order for these four bacterial phyla
Insert Figure 4A from Pilhofer et al. (2008) Characterization and Evolution of Cell Division and Cell Wall Synthesis Genes in the Bacterial Phyla Verrucomicrobia, Lentisphaerae, Chlamydiae, and Planctomycetes and Phylogenetic Comparison with rRNA Genes. J Bacteriology 190: 3192-3202.
http://www.ncbi.nlm.nih.gov/Taxonomy/Browser/wwwtax.cgi?id=126
• The Basic Local Alignment Search Tool (BLAST) finds regions of local similarity between two sequences.
• Conserved Domain Database Search (CDD) finds sequence similarity with genes in conserved orthologous groups (COGs).
Verifying Function Based on Sequence Conservation
Different types of BLAST searches
– blastp
– blastn
– blastx
– tblastn
– tblastx
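The five programs differ only in the kind of query and database sequence they compare. The lookup below is a small illustrative sketch of that mapping; the dictionary and the helper name `programs_for` are made up for this example and are not part of any NCBI tool.

```python
# Query and database sequence types for each BLAST program.
# "translated" means a nucleotide sequence is translated in all six
# reading frames before the comparison is made at the protein level.
BLAST_PROGRAMS = {
    "blastp": {"query": "protein", "database": "protein"},
    "blastn": {"query": "nucleotide", "database": "nucleotide"},
    "blastx": {"query": "translated nucleotide", "database": "protein"},
    "tblastn": {"query": "protein", "database": "translated nucleotide"},
    "tblastx": {"query": "translated nucleotide",
                "database": "translated nucleotide"},
}


def programs_for(query_type, database_type):
    """Return BLAST programs that accept the given raw sequence types.

    Both arguments are either "protein" or "nucleotide".
    """
    def raw(kind):
        # Strip the "translated " prefix to get the type the user supplies.
        return kind.replace("translated ", "")

    return sorted(
        name
        for name, kinds in BLAST_PROGRAMS.items()
        if raw(kinds["query"]) == query_type
        and raw(kinds["database"]) == database_type
    )
```

For the amino acid query used in this module, `programs_for("protein", "protein")` points to blastp.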
http://www.ncbi.nlm.nih.gov/
>35% identity to experimentally characterized protein (especially in conserved regions) can be considered good evidence for function
E-value less than 10^-3 is significant; equal to or less than 10^-15 may indicate a good match
Be cautious of auto-annotated gene function – GenBank not a curated database
Beware!!!
Mindless BLAST – Similarity score and E-value do not tell whole story! Must also consider length of match (query coverage) & biological function (organismal context)
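The rules of thumb above can be combined into one check, so that a low E-value alone is never taken as the whole story. This is only a sketch: the function name `assess_hit` and the 70% query coverage cutoff are assumptions made for illustration, since the slides give no numeric cutoff for coverage.

```python
def assess_hit(percent_identity, e_value, query_coverage, min_coverage=70.0):
    """Classify a BLAST hit using the rules of thumb from the slides.

    percent_identity and query_coverage are percentages (0-100).
    min_coverage is an ASSUMED cutoff; it does not come from the slides.
    Returns (strength, notes).
    """
    # E-value must be below 1e-3 to count as significant at all.
    if e_value >= 1e-3:
        return "not significant", ["E-value is not below 1e-3"]

    # At or below 1e-15 the slides describe the hit as a good match.
    strength = "good match" if e_value <= 1e-15 else "significant"

    notes = []
    if percent_identity > 35.0:
        notes.append("identity above 35%")
    else:
        notes.append("identity at or below 35%: weak evidence for function")
    if query_coverage < min_coverage:
        notes.append("low query coverage: match may span only part of the query")
    return strength, notes
```

Even a "good match" still needs a check of organismal context and of whether the hit's function was experimentally characterized, which no score can capture.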
Follow this link from the lab notebook
BLAST: Altschul et al. (1997) Nucleic Acids Research 25: 3389-3402.
GenBank: Benson et al. (2006) Nucleic Acids Research 35: D21-D25.
Retrieve query sequence from first module in
imgACT Lab Notebook
Copy amino acid sequence in FASTA format from
in imgACT Lab Notebook
Paste query sequence into box
“Click”
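Before pasting, it can help to confirm that the copied text really is a single FASTA record of amino acids. The sketch below is one way to do that check; the function name `read_fasta_record` and the accepted letter set are choices made for this example, not something defined by imgACT or NCBI.

```python
# One-letter amino acid codes, plus ambiguity codes (B, Z, X, J),
# rare residues (U, O), and the stop and gap symbols.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWYBZXJUO*-")


def read_fasta_record(text):
    """Split one FASTA record into (header, sequence).

    Raises ValueError if the header line is missing or the sequence
    contains characters that are not amino acid codes.
    """
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("FASTA record must start with a '>' header line")
    header = lines[0][1:]
    # Sequence lines are joined so that line wrapping does not matter.
    sequence = "".join(lines[1:]).upper()
    unexpected = set(sequence) - VALID_RESIDUES
    if unexpected:
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    return header, sequence
```

Note that A, C, G, and T are also valid amino acid codes, so this check cannot tell a nucleotide sequence from a protein sequence.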
WHAT YOU SHOULD SEE. . . BLAST RESULTS
Scroll down
Accession ID
Top significant hit
Start with first hit. . .Click on Accession ID
NOTE: Top hit is from class organism; do not include results for P. limnophilus in lab notebook
Accession ID
Next significant hit
Click on Accession ID
NOTE: Function assigned by automatic Gene Caller (not experimentally verified)
Copy/paste this information into imgACT notebook
Reminder: Make sure you are in EDIT mode when making changes to imgACT notebook and SAVE your work along the way
Return to BLAST results for this information
“Click” on Bit score
Copy/paste into imgACT notebook:
• Length of alignment
• Score
• Expect (E-value)
• Identities
• Positives
• Gaps
• Pair-wise alignment between “Query” and “Sbjct” sequences
Pair-wise alignment with statistics (including E-value)
Sequence length of database hit (not alignment length)
NOTE: You need to modify your notebook for requested info (statistics include E-value)
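To see where the Identities and Gaps figures come from, they can be recomputed from the aligned “Query” and “Sbjct” strings. The following is a minimal sketch with made-up helper names (`alignment_stats`, `query_coverage`); Positives are left out because they depend on the substitution matrix used by the search.

```python
def alignment_stats(query_aln, sbjct_aln):
    """Count identities and gaps in a pairwise alignment.

    Both strings are the aligned rows as shown by BLAST, with '-' for gaps.
    Percent identity is taken over the full alignment length, gap columns
    included, which is how BLAST reports Identities.
    """
    if len(query_aln) != len(sbjct_aln) or not query_aln:
        raise ValueError("aligned sequences must be non-empty and equal in length")
    pairs = list(zip(query_aln, sbjct_aln))
    identities = sum(q == s and q != "-" for q, s in pairs)
    gaps = sum(q == "-" or s == "-" for q, s in pairs)
    length = len(pairs)
    return {
        "length": length,
        "identities": identities,
        "gaps": gaps,
        "percent_identity": round(100.0 * identities / length, 1),
    }


def query_coverage(query_start, query_end, query_length):
    """Percentage of the query covered by the aligned region (1-based, inclusive)."""
    return round(100.0 * (query_end - query_start + 1) / query_length, 1)
```

Alignment length and the sequence length of the database hit are different numbers, which is why query coverage is computed from the query coordinates rather than from the hit length.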
REPEAT procedure with second BLAST hit.
725
“Click” on Bit score
“Click” on Accession ID
Copy/paste requested information in lab notebook
733
CDD: Conserved Domain Database
Bi-directional best hit in curated database
COG genes have sequence similarity & functional conservation
COG 1 – ion transport
COG 2 – energy production
COG 3 – cell division
etc.
Figure from Sanders-Lorenz and Miller (2010)
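A bi-directional best hit means gene A's best match in genome B is gene X, and gene X's best match back in genome A is gene A. The sketch below shows the idea on two plain dictionaries; the function name and the gene identifiers in the usage note are hypothetical, and this is not how the COG database itself is built or queried.

```python
def bidirectional_best_hits(best_a_to_b, best_b_to_a):
    """Return (gene_a, gene_b) pairs that are each other's best hit.

    best_a_to_b maps each gene in genome A to its best hit in genome B;
    best_b_to_a maps each gene in genome B to its best hit in genome A.
    """
    return sorted(
        (gene_a, gene_b)
        for gene_a, gene_b in best_a_to_b.items()
        if best_b_to_a.get(gene_b) == gene_a
    )
```

For example, with `{"a1": "b1", "a2": "b1"}` and `{"b1": "a1"}` only the pair `("a1", "b1")` qualifies, because `b1` does not point back to `a2`.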
Return to top of BLAST Results page
CDD: Marchler-Bauer et al. (2006) Nucleic Acids Research 35: D237-D240.
“Click” on Conserved Domain image
“Click”
If there are no hits, write “no significant hits” in notebook
If there are hits, scroll down & click the + sign next to the top hit
Click here
Copy top COG hit and COG name into notebook
Modify BOX to include length, bit score, and E-value
COG hit COG name
Length, bit score, and E-value
COG description
Change headings and enter COG information as shown for top hit
If you obtain more than one significant hit, record this info for at least the top 2 hits
Hint: Look at Score & E-value
Retrieve from Gene Detail page
How do I return to the Gene Detail page for my proposed gene?
“Click” on URL saved for your gene during first module (week 2)
Then what?
Keep the Gene Detail page open in separate tab while working on imgACT Lab Notebook modules
Scroll down
“Click” here on Gene Detail page
Change to 40
Note the red arrow corresponds to your gene
Plus strand genes on top (left to right)
Minus strand genes on bottom (right to left)
Is your gene a stand-alone ORF or is it clustered with other genes on same DNA strand and in same orientation?
Could be evidence that your gene is part of an operon
What are the functions of adjacent genes? Do they have related function?
How conserved is the gene neighborhood?
Are there similar patterns in other organisms that contain a gene from same orthologous group?
If considerably different, may be evidence for HGT
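The first question, stand-alone ORF versus same-strand cluster, can be phrased as a simple rule over gene coordinates. The sketch below walks outward from the gene of interest and stops at the first neighbor on the other strand or beyond a gap limit. The `Gene` record, the function name, and the 100 bp default gap are all assumptions for illustration; the slides do not define a distance cutoff, and clustering is only suggestive of an operon.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    start: int   # leftmost coordinate
    end: int     # rightmost coordinate
    strand: str  # "+" or "-"


def same_strand_cluster(genes, target_name, max_gap=100):
    """Return names of contiguous same-strand genes around the target.

    max_gap is an ASSUMED limit (in bp) on the space between neighbors.
    A result containing only the target suggests a stand-alone ORF.
    """
    ordered = sorted(genes, key=lambda g: g.start)
    idx = next(i for i, g in enumerate(ordered) if g.name == target_name)
    strand = ordered[idx].strand
    cluster = [ordered[idx]]

    # Extend to the left while neighbors share the strand and sit close by.
    i = idx
    while i > 0:
        left, right = ordered[i - 1], ordered[i]
        if left.strand != strand or right.start - left.end > max_gap:
            break
        cluster.insert(0, left)
        i -= 1

    # Extend to the right under the same conditions.
    i = idx
    while i < len(ordered) - 1:
        left, right = ordered[i], ordered[i + 1]
        if right.strand != strand or right.start - left.end > max_gap:
            break
        cluster.append(right)
        i += 1

    return [g.name for g in cluster]
```

Running the same check on the neighborhoods of orthologs in other organisms gives a rough way to compare how conserved the gene order is.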
Need to save individual panels as JPEG or PNG files.
Include P. limnophilus as well as 4-5 different organisms in imgACT notebook.
“Click” here to insert
images into notebook
Delete ‘gene neighborhood images’ and place cursor in the
box
1- Click “Browse” to find image file.
2- Press “Attach” button. Thumbnail image should appear in window.
3- Repeat for each individual neighborhood panel until all are loaded in the window prompt.
4- Next, select one image at a time and press [OK] to insert them into imgACT notebook at cursor position.
NOTE: The images should be inserted in same order that the organisms were listed in img/edu
Insert next image
Results: Ortholog Neighborhood
Scroll down
Enter comments about homology & context:
Is your gene a stand-alone ORF or is it clustered with other genes on same DNA strand and in same orientation?
Could be evidence that your gene is part of an operon
What are the functions of adjacent genes? Do they have related function?
How conserved is the gene neighborhood?
Are there similar patterns in other organisms that contain a gene from same orthologous group?
If considerably different, may be evidence for HGT
Retrieve from Organism Details page
Retrieve from Gene Detail page
On Gene Detail page, you will find the GC content for your
gene.
To find GC content for the entire P. limnophilus genome, select “Find Genomes” tab from the Gene Detail page.
Search for Planctomyces limnophilus
and click on the corresponding hyperlink.
Scroll down
WHAT YOU SHOULD SEE. . .
GC content will be listed under Genome
Statistics.
NOTE: A gene with a GC content that is more than a few percentage points above or below the average GC content in the genome may have originated from another organism by HGT. Add a comment box & make note of this if your gene meets this criterion.
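The comparison in the NOTE can be written out as a short calculation. This is a sketch only: the function names are made up, and the 5 percentage point threshold is an assumed reading of "a few percentage points", not a value given in the slides. In the lab itself, both GC values are read from the Gene Detail and Organism Details pages rather than computed.

```python
def gc_content(sequence):
    """Return the GC content of a DNA sequence as a percentage.

    Only A, C, G, and T are counted, so ambiguity codes such as N
    do not affect the result.
    """
    counted = [base for base in sequence.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("no A, C, G, or T bases found")
    gc = sum(base in "GC" for base in counted)
    return 100.0 * gc / len(counted)


def deviates_from_genome(gene_gc, genome_gc, threshold=5.0):
    """True if the gene's GC% differs from the genome average by more
    than `threshold` percentage points (ASSUMED cutoff, not from the slides).
    """
    return abs(gene_gc - genome_gc) > threshold
```

A `True` result is only a prompt to add the comment box; GC content alone does not establish HGT and should be weighed together with the ortholog neighborhood evidence.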