Sequence-based Similarity Module (BLAST & CDD only ) & Horizontal Gene Transfer Module (Ortholog...

Post on 02-Jan-2016

Sequence-based Similarity Module (BLAST & CDD only)

& Horizontal Gene Transfer Module

(Ortholog Neighborhood & GC content only)

Phylogenetic tree of Bacteria

Recall: Planctomycetes are one of the GEBA genomes, representing an under-represented phylum within domain Bacteria

GEBA: Genomic Encyclopedia of Bacteria & Archaea

Insert Figure 1 from Handelsman (2004) Microbiol. Mol. Biol. Rev. 68: 669-685.

Recent phylogenetic analysis using 23S rRNA gene supports the monophyletic grouping and branch order for these four bacterial phyla

Insert Figure 4A from Pilhofer et al. (2008) Characterization and Evolution of Cell Division and Cell Wall Synthesis Genes in the Bacterial Phyla Verrucomicrobia, Lentisphaerae, Chlamydiae, and Planctomycetes and Phylogenetic Comparison with rRNA Genes. J Bacteriology 190: 3192-3202.

• The Basic Local Alignment Search Tool (BLAST) finds regions of local similarity between two sequences.

• Conserved Domain Database Search (CDD) finds sequence similarity with genes in Clusters of Orthologous Groups (COGs).

Verifying Function Based on Sequence Conservation

Different types of BLAST searches
– blastp
– blastn
– blastx
– tblastn
– tblastx
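The five programs listed above differ in the type of query they accept and the type of database they search. As a quick reference, the pairing can be sketched as a small lookup table. The helper name `programs_for_query` is hypothetical and exists only for this illustration.

```python
# Query type and database type for the five BLAST programs named on the slide.
BLAST_PROGRAMS = {
    "blastp": ("protein", "protein"),
    "blastn": ("nucleotide", "nucleotide"),
    "blastx": ("translated nucleotide", "protein"),
    "tblastn": ("protein", "translated nucleotide"),
    "tblastx": ("translated nucleotide", "translated nucleotide"),
}


def programs_for_query(query_type):
    """Return the programs that accept a 'protein' or 'nucleotide' query.

    Hypothetical helper: a translated-nucleotide query still starts from a
    nucleotide sequence, so it is matched by the suffix 'nucleotide'.
    """
    result = []
    for name, (query, _database) in BLAST_PROGRAMS.items():
        if query.endswith(query_type):
            result.append(name)
    return result


print(programs_for_query("protein"))
print(programs_for_query("nucleotide"))
```

Since the query in this module is an amino acid sequence, blastp (or tblastn against a nucleotide database) is the relevant choice.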

http://www.ncbi.nlm.nih.gov/

>35% identity to experimentally characterized protein (especially in conserved regions) can be considered good evidence for function

E-value less than 10^-3 is significant; equal to or less than 10^-15 may indicate good match

Be cautious of auto-annotated gene function – GenBank not a curated database

Beware!!!

Mindless BLAST – Similarity score and E-value do not tell whole story! Must also consider length of match (query coverage) & biological function (organismal context)
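The rules of thumb above (identity, E-value, and query coverage) can be combined into a simple checklist. The sketch below is only an illustration: the function name `assess_hit` and the 70% coverage cutoff are assumptions, since the slides state that coverage matters but give no number. It does not replace judging the biological context.

```python
def assess_hit(percent_identity, e_value, query_coverage, min_coverage=70.0):
    """Summarize one BLAST hit with the rule-of-thumb thresholds.

    min_coverage is a hypothetical cutoff chosen for this example only.
    """
    notes = []
    # E-value: <= 1e-15 may indicate a good match, < 1e-3 is significant.
    if e_value <= 1e-15:
        notes.append("E-value suggests a good match")
    elif e_value < 1e-3:
        notes.append("E-value is significant")
    else:
        notes.append("E-value is not significant")
    # Identity: > 35% to a characterized protein is good evidence for function.
    if percent_identity > 35.0:
        notes.append("identity above 35%")
    else:
        notes.append("identity at or below 35%")
    # Coverage: a short match can score well yet cover little of the query.
    if query_coverage < min_coverage:
        notes.append("low query coverage: check length of match")
    return notes


print(assess_hit(48.0, 1e-40, 95.0))
print(assess_hit(30.0, 0.5, 20.0))
```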

Follow this link from the lab notebook

BLAST: Altschul et al. (1997) Nucleic Acids Research 25: 3389-3402.

GenBank: Benson et al. (2006) Nucleic Acids Research 35: D21-D25.

Retrieve query sequence from first module in imgACT Lab Notebook

Copy amino acid sequence in FASTA format from

in imgACT Lab Notebook

Paste query sequence into box
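A FASTA record is a header line beginning with ">" followed by one or more lines of sequence. Before pasting, it can help to confirm the copied text has that shape. The sketch below assumes a single record; the function name `parse_fasta` and the example record are made up for illustration.

```python
def parse_fasta(text):
    """Split a single-record FASTA string into (header, sequence)."""
    lines = [line.strip() for line in text.strip().splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("FASTA record must start with a '>' header line")
    header = lines[0][1:]
    # Sequence lines are joined so the wrapped text becomes one string.
    sequence = "".join(lines[1:]).upper()
    return header, sequence


# Made-up record, not a real gene.
record = """>example_gene hypothetical protein
MKTAYIAK
QRQISFVK
"""

header, sequence = parse_fasta(record)
print(header)
print(sequence, len(sequence))
```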

“Click”

WHAT YOU SHOULD SEE. . . BLAST RESULTS

Scroll down

Accession ID

Top significant hit

Start with first hit. . . Click on Accession ID

NOTE: Top hit is from class organism; do not include results in P. limnophilus in lab notebook

Accession ID

Next significant hit

Click on Accession ID

NOTE: Function assigned by automatic Gene Caller (not experimentally verified)

Copy/paste this information into imgACT notebook

Reminder: Make sure you are in EDIT mode when making changes to imgACT notebook and SAVE your work along the way

Return to BLAST results for this information

“Click” on Bit score

Copy/paste into imgACT notebook:
– Length of alignment
– Score
– Expect (E-value)
– Identities
– Positives
– Gaps
– Pair-wise alignment between “Query” and “Sbjct” sequences
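The statistics requested above appear in the text header printed over each pair-wise alignment. As a sketch, they could be pulled out with regular expressions. The header layout in `sample` is an assumption based on typical BLAST text output, with invented numbers. The function name `parse_alignment_stats` is hypothetical.

```python
import re


def parse_alignment_stats(block):
    """Extract bit score, E-value, identities, positives and gaps from text."""
    stats = {}
    score = re.search(r"Score\s*=\s*([\d.]+)\s*bits", block)
    expect = re.search(r"Expect\s*=\s*([\d.eE+-]+)", block)
    if score:
        stats["bit_score"] = float(score.group(1))
    if expect:
        stats["e_value"] = float(expect.group(1))
    # Each of these is reported as count/alignment_length.
    for name in ("Identities", "Positives", "Gaps"):
        match = re.search(name + r"\s*=\s*(\d+)/(\d+)", block)
        if match:
            stats[name.lower()] = (int(match.group(1)), int(match.group(2)))
    return stats


# Invented example header; the layout is assumed, the numbers are not real.
sample = """ Score = 87.0 bits (214),  Expect = 2e-17
 Identities = 50/150 (33%), Positives = 80/150 (53%), Gaps = 5/150 (3%)
"""

stats = parse_alignment_stats(sample)
print(stats)
```

The second number in each pair is the length of the alignment, which is not the same as the sequence length of the database hit.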

Pair-wise alignment with statistics (including E-value)

Sequence length of database hit (not alignment length)

NOTE: You need to modify your notebook for requested info (statistics include E-value)

REPEAT procedure with second BLAST hit.

“Click” on Bit score

“Click” on Accession ID

Copy/paste requested information in lab notebook

CDD: Conserved Domain Database

Bi-directional best hit in curated database

COG genes have sequence similarity & functional conservation

COG 1 – ion transport
COG 2 – energy production
COG 3 – cell division
etc.
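The idea of a bi-directional best hit can be illustrated in a few lines: gene a in genome A and gene b in genome B form a pair only if b is the best hit for a and a is the best hit for b. The sketch below assumes bit scores are already available as nested dictionaries. All gene names and scores are invented, and this is not a description of how the CDD or COG databases are actually built.

```python
def best_hits(scores):
    """Map each query to its highest-scoring subject.

    scores: dict of {query: {subject: bit_score}}.
    """
    return {q: max(hits, key=hits.get) for q, hits in scores.items() if hits}


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs that are each other's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Invented scores for two hypothetical genomes A and B.
a_vs_b = {
    "geneA1": {"geneB1": 250.0, "geneB2": 90.0},
    "geneA2": {"geneB2": 120.0},
}
b_vs_a = {
    "geneB1": {"geneA1": 245.0},
    "geneB2": {"geneA1": 95.0, "geneA2": 60.0},
}

print(bidirectional_best_hits(a_vs_b, b_vs_a))
```

Here geneA2 finds geneB2 as its best hit, but geneB2 prefers geneA1, so only the geneA1/geneB1 pair is reported.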

Figure from Sanders-Lorenz and Miller (2010)

Return to top of BLAST Results page

CDD: Marchler-Bauer et al. (2006) Nucleic Acids Research 35: D237-D240.

“Click” on Conserved Domain image

“Click”

If there are no hits, write “no significant hits” in notebook
If there are hits, scroll down & click the + sign next to the top hit

Click here

Copy top COG hit and COG name into notebook
Modify BOX to include length, bit score, and E-value

COG hit COG name

Length, bit score, and E-value

COG description

Change headings and enter COG information as shown for top hit

If you obtain more than one significant hit, record this info for at least the top 2 hits

Hint: Look at Score & E-value

Retrieve from Gene Detail page

How do I return to the Gene Detail page for my proposed gene?

“Click” on URL saved for your gene during first module (week 2)

Then what?

Keep the Gene Detail page open in separate tab while working on imgACT Lab Notebook modules

Scroll down

“Click” here on Gene Detail page

Change to 40

Note: the red arrow corresponds to your gene
Plus strand genes on top (left to right)
Minus strand genes on bottom (right to left)

Is your gene a stand alone ORF or is it clustered with other genes on same DNA strand and in same orientation?

Could be evidence that your gene is part of an operon
What are the functions of adjacent genes? Do they have related function?

How conserved is the gene neighborhood?
Are there similar patterns in other organisms that contain a gene from same orthologous group?

If considerably different, may be evidence for HGT
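The first question above (stand alone ORF versus clustered on the same strand) can be sketched as a walk outward from the gene of interest, stopping when the strand changes or the gap to the next gene gets large. The 100 bp `max_gap` is an arbitrary value chosen for this example, and the coordinates are invented. A same-strand run is only a hint of an operon, not proof.

```python
def same_strand_cluster(genes, target, max_gap=100):
    """Return names of genes in the contiguous same-strand run around target.

    genes: list of (name, start, end, strand) tuples sorted by start.
    max_gap: hypothetical maximum distance (bp) allowed between neighbors.
    """
    index = next(i for i, g in enumerate(genes) if g[0] == target)
    strand = genes[index][3]
    left = index
    # Extend to the left while the neighbor is close and on the same strand.
    while (left > 0 and genes[left - 1][3] == strand
           and genes[left][1] - genes[left - 1][2] <= max_gap):
        left -= 1
    right = index
    # Extend to the right under the same conditions.
    while (right < len(genes) - 1 and genes[right + 1][3] == strand
           and genes[right + 1][1] - genes[right][2] <= max_gap):
        right += 1
    return [g[0] for g in genes[left:right + 1]]


# Invented neighborhood: g2-g4 are close together on the plus strand.
neighborhood = [
    ("g1", 100, 900, "-"),
    ("g2", 1000, 1800, "+"),
    ("g3", 1850, 2600, "+"),
    ("g4", 2650, 3400, "+"),
    ("g5", 4000, 4800, "+"),
]

print(same_strand_cluster(neighborhood, "g3"))
print(same_strand_cluster(neighborhood, "g1"))
```

A result containing only the target gene corresponds to a stand alone ORF under these assumptions.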

Need to save individual panels as JPEG or PNG files. Include P. limnophilus as well as 4-5 different organisms in imgACT notebook.

“Click” here to insert images into notebook

Delete ‘gene neighborhood images’ and place cursor in the box

1- Click “Browse” to find image file.

2- Press “Attach” button. Thumbnail image should appear in window.

3- Repeat for each individual neighborhood panel until all are loaded in the window prompt.

4- Next, select one image at a time and press [OK] to insert them into imgACT notebook at cursor position.

NOTE: The images should be inserted in same order that the organisms were listed in img/edu

Insert next image

Results: Ortholog Neighborhood

Scroll down

Enter comments about homology & context:

Is your gene a stand alone ORF or is it clustered with other genes on same DNA strand and in same orientation?

Could be evidence that your gene is part of an operon
What are the functions of adjacent genes? Do they have related function?

How conserved is the gene neighborhood?
Are there similar patterns in other organisms that contain a gene from same orthologous group?
If considerably different, may be evidence for HGT
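To put a rough number on "how conserved" when writing these comments, one option is to compare the set of orthologous groups found near the gene in each organism. The sketch below uses Jaccard similarity (shared groups divided by all groups seen). That measure is a choice made for this example, not something the img/edu viewer reports, and the group labels and organism names are placeholders. A low value is a prompt to look more closely, not evidence of HGT by itself.

```python
def jaccard(a, b):
    """Return the fraction of items shared between two collections."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def neighborhood_similarity(reference, others):
    """Compare a reference neighborhood with each other organism's."""
    return {name: round(jaccard(reference, groups), 2)
            for name, groups in others.items()}


# Placeholder group labels for the genes flanking the gene of interest.
reference = ["COG_A", "COG_B", "COG_C", "COG_D"]
others = {
    "organism_1": ["COG_A", "COG_B", "COG_C", "COG_D"],
    "organism_2": ["COG_A", "COG_X", "COG_Y"],
}

print(neighborhood_similarity(reference, others))
```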

Retrieve from Organism Details page

Retrieve from Gene Detail page

On Gene Detail page, you will find the GC content for your gene.

To find GC content for the entire P. limnophilus genome, select “Find Genomes” tab from the Gene Detail page.

Search for Planctomyces limnophilus and click on the corresponding hyperlink.

Scroll down

WHAT YOU SHOULD SEE. . .

GC content will be listed under Genome Statistics.

NOTE: A gene with a GC content that is more than a few percentage points above or below the average GC content in the genome may have originated from another organism by HGT. Add a comment box & make note of this if your gene meets this criterion.
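The comparison in the note above is simple arithmetic: GC content is the percentage of G and C bases, and the check is whether the gene's value differs from the genome average by more than some margin. In the sketch below the 5-point `threshold` is an assumed stand-in for "a few percentage points", and the GC values passed to `gc_deviates` are invented, not taken from the P. limnophilus genome.

```python
def gc_content(sequence):
    """Return the percentage of G and C bases in a nucleotide sequence."""
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("empty sequence")
    gc = sequence.count("G") + sequence.count("C")
    return 100.0 * gc / len(sequence)


def gc_deviates(gene_gc, genome_gc, threshold=5.0):
    """True if gene GC differs from genome GC by more than threshold points.

    threshold is an assumed value; the slide only says "a few".
    """
    return abs(gene_gc - genome_gc) > threshold


print(gc_content("ATGC"))
# Invented percentages for illustration only.
print(gc_deviates(45.0, 54.0))
print(gc_deviates(52.0, 54.0))
```

In practice the gene and genome values can be read directly from the Gene Detail and Genome Statistics pages, so only the comparison step is needed.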