Sequence-based Similarity Module (BLAST & CDD only) & Horizontal Gene Transfer Module (Ortholog Neighborhood & GC content only)

Page 1

Sequence-based Similarity Module (BLAST & CDD only)
& Horizontal Gene Transfer Module (Ortholog Neighborhood & GC content only)

Page 2

Phylogenetic tree of Bacteria

Recall: Planctomycetes are one of the GEBA genomes, representing an under-represented phylum within domain Bacteria

GEBA: Genomic Encyclopedia of Bacteria & Archaea

Insert Figure 1 from Handelsman (2004) Microbiol. Mol. Biol. Rev. 68: 669-685.

Page 3

Recent phylogenetic analysis using 23S rRNA gene supports the monophyletic grouping and branch order for these four bacterial phyla

Insert Figure 4A from Pilhofer et al. (2008) Characterization and Evolution of Cell Division and Cell Wall Synthesis Genes in the Bacterial Phyla Verrucomicrobia, Lentisphaerae, Chlamydiae, and Planctomycetes and Phylogenetic Comparison with rRNA Genes. J Bacteriology 190: 3192-3202.

Page 5

• The Basic Local Alignment Search Tool (BLAST) finds regions of local similarity between two sequences.

• Conserved Domain Database Search (CDD) finds sequence similarity with genes in Clusters of Orthologous Groups (COGs).

Page 6

Verifying Function Based on Sequence Conservation

Different types of BLAST searches:
– blastp
– blastn
– blastx
– tblastn
– tblastx

http://www.ncbi.nlm.nih.gov/

>35% identity to experimentally characterized protein (especially in conserved regions) can be considered good evidence for function

An E-value less than 10^-3 is significant; an E-value equal to or less than 10^-15 may indicate a good match

Be cautious of auto-annotated gene function – GenBank is not a curated database

Beware!!!

Mindless BLAST – Similarity score and E-value do not tell whole story! Must also consider length of match (query coverage) & biological function (organismal context)
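The rules of thumb on this slide can be sketched as a small check. This is only an illustration: the function name `assess_hit`, its parameters, and the 70% query-coverage cutoff are assumptions that do not come from the slides; the identity and E-value thresholds are the ones quoted above. Biological function and organismal context still have to be judged by hand.

```python
def assess_hit(percent_identity, e_value, align_len, query_len,
               min_coverage=0.70):
    """Label a BLAST hit using the rules of thumb from the slide.

    min_coverage is a hypothetical cutoff: the slide says query
    coverage must be considered but does not give a value.
    """
    # Approximate query coverage as alignment length / query length.
    coverage = align_len / query_len
    if e_value >= 1e-3:
        return "not significant"
    if coverage < min_coverage:
        return "significant, but low query coverage"
    if e_value <= 1e-15 and percent_identity > 35:
        return "good match"
    return "significant"


# Made-up numbers, for illustration only.
print(assess_hit(percent_identity=62.0, e_value=1e-40,
                 align_len=310, query_len=325))
```

A hit that passes every numeric test is still only a candidate: an auto-annotated subject sequence gives weaker evidence than an experimentally characterized one.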

Page 7

Follow this link from the lab notebook

BLAST: Altschul et al. (1997) Nucleic Acids Research 25: 3389-3402.

GenBank: Benson et al. (2006) Nucleic Acids Research 35: D21-D25.

Page 9

Retrieve query sequence from first module in imgACT Lab Notebook

Page 10

Copy amino acid sequence in FASTA format from imgACT Lab Notebook
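Before pasting, it can help to confirm that the copied text really is a single FASTA record (a ">" header line followed by sequence lines). A minimal sketch, assuming a hypothetical helper `read_fasta_record` and a made-up example record:

```python
def read_fasta_record(text):
    """Split one FASTA record into (header, sequence)."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("FASTA record must start with a '>' header line")
    header = lines[0][1:]                    # drop the leading '>'
    sequence = "".join(lines[1:]).upper()    # join wrapped sequence lines
    return header, sequence


# Hypothetical record, for illustration only.
example = """>example_gene hypothetical protein
MKTAYIAKQR
QISFVKSHFS
"""
header, seq = read_fasta_record(example)
print(header, len(seq))
```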

Page 11

Paste query sequence into box

“Click”

Page 12

WHAT YOU SHOULD SEE. . . BLAST RESULTS

Scroll down

Page 13

Accession ID

Top significant hit

Start with first hit. . . Click on Accession ID

Page 14

NOTE: Top hit is from class organism; do not include P. limnophilus results in lab notebook

Page 15

Accession ID

Next significant hit

Click on Accession ID

Page 16

NOTE: Function assigned by automatic Gene Caller (not experimentally verified)

Copy/paste this information into imgACT notebook

Page 17

Reminder: Make sure you are in EDIT mode when making changes to imgACT notebook and SAVE your work along the way

Return to BLAST results for this information

Page 18

“Click” on Bit score

Page 19

Copy/paste into imgACT notebook:
• Length of alignment
• Score
• Expect (E-value)
• Identities
• Positives
• Gaps
• Pair-wise alignment between “Query” and “Sbjct” sequences

Pair-wise alignment with statistics (including E-value)

Sequence length of database hit (not alignment length)
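The statistics listed on this slide can also be pulled out of a pasted alignment header with a short script. The sketch below assumes the statistics lines follow the classic NCBI BLAST text-report layout ("Score = ... bits", "Expect = ...", "Identities = n/len", and so on); the helper name `parse_alignment_stats` is hypothetical and the sample numbers are made up.

```python
import re

# Pattern for the two statistics lines above a pair-wise alignment.
STATS = re.compile(
    r"Score\s*=\s*(?P<bits>[\d.]+)\s*bits.*?"
    r"Expect\s*=\s*(?P<expect>[\d.eE+-]+).*?"
    r"Identities\s*=\s*(?P<ident>\d+)/(?P<length>\d+).*?"
    r"Positives\s*=\s*(?P<pos>\d+)/\d+.*?"
    r"Gaps\s*=\s*(?P<gaps>\d+)/\d+",
    re.DOTALL,
)


def parse_alignment_stats(text):
    """Return the alignment statistics as a dict, or None if not found."""
    m = STATS.search(text)
    if m is None:
        return None
    expect = m.group("expect")
    if expect.lower().startswith("e"):   # some reports print "e-100"
        expect = "1" + expect
    length = int(m.group("length"))
    ident = int(m.group("ident"))
    return {
        "bit_score": float(m.group("bits")),
        "e_value": float(expect),
        "alignment_length": length,
        "identities": ident,
        "positives": int(m.group("pos")),
        "gaps": int(m.group("gaps")),
        "percent_identity": 100.0 * ident / length,
    }


# Made-up statistics block, for illustration only.
sample = """ Score = 512 bits (1318),  Expect = 2e-45
 Identities = 250/400 (62%), Positives = 300/400 (75%), Gaps = 10/400 (2%)"""
stats = parse_alignment_stats(sample)
print(stats)
```

Note that the denominator in "Identities" is the alignment length, which is not the same as the sequence length of the database hit.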

Page 20

NOTE: You need to modify your notebook for requested info (statistics include E-value)

REPEAT procedure with second BLAST hit.

Page 21

“Click” on Bit score

“Click” on Accession ID

Copy/paste requested information in lab notebook

Page 22

CDD: Conserved Domain Database

Bi-directional best hit in curated database

COG genes have sequence similarity & functional conservation

COG 1 – ion transport
COG 2 – energy production
COG 3 – cell division
etc.

Figure from Sanders-Lorenz and Miller (2010)
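The idea of a bi-directional best hit can be sketched in a few lines: gene a in genome A and gene b in genome B form a pair when b is the best hit of a and a is the best hit of b. The function names and the toy hit lists below are hypothetical, and this illustrates only the general concept, not how the COG database itself is built.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    hits: iterable of (query, subject, bit_score) tuples.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {q: subj for q, (subj, _) in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Return (a, b) pairs that are each other's best hit."""
    fwd = best_hits(a_vs_b)
    rev = best_hits(b_vs_a)
    return sorted((a, b) for a, b in fwd.items() if rev.get(b) == a)


# Toy data: genome A genes searched against genome B, and the reverse.
a_vs_b = [("a1", "b1", 200.0), ("a1", "b2", 90.0), ("a2", "b2", 150.0)]
b_vs_a = [("b1", "a1", 198.0), ("b2", "a1", 95.0), ("b2", "a2", 60.0)]
print(bidirectional_best_hits(a_vs_b, b_vs_a))
```

In the toy data, a2's best hit is b2, but b2's best hit is a1, so only the a1/b1 pair is reported.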

Page 23

Return to top of BLAST Results page

CDD: Marchler-Bauer et al. (2006) Nucleic Acids Research 35: D237-D240.

Page 24

“Click” on Conserved Domain image

“Click”

Page 25

If there are no hits, write “no significant hits” in notebook
If there are hits, scroll down & click the + sign next to the top hit

Click here

Page 26

Copy top COG hit and COG name into notebook
Modify BOX to include length, bit score, and E-value

COG hit

COG name

Length, bit score, and E-value

COG description

Page 27

Change headings and enter COG information as shown for top hit

If you obtain more than one significant hit, record this info for at least the top 2 hits

Hint: Look at Score & E-value
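Picking the top hits by score and E-value can be sketched as a filter followed by a sort. The helper name `top_significant_hits`, the dictionary keys, and the placeholder COG labels are hypothetical; the 10^-3 cutoff is the significance threshold quoted earlier for BLAST, and applying it to CDD hits is an assumption.

```python
def top_significant_hits(hits, max_e_value=1e-3, n=2):
    """Return up to n significant hits, best (lowest E-value) first."""
    significant = [h for h in hits if h["e_value"] < max_e_value]
    # Lowest E-value first; break ties with the higher bit score.
    significant.sort(key=lambda h: (h["e_value"], -h["bit_score"]))
    return significant[:n]


# Placeholder hits with made-up values, for illustration only.
hits = [
    {"cog": "COG_A", "e_value": 1e-50, "bit_score": 180.0},
    {"cog": "COG_B", "e_value": 1e-10, "bit_score": 60.0},
    {"cog": "COG_C", "e_value": 0.5, "bit_score": 25.0},
    {"cog": "COG_D", "e_value": 1e-20, "bit_score": 90.0},
]
top = top_significant_hits(hits)
print([h["cog"] for h in top])
```

An empty result corresponds to writing “no significant hits” in the notebook.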

Page 28

Retrieve from Gene Detail page

Page 29

How do I return to the Gene Detail page for my proposed gene?

“Click” on URL saved for your gene during first module (week 2)

Page 30

Then what?

Keep the Gene Detail page open in separate tab while working on imgACT Lab Notebook modules

Scroll down

Page 31

“Click” here on Gene Detail page

Page 32

Change to 40

Page 33

Note the red arrow corresponds to your gene
Plus strand genes on top (left to right)
Minus strand genes on bottom (right to left)

Is your gene a stand alone ORF or is it clustered with other genes on same DNA strand and in same orientation?

Could be evidence that your gene is part of an operon
What are the functions of adjacent genes? Do they have related function?

How conserved is the gene neighborhood?
Are there similar patterns in other organisms that contain a gene from same orthologous group?

If considerably different, may be evidence for HGT
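The "stand alone or clustered" question can be sketched as a walk outward from the gene of interest, collecting neighbors on the same strand. Everything here is illustrative: the `Gene` record, the function names, the toy coordinates, and especially the 100 bp maximum intergenic gap are assumptions not taken from the slides. A same-strand cluster is only a hint that a gene may be part of an operon.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    name: str
    start: int
    end: int
    strand: str  # "+" or "-"


def _linked(left, right, max_gap):
    """True if two adjacent genes share a strand and sit close together."""
    return left.strand == right.strand and right.start - left.end <= max_gap


def same_strand_cluster(genes, target, max_gap=100):
    """Return names of the contiguous same-strand run containing target."""
    ordered = sorted(genes, key=lambda g: g.start)
    i = next(k for k, g in enumerate(ordered) if g.name == target)
    lo = hi = i
    while lo > 0 and _linked(ordered[lo - 1], ordered[lo], max_gap):
        lo -= 1
    while hi < len(ordered) - 1 and _linked(ordered[hi], ordered[hi + 1], max_gap):
        hi += 1
    return [g.name for g in ordered[lo:hi + 1]]


# Toy neighborhood with made-up coordinates.
neighborhood = [
    Gene("geneA", 100, 900, "+"),
    Gene("geneB", 950, 1800, "+"),
    Gene("my_gene", 1820, 2600, "+"),
    Gene("geneC", 2700, 3500, "-"),
    Gene("geneD", 5000, 5800, "+"),
]
cluster = same_strand_cluster(neighborhood, "my_gene")
print(cluster)
```

A result containing only the target gene corresponds to a stand alone ORF; a longer list names the neighbors whose functions are worth comparing.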

Page 34

Need to save individual panels as JPEG or PNG files. Include P. limnophilus as well as 4-5 different organisms in imgACT notebook.

Page 35

“Click” here to insert images into notebook

Delete ‘gene neighborhood images’ and place cursor in the box

Page 36

1- Click “Browse” to find image file.

2- Press “Attach” button. Thumbnail image should appear in window.

3- Repeat for each individual neighborhood panel until all are loaded in the window prompt.

Page 37

4- Next, select one image at a time and press [OK] to insert them into imgACT notebook at cursor position.

NOTE: The images should be inserted in the same order that the organisms were listed in img/edu

Insert next image

Page 38

Results: Ortholog Neighborhood

Scroll down

Page 39

Enter comments about homology & context:

Is your gene a stand alone ORF or is it clustered with other genes on same DNA strand and in same orientation?

Could be evidence that your gene is part of an operon
What are the functions of adjacent genes? Do they have related function?

How conserved is the gene neighborhood?
Are there similar patterns in other organisms that contain a gene from same orthologous group?
If considerably different, may be evidence for HGT

Page 40

Retrieve from Organism Details page

Retrieve from Gene Detail page

Page 41

On Gene Detail page, you will find the GC content for your gene.

Page 42

To find GC content for the entire P. limnophilus genome, select “Find Genomes” tab from the Gene Detail page.

Page 43

Search for Planctomyces limnophilus and click on the corresponding hyperlink.

Page 44

Scroll down

WHAT YOU SHOULD SEE. . .

Page 45

GC content will be listed under Genome Statistics.

Page 46

NOTE: A gene with a GC content that is more than a few percentage points above or below the average GC content in the genome may have originated from another organism by HGT. Add a comment box & make note of this if your gene meets this criterion.
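The GC comparison in this note can be sketched as follows. The function names are hypothetical, the sequence and genome value are made up, and the 3-point default threshold is an assumption standing in for "a few percentage points"; in the lab itself, both GC values are read from the Gene Detail and Genome Statistics pages rather than computed.

```python
def gc_percent(seq):
    """Percent G+C among the A, C, G, T bases of a nucleotide sequence."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence has no A, C, G, or T")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def gc_flag(gene_gc, genome_gc, threshold=3.0):
    """True if gene GC differs from genome GC by more than threshold points.

    threshold is a hypothetical stand-in for "a few percentage points".
    """
    return abs(gene_gc - genome_gc) > threshold


# Made-up values, for illustration only.
gene_gc = gc_percent("GCGCGCAT")      # 6 of 8 bases are G or C
print(gene_gc, gc_flag(gene_gc, genome_gc=50.0))
```

A flagged gene is only a candidate for HGT; the ortholog neighborhood comparison provides a second, independent line of evidence.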