© Wiley Publishing. 2007. All Rights Reserved. Comparing Two Sequences.

Page 1:

Comparing Two Sequences

Page 2:

Learning Objectives

Get the basics about dot plots

Know how to interpret the most common patterns in a dot plot

Use Dotlet

Use Lalign to extract local alignments

Page 3:

Outline

Some reasons for comparing two sequences

Basic principles of dot-plot comparisons

Using Dotlet

Making local alignments with Lalign

Page 4:

Why Compare Two Sequences?

Database searches are useful for finding homologues

Database searches don’t provide precise comparisons

More precise tools are needed to analyze the sequences in detail, including
• Dot plots for graphic analysis
• Local or global alignments for residue/residue analysis

The alignment of two sequences is called a pairwise alignment

Page 5:

Using The Right Tool

Page 6:

Some Applications of Pairwise Alignments

Convince yourself two sequences are homologous

Identify a shared domain

Identify a duplicated region

Locate important features such as
• Catalytic domains
• Disulphide bridges

Compare a gene and its product

Page 7:

What Is a Dot Plot?

A dot plot is a graphic representation of pairwise similarity

The simplicity of dot plots prevents artifacts

Ideal for looking for features that may come in different orders

Reveal complex patterns

Benefit from the most sophisticated statistical-analysis tool in the universe . . . your brain

Page 8:

Choosing Your Two Sequences

Making pairwise comparisons takes time

Use BLAST to rapidly select your sequences
• More than 70% identity for DNA
• More than 25% identity for proteins

If your sequences are too similar, comparing them yields no useful information
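
The identity cutoffs above can be checked on any pair of already-aligned sequences. A minimal sketch, assuming identity is counted only over columns where neither sequence has a gap; tools differ in the denominator they use, and the function name `percent_identity` is invented for this illustration:

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns where neither sequence has a gap.

    Assumption: gap columns are skipped entirely; other tools may divide
    by the full alignment length instead.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identical = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # skip columns that contain a gap
        compared += 1
        if x == y:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


# Toy alignment: 4 identical columns out of 5 gap-free columns
print(percent_identity("ACGT-A", "ACGAGA"))  # 80.0
```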

Page 9:

Self-comparisons

Start by comparing your sequence with itself

You can discover
• Repeated domains
• Motifs repeated many times (low complexity)
• Mirror regions (palindromes) in nucleic acids
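
As a small illustration of the last item, the sketch below lists the positions where a DNA window equals its own reverse complement. It assumes that "palindrome" is meant in the reverse-complement sense usual for nucleic acids; the function names and the toy sequence are invented for this example.

```python
# Translation table for complementing DNA bases
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def palindromic_sites(dna: str, size: int) -> list:
    """Start positions of windows that equal their own reverse complement."""
    dna = dna.upper()
    return [
        i
        for i in range(len(dna) - size + 1)
        if dna[i:i + size] == reverse_complement(dna[i:i + size])
    ]


# GAATTC is its own reverse complement
print(palindromic_sites("AAGAATTCGG", 6))  # [2]
```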

Page 10:

What Can You Analyze with a Dot Plot?

Any pair of sequences
• DNA
• Proteins
• RNA

DNA with proteins
• Dotlet is an appropriate tool
• To compare full genomes, install the program locally

Sequences longer than 1000 symbols are hard to analyze online

Page 11:

Some Typical Dot-plot Comparisons

Divergent sequences where only a segment is homologous

Long insertions and deletions

Tandem repeats
• The square shape of the pattern is characteristic of these repeats

Page 12:

Using Dotlet

Dotlet is one of the handiest tools for making dot plots

Dotlet is a Java applet

Open and download the applet at the following site:

• www.isrec.isb-sib.ch/java/dotlet

Use Firefox or IE (if one doesn’t work, use the other)

Page 13:

Set Dotlet Parameters

Dotlet slides a window along each sequence

If the windows are more similar than the threshold, Dotlet prints a dot at their intersection

You can control the similarity threshold with the little window on the left

[Figure: Dotlet interface; labels: Window size, Threshold]
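
The sliding-window idea can be sketched in a few lines. This is an illustration, not Dotlet's code: it assumes the window score is a plain count of identical positions (no substitution matrix), places the dot at the window start positions, and treats a score equal to the threshold as a hit. The names `window_score`, `dot_plot`, and `render` are invented here.

```python
def window_score(seq_a: str, seq_b: str, i: int, j: int, window: int) -> int:
    """Count identical positions between the windows starting at i and j."""
    return sum(seq_a[i + k] == seq_b[j + k] for k in range(window))


def dot_plot(seq_a: str, seq_b: str, window: int = 3, threshold: int = 3) -> set:
    """Return the (i, j) window starts whose score reaches the threshold."""
    dots = set()
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            # Assumption: a score equal to the threshold also draws a dot
            if window_score(seq_a, seq_b, i, j, window) >= threshold:
                dots.add((i, j))
    return dots


def render(dots: set, n_rows: int, n_cols: int) -> list:
    """Draw the dots as text rows: '*' for a dot, '.' for no dot."""
    return [
        "".join("*" if (i, j) in dots else "." for j in range(n_cols))
        for i in range(n_rows)
    ]


# Self-comparison of a toy sequence: the main diagonal appears
seq = "GATTACA"
for row in render(dot_plot(seq, seq), 5, 5):
    print(row)
```

Lowering `threshold` below `window` lets imperfect matches through, which is the noise-versus-signal trade-off the threshold control exposes.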

Page 14:

The Dotlet Threshold

Every dot has a score given by the window comparison

When the score is
• Below threshold 1: black dot
• Between thresholds 1 and 2: grey dot
• Above threshold 2: white dot

The blue curve is the distribution of scores in the sequences

The peak is the most common score
• The most common scores are less informative

[Figure: score distribution; label: Log curve]
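
The two-threshold rule reads naturally as a small function. A sketch only: how Dotlet treats a score exactly equal to a threshold is not stated on the slide, so the boundary handling here is an assumption, and `dot_shade` and `peak_score` are invented names.

```python
from collections import Counter


def dot_shade(score: float, threshold_1: float, threshold_2: float) -> str:
    """Map a window score to the shade described on the slide."""
    if score < threshold_1:
        return "black"
    if score > threshold_2:
        return "white"
    # Assumption: scores equal to either threshold fall in the grey band
    return "grey"


def peak_score(scores: list):
    """Most common score, i.e. the peak of the score distribution."""
    return Counter(scores).most_common(1)[0][0]


scores = [0, 1, 1, 2, 1, 3]
print(peak_score(scores))                       # 1
print([dot_shade(s, 2, 2.5) for s in scores])
```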

Page 15:

Getting Your Dot Plot Right

Window size and stringency control the appearance of your dot plot
• Very stringent = clean dot plot, little signal
• Not stringent enough = noisy dot plot, too much signal

Play with the threshold until a usable signal appears

Page 16:

Which Size for the Window?

Long window
• Clean dot plots
• Little sensitivity

Short window
• Noisy dot plots
• Very sensitive

The size of the window should be in the range of the elements you are looking for
• Conserved domains: 50 amino acids
• Transmembrane segments: 20 amino acids

Shorten the window to compare distantly related sequences

Page 17:

Looking at Repeated Domains with Dotlet

The square shape is typical of tandem repeats

The repeats are not perfect because the sequences have diverged after their duplication

Page 18:

Comparing a Gene and Its Product

Eukaryotic genes are transcribed into RNA

The RNA is then spliced to remove the intron sequences

It may be necessary to compare the gene and its product

Dotlet makes this comparative analysis easy
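
In a dot plot of a gene against its spliced product, each exon shows up as a diagonal segment, and each intron shifts the next segment by the intron's length. The sketch below makes that visible on made-up toy sequences (the exon and intron strings are invented, and the product is written in the DNA alphabet for simplicity) by counting exact window matches per diagonal.

```python
from collections import Counter


def diagonal_offsets(genomic: str, mrna: str, window: int = 5) -> Counter:
    """Count exact window matches on each diagonal.

    The diagonal is identified by (genomic start - product start).
    """
    offsets = Counter()
    for i in range(len(genomic) - window + 1):
        for j in range(len(mrna) - window + 1):
            if genomic[i:i + window] == mrna[j:j + window]:
                offsets[i - j] += 1
    return offsets


# Hypothetical toy gene: two exons separated by one 12-base intron
exon_1, intron, exon_2 = "ATGGCC", "GTAAGTTTTCAG", "GATTACA"
gene = exon_1 + intron + exon_2
product = exon_1 + exon_2

# Two diagonals: offset 0 (first exon) and offset 12 (after the intron)
print(sorted(diagonal_offsets(gene, product)))  # [0, 12]
```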

Page 19:

Aligning Sequences

Dotlet dot plots are a good way to provide an overview

Dot plots don’t provide residue/residue analysis

For this analysis you need an alignment

The most convenient tool for making precise local alignments is Lalign

Page 20:

Lalign and BLAST

Lalign is like a very precise BLAST

It works on only two sequences at a time

You must provide both sequences
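
To make "local alignment of two sequences" concrete, here is a minimal Smith-Waterman sketch. It is a stand-in for illustration, not Lalign's implementation: the slides do not describe Lalign's algorithm or scoring, and this version uses simple match/mismatch scores with a linear gap penalty (all three values are assumptions) and returns only the single best local alignment.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2):
    """Return (score, aligned_a, aligned_b) for the best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]  # scoring matrix, first row/column zero
    best, best_pos = 0, (0, 0)

    # Fill the matrix; scores never drop below zero (local alignment)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until a zero score is reached
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        step = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + step:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


# The shared segment ACGT is extracted from two otherwise unrelated toy strings
print(smith_waterman("TTTACGTAAA", "GGGACGTCCC"))  # (8, 'ACGT', 'ACGT')
```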

Page 21:

Lalign Output

Lalign produces an output similar to the alignment section of BLAST

The E-value indicates the significance of each alignment

Low E-value = good alignment

Page 22:

Going Farther

If you need to align coding DNA with a protein, try these sites:
• www.tcoffee.org => protogene
• coot.embl.de/pal2nal

If you need to align very large sequences, try this site:
• www.ncbi.nlm.nih.gov/blast/bl2seq/wblast2.cgi

If you need a precise estimate of your alignment’s statistical significance, use PRSS
• The program is available at fasta.bioch.virginia.edu