Sequence comparisons

Sequence comparisons April 9, 2002 Review homework Learning objectives-Review amino acids. Understand difference between identity, similarity and homology. Understand difference between global alignment and local alignment. Workshop-Perform sliding window to compare two sequences Homework #3 due on Thurs.


Transcript of Sequence comparisons

Page 1: Sequence comparisons

Sequence comparisons

April 9, 2002

Review homework

Learning objectives: Review amino acids. Understand the difference between identity, similarity and homology. Understand the difference between global alignment and local alignment.

Workshop: Perform a sliding window comparison of two sequences.

Homework #3 due on Thurs.

Page 2: Sequence comparisons

Amino acid characteristics

Page 3: Sequence comparisons
Page 4: Sequence comparisons

Review of amino acid characteristics

http://info.bio.cmu.edu/Courses/BiochemMols/AAViewer/AAVFrameset.htm

http://info.bio.cmu.edu/Courses/BiochemMols/BCMolecules.html

Page 5: Sequence comparisons

Purpose of finding differences and similarities of amino acids.

Infer structural information

Infer functional information

Infer evolutionary relationships

Page 6: Sequence comparisons

Evolutionary Basis of Sequence Alignment

1. Similarity: a quantity that relates how much two amino acid sequences are alike.

2. Identity: a quantity that describes how much two sequences are alike in the strictest terms.

3. Homology: a conclusion drawn from data suggesting that two genes share a common evolutionary history.

Page 7: Sequence comparisons

Evolutionary Basis of Sequence Alignment (Cont. 1)

1. Example: Shown on the next page is a pairwise alignment of two proteins. One is mouse trypsin and the other is crayfish trypsin. They are homologous proteins. The sequences share 41% identity.

2. Underlined residues are identical. Asterisks and diamond represent those residues that participate in catalysis. Five gaps are placed to optimize the alignment.
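A percent identity figure like the one quoted for the trypsin pair can be computed by counting identical columns in an alignment. The following is a minimal Python sketch under stated assumptions: gaps are written as "-", and identity is reported relative to the full alignment length (other conventions divide by the length of the shorter sequence). The function name `percent_identity` and the toy sequences are illustrative and are not the trypsin data.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of alignment columns holding the same residue in both rows."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    # A column counts as identical only if both rows hold the same residue
    # and that residue is not a gap character.
    identical = sum(
        1 for a, b in zip(aligned_a, aligned_b)
        if a == b and a != gap
    )
    return 100.0 * identical / len(aligned_a)


# Toy alignment (hypothetical sequences, one gap in the second row).
toy_a = "ACDEFGHIKL"
toy_b = "ACDQFG-IKL"
toy_identity = percent_identity(toy_a, toy_b)
print(f"{toy_identity:.1f}% identity")
```

In the toy pair, 8 of the 10 columns are identical, so the function reports 80%.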

Page 8: Sequence comparisons
Page 9: Sequence comparisons

Evolutionary Basis of Sequence Alignment (Cont. 2)

Why are there regions of identity?

1) Conserved function: residues participate in the reaction.

2) Structural (for example, conserved cysteine residues that form a disulfide linkage).

3) Historical: residues that are conserved solely due to a common ancestor gene.

Page 10: Sequence comparisons

Evolutionary Basis of Sequence Alignment (Cont. 3)

Note: it is possible for two proteins to share a high degree of similarity but have different functions. For example, human zeta-crystallin is a lens protein that has no known enzymatic activity. It shares a high percentage of identity with E. coli quinone oxidoreductase. These proteins likely had a common ancestor, but their functions diverged.

This is analogous to a railroad car that has been converted into a diner: the structure is retained, but the function has changed.

Page 11: Sequence comparisons
Page 12: Sequence comparisons

Modular nature of proteins

The previous alignment was global. However, many proteins do not display global patterns of similarity. Instead, they possess local regions of similarity.

Proteins can be thought of as assemblies of modular domains. It is thought that this may, in some cases, be due to a process known as exon shuffling.

Page 13: Sequence comparisons

Modular nature of proteins (cont. 1)

[Diagram: exon shuffling between Gene A and Gene B]

Gene A: Exon 1a, Exon 2a

Duplication of Exon 2a

Gene A: Exon 1a, Exon 2a, Exon 2a

Gene B: Exon 1b, Exon 2b, Exon 2b

Exchange with Gene B

Gene A: Exon 1a, Exon 2a, Exon 3 (Exon 2b from Gene B)

Gene B: Exon 1b, Exon 2b, Exon 3 (Exon 2a from Gene A)

Page 14: Sequence comparisons

Dot Plots

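Window = 1

      A T G C C T A G
    A * . . . . . * .
    T . * . . . * . .
    G . . * . . . . *
    C . . . * * . . .
    C . . . * * . . .
    T . * . . . * . .
    A * . . . . . * .
    G . . * . . . . *

Note that 25% of the table will be filled due to random chance: there is a 1 in 4 chance of a match at each position.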
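A window-1 dot plot places a dot wherever the two residues being compared are the same. The Python sketch below builds and prints such a plot for the slide's sequence, ATGCCTAG, compared against itself. The function names `dot_plot` and `render` are illustrative and do not come from any particular program.

```python
def dot_plot(seq_x: str, seq_y: str) -> list[list[bool]]:
    """Window-1 dot plot: cell [i][j] is True when seq_y[i] == seq_x[j]."""
    return [[x == y for x in seq_x] for y in seq_y]


def render(grid: list[list[bool]], seq_x: str, seq_y: str) -> str:
    """Draw the grid as text, with seq_x across the top and seq_y down the side."""
    lines = ["  " + " ".join(seq_x)]
    for residue, row in zip(seq_y, grid):
        cells = " ".join("*" if cell else "." for cell in row)
        lines.append(residue + " " + cells)
    return "\n".join(lines)


seq = "ATGCCTAG"
grid = dot_plot(seq, seq)
filled = sum(cell for row in grid for cell in row)  # number of dots
fraction = filled / (len(seq) * len(seq))           # share of the table filled
print(render(grid, seq, seq))
print(f"{filled} of {len(seq) ** 2} cells filled ({fraction:.0%})")
```

For this sequence, 16 of the 64 cells are filled. That equals the 25% expected by chance because each of the four bases occurs exactly twice.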

Page 15: Sequence comparisons

Dot Plots with window = 2

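Window = 2

Sequence: A T G C C T A G (compared against itself; each row and column is a window of two bases)

       AT TG GC CC CT TA AG
    AT *  .  .  .  .  .  .
    TG .  *  .  .  .  .  .
    GC .  .  *  .  .  .  .
    CC .  .  .  *  .  .  .
    CT .  .  .  .  *  .  .
    TA .  .  .  .  .  *  .
    AG .  .  .  .  .  .  *

The larger the window, the more noise can be filtered.

What is the percent chance that you will receive a match randomly? 1/16 * 100 = 6.25%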
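With a window of 2, a dot is drawn only when both bases in the window match. By chance that happens with probability 1/4 × 1/4 = 1/16. The Python sketch below assumes that all four bases are equally likely and independent, which is the assumption behind the 6.25% figure. The function names are illustrative.

```python
def windowed_dot_plot(seq_x: str, seq_y: str, window: int) -> list[list[bool]]:
    """Cell [i][j] is True when the windows starting at seq_y[i] and seq_x[j] are identical."""
    if window < 1:
        raise ValueError("window must be at least 1")
    rows = len(seq_y) - window + 1
    cols = len(seq_x) - window + 1
    return [
        [seq_x[j:j + window] == seq_y[i:i + window] for j in range(cols)]
        for i in range(rows)
    ]


def random_match_probability(window: int, alphabet_size: int = 4) -> float:
    """Chance that a whole window matches at random (equal, independent residue frequencies assumed)."""
    return (1.0 / alphabet_size) ** window


seq = "ATGCCTAG"
grid2 = windowed_dot_plot(seq, seq, window=2)
dots = sum(cell for row in grid2 for cell in row)
print(f"dots with window 2: {dots}")
print(f"random match chance: {random_match_probability(2):.2%}")
```

For ATGCCTAG against itself, only the seven diagonal cells remain, because each two-base window occurs once in the sequence.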

Page 16: Sequence comparisons

Similarity

It is easy to score whether an amino acid is identical to another (the score is 1 if identical and 0 if not). However, it is not easy to give a score for amino acids that are somewhat similar.

[Figure: chemical structures of leucine and isoleucine]

Should they get a 0 (non-identical), a 1 (identical), or something in between?

Page 17: Sequence comparisons

Identity Matrix

Simplest type of scoring matrix

        L  I  C  A
    L   1  0  0  0
    I      1  0  0
    C         1  0
    A            1
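An identity matrix can be used to score an ungapped pair of sequences by adding up the per-position scores. The Python sketch below uses the same four residues (L, I, C, A). The helper names `identity_matrix` and `score_ungapped` are illustrative.

```python
from itertools import product


def identity_matrix(alphabet: str) -> dict[tuple[str, str], int]:
    """Score 1 for identical residues and 0 for every other pair."""
    return {(a, b): int(a == b) for a, b in product(alphabet, repeat=2)}


def score_ungapped(seq_a: str, seq_b: str, matrix: dict[tuple[str, str], int]) -> int:
    """Sum the matrix scores position by position (no gaps allowed)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length")
    return sum(matrix[(a, b)] for a, b in zip(seq_a, seq_b))


matrix = identity_matrix("LICA")
score = score_ungapped("LICA", "LLCA", matrix)
print(score)
```

Here the I/L pair at the second position scores 0, exactly like any other mismatch. This is the limitation that similarity-based matrices are designed to address.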

Page 18: Sequence comparisons

The Point-Accepted-Mutation (PAM) model of evolution and the PAM scoring matrix

The model assumes that each amino acid (AA) mutates independently of the others, with a probability that depends only on the AA itself. Since there are 20 AAs, the transition probabilities are described by a 20 × 20 mutation matrix, denoted by M. A standard M defines a 1-PAM change.

Point Accepted Mutation (PAM) distance: a 1-PAM unit changes 1% of the amino acids on average:

\sum_{i=1}^{20} f_i (1 - M_{ii}) = 0.01

where f_i is the frequency of AA i and M_{ii} is the probability that AA i remains unchanged. One PAM is a unit of evolutionary divergence in which 1% of the amino acids have been changed.
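The 1-PAM condition says that the frequency-weighted probability of change, the sum over all residues of f_i (1 - M_ii), equals 0.01. The Python sketch below checks that condition on a toy three-letter alphabet. The frequencies and matrix entries are invented for illustration and are not real PAM data. The column convention (column j holds the replacement probabilities for residue j) is also an assumption.

```python
def expected_change(freqs: list[float], matrix: list[list[float]]) -> float:
    """Average fraction of residues changed by one application of the matrix."""
    return sum(f * (1.0 - matrix[i][i]) for i, f in enumerate(freqs))


def columns_sum_to_one(matrix: list[list[float]], tol: float = 1e-9) -> bool:
    """Each column should be a probability distribution over replacement residues."""
    size = len(matrix)
    return all(
        abs(sum(matrix[i][j] for i in range(size)) - 1.0) < tol
        for j in range(size)
    )


# Toy data (hypothetical): three residue types instead of twenty.
toy_freqs = [0.5, 0.3, 0.2]
toy_matrix = [
    [0.990, 0.006, 0.004],
    [0.006, 0.990, 0.006],
    [0.004, 0.004, 0.990],
]

change = expected_change(toy_freqs, toy_matrix)
print(f"expected change per step: {change:.4f}")
```

Every residue in the toy matrix stays unchanged with probability 0.99, so the weighted average change is 1%, which is what defines one PAM unit.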

Page 19: Sequence comparisons

The Point-Accepted-Mutation (PAM) model of evolution and the PAM scoring matrix (cont. 1)

A 2-PAM unit is equivalent to two successive 1-PAM units of evolution (or M^2).

A k-PAM unit is equivalent to k successive 1-PAM units of evolution (or M^k).

Example 1:

CNGTTDQVDKIVKILNEGQIASTDVVEVVVSPPYVFLPVVKSQLRPEIQV

|||||||||||||| |||||||||||||||||||||||||||||||||||

CNGTTDQVDKIVKIRNEGQIASTDVVEVVVSPPYVFLPVVKSQLRPEIQV

lengths = 50

1 mismatch (1/50 = 2% of positions)

PAM distance = 2
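The example can be reproduced by counting mismatches, and the k-PAM idea by raising a mutation matrix to the k-th power. The Python sketch below does both. The 2 × 2 matrix is a hypothetical toy, not a real 20 × 20 PAM matrix. Reading the observed percent difference directly as the PAM distance is an approximation that holds only for closely related sequences.

```python
def mat_mul(a: list[list[float]], b: list[list[float]]) -> list[list[float]]:
    """Multiply two square matrices of the same size."""
    size = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
        for i in range(size)
    ]


def mat_power(matrix: list[list[float]], k: int) -> list[list[float]]:
    """Apply the 1-PAM matrix k times (M^k), starting from the identity matrix."""
    size = len(matrix)
    result = [[1.0 if i == j else 0.0 for j in range(size)] for i in range(size)]
    for _ in range(k):
        result = mat_mul(result, matrix)
    return result


def percent_difference(seq_a: str, seq_b: str) -> float:
    """Percent of positions that differ between two ungapped, equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length")
    mismatches = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    return 100.0 * mismatches / len(seq_a)


# Toy two-residue 1-PAM matrix (hypothetical values).
toy_m = [[0.99, 0.01], [0.01, 0.99]]
toy_m2 = mat_power(toy_m, 2)

seq_a = "CNGTTDQVDKIVKILNEGQIASTDVVEVVVSPPYVFLPVVKSQLRPEIQV"
seq_b = "CNGTTDQVDKIVKIRNEGQIASTDVVEVVVSPPYVFLPVVKSQLRPEIQV"
observed = percent_difference(seq_a, seq_b)
print(f"observed difference: {observed:.1f}%")
print(f"unchanged after 2 toy PAM steps: {toy_m2[0][0]:.4f}")
```

In the toy matrix, two steps leave 0.9802 of the positions unchanged rather than 0.98, because a few positions mutate and then mutate back. This is why PAM distance and observed percent difference drift apart as sequences diverge.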

Page 20: Sequence comparisons
Page 21: Sequence comparisons

Two proteins that are similar in certain regions

Tissue plasminogen activator (PLAT) and coagulation factor 12 (F12).

Page 22: Sequence comparisons

The Dotter Program

• Program consists of three components:

• Sliding window

• A table that gives a score for each amino acid match

• A graph that converts the score to a dot of a certain density. The higher the density, the higher the score.
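The three components (sliding window, score table, density graph) can be sketched in a few lines of Python. This is a simplified illustration and not the Dotter program's actual code. The score function below uses identity scoring as a stand-in for a real substitution table, and the shade characters and function names are made up for the example.

```python
from typing import Callable


def identity_score(a: str, b: str) -> int:
    """Stand-in score table: 1 for identical residues, 0 otherwise."""
    return 1 if a == b else 0


def window_scores(seq_x: str, seq_y: str, window: int,
                  score: Callable[[str, str], int]) -> list[list[int]]:
    """Sliding window: sum the pairwise scores over each pair of windows."""
    rows = len(seq_y) - window + 1
    cols = len(seq_x) - window + 1
    return [
        [sum(score(seq_x[j + k], seq_y[i + k]) for k in range(window))
         for j in range(cols)]
        for i in range(rows)
    ]


def to_density(total: int, window: int, shades: str = " .:*#") -> str:
    """Map a window score in the range 0..window to a character; darker means higher."""
    index = total * (len(shades) - 1) // window
    return shades[index]


scores = window_scores("LICAL", "LICA", window=3, score=identity_score)
picture = ["".join(to_density(s, 3) for s in row) for row in scores]
for line in picture:
    print("|" + line + "|")
```

Swapping `identity_score` for a lookup into a similarity matrix would let partially similar windows show up as lighter dots instead of blanks.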

Page 23: Sequence comparisons
Page 24: Sequence comparisons

Region of similarity

A single region on F12 is similar to two regions on PLAT.