Aligning Sequences With T-Coffee Cédric Notredame Comparative Bioinformatics Group Bioinformatics and Genomics Program

Page 1

Aligning Sequences With T-Coffee

Cédric Notredame
Comparative Bioinformatics Group
Bioinformatics and Genomics Program

Page 2

T-Coffee and Consistency…

SeqA GARFIELD THE LAST FAT CAT
SeqB GARFIELD THE FAST CAT
SeqC GARFIELD THE VERY FAST CAT
SeqD THE FAT CAT

SeqA GARFIELD THE LAST FA-T CAT
SeqB GARFIELD THE FAST CA-T ---
SeqC GARFIELD THE VERY FAST CAT
SeqD -------- THE ---- FA-T CAT

Page 3

Consistency: Conflicts and Information

[Figure: diagrams of the residues W, X, Y and Z, combined with "+" and "OR", contrasting a "Consistent" case with a "Non Consistent" case.]

Page 4

T-Coffee and Consistency…

SeqA GARFIELD THE LAST FAT CAT    Prim. Weight = 88
SeqB GARFIELD THE FAST CAT ---

SeqA GARFIELD THE LAST FA-T CAT   Prim. Weight = 77
SeqC GARFIELD THE VERY FAST CAT

SeqA GARFIELD THE LAST FAT CAT    Prim. Weight = 100
SeqD -------- THE ---- FAT CAT

SeqB GARFIELD THE ---- FAST CAT   Prim. Weight = 100
SeqC GARFIELD THE VERY FAST CAT

SeqC GARFIELD THE VERY FAST CAT   Prim. Weight = 100
SeqD -------- THE ---- FA-T CAT
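Each pairwise alignment above carries a primary weight that reflects how similar the two sequences are. A minimal sketch, assuming the weight is the percent identity over columns where neither row has a gap. The slide does not state the exact convention (for example, how spaces are handled), so this is not guaranteed to reproduce the values 88, 77 and 100; `percent_identity` is a hypothetical helper name.

```python
def percent_identity(row_a, row_b, gap="-"):
    """Percent identity of two aligned rows, ignoring columns with a gap.

    Assumed convention: identical residues / compared (ungapped) columns.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(row_a, row_b):
        if x == gap or y == gap:
            continue  # gapped columns are not compared
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Five ungapped columns, four of them identical.
print(percent_identity("ACGT-A", "ACCTTA"))
```

In a library built this way, every residue pair of a given pairwise alignment would inherit that alignment's weight.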

Page 5

T-Coffee and Consistency…

SeqA GARFIELD THE LAST FAT CAT    Weight = 88
SeqB GARFIELD THE FAST CAT ---

SeqA GARFIELD THE LAST FA-T CAT   Weight = 77
SeqC GARFIELD THE VERY FAST CAT
SeqB GARFIELD THE ---- FAST CAT

SeqA GARFIELD THE LAST FA-T CAT   Weight = 100
SeqD -------- THE ---- FA-T CAT
SeqB GARFIELD THE ---- FAST CAT
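The alignments above pair SeqA with SeqB directly and also through a third sequence. A minimal sketch of this kind of library extension, assuming each residue pair receives its direct weight plus, for every intermediate residue, the smaller of the two weights along the path. The function name and the data layout (residues as `(sequence, position)` tuples) are illustrative assumptions, not the program's internals.

```python
from collections import defaultdict


def extend_library(primary):
    """Return extended weights for a primary library of residue pairs.

    primary maps (residue_1, residue_2) -> weight, where a residue is a
    (sequence_name, position) tuple (an assumed, illustrative layout).
    """
    # neighbours[r][q] = primary weight of the pair (r, q)
    neighbours = defaultdict(dict)
    for (r1, r2), weight in primary.items():
        neighbours[r1][r2] = weight
        neighbours[r2][r1] = weight

    extended = defaultdict(int)
    # Direct contribution of each pair.
    for (r1, r2), weight in primary.items():
        extended[frozenset((r1, r2))] += weight

    # Indirect contribution: ra and rb are both aligned with `middle`.
    for middle, partners in neighbours.items():
        items = list(partners.items())
        for i in range(len(items)):
            for j in range(i + 1, len(items)):
                (ra, wa), (rb, wb) = items[i], items[j]
                if ra[0] == rb[0]:
                    continue  # residues of the same sequence are never paired
                extended[frozenset((ra, rb))] += min(wa, wb)
    return dict(extended)


# Toy library: one residue from each of SeqA, SeqB and SeqC, using the
# pairwise weights 88 (A/B), 77 (A/C) and 100 (B/C) from the slides.
a, b, c = ("SeqA", 0), ("SeqB", 0), ("SeqC", 0)
primary = {(a, b): 88, (a, c): 77, (b, c): 100}
extended = extend_library(primary)
print(extended[frozenset((a, b))])  # 88 direct + min(77, 100) through SeqC
```

Taking the minimum along each path means an indirect pairing is never trusted more than its weakest link.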

Page 7

T-Coffee and Consistency…

Page 8

T-Coffee and Consistency…

Page 9

T-Coffee and Consistency…

Page 10

Methods

Data

Scalability

Page 11

Running T-Coffee over the Web

Page 12

Available Servers and Flavors

Page 13

Which MSA Method ???

Page 14

Combining Many MSAs into ONE

MUSCLE

MAFFT

ClustalW

???????

T-Coffee
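One way to picture combining several MSAs into one is to pool the residue pairs that each alignment proposes and record how many alignments support each pair. The sketch below assumes this simple counting scheme. The slide does not say how the alternative alignments are actually weighted, and `aligned_pairs` and `combine_msas` are hypothetical names.

```python
from collections import Counter
from itertools import combinations


def aligned_pairs(msa, gap="-"):
    """Yield residue pairs that share a column in one MSA.

    msa maps sequence name -> gapped row; residues are (name, index) tuples.
    """
    names = sorted(msa)
    position = {name: 0 for name in names}
    length = len(msa[names[0]])
    for col in range(length):
        present = []
        for name in names:
            if msa[name][col] != gap:
                present.append((name, position[name]))
                position[name] += 1
        # Names are visited in sorted order, so pairs are ordered consistently.
        for pair in combinations(present, 2):
            yield pair


def combine_msas(msas):
    """Count, for every residue pair, how many MSAs place it in one column."""
    support = Counter()
    for msa in msas:
        support.update(aligned_pairs(msa))
    return support


# Two toy alignments of the same sequences that disagree on one gap.
msa_1 = {"s1": "AC-G", "s2": "ACTG"}
msa_2 = {"s1": "ACG-", "s2": "ACTG"}
support = combine_msas([msa_1, msa_2])
print(support[(("s1", 0), ("s2", 0))])  # supported by both alignments
print(support[(("s1", 2), ("s2", 3))])  # supported by msa_1 only
```

Pairs supported by many input alignments would then dominate the final alignment, while pairs proposed by a single aligner carry little weight.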

Page 15

Consistency and Accuracy

Page 16

What To Do Without Structures

Page 17

Using the M-Coffee Server

Page 18

Using the M-Coffee Server

Page 19

Page 20

Integrating New Types of Data

Template-Based Sequence Alignments

Page 21

[Figure: flow diagram with the labels Experimental Data, Target, Template, Aligner, Template-Sequence Alignment, Template Alignment, Template-based Alignment of the Sequences, Templates, and Primary Library.]
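The idea of a template-based alignment can be sketched as a composition of three mappings: target 1 to its template, template 1 to template 2, and template 2 back to target 2. The code below is a minimal illustration under that assumption; `column_map` and `transfer` are hypothetical names and the toy rows are invented for the example.

```python
def column_map(row_a, row_b, gap="-"):
    """Map residue indices of row_a to residue indices of row_b.

    Only columns where both rows hold a residue are recorded.
    """
    mapping = {}
    i = j = 0
    for x, y in zip(row_a, row_b):
        if x != gap and y != gap:
            mapping[i] = j
        if x != gap:
            i += 1
        if y != gap:
            j += 1
    return mapping


def transfer(seq1_to_tmpl1, tmpl1_to_tmpl2, tmpl2_to_seq2):
    """Chain the three mappings into residue pairs between the two targets."""
    pairs = []
    for i, t1 in seq1_to_tmpl1.items():
        t2 = tmpl1_to_tmpl2.get(t1)
        if t2 is None:
            continue  # template residue unaligned in the template alignment
        j = tmpl2_to_seq2.get(t2)
        if j is not None:
            pairs.append((i, j))
    return pairs


seq1_to_tmpl1 = column_map("AC-D", "ACED")   # target 1 vs its template
tmpl1_to_tmpl2 = column_map("ACED", "AC-D")  # template vs template
tmpl2_to_seq2 = column_map("ACD", "-CD")     # template 2 vs target 2
print(transfer(seq1_to_tmpl1, tmpl1_to_tmpl2, tmpl2_to_seq2))
```

The resulting residue pairs between the two targets are the kind of entries that would be stored in the primary library.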

Page 22

Exploring The Template World

Template            Generator       Alignment Method
RNA Structure       Prediction      RNA Aligner
Protein Structure   BLAST vs PDB    3D Aligner
Profile             BLAST vs NR     Profile/Profile Alignment
Gene Structure      ENSEMBL         Genome Aligner
Promoter            Transfac        Meta-Aligner

Page 23

Exploring The Template World

Template            Generator     Alignment Method   Mode
RNA Structure       Prediction    RNA Aligner        R-Coffee
Protein Structure   BLAST/PDB     3D Aligner         3D-Coffee
Profile             BLAST/NR      Profile/Profile    PSI-Coffee
Gene Structure      ENSEMBL       Genome Aligner     Exoset
Promoter            Transfac      Meta-Aligner       Meta-Coffee

Page 24

3D-Coffee/Expresso

Incorporating Structural Information

Page 25

Expresso: Finding the Right Structure

[Figure: Expresso pipeline with the labels Sources, BLAST, Templates, SAP, Template Alignment, Source Template Alignment, Remove Templates, and Library.]

Page 26

PSI-Coffee

Homology Extension

Page 27

Exploring The Template World

Page 28

What is Homology Extension?

[Figure: a single L residue facing two L residues, with a question mark over which one it should be aligned to.]

- Simple scoring schemes result in alignment ambiguities

Page 29

What is Homology Extension?

[Figure: the same L residues, now shown with the profile columns LLLLLL, LLIVIL and LLLLLL, labelled Profile 1 and Profile 2.]
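Replacing single residues with profile columns gives the scoring scheme more to work with: two L residues always look identical, but their columns may not. A minimal sketch, assuming a toy identity-based score between two columns (the probability that a residue drawn from one column equals a residue drawn from the other). A real profile aligner would use a substitution matrix; `column_frequencies` and `profile_column_score` are hypothetical names.

```python
from collections import Counter


def column_frequencies(column, gap="-"):
    """Relative residue frequencies in one profile column."""
    residues = [r for r in column if r != gap]
    counts = Counter(residues)
    total = len(residues)
    return {r: n / total for r, n in counts.items()}


def profile_column_score(col_a, col_b):
    """Toy score: chance that residues drawn from each column are identical."""
    freq_a = column_frequencies(col_a)
    freq_b = column_frequencies(col_b)
    return sum(freq_a[r] * freq_b.get(r, 0.0) for r in freq_a)


query = "LLLLLL"
print(profile_column_score(query, "LLLLLL"))  # fully conserved candidate
print(profile_column_score(query, "LLIVIL"))  # variable candidate
```

With this toy score the conserved column is preferred over the variable one, which is how profile information can break the tie between two otherwise identical residues.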

Page 31

PSI-Coffee: Homology Extension

[Figure: PSI-Coffee pipeline with the labels Sources, BLAST, Templates, Profile Aligner, Template Alignment, Source Template Alignment, Remove Templates, and Library.]

Page 32

Benchmarks

Page 33

Do Benchmarks All Tell the Same Story?

Based on

Page 34

Method       Approach      Template   Score   Comment
ClustalW-2   Progressive   NO         22.74
PRANK        Gap           NO         26.18   Science2008
MAFFT        Iterative     NO         26.18
Muscle       Iterative     NO         31.37
ProbCons     Consistency   NO         40.80
ProbCons     MonoPhasic    NO         37.53
T-Coffee     Consistency   NO         42.30
M-Coffee4    Consistency   NO         43.60
PSI-Coffee   Consistency   Profile    53.71
PROMAL       Consistency   Profile    55.08
PROMAL-3D    Consistency   PDB        57.60
3D-Coffee    Consistency   PDB        61.00   Expresso

Score: percentage of correct columns when compared with a structure-based reference (BB11 of BAliBASE).
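The score reported here counts how many columns of the reference alignment are reproduced exactly. A minimal sketch of such a column score, assuming every reference column is evaluated. Benchmark suites may restrict scoring to selected regions, which this example does not attempt; `columns` and `column_score` are hypothetical names.

```python
def columns(msa, gap="-"):
    """Describe each column by the residue index it holds in every row.

    msa is a list of equal-length gapped rows; gaps are recorded as None.
    """
    counters = [0] * len(msa)
    result = []
    for col in range(len(msa[0])):
        entry = []
        for row_index, row in enumerate(msa):
            if row[col] == gap:
                entry.append(None)
            else:
                entry.append(counters[row_index])
                counters[row_index] += 1
        result.append(tuple(entry))
    return result


def column_score(test, reference):
    """Percentage of reference columns found unchanged in the test MSA."""
    reference_columns = columns(reference)
    test_columns = set(columns(test))
    correct = sum(1 for col in reference_columns if col in test_columns)
    return 100.0 * correct / len(reference_columns)


reference = ["AC-GT", "ACTGT"]
test = ["ACG-T", "ACTGT"]
print(column_score(test, reference))  # 3 of the 5 reference columns match
```

A single misplaced gap costs two columns in this toy case, which shows why column scores drop quickly on difficult data sets.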

Page 35

Table as on Page 34, with the highlight label: Consistency

Page 36

Table as on Page 34, with the highlight label: Homology Extension

Page 37

Table as on Page 34, with the highlight label: Structural Extension

Page 38

T-Coffee and The World

BLAST/SOAP

- Some templates are obtained with BLAST
- Queries can be sent to the EBI or the NCBI
- No need for a local BLAST installation

User sequences