Orthology & Paralogy
Alignment & Assembly
Alastair Kerr Ph.D., WTCCB Bioinformatics Core
[many slides borrowed from various sources]



Overview

Orthology & Paralogy
• Definitions and examples
• Ways to determine an ortholog
• Pre-calculations: resources

Alignment & Assembly
• Differences
• Key programs for each
• Jalview example


Homologs

Have common origins but may or may not have common activity.

Homologous or not? This is often decided by an arbitrary threshold on the level of similarity found by alignment.
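As a rough illustration of such a threshold, the sketch below computes percent identity over the ungapped columns of two already-aligned sequences and compares it with a cut-off. The sequences, the 30% cut-off and the function names are hypothetical, and identity over ungapped columns is only one of several possible similarity measures.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a != "-" and b != "-"
    ]
    if not pairs:
        return 0.0
    identical = sum(1 for a, b in pairs if a == b)
    return 100.0 * identical / len(pairs)


# Hypothetical cut-off: any such value is an arbitrary choice.
IDENTITY_THRESHOLD = 30.0


def passes_threshold(aligned_a: str, aligned_b: str,
                     threshold: float = IDENTITY_THRESHOLD) -> bool:
    """True if the pair reaches the (arbitrary) identity threshold."""
    return percent_identity(aligned_a, aligned_b) >= threshold


a, b = "ACGT-ACGTA", "ACGTTACGAA"
print(round(percent_identity(a, b), 1))  # 88.9
print(passes_threshold(a, b))            # True
```

Moving the threshold up or down changes which pairs get called homologous, which is why the call depends on an arbitrary choice.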


Homologs

…have common ancestry, but the way they are related can vary (i.e. the reasons they have diverged into different sequences can vary).

orthologs - Homologs produced by speciation. They tend to have similar function.

paralogs - Homologs produced by gene duplication. They tend to have differing functions.


Orthologous or paralogous homologs

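[Figure: an early globin gene undergoes gene duplication into the α-chain and ß-chain genes; speciation then yields mouse, cattle and human copies of each. The α genes of different species are orthologs (α), the ß genes of different species are orthologs (ß), and the α and ß genes within one species (e.g. cattle) are paralogs; all are homologs.]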

Orthologs – diverged after speciation – tend to have similar function

Paralogs – diverged after gene duplication – some functional divergence occurs

Therefore, for linking similar genes between species, or performing “annotation transfer”, identify orthologs


True or False?

A1x is the ortholog in species x of A1y?

A1x is a paralog of A2x?

A1x is a paralog of A2y?


Identifying Gene/Protein Relationships from Phylogenies

Orthologs
– Homologs produced by speciation
– Gene phylogeny matches organismal phylogeny

Paralogs
– Homologs produced by gene duplication
– Multiple copies of homologs in a given species
  • or evidence through phylogenetic analysis that gene duplication was involved
– Lack of match to organismal phylogeny
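Given a gene tree whose internal nodes are already labelled as speciation or duplication events, the rule can be sketched as: two genes are orthologs if their last common ancestor is a speciation node, and paralogs if it is a duplication node. Labelling the nodes (e.g. by reconciling the gene tree with the organismal phylogeny) is assumed to have been done already; the globin-style tree and the class and function names are hypothetical.

```python
class Node:
    """A gene-tree node; internal nodes carry an event label."""

    def __init__(self, name, event=None, children=()):
        self.name = name
        self.event = event  # "speciation" or "duplication" for internal nodes
        self.parent = None
        self.children = list(children)
        for child in self.children:
            child.parent = self


def path_to_root(node):
    """The node itself followed by all its ancestors."""
    path = []
    while node is not None:
        path.append(node)
        node = node.parent
    return path


def last_common_ancestor(a, b):
    ancestors_of_a = {id(n) for n in path_to_root(a)}
    for node in path_to_root(b):
        if id(node) in ancestors_of_a:
            return node
    raise ValueError("genes are not in the same tree")


def relationship(a, b):
    """Classify a gene pair by the event at their last common ancestor."""
    lca = last_common_ancestor(a, b)
    if lca.event == "speciation":
        return "orthologs"
    if lca.event == "duplication":
        return "paralogs"
    raise ValueError(f"LCA node {lca.name!r} has no event label")


# Hypothetical tree: one duplication, then speciation in each copy.
human_alpha, mouse_alpha = Node("human_alpha"), Node("mouse_alpha")
human_beta, mouse_beta = Node("human_beta"), Node("mouse_beta")
root = Node("early_globin", "duplication", [
    Node("alpha_ancestor", "speciation", [human_alpha, mouse_alpha]),
    Node("beta_ancestor", "speciation", [human_beta, mouse_beta]),
])

print(relationship(human_alpha, mouse_alpha))  # orthologs
print(relationship(human_alpha, human_beta))   # paralogs
print(relationship(human_alpha, mouse_beta))   # paralogs
```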


Gene Orthology: How to detect? Most common approach: identify reciprocal best BLAST hits (EGO, COGs,…)
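The core reciprocal-best-hit test can be sketched as follows. In practice the scores would come from BLAST searches run in both directions; here they are hard-coded, hypothetical bit scores, the function names are illustrative, and this is not the full procedure used by EGO or COGs.

```python
def best_hits(scores):
    """Map each query to its highest-scoring subject (ties: first one seen)."""
    return {query: max(hits, key=hits.get) for query, hits in scores.items() if hits}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical bit scores (query -> {subject: score}).
human_vs_mouse = {
    "hA": {"mA": 250, "mB": 120},
    "hB": {"mA": 110, "mB": 240},
    "hC": {"mA": 90},
}
mouse_vs_human = {
    "mA": {"hA": 245, "hB": 115, "hC": 95},
    "mB": {"hB": 238, "hA": 118},
}

print(reciprocal_best_hits(human_vs_mouse, mouse_vs_human))
# [('hA', 'mA'), ('hB', 'mB')]
```

Here hC's best hit is mA, but mA's best hit is hA, so hC gets no ortholog call.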

Example Problem:

When comparing human and bovine genes, for example, the bovine gene dataset is still quite incomplete.

Therefore, the current best hit may be a paralog, with the true ortholog not yet sequenced.

[Figure: gene tree with cattle, human, cattle and mouse genes]
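The problem can be made concrete with a tiny sketch: the best hit is taken only among genes that have actually been sequenced, so removing the true ortholog promotes the paralog. The gene names, scores and the `best_hit` helper are hypothetical.

```python
def best_hit(query_scores, sequenced):
    """Highest-scoring subject among the genes that have actually been sequenced."""
    candidates = {s: score for s, score in query_scores.items() if s in sequenced}
    return max(candidates, key=candidates.get) if candidates else None


# Hypothetical bit scores of one human gene against two cattle genes.
human_gene_vs_cattle = {"cattle_ortholog": 480, "cattle_paralog": 310}

complete_dataset = {"cattle_ortholog", "cattle_paralog"}
incomplete_dataset = {"cattle_paralog"}  # true ortholog not yet sequenced

print(best_hit(human_gene_vs_cattle, complete_dataset))    # cattle_ortholog
print(best_hit(human_gene_vs_cattle, incomplete_dataset))  # cattle_paralog
```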


2 Forms in 1 Species

[Figure: species tree with gene presence marked (+)]

Slides from Jonathan Eisen


2 Forms in 1 Species - Gene Loss

Gene duplicated in common ancestor

[Figure: species tree with gene presence marked (+) and two gene losses]


Unusual Distribution Pattern

[Figure: species tree with gene presence marked (+)]


Unusual Distribution - Gene Loss

[Figure: species tree with gene presence marked (+)]

Gene present in ancestor

Gene lost here
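The gene-loss explanation can be sketched under two stated assumptions: the gene was present at the root of the tree (in the ancestor) and, once lost, is never regained (a Dollo-style assumption). Each maximal clade in which no species has the gene then counts as one loss. The tree, the presence/absence pattern and the function names are hypothetical.

```python
def leaves(tree):
    """Leaf names of a tree written as nested tuples of species-name strings."""
    if isinstance(tree, str):
        return [tree]
    return [leaf for child in tree for leaf in leaves(child)]


def loss_clades(tree, present):
    """Clades (as leaf lists) on whose stem the gene was lost.

    Assumes the gene was present at the root and never regained, so every
    maximal clade in which no species has the gene counts as one loss.
    """
    if not any(present[leaf] for leaf in leaves(tree)):
        return [leaves(tree)]
    if isinstance(tree, str):
        return []  # a species that still has the gene
    return [clade for child in tree for clade in loss_clades(child, present)]


# Hypothetical species tree and presence/absence pattern.
tree = ((("A", "B"), "C"), ("D", "E"))
present = {"A": True, "B": True, "C": False, "D": False, "E": True}

print(loss_clades(tree, present))  # [['C'], ['D']]
```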


Unusual Distribution - Evolutionary Rate Variation

[Figure: species tree with gene presence marked (+) and one uncertain lineage (?)]

Gene too diverged to be found


Ortholog guess via synteny

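[Figure: gene order A A C C B in one genome and A A C C ? in another; the conserved neighbouring genes suggest that the unknown gene (?) is the ortholog of B]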
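The synteny argument can be sketched by scoring each gene in the known genome by how many of its neighbours match the neighbours of the unknown gene at the same offsets. Here letters stand for gene families already matched by homology; the gene orders follow the slide, while the window of 2 and the function names are hypothetical.

```python
def flank_matches(known, query, i, j, window):
    """Count offsets where known[i + off] and query[j + off] are the same gene family."""
    matches = 0
    for off in range(-window, window + 1):
        if off == 0:
            continue  # skip the genes being compared themselves
        a, b = i + off, j + off
        if 0 <= a < len(known) and 0 <= b < len(query) and known[a] == query[b]:
            matches += 1
    return matches


def guess_by_synteny(known, query, j, window=2):
    """Gene in `known` whose neighbourhood best matches that of query[j] (ties: later position)."""
    best_matches, best_i = max(
        (flank_matches(known, query, i, j, window), i) for i in range(len(known))
    )
    return known[best_i], best_matches


known_genome = ["A", "A", "C", "C", "B"]
query_genome = ["A", "A", "C", "C", "?"]  # '?' = gene of unknown orthology

print(guess_by_synteny(known_genome, query_genome, 4))  # ('B', 2)
```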


Syntenic blocks


Alignments and Assemblies

Alignment
• ALL sequences from the SAME region
• Therefore can be useless for non-overlapping contigs and PCR probes/oligos
• Good for paralogs/orthologs
• Basis for phylogeny
• Suited to more dissimilar sequences

Assembly
• Good for near-identical sequences
• Read length
  – Short read [Next Gen Sequencing]
  – Long read [Sanger and 3rd Gen sequencing?]
• Reference?
  – De novo
  – Guided [reference sequence]
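For near-identical reads, the basic idea of assembly can be shown with a toy greedy overlap merge: repeatedly join the two reads with the longest suffix-prefix overlap. This is an illustrative sketch, not how STADEN, Velvet or the guided tools work; it assumes error-free reads from one strand, ignores reads contained in other reads, and the minimum overlap of 3 and the function names are hypothetical.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a that is a prefix of b (0 if < min_len)."""
    start = 0
    while True:
        start = a.find(b[:min_len], start)
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the longest overlap."""
    contigs = list(reads)
    while len(contigs) > 1:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # nothing overlaps: remaining reads stay as separate contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


reads = ["ATGGCGT", "GCGTGCA", "TGCAATG"]
print(greedy_assemble(reads))  # ['ATGGCGTGCAATG']
```

Reads that share no overlap are returned unchanged as separate contigs.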


ensEMBL calculations: http://www.ensembl.org

demo


OMA Browser: http://omabrowser.org

demo


Alignment

Implicit statement: each residue in an aligned column is derived from the same residue in the last common ancestor [LCA].

Therefore it is OK to look only at conserved regions, or to mask non-conserved regions, especially for phylogeny.
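The masking idea can be sketched as a simple column filter: keep only columns where most sequences share the same residue and no sequence has a gap. The thresholds (80% identity, no gaps), the toy alignment and the function names are hypothetical choices for illustration; dedicated alignment-trimming tools use more refined criteria.

```python
from collections import Counter


def conserved_columns(alignment, min_identity=0.8, max_gap_fraction=0.0):
    """Indices of columns whose most common residue reaches `min_identity`
    (as a fraction of all sequences) and whose gap fraction is at most
    `max_gap_fraction`."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    n = len(alignment)
    keep = []
    for col in range(length):
        column = [seq[col].upper() for seq in alignment]
        if column.count("-") / n > max_gap_fraction:
            continue  # too gappy
        residues = [c for c in column if c != "-"]
        if not residues:
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        if top_count / n >= min_identity:
            keep.append(col)
    return keep


def mask_alignment(alignment, keep):
    """Keep only the selected columns of every sequence."""
    return ["".join(seq[i] for i in keep) for seq in alignment]


alignment = ["ACGTA-GT", "ACGTT-GT", "ACCTAAGT"]
keep = conserved_columns(alignment)
print(keep)                             # [0, 1, 3, 6, 7]
print(mask_alignment(alignment, keep))  # ['ACTGT', 'ACTGT', 'ACTGT']
```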


Alignment Tools

Faster but less accurate (some better with gaps)
• Muscle
• ClustalW/X
• MAFFT

Slow but more accurate
• *-Coffee
  – T: original
  – 3D: uses PDB structures as a guide (structural)
  – M: uses multiple methods
• Probcons


Alignment Edit Tools

NEVER use a word processor or Excel to edit alignments…

JalView (Java Alignment Viewer)
• Good for editing
• DAS capable


[Figure: Jalview features diagram. Labels: sequences, multiple sequence alignment, alignments, ‘standard’ formats (FASTA, MSF, CLUSTAL, PILEUP, BLC, PFAM), structures (PDB), trees (Newick), features (GFF, Jalview features), Distributed Annotation System, secondary structure prediction, analysis (consensus, conservation & clustering), visualization, annotation, Jalview annotation, figure generation (clickable HTML, images, line art)]


Jalview DAS Client Functionality

[Figure: DAS annotation servers]

• Query matches ID to Authority
• Map to local reference frame
• Mouse over for feature name, links and scores
• Group features by source
• Type == colour
• Highlight start-end
• Select specific sources
• Filtered list
• Add user defined sources


Assemblers

Many free options: examples below

Long Reads
• STADEN - staden.sf.net

Next Gen Sequencing
• Guided: Bowtie, Novoalign, MAQ
• De novo: Velvet

3rd Generation Sequencing
• ????
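Velvet is built around a de Bruijn graph. The toy sketch below shows only the core idea: split reads into k-mers, link overlapping (k-1)-mers, and walk the graph to spell a contig. It assumes error-free reads from one strand, collapses repeated k-mers, and assumes the graph has a single Eulerian path; it is not Velvet's implementation, and the reads, k = 4 and function names are hypothetical.

```python
from collections import defaultdict


def kmers(read, k):
    return [read[i:i + k] for i in range(len(read) - k + 1)]


def de_bruijn_graph(reads, k):
    """Nodes are (k-1)-mers; each distinct k-mer adds an edge prefix -> suffix.
    Repeated k-mers are collapsed, so coverage information is discarded."""
    edges = defaultdict(list)
    for kmer in sorted({km for read in reads for km in kmers(read, k)}):
        edges[kmer[:-1]].append(kmer[1:])
    return edges


def eulerian_path(edges):
    """Hierholzer's algorithm; assumes an Eulerian path exists."""
    if not edges:
        return []
    in_degree = defaultdict(int)
    for targets in edges.values():
        for target in targets:
            in_degree[target] += 1
    nodes = set(edges) | set(in_degree)
    # Start where out-degree exceeds in-degree by one, else anywhere.
    start = next(
        (n for n in nodes if len(edges.get(n, [])) - in_degree[n] == 1),
        next(iter(edges)),
    )
    remaining = {node: list(targets) for node, targets in edges.items()}
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if remaining.get(node):
            stack.append(remaining[node].pop())
        else:
            path.append(stack.pop())
    return path[::-1]


def spell(path):
    """Join overlapping (k-1)-mers back into one sequence."""
    return path[0] + "".join(node[-1] for node in path[1:]) if path else ""


reads = ["TTACGG", "ACGGCA", "GGCAT"]
graph = de_bruijn_graph(reads, k=4)
print(spell(eulerian_path(graph)))  # TTACGGCAT
```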


Post Assembly

Correction for:
• Reads mapping to multiple places
• PCR amplification prior to mapping

Tools and workflows available in our Galaxy platform (demo)
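A minimal sketch of the two corrections, assuming single-end reads whose mapping positions are already available as simple records (the `Hit` record and function names are hypothetical, not a SAM/BAM API). Reads sharing chromosome, start and strand are treated as PCR duplicates, which is a simplification; dedicated tools apply more careful rules, and whether to discard multi-mapping reads entirely depends on the analysis.

```python
from collections import Counter, namedtuple

# Hypothetical single-end alignment record: where one read mapped.
Hit = namedtuple("Hit", ["read_id", "chrom", "pos", "strand"])


def drop_multimappers(hits):
    """Remove every read that maps to more than one place."""
    counts = Counter(hit.read_id for hit in hits)
    return [hit for hit in hits if counts[hit.read_id] == 1]


def drop_pcr_duplicates(hits):
    """Keep one read per (chrom, pos, strand), treating the rest as PCR copies."""
    seen = set()
    kept = []
    for hit in hits:
        key = (hit.chrom, hit.pos, hit.strand)
        if key not in seen:
            seen.add(key)
            kept.append(hit)
    return kept


hits = [
    Hit("r1", "chr1", 100, "+"),
    Hit("r2", "chr1", 100, "+"),  # same start and strand as r1
    Hit("r3", "chr1", 250, "-"),
    Hit("r4", "chr2", 50, "+"),   # r4 maps to two places
    Hit("r4", "chr3", 900, "+"),
]

clean = drop_pcr_duplicates(drop_multimappers(hits))
print([hit.read_id for hit in clean])  # ['r1', 'r3']
```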