Orthology & Paralogy
Alignment & Assembly
Alastair Kerr Ph.D. [many slides borrowed from various sources]
Overview
Orthology & Paralogy
• Definitions and examples
• Ways to determine an ortholog
• Pre-calculations: resources
Alignment & Assembly
• Differences
• Key programs for each
• Jalview example
Homologs
Have common origins but may or may not have common activity.
Homologous or not? Often decided by an arbitrary threshold on the level of similarity measured from an alignment.
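As a rough illustration of such a threshold, the sketch below computes percent identity between two already-aligned sequences and compares it with a cut-off. The sequences and the 30% cut-off are assumptions made up for this example, not values from the slides.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # skip gapped columns
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Hypothetical cut-off, chosen only for illustration.
THRESHOLD = 30.0

# Hypothetical aligned sequences ("-" marks a gap).
seq1 = "MVLSPADKTNVKAAW"
seq2 = "MVHLTPEEKSAV-AL"

identity = percent_identity(seq1, seq2)
print(f"{identity:.1f}% identity, above threshold: {identity >= THRESHOLD}")
```

Moving the cut-off changes the verdict for borderline pairs, which is why the threshold is described as arbitrary.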
Homologs
…have common ancestry, but the way they are related can vary
(i.e. the reasons they have diverged into different sequences can vary)
orthologs - Homologs produced by speciation. They tend to have similar function.
paralogs - Homologs produced by gene duplication. They tend to have differing functions.
Orthologous or paralogous homologs
[Figure: globin gene tree. An early globin gene undergoes Gene Duplication into an α-chain gene and a β-chain gene. Leaves: mouse α, human α, cattle α, cattle β, human β, mouse β. Brackets: Orthologs (α), Orthologs (β), Paralogs (cattle), Homologs.]
Orthologs – diverged after speciation – tend to have similar function
Paralogs – diverged after gene duplication – some functional divergence occurs
Therefore, for linking similar genes between species, or performing “annotation transfer”, identify orthologs
True or False?
A1x is the ortholog in species x of A1y?
A1x is a paralog of A2x?
A1x is a paralog of A2y?
Identifying Gene/Protein Relationships from Phylogenetic trees
orthologs - Homologs produced by speciation. Gene phylogeny matches organismal phylogeny.
paralogs - Homologs produced by gene duplication. Indicated by multiple copies of a homolog in a given species, or by phylogenetic evidence of gene duplication, where the gene phylogeny does not match the organismal phylogeny.
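One way to make these definitions concrete is to label each internal node of a gene tree as a speciation or a duplication and classify a pair of genes by the event at their last common ancestor. The sketch below does this for a small hypothetical tree; the tree shape, the event labels and the names Node, path_to and relationship are assumptions for illustration only.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str
    event: Optional[str] = None  # "speciation" or "duplication" on internal nodes
    children: List["Node"] = field(default_factory=list)


def path_to(node: Node, target: str, path=None):
    """Return the list of nodes from `node` down to the leaf named `target`."""
    path = (path or []) + [node]
    if node.name == target:
        return path
    for child in node.children:
        found = path_to(child, target, path)
        if found:
            return found
    return None


def relationship(root: Node, gene_a: str, gene_b: str) -> str:
    """Classify two genes by the event at their last common ancestor."""
    if gene_a == gene_b:
        raise ValueError("need two different genes")
    pa, pb = path_to(root, gene_a), path_to(root, gene_b)
    if pa is None or pb is None:
        raise KeyError("gene not found in tree")
    lca = None
    for x, y in zip(pa, pb):
        if x is not y:
            break
        lca = x  # deepest node shared by both paths so far
    return "orthologs" if lca.event == "speciation" else "paralogs"


# Hypothetical gene tree: one duplication (A1/A2) followed by
# speciation into species x and y within each copy.
tree = Node("A", "duplication", [
    Node("A1", "speciation", [Node("A1x"), Node("A1y")]),
    Node("A2", "speciation", [Node("A2x"), Node("A2y")]),
])

for a, b in [("A1x", "A1y"), ("A1x", "A2x"), ("A1x", "A2y")]:
    print(a, b, relationship(tree, a, b))
```

Under this assumed tree, genes separated only by the speciation are reported as orthologs, and any pair whose last common ancestor is the duplication is reported as paralogs, whether or not the two genes are in the same species.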
Gene Orthology: How to detect? Most commonly: identify reciprocal best BLAST hits (EGO, COGs,…)
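The reciprocal best hit idea can be sketched in a few lines: gene a in one species and gene b in another are paired only if b is the best hit of a and a is the best hit of b. The gene identifiers and scores below are made up for illustration, and the input is assumed to be already parsed into (query, subject) score pairs rather than read from real BLAST output.

```python
def best_hits(scores):
    """Map each query to its top-scoring subject.

    `scores` is a dict {(query, subject): score}; higher is better.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs that are each other's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Hypothetical scores for two human genes against two cattle genes.
human_vs_cattle = {
    ("hA1", "cA1"): 410, ("hA1", "cA2"): 220,
    ("hA2", "cA1"): 215, ("hA2", "cA2"): 390,
}
cattle_vs_human = {
    ("cA1", "hA1"): 405, ("cA1", "hA2"): 210,
    ("cA2", "hA1"): 225, ("cA2", "hA2"): 395,
}

print(reciprocal_best_hits(human_vs_cattle, cattle_vs_human))
```

The result depends entirely on which genes are present in each dataset: if a gene is missing from one side, the best hit that remains is some other homolog.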
Example Problem:
When comparing human and bovine genes, for example, the bovine gene dataset is still quite incomplete.
Therefore, the current best hit may be a paralog, with the true ortholog not yet sequenced.
[Figure: gene tree with leaves labelled cattle, human, cattle, mouse]
Alignments and Assemblies
Alignment:
• ALL sequences from the SAME region
• Therefore can be useless for non-overlapping contigs and PCR probes/oligos
• Good for paralogs/orthologs
• Basis for phylogeny
Assembly:
• Good for near-identical sequences
• Types: De novo; Guided [reference sequence]
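To show why assembly suits near-identical sequences, here is a toy sketch of the core de novo step: joining two reads through an exact suffix-prefix overlap. Real assemblers tolerate mismatches and handle many reads at once; the reads, the min_len default and the function names here are assumptions for illustration only.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b`.

    Returns 0 if no exact overlap of at least `min_len` exists.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def merge(a: str, b: str, min_len: int = 3):
    """Join `a` and `b` through their overlap, or return None if there is none."""
    k = overlap(a, b, min_len)
    return a + b[k:] if k else None


# Hypothetical reads: the end of read1 matches the start of read2.
read1 = "ATGGCGTAC"
read2 = "GTACCTGA"

print(merge(read1, read2))
print(merge("AAAA", "CCCC"))  # no overlap, so nothing to join
```

A guided assembly would instead place each read against a reference sequence, which is not shown here.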
Alignment
Implicit statement: each residue in an aligned sequence is derived from the last common ancestor [LCA], so residues in the same column are assumed to be homologous.
Therefore it is OK to look only at conserved regions, or to mask non-conserved regions, especially for phylogeny.
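A minimal sketch of restricting an alignment to conserved columns is given below. A column is kept when its most common residue is present in at least a chosen fraction of the sequences; the 0.75 cut-off, the example alignment and the function names are assumptions for illustration, not a recommended setting.

```python
from collections import Counter


def conserved_columns(alignment, min_fraction=0.75):
    """Return indices of columns whose most common residue (gaps excluded)
    occurs in at least `min_fraction` of the sequences."""
    if not alignment or len({len(seq) for seq in alignment}) != 1:
        raise ValueError("alignment must be non-empty with equal-length rows")
    n = len(alignment)
    keep = []
    for i in range(len(alignment[0])):
        counts = Counter(seq[i] for seq in alignment if seq[i] != "-")
        if counts and counts.most_common(1)[0][1] / n >= min_fraction:
            keep.append(i)
    return keep


def keep_columns(alignment, columns):
    """Build a trimmed alignment containing only the given columns."""
    return ["".join(seq[i] for i in columns) for seq in alignment]


# Hypothetical four-sequence alignment ("-" marks a gap).
alignment = ["MKVLA", "MRILA", "MKTLA", "MQV-A"]

columns = conserved_columns(alignment)
print(columns)
print(keep_columns(alignment, columns))
```

Masking instead of trimming would replace the dropped columns with a placeholder character so that column positions stay unchanged.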
Alignment Tools
Faster but less accurate (some better with gaps):
• MUSCLE
• ClustalW/X
• MAFFT
Slow but more accurate:
• *-Coffee
  T: original
  3D: uses PDB as guide (structural)
  M: uses multiple methods
• Probcons
Alignment Edit Tools
NEVER use a word processor or Excel to edit alignments…
Jalview (Java Alignment Viewer)
• Good for editing
• DAS capable
Jalview Features
[Diagram; labels recovered from the slide:]
• Sequences, Alignments, Annotation, Features, Structures, Trees
• PDB, GFF, Newick
• 'Standard' Formats: FASTA, MSF, CLUSTAL, PILEUP, BLC, PFAM
• Distributed Annotation System
• Multiple Sequence Alignment, Secondary Structure Prediction
• Analysis: Consensus, Conservation & Clustering
• Visualization
• Figure Generation
• Clickable HTML, Images, Line Art
Jalview Annotation
Jalview DAS Client Functionality
DAS ANNOTATION SERVERS
• Query matches ID to Authority
• Map to local reference frame
• Mouse over for feature name, links and scores
• Group features by source
• Type==colour
• Highlight start-end
• Select specific sources
• Filtered list
• Add user defined sources