Orthology & Paralogy; Alignment & Assembly
Alastair Kerr Ph.D., WTCCB Bioinformatics Core [many slides borrowed from various sources]
Overview
Orthology & Paralogy: definitions and examples; ways to determine an ortholog; pre-calculations (resources)
Alignment & Assembly: differences; key programs for each; Jalview example
Homologs
Share a common origin, but may or may not share a common activity.
Homologous or not? Often decided by an arbitrary threshold on the level of similarity measured by an alignment.
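As a minimal sketch of such a threshold, the Python below computes percent identity between two sequences that are already aligned and compares it with a cutoff. Counting identity only over gap-free columns is one of several conventions, the 30% default is an assumed, illustrative value rather than a recommendation, and the two toy sequences are made up.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over columns where neither aligned sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    identical = sum(1 for x, y in pairs if x == y)
    return 100.0 * identical / len(pairs)


def passes_threshold(aligned_a: str, aligned_b: str, cutoff: float = 30.0) -> bool:
    # The cutoff is arbitrary (assumed value): moving it changes which pairs are called homologous.
    return percent_identity(aligned_a, aligned_b) >= cutoff


if __name__ == "__main__":
    a = "MKT-AYIAKQR"  # toy sequences, already aligned
    b = "MKTLAYLAK-R"
    print(round(percent_identity(a, b), 1))  # 88.9 (8 identical out of 9 gap-free columns)
    print(passes_threshold(a, b))            # True
```

The score only says how similar two sequences are; it cannot say whether they diverged by speciation or by duplication.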
Homologs
…have common ancestry, but the way they are related can vary (i.e. the reasons they have diverged into different sequences can vary).
Orthologs: homologs produced by speciation. They tend to have similar functions.
Paralogs: homologs produced by gene duplication. They tend to have differing functions.
Orthologous or paralogous homologs
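[Figure: an early globin gene undergoes gene duplication into an α-chain gene and a β-chain gene, each found in mouse, human and cattle. All are homologs: the α-chain genes are orthologs (α), the β-chain genes are orthologs (β), and cattle α and cattle β are paralogs.]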
Orthologs – diverged after speciation – tend to have similar function
Paralogs – diverged after gene duplication – some functional divergence occurs
Therefore, for linking similar genes between species, or performing “annotation transfer”, identify orthologs
True or False?
A1x is the ortholog in species x of A1y?
A1x is a paralog of A2x?
A1x is a paralog of A2y?
Identifying Gene/Protein Relationships from Phylogenies
Orthologs
– Homologs produced by speciation
– Gene phylogeny matches organismal phylogeny
Paralogs
– Homologs produced by gene duplication
– Multiple copies of homologs in a given species, or phylogenetic evidence that a gene duplication was involved
– Lack of match to organismal phylogeny
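The phylogenetic criteria above can be sketched in Python. The gene tree, its leaf-naming convention (last letter = species, borrowed from the A1x/A2y names in the quiz) and all function names are hypothetical. Each internal node is labelled with a species-overlap rule (if the species below one child overlap with those below another, that node must be a duplication), and a pair of genes is called orthologous when their last common ancestor is a speciation node and paralogous when it is a duplication. Real analyses usually reconcile the gene tree with a trusted species tree rather than relying on this rule alone.

```python
from typing import Set


def species_of(gene: str) -> str:
    # Hypothetical naming convention: the last character is the species (A1x -> x).
    return gene[-1]


def leaves(tree) -> Set[str]:
    """All gene names below a node; a tree is a nested tuple, a leaf is a string."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def lca(tree, a: str, b: str):
    """Smallest subtree that contains both genes a and b."""
    for child in tree:
        if not isinstance(child, str) and {a, b} <= leaves(child):
            return lca(child, a, b)
    return tree


def event_at(node) -> str:
    """Species-overlap rule: overlapping species sets below a node imply a duplication."""
    seen: Set[str] = set()
    for child in node:
        child_species = {species_of(g) for g in leaves(child)}
        if seen & child_species:
            return "duplication"
        seen |= child_species
    return "speciation"


def relationship(tree, a: str, b: str) -> str:
    if a == b:
        raise ValueError("need two different genes")
    return "paralogs" if event_at(lca(tree, a, b)) == "duplication" else "orthologs"


# Hypothetical tree: a duplication (A1/A2) followed by the x/y speciation.
gene_tree = (("A1x", "A1y"), ("A2x", "A2y"))

if __name__ == "__main__":
    for pair in [("A1x", "A1y"), ("A1x", "A2x"), ("A1x", "A2y")]:
        print(*pair, relationship(gene_tree, *pair))
```

On this particular tree the three pairs come out as orthologs, paralogs and paralogs; a different tree topology would give different answers.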
Gene orthology: how to detect? Most common approach: identify reciprocal best BLAST hits (EGO, COGs, …)
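A minimal reciprocal-best-hit sketch, assuming two BLAST searches (species A against B, and B against A) saved in the default tabular format (`-outfmt 6`), whose 12th column is the bit score. The gene identifiers and scores are made up, the helper `row` only fabricates toy lines, and ties are broken by keeping the first best hit seen. The second call reproduces the example problem that follows: remove the true ortholog from the incomplete genome and its paralog becomes a reciprocal best hit.

```python
def best_hits(tabular_lines):
    """Best-scoring subject for each query in BLAST tabular output (-outfmt 6).

    Default -outfmt 6 columns: qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore (bit score is column 12).
    """
    best = {}
    for line in tabular_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        query, subject, bits = fields[0], fields[1], float(fields[11])
        if query not in best or bits > best[query][1]:
            best[query] = (subject, bits)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    ab, ba = best_hits(a_vs_b), best_hits(b_vs_a)
    return sorted((a, b) for a, b in ab.items() if ba.get(b) == a)


def row(query, subject, bits):
    # Toy line: only qseqid, sseqid and bitscore matter here; other columns are placeholders.
    return "\t".join([query, subject] + ["0"] * 8 + ["1e-50", str(bits)])


human_vs_cattle = [row("hsA", "btA", 500), row("hsA", "btA2", 450), row("hsB", "btB", 300)]
cattle_vs_human = [row("btA", "hsA", 498), row("btA2", "hsA", 440),
                   row("btB", "hsC", 320), row("btB", "hsB", 310)]

# If btA has not been sequenced yet, its paralog btA2 becomes a reciprocal best hit.
incomplete_h_vs_c = [line for line in human_vs_cattle if "\tbtA\t" not in line]
incomplete_c_vs_h = [line for line in cattle_vs_human if not line.startswith("btA\t")]

if __name__ == "__main__":
    print(reciprocal_best_hits(human_vs_cattle, cattle_vs_human))      # [('hsA', 'btA')]
    print(reciprocal_best_hits(incomplete_h_vs_c, incomplete_c_vs_h))  # [('hsA', 'btA2')]
```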
Example Problem:
When comparing human and bovine genes, for example, the bovine gene dataset is still quite incomplete.
Therefore, the current best hit may be a paralog, with the true ortholog not yet sequenced.
[Figure: gene tree of cattle, human, cattle and mouse sequences]
2 Forms in 1 Species
[Figure: + marks gene presence]
Slides from Jonathan Eisen
2 Forms in 1 Species: Gene Loss
[Figure: gene duplicated in common ancestor; + marks gene presence; loss marked on two lineages]
Unusual Distribution Pattern
[Figure: + marks gene presence]
Unusual Distribution: Gene Loss
[Figure: gene present in ancestor; gene lost here]
Unusual Distribution: Evolutionary Rate Variation
[Figure: + marks gene presence; -? marks a species where the gene is too diverged to be found]
Ortholog guess via synteny
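[Figure: syntenic blocks; gene order A A C C B in one genome and A A C C ? in the other, so the unassigned gene at the position of B is a candidate ortholog of B]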
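The synteny argument can be sketched as a neighbourhood match, assuming the flanking genes (A and C here) already have homology labels and that gene order is conserved exactly. The function names, the `flank` parameter and the toy gene orders (read from the slide as A A C C B versus A A C C ?) are illustrative; real synteny tools tolerate rearrangements, inversions and missing genes.

```python
def neighbourhood(order, i, flank):
    """Labels of up to `flank` genes on each side of position i."""
    return tuple(order[max(0, i - flank):i]), tuple(order[i + 1:i + 1 + flank])


def synteny_candidates(genome1, genome2, index, flank=2):
    """Genes in genome1 whose neighbours match those of the gene at genome2[index]."""
    target = neighbourhood(genome2, index, flank)
    return [genome1[j] for j in range(len(genome1))
            if neighbourhood(genome1, j, flank) == target]


# Toy gene orders: A A C C B versus A A C C ?
genome1 = ["A", "A", "C", "C", "B"]
genome2 = ["A", "A", "C", "C", "?"]

if __name__ == "__main__":
    print(synteny_candidates(genome1, genome2, 4))  # ['B']
```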
Alignments and Assemblies
Alignment
• All sequences come from the SAME region
– Therefore can be useless for non-overlapping contigs and for PCR probes/oligos
• Good for paralogs/orthologs
• Basis for phylogeny
• Copes with more dissimilar sequences
Assembly
• Good for near-identical sequences
• Read length
– Short reads [Next Gen Sequencing]
– Long reads [Sanger and 3rd Gen sequencing?]
• Reference?
– De novo
– Guided [reference sequence]
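To make "de novo" concrete, here is a minimal greedy-overlap sketch: repeatedly merge the two reads (or contigs) with the longest suffix/prefix overlap. This is a simplified illustration, not the algorithm of any assembler named in these slides; it assumes error-free, same-strand reads, and the `min_len` default and toy reads are made up. Real assemblers must also handle sequencing errors, repeats and reverse complements.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a equal to a prefix of b (at least min_len), else 0."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads, min_len=3):
    """Greedily merge the pair with the largest overlap until no overlap remains."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicate reads, keep order
    # Drop reads that are contained inside another read.
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)] + [merged]


if __name__ == "__main__":
    print(greedy_assemble(["ATGCGT", "GCGTAC", "TACGGA"]))  # ['ATGCGTACGGA']
```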
ensEMBL calculations: http://www.ensembl.org
demo
OMA Browser: http://omabrowser.org
demo
Alignment
Implicit statement: each residue in an aligned column is derived from the same residue in the last common ancestor [LCA]
Where that assumption is doubtful, the alignment is unreliable; therefore it is OK to look only at conserved regions, or to mask non-conserved regions, especially for phylogeny
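A minimal masking sketch: keep only alignment columns that have no more than an allowed fraction of gaps and in which the most common residue reaches a minimum frequency. The thresholds, function names and toy alignment are assumptions for illustration; this is a plain identity/gap filter in the same spirit as dedicated trimming tools (e.g. Gblocks, trimAl), not their algorithms.

```python
from collections import Counter


def conserved_columns(alignment, min_identity=0.8, max_gap_fraction=0.0):
    """Indices of columns passing a gap filter and a majority-residue filter."""
    n, length = len(alignment), len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    keep = []
    for col in range(length):
        column = [seq[col] for seq in alignment]
        if column.count("-") / n > max_gap_fraction:
            continue
        residues = [c for c in column if c != "-"]
        if not residues:
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        if top_count / n >= min_identity:
            keep.append(col)
    return keep


def mask(alignment, columns):
    """Alignment restricted to the selected columns."""
    return ["".join(seq[c] for c in columns) for seq in alignment]


if __name__ == "__main__":
    toy = ["MKT-AY", "MKTLAF", "MRTLAY"]  # made-up alignment
    cols = conserved_columns(toy)
    print(cols)             # [0, 2, 4]
    print(mask(toy, cols))  # ['MTA', 'MTA', 'MTA']
```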
Alignment Tools
Faster but less accurate (some handle gaps better): Muscle, ClustalW/X, MAFFT
Slower but more accurate:
• *-Coffee
– T-Coffee: the original
– 3D-Coffee: uses PDB structures as a guide (structural)
– M-Coffee: combines multiple methods
• Probcons
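Progressive aligners such as ClustalW build on pairwise alignments. As a sketch of that building block, here is textbook Needleman-Wunsch global alignment with a linear gap penalty. The match, mismatch and gap scores are assumed toy values; the real tools use substitution matrices, affine gap penalties and further refinements.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment score and one optimal alignment (linear gap penalty)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    def sub(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, rows):
        for j in range(1, cols):
            score[i][j] = max(score[i - 1][j - 1] + sub(i, j),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right cell to recover one optimal alignment.
    i, j, out_a, out_b = len(a), len(b), [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[-1][-1], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    total, top, bottom = needleman_wunsch("GATTACA", "GATACA")
    print(total)   # 4 (six matches, one gap)
    print(top)
    print(bottom)
```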
Alignment Edit Tools
NEVER use a word processor or Excel to edit alignments.
Jalview (Java Alignment Viewer)
• Good for editing
• DAS capable
[Figure: overview of Jalview. Sequences, alignments and multiple sequence alignment in 'standard' formats (FASTA, MSF, CLUSTAL, PILEUP, BLC, PFAM); trees (Newick); annotation and features (Distributed Annotation System, GFF, Jalview features); structures (PDB); secondary structure prediction; analysis (consensus, conservation & clustering); visualization; figure generation (clickable HTML, images, line art).]
Jalview Annotation
Jalview DAS Client Functionality
[Figure: DAS annotation servers]
• Query matches ID to Authority
• Map to local reference frame
• Mouse over for feature name, links and scores
• Group features by source
• Type == colour
• Highlight start-end
• Select specific sources
• Filtered list
• Add user-defined sources
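"Map to local reference frame" can be illustrated with a small coordinate conversion: annotation features usually come in ungapped sequence coordinates, so to draw them on an alignment each residue position has to be converted to an alignment column. This is a minimal sketch of the idea under assumed conventions (1-based positions, `-` and `.` as gap characters), not Jalview's own code; the function names and toy sequence are hypothetical.

```python
def seq_to_column(aligned_seq: str, residue_pos: int, gap_chars: str = "-.") -> int:
    """Convert a 1-based ungapped residue position to a 1-based alignment column."""
    if residue_pos < 1:
        raise ValueError("positions are 1-based")
    count = 0
    for column, ch in enumerate(aligned_seq, start=1):
        if ch not in gap_chars:
            count += 1
            if count == residue_pos:
                return column
    raise ValueError("position is beyond the end of the sequence")


def feature_columns(aligned_seq: str, start: int, end: int):
    """Alignment columns spanned by a feature given in sequence coordinates."""
    return seq_to_column(aligned_seq, start), seq_to_column(aligned_seq, end)


if __name__ == "__main__":
    aligned = "MK--TAY-R"  # toy aligned sequence; ungapped it is MKTAYR
    print(feature_columns(aligned, 3, 6))  # (5, 9)
```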
Assemblers
Many free options; examples below:
Long reads
• STADEN: staden.sf.net
Next Gen Sequencing
• Guided (reads mapped to a reference): Bowtie, Novoalign, MAQ
• De novo: Velvet
3rd Generation Sequencing: ?
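Velvet is built on de Bruijn graphs. A minimal sketch of the idea: split reads into k-mers, link each (k-1)-mer prefix to its (k-1)-mer suffix, then walk the graph from source nodes while the path is unambiguous. The value of k, the function names and the toy reads are assumptions; the sketch ignores sequencing errors, reverse complements, coverage and the graph simplification steps real assemblers perform.

```python
from collections import Counter, defaultdict


def de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def unambiguous_contigs(graph):
    """Walk from each node with no incoming edge, stopping at any branch or merge."""
    indegree = Counter(t for successors in graph.values() for t in successors)
    contigs = []
    for start in [n for n in graph if indegree[n] == 0]:
        contig, node, seen = start, start, {start}
        while len(graph.get(node, ())) == 1:
            (nxt,) = graph[node]
            if nxt in seen or indegree[nxt] > 1:
                break
            contig += nxt[-1]
            seen.add(nxt)
            node = nxt
        contigs.append(contig)
    return contigs


if __name__ == "__main__":
    reads = ["ATGGCG", "GGCGTG", "CGTGCA"]  # toy, error-free reads
    print(unambiguous_contigs(de_bruijn(reads, 4)))  # ['ATGGCGTGCA']
```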
Post Assembly
Correction for:
• Reads mapping to multiple places
• PCR amplification prior to mapping (duplicate reads)
Tools and workflows available in our Galaxy platform (demo)
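A minimal sketch of both corrections, assuming single-end, forward-strand reads and exact matching as a stand-in for a real mapper (Bowtie and similar tools use indexed, mismatch-tolerant alignment). Reads with more than one hit are flagged as multi-mapping; uniquely mapped reads with the same start position and the same sequence are treated as likely PCR duplicates, keeping the first. Function names, the reference and the reads are made up; dedicated tools (for example Picard MarkDuplicates or samtools markdup) also handle strands, read pairs and clipping.

```python
from collections import defaultdict


def exact_positions(reference, read):
    """0-based start positions of every exact, forward-strand occurrence of read."""
    hits, start = [], reference.find(read)
    while start != -1:
        hits.append(start)
        start = reference.find(read, start + 1)
    return hits


def classify_reads(reference, reads):
    """Split reads into kept (unique), likely PCR duplicates, multi-mapping and unmapped."""
    unique, multi, unmapped = {}, [], []
    for read_id, seq in reads.items():
        hits = exact_positions(reference, seq)
        if not hits:
            unmapped.append(read_id)
        elif len(hits) > 1:
            multi.append(read_id)
        else:
            unique[read_id] = hits[0]
    # Same start and same sequence: keep the first read, flag the rest as duplicates.
    groups = defaultdict(list)
    for read_id, pos in unique.items():
        groups[(pos, reads[read_id])].append(read_id)
    duplicates = [rid for ids in groups.values() for rid in ids[1:]]
    kept = {ids[0]: pos for (pos, _), ids in groups.items()}
    return {"kept": kept, "duplicates": duplicates, "multi": multi, "unmapped": unmapped}


if __name__ == "__main__":
    reference = "ACGTACGTTTGCA"
    reads = {"r1": "ACGT", "r2": "GTTTG", "r3": "GTTTG", "r4": "TTGCA", "r5": "GGGG"}
    print(classify_reads(reference, reads))
```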