Shrish Tiwari CCMB, Hyderabad Comparative Genomics: Overview.
Shrish Tiwari
CCMB, Hyderabad
Comparative Genomics: Overview
Introduction
• Sequences of 340 species available (274 bacterial, 25 archaeal and 41 eukaryotic)
• An additional 848 prokaryotic and 560 eukaryotic genome projects are ongoing
• Comparison of genomes can provide insights into the functional regions as well as genome dynamics
Sequence Comparison
• Let us look at a simple example
A A T T G A - A T C G C C A
A - A T C A C A G - G A T C
5 matches, 6 mismatches, 3 indels
A A T T G A - A T C G C - C A
A A T - C A C A - G G A T C -
7 matches, 3 mismatches, 5 indels
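The tallies for the two alignments above can be checked column by column. The following is a minimal sketch, assuming gaps are written as "-" and symbols are separated by spaces; the function name `classify_columns` is illustrative and not part of the original slides.

```python
def classify_columns(top: str, bottom: str) -> dict:
    """Count matches, mismatches and indels in a pairwise alignment.

    Both rows are space-separated symbols; "-" marks a gap (assumption).
    """
    a, b = top.split(), bottom.split()
    if len(a) != len(b):
        raise ValueError("aligned rows must have the same number of columns")
    counts = {"matches": 0, "mismatches": 0, "indels": 0}
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            counts["indels"] += 1      # a gap in either row is an indel
        elif x == y:
            counts["matches"] += 1
        else:
            counts["mismatches"] += 1
    return counts


first = classify_columns("A A T T G A - A T C G C C A",
                         "A - A T C A C A G - G A T C")
second = classify_columns("A A T T G A - A T C G C - C A",
                          "A A T - C A C A - G G A T C -")
print(first)   # {'matches': 5, 'mismatches': 6, 'indels': 3}
print(second)  # {'matches': 7, 'mismatches': 3, 'indels': 5}
```

Which of the two alignments is "better" depends on how matches, mismatches and indels are weighted, which is why a scoring scheme is needed.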
Sequence Comparison
• Requirements for sequence comparison:
– A scoring scheme or scoring matrix
– A search algorithm to identify the optimal alignment
• Scoring matrices available: PAM, BLOSUM
• Search algorithm used: Dynamic programming
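To make the dynamic programming idea concrete, here is a minimal sketch of global alignment in the Needleman-Wunsch style. The scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for illustration; a real nucleotide or protein comparison would substitute a proper matrix such as PAM or BLOSUM and usually a more refined gap penalty.

```python
def needleman_wunsch(s, t, match=1, mismatch=-1, gap=-1):
    """Global alignment by dynamic programming (illustrative scoring)."""
    n, m = len(s), len(t)
    # score[i][j] = best score for aligning s[:i] with t[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if s[i - 1] == t[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,  # align s[i-1] with t[j-1]
                              score[i - 1][j] + gap,      # gap in t
                              score[i][j - 1] + gap)      # gap in s
    # Trace back from the bottom-right cell to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and s[i - 1] == t[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            top.append(s[i - 1]); bottom.append(t[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(s[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(t[j - 1]); j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


best, row1, row2 = needleman_wunsch("ACGT", "AGT")
print(best)  # 2
print(row1)  # ACGT
print(row2)  # A-GT
```

The table has (n+1) x (m+1) cells, so time and memory grow with the product of the two sequence lengths; this is why whole-genome comparisons rely on heuristics layered on top of the same idea.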
Applications
• Tracing our origins and history
• Assessing the diversity of a species
• Finding virulence genes
• Designing primers for novel species
• Identifying disease-causing mutations
• Predicting mutations in viral genomes and designing vaccines
Comparative Genomics
• Of distantly related species: look for similarities/conserved regions to infer functional regions of the genome; example mouse and man
• Of closely related species: look for differences, identify subtle mutations that make one species different from the other, understand how genomes evolve; examples chimp and man, virulent E. coli and benign E. coli
Comparative Genomics
• Comparison of the 73Kbp region of human β-globin with mouse and chimp genome shows 1) small stretches covering the first two exons and intervening intron matching at ~73% identity between human and mouse, 2) almost the complete 73Kbp region matches at ~97% for human and chimp
How different are we?
• Physical similarity is striking
How different are we?
• Socially, we have similar behaviour, including cooperation, warfare, politics and even bribery
Ape the toolmaker
Chimp Genome: Statistics
• Sequence of a single male captive-born chimpanzee from West Africa sub-species Pan troglodytes verus, obtained using a whole genome shotgun approach
• Assembly of the genome was done with PCAP and ARACHNE programs
• PCAP is a de novo assembly method; ARACHNE uses the human genome build 34 to facilitate and confirm contig linking and has more continuity
Chimp Genome: Statistics
• 3.6-fold redundancy of autosomes and 1.8-fold for sex chromosomes; covers 94% of chimp genome with >98% of the sequence in high quality bases (quality score >40, error rate <10⁻⁴)
• 50% of the sequence (N50) in contigs of length >15.7Kbp and supercontigs of length >8.6Mbp
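Two of the statistics above are easy to reproduce. The sketch below assumes the usual Phred convention (error probability = 10^(-Q/10)) for the quality score and the common definition of N50 (the contig length at which the length-sorted contigs first cover half of the assembly). The contig lengths are hypothetical toy values, not data from the chimp assembly.

```python
import math


def phred_to_error(q: float) -> float:
    """Convert a Phred-style quality score to an error probability."""
    return 10 ** (-q / 10)


def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:   # reached half of the assembled bases
            return length


print(phred_to_error(40))          # about 1e-4, i.e. one error in 10,000 bases

toy_contigs = [2, 2, 2, 3, 3, 4, 8, 8]   # hypothetical lengths (arbitrary units)
print(n50(toy_contigs))            # 8
```

N50 is used instead of the mean contig length because it is less sensitive to large numbers of very short contigs.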
Chimp Genome Sequence
• Chimp genomes are polymorphic within and between subspecies
• 1.66 million high-quality SNPs identified, of which 1.01 million are heterozygous in the primary donor
• Diversity rate among West African chimps is 8 × 10⁻⁴ (roughly the same as human diversity) and 17.6 × 10⁻⁴ among Central African chimps
Genome Comparison
• Genome comparisons can help to reveal the molecular basis of these traits as well as the evolutionary mechanisms that have moulded our species
• Reciprocal nucleotide-level alignment of the chimp and human genome covers ~2.4Gbp of high quality sequence
Genome Comparison
• An observed difference nearly always reflects a single event in time, not multiple independent changes over time
• Most differences reflect random drift and hold extensive information about mutational processes
• A minority of functionally important changes underlie our phenotypic differences
Segmental Duplication
• Has had a larger impact (~2.7%) in altering the genomic landscape than single nucleotide substitutions (~1.2%)
• They are responsible for the emergence of new genes and adaptation of humans to their environment
• Human genome particularly enriched in genes resulting from recent duplications
Segmental Duplication
• 33% of human duplications (>94% identity) are not duplicated in chimpanzee
• An estimated duplication rate of 4-5Mbp per million years
• These have resulted in differences in gene expression, disease-causing duplications and change in the genomic landscape in general
Segmental Duplication
• Chimp-only duplications: 11 out of 17 were found only in chimp and not in man or other great apes in a cross-species comparison, whereas 6 were also found in gorilla
• De novo duplications followed by deletion of older duplications are the most likely scenarios for excess of segmental duplications observed in human-ape genomes
Gene Evolution
• 13,454 pairs of human and chimp genes with unambiguous 1:1 orthology were used
• Rate of evolution of a gene assessed using the non-synonymous substitution rate KA
Gene Evolution
• The background rate is estimated as the synonymous substitution rate Ks
• KA/Ks is a measure of evolutionary constraint on a gene
• KA/Ks > 1 implies adaptive or positive selection, under the assumption that synonymous changes are neutral
Gene Evolution
• KA/Ks = 0.23 for the human-chimpanzee lineage, implying that 77% of amino acid-altering changes are removed by natural selection
• CpG and non-CpG substitutions at synonymous sites show lower divergence, ~50% and ~30% lower respectively, than in introns, implying evolutionary constraint on synonymous substitutions
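The 77% figure follows directly from the ratio: if synonymous changes are taken as neutral, a KA/Ks of 0.23 means that only 23% of amino acid-altering changes survive, so 1 - 0.23 = 0.77 are removed. A minimal sketch of that arithmetic and of the usual interpretation of the ratio follows; the function names and the per-gene example rates are hypothetical.

```python
def fraction_removed(ka_ks: float) -> float:
    """Fraction of amino acid-altering changes eliminated by selection,
    assuming synonymous changes are neutral and KA/Ks <= 1."""
    if not 0 <= ka_ks <= 1:
        raise ValueError("this reading only applies when 0 <= KA/Ks <= 1")
    return 1 - ka_ks


def interpret(ka: float, ks: float) -> str:
    """Classify a gene by its KA/Ks ratio (illustrative thresholds)."""
    ratio = ka / ks
    if ratio > 1:
        return "positive (adaptive) selection"
    if ratio < 1:
        return "purifying selection (constraint)"
    return "neutral"


print(round(fraction_removed(0.23), 2))   # 0.77
print(interpret(ka=0.002, ks=0.010))      # hypothetical rates -> purifying selection (constraint)
```

For a single gene with very few substitutions the ratio is noisy, so a value above 1 is not on its own strong evidence of adaptive evolution.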
Gene Evolution
• 585 genes of the 13,454 human-chimp orthologues have KA/KI > 1
• Given the low divergence between the human and chimp genomes, the KA/KI statistic has a large variance
• Simulations show that KA/KI > 1 would be expected to occur by chance in 263 cases, if purifying selection acts non-uniformly on genes
Gene Evolution
• The extreme outliers are:
– glycophorin C, mediates P. falciparum invasion pathways in human erythrocytes
– granulysin, mediates antimicrobial activity against intracellular pathogens
– protamines & semenogelins involved in reproduction
– Mas-related gene family involved in nociception
Conclusions
• Mean rate of single nucleotide changes is 1.23%, with <1.06% corresponding to fixed divergence
• Regional variations same in hominid and murid genomes except at subtelomeric regions
• 25% of changes are at CpG sites, at rates that are similar in the male and female germ lines
• Indels are fewer but account for 1.5% of euchromatic sequence being lineage-specific
Conclusions
• SINEs have been more active in human while chimp has acquired two new retroviral elements
• Orthologous proteins typically differ by only 2 amino acids, with ~29% identical
• Amino acid altering changes are more frequent in hominids compared to murids, but close to the changes seen in human polymorphisms
• Substitution rate at silent sites lower than at intronic sites => purifying selection
Is Y going extinct?
• X and Y chromosomes have evolved from an autosomal pair in an ancient mammal nearly 300 million years ago
• Most Y genes lie in the X-degenerate regions
• X-degenerate region of Y does not recombine, which may lead to rapid gene loss
• Rate of gene loss estimated at 5 genes every million years
Is Y going extinct?
• Assuming gene loss occurs randomly and that human and chimp separated nearly 6 million years ago, many chimp Y genes are expected to have no functional orthologues in human
• Orthologues of all human X-degenerate genes and pseudogenes were searched
• Chimpanzee orthologues of 16 genes and 11 pseudogenes were identified
Is Y going extinct?
• All 11 chimp orthologues of the human pseudogenes were pseudogenes in the chimp as well, with the majority of inactivating mutations shared
• This indicates that none of the pseudogenes were lost between human and chimp in the last 6 million years
• GenScan and BLAST analysis of the chimp X-degenerate Y transcripts revealed that none were chimp specific
Is Y going extinct?
• Divergence of X-degenerate exons was compared with that of introns for genes as well as pseudogenes
• The divergence was found to be less in the exons than introns for genes, but same or more in pseudogenes
• These results suggest that purifying selection has been more effective during human evolution than previously assumed
J.F. Hughes et al. (2005) Nature 437, 101-104
Summary
• While we can learn a lot from a comparison of the human-chimp genomes, they are too much alike to get meaningful answers to many questions, e.g. a DNA sequence found in humans but missing in chimps: was it added in humans or lost in chimps?
Summary
• A difference found could be significant or just a variant within one species
• Sequences of other primates will be needed to establish the uniqueness of changes seen in human and chimps
• Genomes of primates like the orang-utan and rhesus macaque are expected soon
Origin of Clothing
• Humans infested with head and body lice
• Head louse lives and feeds on the scalp
• Body louse lives in clothing and feeds on body
• Chimp louse used as outgroup
Origin of Clothing
• 2 sequences from mtDNA (ND4 and CYTB) and 2 from nuclear DNA (EF-1 and RPII) from 40 lice (26 head lice and 14 body lice) from 12 different geographic regions were used for analysis along with one chimpanzee louse
• Trees built using ND4 and CYTB nearly identical
Origin of Clothing
• Results:
– Greater diversity seen in African lice than in non-African lice, suggesting an African origin for body lice
– Body louse originated ~72,000 years ago (assuming human and chimp lice diverged ~5.5 million years ago)
– Demographic expansion of body lice correlates with the spread of modern humans out of Africa
Origin of Clothing
• Results indicate a recent origin of clothing, ~72,000 years ago
R. Kittler, M. Kayser and M. Stoneking (2003) “Molecular evolution of Pediculus humanus and the origin of clothing” Current Biology 13, 1414-1417
Conclusions
• Genomes of human and model organisms were sequenced in order to understand ourselves at the molecular level
• Comparative genomics studies have revealed interesting features of genome evolution so far
• This is just the tip of the iceberg!!