MW 12:50-2:05pm in Beckman B302 Profs: Serafim Batzoglou & Gill Bejerano
http://cs273a.stanford.edu [Bejerano Fall16/17] 1
CS273A
Lecture 10: Transcription Regulation III, Neutral evolution: repetitive elements
Announcements
• PS1 is in. PS2 is out…
Transcription & its regulation happen in open chromatin
Nucleosomes, Histones, Transcription
Chromatin / Proteins
DNA / Proteins
Genome packaging provides a critical layer of gene regulation.
Gene Activation / Repression via Chromatin Remodeling
A dedicated machinery opens and closes chromatin.
Interactions with this machinery turn genes and/or gene regulatory regions like enhancers and repressors on or off (by making the genomic DNA in/accessible).
Epigenomics
The histone code
Histone Tails, Histone Marks
DNA is wrapped around nucleosomes.
Nucleosomes are made of histones.
Histones have free tails.
Residues in the tails are modified in specific patterns in conjunction with specific gene regulation activity.
Histone Mark Correlation Examples
Active gene promoters are marked by H3K4me3.
Silenced gene promoters are marked by H3K27me3.
p300, a protein component of many active enhancers, acetylates H3K27 (producing the H3K27ac mark).
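The three correlations on this slide can be read as a small lookup table. The sketch below is only illustrative: `MARK_TO_STATE` and `annotate_region` are hypothetical names, and real chromatin-state calling combines many marks with statistical models rather than single-mark rules.

```python
# Hypothetical lookup that encodes only the three correlations listed on the slide.
MARK_TO_STATE = {
    "H3K4me3": "active promoter",
    "H3K27me3": "silenced promoter",
    "H3K27ac": "active enhancer",
}

def annotate_region(marks):
    """Return the sorted states suggested by the marks observed at one region.

    Marks that are not in the lookup are ignored.
    """
    return sorted({MARK_TO_STATE[m] for m in marks if m in MARK_TO_STATE})

print(annotate_region(["H3K27ac"]))
print(annotate_region(["H3K4me3", "unknown_mark"]))
```

A region carrying several marks simply returns several candidate states; resolving such conflicts is outside the scope of this sketch.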
Measuring these different states
Note that the DNA itself doesn’t change. We sequence different portions of it that are currently in different states (bound by a TF, wrapped around a nucleosome, etc.)
Epigenomics: study all these marks genomewide
Translate observations into current genome state.
Obtain a network of all active genes & DNA
Now what? (to be revisited)
“Ridiculogram”
Histone Code Hypothesis
Histone modifications serve to recruit other proteins by specific recognition of the modified histone via protein domains specialized for such purposes, rather than through simply stabilizing or destabilizing the interaction between histone and the underlying DNA.
Histone modification: writer, reader, eraser…
Epigenomics is not Epigenetics
Epigenetics is the study of heritable changes in gene expression or cellular phenotype, caused by mechanisms other than changes in the underlying DNA sequence.
There are objections to the use of the term epigenetic to describe chemical modification of histones, since it remains unknown whether or not these modifications are heritable.
Gene Regulation
Chromatin / Proteins
DNA / Proteins
Extracellular signals
Cis-Regulatory Components
Low level (“atoms”):
• Promoter motifs (TATA box, etc.)
• Transcription factor binding sites (TFBS)
Mid level:
• Promoter
• Enhancers
• Repressors/silencers
• Insulators/boundary elements
• Locus control regions
High level:
• Epigenomic domains / signatures
• Gene expression domains
• Gene regulatory networks
If you only measure gene expression
It’s like only seeing the values change in RAM as a program is running.
Inferring Gene Expression Causality
Measuring gene expression over time provides sets of genes that change their expression in synchrony.
• But who regulates whom?
• Some of the necessary regulators may not change their expression level when measured, and yet be essential.
“Reading” enhancers can provide gene regulatory logic:
• If present(TF A, TF B, TF C) then turn on nearby gene X
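The enhancer rule above can be written as a tiny Boolean function. This is a minimal sketch that assumes strict AND logic over the three factors; the names `TF_A`, `TF_B`, `TF_C`, `enhancer_active` and `gene_x_on` are placeholders for illustration, and real enhancers often implement more graded logic.

```python
def enhancer_active(present_tfs, required_tfs=("TF_A", "TF_B", "TF_C")):
    """AND logic: the enhancer fires only if every required TF is present."""
    return set(required_tfs).issubset(present_tfs)

def gene_x_on(present_tfs):
    # Hypothetical rule from the slide:
    # if present(TF A, TF B, TF C) then turn on nearby gene X.
    return enhancer_active(present_tfs)

# TF_C may be expressed at a constant level in every sample (so an
# expression time course never flags it), yet the rule still needs it.
print(gene_x_on({"TF_A", "TF_B", "TF_C", "TF_D"}))
print(gene_x_on({"TF_A", "TF_B"}))
```

The second call illustrates the point about essential regulators: removing a factor whose level never changes still switches the gene off.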
Gene Regulation is in Data Deluge mode
“Data is not information, information is not knowledge, knowledge is not understanding, understanding is not wisdom.”
Transcription Factors have Large “fan outs”
We could have had one TF regulate two TFs, each of which regulates two other TFs, etc., with each of those contributing to the regulation of a modest number of target genes (that do the real work).
Instead TFs reproducibly bind to thousands of genomic locations almost anywhere we’ve looked.
Gene regulation forms a dense network.
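Fan-out can be made concrete by counting edges in a TF-to-target list. The sketch below uses a made-up toy edge list (`toy_edges`) and a hypothetical helper `fan_out_and_in`; it also counts regulators per gene (the gene's fan-in), since both numbers fall out of the same pass.

```python
from collections import defaultdict

def fan_out_and_in(edges):
    """edges: iterable of (tf, target) pairs.

    Returns (fan_out, fan_in): distinct targets per TF and
    distinct regulators per target.
    """
    targets_of = defaultdict(set)
    regulators_of = defaultdict(set)
    for tf, target in edges:
        targets_of[tf].add(target)
        regulators_of[target].add(tf)
    fan_out = {tf: len(ts) for tf, ts in targets_of.items()}
    fan_in = {g: len(rs) for g, rs in regulators_of.items()}
    return fan_out, fan_in

# Toy data, invented for illustration only.
toy_edges = [
    ("TF1", "geneA"), ("TF1", "geneB"), ("TF1", "geneC"),
    ("TF2", "geneA"), ("TF3", "geneA"),
]
fan_out, fan_in = fan_out_and_in(toy_edges)
print(fan_out)
print(fan_in)
```

With genome-wide binding data, the same counting is what reveals TFs with thousands of bound locations.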
[Figure labels: TFs; pathway genes]
Some important genes have large “fan ins”
We are technically DONE with genome function
Biology – not that complicated!!
Functional part list
• In our genome:
  • Gene
    • Protein coding
    • Non coding / RNA genes
  • Gene regulatory elements
    • “Atomic” event: transcription factor binding site
    • Build up: promoters, enhancers, silencers, gene reg. domain
• “Around” our genome
  • Chromatin – open / closed
  • Epigenomic (and some epigenetic) marks
Actually almost done…
We’ve talked about transcripts and their regulation.
We’re still ignoring most of the genome…
Type            # in genome    % of genome
genes           20,000         2%
ncRNA           20,000         2%
cis elements    1,000,000      >10%
To be continued
The Functional Genome
Type            # in genome
genes           20,000
ncRNA           20,000
cis elements    1,000,000
The Functional Genome
Type            # in genome    % of genome
genes           20,000         2-3%
ncRNA           20,000         2%
cis elements    1,000,000      10-15%
Corollary: most of the genome is devoid of function (which we understand)
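The corollary follows from adding up the percentage column. A minimal sketch of that arithmetic, assuming the three categories do not overlap (an assumption made here only for illustration; the names `FRACTIONS` and `functional_range` are hypothetical):

```python
# (low, high) percent-of-genome ranges copied from the table above.
FRACTIONS = {
    "genes": (2, 3),
    "ncRNA": (2, 2),
    "cis elements": (10, 15),
}

def functional_range(fractions):
    """Sum the low and high ends, assuming the categories do not overlap."""
    low = sum(lo for lo, _ in fractions.values())
    high = sum(hi for _, hi in fractions.values())
    return low, high

low, high = functional_range(FRACTIONS)
print(f"known functional: {low}-{high}% of the genome")
print(f"no understood function: {100 - high}-{100 - low}% of the genome")
```

Under that assumption the known functional parts sum to well under a quarter of the genome, leaving the large majority without an understood function.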
TTATATTGAATTTTCAAAAATTCTTACTTTTTTTTTGGATGGACGCAAAGAAGTTTAATAATCATATTACATGGCATTACCACCATATACATATCCATATCTAATCTTACTTATATGTTGTGGAAATGTAAAGAGCCCCATTATCTTAGCCTAAAAAAACCTTCTCTTTGGAACTTTCAGTAATACGCTTAACTGCTCATTGCTATATTGAAGTACGGATTAGAAGCCGCCGAGCGGGCGACAGCCCTCCGACGGAAGACTCTCCTCCGTGCGTCCTCGTCTTCACCGGTCGCGTTCCTGAAACGCAGATGTGCCTCGCGCCGCACTGCTCCGAACAATAAAGATTCTACAATACTAGCTTTTATGGTTATGAAGAGGAAAAATTGGCAGTAACCTGGCCCCACAAACCTTCAAATTAACGAATCAAATTAACAACCATAGGATGATAATGCGATTAGTTTTTTAGCCTTATTTCTGGGGTAATTAATCAGCGAAGCGATGATTTTTGATCTATTAACAGATATATAAATGGAAAAGCTGCATAACCACTTTAACTAATACTTTCAACATTTTCAGTTTGTATTACTTCTTATTCAAATGTCATAAAAGTATCAACAAAAAATTGTTAATATACCTCTATACTTTAACGTCAAGGAGAAAAAACTATAATGACTAAATCTCATTCAGAAGAAGTGATTGTACCTGAGTTCAATTCTAGCGCAAAGGAATTACCAAGACCATTGGCCGAAAAGTGCCCGAGCATAATTAAGAAATTTATAAGCGCTTATGATGCTAAACCGGATTTTGTTGCTAGATCGCCTGGTAGAGTCAATCTAATTGGTGAACATATTGATTATTGTGACTTCTCGGTTTTACCTTTAGCTATTGATTTTGATATGCTTTGCGCCGTCAAAGTTTTGAACGATGAGATTTCAAGTCTTAAAGCTATATCAGAGGGCTAAGCATGTGTATTCTGAATCTTTAAGAGTCTTGAAGGCTGTGAAATTAATGACTACAGCGAGCTTTACTGCCGACGAAGACTTTTTCAAGCAATTTGGTGCCTTGATGAACGAGTCTCAAGCTTCTTGCGATAAACTTTACGAATGTTCTTGTCCAGAGATTGACAAAATTTGTTCCATTGCTTTGTCAAATGGATCATATGGTTCCCGTTTGACCGGAGCTGGCTGGGGTGGTTGTACTGTTCACTTGGTTCCAGGGGGCCCAAATGGCAACATAGAAAAGGTAAAAGAAGCCCTTGCCAATGAGTTCTACAAGGTCAAGTACCCTAAGATCACTGATGCTGAGCTAGAAAATGCTATCATCGTCTCTAAACCAGCATTGGGCAGCTGTCTATATGAATTAGTCAAGTATACTTCTTTTTTTTACTTTGTTCAGAACAACTTCTCATTTTTTTCTACTCATAACTTTAGCATCACAAAATACGCAATAATAACGAGTAGTAACACTTTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCACAAACTTTAAAACACAGGGACAAAATTCTTGATATGCTTTCAACCGCTGCGTTTTGGATACCTATTCTTGACATGATATGACTACCATTTTGTTATTGTACGTGGGGCAGTTGACGTCTTATCATATGTCAAAGTTGCGAAGTTCTTGGCAAGTTGCCAACTGACGAGATGCAGTAACACTTTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCACAAACTTTAAAACACAGGGACAAAATTCTTGATATGCTTTCAACCGCTGCGTTTTGGATACCTATTCTTGACATGATATGACTACCATTTTGTTATTGTACGTGGGGCAGTTGACGTCTTATCATATGTCAAAGTCATTTGCGAAGTTCTTGGCAAGTTGCCAACTGACGAGATGCAGTTTCCTACGCATAA
TAAGAATAGGAGGGAATATCAAGCCAGACAATCTATCATTACATTTAAGCGGCTCTTCAAAAAGATTGAACTCTCGCCAACTTATGGAATCTTCCAATGAGACCTTTGCGCCAAATAATGTGGATTTGGAAAAAGAGTATAAGTCATCTCAGAGTAATATAACTACCGAAGTTTATGAGGCATCGAGCTTTGAAGAAAAAGTAAGCTCAGAAAAACCTCAATACAGCTCATTCTGGAAGAAAATCTATTATGAATATGTGGTCGTTGACAAATCAATCTTGGGTGTTTCTATTCTGGATTCATTTATGTACAACCAGGACTTGAAGCCCGTCGAAAAAGAAAGGCGGGTTTGGTCCTGGTACAATTATTGTTACTTCTGGCTTGCTGAATGTTTCAATATCAACACTTGGCAAATTGCAGCTACAGGTCTACAACTGGGTCTAAATTGGTGGCAGTGTTGGATAACAATTTGGATTGGGTACGGTTTCGTTGGTGCTTTTGTTGTTTTGGCCTCTAGAGTTGGATCTGCTTATCATTTGTCATTCCCTATATCATCTAGAGCATCATTCGGTATTTTCTTCTCTTTATGGCCCGTTATTAACAGAGTCGTCATGGCCATCGTTTGGTATAGTGTCCAAGCTTATATTGCGGCAACTCCCGTATCATTAATGCTGAAATCTATCTTTGGAAAAGATTTACAATGATTGTACGTGGGGCAGTTGACGTCTTATCATATGTCAAAGTCATTTGCGAAGTTCTTGGCAAGTTGCCAACTGACGAGATGCAGTAACACTTTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCACAAACTTTAAAACACAGGGACAAAATTCTTGATATGCTTTCAACCGCTGCGTTTTGGATACCTATTCTTGACATGATATGACTACCATTTTGTTATTGTTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATGTTTTCAATGTAAGAGATTTCGATTATCTTATAGTTCATACATGCTTCAACTACTTAATAAATGATTGTATGATAATAAAG
“Nothing in Biology Makes Sense Except in the Light of Evolution”
Theodosius Dobzhansky
One Cell, One Genome, One Replication
Every cell holds a copy of all its DNA = its genome.
The human body is made of ~10^13 cells.
All originate from a single cell through repeated cell divisions.
[Figure: egg → cell division → chicken; chicken ≈ 10^13 copies (DNA) of egg (DNA). Labels: cell; genome = all DNA; DNA strings = chromosomes]
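To get a feel for "repeated cell divisions", one can ask how many rounds of doubling are needed to reach ~10^13 cells from one. This is an idealized sketch that assumes every cell divides in every round and none die, which is not true of real development; `min_division_rounds` is a hypothetical helper name.

```python
import math

def min_division_rounds(n_cells):
    """Smallest number of synchronous doubling rounds giving at least n_cells from one cell."""
    return math.ceil(math.log2(n_cells))

rounds = min_division_rounds(10**13)
print(rounds)  # 44, since 2**43 < 10**13 <= 2**44
```

Each of those rounds copies the whole genome, so even a lineage of a few dozen divisions involves many opportunities for copying errors.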
Every Genome is Different
DNA Replication is imperfect – between individuals of the same species, even between the cells of an individual.
[Figure: the egg’s sequence ...ACGTACGACTGACTAGCATCGACTACGA... is copied into the chicken with changes (TT, CAT). In functional DNA many changes are not tolerated; in junk DNA “anything goes”.]
This has bad implications – disease, and good implications – adaptation.
Human Mutation Rate
• Recent sequencing analysis suggests ~40-60 new mutations in a child that were not present in either parent.
• Mutations range from the smallest possible (single base pair change) to the largest – whole genome duplication (to be discussed).
• Selection does not tolerate all of these mutations, but it sure does tolerate some.
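The count of new mutations can be turned into a rough per-base rate. The sketch below assumes a haploid genome of about 3×10^9 bp, a figure not stated on this slide, and treats every new mutation as a single event; `HAPLOID_GENOME_BP` and `per_base_rate` are names introduced here for illustration.

```python
HAPLOID_GENOME_BP = 3.0e9  # assumption: approximate human haploid genome size

def per_base_rate(new_mutations, haploid_bp=HAPLOID_GENOME_BP):
    """New mutations per base pair per generation.

    A child inherits two haploid genome copies, so the count is
    spread over 2 * haploid_bp positions.
    """
    return new_mutations / (2 * haploid_bp)

for n in (40, 60):
    print(f"{n} new mutations -> {per_base_rate(n):.1e} per bp per generation")
```

Under these assumptions the rate comes out on the order of 10^-8 per base pair per generation.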
[The raw genomic DNA sequence shown earlier is displayed again.]
Why this cartoon?
Genome Composition
The functional genome takes about 20% of the genome.
The remaining 80% is far from homogeneous…
Sequences that repeat many times in the genome
• Take up cumulatively a whopping half of the genome
• Come in two major, very different, flavors
I. Interspersed Repeats / TEs
[Adapted from Lunter]
LINE & SINE Elements
Genomic Transmission
For repeat copies to accumulate through human generations they must make it into the germline cells (eggs & sperm).
Equally true for any genomic mutation.
[Figure: egg → cell division → chicken; chicken ≈ 10^13 copies (DNA) of egg (DNA). Labels: cell; genome = all DNA; DNA strings = chromosomes]
Classes of Interspersed Repeats
DNA Transposons
Retrovirus-like Elements
TE composition and assortment vary among eukaryotic genomes
[Figure: bar chart with a percentage axis (20% to 100%) and legend DNA transposons, LTR Retro., Non-LTR Retro., for Slime mold, Budding yeast, Fission yeast, Neurospora, Arabidopsis, Rice, Nematode, Drosophila, Mosquito, Fugu, Mouse, Human. Feschotte & Pritham 2006]
Repeats: mostly neutral
Most repeat events/instances are neutral.
I.e., a repeat instance is dropped in a new place, and joins the rest of the neutral DNA, gradually decaying over time.
Many repeat copies are “dead as a duck” on arrival at their new location (e.g., 5’ truncation).
Some instances may be active (spawn new instances) for a while, but when an active copy is hit by a mutation, the host is not affected; the instance is simply inactivated and decays away.
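"Gradually decaying" can be given a rough quantitative shape. The sketch below uses the Jukes-Cantor substitution model, which is a simplification chosen here and not something the slides specify: all substitutions equally likely, no insertions or deletions. The neutral rate `NEUTRAL_RATE` is a placeholder assumption, and `expected_identity` is a hypothetical helper name.

```python
import math

def expected_identity(subs_per_site):
    """Jukes-Cantor expected fraction of sites still matching the source copy.

    subs_per_site is the expected number of substitutions per site
    accumulated since the copy was inserted.
    """
    return 0.25 + 0.75 * math.exp(-4.0 * subs_per_site / 3.0)

NEUTRAL_RATE = 2e-9  # substitutions per site per year; placeholder assumption

for million_years in (0, 50, 100, 200):
    d = NEUTRAL_RATE * million_years * 1e6
    identity = expected_identity(d)
    print(f"{million_years:>3} My: expected identity {identity:.2f}")
```

Identity starts at 1.0 and drifts toward 0.25, the level expected between unrelated sequences under this model, which is why very old copies become hard to recognize.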
Repeat Ages
Figure from Ryan Gregory (2005)
INTERSPECIES VARIATION IN GENOME SIZE WITHIN VARIOUS GROUPS OF ORGANISMS
The amount of TE DNA correlates positively with genome size
[Figure: bar chart in Mb (axis 0 to 3000) of Genomic DNA, TE DNA, and Protein-coding DNA for Plasmodium, Slime mold, Budding yeast, Fission yeast, Neurospora, Arabidopsis, Brassica, Rice, Maize, Nematode, Drosophila, Mosquito, Sea squirt, Zebrafish, Fugu, Mouse, Human. Feschotte & Pritham 2006]
The proportion of protein-coding genes decreases with genome size, while the proportion of TEs increases with genome size.
[Figure panels: TEs; Protein-coding genes. Gregory, Nat Rev Genet 2005]
Repeats: not just neutral
So far we treated all repeat proliferation events as neutral.
While the majority of them appear to be neutral, this is certainly not the case for all repeat instances.
And because there are so many repeat instances, even a small fraction of all repeats can be a big set compared to other types of elements in the genome.
(E.g., 1% of ½ the genome is still a lot.)
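The parenthetical example can be put in base pairs. This sketch assumes a genome of about 3×10^9 bp, a round figure introduced here for illustration; `GENOME_BP` and `bases_in_subset` are hypothetical names.

```python
GENOME_BP = 3.0e9  # assumption: approximate human genome size in base pairs

def bases_in_subset(genome_bp, repeat_fraction, subset_fraction):
    """Base pairs covered by a subset of the repeat-derived part of the genome."""
    return genome_bp * repeat_fraction * subset_fraction

# "1% of 1/2 the genome"
non_neutral_bp = bases_in_subset(GENOME_BP, 0.5, 0.01)
print(f"{non_neutral_bp / 1e6:.0f} Mb")
```

Under this assumption, 1% of the repeat-derived half is on the order of 15 Mb of sequence.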
Repeats & Retroposed Genes
Remember how LINEs reverse transcribe copies of themselves back into the genome? How they sometimes reverse transcribe SINEs “by mistake”? Well, they also grab m/ncRNAs and reverse transcribe them into the genome!
Retrogenes (“retrotranscribed”):
Protein coding RNA that was reverse transcribed and inserted back into the genome.
The RNA can be grabbed at any stage (partial/full transcript, before/during/after all introns are spliced).
Retroposed Genes & Pseudogenes
Pseudogenes (“dead genes”):
Genomic sequences that resemble (and originated from) genes, but no longer make proteins.
Retrogenes (“retrotranscribed”):
Protein coding RNA that was reverse transcribed and inserted back into the genome.
The RNA can be grabbed at any stage (partial/full transcript, before/during/after all introns are spliced).
Repeat Insertions Can “Break Things”
Repeat Insertions Can “Make Things”
Any Sequence Can Become Functional
Random mutation (especially in a large place like our genome) can create functional DNA elements out of neutrally evolving sequences.
So is there anything special about a piece of DNA from a repetitive origin that takes on a new function?
Regulatory elements from Mobile Elements
[Yass is a small town in New South Wales, Australia.]
Co-option event, probably due to favorable genomic context
Britten & Davidson Hypothesis: Repeat to Rewire!
Enhancer structure reminder
The Road to Co-Option
Transposition Event
Random Mutations
Neutral decay
Potential Co-Option States
Assembly Challenges
Inferring Phylogeny Using Repeats
[Nishihara et al, 2006]
Transposons as Genetic Engineering Tools
Human Gene Therapy
Repeats: fun conspiracy theories
1. Repeats wreak so much havoc in the genome, by inserting themselves, deleting segments between instances and more, that they make the genome feel like a “rolling sea”. Maybe it is because of them that enhancers “learned” to work irrespective of distance and orientation?
2. When the last active copy of a repeat dies, all instances of the repeat are now decaying. Wait long enough and they lose resemblance to each other. Look in 200 My and you would never know they belonged to the same repeat family. So… if half the genome is recognizable as repetitive now, how much of the genome originated from repeats? Most of it?
Repeats: fun conspiracy theories
3. If repeats do significantly accelerate the rate of creation of novel functional (gene/regulation) elements – how many functional elements today came from repeats (including old ones we no longer can recognize as such)? Most?
4. Is that why our genome “tolerates” these elements?
5. You make a conspiracy theory…
6. You think of ways* to solve one!
* Computationally. Evolution is mostly computational business.