![Page 1: Molecular Biology of the Genome - biostat.washington.edu · Genome Christine Queitsch Department of Genome Sciences queitsch@uw.edu 1 • Information Flow in Genomics • Gene Structure](https://reader030.fdocuments.in/reader030/viewer/2022020305/5cca847d88c993fa708b6c85/html5/thumbnails/1.jpg)
Molecular Biology of the Genome
Christine Queitsch
Department of Genome Sciences
![Page 2: Molecular Biology of the Genome - biostat.washington.edu · Genome Christine Queitsch Department of Genome Sciences queitsch@uw.edu 1 • Information Flow in Genomics • Gene Structure](https://reader030.fdocuments.in/reader030/viewer/2022020305/5cca847d88c993fa708b6c85/html5/thumbnails/2.jpg)
Outline
• Information Flow in Genomics
• Gene Structure
• Genetic Linkage
• Chromatin Structure
• Genome Sequencing
![Page 3: Molecular Biology of the Genome - biostat.washington.edu · Genome Christine Queitsch Department of Genome Sciences queitsch@uw.edu 1 • Information Flow in Genomics • Gene Structure](https://reader030.fdocuments.in/reader030/viewer/2022020305/5cca847d88c993fa708b6c85/html5/thumbnails/3.jpg)
DNA and the Flow of Information
The genetic material: DNA
- Four kinds of subunits (bases A, C, G, T)
Activities within the cell performed by proteins
- Twenty kinds of subunits (amino acids)
[Figure: a protein drawn as a chain of amino acid residues, from the H3N+ end to the COO- end]
A coding problem
![Page 4: Molecular Biology of the Genome - biostat.washington.edu · Genome Christine Queitsch Department of Genome Sciences queitsch@uw.edu 1 • Information Flow in Genomics • Gene Structure](https://reader030.fdocuments.in/reader030/viewer/2022020305/5cca847d88c993fa708b6c85/html5/thumbnails/4.jpg)
The “Central Dogma” of Molecular Biology
• Information into protein flows one way
• A universal code: 3 nucleotides = 1 amino acid
DNA → RNA (transcription) → Protein (translation) → phenotype
DNA → DNA (replication): heredity
![Page 5: Molecular Biology of the Genome - biostat.washington.edu · Genome Christine Queitsch Department of Genome Sciences queitsch@uw.edu 1 • Information Flow in Genomics • Gene Structure](https://reader030.fdocuments.in/reader030/viewer/2022020305/5cca847d88c993fa708b6c85/html5/thumbnails/5.jpg)
DNA Structure
• Information content is in the sequence of bases along a DNA molecule
  - rules of base pairing: each strand of the double helix has all the info needed to recreate the other strand
• Genetic variation: differences in the base sequence between different individuals
  - why individuals vary in their phenotypes
• Redundancy in the code
  - multiple ways that DNA can specify a single amino acid
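The base-pairing point above can be made concrete with a short sketch. It assumes the standard pairing rules (A with T, G with C) and antiparallel strands; the function name and the example sequence are illustrative and not taken from the slides.

```python
# Minimal sketch: rebuild the partner strand from one strand of the double helix.
# Assumes standard pairing (A-T, G-C); names and the example sequence are illustrative.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(strand: str) -> str:
    """Given one strand written 5'->3', return the partner strand written 5'->3'."""
    # The strands are antiparallel, so read the input backwards while pairing bases.
    return "".join(COMPLEMENT[base] for base in reversed(strand.upper()))

watson = "ATGACTTCAGTAACC"  # hypothetical example strand
crick = reverse_complement(watson)
print(watson, crick)
```

Applying the function twice returns the starting strand, which is the sense in which each strand carries all the information needed to recreate the other.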
![Page 6: Molecular Biology of the Genome - biostat.washington.edu · Genome Christine Queitsch Department of Genome Sciences queitsch@uw.edu 1 • Information Flow in Genomics • Gene Structure](https://reader030.fdocuments.in/reader030/viewer/2022020305/5cca847d88c993fa708b6c85/html5/thumbnails/6.jpg)
Central Dogma: DNA Replication
DNA structure: polarity and base pairing
[Figure: double helix with antiparallel strands labeled Watson and Crick, ends labeled 5’ and 3’]
• A pairs with T
• G pairs with C
DNA replication: what’s the point?
• duplicate the entire genome prior to cell division
• new subunits can only be added to the 3’OH of the growing chain
[Figure: replication fork showing the leading strand and the lagging strand, ends labeled 5’ and 3’]
![Page 7: Molecular Biology of the Genome - biostat.washington.edu · Genome Christine Queitsch Department of Genome Sciences queitsch@uw.edu 1 • Information Flow in Genomics • Gene Structure](https://reader030.fdocuments.in/reader030/viewer/2022020305/5cca847d88c993fa708b6c85/html5/thumbnails/7.jpg)
Central Dogma: Transcription
Genes: specific segments along the chromosomal DNA that code for some function
Transcription: “copy” gene into RNA (to make a specific protein)
[Figure: genes along chromosomal DNA, with labels for promoter, terminator, and the mRNA copied from each gene]
![Page 8: Molecular Biology of the Genome - biostat.washington.edu · Genome Christine Queitsch Department of Genome Sciences queitsch@uw.edu 1 • Information Flow in Genomics • Gene Structure](https://reader030.fdocuments.in/reader030/viewer/2022020305/5cca847d88c993fa708b6c85/html5/thumbnails/8.jpg)
Transcription
Transcription: “copy” gene into RNA to make a specific protein
[Figure: a gene on double-stranded DNA, strands W and C, ends labeled 5’ and 3’; labels: coding or sense strand, template strand, RNA polymerase, mRNA]
• Where’s the 5’ end of the gene? of the mRNA?
• Which way is RNA polymerase moving?
RNA = ribonucleic acid; it uses uracil (U) in place of thymine (T)
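The relationship between the two DNA strands and the mRNA can be sketched as follows. This is an illustration under stated assumptions: both strands are given as plain strings written 5’ to 3’, and the function names are made up for this example.

```python
# Minimal sketch of transcription as string operations (illustrative names).
# Assumes every input strand is written 5'->3'.
RNA_PAIR = {"A": "U", "T": "A", "G": "C", "C": "G"}

def transcribe_coding(coding_strand: str) -> str:
    """The mRNA matches the coding (sense) strand, with U in place of T."""
    return coding_strand.upper().replace("T", "U")

def transcribe_template(template_strand: str) -> str:
    """The mRNA is built by pairing against the template strand.

    The template is read 3'->5', so the 5'->3' input is traversed in reverse.
    """
    return "".join(RNA_PAIR[base] for base in reversed(template_strand.upper()))

coding = "ATGACT"    # hypothetical coding strand, 5'->3'
template = "AGTCAT"  # its partner strand, also written 5'->3'
print(transcribe_coding(coding), transcribe_template(template))
```

Both calls give the same mRNA, which is one way to check which strand of a pair is the template.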
![Page 9: Molecular Biology of the Genome - biostat.washington.edu · Genome Christine Queitsch Department of Genome Sciences queitsch@uw.edu 1 • Information Flow in Genomics • Gene Structure](https://reader030.fdocuments.in/reader030/viewer/2022020305/5cca847d88c993fa708b6c85/html5/thumbnails/9.jpg)
Transcription in vivo
[Figure: a gene being transcribed in vivo; labels: DNA, RNA polymerases, nascent RNA transcripts]
![Page 10: Molecular Biology of the Genome - biostat.washington.edu · Genome Christine Queitsch Department of Genome Sciences queitsch@uw.edu 1 • Information Flow in Genomics • Gene Structure](https://reader030.fdocuments.in/reader030/viewer/2022020305/5cca847d88c993fa708b6c85/html5/thumbnails/10.jpg)
Practice Question
1. Which way (to the right or left) are RNA polymerases moving?
2. Which strand (W or C) is the template strand?
[Figure: a gene on double-stranded DNA, strands W and C, ends labeled 5’ and 3’]
![Page 11: Molecular Biology of the Genome - biostat.washington.edu · Genome Christine Queitsch Department of Genome Sciences queitsch@uw.edu 1 • Information Flow in Genomics • Gene Structure](https://reader030.fdocuments.in/reader030/viewer/2022020305/5cca847d88c993fa708b6c85/html5/thumbnails/11.jpg)
Processing of pre-mRNA
Eukaryotic genes are interrupted by introns (non-coding sequences). The introns must be removed from the RNA before translation in a process called “splicing.”
[Figure: gene → pre-mRNA (exons and introns) → mature mRNA; introns are discarded and exons are spliced together. Labels: ORF, UTRs (untranslated regions)]
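As a toy illustration of splicing, the sketch below joins exon segments and discards everything between them. The sequence and exon coordinates are invented for the example, and real splicing depends on splice-site recognition that is not modeled here.

```python
# Toy sketch of splicing: keep the exons, discard the introns.
# The sequence and the exon coordinates below are hypothetical.
def splice(pre_mrna, exons):
    """Join exon slices (0-based, end-exclusive) in transcript order."""
    return "".join(pre_mrna[start:end] for start, end in sorted(exons))

exon1, intron, exon2 = "AUGGCC", "GUAAGUUUUCAG", "UUUUAA"
pre_mrna = exon1 + intron + exon2
exon_coords = [(0, 6), (18, 24)]  # positions of exon1 and exon2 in pre_mrna

mature_mrna = splice(pre_mrna, exon_coords)
print(mature_mrna)
```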
![Page 12: Molecular Biology of the Genome - biostat.washington.edu · Genome Christine Queitsch Department of Genome Sciences queitsch@uw.edu 1 • Information Flow in Genomics • Gene Structure](https://reader030.fdocuments.in/reader030/viewer/2022020305/5cca847d88c993fa708b6c85/html5/thumbnails/12.jpg)
Review of the Central Dogma: Translation
Translating the nucleic acid code to a peptide code…
Possible coding systems:
• 1 base per amino acid: could only code for 4 amino acids!
• 2 bases per amino acid: could only code for 16 amino acids
• 3 bases per amino acid: 64 possible combinations… that’s plenty!
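The counting argument above is simply 4 raised to the word length. A minimal sketch, with illustrative names:

```python
# Number of distinct "words" of length n over a 4-letter alphabet is 4**n.
BASES = "ACGU"
AMINO_ACIDS_NEEDED = 20

def coding_capacity(word_length: int, alphabet_size: int = len(BASES)) -> int:
    """Return how many distinct words of the given length exist."""
    return alphabet_size ** word_length

for n in (1, 2, 3):
    capacity = coding_capacity(n)
    verdict = "enough" if capacity >= AMINO_ACIDS_NEEDED else "too few"
    print(f"{n} base(s) per amino acid: {capacity} combinations ({verdict})")
```

Three is the smallest word length whose capacity reaches 20, and the surplus (64 versus 20) is what leaves room for redundancy and stop signals.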
![Page 13: Molecular Biology of the Genome - biostat.washington.edu · Genome Christine Queitsch Department of Genome Sciences queitsch@uw.edu 1 • Information Flow in Genomics • Gene Structure](https://reader030.fdocuments.in/reader030/viewer/2022020305/5cca847d88c993fa708b6c85/html5/thumbnails/13.jpg)
The triplet code
• 3 bases = 1 amino acid
• More than 1 triplet can code for the same amino acid
Translation: reads the information in RNA to order the amino acids in a protein
[Figure: an mRNA (5’→3’) read codon by codon, aligned with the protein Met-Phe-Thr-Val-Ser-Thr (NH3+ to COO-)]
![Page 14: Molecular Biology of the Genome - biostat.washington.edu · Genome Christine Queitsch Department of Genome Sciences queitsch@uw.edu 1 • Information Flow in Genomics • Gene Structure](https://reader030.fdocuments.in/reader030/viewer/2022020305/5cca847d88c993fa708b6c85/html5/thumbnails/14.jpg)
Punctuation:
• Start: AUG = methionine, the first amino acid in (almost) all proteins
• Stop: UAA, UAG, and UGA. NOT an amino acid!
[Figure: an mRNA (5’→3’) aligned with the protein Met-Phe-Thr-Val-Ser-Thr (NH3+ to COO-), with the STOP codon marked]
![Page 15: Molecular Biology of the Genome - biostat.washington.edu · Genome Christine Queitsch Department of Genome Sciences queitsch@uw.edu 1 • Information Flow in Genomics • Gene Structure](https://reader030.fdocuments.in/reader030/viewer/2022020305/5cca847d88c993fa708b6c85/html5/thumbnails/15.jpg)
The Genetic Code: Who is the interpreter? Where’s the dictionary? What are the rules of grammar?
tRNA = transfer RNA
• aminoacyl tRNA synthetase joins an amino acid to its tRNA, giving a charged tRNA
• the anticodon of the tRNA recognizes the codon in the mRNA
[Figure: Met-tRNA with anticodon 3’-UAC-5’ base-paired to the codon 5’-AUG-3’]
![Page 16: Molecular Biology of the Genome - biostat.washington.edu · Genome Christine Queitsch Department of Genome Sciences queitsch@uw.edu 1 • Information Flow in Genomics • Gene Structure](https://reader030.fdocuments.in/reader030/viewer/2022020305/5cca847d88c993fa708b6c85/html5/thumbnails/16.jpg)
The ribosome: mediates translation
5’ …AUAUGACUUCAGUAACCAUCUAACA… 3’
• Locates the 1st AUG, sets the reading frame for codon-anticodon base-pairing
After the 1st two tRNAs have bound…
[Figure: ribosome on the mRNA with Met-tRNA (anticodon UAC) and Thr-tRNA (anticodon UGA) paired to the first two codons]
![Page 17: Molecular Biology of the Genome - biostat.washington.edu · Genome Christine Queitsch Department of Genome Sciences queitsch@uw.edu 1 • Information Flow in Genomics • Gene Structure](https://reader030.fdocuments.in/reader030/viewer/2022020305/5cca847d88c993fa708b6c85/html5/thumbnails/17.jpg)
5’ …AUAUGACUUCAGUAACCAUCUAACA… 3’
the ribosome breaks the Met-tRNA bond; Met is instead joined to the second amino acid
[Figure: ribosome with Met-tRNA (anticodon UAC) in the P-site and Thr-tRNA (anticodon UGA) in the A-site]
![Page 18: Molecular Biology of the Genome - biostat.washington.edu · Genome Christine Queitsch Department of Genome Sciences queitsch@uw.edu 1 • Information Flow in Genomics • Gene Structure](https://reader030.fdocuments.in/reader030/viewer/2022020305/5cca847d88c993fa708b6c85/html5/thumbnails/18.jpg)
5’ …AUAUGACUUCAGUAACCAUCUAACA… 3’
the ribosome breaks the Met-tRNA bond; Met is instead joined to the second amino acid
…and the now-empty tRNA that carried Met is released
…then ribosome moves over by 1 codon in the 3’ direction
[Figure: ribosome with the tRNA for Met (anticodon UAC) leaving and Thr-tRNA (anticodon UGA) holding the growing chain]
![Page 19: Molecular Biology of the Genome - biostat.washington.edu · Genome Christine Queitsch Department of Genome Sciences queitsch@uw.edu 1 • Information Flow in Genomics • Gene Structure](https://reader030.fdocuments.in/reader030/viewer/2022020305/5cca847d88c993fa708b6c85/html5/thumbnails/19.jpg)
5’ …AUAUGACUUCAGUAACCAUCUAACA… 3’
[Figure: the growing chain Met-Thr on the tRNA with anticodon UGA; the next tRNA (anticodon AGU) brings in Ser]
![Page 20: Molecular Biology of the Genome - biostat.washington.edu · Genome Christine Queitsch Department of Genome Sciences queitsch@uw.edu 1 • Information Flow in Genomics • Gene Structure](https://reader030.fdocuments.in/reader030/viewer/2022020305/5cca847d88c993fa708b6c85/html5/thumbnails/20.jpg)
5’ …AUAUGACUUCAGUAACCAUCUAACA… 3’
When the ribosome reaches the Stop codon… termination
[Figure: completed chain Met-Thr-Ser-Val-Thr-Ile on the last tRNA (anticodon UAG), with the ribosome at the STOP codon]
![Page 21: Molecular Biology of the Genome - biostat.washington.edu · Genome Christine Queitsch Department of Genome Sciences queitsch@uw.edu 1 • Information Flow in Genomics • Gene Structure](https://reader030.fdocuments.in/reader030/viewer/2022020305/5cca847d88c993fa708b6c85/html5/thumbnails/21.jpg)
5’ …AUAUGACUUCAGUAACCAUCUAACA… 3’
The finished peptide!
NH3+ (N-terminus) Met-Thr-Ser-Val-Thr-Ile COO- (C-terminus)
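The walk-through above can be checked with a short virtual translation. This sketch assumes the standard genetic code, builds the codon table from the usual compact UCAG ordering, and reports amino acids as one-letter codes; the function name is illustrative. Under the standard code the sixth codon, AUC, specifies isoleucine (I).

```python
from itertools import product

# Standard genetic code in compact form: codons enumerated with bases in the
# order U, C, A, G (third position varying fastest); '*' marks a stop codon.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(mrna: str) -> str:
    """Translate from the first AUG to the first in-frame stop codon."""
    start = mrna.find("AUG")  # the first AUG sets the reading frame
    if start == -1:
        return ""
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":  # stop codon: not an amino acid
            break
        peptide.append(amino_acid)
    return "".join(peptide)

print(translate("AUAUGACUUCAGUAACCAUCUAACA"))  # one-letter codes: M, T, S, V, T, I
```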
![Page 22: Molecular Biology of the Genome - biostat.washington.edu · Genome Christine Queitsch Department of Genome Sciences queitsch@uw.edu 1 • Information Flow in Genomics • Gene Structure](https://reader030.fdocuments.in/reader030/viewer/2022020305/5cca847d88c993fa708b6c85/html5/thumbnails/22.jpg)
Practice Question
Which strand on the DNA sequence is the coding (sense) strand? How can you tell?
![Page 23: Molecular Biology of the Genome - biostat.washington.edu · Genome Christine Queitsch Department of Genome Sciences queitsch@uw.edu 1 • Information Flow in Genomics • Gene Structure](https://reader030.fdocuments.in/reader030/viewer/2022020305/5cca847d88c993fa708b6c85/html5/thumbnails/23.jpg)
Finding Sense in Nonsense
cbdryloiaucahjdhtheflybitthedogbutnotthecatjhhajctipheq
GGGTATAGAAAATGAATATAAACTCATAGACAAGATCGGTGAGGGAACATTTTCGTCAGTGTATAAAGCCAAAGATATCACTGGGAAAATAACAAAAAAATTTGCATCACATTTTTGGAATTATGGTTCGAACTATGTTGCTTTGAAGAAAATATACGTTACCTCGTCACCGCAAAGAATTTATAATGAGCTCAACCTGCTGTACATAATGACGGGATCTTCGAGAGTAGCCCCTCTATGTGATGCAAAAAGGGTGCGAGATCAAGTCATTGCTGTTTTACCGTACTATCCCCACGAGGAGTTCCGAACTTTCTACAGGGATCTACCAATCAAGGGAATCAAGAAGTACATTTGGGAGCTACTAAGAGCATTGAAGTTTGTTCATTCGAAGGGAATTATTCATAGAGACATCAAACCGACAAATTTTTTATTTAATTTGGAATTGGGGCGTGGAGTGCTTGTTGATTTTGGTCTAGCCGAGGCTCAAATGGATTATAAAAGCATGATATCTAGTCAAAACGATTACGACAATTATGCAAATACAAACCATGATGGTGGATATTCAATGAGGAATCACGAACAATTTTGTCCATGCATTATGCGTAATCAATATTCTCCTAACTCACATAACCAAACACCTCCTATGGTCACCATACAAAATGGCAAGGTCGTCCACTTAAACAATGTAAATGGGGTGGATCTGACAAAGGGTTATCCTAAAAATGAAACGCGTAGAATTAAAAGGGCTAATAGAGCAGGGACTCGTGGATTTCGGGCACCAGAAGTGTTAATGAAGTGTGGGGCTCAAAGCACAAAGATTGATATATGGTCCGTAGGTGTTATTCTTTTAAGTCTTTTGGGCAGAAGATTTCCAATGTTCCAAAGTTTAGATGATGCGGATTCTTTGCTAGAGTTATGTACTATTTTTGGTTGGAAAGAATTAAGAAAATGCGCAGCGTTGCATGGATTGGGTTTCGAAGCTAGTGGGCTCATTTGGGATAAACCAAACGGATATTCTAATGGATTGAAGGAATTTGTTTATGATTTGCTTAATAAAGAATGTACCATAGGTACGTTCCCTGAGTACAGTGTTGCTTTTGAAACATTCGGATTTCTACAACAAGAATTACATGACAGGATGTCCATTGAACCTCAATTACCTGACCCCAAGACAAATATGGATGCTGTTGATGCCTATGAGTTGAAAAAGTATCAAGAAGAAATTTGGTCCGATCATTATTGGTGCTTCCAGGTTTTGGAACAATGCTTCGAAATGGATCCTCAAAAGCGTAGTTCAGCAGAAGATTTACTGAAAACCCCGTTTTTCAATGAATTGAATGAAAACACATATTTACTGGATGGCGAGAGTACTGACGAAGATGACGTTGTCAGCTCAAGCGAGGCAGATTTGCTCGATAAGGATGTTCT
How do you find out if a sequence contains a gene? How do you identify the gene?
![Page 24: Molecular Biology of the Genome - biostat.washington.edu · Genome Christine Queitsch Department of Genome Sciences queitsch@uw.edu 1 • Information Flow in Genomics • Gene Structure](https://reader030.fdocuments.in/reader030/viewer/2022020305/5cca847d88c993fa708b6c85/html5/thumbnails/24.jpg)
Reading Frame: the ribosome establishes the grouping of nucleotides that correspond to codons by the first AUG encountered.
ORF: open reading frame, from the first AUG to the first in-frame stop. The ORF encodes the information for the protein.
5’ …AUAUGACUUCAGUAACCAUCUAACA… 3’
Starts counting triplets from this base (the A of the first AUG)
More generally: a reading frame with a stretch of codons not interrupted by stop – non-coding RNAs!
![Page 25: Molecular Biology of the Genome - biostat.washington.edu · Genome Christine Queitsch Department of Genome Sciences queitsch@uw.edu 1 • Information Flow in Genomics • Gene Structure](https://reader030.fdocuments.in/reader030/viewer/2022020305/5cca847d88c993fa708b6c85/html5/thumbnails/25.jpg)
Looking for ORFs
- read the sequence 5’→3’, looking for stop
- try each reading frame
- since we know the genetic code, can do a virtual translation if necessary
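The search described above can be sketched as a scan of the three forward reading frames for an ATG followed by an in-frame stop. This is a simplified illustration, not a gene finder: it looks at one strand only (the other strand would need the reverse complement), ignores introns, and the function name and `min_codons` parameter are assumptions made for the example.

```python
# Minimal ORF scan on one DNA strand, written 5'->3'.
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(dna, min_codons=2):
    """Return (frame, start, end) for each ATG...stop stretch; end is exclusive.

    min_codons counts the codons from ATG up to, but not including, the stop.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):  # try each reading frame
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None  # resume looking for the next ATG
    return orfs

# DNA version of the example mRNA used in the translation slides
example = "ATATGACTTCAGTAACCATCTAACA"
for frame, start, end in find_orfs(example):
    print(frame, start, end, example[start:end])
```

On a real sequence, a long ORF is only a hint that a gene may be present, and raising `min_codons` filters out short stretches that arise by chance.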
How to identify genes experimentally?
![Page 26: Molecular Biology of the Genome - biostat.washington.edu · Genome Christine Queitsch Department of Genome Sciences queitsch@uw.edu 1 • Information Flow in Genomics • Gene Structure](https://reader030.fdocuments.in/reader030/viewer/2022020305/5cca847d88c993fa708b6c85/html5/thumbnails/26.jpg)
Outline
• Information Flow in Genomics
• Gene Structure
• Genetic Linkage
• Chromatin Structure
• Genome Sequencing
![Page 27: Molecular Biology of the Genome - biostat.washington.edu · Genome Christine Queitsch Department of Genome Sciences queitsch@uw.edu 1 • Information Flow in Genomics • Gene Structure](https://reader030.fdocuments.in/reader030/viewer/2022020305/5cca847d88c993fa708b6c85/html5/thumbnails/27.jpg)
Gene Structure: The Parts List
Genomic DNA for a protein-coding eukaryotic gene is comprised of regulatory and coding sequences
[Figure: gene diagram with labels: Enhancer, Promoter, 5’ UTR, Exons, Introns, 3’ UTR, and CRMs marked along the DNA]
• Promoter: proximal regulatory element
• Enhancer: distal regulatory element
• CRM (cis-regulatory motif): can be upstream or downstream of promoter, proximal or distal
![Page 28: Molecular Biology of the Genome - biostat.washington.edu · Genome Christine Queitsch Department of Genome Sciences queitsch@uw.edu 1 • Information Flow in Genomics • Gene Structure](https://reader030.fdocuments.in/reader030/viewer/2022020305/5cca847d88c993fa708b6c85/html5/thumbnails/28.jpg)
Promoters
• Promoters are specific sites on DNA where RNA polymerase first binds to initiate transcription of a gene
• Composed of a variety of different cis-sequence elements which recruit trans-acting factors through DNA-protein interactions
![Page 29: Molecular Biology of the Genome - biostat.washington.edu · Genome Christine Queitsch Department of Genome Sciences queitsch@uw.edu 1 • Information Flow in Genomics • Gene Structure](https://reader030.fdocuments.in/reader030/viewer/2022020305/5cca847d88c993fa708b6c85/html5/thumbnails/29.jpg)
Core Promoter Elements
[Figure: gene diagram (Enhancer, Promoter, 5’ UTR, Exons, Introns, 3’ UTR) with the promoter expanded to show the core elements BRE, TATA box, and initiator (inr) with their consensus sequences; position labels ~-50, ~-30, and +1]
- not all elements required
- many promoters lack a TATA box, using instead the functionally analogous initiator (inr) element
![Page 30: Molecular Biology of the Genome - biostat.washington.edu · Genome Christine Queitsch Department of Genome Sciences queitsch@uw.edu 1 • Information Flow in Genomics • Gene Structure](https://reader030.fdocuments.in/reader030/viewer/2022020305/5cca847d88c993fa708b6c85/html5/thumbnails/30.jpg)
Combinatorial Gene Regulation
• Most eukaryotic genes have multiple cis regulatory motifs located outside of the core promoter region
• Can be located in promoter proximal regions, 3’ downstream regions, and many kb away from target gene
• Allows for combinatorial control of gene expression
![Page 31: Molecular Biology of the Genome - biostat.washington.edu · Genome Christine Queitsch Department of Genome Sciences queitsch@uw.edu 1 • Information Flow in Genomics • Gene Structure](https://reader030.fdocuments.in/reader030/viewer/2022020305/5cca847d88c993fa708b6c85/html5/thumbnails/31.jpg)
Distal regulatory elements: Enhancers
Enhancers:
- Can function in either orientation
- Can occur far (>50 kb) from the gene
- Can be upstream or downstream
- Range in size from ~50 to 200 bp
- Contain multiple TF binding sites
“Enhanceosome”
[Figure source: http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=mcb.figgrp.2601]
![Page 32: Molecular Biology of the Genome - biostat.washington.edu · Genome Christine Queitsch Department of Genome Sciences queitsch@uw.edu 1 • Information Flow in Genomics • Gene Structure](https://reader030.fdocuments.in/reader030/viewer/2022020305/5cca847d88c993fa708b6c85/html5/thumbnails/32.jpg)
Untranslated Regions (UTRs)
[Figure: transcript diagram with labels: 5’ UTR, Exon, Exon, 3’ UTR]
• Most eukaryotic mRNAs contain untranslated regions at their 5’ and 3’ ends
• The 5’ UTR is the region between the start of transcription and the start of translation
• The 3’ UTR is the region between the stop codon and poly-A tail
• Both the 5’ and 3’ UTRs can contain cis regulatory sequences that bind TFs, influence transport to the cytoplasm, and mediate transcript stability and translational control
![Page 33: Molecular Biology of the Genome - biostat.washington.edu · Genome Christine Queitsch Department of Genome Sciences queitsch@uw.edu 1 • Information Flow in Genomics • Gene Structure](https://reader030.fdocuments.in/reader030/viewer/2022020305/5cca847d88c993fa708b6c85/html5/thumbnails/33.jpg)
Alternative Splicing
• Pre-mRNA from some genes can be spliced into two or more distinct transcripts
• Creates protein diversity (isoforms)
[Figure: alternative splicing diagram with the 5’ splice site and 3’ splice site labeled]
![Page 34: Molecular Biology of the Genome - biostat.washington.edu · Genome Christine Queitsch Department of Genome Sciences queitsch@uw.edu 1 • Information Flow in Genomics • Gene Structure](https://reader030.fdocuments.in/reader030/viewer/2022020305/5cca847d88c993fa708b6c85/html5/thumbnails/34.jpg)
Outline
• Information Flow in Genomics
• Gene Structure
• Genetic Linkage
• Chromatin Structure
• Genome Sequencing
![Page 35: Molecular Biology of the Genome - biostat.washington.edu · Genome Christine Queitsch Department of Genome Sciences queitsch@uw.edu 1 • Information Flow in Genomics • Gene Structure](https://reader030.fdocuments.in/reader030/viewer/2022020305/5cca847d88c993fa708b6c85/html5/thumbnails/35.jpg)
Transmission of Genetic Information
Elements of cell division:
• Cell growth
• Chromosome duplication
• Chromosome segregation
[Figure: cell-cycle diagram with labels: chromosomes condensed, chromosomes decondensed, diploid (2N), 1N]
![Page 36: Molecular Biology of the Genome - biostat.washington.edu · Genome Christine Queitsch Department of Genome Sciences queitsch@uw.edu 1 • Information Flow in Genomics • Gene Structure](https://reader030.fdocuments.in/reader030/viewer/2022020305/5cca847d88c993fa708b6c85/html5/thumbnails/36.jpg)
Meiosis
Interphase: Chromosomes replicate
Meiosis I: Reductive division, homologous chromosomes separate
Meiosis II: Sister chromatids separate
![Page 37: Molecular Biology of the Genome - biostat.washington.edu · Genome Christine Queitsch Department of Genome Sciences queitsch@uw.edu 1 • Information Flow in Genomics • Gene Structure](https://reader030.fdocuments.in/reader030/viewer/2022020305/5cca847d88c993fa708b6c85/html5/thumbnails/37.jpg)
Recombination
![Page 38: Molecular Biology of the Genome - biostat.washington.edu · Genome Christine Queitsch Department of Genome Sciences queitsch@uw.edu 1 • Information Flow in Genomics • Gene Structure](https://reader030.fdocuments.in/reader030/viewer/2022020305/5cca847d88c993fa708b6c85/html5/thumbnails/38.jpg)
How Does Distance Between Loci Affect Transmission?
Independent Assortment: loci are unlinked or far enough apart that they are transmitted independently from one another
Genetic linkage: loci are close enough together on a chromosome to be transmitted together
![Page 39: Molecular Biology of the Genome - biostat.washington.edu · Genome Christine Queitsch Department of Genome Sciences queitsch@uw.edu 1 • Information Flow in Genomics • Gene Structure](https://reader030.fdocuments.in/reader030/viewer/2022020305/5cca847d88c993fa708b6c85/html5/thumbnails/39.jpg)
Genetic Mapping
The frequency of recombination between two loci depends on the distance between them: the farther apart they are, the more often they recombine
![Page 40: Molecular Biology of the Genome - biostat.washington.edu · Genome Christine Queitsch Department of Genome Sciences queitsch@uw.edu 1 • Information Flow in Genomics • Gene Structure](https://reader030.fdocuments.in/reader030/viewer/2022020305/5cca847d88c993fa708b6c85/html5/thumbnails/40.jpg)
Recombination Is A Measure of Distance
• Recombination fraction, θ = the probability that a recombinant gamete is transmitted
• If two loci are on different chromosomes, they will segregate independently
=> recombination fraction θ = 0.5
• If two loci are right next to each other, they will segregate together during meiosis
=> recombination fraction θ = 0
• Jargon:
θ < 0.5: the loci are close (they are linked)
θ = 0.5: the loci are far apart (they are not linked)
![Page 41: Molecular Biology of the Genome - biostat.washington.edu · Genome Christine Queitsch Department of Genome Sciences queitsch@uw.edu 1 • Information Flow in Genomics • Gene Structure](https://reader030.fdocuments.in/reader030/viewer/2022020305/5cca847d88c993fa708b6c85/html5/thumbnails/41.jpg)
Recombination Is A Measure of Distance
$$\text{Map distance} = \frac{\text{number of recombinant gametes}}{\text{total number of gametes}} \times 100$$
Centimorgan (cM): a unit of chromosome length, equal to the length of chromosome over which crossing-over occurs with 1% frequency
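A small sketch of the map-distance calculation, using made-up gamete counts. The function names are illustrative, and treating the recombination percentage directly as map distance is an approximation that works best for loci that are fairly close together.

```python
# Recombination fraction and map distance from gamete counts (hypothetical data).
def recombination_fraction(recombinants: int, total: int) -> float:
    """Fraction of gametes that are recombinant between two loci."""
    if total <= 0:
        raise ValueError("total number of gametes must be positive")
    return recombinants / total

def map_distance_cm(recombinants: int, total: int) -> float:
    """Map distance in centimorgans: percent recombinant gametes."""
    return 100.0 * recombination_fraction(recombinants, total)

def linkage_status(fraction: float) -> str:
    """Apply the jargon from the slides: below 0.5 means linked."""
    return "linked" if fraction < 0.5 else "not linked"

fraction = recombination_fraction(12, 200)  # 12 recombinants out of 200 (made up)
print(fraction, map_distance_cm(12, 200), linkage_status(fraction))
```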
![Page 42: Molecular Biology of the Genome - biostat.washington.edu · Genome Christine Queitsch Department of Genome Sciences queitsch@uw.edu 1 • Information Flow in Genomics • Gene Structure](https://reader030.fdocuments.in/reader030/viewer/2022020305/5cca847d88c993fa708b6c85/html5/thumbnails/42.jpg)
Practice Question
• In maize, consider three recessive phenotypes: lazy growth (ll), glossy leaves (gg), and sugary endosperm (ss).
• The following cross was made: Ll Gg Ss x ll gg ss. The observed progeny distribution is shown below (neither gene order nor linkage phase is known).

| Phenotype | Number |
|---|---|
| wildtype | 286 |
| lazy | 33 |
| glossy | 59 |
| sugary | 4 |
| lazy, glossy | 2 |
| lazy, sugary | 44 |
| glossy, sugary | 40 |
| lazy, glossy, sugary | 272 |
| Total | 740 |

• Determine order and distances among the three genes
![Page 43: Molecular Biology of the Genome - biostat.washington.edu · Genome Christine Queitsch Department of Genome Sciences queitsch@uw.edu 1 • Information Flow in Genomics • Gene Structure](https://reader030.fdocuments.in/reader030/viewer/2022020305/5cca847d88c993fa708b6c85/html5/thumbnails/43.jpg)
Where to begin?
Rule 1: Two most-frequent gamete types are the parental types
Parental types will constitute ≥ 50% of all progeny, so…
L G S / l g s x l g s / l g s
(L G S: wild-type for all; l g s: lazy, glossy, sugary)
![Page 44: Molecular Biology of the Genome - biostat.washington.edu · Genome Christine Queitsch Department of Genome Sciences queitsch@uw.edu 1 • Information Flow in Genomics • Gene Structure](https://reader030.fdocuments.in/reader030/viewer/2022020305/5cca847d88c993fa708b6c85/html5/thumbnails/44.jpg)
Cross: L G S // l g s x l g s // l g s

| Progeny Phenotype | Progeny Genotype | Number |
|---|---|---|
| wildtype | L G S // l g s | 286 |
| lazy | l G S // l g s | 33 |
| glossy | L g S // l g s | 59 |
| sugary | L G s // l g s | 4 |
| lazy, glossy | l g S // l g s | 2 |
| lazy, sugary | l G s // l g s | 44 |
| glossy, sugary | L g s // l g s | 40 |
| lazy, glossy, sugary | l g s // l g s | 272 |
| Total | | 740 |
![Page 45: Molecular Biology of the Genome - biostat.washington.edu · Genome Christine Queitsch Department of Genome Sciences queitsch@uw.edu 1 • Information Flow in Genomics • Gene Structure](https://reader030.fdocuments.in/reader030/viewer/2022020305/5cca847d88c993fa708b6c85/html5/thumbnails/45.jpg)
Linkage phase in heterozygous parent?
• L G S / l g s or L g S / l G s or l g S / L G s or L g s / l G S
![Page 46: Molecular Biology of the Genome - biostat.washington.edu · Genome Christine Queitsch Department of Genome Sciences queitsch@uw.edu 1 • Information Flow in Genomics • Gene Structure](https://reader030.fdocuments.in/reader030/viewer/2022020305/5cca847d88c993fa708b6c85/html5/thumbnails/46.jpg)
Rule 2
• The double-recombinant gametes will be the two least frequent types.
A B C
a b c

| Progeny Phenotype | Progeny Genotype | Number |
|---|---|---|
| wildtype | L G S / l g s | 286 |
| lazy | l G S / l g s | 33 |
| glossy | L g S / l g s | 59 |
| sugary | L G s / l g s | 4 |
| lazy, glossy | l g S / l g s | 2 |
| lazy, sugary | l G s / l g s | 44 |
| glossy, sugary | L g s / l g s | 40 |
| lazy, glossy, sugary | l g s / l g s | 272 |
| Total | | 740 |
![Page 47: Molecular Biology of the Genome - biostat.washington.edu · Genome Christine Queitsch Department of Genome Sciences queitsch@uw.edu 1 • Information Flow in Genomics • Gene Structure](https://reader030.fdocuments.in/reader030/viewer/2022020305/5cca847d88c993fa708b6c85/html5/thumbnails/47.jpg)
Rule 3
• Effect of double crossovers is to interchange the members of the middle pair of alleles between the chromosomes
A B C
a b c
A b C
a B c
![Page 48: Molecular Biology of the Genome - biostat.washington.edu · Genome Christine Queitsch Department of Genome Sciences queitsch@uw.edu 1 • Information Flow in Genomics • Gene Structure](https://reader030.fdocuments.in/reader030/viewer/2022020305/5cca847d88c993fa708b6c85/html5/thumbnails/48.jpg)
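Parental types:
• L G S and l g s
Double-crossover types:
• L G s and l g S
Which gene is in the middle? Only the S/s alleles differ between the parental and double-crossover types, so S is in the middle:
Parental order:
L S G
l s g
After a double crossover:
L s G
l S g
Now you know linkage phase of heterozygous parent and gene order…how far apart are these genes?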
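Rules 1 to 3 can be applied mechanically to the progeny counts. The sketch below is one way to do it, assuming each gamete from the heterozygous parent is written as a three-letter string in the arbitrary order L, G, S; the variable and function names are illustrative.

```python
# Gametes contributed by the heterozygous parent, with observed progeny counts.
counts = {
    "LGS": 286, "lGS": 33, "LgS": 59, "LGs": 4,
    "lgS": 2, "lGs": 44, "Lgs": 40, "lgs": 272,
}

ranked = sorted(counts, key=counts.get, reverse=True)
parentals = ranked[:2]     # Rule 1: the two most frequent types
double_cos = ranked[-2:]   # Rule 2: the two least frequent types

def middle_gene(parental_types, double_co):
    """Rule 3: a double crossover swaps only the middle locus.

    A double-crossover gamete differs from one of the parental gametes at
    exactly one locus; that locus is the middle gene.
    """
    for parental in parental_types:
        diffs = [i for i in range(3) if parental[i] != double_co[i]]
        if len(diffs) == 1:
            return parental[diffs[0]].upper()
    raise ValueError("no parental type differs at exactly one locus")

print(parentals, double_cos, middle_gene(parentals, double_cos[0]))
```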
![Page 49: Molecular Biology of the Genome - biostat.washington.edu · Genome Christine Queitsch Department of Genome Sciences queitsch@uw.edu 1 • Information Flow in Genomics • Gene Structure](https://reader030.fdocuments.in/reader030/viewer/2022020305/5cca847d88c993fa708b6c85/html5/thumbnails/49.jpg)
Count the cross-overs between adjacent genes
• In parents, L allele on same homolog as S and l on same homolog as s. So if these get broken up ---> cross-over between L and S loci
• In parents, S on same homolog as G and s on same homolog as g. If these get broken up --> recombination between S and G loci
L S G
l s g
![Page 50: Molecular Biology of the Genome - biostat.washington.edu · Genome Christine Queitsch Department of Genome Sciences queitsch@uw.edu 1 • Information Flow in Genomics • Gene Structure](https://reader030.fdocuments.in/reader030/viewer/2022020305/5cca847d88c993fa708b6c85/html5/thumbnails/50.jpg)
Rule 4: Reciprocal products expected to occur in approximately equal numbers
• LGS ≈ lgs (286 ≈ 272)
• LgS ≈ lGs (59 ≈ 44)
• Lgs ≈ lGS (40 ≈ 33)
• LGs ≈ lgS (4 ≈ 2)
| Progeny Phenotype | Progeny Genotype | # |
|---|---|---|
| wildtype | L G S / l g s | 286 |
| lazy | l G S / l g s | 33 |
| glossy | L g S / l g s | 59 |
| sugary | L G s / l g s | 4 |
| lazy, glossy | l g S / l g s | 2 |
| lazy, sugary | l G s / l g s | 44 |
| glossy, sugary | L g s / l g s | 40 |
| lazy, glossy, sugary | l g s / l g s | 272 |
| Total | | 740 |
![Page 51: Molecular Biology of the Genome - biostat.washington.edu · Genome Christine Queitsch Department of Genome Sciences queitsch@uw.edu 1 • Information Flow in Genomics • Gene Structure](https://reader030.fdocuments.in/reader030/viewer/2022020305/5cca847d88c993fa708b6c85/html5/thumbnails/51.jpg)
Rec Freq L-S: l G S 33 + L g s 40 + L G s 4 + l g S 2 = 79
Rec Freq S-G: L g S 59 + l G s 44 + L G s 4 + l g S 2 = 109

| Progeny Phenotype | Progeny Genotype | # | Crossover or Non-Crossover? |
|---|---|---|---|
| wildtype | L G S / l g s | 286 | Parental (NCO) |
| lazy | l G S / l g s | 33 | single CO between L and S |
| glossy | L g S / l g s | 59 | single CO between S and G |
| sugary | L G s / l g s | 4 | double CO |
| lazy, glossy | l g S / l g s | 2 | double CO |
| lazy, sugary | l G s / l g s | 44 | single CO between S and G |
| glossy, sugary | L g s / l g s | 40 | single CO between L and S |
| lazy, glossy, sugary | l g s / l g s | 272 | Parental (NCO) |
| Total | | 740 | |
![Page 52: Molecular Biology of the Genome - biostat.washington.edu · Genome Christine Queitsch Department of Genome Sciences queitsch@uw.edu 1 • Information Flow in Genomics • Gene Structure](https://reader030.fdocuments.in/reader030/viewer/2022020305/5cca847d88c993fa708b6c85/html5/thumbnails/52.jpg)
• 79/740 or 10.7% of gametes are recombinant between L & S, so the distance between L & S = 10.7 map units
• 109/740 or 14.7% of gametes are recombinant between S & G, so the distance between S & G = 14.7 map units
Rec Freq L-S: l G S 33 + L g s 40 + L G s 4 + l g S 2 = 79
Rec Freq S-G: L g S 59 + l G s 44 + L G s 4 + l g S 2 = 109
Map:
L ---- 10.7 mu ---- S ---- 14.7 mu ---- G
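The tallies and map distances can be recomputed directly from the progeny counts. This sketch assumes the linkage phase worked out in the preceding slides (L, G, and S on one homolog, l, g, and s on the other), so a gamete is recombinant between two loci when their alleles differ in case; the names are illustrative.

```python
# Gametes from the heterozygous parent, letters in the order L, G, S.
counts = {
    "LGS": 286, "lGS": 33, "LgS": 59, "LGs": 4,
    "lgS": 2, "lGs": 44, "Lgs": 40, "lgs": 272,
}
L, G, S = 0, 1, 2  # positions of each locus in the gamete strings
total = sum(counts.values())

def is_recombinant(gamete, i, j):
    """With parental phase LGS / lgs, mixed case between two loci means a crossover."""
    return gamete[i].isupper() != gamete[j].isupper()

def recombinants_between(i, j):
    """Total progeny whose gamete is recombinant between loci i and j."""
    return sum(n for gamete, n in counts.items() if is_recombinant(gamete, i, j))

ls = recombinants_between(L, S)  # double crossovers are counted in both intervals
sg = recombinants_between(S, G)
print(f"L-S: {ls}/{total} = {100 * ls / total:.1f} map units")
print(f"S-G: {sg}/{total} = {100 * sg / total:.1f} map units")
```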
![Page 53: Molecular Biology of the Genome - biostat.washington.edu · Genome Christine Queitsch Department of Genome Sciences queitsch@uw.edu 1 • Information Flow in Genomics • Gene Structure](https://reader030.fdocuments.in/reader030/viewer/2022020305/5cca847d88c993fa708b6c85/html5/thumbnails/53.jpg)
Outline
• Information Flow in Genomics
• Gene Structure
• Genetic Linkage
• Chromatin Structure
• Genome Sequencing
![Page 54: Molecular Biology of the Genome - biostat.washington.edu · Genome Christine Queitsch Department of Genome Sciences queitsch@uw.edu 1 • Information Flow in Genomics • Gene Structure](https://reader030.fdocuments.in/reader030/viewer/2022020305/5cca847d88c993fa708b6c85/html5/thumbnails/54.jpg)
Chromosome Structure: Coils of Coils of Coils…
[Figure: levels of chromosome packing, from the nucleosome up to the condensed chromosome at mitosis]
Local unpacking of chromatin allows gene expression and replication
![Page 55: Molecular Biology of the Genome - biostat.washington.edu · Genome Christine Queitsch Department of Genome Sciences queitsch@uw.edu 1 • Information Flow in Genomics • Gene Structure](https://reader030.fdocuments.in/reader030/viewer/2022020305/5cca847d88c993fa708b6c85/html5/thumbnails/55.jpg)
Nucleosomes
• ~146 bp of DNA wrapped around nucleosome
• ~80 bp linker
• histone octamer
![Page 56: Molecular Biology of the Genome - biostat.washington.edu · Genome Christine Queitsch Department of Genome Sciences queitsch@uw.edu 1 • Information Flow in Genomics • Gene Structure](https://reader030.fdocuments.in/reader030/viewer/2022020305/5cca847d88c993fa708b6c85/html5/thumbnails/56.jpg)
Histone Modification and Chromatin Activity
• modifications change interaction with DNA and trans-factors
• can activate or repress transcription
• reinforce regulatory patterns set up by TFs
What Do These Modifications Do? A Histone Code?
Carey et al. Cell (2007) 128:707
“Distinct histone modifications, on one or more tails, act sequentially or in combination to form a ‘histone code’ that is read by other proteins to bring about distinct downstream events” (Strahl and Allis, 2000, Nature, 403:41)
DNA Sequencing Technology
• Sanger sequencing
• Next-Generation
• 3rd and 4th Generation
Genome Sequencing: Hierarchical Shotgun Sequencing
• Shear genomic DNA into smaller pieces and subclone into library (such as BACs, Cosmids, etc.)
• Create physical map
• Shotgun sequence each BAC from the minimal tiling path (shearing of each ~150 kb BAC clone into ~2 kb fragments)
• Data from linkage and physical maps used to assemble sequence maps of chromosomes
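Choosing a minimal tiling path from a physical map can be viewed as an interval-cover problem. The Python sketch below uses a standard greedy strategy; the clone names and kb coordinates are hypothetical, and this is not necessarily how any particular genome project selected its clones.

```python
def minimal_tiling_path(clones, region_start, region_end):
    """Greedy interval cover.

    clones: list of (name, start, end) positions on the physical map.
    At each step, among clones starting at or before the covered position,
    pick the one that reaches farthest to the right.
    """
    path = []
    covered = region_start
    ordered = sorted(clones, key=lambda c: c[1])
    i = 0
    while covered < region_end:
        best = None
        while i < len(ordered) and ordered[i][1] <= covered:
            if best is None or ordered[i][2] > best[2]:
                best = ordered[i]
            i += 1
        if best is None or best[2] <= covered:
            raise ValueError(f"gap in clone coverage at position {covered}")
        path.append(best)
        covered = best[2]
    return path


# Hypothetical BAC clones, coordinates in kb along a 400 kb region.
bac_clones = [
    ("BAC_A", 0, 150), ("BAC_B", 40, 190), ("BAC_C", 120, 270),
    ("BAC_D", 140, 290), ("BAC_E", 260, 410), ("BAC_F", 280, 400),
]
tiling_path = minimal_tiling_path(bac_clones, 0, 400)
print([name for name, _, _ in tiling_path])
```

Only the clones on the returned path would then be shotgun sequenced; the remaining clones are redundant for coverage.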
Genome Sequencing: Whole Genome Shotgun Sequencing
• Whole genome randomly sheared three times
– Plasmid library constructed with ~2 kb inserts
– Plasmid library with ~10 kb inserts
– BAC library with ~200 kb inserts
• Computer program assembles sequences into chromosomes
• No physical map construction
• Only one BAC library
• Overcomes problems of repeat sequences… only not really
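To make the assembly step concrete, here is a toy greedy overlap assembler in Python. It is a teaching sketch under strong assumptions (error-free reads, no repeats, a made-up 24-base "genome") and not the algorithm used by any production assembler.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs)
                   if k not in (best_i, best_j)] + [merged]


# Hypothetical genome and four overlapping reads, given in shuffled order.
toy_genome = "GATTACAGGCTTCAAGTCCGATGA"
toy_reads = ["CAAGTCCGAT", "GATTACAGGC", "TCCGATGA", "AGGCTTCAAG"]
assembled = greedy_assemble(toy_reads)
print(assembled)
```

When a repeat is longer than the reads, several merges look equally good and a greedy assembler can join the wrong pieces, which is one way to read the "only not really" caveat above.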
Next-Generation Sequencing Technology
• Illumina HiSeq:
– 4 billion reads per flow cell × 100 bases, paired = 400 Gbp
– 8 samples per flow cell = 50 Gbp each (one human genome = 3 Gbp)
– Reagent cost ~$8K per run
Updated: HiSeq 3000/4000 SBS Kits enable up to 1500 Gb (1.5 Tb) of output per dual flow cell run
• ABI SOLiD: similar yield
• Roche 454: 1 million reads × 500 bases = 0.5 Gbp
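The yield figures follow from simple arithmetic, sketched here in Python. All inputs are taken from the slide; the fold-coverage and cost-per-Gbp values are derived quantities that assume every base is usable and maps to a 3 Gbp genome, which overstates what a real run delivers.

```python
READS_PER_FLOW_CELL = 4_000_000_000  # Illumina HiSeq, from the slide
READ_LENGTH_BP = 100
SAMPLES_PER_FLOW_CELL = 8
HUMAN_GENOME_BP = 3_000_000_000
REAGENT_COST_USD = 8_000             # approximate reagent cost per run

total_bp = READS_PER_FLOW_CELL * READ_LENGTH_BP
per_sample_bp = total_bp / SAMPLES_PER_FLOW_CELL
fold_coverage = per_sample_bp / HUMAN_GENOME_BP   # derived, idealized
cost_per_gbp = REAGENT_COST_USD / (total_bp / 1e9)  # derived, reagents only

roche_454_bp = 1_000_000 * 500       # 1 million reads x 500 bases

print(f"HiSeq total yield: {total_bp / 1e9:.0f} Gbp")
print(f"per sample: {per_sample_bp / 1e9:.0f} Gbp")
print(f"idealized human genome coverage per sample: {fold_coverage:.1f}x")
print(f"reagent cost per Gbp: ${cost_per_gbp:.0f}")
print(f"Roche 454 yield: {roche_454_bp / 1e9:.1f} Gbp")
```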
Illumina sequencing
[Figure with four numbered panels (1 to 4); source: Mardis, ER, 2008, ARGHG]
Illumina sequencing: clusters
Illumina sequencing: sequence reaction
Sequence clusters are imaged after each cycle of synthesis
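Because each cluster is imaged once per cycle, a read is built up one base at a time. A simplified Python sketch, assuming four fluorescence channels (one per base) and made-up intensity values; real base callers also correct for channel cross-talk and phasing, which is omitted here.

```python
BASES = "ACGT"  # assumed channel order for the intensity tuples below


def call_read(cycle_intensities):
    """Call one base per cycle by taking the brightest channel."""
    read = []
    for intensities in cycle_intensities:
        brightest = max(range(len(BASES)), key=lambda k: intensities[k])
        read.append(BASES[brightest])
    return "".join(read)


# Hypothetical intensities for one cluster over four cycles: (A, C, G, T).
cluster_images = [
    (0.05, 0.02, 0.88, 0.05),  # cycle 1: G channel brightest
    (0.90, 0.04, 0.03, 0.03),  # cycle 2: A channel brightest
    (0.02, 0.06, 0.04, 0.88),  # cycle 3: T channel brightest
    (0.10, 0.75, 0.08, 0.07),  # cycle 4: C channel brightest
]
called_read = call_read(cluster_images)
print(called_read)
```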
What is missed?
Plenty: repetitive DNA and structural variation
Example: short tandem repeats
[Figure: repeated C, A and G bases]
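One way to see why short reads miss tandem repeats: if no read spans the whole repeat tract plus some flanking sequence, two alleles with different repeat counts can produce exactly the same set of reads. The Python sketch below is illustrative only; the CAG unit, the repeat counts (50 and 60), the random flanks and the read lengths (100 and 250 bases) are all assumptions chosen for the demonstration, and read depth is ignored.

```python
import random


def make_allele(n_repeats, unit="CAG", flank_len=100, seed=0):
    """Random flanks (same seed, so identical between alleles) around a repeat tract."""
    rng = random.Random(seed)
    left = "".join(rng.choices("ACGT", k=flank_len))
    right = "".join(rng.choices("ACGT", k=flank_len))
    return left + unit * n_repeats + right


def all_reads(sequence, read_len):
    """Every possible error-free read of a given length, as a set."""
    return {sequence[i:i + read_len]
            for i in range(len(sequence) - read_len + 1)}


allele_50 = make_allele(50)  # 150-base repeat tract
allele_60 = make_allele(60)  # 180-base repeat tract

# Reads shorter than the tract cannot tell the alleles apart.
short_reads_identical = all_reads(allele_50, 100) == all_reads(allele_60, 100)
# Reads long enough to span the tract and both flanks can.
long_reads_identical = all_reads(allele_50, 250) == all_reads(allele_60, 250)

print("100-base read sets identical:", short_reads_identical)
print("250-base read sets identical:", long_reads_identical)
```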
3rd Generation Sequencing Technology
• Single Molecule Real Time (SMRT) sequencing technology (PacBio RS)
• based on ‘circular’ DNA molecules read by polymerase
• long reads, up to 10 kb
• error-prone
4th Generation Sequencing Technology
• Protein nanopore sequencing (Oxford Nanopore)
• ultra-long reads, up to 1 Mb, limited by integrity of the DNA
• high error rate, low throughput
Next-Gen Sequencing - What’s All the Fuss About?
The Era of Personal Genomics?
James D. Watson (5/31/2007)
J. Craig Venter (8/4/2007)
http://www.ffrf.org/day/img/0406_watson.gif, http://www-news.uchicago.edu/releases/07/images/070601.watson.jpg
http://www.usnews.com/usnews/images/news/photos/venter051022.jpg
It is here. The challenge is interpretation.
“Censoring” of Watson’s ApoE gene
3.6 kb
Important ethical issues confront personal genomics.
Interpreting Genome Sequences
The ENCODE Project: comprehensive parts list of the functional elements in the human genome
• Pilot Project Description – ENCODE Project Consortium et al. The ENCODE (ENCyclopedia Of DNA Elements) Project. Science (2004) vol. 306 (5696)
• Pilot Project Results – ENCODE Project Consortium et al. Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature (2007) vol. 447 (7146)
Let’s Play “Gene” or “No Gene”
A gene is often a segment of DNA that encodes a protein.
How about DNA that encodes:
a microRNA that binds to an mRNA to inhibit translation?
an RNA spliced out of an intron and used for another function?
an antisense transcript?
a long non-coding RNA of unknown function?
a pseudogene?