Post on 16-Jul-2020
Biology 210
GENETICS
20 March, 1998
Chapter 10a
Gene Expression:
Transcription
Brief Outline
1. The flow of Genetic Information
2. Synthesizing Proteins from the Instructions of DNA
3. The Genetic Code
4. RNA: Intermediary in Protein Synthesis
1. The flow of Genetic Information:
DNA -> RNA -> protein
How does the sequence of a strand of DNA correspond to the amino acid sequence of
a protein? This concept is explained by
The Central Dogma of
Molecular Biology:
The Relationship between Genes and Proteins
Most genes encode the information for the synthesis of a protein
The sequence of bases in DNA codes for the sequence of amino acids in proteins
Shown below is an illustration of the flow of information from DNA to RNA to protein,
which forms the backbone of molecular biology.
LEGEND
DNA codes for the production of RNA.
RNA codes for the production of protein.
Protein does not code for the production of protein, RNA or DNA.
The end.
Or in the words of Francis Crick:
Once information has passed into protein, it cannot get out again.
This was taken from Genentech's homepage:
However, the "Central Dogma" has had to be revised a bit. It turns out that you
CAN go back from RNA to DNA, and that RNA can also make copies of itself. It is
still not possible to go from Proteins back to RNA or DNA, and no known mechanism
has yet been demonstrated for proteins making copies of themselves.
Try it for yourself on the "DNA Workshop" (from PBS).
Click HERE for a link to a nice historical review of The Central Dogma.
2. Synthesizing Proteins from the Instructions of DNA
Genetic information flows in a cell from:
DNA -> RNA -> Protein
In a prokaryotic cell, transcription and translation happen at the same time:
However, in a eukaryotic cell, transcription and translation occur in different places:
3. The Genetic Code
The Genetic Code uses three bases to specify each amino acid
4. RNA: Intermediary in Protein Synthesis
Why would the cell want to have an intermediate
between DNA and the proteins it encodes?
The DNA can then stay pristine and protected, away from the caustic chemistry
of the cytoplasm.
Gene information can be amplified by having many copies of an RNA made from
one copy of DNA.
Regulation of gene expression can be effected by having specific controls at
each element of the pathway between DNA and proteins. The more elements
there are in the pathway, the more opportunities there are to control it in different
circumstances.
What is RNA?
RNA has the same primary structure as DNA. It consists of a sugar-phosphate
backbone, with bases attached to the 1' carbon of the sugar. The differences
between DNA and RNA are that:
1. RNA has a hydroxyl group on the 2' carbon of the sugar (thus, the
difference between deoxyribonucleic acid and ribonucleic acid).
2. Instead of using the base thymine, RNA uses another base
called uracil:
Because of the extra hydroxyl group on the sugar, RNA does not form the same
stable double helix as DNA. RNA usually exists as a single-stranded molecule. However,
regions of double helix can form where there is some base pair complementation
(U and A , G and C), resulting in hairpin loops. The RNA molecule with its
hairpin loops is said to have a secondary structure.
In addition, because the RNA molecule is not restricted to a rigid
double helix, it can form many different tertiary structures. Each RNA
molecule, depending on the sequence of its bases, can fold into a stable
three-dimensional structure.
From http://motif.stanford.edu/thesis/tRNA.html.
Transcription produces RNA molecules that are complementary copies of one strand of DNA
Three types of RNA cooperate in protein synthesis
The Genetic Code
How does an mRNA specify amino acid sequence? The answer lies in the genetic code. It would be
impossible for each amino acid to be specified by one nucleotide, because there are only 4 nucleotides and
20 amino acids. Similarly, two-nucleotide combinations could specify at most 16 amino acids. The final
conclusion is that each amino acid is specified by a particular combination of three nucleotides, called a
codon:
Note the degeneracy of the genetic code. Each amino acid might have up to six codons that specify it. It is
also interesting to note that different organisms have different frequencies of codon usage. A giraffe might use
CGC for arginine much more often than CGA, and the reverse might be true for a sperm whale. Another
interesting point is that some species vary from the codon association described above, and use different
codons for different amino acids. In general, however, the code depicted can be relied upon.
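The counting argument and the degeneracy described above can be checked with a short Python sketch. The compact 64-letter string is the standard genetic code written in U, C, A, G order; the names `CODON_TABLE`, `combinations_for_length` and `degeneracy` are made up for this illustration.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"  # conventional ordering for the compact table below
# Standard genetic code, one letter per codon in UCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each of the 4**3 = 64 codons to its one-letter amino acid.
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def combinations_for_length(n):
    """Number of distinct 'words' of n nucleotides drawn from 4 bases."""
    return len(BASES) ** n


# How many codons specify each amino acid (the degeneracy of the code).
degeneracy = Counter(CODON_TABLE.values())

if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, "base(s):", combinations_for_length(n), "combinations")
    for amino_acid, count in degeneracy.most_common():
        print(amino_acid, count)
```

Words of one or two bases give only 4 or 16 combinations, too few for 20 amino acids, while three bases give 64. Leucine, serine and arginine each have six codons; methionine and tryptophan have one each.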
How does a tRNA recognize the codon to which it should bring its amino acid? The tRNA has an anticodon on its
mRNA-binding end that is complementary to the codon on the mRNA. Each tRNA only binds the appropriate
amino acid for its anticodon.
From http://motif.stanford.edu/thesis/tRNA.html.
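The codon-anticodon relationship can be sketched in a few lines of Python. This is a simplification that assumes perfect Watson-Crick pairing at all three positions; real tRNAs also tolerate "wobble" pairing at the third codon position, which is ignored here. The function name `anticodon_for` is hypothetical.

```python
# Watson-Crick pairing partners for RNA bases.
RNA_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon_for(codon):
    """Return the perfectly complementary anticodon, written 5' -> 3'.

    The codon is given 5' -> 3'. Because the two RNA strands pair
    antiparallel, the anticodon is the reverse complement of the codon.
    """
    return "".join(RNA_COMPLEMENT[base] for base in reversed(codon.upper()))


if __name__ == "__main__":
    print(anticodon_for("AUG"))  # CAU
```

Applying the function twice returns the original codon, which is a quick check that the pairing table is symmetric.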
hyperbio@mit.edu
Central Dogma, Part 1: Transcription
How does the sequence information from DNA get transferred to mRNA so that it can
be carried to the ribosomes in the cytoplasm? This process, called transcription, is
highly analogous to DNA replication. Of course, there are different effectors, or
proteins, that direct transcription. Primary among these is the RNA polymerase
holoenzyme, an agglomeration of many different factors that together direct the
synthesis of mRNA on a DNA template.
As mentioned above, transcription (like ANY polymerisation process) is divided into
three parts:
1. Initiation of Transcription
RNA polymerase must be able to recognize the beginning of a gene so that it knows
where to start synthesizing an mRNA. It is directed to the start site of transcription by
one of its subunits' affinity to a particular DNA sequence that appears at the beginning
of genes. This sequence is called a promoter. It is a unidirectional sequence on one
strand of the DNA that tells the RNA polymerase both where to start and in which
direction (that is, on which strand) to continue synthesis. The bacterial promoter almost
always contains some version of the following elements:
The two sequences shown in red are known
as the "-35" (TTGACA) and "-10" (TATAAT)
sites, based on their positions from the start
of transcription. These two sequences
represent the CONSENSUS, based on
comparison of several different sequences
aligned at the transcription start site.
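How a consensus is read off a set of aligned sites can be sketched in Python. The five "-10" sites below are made up for illustration and are not real promoter data; `consensus` and `mismatches` are hypothetical helper names.

```python
from collections import Counter


def consensus(aligned_sites):
    """Most common base at each column of equal-length aligned sequences."""
    if len({len(site) for site in aligned_sites}) != 1:
        raise ValueError("all sites must be non-empty in number and equal in length")
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*aligned_sites)
    )


def mismatches(site, reference):
    """Number of positions at which a site differs from the reference."""
    return sum(a != b for a, b in zip(site, reference))


# Made-up -10 sites for illustration only.
toy_minus10_sites = ["TATAAT", "TATGAT", "TACAAT", "GATAAT", "TATATT"]

if __name__ == "__main__":
    reference = consensus(toy_minus10_sites)
    print(reference)
    for site in toy_minus10_sites:
        print(site, mismatches(site, reference))
```

Note that the consensus need not match every individual site: in this toy set, four of the five sites differ from TATAAT at one position.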
Another way of representing this consensus is by the application of information theory
to sequence analysis. One currently used method is "sequence logos" (this is based
on "Shannon information", for those of you who are interested; see Schneider, T.D.,
Stephens, R.M., "Sequence logos: a new way to display consensus sequences", Nucleic Acids
Research, 18:6097-6100 (1990)). The sequence logo, based on the promoter region of
167 different genes (aligned by their transcriptional start site), is shown below:
The sequence logo for the -10 "TATA" box for 60 human promoters, aligned on the
TATA box, is shown below:
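The quantity plotted as the height of each column in a sequence logo can be sketched as follows. This is a minimal version that computes 2 minus the Shannon entropy of each column, in bits, and leaves out the small-sample correction used in the published method, so values from only a few sequences are overestimates. The function name `information_content` is hypothetical.

```python
import math
from collections import Counter


def information_content(aligned_sites):
    """Bits of information at each column: 2 - H, where H is the Shannon entropy.

    The maximum is log2(4) = 2 bits, reached when one base is always present.
    No small-sample correction is applied.
    """
    bits = []
    for column in zip(*aligned_sites):
        total = len(column)
        entropy = -sum(
            (count / total) * math.log2(count / total)
            for count in Counter(column).values()
        )
        bits.append(2.0 - entropy)
    return bits


if __name__ == "__main__":
    # Made-up aligned sites for illustration only.
    for value in information_content(["TATAAT", "TATGAT", "TACAAT", "GATAAT"]):
        print(round(value, 2))
```

A perfectly conserved position scores 2 bits and a position where all four bases are equally frequent scores 0 bits, which is why conserved positions stand tall in a logo.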
2. Elongation of Transcription
The RNA polymerase then stretches open the double helix at that point in the DNA and
begins synthesis of an RNA strand complementary to one of the strands of DNA. We
call the strand from which it copies the antisense or template strand, and the other
strand, to which it is identical, the sense or coding strand.
The RNA polymerase recruits rNTPs (ribonucleoside triphosphates) in the
same way that DNA polymerase recruits dNTPs.
However, since synthesis is single stranded and only
proceeds in the 5' to 3' direction, there is no need for
Okazaki fragments.
It is important to note that synthesis once again
proceeds in a unidirectional fashion, because of the
reasons outlined in the previous section.
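The relationship between the template strand, the coding strand and the RNA can be sketched in Python. The nine-base coding strand is made up for illustration, and `reverse_complement` and `transcribe` are hypothetical helper names; all sequences are written 5' to 3'.

```python
# Pairing of each template DNA base with the ribonucleotide it directs.
DNA_TO_RNA = {"A": "U", "T": "A", "G": "C", "C": "G"}
DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna):
    """The opposite DNA strand, written 5' -> 3'."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(dna))


def transcribe(template_5to3):
    """RNA (5' -> 3') made from a template strand written 5' -> 3'.

    The polymerase reads the template 3' -> 5', so the template is walked
    in reverse and each base is paired with its ribonucleotide partner.
    """
    return "".join(DNA_TO_RNA[base] for base in reversed(template_5to3))


coding = "ATGGCCTAA"                   # made-up coding (sense) strand
template = reverse_complement(coding)  # antisense strand
rna = transcribe(template)

if __name__ == "__main__":
    print(template)  # TTAGGCCAT
    print(rna)       # AUGGCCUAA
```

The RNA comes out identical to the coding strand with U in place of T, which is why that strand is called the sense strand.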
3. Termination of Transcription
How does RNA polymerase know when to stop transcribing a gene? This system has
been elucidated in prokaryotes. It is important to know that since there is no nucleus in
prokaryotes, ribosomes can begin making protein from an mRNA immediately upon its
synthesis. At the end of many genes, the sequence of the mRNA allows it to form a hairpin
loop, usually followed by a run of U's. The hairpin causes the RNA polymerase to pause, and
the weak pairing between the U's and the template lets the mRNA come free. The RNA
polymerase then falls off the DNA and transcription ceases.
Gene Expression: Transcription
The majority of genes are expressed as the proteins they encode. The process occurs in two steps:
Transcription = DNA -> RNA
Translation = RNA -> protein
Taken together, they make up the "central dogma" of biology: DNA -> RNA -> protein. Here is an
overview.
This page examines the first step:
Gene Transcription: DNA -> RNA
DNA serves as the template for the synthesis of RNA much as it does for its own replication.
The Steps
several protein transcription factors bind to promoter sites, usually on the 5' side of the gene to be transcribed
an enzyme, RNA polymerase, binds to the complex of transcription factors
working together, they open the DNA double helix
RNA polymerase proceeds down one strand moving in the 3' -> 5' direction
as it does so, it assembles ribonucleotides (supplied as triphosphates, e.g., ATP) into a strand of RNA
each ribonucleotide is inserted into the growing RNA strand following the rules of base pairing. Thus for each C encountered on the DNA strand, a G is inserted in the RNA; for each G, a C; and for each T, an A. However, each A on the DNA guides the insertion of the pyrimidine uracil (U, from uridine triphosphate, UTP). There is no T in RNA.
synthesis of the RNA proceeds in the 5' -> 3' direction.
as each nucleoside triphosphate is brought in to add to the 3' end of the growing strand, the two terminal phosphates are removed
Note that at any place in a DNA molecule, either strand may be serving as the template; that is, some genes "run" one way, some the other (and in a few remarkable cases, the same segment of double helix contains genetic information on both strands!). In all cases, however, RNA polymerase proceeds along a strand in its 3' -> 5' direction.
Types of RNA
Several types of RNA are synthesized:
messenger RNA (mRNA). This will later be translated into a polypeptide.
ribosomal RNA (rRNA). This will be used in the building of ribosomes: machinery for synthesizing proteins by translating mRNA.
transfer RNA (tRNA). RNA molecules that carry amino acids to the growing polypeptide.
small nuclear RNA (snRNA). DNA transcription of the genes for mRNA, rRNA, and tRNA produces large precursor molecules ("primary transcripts") that must be processed within the nucleus to produce the functional molecules for export to the cytosol. Some of these processing steps are mediated by snRNAs.
Ribosomal RNA (rRNA)
There are 4 kinds. In eukaryotes, these are
18S rRNA. One of these molecules, along with some 30 different protein molecules, is used to make the small subunit of the ribosome.
28S, 5.8S, and 5S rRNA. One each of these molecules, along with some 45 different proteins, are used to make the large subunit of the ribosome.
The name given each type of rRNA reflects the rate at which the molecules sediment in the ultracentrifuge. The larger the number, the larger the molecule (but not proportionally).
The 28S, 18S, and 5.8S molecules are produced by the processing of a single primary transcript from a cluster of identical copies of a single gene. The 5S molecules are produced from a different cluster of identical genes.
Transfer RNA (tRNA)
There are some 32 different kinds of tRNA in a typical eukaryotic cell.
each is the product of a separate gene
they are small (~4S), containing 73-93 nucleotides
many of the bases in the chain pair with each other forming sections of double helix
the unpaired regions form 3 loops
each kind of tRNA carries (at its 3' end) one of the 20 amino acids (thus most amino acids have more than one tRNA responsible for them)
at one loop, 3 unpaired bases form an anticodon
base pairing between the anticodon and the complementary codon on a mRNA molecule brings the correct amino acid into the growing polypeptide chain. Further details of this process are described in the discussion of translation.
Messenger RNA (mRNA)
Messenger RNA comes in a wide range of sizes reflecting the size of the polypeptide it encodes. Most cells produce small amounts of thousands of different mRNA molecules, each to be translated into a peptide needed by the cell. Many mRNAs are common to most cells, encoding "housekeeping" proteins needed by all cells (e.g., the enzymes of glycolysis). Other mRNAs are specific for only certain types of cells. These encode proteins needed for the function of that particular cell (e.g., the mRNA for hemoglobin in the precursors of red blood cells).
Small Nuclear RNA (snRNA)
Approximately a dozen different genes for snRNAs, each present in multiple copies, have been identified. The snRNAs have various roles in the processing of the other classes of RNA. For example, several snRNAs are part of the spliceosome that participates in converting pre-mRNA into mRNA by excising the introns and splicing the exons.
The RNA polymerases
The RNA polymerases are huge multi-subunit protein complexes. Three kinds are found in eukaryotes.
RNA polymerase I (Pol I). It transcribes the rRNA genes for the precursor of the 28S, 18S, and 5.8S molecules (and is the busiest of the RNA polymerases).
RNA polymerase II (Pol II). It transcribes the mRNA and snRNA genes.
RNA polymerase III (Pol III). It transcribes the 5S rRNA genes and all the tRNA genes.
RNA Processing: pre-mRNA -> mRNA
All the primary transcripts produced in the nucleus must undergo processing steps to produce functional RNA
molecules for export to the cytosol. We shall confine ourselves to a view of the steps as they occur in the
processing of pre-mRNA to mRNA.
Synthesis of the cap. This is a stretch of three modified nucleotides attached to the 5' end of the
pre-mRNA.
Synthesis of the poly(A) tail. This is a stretch of adenine nucleotides attached to the 3' end of the
pre-mRNA.
Step-by-step removal of introns present in the pre-mRNA and splicing of the remaining exons. This
step is required because most eukaryotic genes are split.
Split Genes
Most eukaryotic genes are split into segments. In decoding the open reading frame of a gene for a known
protein, one usually encounters periodic stretches of DNA calling for amino acids that do not occur in the
actual protein product of that gene. Such stretches of DNA, which get transcribed into RNA but not translated
into protein, are called introns. Those stretches of DNA that do code for amino acids in the protein are called
exons. Examples:
the gene for one type of collagen found in chickens is split into 52 separate exons
the gene for dystrophin, which is mutated in boys with muscular dystrophy, has 79 exons
even the genes for rRNA and tRNA are split.
The cutting and splicing of mRNA must be done with great precision. If even one nucleotide is left over from
an intron or one is removed from an exon, the reading frame from that point on will be shifted, producing
new codons specifying a totally different sequence of amino acids from that point to the end of the molecule
(which often ends prematurely anyway when the shifted reading frame generates a STOP codon).
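The effect of an imprecise splice on the reading frame can be sketched in Python. The exons and intron below are made up for illustration, `translate` is a hypothetical helper, and the compact table is the standard genetic code in U, C, A, G order with '*' marking stop codons.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UCAG order; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna):
    """Read codons from the first base, three at a time, until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Made-up sequences for illustration only.
exon1 = "AUGGCU"
intron = "GUAAGUCCAG"
exon2 = "CAUGAAGGCUUUUAA"

correct = exon1 + exon2
# Imprecise splice: the last nucleotide of the intron is left in the mRNA.
shifted = exon1 + intron[-1] + exon2

if __name__ == "__main__":
    print(translate(correct))  # MAHEGF
    print(translate(shifted))  # MAA
```

In this toy case the single leftover nucleotide changes the third amino acid and brings a UGA stop codon into frame, so the product is cut short after three residues.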
The removal of introns and splicing of exons is done with the spliceosome. This is a complex of several
snRNA molecules and several proteins. The introns in most pre-mRNAs begin with a GU and end with an
AG. Presumably these short sequences are essential for guiding the spliceosome.
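The GU...AG rule can be sketched as a check made while removing introns. This is only an illustration: the intron coordinates are supplied by hand rather than discovered, the sequence is made up, and `splice` is a hypothetical helper that says nothing about how the spliceosome actually locates its sites.

```python
def splice(pre_mrna, introns):
    """Remove introns given as (start, end) index pairs, with end exclusive.

    Each intron is required to begin with GU and end with AG, as the
    introns in most pre-mRNAs do.
    """
    exons = []
    position = 0
    for start, end in sorted(introns):
        segment = pre_mrna[start:end]
        if not (segment.startswith("GU") and segment.endswith("AG")):
            raise ValueError(f"intron {start}-{end} lacks GU...AG boundaries")
        exons.append(pre_mrna[position:start])  # keep the exon before this intron
        position = end
    exons.append(pre_mrna[position:])  # keep the final exon
    return "".join(exons)


# Made-up pre-mRNA: exon, intron (GU...AG), exon.
pre_mrna = "AUGGCU" + "GUAAGUCCAG" + "CAUGAAGGCUUUUAA"
mature = splice(pre_mrna, [(6, 16)])

if __name__ == "__main__":
    print(mature)  # AUGGCUCAUGAAGGCUUUUAA
```

Shifting either boundary by one position makes the check fail in this example, which mirrors the precision the text calls for.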
Alternate Splicing
The processing of pre-mRNA for many proteins proceeds along various paths in different cells or under
different conditions. For example, early in the differentiation of a B cell (a lymphocyte that synthesizes an
antibody) the cell first uses an exon that encodes a transmembrane domain that causes the molecule to be
retained at the cell surface. Later, the B cell switches to using a different exon whose domain enables the
protein to be secreted from the cell as a circulating antibody molecule.
So, whether a particular segment of RNA will be retained as an exon or excised as an intron can vary under
different circumstances. Clearly the switching to an alternate splicing pathway must be closely regulated.
Why split genes?
Perhaps during evolution, eukaryotic genes have been assembled from smaller, primitive genes - today's
exons. Some proteins, like the antibodies mentioned in the previous section, are organized in a set of separate
sections or domains each with a special function to perform in the complete molecule. Each domain is
encoded by a separate exon. Having the different functional parts of the antibody molecule encoded by
separate exons makes it possible to use these units in different combinations. Thus a set of exons in the
genome may be the genetic equivalent of the various modular pieces in a box of "Lego" for children to
assemble in whatever forms they wish.
But the boundaries of other exons do not seem to correspond to domain boundaries of the protein. Furthermore,
rRNA and tRNA genes are also split, and these do not encode proteins. So perhaps some introns are simply
"junk" DNA that was inserted into the gene at some point in evolution without causing any harm.
Summary
Gene expression occurs in two steps:
transcription of the information encoded in DNA into a molecule of RNA (described here) and
translation of the information encoded in the nucleotides of mRNA into a defined sequence of amino
acids in a protein (discussed in Gene Translation: RNA -> Protein).
Last modified on: 4 February, 2000 by Dave Ussery