Genomics – The Language of DNA

Honors Genetics 2006

The structure and meaning of the human genome

Genes span about 25% of the human chromosomes.

Coding sequences (the exons) amount to about 1%.

The intragenic noncoding regions, or introns, account for the remaining 24%.

Intergenic sequences are noncoding and lie between genes.
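The percentages above can be checked with a little arithmetic. This is a minimal sketch that uses only the 25% and 1% figures from the slide; it assumes that everything outside genes counts as intergenic, and the variable names are illustrative.

```python
# Figures taken from the slide; the other values are derived from them.
genes_pct = 25.0   # genes span about 25% of the chromosomes
exons_pct = 1.0    # coding sequences (exons)

# Introns are the part of a gene that is not exon.
introns_pct = genes_pct - exons_pct

# Assumption: all DNA outside genes is treated as intergenic.
intergenic_pct = 100.0 - genes_pct

print(f"introns:    {introns_pct:.0f}%")
print(f"intergenic: {intergenic_pct:.0f}%")
```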

Genes

About 20-30% of the genes are clustered in CpG islands.

There are gene deserts that comprise about 20% of the genome.

There are also sequences that are made of heterochromatin (centromeres and telomeres).

Repetitive DNA

Highly repetitive: About 10-15% of mammalian DNA reassociates very rapidly. This class includes tandem repeats.

Moderately repetitive: Roughly 25-40% of mammalian DNA reassociates at an intermediate rate. This class includes interspersed repeats.

Single copy (or very low copy number): This class accounts for 50-60% of mammalian DNA.

Satellite DNA

The size of a satellite DNA ranges from 100 kb to over 1 Mb. In humans, a well-known example is the alphoid DNA located at the centromere of all chromosomes.

Its repeat unit is 171 bp and the repetitive region accounts for 3-5% of the DNA in each chromosome. Other satellites have a shorter repeat unit.

Most satellites in humans or in other organisms are located at the centromere.

Minisatellites

The size of a minisatellite ranges from 1 kb to 20 kb. One type of minisatellite is called a variable number of tandem repeats (VNTR). Its repeat unit ranges from 9 bp to 80 bp. VNTRs are located in non-coding regions. The number of repeats for a given minisatellite may differ between individuals. This feature is the basis of DNA fingerprinting.

VNTR: individual differences

Variable Number of Tandem Repeat (VNTR) Polymorphism

VNTRs may result from unequal crossover. They are the molecular basis of DNA fingerprinting, which has many practical applications.
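To make the idea of repeat-number polymorphism concrete, the sketch below counts the longest run of back-to-back copies of a repeat unit in two sequences. The repeat unit, the flanking sequences, and the copy numbers are all hypothetical and chosen only for illustration; real fingerprinting compares fragment lengths at several loci rather than scanning raw sequence like this.

```python
def count_tandem_repeats(sequence: str, unit: str) -> int:
    """Return the longest run of consecutive copies of `unit` in `sequence`."""
    if not unit:
        raise ValueError("repeat unit must not be empty")
    best = 0
    for start in range(len(sequence)):
        run = 0
        pos = start
        # Walk forward one unit at a time while the copies continue.
        while sequence.startswith(unit, pos):
            run += 1
            pos += len(unit)
        best = max(best, run)
    return best


# Hypothetical 9 bp repeat unit and flanking sequences (not real loci).
UNIT = "GATTACAGG"
LEFT, RIGHT = "CCGTA", "TTGCA"

person_a = LEFT + UNIT * 12 + RIGHT
person_b = LEFT + UNIT * 17 + RIGHT

copies_a = count_tandem_repeats(person_a, UNIT)
copies_b = count_tandem_repeats(person_b, UNIT)
print(copies_a, copies_b)
print("same allele" if copies_a == copies_b else "different alleles")
```

Because the two individuals carry different copy numbers, a fragment cut out around this locus would differ in length by five repeat units, which is the kind of difference a fingerprint detects.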

Telomeres

Another type of minisatellite is the telomere. In a human germ cell, the size of a telomere is about 15 kb. In an aging somatic cell, the telomere is shorter. The telomere contains the tandemly repeated sequence GGGTTA.
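As a rough check on scale, the sketch below estimates how many copies of the 6 bp repeat fit in a 15 kb telomere. It assumes the whole 15 kb consists of perfect repeats, which is a simplification. It also shows that GGGTTA and TTAGGG, the form in which the human telomere repeat is often written, describe the same tandem array read from a different starting position.

```python
TELOMERE_BP = 15_000        # germ-cell telomere length from the slide
REPEAT_UNIT = "GGGTTA"      # repeat unit as written on the slide

# Assumption: the telomere is made entirely of perfect repeats.
copies = TELOMERE_BP // len(REPEAT_UNIT)
print(copies)

# A tandem array has no fixed starting point, so shifting the reading
# window turns GGGTTA into TTAGGG.
array = REPEAT_UNIT * 3
print("TTAGGG" in array)
```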

Microsatellites

Microsatellites are also known as short tandem repeats (STR), because a repeat unit consists of only 1 to 6 bp and the whole repetitive region spans less than 150 bp.

Similar to minisatellites, the number of repeats for a given microsatellite may differ between individuals. Therefore, microsatellites can also be used for DNA fingerprinting.
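A short tandem repeat can be located with a regular expression that looks for a 1 to 6 bp unit followed by further copies of itself. This is a minimal sketch: the function name, the `min_repeats` threshold, and the example sequence are assumptions for illustration, and the 150 bp upper limit mentioned above is not enforced.

```python
import re


def find_strs(sequence: str, min_repeats: int = 4):
    """Return (start, unit, copies) for each tandem run of a 1-6 bp unit."""
    # ([ACGT]{1,6}?) captures the shortest candidate unit first;
    # \1{n,} then requires at least n further copies directly after it.
    pattern = re.compile(r"([ACGT]{1,6}?)\1{%d,}" % (min_repeats - 1))
    hits = []
    for match in pattern.finditer(sequence):
        unit = match.group(1)
        copies = len(match.group(0)) // len(unit)
        hits.append((match.start(), unit, copies))
    return hits


# Hypothetical sequence: five copies of CA between short flanks.
example = "TT" + "CA" * 5 + "GG"
print(find_strs(example))
```

Running the same function on the same locus from two individuals and comparing the `copies` values gives the repeat-number difference that STR-based fingerprinting relies on.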

Miniature Inverted-repeat Transposable Elements (MITEs)

Almost identical sequences of about 400 base pairs flanked by characteristic inverted repeats of about 15 base pairs, such as

5' GGCCAGTCACAATGG..~400 nt..CCATTGTGACTGGCC 3'

3' CCGGTCAGTGTTACC..~400 nt..GGTAACACTGACCGG 5'
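"Inverted repeat" means that the sequence at one end, read on the opposite strand, matches the sequence at the other end. The sketch below checks this for the two 15 bp ends shown in the example; the helper name `reverse_complement` is illustrative.

```python
# Translation table for Watson-Crick base pairing.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


left_end = "GGCCAGTCACAATGG"    # 5' end of the top strand in the example
right_end = "CCATTGTGACTGGCC"   # 3' end of the top strand in the example

# The ends form an inverted repeat if one is the reverse complement
# of the other.
print(reverse_complement(left_end) == right_end)
```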

Transposons

Transposons are segments of DNA that can move around to different positions in the genome of a single cell. In the process, they may

cause mutations

increase (or decrease) the amount of DNA in the genome.

These mobile segments of DNA are sometimes called "jumping genes".

Transposons

Class II transposons, consisting only of DNA that moves directly from place to place.

Class III transposons, also known as Miniature Inverted-repeat Transposable Elements or MITEs.

Retrotransposons (Class I), which first transcribe the DNA into RNA and then use reverse transcriptase to make a DNA copy of the RNA to insert in a new location.
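The difference between a DNA-only element that moves directly and a retrotransposon that inserts a new copy can be sketched with a toy genome held as a list of labelled segments. The function names, the "cut and paste" label for the direct move, and the segment labels are illustrative assumptions, not terms from the slides.

```python
def cut_and_paste(genome, element, new_index):
    """DNA-only move: the element leaves its old site and lands at a new one."""
    moved = list(genome)
    moved.remove(element)            # excise from the original position
    moved.insert(new_index, element) # insert at the new position
    return moved


def copy_and_paste(genome, element, new_index):
    """Retrotransposon-style move: the original stays, a new copy is added."""
    copied = list(genome)
    copied.insert(new_index, element)
    return copied


# Toy genome with one transposon "Tn" (labels are hypothetical).
genome = ["geneA", "Tn", "geneB", "geneC"]

after_cut = cut_and_paste(genome, "Tn", 2)
after_copy = copy_and_paste(genome, "Tn", 3)

print(after_cut)    # element count unchanged
print(after_copy)   # element count has grown by one
```

Only the second mode adds DNA to the genome with every move, which is one way mobile elements can increase genome size.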

Transposase actions

Transposase binds to

both ends of the transposon, which consist of inverted repeats; that is, identical sequences reading in opposite directions

a sequence of DNA that makes up the target site. Some transposases require a specific sequence as their target site; others can insert the transposon anywhere in the genome.

LINEs (Long interspersed elements)

The human genome contains some 850,000 LINEs (representing some 21% of the genome).

Most of these belong to a family called LINE-1 (L1).

These L1 elements are DNA sequences that range in length from a few hundred to as many as 9,000 base pairs.

Only about 50 L1 elements are functional "genes"; that is, can be transcribed and translated.
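The copy number and genome share quoted above imply an average LINE length, which can be compared with the length range given for L1 elements. The sketch assumes a haploid human genome of about 3.2 billion base pairs; that figure is not from the slides and is labelled as an assumption in the code.

```python
GENOME_BP = 3.2e9        # assumption: approximate haploid human genome size
LINE_COUNT = 850_000     # from the slide
LINE_FRACTION = 0.21     # LINEs make up about 21% of the genome

# Total DNA occupied by LINEs divided by the number of copies.
average_line_bp = LINE_FRACTION * GENOME_BP / LINE_COUNT
print(round(average_line_bp))
```

The result is a bit under 800 bp, far below the 9,000 bp upper end of the range, which suggests that the typical copy sits near the short end of that range.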

L1 elements

The functional L1 elements are about 6,500 bp in length and encode three proteins, including

an endonuclease that cuts DNA

a reverse transcriptase that makes a DNA copy of an RNA transcript.

L1 activity

L1 activity proceeds as follows:

RNA polymerase II transcribes the L1 DNA into RNA.

The RNA is translated by ribosomes in the cytoplasm into the proteins.

The proteins and RNA join together and reenter the nucleus.

The endonuclease cuts a strand of "target" DNA, often in the intron of a gene.

The reverse transcriptase copies the L1 RNA into L1 DNA, which is inserted into the target DNA, forming a new L1 element there.
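The L1 cycle described above can be sketched as string operations on a toy genome. This is a deliberately simplified model: it treats DNA as a single strand, skips translation and nuclear import, and collapses reverse transcription and second-strand synthesis into one step that restores the original sequence. All names, coordinates, and sequences are hypothetical.

```python
def transcribe(dna: str) -> str:
    """RNA polymerase step: DNA sequence written as RNA (T becomes U)."""
    return dna.replace("T", "U")


def reverse_transcribe(rna: str) -> str:
    """Reverse transcriptase step, simplified: RNA back to DNA (U becomes T)."""
    return rna.replace("U", "T")


def retrotranspose(genome: str, start: int, end: int, target: int) -> str:
    """Copy the element at genome[start:end] into position `target`."""
    element_rna = transcribe(genome[start:end])
    new_copy = reverse_transcribe(element_rna)
    # The endonuclease cut is modelled as splitting the string at `target`.
    return genome[:target] + new_copy + genome[target:]


# Toy genome: a 7 bp "element" between two stretches of filler sequence.
genome = "AAAA" + "GATTACA" + "CCCCCCCC"
new_genome = retrotranspose(genome, start=4, end=11, target=15)

print(new_genome)
print(new_genome.count("GATTACA"))
```

The original element stays in place and the genome grows by the length of the element, matching the copy-and-insert behaviour described in the steps.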

SINEs (Short interspersed elements)

SINEs are short DNA sequences (100–400 base pairs) that represent reverse-transcribed RNA molecules originally transcribed by RNA polymerase III; that is, molecules of tRNA, 5S rRNA, and some other small nuclear RNAs. The most abundant SINEs are the Alu elements. There are over one million copies in the human genome (representing about 11% of the total DNA).

Alus

Alu elements consist of a sequence of about 300 base pairs containing a site that is recognized by the restriction enzyme AluI. They appear to be reverse transcripts of 7SL RNA, part of the signal recognition particle.
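AluI recognizes the sequence AGCT and cuts between the G and the C, leaving blunt ends. The sketch below finds AluI sites in a sequence and splits it at each cut. The example sequence is hypothetical and is not a real Alu element, and the function names are illustrative.

```python
ALUI_SITE = "AGCT"   # AluI recognition sequence
CUT_OFFSET = 2       # the cut falls between AG and CT


def find_sites(seq: str, site: str = ALUI_SITE):
    """Return the start index of every occurrence of `site` in `seq`."""
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions


def digest(seq: str, site: str = ALUI_SITE, cut_offset: int = CUT_OFFSET):
    """Split `seq` into fragments at every cut position."""
    fragments = []
    previous = 0
    for position in find_sites(seq, site):
        cut = position + cut_offset
        fragments.append(seq[previous:cut])
        previous = cut
    fragments.append(seq[previous:])
    return fragments


# Hypothetical sequence with two AluI sites.
example = "GGGAGCTTTTAGCTCC"
print(find_sites(example))
print(digest(example))
```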

Most SINEs do not encode any functional molecules and depend on the machinery of active L1 elements to be transposed; that is, copied and pasted in new locations.
