
Human Genome

Human Genome Contents: 3200 Mb
• Genes and related sequences: 1200 Mb
– Genes 48 Mb
– Related 1152 Mb: Pseudogenes, Gene Fragments, Introns
• Intergenic DNA 2000 Mb
– Interspersed Repeats 1400 Mb
– Microsatellite (short tandem repeats) 90 Mb
• Telomeres: End Sequences
• Centromeres
• Single Nucleotide Polymorphisms
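
The sizes above can be checked with a little arithmetic. The sketch below only restates the slide's numbers as fractions of the 3200 Mb total; the variable names and the "other intergenic" remainder are illustrative assumptions, not figures from the slide.

```python
# Sizes in megabases (Mb), copied from the slide.
GENOME_MB = 3200
GENES_MB = 48
GENE_RELATED_MB = 1152          # pseudogenes, gene fragments, introns
INTERGENIC_MB = 2000
INTERSPERSED_REPEATS_MB = 1400
MICROSATELLITES_MB = 90


def percent_of_genome(size_mb, genome_mb=GENOME_MB):
    """Return size_mb as a percentage of the whole genome."""
    return 100.0 * size_mb / genome_mb


# Genes plus related sequences should give the 1200 Mb quoted on the slide.
gene_total_mb = GENES_MB + GENE_RELATED_MB

# Intergenic DNA not itemised on the slide (derived remainder, an assumption).
other_intergenic_mb = INTERGENIC_MB - INTERSPERSED_REPEATS_MB - MICROSATELLITES_MB

for label, size in [
    ("genes", GENES_MB),
    ("gene-related", GENE_RELATED_MB),
    ("interspersed repeats", INTERSPERSED_REPEATS_MB),
    ("microsatellites", MICROSATELLITES_MB),
]:
    print(f"{label}: {size} Mb = {percent_of_genome(size):.1f}% of genome")
```

On these numbers the genes themselves are a small fraction of the genome, while interspersed repeats alone account for more than the whole gene-related category.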

Chromosomes
• Shorter than DNA they contain
• Histones: DNA binding proteins
• Two Copies held together by centromeres
• Telomere: Terminal region
• Any two humans differ in about 0.1% of their DNA sequence


Donors
• HGP:
– Opportunity advertised near labs
– First come, first taken
– 5-10 samples for every one used
– No link between donor and sample
• Celera: 5 subjects (three men; two women)
– One Asian; one African-American; one Hispanic; two Caucasians
– Craig Venter

Basic Technology
• Physical Mapping
• Cloning
• Shotgun Sequencing
• Computational Sequence Reassembly

STS
• High Resolution, Rapid, Simple
• 100 - 500 bp
• Collection of overlapping fragments
• Each point represented multiple times in random fragments
• Sequence must be known
• Unique in chromosome under study

Physical Mapping
• A set of clone fragments whose position relative to each other is known
• Restriction Maps: Relative locations of restriction sites
• Fluorescent in situ hybridization (FISH): Marker locations mapped by hybridizing probe to chromosomes
• Sequence Tagged Sites (STS): Positions of short sequences mapped by PCR or hybridization analysis of genome fragments
• Expressed Sequence Tags (EST): short sequences from cDNA clones

Genome cut into fragments
Cloned as library in vector (red)

Hybridisation mapping:
1. Pick clones into a grid
2. Hybridise to probe 1
3. Hybridise to probe 2
4. Build contigs
In this case, two clones hybridised to both probes and are therefore predicted to overlap. Those hybridising to only one probe are predicted to extend out to the left or right.
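
The contig-building step of hybridisation mapping can be sketched as a grouping problem: clones that light up with the same probe are predicted to overlap, and overlaps are chained together. This is a minimal illustration, assuming hybridisation results are already scored as a set of probes per clone; the clone names, probe names and `build_contigs` function are hypothetical.

```python
from collections import defaultdict


def build_contigs(hybridisation):
    """Group clones into contigs.

    hybridisation maps clone name -> set of probes it hybridised to.
    Clones sharing a probe are predicted to overlap; overlaps are
    chained transitively (connected components of the overlap graph).
    """
    clones_by_probe = defaultdict(set)
    for clone, probes in hybridisation.items():
        for probe in probes:
            clones_by_probe[probe].add(clone)

    # Two clones are neighbours if they share at least one probe.
    neighbours = defaultdict(set)
    for clones in clones_by_probe.values():
        for clone in clones:
            neighbours[clone] |= clones - {clone}

    contigs, seen = [], set()
    for clone in sorted(hybridisation):
        if clone in seen:
            continue
        stack, members = [clone], set()
        while stack:
            current = stack.pop()
            if current in members:
                continue
            members.add(current)
            stack.extend(neighbours[current] - members)
        seen |= members
        contigs.append(sorted(members))
    return contigs


# Hypothetical grid: two clones hit both probes, two hit one probe each,
# and one clone hit neither.
grid = {
    "c1": {"probe1"},
    "c2": {"probe1", "probe2"},
    "c3": {"probe1", "probe2"},
    "c4": {"probe2"},
    "c5": set(),
}
contigs = build_contigs(grid)
print(contigs)
```

Here `c2` and `c3` form the predicted overlap, `c1` and `c4` extend the contig to either side, and `c5` stays unplaced.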
Fingerprinting: Digest clones and run on gel
Overlap by shared bands
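
Detecting overlap by shared bands can be illustrated by comparing the fragment sizes of two digested clones. The sketch below is a simple shared-fraction measure under an assumed 2% sizing tolerance, not a statistical overlap score; the function names, tolerance and band sizes are all hypothetical.

```python
def shared_bands(bands_a, bands_b, tolerance=0.02):
    """Count bands of clone A that match a distinct band of clone B.

    Two bands match when their sizes differ by no more than
    `tolerance` (relative) of the larger band. Matching is greedy,
    working through the bands from smallest to largest.
    """
    remaining = sorted(bands_b)
    shared = 0
    for size in sorted(bands_a):
        for index, other in enumerate(remaining):
            if abs(size - other) <= tolerance * max(size, other):
                shared += 1
                del remaining[index]  # each band of B can be used only once
                break
    return shared


def overlap_fraction(bands_a, bands_b, tolerance=0.02):
    """Shared bands as a fraction of the smaller fingerprint."""
    smaller = min(len(bands_a), len(bands_b))
    return shared_bands(bands_a, bands_b, tolerance) / smaller if smaller else 0.0


# Hypothetical fingerprints (fragment sizes in bp).
clone_a = [5000, 3200, 1500, 800]
clone_b = [3210, 1500, 790, 400]
print(shared_bands(clone_a, clone_b), overlap_fraction(clone_a, clone_b))
```

A high shared fraction suggests the two clones overlap; unrelated clones are expected to share few bands by chance.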

Assembly of Contiguous DNA Sequence
• Shotgun Approach
• Contigs: Result of joining overlapping sequences
• Scaffold: Result of ordering and connecting contigs across gaps
• BAC: Bacterial artificial chromosome vector; inserts of 100-200 kb

Regional mapping

Minimal tiling path selected for sequencing.
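
Selecting a minimal tiling path can be sketched as a classic interval-cover problem: among the mapped clones, pick as few as possible that still span the region. The example assumes each clone already has start and end coordinates on the map (in kb); the coordinates, clone names and `minimal_tiling_path` function are hypothetical, and a real path would also demand a minimum overlap between neighbours.

```python
def minimal_tiling_path(clones, region_start, region_end):
    """Greedy minimum set of clones covering [region_start, region_end].

    clones maps clone name -> (start, end) map coordinates.
    At each step, choose the clone that starts within the part already
    covered and reaches furthest to the right.
    Raises ValueError if the clones leave a gap in the region.
    """
    ordered = sorted(clones.items(), key=lambda item: item[1][0])
    path, covered, index = [], region_start, 0
    while covered < region_end:
        best_name, best_end = None, covered
        while index < len(ordered) and ordered[index][1][0] <= covered:
            name, (_start, end) = ordered[index]
            if end > best_end:
                best_name, best_end = name, end
            index += 1
        if best_name is None:
            raise ValueError(f"no clone covers position {covered}")
        path.append(best_name)
        covered = best_end
    return path


# Hypothetical BAC positions in kb.
bacs = {
    "A": (0, 150), "B": (40, 180), "C": (100, 260),
    "D": (170, 300), "E": (250, 400), "F": (120, 240),
}
tiling_path = minimal_tiling_path(bacs, 0, 400)
print(tiling_path)
```

Only the clones on the path go forward to shotgun sequencing, which is what keeps redundant sequencing of overlapping BACs down.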

Restriction fragment fingerprinting
- BAC clones are grown in 96-well format
- Hind III digest
- 1% agarose
- Molecular weight marker every 5th lane
- Fragments resolved from >20 kbp down to ~300 bp

Contig assembly
Clones A B C D E F G
FPC*
- Overlap identification by restriction pattern similarities
- Facilitated contig assembly
*Sanger Centre: C. Soderlund, I. Longden and R. Mott
All restriction fragments within a clone selected for the tiling path must be verified by their presence in overlapping clones.
Figure legend: vector fragments; insert fragments

BCM-HGSC

Shotgun Sequencing I: RANDOM PHASE
• BAC Clone: 100-200 kb
• Sheared DNA: 1.0-2.0 kb
• Sequencing Templates
• Random Reads
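
The random phase can be simulated to see how evenly random shearing samples a clone. This is a rough sketch: it assumes a 150 kb BAC, fragment lengths drawn uniformly from the 1.0-2.0 kb range, and it counts whole fragments rather than reads, since the slides do not give a read length. The function names are hypothetical.

```python
import random


def shear_clone(clone_length, n_fragments, min_len=1000, max_len=2000, seed=0):
    """Return (start, end) coordinates of randomly placed fragments."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    fragments = []
    for _ in range(n_fragments):
        length = rng.randint(min_len, max_len)
        start = rng.randint(0, clone_length - length)
        fragments.append((start, start + length))
    return fragments


def coverage_depth(clone_length, fragments):
    """Number of fragments covering each base of the clone."""
    # Difference array: +1 where a fragment starts, -1 just past its end.
    change = [0] * (clone_length + 1)
    for start, end in fragments:
        change[start] += 1
        change[end] -= 1
    depth, running = [], 0
    for delta in change[:clone_length]:
        running += delta
        depth.append(running)
    return depth


BAC_LENGTH = 150_000  # assumed, within the 100-200 kb range
fragments = shear_clone(BAC_LENGTH, 600)
depth = coverage_depth(BAC_LENGTH, fragments)
mean_depth = sum(depth) / BAC_LENGTH
uncovered = sum(1 for d in depth if d == 0)
print(f"mean depth {mean_depth:.2f}, uncovered bases {uncovered}")
```

Even at a comfortable average depth, random placement tends to leave some thin or uncovered stretches, which is why the assembly and finishing phases follow.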

Shotgun Sequencing II: ASSEMBLY
• Consensus Sequence
• Gap
• Low Base Quality
• Single Stranded Region
• Mis-Assembly (Inverted)

Shotgun Sequencing III: FINISHING
Problems in the consensus sequence are resolved in turn:
• Low Base Quality
• Single Stranded Region
• Gap
• Mis-Assembly (Inverted)
High Accuracy Sequence: < 1 error / 10,000 bases
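
The finishing target can be restated as a quality score. The sketch below assumes the conventional Phred scale, Q = -10 log10(P), which the slides do not name explicitly; the function names are hypothetical.

```python
import math


def phred_quality(error_probability):
    """Convert a per-base error probability to a Phred-scaled quality."""
    return -10.0 * math.log10(error_probability)


def error_probability(quality):
    """Convert a Phred-scaled quality back to an error probability."""
    return 10.0 ** (-quality / 10.0)


# Finishing target from the slide: fewer than 1 error per 10,000 bases.
target_quality = phred_quality(1 / 10_000)
print(f"1 error in 10,000 bases corresponds to about Q{target_quality:.0f}")
```

On that scale the finishing standard corresponds to a quality of about 40, so any region whose consensus falls below it is a candidate for resequencing.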

Whole Genome Shotgun Sequencing
• Whole Genome: 3,000 Mb
• Sheared DNA: 1.0-2.0 kb
• Sequencing Templates
• Random Reads

Whole Genome Shotgun Sequencing: Assembly
• Consensus Sequence
• Gap
• Low Base Quality
• Single Stranded Region
• Mis-Assembly (Inverted)

Random fragmentation of genome produces good sampling of its sequence space. Overlaps are identified, and subassembly of sequence takes place after cloning into universal vector.

Digested into Random Fragments

Cloned into Vector

Sequenced from known ends of plasmid (vector)

Assembled into contigs. Gaps and single-stranded regions identified for further study and targeted for new sequencing. Double-Barreled: both ends of each insert are sequenced (opposite strands).

In the gaps:


Whole-Genome Shotgun Sequencing
• Speed-up, but is it assembled correctly?
• Avoids up-front mapping
• Huge amount of computer time to identify overlaps
• Still has to reference a map
• Repeats are a problem:
– Assembly can leave out the sequence between repeats
– Missing Reference End Sequence means Error
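
The computer-time point can be made concrete with a back-of-envelope count. The sketch assumes 1.5 kb fragments (midpoint of the 1.0-2.0 kb range), a 150 kb BAC, single-fold coverage, and a naive all-against-all comparison; real assemblers index the reads instead of comparing every pair, so these are illustrative upper bounds only.

```python
def fragments_to_tile(target_bp, fragment_bp):
    """Fragments needed to span the target once, laid end to end."""
    return -(-target_bp // fragment_bp)  # ceiling division


def pairwise_comparisons(n):
    """Number of distinct pairs among n fragments."""
    return n * (n - 1) // 2


FRAGMENT_BP = 1_500          # assumed fragment size
BAC_BP = 150_000             # assumed BAC insert size
GENOME_BP = 3_000_000_000    # 3,000 Mb, from the slides

per_bac = fragments_to_tile(BAC_BP, FRAGMENT_BP)
per_genome = fragments_to_tile(GENOME_BP, FRAGMENT_BP)
n_bacs = GENOME_BP // BAC_BP

# Clone by clone: overlaps are searched only within each BAC.
clone_by_clone = n_bacs * pairwise_comparisons(per_bac)
# Whole-genome shotgun: every fragment may overlap any other.
whole_genome = pairwise_comparisons(per_genome)

print(f"clone-by-clone: {clone_by_clone:,} comparisons")
print(f"whole-genome:   {whole_genome:,} comparisons")
```

Under these assumptions the whole-genome search space is several orders of magnitude larger, which is the price paid for skipping the up-front map.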


HGP
• Isolate large fragments in BACs with framework of landmark-based physical map
• Sequence on clone-by-clone basis
• Time-Consuming subcloning of random fragments and physical mapping


Sequence Reassembly
• Phrap
• Shortest Covering Superstring
• Map Assembly
• Overlap: Finding overlapping fragments
• Layout: ordering fragments
• Consensus: Sequences from layout
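
The overlap, layout and consensus steps can be sketched with a greedy approximation to the superstring formulation (often called the shortest common superstring). This is a toy under strong assumptions: error-free reads, a single strand, no repeats, and an arbitrary minimum overlap of 3 bases. It is not how Phrap works internally, and the function names and reads are hypothetical.

```python
def overlap(a, b, min_length=3):
    """Length of the longest suffix of a that is a prefix of b.

    Returns 0 if the longest such overlap is shorter than min_length.
    """
    start = 0
    while True:
        start = a.find(b[:min_length], start)
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start
        start += 1


def greedy_assemble(reads, min_length=3):
    """Repeatedly merge the pair of sequences with the longest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates
    while True:
        # Layout: discard sequences wholly contained in another.
        contigs = [c for c in contigs
                   if not any(c != other and c in other for other in contigs)]
        # Overlap: find the best suffix-prefix match among all pairs.
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    length = overlap(a, b, min_length)
                    if length > best_len:
                        best_len, best_i, best_j = length, i, j
        if best_len == 0:
            return contigs  # nothing left to join
        # Consensus: with error-free reads, merging is simple concatenation.
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs)
                   if k not in (best_i, best_j)] + [merged]


# Hypothetical reads sampled from the sequence GATTACAGGCTTCA.
reads = ["GGCTTCA", "GATTACAG", "TACAGGCT"]
assembled = greedy_assemble(reads)
print(assembled)
```

With repeats in the target, the greedy choice can join the wrong pair or collapse the sequence between two repeat copies, which is the failure mode the slides warn about.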
