General Introduction to the Genome
Outline
• Molecular Biology Major Events
• DNA, RNA
• Protein Synthesis (Transcription & Translation)
• Genome Anatomy
• Bioinformatics
• Genomic Signal Processing
Molecular Biology Major Events
• 1865, Gregor Mendel: inheritance is controlled by unit factors.
• 1869, Johann Friedrich Miescher: DNA discovery.
• 1881: chromosomes are composed of DNA.
• 1911, Thomas Hunt Morgan: genes on chromosomes are the discrete units of heredity.
• 1941, George Beadle and Edward Tatum: identify that genes make proteins.
The Central Dogma
[Figure: library analogy in three numbered steps; labels: Nucleus, Book shelves, Book, Target]
What is Life made of?
Eukaryotes vs Prokaryotes
Prokaryotes | Eukaryotes
Single cell | Single or multi cell
No nucleus | Nucleus
No organelles | Organelles
One piece of circular DNA | Chromosomes
No mRNA post-transcriptional modification | Exons/introns splicing
The Cell: Chemical Composition
– 70% Water
– 7% Small molecules
  • Salts
  • Amino acids (Protein)
  • Nucleotides (DNA, RNA)
– 23% Macromolecules
  • Proteins
  • Polysaccharides
  • Lipids
The Cell: The 3 Critical Molecules
• DNA: holds genetic information
• RNA (m-RNA, t-RNA, r-RNA): transfers information, synthesizes protein
• Protein: forms enzymes, forms the body's components
DNA: the Nucleotide
A nucleotide is made of three parts:
• Phosphate
• Sugar
• Nitrogenous base (for example, A)
DNA: Nitrogenous base
• Purines: A, G
• Pyrimidines: T, C
DNA: Polymerization reaction
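[Figure: nucleotides A, T, G, C are joined through the 5′-phosphate (5′-P) and 3′-hydroxyl (3′-OH) groups into a strand A T G C A T G C, written 5′ to 3′]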
DNA: hydrogen bonds
[Figure: two strands, A C G T A C G T and its partner, held together base by base by hydrogen bonds (A with T, C with G), each strand on a sugar-phosphate backbone]
DNA: Watson-Crick Model (1953)
Number of base pairs = genome size. Human genome (HG) = 3200 Mbp (Mb).
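The pairing rule shown in the figures above can be sketched in a few lines of Python. This is only an illustration: the function names `complement` and `reverse_complement` are chosen here and do not come from the slides.

```python
# Watson-Crick pairing rules: A pairs with T, C pairs with G.
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(strand: str) -> str:
    """Return the partner base for each position of a DNA strand (same order)."""
    return "".join(PAIR[base] for base in strand.upper())

def reverse_complement(strand: str) -> str:
    """Return the partner strand written in its own 5'-to-3' direction."""
    return complement(strand)[::-1]

top = "ACGTACGT"
print(complement(top))          # partner base under each position
print(reverse_complement(top))  # partner strand read 5' to 3'
print(len(top), "base pairs")   # genome size is counted in base pairs
```

Because every base has exactly one partner, the number of base pairs equals the length of either strand, which is why genome size is quoted in bp, kb or Mb.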
RNA versus DNA
RNA | DNA
Phosphate | Phosphate
Sugar: ribose | Sugar: deoxyribose
Nitrogenous bases: G, A, C, U | Nitrogenous bases: G, A, C, T
Protein structure
• 1902 - Emil Hermann Fischer wins Nobel prize: showed amino acids are linked and form proteins
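[Figure: a protein drawn as a chain of amino acids, labelled A, F, N, G, S, T, D, K]
Amino acid: basic unit of protein
[Figure: an amino acid: a central carbon (C) bonded to an amino group (NH3+), a carboxylic acid group (COO-), a hydrogen (H) and a side chain (R)]
Different side chains, R, determine the properties of the 20 amino acids.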
Protein structure
• Primary structure
• Secondary structure
• Super-secondary structure
• Tertiary structure
• Quaternary structure
Protein Structure: Prediction Problem
Protein sequence (e.g. A F N G S T) → Protein 3D structure → Protein function
The Central Dogma: a gene is a protein's blueprint
[Figure: the genome (DNA) holds many genes, each shown as the blueprint for a protein]
Protein Synthesis: DNA, RNA, and the Flow of Information
[Figure: flow of information. Labels: Replication, Transcription, Translation]
Protein Synthesis: Gene Expression
[Figure: Splicing. Labels: Gene 1, Gene 2, Gene 3; Transcription to pre-mRNA; segments 1, 2, 3 joined into mRNA; Translation]
[Figure: Alternative Splicing. Same layout, with the segments shown in the order 1, 3, 2]
[Figure: m-RNA Editing. Same layout, segments 1, 2, 3]
[Figure: Translation of the mRNA AUGAUAACUAG. Labels: Start Codon, Stop Codon; amino acid labels M, S, A, K, C, V]
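The splicing and alternative splicing steps in the figures above can be sketched as string operations. This is a minimal sketch under stated assumptions: the pre-mRNA sequence, the exon coordinates and the function name `splice` are hypothetical, and alternative splicing is shown in only one of its forms (skipping a segment).

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join the exon segments of a pre-mRNA; everything else (introns) is dropped.

    Each exon is a (start, end) pair of 0-based positions, end exclusive.
    """
    return "".join(pre_mrna[start:end] for start, end in exons)

# Hypothetical pre-mRNA: exons in upper case, introns in lower case for readability.
pre_mrna = "AUGGCC" + "guaaguuuag" + "AAGUGC" + "guccccacag" + "GUUUAA"
exons = [(0, 6), (16, 22), (32, 38)]   # segments 1, 2, 3

mrna = splice(pre_mrna, exons)                      # all three segments joined
alt_mrna = splice(pre_mrna, [exons[0], exons[2]])   # alternative: segment 2 skipped
print(mrna)
print(alt_mrna)
```

The same pre-mRNA gives two different mRNAs, which is how one gene can specify more than one protein.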
Protein Synthesis: The Genetic Code
[Figure: the genetic code table, with Start and Stop codons marked]
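To make the role of the start and stop codons concrete, here is a small sketch of translation using the standard genetic code. The compact 64-letter string is one common way to write the standard table; the function name `translate` and the example mRNA are assumptions made for illustration, not part of the slides.

```python
BASES = "UCAG"
# Standard genetic code, listed with the first, second and third codon base
# each cycling through U, C, A, G; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

def translate(mrna: str) -> str:
    """Translate from the first AUG up to (not including) the first in-frame stop codon."""
    mrna = mrna.upper().replace("T", "U")
    start = mrna.find("AUG")          # start codon
    if start == -1:
        return ""
    protein = []
    for pos in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[pos:pos + 3]]
        if amino_acid == "*":         # stop codon: UAA, UAG or UGA
            break
        protein.append(amino_acid)
    return "".join(protein)

print(translate("AUGGCCAAGUGCGUUUAA"))  # hypothetical mRNA
```

Reading starts at AUG (methionine, M) and proceeds three bases at a time, so shifting the start by one base changes every codon that follows.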
Gene Regulation
[Figure: regulatory proteins acting on Gene 1 and Gene 2; R marks a regulatory region next to Gene 1]
We have little knowledge about regulatory mechanisms.
How big is a genome?
• At 12-point font size, approximately 60 nucleotides of DNA sequence can be written in a line 10 cm in length.
• Genome size = total number of nucleotide base pairs, typically given in millions of base pairs, or megabases (abbreviated Mb or Mbp).
• Written out in 12-point type, the human genome sequence would stretch for 5000 km, the distance from Montreal to London, Los Angeles to Panama, Tokyo to Calcutta, Cape Town to Addis Ababa, or Auckland to Perth.
• The sequence would fill about 3000 books of 600 pages each.
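The 5000 km figure can be checked from the numbers given in the slides (3200 Mb, 60 nucleotides per 10 cm line). The short calculation below is a rough sketch; the variable names are chosen here, and the per-page figure simply follows from the 3000 books of 600 pages.

```python
GENOME_BP = 3200 * 10**6        # human genome size from the slides: 3200 Mbp
NUCLEOTIDES_PER_LINE = 60       # at 12-point font size
LINE_LENGTH_CM = 10

lines = GENOME_BP / NUCLEOTIDES_PER_LINE
length_km = lines * LINE_LENGTH_CM / 100 / 1000   # cm -> m -> km
print(f"{length_km:.0f} km")                       # on the order of 5000 km

BOOKS, PAGES_PER_BOOK = 3000, 600
nt_per_page = GENOME_BP / (BOOKS * PAGES_PER_BOOK)
print(f"{nt_per_page:.0f} nucleotides per page")   # about 30 lines of 60
```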
Genome sizes differ between organisms
Genome size is not a good indicator of gene number
• Space is saved in the genomes of less complex organisms because the genes are more closely packed together.
C-value paradox
• The lack of precise correlation between the complexity of an organism and the size of its genome was looked on as a bit of a puzzle.
Genome Anatomy
[Figure: a stretch of genome containing Gene 1 to Gene 6]
Human Genome Anatomy
The human genome comprises the nuclear genome and the mitochondrial genome.
Human Mitochondrial Genome Anatomy
• It is much smaller than the nuclear genome (~17 kb), and it contains just 37 genes.
• 13 genes code for proteins and 24 specify non-coding RNA.
• The genes do not contain introns.
• It is typical of the mitochondrial genomes of other animals.
Nuclear Human Genome Anatomy
[Figure: composition of the nuclear genome; one segment is labelled 62%]
Nuclear Human Genome Anatomy: Protein Coding Genes
• Example gene: five exons, separated by four introns.
• Average: nine exons per gene.
Two gene segments (V28 and V29-1)
Nuclear Human Genome Anatomy: pseudogene
• Pseudogenes are non-functional genes.
Nuclear Human Genome Anatomy: genome-wide repeat
• Tandemly repeated DNA
  – Minisatellite DNA
  – Microsatellite DNA
• Interspersed genome-wide repeats
  – SINEs
  – LINEs
  – LTR elements
  – DNA transposons
Nuclear Human Genome Anatomy: genome-wide repeat Minisatellite DNA
• Minisatellite DNA is a type of repeat we are familiar with because of its association with structural features of chromosomes.
• Telomeric DNA, which in humans comprises hundreds of copies of the motif 5′-TTAGGG-3′, is an example of a minisatellite.
5′-TTAGGG TTAGGG TTAGGG ...-3′
3′-AATCCC AATCCC AATCCC ...-5′
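A tandem repeat such as the telomeric motif can be located with a regular expression. The sketch below is illustrative only: the function name `longest_tandem_run` and the test fragment are hypothetical, and real repeat finders also tolerate mismatches, which this one does not.

```python
import re

def longest_tandem_run(sequence: str, motif: str = "TTAGGG") -> int:
    """Return the largest number of back-to-back copies of `motif` in `sequence`."""
    pattern = re.compile(f"(?:{re.escape(motif)})+")
    runs = [len(m.group(0)) // len(motif) for m in pattern.finditer(sequence.upper())]
    return max(runs, default=0)

# Hypothetical fragment: three telomeric repeats flanked by unrelated bases.
fragment = "ACGT" + "TTAGGG" * 3 + "CATG"
print(longest_tandem_run(fragment))            # copies of TTAGGG in a row
print(longest_tandem_run("GGCACACACATT", "CA"))  # works for shorter motifs too
```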
The content of the human nuclear genome: genome-wide repeat Microsatellite DNA
• Microsatellites with a CA repeat (such as CACACACA...) make up 0.25% of the genome, 8 Mb in all.
• Single base-pair repeats (such as AAAAAAAA...) make up another 0.15%.
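As a quick consistency check, the percentages can be converted to megabases using the 3200 Mb genome size quoted earlier in the slides. The variable names are chosen here for illustration.

```python
GENOME_MB = 3200  # human genome size quoted earlier in the slides, in Mb

ca_repeat_mb = GENOME_MB * 0.25 / 100     # CA-repeat microsatellites
single_base_mb = GENOME_MB * 0.15 / 100   # single base-pair repeats
print(f"CA repeats: {ca_repeat_mb:.1f} Mb")
print(f"Single base-pair repeats: {single_base_mb:.1f} Mb")
```

The first value agrees with the 8 Mb stated on the slide; the second, a little under 5 Mb, is implied by the 0.15% figure.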
Nuclear Human Genome Anatomy: genome-wide repeat Interspersed repeat
Gene Classification: Gene function
• Classifying genes by function has the advantage that the fairly broad functional categories can be further subdivided to produce a hierarchy of increasingly specific functional descriptions for smaller and smaller sets of genes.
• The weakness: functions have not yet been assigned to many eukaryotic genes.
• The gene catalog cannot tell us why we are human.
• It may still not be possible, simply from comparisons with the chimpanzee genome, to determine what makes us human.
• The major categories of protein coding genes represent the most studied areas of cell biology, which means that many of the relevant genes can be recognized because their protein products are known.
• Genes whose products have not yet been identified are more likely to be involved in the less well studied areas of cellular activity.
Gene classification: Protein Domain
• A more powerful method is to base the classification not on the functions of genes but on the structures of the proteins that they specify.
• A protein molecule is constructed from a series of domains, each of which has a particular biochemical function.
What is Bioinformatics?
• Integration of computational and biological methods to convert biological information into general theories.
[Figure: a wall of raw DNA sequence text (aatgcatgcggctatgctaatgcatgcggctatgctaagctgggatccgatgac...)]
Bioinformatics draws on:
• Computer Science: data structures, software engineering (C, C++, Perl)
• Biology: cell structure, genome, genes, DNA, RNA
• Chemistry: protein structure, molecular bonds
• Statistics: Markov models, neural networks
Bioinformatics Subareas
• The subareas within bioinformatics include Genomics and Proteomics.
• Genomics: genome comparison, evolutionary trees, microarray analysis, gene prediction, gene classification, gene regulation
• Proteomics: protein 3D structure prediction, protein-protein interaction, protein alignment
What is GSP?
• Genomic Signal Processing (GSP): the analysis and processing of genomic sequences, using the theory and methods of signal processing, to gain a global understanding of the genome.
GSP Labs
• The Genomic Signal Processing Laboratory at Texas A&M University and the Computational Biology Division of the Translational Genomics Research Institute in Phoenix, Arizona (Edward R. Dougherty). Aim: to model genomic regulatory mechanisms for the purposes of diagnosis and therapy.
• Columbia's Genomic Information Systems Laboratory at Columbia University (Dimitris Anastassiou).
• DSP Group, Department of Electrical Engineering, California Institute of Technology (P. P. Vaidyanathan).
Mapping Character String to Numerical Sequences
AAAATTTTCCCGGGTAGCTTTCCCGGGT
0001110101010101111111111000
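The slide does not say which mapping produced the binary string above. One mapping widely used in genomic signal processing, shown here purely as an assumption, is to build four binary indicator sequences, one per base, each holding 1 where that base occurs and 0 elsewhere. The function name `indicator_sequences` is chosen for this sketch.

```python
def indicator_sequences(dna: str) -> dict[str, list[int]]:
    """Map a DNA string to four binary sequences, one per base."""
    dna = dna.upper()
    return {base: [1 if symbol == base else 0 for symbol in dna] for base in "ACGT"}

seqs = indicator_sequences("AAAATTTTCCCGGGTAGCTTTCCCGGGT")
for base, values in seqs.items():
    print(base, "".join(map(str, values)))
```

Once the sequence is numeric, standard tools such as the Fourier transform can be applied to each indicator sequence.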
Research Area of GSP
• Gene Prediction
  – Hidden Markov Models (HMM)
  – Fourier Transform
  – Wavelet Transform
• Resonant Recognition Model (RRM): to identify the common hot spots of many protein molecules using Fourier transform methods.
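As a rough illustration of how a Fourier transform can support gene prediction: protein-coding regions tend to show a periodicity of three bases, which appears as a peak at frequency N/3 in the spectrum of the base-indicator sequences. The slides name only the transform, so the period-3 measure below is a sketch of one standard approach, and the function name `period3_power` is an assumption.

```python
import cmath

def period3_power(dna: str) -> float:
    """Total spectral power of the four base-indicator sequences at frequency N/3."""
    dna = dna.upper()
    power = 0.0
    for base in "ACGT":
        # DFT of the indicator sequence of `base`, evaluated at k = N/3 only,
        # where the kernel exp(-2*pi*i*k*n/N) reduces to exp(-2*pi*i*n/3).
        coefficient = sum(
            cmath.exp(-2j * cmath.pi * n / 3)
            for n, symbol in enumerate(dna)
            if symbol == base
        )
        power += abs(coefficient) ** 2
    return power

print(period3_power("ACG" * 20))  # strongly period-3 sequence: large value
print(period3_power("AC" * 30))   # period-2 sequence: value near zero
```

In practice this measure is computed over a sliding window along the genome, and windows with a high value are candidate coding regions.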
THANK YOU FOR YOUR ATTENTION