Graduate Computational Genomics
02-710 / 10-810 & MSCBIO2070
Elements of Molecular Biology
Takis Benos
Lecture #6a, February 1, 2007
Reading: hand-outs
Benos 02-710/MSCBIO2070 1-FEB-2007 2
Sequence analysis (6 lectures)
• A little biology…
• …and statistics (conditional, Markov chains, HMMs)
• Biological sequence “matchmaking”
  • Evolution of DNA and protein sequences - Distances
  • Pairwise and multiple sequence analysis
  • Algorithms for database search
• Gene finding
• DNA motif discovery
• cis-regulatory motifs and modules
• microRNA genes
Sequence analysis (6 lectures)
What we will not talk about in these lectures:
• Genome sequencing assembly
• Clustering/classification (K-means, SVMs, etc.)
• RNA folding
Outline of the biology part
• Basic Definitions
• Cells’ basic components
• Basic characteristics of DNA & Proteins
• Transcription and Translation: Central Dogma
• Other Features of Genetic Sequence
• Molecular Evolution
Cells’ components
• Cells are complex
• We look at a simplified version:
  • Extracellular environment
  • Membrane
  • Cytoplasm
  • Nucleus (in eukaryotes)
DNA - Chromosomes - Genes
[Figure: chromosome, with the p arm, centromere, and q arm labeled]
DNA - Chromosomes - Genes
5’ - A T C G G T - 3’
     | | | | | |
3’ - T A G C C A - 5’

5’ - A C C G A T - 3’
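The last strand shown above, 5’-ACCGAT-3’, is the lower strand of the duplex rewritten in the conventional 5’ to 3’ direction, that is, the reverse complement of 5’-ATCGGT-3’. A minimal Python sketch of that operation follows; the function name `reverse_complement` is illustrative and not part of the lecture.

```python
# Watson-Crick base pairing: A pairs with T, C pairs with G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    """Return the opposite strand of `seq`, written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]

top = "ATCGGT"                               # 5'-ATCGGT-3' from the slide
bottom_3_to_5 = top.translate(COMPLEMENT)    # paired strand, read 3' to 5'
bottom_5_to_3 = reverse_complement(top)      # same strand, read 5' to 3'
print(bottom_3_to_5, bottom_5_to_3)
```

Because the two strands determine each other, sequence databases usually store only one of them and compute the other on demand.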
What is a “gene”?
• We cannot define it (but we know it when we see it…)
• A loose definition:
  “Gene” is a DNA information unit that is able to perform a function in a cellular environment
Central Dogma (and beyond)
Human gDNA: ~3 x 10^9 bp, contains ~22,000 genes
[Figure: gDNA sequence (G A C A G C) and messenger-RNA; steps labeled transcription, translation, folding]
Slide courtesy: Serafim Batzoglou
Protein coding genes
[Figure: protein-coding gene, with the 5’ and 3’ ends labeled]
Genes and Proteins
• amino acids
• proteins
• the genetic code
[Figure: chart of the genetic code]
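To make the genetic code concrete, here is a small Python sketch of transcription followed by translation with the standard genetic code. The toy sequence and the function names are made up for illustration; the codon table is built from the conventional TCAG ordering of the 64 codons.

```python
from itertools import product

BASES = "TCAG"
# Amino acids for the 64 codons in TTT, TTC, TTA, TTG, TCT, ... order
# (standard genetic code; '*' marks a stop codon).
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def transcribe(coding_strand: str) -> str:
    """The mRNA has the coding-strand sequence with U in place of T."""
    return coding_strand.upper().replace("T", "U")

def translate(mrna: str) -> str:
    """Translate an mRNA from its first base, stopping at a stop codon."""
    dna = mrna.upper().replace("U", "T")
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

gene = "ATGGCCAAGTAA"      # hypothetical toy coding sequence
mrna = transcribe(gene)
print(mrna, translate(mrna))
```

The sketch assumes the reading frame starts at the first base; finding the correct frame in a real sequence is part of the gene-finding problem.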
Amino acid varieties
Slide courtesy: Serafim Batzoglou
Amino acid varieties (cntd)
Slide courtesy: Serafim Batzoglou
Categories of living organisms
[Figure: categories of living organisms, grouped into Prokaryotes and Eukaryotes; label: 2-4 billion yrs]
Source: http://www.bmb.psu.edu/courses/micro401/default.htm
Prokaryotes vs. Eukaryotes
                Prokaryotes                          Eukaryotes
Cell            Cell walls, no nucleus/organelles    Cell membranes, nucleus/organelles
Transcription   Simple                               Complex
Gene            Simple (no introns, short UTRs)      Complex
mRNA            Polycistronic                        Monocistronic (mostly)
Genome logistics: viruses and prokaryotes
Organism          Size (bp x 10^6)    No. of prot. genes
HIV-1             0.1                 8
phage             0.05                71
E. coli           4.7                 3,200
H. influenzae     1.8                 1,700
Genome logistics: eukaryotes
Organism          Size (bp x 10^6)    No. of prot. genes
S. cerevisiae     12                  6,300
C. elegans        97                  19,100
A. thaliana       125                 25,500
D. melanogaster   180                 13,600
H. sapiens        2,900               25,000
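One way to read the two genome tables is to divide gene count by genome size. The short Python sketch below does that for E. coli and the five eukaryotes, using only the figures listed on the slides; the dictionary layout and the function name are illustrative choices.

```python
# (genome size in bp x 10^6, number of protein-coding genes), from the slides.
GENOMES = {
    "E. coli":         (4.7,  3200),
    "S. cerevisiae":   (12,   6300),
    "C. elegans":      (97,   19100),
    "A. thaliana":     (125,  25500),
    "D. melanogaster": (180,  13600),
    "H. sapiens":      (2900, 25000),
}

def genes_per_mb(size_mb: float, n_genes: int) -> float:
    """Protein-coding genes per million base pairs."""
    return n_genes / size_mb

density = {name: genes_per_mb(*entry) for name, entry in GENOMES.items()}
# Print from densest to sparsest genome.
for name, d in sorted(density.items(), key=lambda kv: -kv[1]):
    print(f"{name:16s} {d:8.1f} genes/Mb")
```

With these numbers the bacterial genome comes out far more gene-dense than the human genome, which is consistent with the "no introns, short UTRs" description of prokaryotic genes.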
Gene structure: prokaryotes
[Figure: promoter with a -35 box and a -10 box separated by 17 bp; genes z, y, a; mRNA; proteins]
Gene structure: eukaryotes
[Figure: “core” promoter with CAAT (-100) and TATA (-30) boxes, followed by exon-1, intron-1, exon-2, intron-2, exon-3]
RNA Splicing
[Figure: gene with CAAT (-100) and TATA (-30) boxes; spliced mRNA consisting of 5’ UTR, CDS, and 3’ UTR]
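Splicing can be pictured as cutting the introns out of the primary transcript and joining the exons in order. The Python sketch below does this on a toy transcript; the sequence, the exon coordinates, and the function name `splice` are all invented for illustration.

```python
def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Join exon segments (0-based, end-exclusive coordinates) in order."""
    return "".join(pre_mrna[start:end] for start, end in exons)

# Hypothetical toy transcript: exons in upper case, introns in lower case.
# The introns begin with "gu" and end with "ag", the usual splice-site
# dinucleotides; everything else is made up.
pre_mrna = "GGAUGGCC" + "guaaguuuag" + "AAGGAU" + "guccccag" + "UAAGC"
exons = [(0, 8), (18, 24), (32, 37)]

mrna = splice(pre_mrna, exons)
print(mrna)
```

In practice the exon coordinates are the unknown; predicting them from sequence is one of the central tasks of eukaryotic gene finding.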
Translation
[Figure: gene with CAAT (-100) and TATA (-30) boxes; mRNA with 5’ CAP and poly-A tail (AAAAAAAAAAA); coding region from ATG to TAA; translated into protein]
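The coding region in the figure runs from the start codon ATG to an in-frame stop codon (TAA in the figure). A rough Python sketch of locating it is given below. It assumes translation starts at the first ATG, which is a simplification, and the example sequence and the function name `find_cds` are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_cds(mrna: str):
    """Return (start, end) of the first ATG...stop open reading frame,
    end-exclusive and including the stop codon, or None if there is none."""
    start = mrna.find("ATG")
    if start == -1:
        return None
    # Step through the sequence codon by codon in the frame set by the ATG.
    for i in range(start, len(mrna) - 2, 3):
        if mrna[i:i + 3] in STOP_CODONS:
            return start, i + 3
    return None

# Hypothetical mature mRNA written in the DNA alphabet, as on the slide.
# The cap is not a base and is left out; the poly-A tail is at the 3' end.
mrna = "GGCACC" + "ATGGCCAAGGATTAA" + "GCTTAAAAAAAAAAAA"
start, end = find_cds(mrna)
print("5' UTR:", mrna[:start])
print("CDS:   ", mrna[start:end])
print("3' UTR:", mrna[end:])
```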
Alternative splicing
[Figure: gene with CAAT (-100) and TATA (-30) boxes; two alternative transcripts: mRNA and mRNA (transcript-2)]
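One common form of alternative splicing is exon skipping, where an internal exon is left out of some transcripts. The Python sketch below enumerates the transcripts that this mechanism alone could produce from a toy three-exon gene. The exon sequences and the function name are hypothetical, and the slide does not say which mechanism its second transcript illustrates.

```python
from itertools import combinations

def exon_skipping_isoforms(exons: list[str]) -> list[str]:
    """All transcripts that keep the first and last exon plus any
    order-preserving subset of the internal exons (needs >= 2 exons)."""
    first, *internal, last = exons
    isoforms = []
    # Start with all internal exons kept, then skip more and more of them.
    for k in range(len(internal), -1, -1):
        for kept in combinations(internal, k):
            isoforms.append("".join([first, *kept, last]))
    return isoforms

exons = ["GGAUGGCC", "AAGGAU", "UAAGC"]   # hypothetical exon-1 to exon-3
for n, transcript in enumerate(exon_skipping_isoforms(exons), start=1):
    print(f"transcript-{n}: {transcript}")
```

With n internal exons this gives 2^n candidate transcripts, which is one reason a single gene can encode several proteins.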
Transcription regulation
• promoter region
• expression levels
• degradation
• post modifications
Non-coding genes
• tRNA(*)
• ribosomal RNA(*)
• snoRNA(*)
• microRNA etc.
Source: http://www.emc.maricopa.edu/faculty/farabee/BIOBK/BioBookPROTSYn.html
(*) Not to be discussed in this course.
Other DNA elements
• transposable elements(*)
• repetitive DNA(*)
• “junk” DNA(*)
(*) Not to be discussed in this course.