Computational Problems in Molecular Biology
Dong Xu
Computer Science Department
109 Engineering Building West
E-mail: [email protected]
573-882-7064
http://digbio.missouri.edu
Lecture Outline
From DNA to gene
Protein sequence and structure
Gene expression
Protein interaction and pathway
Provide a roadmap for the entire course
Biology from system level (computational perspective)
About Life
Life is wonderful: amazing mechanisms
Life is not perfect: errors and diseases
Life is a result of evolution
Cells
Basic unit of life
Prokaryotes/eukaryotes
Different types of cell:
  Skin, brain, red/white blood
  Different biological function
Cells produced by cells
  Cell division (mitosis)
  2 daughter cells
DNA
Double Helix (Watson & Crick)
Nitrogenous base pairs
  Adenine, Thymine [A,T]
  Cytosine, Guanine [C,G]
Weak bonds (can be broken)
Form long chains
Genome
Each cell contains a full genome (DNA)
The size varies:
  Small for viruses and prokaryotes (10 kbp-20 Mbp)
  Medium for lower eukaryotes
    Yeast, unicellular eukaryote: 13 Mbp
    Worm (Caenorhabditis elegans): 100 Mbp
    Fly, invertebrate (Drosophila melanogaster): 170 Mbp
  Larger for higher eukaryotes
    Mouse and man: 3000 Mbp
  Very variable for plants (many are polyploid)
    Mouse ear cress (Arabidopsis thaliana): 120 Mbp
    Lilies: 60,000 Mbp
Differences in DNA
[Figure: differences in DNA, labeled ~2%, ~4%, and ~0.2%]
Genes
Segments of DNA sequence that are expressed as functional biomolecules (protein, RNA)
About 2% of the human DNA sequence codes for genes
32,000 human genes, 100,000 genes in tulips
Gene Structure
General structure of a eukaryotic gene
Unlike eukaryotic genes, a prokaryotic gene typically consists of only one contiguous coding region
Informational Classes in Genomic DNA
Transcribed sequences (exons and introns)
Messenger sequences (mRNA, exons only)
Coding sequences (CDS, part of the exons only)
Heads and tails: untranslated parts (UTR)
Regulatory sequences
... and all the rest
Identify them: gene-finding
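As a rough illustration of gene-finding in its simplest form, the sketch below scans the forward strand of a DNA string for open reading frames (a start codon ATG followed in-frame by a stop codon). This is only a toy, assuming a prokaryote-like gene with one contiguous coding region; the function name `find_orfs` and the `min_codons` cutoff are hypothetical choices for this example. Practical gene finders also handle the reverse strand, introns, and statistical signal models.

```python
# Toy open-reading-frame (ORF) scan on the forward strand only.
# Assumption: a gene is ATG ... stop, in one contiguous reading frame.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=3):
    """Return (start, end) index pairs of ORFs; end is exclusive
    and includes the stop codon. min_codons counts codons before the stop."""
    orfs = []
    for frame in range(3):
        start = None  # index of the currently open ATG, if any
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return orfs


if __name__ == "__main__":
    example = "CCATGAAACCCGGGTAACC"  # made-up sequence for illustration
    for begin, end in find_orfs(example):
        print(begin, end, example[begin:end])
```

The `min_codons` threshold is there because very short ORFs occur by chance; the value 3 is arbitrary and only suits tiny examples like this one.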
Genetic Code
A=Ala=Alanine
C=Cys=Cysteine
D=Asp=Aspartic acid
E=Glu=Glutamic acid
F=Phe=Phenylalanine
G=Gly=Glycine
H=His=Histidine
I=Ile=Isoleucine
K=Lys=Lysine
L=Leu=Leucine
M=Met=Methionine
N=Asn=Asparagine
P=Pro=Proline
Q=Gln=Glutamine
R=Arg=Arginine
S=Ser=Serine
T=Thr=Threonine
V=Val=Valine
W=Trp=Tryptophan
Y=Tyr=Tyrosine
Protein Synthesis
AGCCACTTAGACAAACTA (DNA)
Transcribed to:
AGCCACUUAGACAAACUA (mRNA)
Translated to:
SHLDKL (Protein)
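The two steps above can be sketched in a few lines of Python. This is a minimal illustration, not a full model of protein synthesis: it follows the slide's convention of writing the mRNA as the DNA string with T replaced by U, it assumes the standard genetic code, and it reads codons from the first base. The names `transcribe`, `translate`, and `CODON_TABLE` are chosen for this example.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")

# Map RNA codons (U instead of T) to one-letter amino acid codes; '*' is stop.
CODON_TABLE = {
    "".join(codon).replace("T", "U"): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(dna):
    """Write the mRNA for a DNA string by replacing T with U."""
    return dna.upper().replace("T", "U")


def translate(mrna):
    """Translate an mRNA string codon by codon, stopping at a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        aa = CODON_TABLE[mrna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


if __name__ == "__main__":
    dna = "AGCCACTTAGACAAACTA"
    mrna = transcribe(dna)
    print(mrna)             # AGCCACUUAGACAAACUA
    print(translate(mrna))  # SHLDKL
```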
About Protein
Tens to thousands of amino acids (average 300)
Lysozyme sequence (129 amino acids):
KVFGRCELAA AMKRHGLDNY RGYSLGNWVC AAKFESNFNT QATNRNTDGS TDYGILQINS RWWCNDGRTP GSRNLCNIPC SALLSSDITA SVNCAKKIVS
DGNGMNAWVA WRNRCKGTDV QAWIRGCRL
[Figure: protein backbone and side chains]
Evolution of Genes: Mutation
Genes alter (slightly) during reproduction
Caused by errors, radiation, or toxicity
3 possibilities: deletion, insertion, substitution (alteration)
Deletion: ACGTTGACTC → ACGTGACTC
Insertion: ACGTTGACTC → AGCGTTGACTC
Substitution: ACGTTGACTC → ACGATGACTC
Mutations are mostly deleterious
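The three mutation types can be expressed as simple string operations. The helper names below (`delete`, `insert`, `substitute`) and the 0-based positions are assumptions made for this sketch; the inputs and results are the examples from the slide.

```python
def delete(seq, pos):
    """Remove the base at 0-based position pos."""
    return seq[:pos] + seq[pos + 1:]


def insert(seq, pos, base):
    """Insert base so that it ends up at 0-based position pos."""
    return seq[:pos] + base + seq[pos:]


def substitute(seq, pos, base):
    """Replace the base at 0-based position pos with base."""
    return seq[:pos] + base + seq[pos + 1:]


if __name__ == "__main__":
    original = "ACGTTGACTC"
    print(delete(original, 3))           # ACGTGACTC
    print(insert(original, 1, "G"))      # AGCGTTGACTC
    print(substitute(original, 3, "A"))  # ACGATGACTC
```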
Evolution and Homology
[Figure labels: Ancestor; Gene duplication (X, Y); Recombination (75% X, 25% Y); Paralogs (related functions); Mixed homology; Orthologs (similar function)]
Twilight zone: undetectable homology (<20% sequence identity)
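To make the sequence identity threshold concrete, here is a small sketch that computes percent identity between two already-aligned sequences. Several conventions exist for the denominator; this version assumes identity is counted over columns where neither sequence has a gap. The function names and the `threshold` parameter are illustrative.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over columns without a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # skip columns containing a gap
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def in_twilight_zone(aligned_a, aligned_b, threshold=20.0):
    """True if identity falls below the threshold (20% on the slide)."""
    return percent_identity(aligned_a, aligned_b) < threshold


if __name__ == "__main__":
    print(percent_identity("ACGT", "ACGA"))  # 75.0
    print(in_twilight_zone("AAAAAAAAAA", "ACCCCCCCCC"))  # True
```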
Sequence Comparison
o Pairwise sequence comparison
o Multiple alignment
SAANLEYLKNVLLQFIFLKPG--SERERLLPVINTMLQLSPEEKGKLAAV O15045
NEKNMEYLKNVFVQFLKPESVP-AERDQLVIVLQRVLHLSPKEVEILKAA P34562
KNEKIAYIKNVLLGFLEHKE----QRNQLLPVISMLLQLDSTDEKRLVMS Q06704
REINFEYLKHVVLKFMSCRES---EAFHLIKAVSVLLNFSQEEENMLKET Q92805
MLIDKEYTRNILFQFLEQRD----RRPEIVNLLSILLDLSEEQKQKLLSV O42657
EPTEFEYLRKVMFEYMMGR-----ETKTMAKVITTVLKFPDDQAQKILER O70365
DPAEAEYLRNVLYRYMTNRESLGKESVTLARVIGTVARFDESQMKNVISS Q21071
STSEIDYLRNIFTQFLHSMGSPNAASKAILKAMGSVLKVPMAEMKIIDKK Q18013
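Pairwise comparison is commonly done by dynamic programming. The slide does not name a method, so the sketch below uses Needleman-Wunsch global alignment as one standard choice. The scoring values (match +1, mismatch -1, gap -1) are arbitrary assumptions for illustration; protein alignments normally use a substitution matrix and more refined gap penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,  # align a[i-1], b[j-1]
                              score[i - 1][j] + gap,      # gap in b
                              score[i][j - 1] + gap)      # gap in a
    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    total, row_a, row_b = needleman_wunsch("ACGTTGACTC", "ACGTGACTC")
    print(total)
    print(row_a)
    print(row_b)
```

Multiple alignment, as in the eight-sequence block above, is usually built heuristically from such pairwise comparisons rather than solved exactly.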
Phylogenetic Trees
Understand evolution
Protein Structure
[Figure: lysozyme structure shown in ball & stick, strand, and surface representations]
Structure Features of Folded Proteins
Compact
Secondary structures:
  loop, α-helix, β-sheet
Protein cores mostly consist of α-helices and β-sheets
Protein Structure Comparison
Structure is better conserved than sequence
A structure can accommodate a wide range of mutations.
Physical forces favor certain structures.
The number of folds is limited. Currently ~700 known; estimated total: 1,000 to 10,000.
[Figure: TIM barrel]
Protein Folding Problem
A protein folds into a unique 3D structure under physiological conditions
Lysozyme sequence:
KVFGRCELAA AMKRHGLDNY RGYSLGNWVC AAKFESNFNT QATNRNTDGS TDYGILQINS RWWCNDGRTP GSRNLCNIPC SALLSSDITA SVNCAKKIVS
DGNGMNAWVA WRNRCKGTDV QAWIRGCRL
Structure-Function Relationship
A certain level of function can be determined without a structure, but a structure is key to understanding the detailed mechanism.
A predicted structure is a powerful tool for function inference.
[Figure: Trp repressor as a function switch]
Structure-Based Drug Design
HIV protease inhibitor
Structure-based rational drug design is still a major method for drug discovery.
Gene Expression
All cells carry the same DNA, but only a few percent of genes are expressed in common across cells (house-keeping genes).
A few examples:
(1) Specialized cell: over-represented hemoglobin in blood cells.
(2) Different stages of life cycle: hemoglobins before and after birth, caterpillar and butterfly.
(3) Different environments: microbes in nutrient-poor or nutrient-rich environments.
(4) Special treatment: response to wound.
Eukaryote Gene Expression Control
[Figure: control points in eukaryotic gene expression]
Nucleus: DNA → primary RNA transcript → mRNA
Cytosol: mRNA → protein
Control points:
  transcriptional control
  RNA processing control
  RNA transport control (across the nucleus membrane)
  translation control
  mRNA degradation control (mRNA → inactive mRNA)
  protein activity control (protein → inactive protein)
Methods: Mass-spec, Microarray
Gene Regulation
[Figure labels: DNA sequence; start of transcription; promoter; operator]
Microarray Experiments
Microarray data
Regulation/function/pathway/cellular state/phenotype
Disease: diagnosis/gene identification/sub-typing
Microarray chip
Genetic vs. Physical Interaction
Regulatory network
Genetic interaction
Complex system
Physical interaction
Gene/protein interaction
Expressed gene
Transcription factor
Biological Pathway
Studying Pathways through Systems Biology Approach
RGYSLGNWVC AAKFESNFNT QATNRNTDGS TDYGILQINS RWWCNDGRTP GSRNLCNIPC
[Figure labels: sequence; structure; function; protein interaction; gene regulation; pathway (cross-talk)]
Discussion
Possible impacts of biotechnology on our lives
Assignments
Required reading:
* Chapter 13 in “Pavel Pevzner: Computational Molecular Biology - An Algorithmic Approach. MIT Press, 2000.”
* Larry Hunter: Molecular Biology for Computer Scientists
Optional reading:
http://www.ncbi.nih.gov/About/primer/bioinformatics.html
http://www.bentham.org/cpps1-1/Dong%20Xu/xu_cpps.htm