Practice (Molecular Biology) Theory: Central Dogma Methods: separation, visualization
![Page 1: Practice (Molecular Biology) Theory: Central Dogma Methods: separation, visualization](https://reader030.fdocuments.in/reader030/viewer/2022033104/56814842550346895db55949/html5/thumbnails/1.jpg)
Bioinformatics: Theory and Practice – Striking a Balance
(a plea for teaching, as well as doing, Bioinformatics)
Practice (Molecular Biology)
• Theory: Central Dogma
• Methods: separation, visualization
• Experiment as “Art”
Theory (population/statistical genetics)
• Theory: 80+ years of Mathematical Biology
• Methods: Ag, RFLPs, SNPs…
Bioinformatics
• Theory: 40 years of algorithms, information theory
• 20+ years of statistics
Current practice My ideal
The spectrum of experimental Biology: Practice – Theory
![Page 2: Practice (Molecular Biology) Theory: Central Dogma Methods: separation, visualization](https://reader030.fdocuments.in/reader030/viewer/2022033104/56814842550346895db55949/html5/thumbnails/2.jpg)
Teaching and Bioinformatics
What is the goal?
• Learning Biology / learning Computer Science
• Becoming "computer literate": scripting/programming
• Exploring uncertainty
– experimental shortcomings
– computational biases
• Utility – getting something done
Bioinformatics is challenging because biology is complicated and idiosyncratic
![Page 3: Practice (Molecular Biology) Theory: Central Dogma Methods: separation, visualization](https://reader030.fdocuments.in/reader030/viewer/2022033104/56814842550346895db55949/html5/thumbnails/3.jpg)
Biology: A “clean” experiment – Internal positive and negative controls
Southern blot of human class-mu Glutathione transferase genes from individuals with low (-) or high (+) GT-tSBO activity.
Bands found with high GT-tSBO (GSTM1)
RFLP independent of GT-tSBO
• When GSTM1 is present, it is detected
• When it is not detected, it is absent
![Page 4: Practice (Molecular Biology) Theory: Central Dogma Methods: separation, visualization](https://reader030.fdocuments.in/reader030/viewer/2022033104/56814842550346895db55949/html5/thumbnails/4.jpg)
Bioinformatics – ambiguity or computational error?
D3BUQ5 is “clearly” homologous to GSTA6_RAT, aligning from beginning to end
• Does it have a GST_C domain?
• Does it have glutathione transferase activity?
• Could it be a steroid isomerase? Prostaglandin synthetase?
![Page 5: Practice (Molecular Biology) Theory: Central Dogma Methods: separation, visualization](https://reader030.fdocuments.in/reader030/viewer/2022033104/56814842550346895db55949/html5/thumbnails/5.jpg)
Why is Bioinformatics “hard”?
Bioinformatics is at the intersection of Biology, Computer Science, and Statistics
• What is interesting to Computer Scientists – algorithms, optimality – is less relevant to Biologists (textbook bias)
• “Irrelevant” parameters for Computer Scientists – DNA vs protein – are important in practice
• Statistics are central, and the statistical perspective is not well integrated into either Biology or CS curricula
• The biological assumptions behind a “null hypothesis” are rarely explicit and often idealistic
• Biologists do experiments (CS folks like theory). If it works, use it.
Bioinformatics uses "hard/true/reproducible" techniques to solve "soft/ambiguous/varying" biological questions.
A teaching "opportunity"
![Page 6: Practice (Molecular Biology) Theory: Central Dogma Methods: separation, visualization](https://reader030.fdocuments.in/reader030/viewer/2022033104/56814842550346895db55949/html5/thumbnails/6.jpg)
Alberts is wrong about sequence similarity (three times in three claims)
“With such a large number of proteins in the database, the search programs find many nonsignificant matches, resulting in a background noise level that makes it very difficult to pick out all but the closest relatives. Generally speaking, one requires a 30% identity in sequence to consider that two proteins match. However, we know the function of many short signature sequences ("fingerprints"), and these are widely used to find more distant relationships.”
– Alberts, Molecular Biology of the Cell (5th ed, 2007) p. 139
• Sequences producing statistically significant alignments ALWAYS share a common structure
• Many significant alignments share < 30% identity (<25% identity is routine, and <20% identity can be significant)
• In the absence of significant similarity, “fingerprints” should never be trusted.
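The gap between percent identity and statistical significance can be made concrete with a minimal Python sketch. It assumes the standard Karlin-Altschul relation between a bit score S' and the expectation value, E = m · n · 2^(−S'), where m is the query length and n is the total database length in residues. The function names and argument names are illustrative assumptions, not part of the slides.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the columns where neither sequence has a gap ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns in which both sequences contribute a residue.
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


def expect_value(bit_score: float, query_len: int, db_len: int) -> float:
    """Expected number of chance alignments scoring at least bit_score.

    Assumes E = m * n * 2**(-S'), with m = query_len and n = db_len.
    """
    return query_len * db_len * 2.0 ** (-bit_score)
```

Percent identity does not appear in the expectation formula at all: significance is judged from the alignment score and the size of the search space, which is why an identity cutoff is a poor substitute for a statistical estimate.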
![Page 7: Practice (Molecular Biology) Theory: Central Dogma Methods: separation, visualization](https://reader030.fdocuments.in/reader030/viewer/2022033104/56814842550346895db55949/html5/thumbnails/7.jpg)
How can we teach better?
• Discuss the strengths and weaknesses of data resources
• Examine how published protocols go out of date (or are optimized for different problems). Examine potential weaknesses – what do the protocols assume?
• Review high-profile papers with mistaken conclusions to understand what went wrong.
![Page 8: Practice (Molecular Biology) Theory: Central Dogma Methods: separation, visualization](https://reader030.fdocuments.in/reader030/viewer/2022033104/56814842550346895db55949/html5/thumbnails/8.jpg)
Biology 4XXX – Bioinformatics and Functional Genomics
3hr lecture, 1hr lab
• Introduction to Unix / perl (python) scripting / web resources
– programming by imitation
• similarity searching / domain identification
– homology, scoring matrices
– errors in domain annotation (why)
• multiple sequence alignment
– sequences vs domains
• evolutionary tree-building
– finding the best tree
– evaluating alternative trees
– where is the uncertainty (why)
• Introduction to 'R' statistical language
– programming by imitation
• Expression analysis
– read mapping, read counting
• Motif extraction, mapping
– motif independence?
• Pathway analysis – gene enrichment
• Gene models and alternative splicing
– which gene/splicing models supported?
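As an example of the kind of short script such a course might hand out for "programming by imitation", the gene enrichment question in the pathway analysis unit can be framed as a one-sided hypergeometric test. The slides do not name a method, so this framing, the function name, and the parameter names are all assumptions made for illustration.

```python
from math import comb


def enrichment_p_value(population: int, pathway: int, selected: int, overlap: int) -> float:
    """P(X >= overlap) under a hypergeometric null (hypothetical helper).

    population: total number of genes considered
    pathway:    genes in the population annotated to the pathway
    selected:   genes in the list of interest (e.g. differentially expressed)
    overlap:    selected genes that are also annotated to the pathway
    """
    total = comb(population, selected)
    upper = min(pathway, selected)
    # Sum the probability of every overlap at least as large as the one observed.
    tail = sum(
        comb(pathway, k) * comb(population - pathway, selected - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total
```

The null hypothesis here is that the selected genes are a random draw from the population, which is itself a biological assumption worth questioning in class (for example, genes are not equally likely to be detected or annotated).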
![Page 9: Practice (Molecular Biology) Theory: Central Dogma Methods: separation, visualization](https://reader030.fdocuments.in/reader030/viewer/2022033104/56814842550346895db55949/html5/thumbnails/9.jpg)
Computational and Comparative Genomics
Oct 29 – Nov 4, 2014
(application deadline July 15, 2014)