Welcome to Introduction to Bioinformatics Computing
Welcome to Introduction to Bioinformatics Computing
aka
BIC1
Team taught by
• Rhys Price Jones, Ph.D.
– [email protected]
– Bldg. 7B-2250; 5-5866
– Office Hours: Monday, Wednesday, Friday 10-11 a.m.
• Anne R. Haake, Ph.D.
– [email protected]
– Bldg. 70-2325; 5-5365
– Office Hours: Tuesday 2-4 p.m.; Friday 10-noon
The Focus of Bioinformatics
• Using computers to answer biological questions
– Storage
– Visualization
– Analysis
• Using computers to figure out which biological questions to ask
What is this course about?
• We will focus on analysis:
– We will study techniques for quickly and effectively bringing computing resources to bear on problems raised in biology
– We will study algorithms (more on this later...) that underlie many of the popular bioinformatics software packages
• The majority of these algorithms are concerned with sequence analysis (more on this, too...)
The Context of Bioalgorithms
• It is important to keep in mind that a mathematically perfect solution to an ideally posed problem may not be the most biologically relevant
• We need flexibility: a willingness to rephrase the question, to rethink the process, to adapt and re-adapt
Course Structure
• 3 Classroom sessions each week to introduce the biological perspective and computational approaches for each biological problem
• 1 Laboratory session to give you hands-on experience in applying and refining computational methods in the context of biology
Readings
• Textbook:
– Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology, Dan Gusfield, Cambridge University Press, 1997, ISBN 0 521 58519 8
• Papers from the current literature, as assigned
• Lecture notes and lab manuals as posted and linked to from the course home page
• Note that, unless otherwise noted, net-based resources should be accessed using Netscape. Other browsers may not be able to correctly interpret the JavaScript code.
Expectations – Computing Background
• There are skills you should possess in part already, but which will be significantly enhanced by being exercised in this course:
– identifying and clearly phrasing a computational problem from a general biological query
– rapidly developing, testing and analyzing tools for the solution of such problems if necessary
– locating existing tools if not
– understanding the capabilities and limitations of such tools
Computing Background – Specific skills
• Programming in a language such as Lisp, Perl, Scheme, Java, C, Python, etc. (if in doubt, ask!)
• Static and dynamic data structures – arrays, lists, trees, etc.
• Control structures, especially recursion
• Rapid prototyping, careful version control
• Understanding of mathematics for:
– analysis
– proof
– modeling
Biological Motivation
• The fundamental building blocks of life are proteins
– Enzymes, structural proteins, transport molecules, antibodies
• 100,000 or so different proteins in a human
• Their properties and interactions are what make us what we are
Biological Motivation
• What are proteins?
– Polymers of amino acids (20 different)
– Sequence of these amino acids (primary structure) determines the protein’s shape (secondary and tertiary structures)
– Protein shape and the chemical composition of its amino acids determine protein function
Figure from W. Gilbert, Ph.D., New Hampshire Biotech. Center
So…in theory, we can infer protein function if we know the protein sequence
Biological Motivation
• How do we find out protein sequence?
– Can sequence proteins directly but this has been technically difficult
– Determine protein sequence from the DNA sequences that encode them
The Central Dogma
Hereditary information for a complete individual is stored in the DNA, which is self-replicating and is organized into units of expression (genes)
A gene is expressed in 2 steps:
DNA is transcribed into RNA
RNA is translated into protein
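The two expression steps can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the input is taken to be the coding strand of the DNA, it contains only the letters A, C, G and T, translation starts at the first AUG and stops at the first stop codon, and the standard genetic code applies. The names `transcribe`, `translate` and `CODON_TABLE` are hypothetical and chosen for this example.

```python
from itertools import product

# Standard genetic code, listed with bases in the order U, C, A, G
# for the first, second and third codon position ("*" marks a stop codon).
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(dna):
    """Step 1: coding-strand DNA -> mRNA (T is replaced by U)."""
    return dna.upper().replace("T", "U")


def translate(mrna):
    """Step 2: mRNA -> protein, from the first AUG to the first stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon, nothing to translate
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    gene = "ATGGCCATTGTAATGGGCCGCTGA"  # made-up sequence for illustration
    print(translate(transcribe(gene)))
```

Real genes are messier than this sketch suggests (for example, the pathway from DNA to RNA to protein can involve further processing), which is one reason the later slides call the pathway complicated.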
Most Protein Sequences Are Determined From DNA Sequence
• Why?
• Availability of DNA sequence information
– Rapid development of DNA sequencing technology
– Genomes of many different species have now been sequenced
• Difficulties?
– Data sets are large
– Cellular pathway from DNA to RNA to protein can be complicated
Some Genomes
• E. coli 4.6 x 10^6 bases
– Approx. 4,000 genes
• Yeast 15 x 10^6 bases
– Approx. 6,000 genes
• Smallest human chromosome 50 x 10^6 bases
• Human 3 x 10^9 bases
– Approx. 30,000 genes?
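As a rough back-of-the-envelope check, the figures on this slide can be turned into an average number of bases per gene. The sketch below uses only the approximate numbers quoted above. Note that "bases per gene" here is total genome size divided by gene count, not the length of a typical gene, and the helper name `bases_per_gene` is made up for this example.

```python
# Approximate genome sizes (bases) and gene counts as quoted on the slide.
genomes = {
    "E. coli": (4.6e6, 4_000),
    "Yeast": (15e6, 6_000),
    "Human": (3e9, 30_000),
}


def bases_per_gene(bases, genes):
    """Average amount of genome per gene (genome size / gene count)."""
    return bases / genes


if __name__ == "__main__":
    for name, (bases, genes) in genomes.items():
        print(f"{name}: about {bases_per_gene(bases, genes):,.0f} bases per gene")
```

With these numbers the ratio grows from roughly a thousand bases per gene in E. coli to roughly a hundred thousand in human, which hints at why finding genes in larger genomes is a harder computational problem.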
The Computational Approach
• The nucleotide sequence of a genome contains all information necessary to produce a functional organism
• Therefore, we should, in theory, be able to duplicate this decoding using computers
Why Use Computational Techniques?
• The datasets are too large to analyze by hand
• Efficient algorithms are the only way to perform the analyses that we need to answer the biological questions
Common Biological Questions Answered Through Sequence Analysis
• Determine if an interesting DNA sequence has been seen by anyone else
• Find all the protein coding regions in a genome
• Infer the function of a new gene from a known one by matching two amino acid sequences
• Measure the evolutionary distance between species
• Predict local secondary structure of a peptide sequence, predict protein conformation, predict function
• Study protein families
Many Molecular Biology Problems on Sequences Can Be Formulated As String Matching Problems
• Comparing two or more strings for similarities
• Searching databases for related strings
• Looking for new patterns occurring frequently in DNA
• Reconstructing long strings of DNA from overlapping string fragments
• And more…
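To make the string formulation concrete, here is a minimal sketch of one item from the list: joining two DNA fragments that overlap. It assumes error-free fragments from the same strand and simply looks for the longest suffix of one fragment that equals a prefix of the other. The function names `overlap` and `merge` are hypothetical, and real assembly has to cope with sequencing errors, repeats and many fragments at once.

```python
def overlap(a, b, min_length=1):
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if no overlap of at least min_length characters exists.
    """
    for k in range(min(len(a), len(b)), min_length - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def merge(a, b, min_length=1):
    """Join a and b on their longest overlap, or return None if there is none."""
    k = overlap(a, b, min_length)
    if k == 0:
        return None
    return a + b[k:]


if __name__ == "__main__":
    left, right = "ACGTTGCA", "TGCATTAG"  # made-up fragments
    print(overlap(left, right))  # length of the shared suffix/prefix
    print(merge(left, right))    # the reconstructed longer string
```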
We Will Be Studying Algorithms For:
• Exact string matching
• Inexact string matching
• Sequence alignment problems
• Multiple alignment problems
• And more…
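As a first taste of the first two topics, the sketch below shows a naive exact matcher and the classic dynamic-programming edit distance, one common way to make "inexact" matching precise. These are illustrative baselines only, not necessarily the algorithms the course will present: the naive matcher takes time proportional to the product of the pattern and text lengths, and faster methods exist. The function names `find_all` and `edit_distance` are hypothetical.

```python
def find_all(pattern, text):
    """Naive exact matching: every start index where pattern occurs in text."""
    m = len(pattern)
    return [i for i in range(len(text) - m + 1) if text[i:i + m] == pattern]


def edit_distance(s, t):
    """Minimum number of insertions, deletions and substitutions turning s into t."""
    # prev[j] holds the distance between the processed prefix of s and t[:j].
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]  # distance from s[:i] to the empty string
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # delete cs
                            curr[j - 1] + 1,      # insert ct
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(find_all("ACA", "ACACACGT"))
    print(edit_distance("GATTACA", "GACTATA"))
```

Keeping only two rows of the table at a time is a design choice that saves memory when only the distance is needed. Recovering the actual alignment would require keeping the full table or extra bookkeeping.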
Role of Evolutionary Theory
• Central to computational biology
• Evolution is descent with modification, driven by:
– Diversity: different individuals carry different variants of the basic blueprint
– Mutations: DNA sequence can be changed
– Selection bias
Role of Evolutionary Theory
• Related organisms have:
– similar DNA
– similar protein sequences
– similar organization of genes
• Similar structures tend to have similar functions
• The bottom line:
– evolution is the reason that we can assume similarity is meaningful in computational biology
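One simple way to put a number on "similarity" is percent identity: the share of positions at which two sequences carry the same letter. The sketch below assumes the two sequences have already been aligned to equal length, which is itself one of the problems listed earlier, and the function name `percent_identity` is made up for this example.

```python
def percent_identity(a, b):
    """Percentage of positions at which two equal-length sequences agree."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


if __name__ == "__main__":
    # Made-up sequences standing in for the same region in two related organisms.
    print(percent_identity("ACGT", "ACGA"))
```

A high score between sequences from two organisms is only suggestive of shared ancestry or function. Deciding how much similarity is meaningful takes the statistical and evolutionary reasoning that the slides point to.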