Introduction, Unit 1
BIOL221T: Advanced Bioinformatics for Biotechnology
Irene Gabashvili, PhD
Course availability
Lectures & Lab: every Wednesday, Duncan Hall, Room 550, 6:00 pm to 9:45 pm
Office hours: Wednesday, 4 pm to 6 pm (Room 554, phone: 92404831) and by appointment
Lecture notes will be posted at: http://home.comcast.net/~igabashvili/221T.htm
Schedule
Dates | Units | B&O | Topic | Due
Jan 23-30 | 1-4 | Foreword, Intro, chapter 3, lecture notes | Introduction: information, databases, programming | Survey, PS0
Feb-March | | 1, 2, 5, 11, 12 | Sequence informatics | PS1, PS2
March-April | | 14, 16, lecture notes | Network informatics | PS3
April 8-10, 17 | | | Structure informatics | Projects
May | | | Review | PS4, Exam
Final Grading
[Chart: voted for / voted against]
Survey
Compose a short message introducing yourself, your science background, your bioinformatics interests, and what you hope to learn from taking this course.
What bioinformatics databases and tools have you used in your previous courses/projects?
How familiar are you with the resources/tools mentioned in this lecture and listed in the Survey? (? = not aware of / 0 = aware of, but never use / 1 = seldom use / 2 = weekly / 3 = daily)
If you were to start a company, what bioinformatics service would you provide, or need, for the development of your solution?
The bioinformatics project
An opportunity to use the tools and approaches taught in this course to research an area of personal interest.
Example 1
Choose a nucleotide or protein sequence with some presumed functional or structural importance, at least 140 residues in length.
Define the problem or question, for example:
Detection of distantly related (divergent) sequences.
Detection of sequence homologs in various species.
Detection of homologous motifs in proteins of varied function.
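Before defining the question, it helps to confirm that the chosen sequence meets the 140-residue minimum. The sketch below assumes the sequence is saved as FASTA text (a header line starting with ">" followed by sequence lines); the function names `read_fasta` and `long_enough` and the example record are hypothetical illustrations, not part of any course tool.

```python
MIN_LENGTH = 140  # minimum length required for the project sequence


def read_fasta(text):
    """Parse FASTA-formatted text into a dict of {header: sequence}."""
    records = {}
    header = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)  # store the previous record
            header = line[1:].strip()
            chunks = []
        else:
            chunks.append(line.upper())  # sequence lines may wrap; join them later
    if header is not None:
        records[header] = "".join(chunks)
    return records


def long_enough(records, min_length=MIN_LENGTH):
    """Return {header: (length, meets_requirement)} for each record."""
    return {h: (len(s), len(s) >= min_length) for h, s in records.items()}


# Hypothetical example record: 160 residues split over two lines.
example = ">my_candidate hypothetical example\n" + "ACGT" * 30 + "\n" + "ACGT" * 10 + "\n"
print(long_enough(read_fasta(example)))
```

Counting residues after joining the wrapped lines matters: FASTA files usually break sequences every 60 or 70 characters, so the length of a single line says nothing about the whole record.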
Example 1
Abstract
Introduction: define the problem
Materials and Methods
Multiple sequence alignment figure
Phylogenetic tree
Discussion
Example 1
cctgttaaaaatggtaaaattactaatgat translates to PVKNGKITND (EC 2.7.2.3)
Nucleic acid translator; OWL protein db; function & structure; drugs
Q: how many protein sequences?
BLAST (blastn, blastp?); ClustalW; BLAT; SNPdb
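The DNA fragment on this slide and its protein translation can be reproduced with a few lines of code. This is a minimal sketch of what a nucleic acid translator does, assuming the standard genetic code (NCBI translation table 1), the forward strand, and reading frame 1; real translators also offer the other frames, the reverse complement, and alternative genetic codes. The names `CODON_TABLE` and `translate` are illustrative, not a specific tool's API.

```python
BASES = "TCAG"
# Standard genetic code, ordered by first, second, third base in T, C, A, G order.
AMINO = (
    "FFLLSSSSYY**CC*W"  # first base T
    "LLLLPPPPHHQQRRRR"  # first base C
    "IIIMTTTTNNKKSSRR"  # first base A
    "VVVVAAAADDEEGGGG"  # first base G
)
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna, to_stop=False):
    """Translate frame 1 of a DNA/RNA string; '*' marks stop codons."""
    seq = dna.upper().replace("U", "T")  # accept RNA input as well
    protein = []
    # Step through complete codons only; a trailing partial codon is ignored.
    for pos in range(0, len(seq) - len(seq) % 3, 3):
        aa = CODON_TABLE.get(seq[pos:pos + 3], "X")  # 'X' for ambiguous bases such as N
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)


print(translate("cctgttaaaaatggtaaaattactaatgat"))  # PVKNGKITND
```

A translated peptide like this is what you would then submit to blastp, while the original nucleotide string goes to blastn.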
Example 2
Choose a disease. Find genes responsible for, or predisposing to, this disease. Hypothesize on the disease pathway. Or find genes expressed in diseased tissue, compare them to normal tissue, and research and report your findings.
Resources: OMIM, the biological literature, even Google; NCBI Gene; KEGG; IPA; UniGene DDD or GEO DB; pathway tools
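Comparing diseased to normal tissue often starts with a fold change per gene. The sketch below is a deliberately simple, assumption-laden version: it takes one expression value per gene per condition (for example, values you might export from a GEO dataset), adds a pseudocount to avoid dividing by zero, and flags genes whose absolute log2 fold change is at least 1. The gene names and numbers are hypothetical placeholders, not real measurements, and a real analysis would use replicates and a statistical test (for example, SAM).

```python
import math


def log2_fold_changes(disease, normal, pseudocount=1.0):
    """Return {gene: log2((disease + pc) / (normal + pc))} for genes present in both."""
    shared = disease.keys() & normal.keys()
    return {
        gene: math.log2((disease[gene] + pseudocount) / (normal[gene] + pseudocount))
        for gene in sorted(shared)
    }


def flag_changed(fold_changes, threshold=1.0):
    """Split genes into up- and down-regulated lists using |log2FC| >= threshold."""
    up = [g for g, fc in fold_changes.items() if fc >= threshold]
    down = [g for g, fc in fold_changes.items() if fc <= -threshold]
    return up, down


# Hypothetical expression values for illustration only.
disease = {"GENE_A": 255.0, "GENE_B": 15.0, "GENE_C": 40.0}
normal = {"GENE_A": 63.0, "GENE_B": 63.0, "GENE_C": 41.0}

fc = log2_fold_changes(disease, normal)
print(fc)
print(flag_changed(fc))
```

Using log2 makes up- and down-regulation symmetric: a fourfold increase is +2 and a fourfold decrease is -2.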
Example 2: in the news
Six more gene regions associated with the severest form of lupus were reported last Sunday:
ITGAM, on Chromosome 16; BLK, on Chromosome 8; KIAA1542, on Chromosome 11; rs10798269, on Chromosome 1; PXK, on Chromosome 3; and BANK1, on Chromosome 4.
Genes Linked to Height Also Tied to Osteoarthritis
Genes Stacked Against Weight Loss?
Example 3
Essay on New and Notable
Personal Genome Services: workflow, shortcomings, future trends (deCODE Genetics, 23andMe, Knome, Navigenics)
Inexpensive whole-genome sequencing technologies
Projects: more ideas
http://biochem218.stanford.edu/Projects.html
Comparing bioinformatics tools: Pathway Analysis
Research with MATLAB
HCE, TreeView, SAM
Vector NTI
Visualization: Chimera, Cn3D, PyMOL
R and other statistics tools
Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins, 3rd Edition, Andreas D. Baxevanis (Editor), B. F. Francis Ouellette (Editor). Previously chosen for this course; still the main book.
Other good books:
Developing Bioinformatics Computer Skills by Cynthia Gibas, Per Jambeck
Introduction to Bioinformatics by Arthur M. Lesk
Bioinformatics for Dummies by Jean-Michel D. Claverie, et al.
More computational
Online lectures and resources
http://www.ebi.ac.uk/2can/tutorials/
http://www.ncbi.nlm.nih.gov/About/
http://lectures.molgen.mpg.de/online_lectures.html
http://zlab.bu.edu/zlab/links.shtml
http://www.nslij-genetics.org/bioinfotraining/
http://learn.perl.org/
More links at the course page
Databases & Online Resources:
NCBI databases: http://www.ncbi.nlm.nih.gov/
The Protein Data Bank: http://www.rcsb.org/pdb/
Proteomics software tools from ExPASy (Expert Protein Analysis System): http://www.expasy.org/tools/
NCBI BLAST can be used and downloaded from this site: http://www.ncbi.nlm.nih.gov/
UCSC Genome Browser: http://genome.ucsc.edu/
EBI ClustalW: http://www.ebi.ac.uk/clustalw/
Tree of Life: http://itol.embl.de/
KEGG: http://www.genome.jp/kegg/
More on the course website
Software:
Perl. Perl is open source software and may be downloaded for free from several sites: http://www.activestate.com/Products/activeperl/ and http://www.perl.com/download.csp#stable
Unix/Linux (Mac OS X)
MATLAB: will be available in the Lab. http://www.mathworks.com/products/bioinfo/demos.html
IPA: trial version available for free; account in March
R, TreeView, HCA, SAM: can be downloaded for free
Visualization: RasMol, Chimera, VMD, Cn3D, PyMOL
Why these choices?
Why BLAST? Because you can learn a lot by comparing sequences, and BLAST is the standard program for this task.
Why Unix? Because most bioinformatics applications were originally developed on Unix.
Why Perl? Because Perl (with BioPerl) is the most popular programming language in bioinformatics.
Other Programming Languages
Python (Biopython) is also popular in bioinformatics.
Ruby is another scripting language with a rapid development cycle.
Java, C++, and the like can be overkill for everyday bioinformatics scripting (as opposed to hardcore coding and software development).
What is biomedical informatics?
Definitions may differ, but the objectives are the same.
What is bioinformatics?
Biologists using computers, or the other way around
Twenty-First Century Rocket Science
The science of BLAST searches
Writing bioinformatics software is tougher and very competitive. You probably won't get rich in this arena, but…
End of Unit 1
Please fill out the Survey
Demo for Problem Set 0 (Jan. 30)
(to be continued after the break)