Introduction unit 1 BIOL221T: Advanced Bioinformatics for Biotechnology Irene Gabashvili, PhD...

Post on 02-Jan-2016


Introduction, unit 1

BIOL221T: Advanced Bioinformatics for Biotechnology

Irene Gabashvili, PhD

igabashvili@yahoo.com

Course availability

Lectures & Lab: every Wednesday, Duncan Hall, Room 550, 6:00 pm to 9:45 pm

Office hours: Wednesday, 4pm-6pm (Room 554, phone: 92404831) and by appointment

Lecture notes will be posted at: http://home.comcast.net/~igabashvili/221T.htm

Dates | Units | B&O | Topic | Due
Jan 23-30 | 1-4 | Foreword, Intro, chapter 3, lecture notes | Introduction: information, databases, programming | Survey, PS0
Feb-March | | 1, 2, 5, 11, 12 | Sequence informatics | PS1, PS2
March-April | | 14, 16, lecture notes | Network informatics | PS3
April | | 8-10, 17 | Structure informatics | Projects
May | | | Review | PS4, Exam

Final Grading

Voted for Voted against

Survey

Compose a short message introducing yourself, your science background, bioinformatics interests and what you hope to learn from taking this course.

What bioinformatics databases and tools have you used in your previous courses/projects?

How familiar are you with resources/tools mentioned in this lecture and listed in the Survey? (? = not aware of / 0 = aware of, but never use / 1 = seldom use / 2 = weekly / 3 = daily)

If you were to start a company, what bioinformatics service would you provide or need for the development of your solution?

The bioinformatics project

An opportunity to use the tools and approaches taught in this course to research an area of personal interest.

Example 1

Choose a nucleotide or protein sequence with some presumed functional or structural importance, at least 140 residues in length.

Define the problem or question, for example:
Detection of distantly related (divergent) sequences.
Detection of sequence homologs in various species.
Detection of homologous motifs in proteins of varied function.

Example 1

Abstract
Introduction: define the problem
Materials and Methods.
Multiple sequence alignment figure.
Phylogenetic tree.
Discussion.

Example 1

cctgttaaaaatggtaaaattactaatgat
PVKNGKITND
EC 2.7.2.3

Nucleic acid translator
Owl protein db
function & structure
drugs
Q: how many protein sequences?

BLAST (blastn, blastp?)
clustalw
BLAT
SNPdb
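The "nucleic acid translator" step can be sketched in a few lines of Python. This is only a minimal illustration, assuming the standard genetic code and reading frame 1; the function name translate and the "X" placeholder for unrecognized codons are choices made for this sketch, not part of any course tool.

```python
# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a DNA (or RNA) string in frame 1; '*' marks a stop codon."""
    dna = dna.upper().replace("U", "T")
    usable = len(dna) - len(dna) % 3  # ignore a trailing partial codon
    # Codons with ambiguous or unknown letters become 'X' (an assumption of this sketch).
    return "".join(
        CODON_TABLE.get(dna[pos:pos + 3], "X") for pos in range(0, usable, 3)
    )


print(translate("cctgttaaaaatggtaaaattactaatgat"))  # PVKNGKITND
```

The printed peptide matches the one on the slide, which is a quick sanity check before pasting a translated sequence into blastp.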

Example 2

Choose a disease. Find genes responsible for or predisposing to this disease. Hypothesize on the disease pathway. Or find genes expressed in diseased tissue, compare to normal, research and report findings.

OMIM, biol. literature, even google
NCBI Gene
KEGG
IPA
Unigene DDD or GEO DB
Pathway tools

Example 2: in the news

Six more gene regions associated with the severest form of lupus reported last Sunday:
ITGAM, located on Chromosome 16;
BLK, on Chromosome 8;
KIAA1542, on Chromosome 11;
rs10798269, on Chromosome 1;
PXK, on Chromosome 3; and
BANK1, on Chromosome 4.

Genes Linked to Height Also Tied to Osteoarthritis
Genes Stacked Against Weight Loss?

Example 3

Essay on New and Notable

Personal Genome Services: workflow, shortcomings, future trends (Decode Genetics, 23andMe, Knome, Navigenics)

Inexpensive whole-genome sequencing technologies

Projects: more ideas

http://biochem218.stanford.edu/Projects.html

Comparing bioinformatics tools: Pathway Analysis
Research with Matlab
HCE, TreeView, SAM
VectorNTI
Visualization: Chimera, CN3D, Pymol
R and other statistics tools

Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins, 3rd Edition, Andreas D. Baxevanis (Editor), B. F. Francis Ouellette (Editor). Previously chosen for this course, still the main book.

Developing Bioinformatics Computer Skills by Cynthia Gibas, Per Jambeck

Introduction to Bioinformatics by Arthur M. Lesk

Bioinformatics for Dummies by Jean-Michel D. Claverie, etc.

Other good books

More computational

Online lectures and resources

http://www.ebi.ac.uk/2can/tutorials/
http://www.ncbi.nlm.nih.gov/About/
http://lectures.molgen.mpg.de/online_lectures.html
http://zlab.bu.edu/zlab/links.shtml
http://www.nslij-genetics.org/bioinfotraining/
http://learn.perl.org/

More links at the course page

Databases & Online Resources:

NCBI databases: http://www.ncbi.nlm.nih.gov/
The Protein Data Bank: http://www.rcsb.org/pdb/
Proteomics software tools from ExPASy (Expert Protein Analysis System): http://www.expasy.org/tools/
NCBI BLAST can be used and downloaded from this site: http://www.ncbi.nlm.nih.gov/
UCSC Genome Browser: http://genome.ucsc.edu/
EBI: http://www.ebi.ac.uk/clustalw/
Tree of Life: http://itol.embl.de/
KEGG: http://www.genome.jp/kegg/

More on the course website

Software:

Perl. Perl is open source software and may be downloaded for free from several sites.
http://www.activestate.com/Products/activeperl/
http://www.perl.com/download.csp#stable

Unix/Linux (Mac OS X)
MATLAB. Will be available in the Lab. http://www.mathworks.com/products/bioinfo/demos.html
IPA: trial version available for free, account in March
R, Treeview, HCE, SAM: can be downloaded for free
Visualization: Rasmol, Chimera, VMD, Cn3d, Pymol

Why these choices?

Why BLAST? Because you can learn a lot by comparing sequences, and BLAST is the standard program for this task.

Why Unix? Because most bioinformatics applications were originally developed in Unix.

Why Perl? Because Perl (and BioPerl) is the most popular programming language in bioinformatics.
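To make "comparing sequences" concrete, here is a rough sketch of a local alignment score in the style of Smith-Waterman, the exact method that BLAST approximates with faster heuristics. It is not BLAST itself. The function name and the scoring values (match +2, mismatch -1, gap -2) are assumptions chosen for illustration; real protein searches use substitution matrices such as BLOSUM62.

```python
def local_alignment_score(a: str, b: str, match: int = 2,
                          mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score between sequences a and b."""
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming table
    best = 0
    for ch_a in a:
        curr = [0]  # first column is always 0 for local alignment
        for j, ch_b in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ch_a == ch_b else mismatch)
            # A local alignment may restart anywhere, so scores never drop below 0.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


# Identical 10-residue peptides: 10 matches x 2 points each.
print(local_alignment_score("PVKNGKITND", "PVKNGKITND"))  # 20
```

Only two rows of the table are kept in memory, which is enough when just the score is needed; recovering the alignment itself would require storing the full table and tracing back.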

Other Programming Languages

Python (Biopython) is also popular in bioinformatics.

Ruby is another scripting language with a rapid development cycle.

Java, C++, and the like can be overkill for bioinformatics (vs hardcore coding/software development).
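As a small illustration of why scripting languages suit everyday sequence work, the sketch below computes GC content and a reverse complement in plain Python. The function names are made up for this example, and no Biopython is used, although Biopython provides equivalent and more complete functionality.

```python
# Translation table mapping each base to its complement, preserving case.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def gc_content(dna: str) -> float:
    """Return the fraction of G and C bases (0.0 for an empty string)."""
    if not dna:
        return 0.0
    upper = dna.upper()
    return (upper.count("G") + upper.count("C")) / len(upper)


seq = "cctgttaaaaatggtaaaattactaatgat"  # sequence from the Example 1 slide
print(reverse_complement(seq))
print(round(gc_content(seq), 3))
```

Both tasks take one or two lines each, which is the kind of quick turnaround the slide contrasts with heavier languages.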

What is biomedical informatics?

Definitions may differ, but objectives are the same

What is bioinformatics?

Biologists using computers, or the other way around
Twenty-First Century Rocket Science
The science of BLAST searches

Writing bioinformatics software is tougher and very competitive. You probably won't get rich in this arena, but…

End of Unit 1

Please fill out the Survey

Demo for Problem Set 0 (Jan. 30)

(to be continued after the break)