Introduction unit 1 BIOL221T: Advanced Bioinformatics for Biotechnology Irene Gabashvili, PhD...


Page 1: Introduction unit 1 BIOL221T: Advanced Bioinformatics for Biotechnology Irene Gabashvili, PhD igabashvili@yahoo.com.

Introduction, unit 1

BIOL221T: Advanced Bioinformatics for Biotechnology

Irene Gabashvili, PhD

igabashvili@yahoo.com

Page 2: Course availability

Lectures & Lab: every Wednesday, Duncan Hall, Room 550, 6:00 pm to 9:45 pm
Office hours: Wednesday, 4pm-6pm (Room 554, phone: 92404831) and by appointment
Lecture notes will be posted at: http://home.comcast.net/~igabashvili/221T.htm

Page 3:

Columns: Dates; Units; B&O; Topic; Due

Jan 23-30; Units: 1-4; B&O: Foreword, Intro, chapter 3, lecture notes; Topic: Introduction: information, databases, programming; Due: Survey, PS0
Feb-March; B&O: 1, 2, 5, 11, 12; Topic: Sequence informatics; Due: PS1, PS2
March-April; B&O: 14, 16, lecture notes; Topic: Network informatics; Due: PS3
April; B&O: 8-10, 17; Topic: Structure informatics; Due: Projects
May; Topic: Review; Due: PS4, Exam

Page 4: Final Grading

(Chart labels: Voted for, Voted against)

Page 5: Survey

Compose a short message introducing yourself, your science background, bioinformatics interests and what you hope to learn from taking this course.
What bioinformatics databases and tools have you used in your previous courses/projects?
How familiar are you with resources/tools mentioned in this lecture and listed in the Survey? (? = not aware of / 0 = aware of, but never use / 1 = seldom use / 2 = weekly / 3 = daily)
If you were to start a company, what bioinformatics service would you provide or need for the development of your solution?

Page 6: The bioinformatics project

An opportunity to use the tools and approaches taught in this course to research an area of personal interest.

Page 7: Example 1

Choose a nucleotide or protein sequence with some presumed functional or structural importance, at least 140 residues in length.
Define the problem or question, for example:
  Detection of distantly related (divergent) sequences.
  Detection of sequence homologs in various species.
  Detection of homologous motifs in proteins of varied function.
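A candidate sequence can be checked against the 140-residue minimum before any analysis starts. The sketch below is only an illustration: the function names and the accepted alphabets are assumptions, not part of the course material, and a short string made up only of A, C, G and T is reported as nucleotide even though it could in principle be a protein.

```python
# Minimal sketch (hypothetical helper names): check what kind of sequence
# was chosen and whether it meets the length minimum given on the slide.

NUCLEOTIDES = set("ACGTUN")                  # assumed DNA/RNA alphabet, N = unknown
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWYX")   # 20 standard residues, X = unknown
MIN_LENGTH = 140                             # minimum length stated on the slide


def clean(seq):
    """Remove whitespace and line breaks, and convert to upper case."""
    return "".join(seq.split()).upper()


def sequence_kind(seq):
    """Return 'nucleotide' or 'protein', or raise ValueError for other input."""
    letters = set(clean(seq))
    if not letters:
        raise ValueError("empty sequence")
    if letters <= NUCLEOTIDES:
        return "nucleotide"
    if letters <= AMINO_ACIDS:
        return "protein"
    raise ValueError("unexpected characters: " + "".join(sorted(letters - AMINO_ACIDS)))


def long_enough(seq, minimum=MIN_LENGTH):
    """True if the cleaned sequence has at least `minimum` residues."""
    return len(clean(seq)) >= minimum


if __name__ == "__main__":
    fragment = "cctgttaaaaatggtaaaattactaatgat"
    print(sequence_kind(fragment), long_enough(fragment))
```

A 30-base fragment such as the one above is classified as nucleotide but fails the length check, so it would need to be extended to a full gene or protein before use in a project.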

Page 8: Example 1

Abstract
Introduction: define the problem
Materials and Methods.
Multiple sequence alignment figure.
Phylogenetic tree.
Discussion.

Page 9: Example 1

cctgttaaaaatggtaaaattactaatgat
PVKNGKITND
EC 2.7.2.3

Nucleic acid translator
Owl protein db
function & structure
drugs
Q: how many protein sequences?

BLAST (blastn, blastp?)
clustalw
BLAT
SNPdb
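The step from the nucleotide fragment to the peptide PVKNGKITND on this slide can be reproduced with a few lines of code. This is a minimal sketch, assuming the standard genetic code and reading frame 1; the function name `translate` is illustrative and is not the online translator the slide refers to.

```python
from itertools import product

# Standard genetic code, built in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA or RNA string in frame 1.

    Stop codons become '*', codons with unknown bases become 'X',
    and a trailing incomplete codon is ignored.
    """
    dna = "".join(dna.split()).upper().replace("U", "T")
    usable = len(dna) - len(dna) % 3
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3)
    )


if __name__ == "__main__":
    print(translate("cctgttaaaaatggtaaaattactaatgat"))  # PVKNGKITND
```

The resulting peptide can then be used as a protein query (for example with blastp), while the original nucleotide string would be a blastn query.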

Page 10: Example 2

Choose a disease. Find genes responsible for or predisposing to this disease. Hypothesize on the disease pathway. Or find genes expressed in diseased tissue, compare to normal, research and report findings.

OMIM, biol. literature, even google
NCBI Gene
KEGG
IPA
Unigene DDD or GEO DB
Pathway tools
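One common way to compare expression in diseased and normal tissue is a log2 fold change per gene. The sketch below assumes that choice; the slide does not prescribe a method. The function names, the pseudocount, the threshold, and the gene names and values are all hypothetical placeholders, not real measurements.

```python
import math


def log2_fold_change(diseased, normal, pseudocount=1.0):
    """log2 ratio of diseased to normal expression.

    The pseudocount (an assumption) avoids division by zero for
    genes that are not detected in one of the tissues.
    """
    return math.log2((diseased + pseudocount) / (normal + pseudocount))


def differentially_expressed(expression, threshold=1.0):
    """Return {gene: log2 fold change} for genes at or beyond the threshold.

    `expression` maps a gene name to a (diseased, normal) pair of values.
    A threshold of 1.0 corresponds to a two-fold change in either direction.
    """
    hits = {}
    for gene, (diseased, normal) in expression.items():
        change = log2_fold_change(diseased, normal)
        if abs(change) >= threshold:
            hits[gene] = change
    return hits


if __name__ == "__main__":
    # Hypothetical placeholder genes and values, for illustration only.
    example = {"geneA": (7.0, 1.0), "geneB": (3.0, 3.0), "geneC": (0.0, 3.0)}
    print(differentially_expressed(example))
```

Genes that pass the threshold would be the ones to look up in resources such as OMIM, NCBI Gene or KEGG when hypothesizing about the disease pathway.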

Page 11: Example 2: in the news

Six more gene regions associated with the severest form of lupus reported last Sunday:
  ITGAM, located on Chromosome 16;
  BLK, on Chromosome 8;
  KIAA1542, on Chromosome 11;
  rs10798269, on Chromosome 1;
  PXK, on Chromosome 3; and
  BANK1, on Chromosome 4.

Genes Linked to Height Also Tied to Osteoarthritis
Genes Stacked Against Weight Loss?

Page 12: Example 3

Assay on New and Notable
Personal Genome Services: workflow, shortcomings, future trends (Decode Genetics, 23andMe, Knome, Navigenics)
Inexpensive whole-genome sequencing technologies

Page 13: Projects: more ideas

http://biochem218.stanford.edu/Projects.html

Comparing bioinformatics tools: Pathway Analysis
Research with Matlab
HCE, TreeView, SAM
VectorNTI
Visualization: Chimera, CN3D, Pymol
R and other statistics tools

Page 14:

Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins, 3rd Edition
Andreas D. Baxevanis (Editor), B. F. Francis Ouellette (Editor)
Previously chosen for this course, still the main book

Page 15:

Developing Bioinformatics Computer Skills by Cynthia Gibas, Per Jambeck
Introduction to Bioinformatics by Arthur M. Lesk
Bioinformatics for Dummies by Jean-Michel D. Claverie, et al.

Page 16: Other good books

More computational

Page 17: Online lectures and resources

http://www.ebi.ac.uk/2can/tutorials/
http://www.ncbi.nlm.nih.gov/About/
http://lectures.molgen.mpg.de/online_lectures.html
http://zlab.bu.edu/zlab/links.shtml
http://www.nslij-genetics.org/bioinfotraining/
http://learn.perl.org/

More links at the course page

Page 18: Databases & Online Resources:

NCBI databases: http://www.ncbi.nlm.nih.gov/
The Protein Data Bank: http://www.rcsb.org/pdb/
Proteomics software tools from ExPASy (Expert Protein Analysis System): http://www.expasy.org/tools/
NCBI BLAST can be used and downloaded from this site: http://www.ncbi.nlm.nih.gov/
UCSC Genome Browser: http://genome.ucsc.edu/
EBI: http://www.ebi.ac.uk/clustalw/
Tree of Life: http://itol.embl.de/
KEGG: http://www.genome.jp/kegg/

More on the course website

Page 19: Software:

Perl. Perl is open source software and may be downloaded for free from several sites.
  http://www.activestate.com/Products/activeperl/
  http://www.perl.com/download.csp#stable
Unix/Linux (Mac OS X)
MATLAB. Will be available in the Lab.
  http://www.mathworks.com/products/bioinfo/demos.html
IPA: trial version available for free, account in March
R, Treeview, HCE, SAM: can be downloaded for free
Visualization: Rasmol, Chimera, VMD, Cn3d, Pymol

Page 20: Why these choices?

Why BLAST? Because you can learn a lot by comparing sequences, and BLAST is the standard program for this task.
Why Unix? Because most bioinformatics applications were originally developed in Unix.
Why Perl? Because Perl (and BioPerl) is the most popular programming language in bioinformatics.
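To make "comparing sequences" concrete, here is a small local-alignment scorer in the Smith-Waterman style. It is not BLAST: BLAST is a much faster heuristic with statistical scoring, while this sketch fills the full dynamic-programming table. The function name and the scoring values (match +2, mismatch -1, gap -2) are assumptions chosen for illustration.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between strings a and b.

    Uses the Smith-Waterman recurrence with a linear gap penalty and
    keeps only two rows of the table, so it returns the score but not
    the alignment itself.
    """
    a, b = a.upper(), b.upper()
    previous = [0] * (len(b) + 1)
    best = 0
    for char_a in a:
        current = [0]
        for j, char_b in enumerate(b, start=1):
            diagonal = previous[j - 1] + (match if char_a == char_b else mismatch)
            up = previous[j] + gap        # gap in b
            left = current[j - 1] + gap   # gap in a
            cell = max(0, diagonal, up, left)
            current.append(cell)
            best = max(best, cell)
        previous = current
    return best


if __name__ == "__main__":
    print(local_alignment_score("GGPVKNGG", "TTPVKNTT"))
```

Two sequences that share a conserved stretch score well even when their flanking regions differ, which is the property that makes local comparison useful for finding homologs and shared motifs.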

Page 21: Other Programming Languages

Python (Biopython) is also popular in bioinformatics.
Ruby is another scripting language with a rapid development cycle.
Java, C++, and the like can be overkill for bioinformatics (vs hardcore coding/software development).

Page 22: What is biomedical informatics?

Definitions may differ, but objectives are the same

Page 23: What is bioinformatics?

Biologists using computers, or the other way around
Twenty-First Century Rocket Science
The science of Blast searches
Writing bioinformatics software is tougher and very competitive. You probably won’t get rich in this arena, but…

Page 24: End of Unit 1

Please fill out the Survey

Demo for Problem Set 0 (Jan. 30)

(to be continued after the break)