Introduction to Bioinformatics Spring 2002 Adapted from Irit Orr Course at WIS.

33
Introduction to Bioinformatics Spring 2002 Adapted from Irit Orr Course at WIS

Transcript of Introduction to Bioinformatics Spring 2002 Adapted from Irit Orr Course at WIS.

Page 1: Introduction to Bioinformatics Spring 2002 Adapted from Irit Orr Course at WIS.

Introduction to Bioinformatics

Spring 2002 Adapted from Irit Orr Course at WIS

Page 2: Introduction to Bioinformatics Spring 2002 Adapted from Irit Orr Course at WIS.

What Is Bioinformatics?

A Marriage Between Biology and Computers!

Page 3: Introduction to Bioinformatics Spring 2002 Adapted from Irit Orr Course at WIS.

What Is Bioinformatics?(My Correction)

A Marriage Between Molecular Biology and

Computer Science!

Page 4: Introduction to Bioinformatics Spring 2002 Adapted from Irit Orr Course at WIS.

What Is Bioinformatics?

Bioinformatics is the field of science in which biology, computer science, and information technology merge into a single discipline.

Bioinformatics is the science of managing and analyzing biological data using advanced computing techniques.

Page 5: Introduction to Bioinformatics Spring 2002 Adapted from Irit Orr Course at WIS.

What Is Bioinformatics?

Bioinformatics ultimate goal, (as is described by an expert), is to enable the discovery of new biological insights as well as to create a global perspective from which unifying principles in biology can be discerned.

Page 6: Introduction to Bioinformatics Spring 2002 Adapted from Irit Orr Course at WIS.

What Is Bioinformatics?The Definition From Bioplanet

Bioinformatics is the application of computer technology to the management of biological information.

Computers are used to gather, store, analyze and integrate biological and genetic information which can then be applied to gene-based drug discovery and development.

Page 7: Introduction to Bioinformatics Spring 2002 Adapted from Irit Orr Course at WIS.

What Is Bioinformatics?

The need for Bioinformatics capabilities has been precipitated by the explosion of publicly available genomic information resulting from the Human Genome Project.

The science of Bioinformatics, is essential to the use of genomic information in understanding human diseases and in the identification of new molecular targets for drug discovery.

Page 8: Introduction to Bioinformatics Spring 2002 Adapted from Irit Orr Course at WIS.

What Is Bioinformatics?The Definition From Whatis.Com

Bioinformatics is the science of developing computer databases and algorithms for the purpose of speeding up and enhancing biological research.

Bioinformatics is being used most noticeably in the Human Genome Project, the effort to identify the 80,000 genes in human DNA .

Page 9: Introduction to Bioinformatics Spring 2002 Adapted from Irit Orr Course at WIS.

What Is Bioinformatics?The Definition From BitsJournal

Two simple statements seem to suffice in order to create our definition of bioinformatics and make it reflect the future of the life sciences:

Bioinformatics is “a combination of Computer Science, Information Technology and Genetics used to determine and analyze genetic information.”

Page 10: Introduction to Bioinformatics Spring 2002 Adapted from Irit Orr Course at WIS.

What Is Bioinformatics?The Definition From BitsJournal

The highlight is on the words determine and analyze. These are expressions which involve very real human elements.

Page 11: Introduction to Bioinformatics Spring 2002 Adapted from Irit Orr Course at WIS.

What Is Bioinformatics?The Definition From BitsJournal

Bioinformatics is the future of the Life Sciences.

Page 12: Introduction to Bioinformatics Spring 2002 Adapted from Irit Orr Course at WIS.

What Is Done in Bioinformatics?

Analysis and interpretation of various types of biological data including: nucleotide and amino acid sequences, protein domains, and protein structures.

Page 13: Introduction to Bioinformatics Spring 2002 Adapted from Irit Orr Course at WIS.

What Is Done in Bioinformatics?

Development of new algorithms and statistics with which to assess biological information, such as relationships among members of large data sets.

Development and implementation of tools that enable efficient access and management of different types of information, such as various databases, integrated mapping information.

Page 14: Introduction to Bioinformatics Spring 2002 Adapted from Irit Orr Course at WIS.

Why Use Bioinformatics?

The explosive growth in the amount of biological information necessitates the use of computers for cataloging and retrieval of this information.

A more global perspective in experimental design. As we move from the one scientist-one gene/protein/disease paradigm of the past, to a consideration of whole organisms, we gain opportunities for new, more general insights into health and disease.

Page 15: Introduction to Bioinformatics Spring 2002 Adapted from Irit Orr Course at WIS.

Why Use Bioinformatics?

Data-mining - the process by which testable hypotheses are generated regarding the function or structure of a gene or protein of interest by identifying similar sequences in better characterized organisms.– For example, new insight into the molecular basis of a

disease may come from investigating the function of ortholog of the disease gene in other organisms.

Equally exciting is the potential for uncovering phylogenetic relationships and evolutionary patterns.

Page 16: Introduction to Bioinformatics Spring 2002 Adapted from Irit Orr Course at WIS.

Examples of Biology Data for Bioinformatics ?

Whole Genome AnnotationsExperimental data involving thousands

of Genes simultaneously DNA Chips, MicroArray, and Expression

Arrays Analyses Comparative Genomics (Analyses between

Species and Strains )Proteomics ('Proteome' of an Organism,

2D gels, Mass Spec, 3D structure )

Page 17: Introduction to Bioinformatics Spring 2002 Adapted from Irit Orr Course at WIS.

Examples of Biology Data for Bioinformatics ?

Medical applications: Genetic Disease (SNPs)

Pharmaceutical and Biotech Industry (Drug design)

Agricultural applications (Plants resistance to various types of diseases) 

Page 18: Introduction to Bioinformatics Spring 2002 Adapted from Irit Orr Course at WIS.

Molecular Biology Information DNA

Raw DNA Sequence

• Coding or Not coding?• Parse into genes? • 4 bases: AGCT • ~1 K in a gene, • ~2 M in genome

atggcaattaaaattggtatcaatggttttggtcgtatcggccgtatcgtattccgtgcagcacaacaccgtgatgacattgaagttgtaggtattaacgacttaatcgacgttgaatacatggcttatatgttgaaatatgattcaactcacggtcgtttcgacggcactgttgaagtgaaagatggtaacttagtggttaatggtaaaactatccgtgtaactgcagaacgtgatcca

Page 19: Introduction to Bioinformatics Spring 2002 Adapted from Irit Orr Course at WIS.

Molecular Biology InformationProtein

• 20 letter alphabet ACDEFGHIKLMNPQRSTVWY

But not BJOUXZ

• Strings of ~300 aa in an average protein

(I.g bacteria),• ~200 aa in a protein domain

• LNCIVAVSQNMGIGKNGDLPWPPLRNEFRYFQRMTTTSSVEGKQNLVIMGKKTWFSILNSIVAVCQNMGIGKDGNLPWPPLRNEYKYFQRMTSTSHVEGKQNAVIMGKKTWFSIISLIAALAVDRVIGMENAMPWNLPADLAWFKRNTLDKPVIMGRHTWESITAFLWAQDRNGLIGKDGHLPWHLPDDLHYFRAQTVGKIMVVGRRTYESF

Page 20: Introduction to Bioinformatics Spring 2002 Adapted from Irit Orr Course at WIS.

Molecular Biology InformationMacromolecular Structure

DNA, RNA, Protein

Page 21: Introduction to Bioinformatics Spring 2002 Adapted from Irit Orr Course at WIS.

Molecular Biology InformationWhole Genomes

Viruses, Prokaryotes, Yeast, Fly, Arabidopsis,

Mouse, Human

Page 22: Introduction to Bioinformatics Spring 2002 Adapted from Irit Orr Course at WIS.

Molecular Biology InformationOther Integrative Data

Information to understand whole genomes:

¥Metabolic Pathways traditional biochemistry

¥Regulatory Networks

¥Whole Organisms Phylogeny

¥Environments, Habitats, ecology

¥The Literature (MEDLINE)

Page 23: Introduction to Bioinformatics Spring 2002 Adapted from Irit Orr Course at WIS.

Molecular Biology InformationRedundancy and Multiplicity

• Different Sequences Have the Same Structure.

• One Organism has many similar genes.

• Single Gene May Have Multiple Functions.

• Genomic Sequence Redundancy due to the Genetic Code.

Page 24: Introduction to Bioinformatics Spring 2002 Adapted from Irit Orr Course at WIS.

• CPU vs Diskspace & Net

As important as the increase in computerspeed has been, the ability to store largeamounts of information on computers is even more crucial and need special attention as well.

Exponential Growth of Data Matchedby

Development of Computer Technology

Page 25: Introduction to Bioinformatics Spring 2002 Adapted from Irit Orr Course at WIS.

Bioinformatics Tasks(Several Suggestions)

Finding the genes in the DNA sequences of various organisms.

Prediction of promoters and other regulatory regions.

Page 26: Introduction to Bioinformatics Spring 2002 Adapted from Irit Orr Course at WIS.

Bioinformatics Tasks(Several Suggestions)

Developing methods to predict the structure of structural RNA sequences.

Developing methods to predict the structure and/or function of newly discovered proteins.

Clustering protein sequences into families of related sequences and the development of protein models.

Page 27: Introduction to Bioinformatics Spring 2002 Adapted from Irit Orr Course at WIS.

Bioinformatics Tasks (Several Suggestions)

Aligning similar proteins and generating phylogenetic trees to examine evolutionary relationships. The process of evolution has produced DNA

sequences that encode proteins with very specific functions.

It is possible to predict the three-dimensional structure of a protein using algorithms that have been derived from our knowledge of physics, chemistry and most importantly, from the analysis of other proteins with similar amino acid sequences.

Page 28: Introduction to Bioinformatics Spring 2002 Adapted from Irit Orr Course at WIS.

Top 10 Future Challenges for Bioinformatics

(At a Bioinformatics Conference Last Fall)

Precise, predictive model of transcription initiation and termination: ability to predict where and when transcription will occur in a genome.

Precise, predictive model of RNA splicing /alternative splicing: ability to predict the splicing pattern of any primary transcript in any tissue.

Page 29: Introduction to Bioinformatics Spring 2002 Adapted from Irit Orr Course at WIS.

Top 10 Future Challenges for Bioinformatics

(At a Bioinformatics Conference Last Fall)

Precise, quantitative models of signal transduction pathways: ability to predict cellular responses to external stimuli

Page 30: Introduction to Bioinformatics Spring 2002 Adapted from Irit Orr Course at WIS.

Top 10 Future Challenges for Bioinformatics

(At a Bioinformatics Conference Last Fall)

Determining effective protein:DNA, protein:RNA and protein:protein recognition codes

Accurate ab initio protein structure prediction

Page 31: Introduction to Bioinformatics Spring 2002 Adapted from Irit Orr Course at WIS.

Top 10 Future Challenges for Bioinformatics

(At a Bioinformatics Conference Last Fall)

Rational design of small molecule inhibitors of proteins.

Page 32: Introduction to Bioinformatics Spring 2002 Adapted from Irit Orr Course at WIS.

Top 10 Future Challenges for Bioinformatics

(At a Bioinformatics Conference Last Fall)

Mechanistic understanding of protein evolution: understanding exactly how new protein functions evolve.

Mechanistic understanding of speciation: molecular details of how speciation occurs.

Page 33: Introduction to Bioinformatics Spring 2002 Adapted from Irit Orr Course at WIS.

Top 10 Future Challenges for Bioinformatics

(At a Bioinformatics Conference Last Fall)

Continued development of effective gene ontologies - systematic ways to describe the functions of any gene or protein.

Education: development of appropriate bioinformatics curricula for secondary, undergraduate and graduate education.