November 16, 2001 Slide 1

Opportunities in Bioinformatics for Computer Science

Lenwood S. Heath
Virginia Tech
Blacksburg, VA 24061
heath@cs.vt.edu

University of Iowa
November 16, 2001

November 16, 2001 Slide 2

Overview

• The New Biology

• Existing bioinformatics tools

• Bioinformatics challenges

• Bioinformatics at Virginia Tech

November 16, 2001 Slide 3

Some Molecular Biology

• The instruction set for a cell is contained in its chromosomes.

• Each chromosome is a long molecule of DNA.

• Each DNA molecule contains 100s or 1000s of genes.

• Each gene encodes a protein.

• A gene is transcribed to mRNA in the nucleus.

• An mRNA is translated to a protein in a ribosome.

November 16, 2001 Slide 4

Transcription and Translation

DNA → (Transcription) → mRNA → (Translation) → Protein

November 16, 2001 Slide 5

Elaborating Cellular Function

DNA → (Transcription) → mRNA → (Translation, Genetic Code) → Protein

Other diagram labels: Reverse Transcription, Degradation, Regulation

Functions:

• Structure

• Catalyze chemical reactions

• Respond to environment

November 16, 2001 Slide 6

Chromosomes

• Long molecules of DNA: 10^4 to 10^8 base pairs

• 23 matched pairs in humans

• A gene is a subsequence of a chromosome that encodes a protein.

• Proteins associated with cell function, structure, and regulation.

• Only a fraction of the genes are in use at any time.

• Every gene is present in every cell.

November 16, 2001 Slide 7

DNA Strand

A = adenine complements T = thymine

C = cytosine complements G = guanine
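
The base pairing above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original talk: the names `COMPLEMENT`, `complement`, and `reverse_complement` are hypothetical, and only the A-T and C-G pairing comes from the slide.

```python
# Hypothetical helper names; only the A-T and C-G pairing is from the slide.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(strand: str) -> str:
    """Return the base-by-base complement of a DNA strand."""
    return "".join(COMPLEMENT[base] for base in strand.upper())

def reverse_complement(strand: str) -> str:
    """Return the complement read in the opposite direction,
    which is how the partner strand is conventionally written."""
    return complement(strand)[::-1]

print(complement("ATCG"))          # TAGC
print(reverse_complement("ATCG"))  # CGAT
```

Reversing the complement reflects the fact that the two strands of double-stranded DNA run in opposite directions.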

November 16, 2001 Slide 8

Complementary DNA Strands

Double-Stranded DNA

November 16, 2001 Slide 9

RNA Strand

U = uracil replaces T = thymine
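
As a rough sketch of what this substitution means for sequence data, the following Python function rewrites a DNA string as RNA. It assumes the input is the coding (sense) strand, so transcription reduces to replacing T with U; the function name `transcribe` is hypothetical.

```python
def transcribe(coding_strand: str) -> str:
    """Simplified transcription: assumes the input is the coding strand,
    so the mRNA has the same sequence with U in place of T."""
    return coding_strand.upper().replace("T", "U")

print(transcribe("ATGGCT"))  # AUGGCU
```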

November 16, 2001 Slide 10

Amino Acids

• Protein is a large molecule that is a chain of amino acids (100 to 5000).

• There are 20 common amino acids

(Alanine, Cysteine, …, Tyrosine)

• Three bases --- a codon --- suffice to encode an amino acid.

• There are also START and STOP codons.
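
Why three bases suffice can be checked with a short calculation. The sketch below assumes 21 distinct meanings must be encoded (the 20 amino acids plus a STOP signal) over the 4-letter base alphabet; `min_codon_length` is a hypothetical helper name.

```python
def min_codon_length(n_meanings: int, alphabet_size: int = 4) -> int:
    """Smallest word length k such that alphabet_size**k >= n_meanings."""
    length = 1
    while alphabet_size ** length < n_meanings:
        length += 1
    return length

# 20 amino acids plus a STOP signal (an assumption made for this count).
print(min_codon_length(21))  # 3, since 4**2 = 16 < 21 <= 4**3 = 64
```

With 64 possible codons and only about 21 meanings, most amino acids are encoded by more than one codon.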

November 16, 2001 Slide 11

Genetic Code
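
The genetic code is a lookup table from codons to amino acids, which can be sketched as follows. This is an illustrative sketch, not code from the talk: it uses the standard genetic code written in one-letter amino acid symbols with bases ordered T, C, A, G, and the names `CODON_TABLE` and `translate` are hypothetical. It works on the DNA coding strand (T instead of U) and ignores reading-frame subtleties.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon in TCAG order; '*' marks STOP.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino
    for codon, amino in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def translate(dna: str) -> str:
    """Translate from the first START codon (ATG) up to the first STOP codon."""
    dna = dna.upper()
    start = dna.find("ATG")
    if start == -1:
        return ""
    protein = []
    for i in range(start, len(dna) - 2, 3):
        amino = CODON_TABLE[dna[i:i + 3]]
        if amino == "*":  # STOP codon ends the protein
            break
        protein.append(amino)
    return "".join(protein)

# Met, Ala (alanine), Cys (cysteine), Tyr (tyrosine), then STOP
print(translate("ATGGCTTGTTACTAA"))  # MACY
```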

November 16, 2001 Slide 12

Translation to a Protein

Unlike DNA, proteins have three-dimensional structure

Protein folds to a three-dimensional shape that minimizes energy

November 16, 2001 Slide 13

Cell’s Fetch-Execute Cycle

• Stored Program: DNA, chromosomes, genes

• Fetch/Decode: RNA, ribosomes

• Execute Functions: Proteins --- oxygen transport, cell structures, enzymes

• Inputs: Nutrients, environmental signals, external proteins

• Outputs: Waste, response proteins, enzymes

November 16, 2001 Slide 14

The Language of the New Biology

A new language has been created. Some of its words are useful for today’s talk:

Genomics

Functional Genomics

Proteomics

cDNA microarrays

Global Gene Expression Patterns

November 16, 2001 Slide 15

Genomics

• Discovery of genetic sequences and the ordering of those sequences into

  • individual genes;

  • gene families;

  • chromosomes.

• Identification of

  • sequences that code for gene products/proteins;

  • sequences that act as regulatory elements.

November 16, 2001 Slide 16

Genome Sequencing Projects

• Drosophila

• Yeast

• Mouse

• Rat

• Arabidopsis

• Human

• Microbes

• …

November 16, 2001 Slide 17

Drosophila Genome

November 16, 2001 Slide 18

Functional Genomics

• The biological role of individual genes;

• mechanisms underlying the regulation of their expression;

• regulatory interactions among them.

November 16, 2001 Slide 19

Glycolysis, Citric Acid Cycle, and Related Metabolic Processes

November 16, 2001 Slide 20

Gene Expression

• Only certain genes are “turned on” at any particular time.

• When a gene is transcribed (copied to mRNA), it is said to be expressed.

• The mRNA in a cell can be isolated. Its contents give a snapshot of the genes currently being expressed.

• Correlating gene expression with experimental conditions gives hints about the dynamic functioning of the cell.
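
One simple way to correlate gene expression with conditions is a Pearson correlation between a gene's expression levels and a numeric measure of the condition. The sketch below is only an illustration of that idea, not the analysis used in the talk: the gene names, stress levels, and expression values are invented, and `pearson` is a hypothetical helper.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Invented illustration data: four samples at increasing stress levels.
stress_level = [0, 1, 2, 3]
expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0],   # rises with stress
    "geneB": [5.0, 5.1, 4.9, 5.0],   # roughly constant
}

for gene, levels in expression.items():
    print(gene, round(pearson(stress_level, levels), 3))
```

A gene whose correlation is close to +1 or -1 is a candidate for being involved in the response to the condition; a value near 0 suggests no linear relationship.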

November 16, 2001 Slide 21

Gene Expression:Control Points

November 16, 2001 Slide 22

Free Radicals

November 16, 2001 Slide 23

Responses to Environmental Signals

November 16, 2001 Slide 24

Effects of Drought Stress

Virginia Tech:

Plant Biologists: Ruth Alscher, Boris Chevone.

CS: Lenny Heath, Naren Ramakrishnan, and colleagues.

Statistics: Ina Hoeschele, Shun-Hwa Li.

NC State (Forest Biotechnology):

Ying-Hsuan Sun, Ron Sederoff, Ross Whetten

November 16, 2001 Slide 25

Intracellular Decision Making

November 16, 2001 Slide 26

Hybridization

[Figure: microarray hybridization. Labels: Spots (sequences affixed to slide); Treatment and Control samples; Mix; Excitation; Emission; Detection; Relative Abundance Detection.]
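
Relative abundance for a spot is commonly summarized as the log ratio of the treatment and control intensities. The sketch below assumes that convention (base-2 log ratio); the talk does not state which measure was used, and `log_ratio` and the intensity values are hypothetical.

```python
from math import log2

def log_ratio(treatment_intensity: float, control_intensity: float) -> float:
    """Base-2 log ratio of treatment to control intensity for one spot.
    Positive: more abundant in treatment; negative: more abundant in control."""
    return log2(treatment_intensity / control_intensity)

# Invented intensities for three spots: (treatment, control)
spots = {1: (800.0, 200.0), 2: (100.0, 100.0), 3: (50.0, 200.0)}
for spot, (treatment, control) in spots.items():
    print(spot, log_ratio(treatment, control))
```

Using a logarithm makes a fourfold increase (+2) and a fourfold decrease (-2) symmetric around zero.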

November 16, 2001 Slide 27

Gene Expression Varies

November 16, 2001 Slide 28

Existing Computational Tools in Bioinformatics

• Sequence similarity

• Multiple sequence alignments

• Database searching

• Evolutionary (phylogenetic) tree construction

• Sequence assemblers

• Gene finders
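
As one concrete instance of a sequence similarity tool, the sketch below computes a global alignment score by dynamic programming (the Needleman-Wunsch recurrence). The talk does not name a specific algorithm; this choice, the scoring values, and the function name `global_alignment_score` are assumptions made for illustration.

```python
def global_alignment_score(a: str, b: str,
                           match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Best global alignment score of a and b (Needleman-Wunsch),
    keeping only one row of the dynamic programming table at a time."""
    prev = [j * gap for j in range(len(b) + 1)]  # aligning "" against b[:j]
    for i in range(1, len(a) + 1):
        curr = [i * gap]                          # aligning a[:i] against ""
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diagonal,             # align a[i-1] with b[j-1]
                            prev[j] + gap,        # gap in b
                            curr[j - 1] + gap))   # gap in a
        prev = curr
    return prev[-1]

print(global_alignment_score("GATTACA", "GATTACA"))  # 7
print(global_alignment_score("GATTACA", "GATACA"))   # 5
```

The running time is proportional to the product of the two sequence lengths, which is one reason database searching relies on faster heuristics.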

November 16, 2001 Slide 29

Challenges for Bioinformatics

• Analyzing and synthesizing complex experimental data

• Representing and accessing vast quantities of information

• Pattern matching

• Data mining

• Gene discovery

• Function discovery

• Modeling the dynamics of cell function

November 16, 2001 Slide 30

Bioinformatics at Virginia Tech

Computer science interacts with the life sciences.

• Computer Science in Bioinformatics:

• Joint research with: plant biologists, microbial biologists, biochemists, cell-cycle biologists, animal scientists, crop scientists, statisticians.

• Projects: Expresso; Nupotato; MURI; Arabidopsis Genome; Barista; Cell-Cycle Modeling

• Graduate option in bioinformatics

• Virginia Bioinformatics Institute (VBI)

November 16, 2001 Slide 31

Expresso: A Problem Solving Environment (PSE) for Microarray Experiment Design and Analysis

• Integration of design and procedures

• Integration of image analysis tools and statistical analysis

• Data mining using inductive logic programming (ILP)

• Closing the loop

• Integrating models

November 16, 2001 Slide 32

Flow of a Microarray Experiment

Hypotheses

Select cDNAs

PCR

Test of Hypotheses

Extract RNA

Replication and Randomization

Reverse Transcription and Fluorescent Labeling

Robotic Printing

Hybridization

Identify Spots

Intensities

Statistics

Clustering

Data Mining, ILP

November 16, 2001 Slide 33

Expresso: A Microarray Experiment Management System

November 16, 2001 Slide 34

Nupotato

• Potatoes originated in the Andes, where there are many varieties.

• Many varieties survive at high altitude in cold, dry conditions.

• Microarray technology can be used to investigate the genes responsible for stress resistance and for the production of nutrients.

November 16, 2001 Slide 35

MURI

• Some microorganisms have the ability to survive drying out or intense radiation.

• Their genomes are just being sequenced.

• Using microarrays and proteomics, we will try to correlate computationally the genes in the genomes with the special traits of the microorganisms.

• We are currently using multiple genome analysis.

November 16, 2001 Slide 36

Arabidopsis Genome Project

• Arabidopsis is a model higher plant.

• It is the first higher plant whose genome has been fully sequenced.

• Gene finder software has been used to identify putative genes.

• We are computationally mining the regulatory regions of these genes for promoter patterns.
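
Mining regulatory regions for promoter patterns can be pictured, in its simplest form, as searching the sequence upstream of each putative gene for a known motif. The sketch below is an illustration under stated assumptions, not the project's method: the regular expression `TATA[AT]A[AT]` is a commonly cited TATA-box-like consensus used here only as an example, and the gene names, sequences, and `count_motif` helper are invented.

```python
import re

def count_motif(upstream_regions: dict, motif_regex: str) -> dict:
    """Count non-overlapping matches of a motif in each gene's upstream region."""
    pattern = re.compile(motif_regex)
    return {gene: len(pattern.findall(sequence.upper()))
            for gene, sequence in upstream_regions.items()}

# Invented upstream sequences for two hypothetical genes.
upstream = {
    "gene1": "GGCTATAAATAGG",
    "gene2": "GGCCGGCC",
}
print(count_motif(upstream, r"TATA[AT]A[AT]"))  # {'gene1': 1, 'gene2': 0}
```

Discovering patterns that are not known in advance is a harder problem; this sketch only covers matching a given pattern.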

November 16, 2001 Slide 37

Barista

• Barista serves Expresso!

• Software development team across projects to minimize duplication of effort.

• Work with Linux, Perl, C, Python, CVS, Apache, PHP, …

November 16, 2001 Slide 38

Virginia Bioinformatics Institute (VBI)

• Research institute based at Virginia Tech

• Established July 1, 2000, with $3 million

• Will occupy 2 buildings and have 100+ employees in 4 years

November 16, 2001 Slide 39

Getting Into Bioinformatics

• Learn some biology --- genetics, cell biology

• Study computational (molecular) biology

• Get involved with bioinformatics research in interdisciplinary teams

• Work with biologists to solve their problems