Some Biology That Computer Scientists Need for Bioinformatics. Lenwood S. Heath, Virginia Tech, Blacksburg, VA 24061, [email protected]. University of Maryland, December 14, 2001.

December 14, 2001 Slide 1

Some Biology That Computer Scientists Need for Bioinformatics

Lenwood S. Heath

Virginia Tech

Blacksburg, VA 24061

[email protected]

University of Maryland

December 14, 2001

December 14, 2001 Slide 2

Overview

I. Some Molecular Biology and Genomics

II. Language of the New Biology

III. Existing bioinformatics tools

IV. Bioinformatics challenges

V. Bioinformatics at Virginia Tech

December 14, 2001 Slide 3

I. Some Molecular Biology

• The instruction set for a cell is contained in its chromosomes.

• Each chromosome is a long molecule of DNA.

• Each DNA molecule contains 100s or 1000s of genes.

• Each gene encodes a protein.

• A gene is transcribed to mRNA in the nucleus.

• An mRNA is translated to a protein on ribosomes.

December 14, 2001 Slide 4

Transcription and Translation

DNA → (Transcription) → mRNA → (Translation) → Protein

December 14, 2001 Slide 5

Elaborating Cellular Function

DNA → (Transcription) → mRNA → (Translation, via the Genetic Code) → Protein

Other processes shown: Reverse Transcription, Degradation, Regulation

Protein functions:

• Structure

• Catalyze chemical reactions

• Respond to environment

Thousands of Genes!

December 14, 2001 Slide 6

Chromosomes

• Long molecules of DNA: 10^4 to 10^8 base pairs

• 23 matched pairs in humans

• A gene is a subsequence of a chromosome that encodes a protein.

• Proteins are associated with cell function, structure, and regulation.

• Only a fraction of the genes are in use at any time.

• Every gene is present in every cell.

December 14, 2001 Slide 7

DNA Strand

Bases

• A (adenine) complements T (thymine)

• C (cytosine) complements G (guanine)

[Figure: a single DNA strand of bases on a 2’-deoxyribose (sugar) backbone, drawn from the 5’ end to the 3’ end]

December 14, 2001 Slide 8

Complementary DNA Strands

[Figure: double-stranded DNA, with each base on one strand paired to its complement on the opposite strand]
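The pairing rule on these slides (A with T, C with G) is easy to state as code. The following is a minimal sketch, not part of the original talk; the function names and the example sequence are illustrative assumptions. Because the two strands run in opposite directions, the partner strand read 5’ to 3’ is the reverse of the base-by-base complement.

```python
# Watson-Crick pairing as stated on the slides: A<->T and C<->G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(strand: str) -> str:
    """Return the base-by-base complement, in the same left-to-right order."""
    return strand.upper().translate(COMPLEMENT)


def reverse_complement(strand: str) -> str:
    """Return the partner strand read in its own 5'->3' direction."""
    return complement(strand)[::-1]


coding = "ATGTTCCGT"  # hypothetical sequence, chosen only for illustration
print(complement(coding))          # TACAAGGCA
print(reverse_complement(coding))  # ACGGAACAT
```

Applying `reverse_complement` twice returns the original strand, which is a quick sanity check on the pairing table.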

December 14, 2001 Slide 9

RNA Strand

Bases

• U (uracil) replaces T (thymine)

[Figure: a single RNA strand of bases on a ribose (sugar) backbone, drawn from the 5’ end to the 3’ end]

December 14, 2001 Slide 10

Transcription of DNA to mRNA

[Figure: double-stranded DNA with its coding and template strands labeled, and the mRNA strand being synthesized against the template DNA strand; the mRNA carries the sequence of the coding strand with U in place of T]
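Transcription as drawn here can be sketched in two equivalent ways: copy the coding strand with U substituted for T, or pair each base of the template strand with its RNA complement. This is an illustrative sketch only; the function names and the example sequence are assumptions, and real transcription involves promoters, polymerases, and processing that are ignored here.

```python
# Template base -> RNA base that pairs with it (A-U, C-G, G-C, T-A).
DNA_TO_RNA = str.maketrans("ACGT", "UGCA")


def transcribe_coding(coding_dna: str) -> str:
    """mRNA has the coding strand's sequence with U in place of T."""
    return coding_dna.upper().replace("T", "U")


def transcribe_template(template_3to5: str) -> str:
    """Pair each template base, read 3'->5', with its RNA complement.

    The result is the mRNA read 5'->3'.
    """
    return template_3to5.upper().translate(DNA_TO_RNA)


coding = "ATGTTCCGT"    # hypothetical coding strand, 5'->3'
template = "TACAAGGCA"  # its complement, written 3'->5'
print(transcribe_coding(coding))      # AUGUUCCGU
print(transcribe_template(template))  # AUGUUCCGU
```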

December 14, 2001 Slide 11

Proteins and Amino Acids

• A protein is a large molecule that is a chain of amino acids (100 to 5000).

• There are 20 common amino acids (Alanine, Cysteine, …, Tyrosine).

• Three bases --- a codon --- suffice to encode an amino acid: 4^3 = 64 possible codons cover the 20 amino acids, whereas two bases would give only 4^2 = 16.

• There are also START and STOP codons.
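The counting argument behind the three-base codon can be checked directly. The sketch below (the helper name `words` is an assumption) enumerates every word of length k over the four RNA bases: lengths 1 and 2 give fewer than 20 words, while length 3 gives 64, enough for 20 amino acids plus START and STOP signals.

```python
from itertools import product

BASES = "ACGU"  # the four RNA bases


def words(k: int) -> list[str]:
    """All base strings of length k."""
    return ["".join(p) for p in product(BASES, repeat=k)]


for k in (1, 2, 3):
    print(k, len(words(k)))  # prints 1 4, then 2 16, then 3 64
```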

December 14, 2001 Slide 12

Genetic Code
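The slide's figure is the codon table itself. As a sketch, the standard genetic code can be built from a 64-character string listing the amino acid (one-letter code, `*` for STOP) for each codon in U, C, A, G order. The variable names are illustrative assumptions; organism-specific variant codes are not covered.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code, codons ordered UUU, UUC, UUA, UUG, UCU, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

GENETIC_CODE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}

# How many codons map to each amino acid (the code is redundant).
degeneracy = Counter(GENETIC_CODE.values())

print(GENETIC_CODE["AUG"])  # M (methionine, the usual START codon)
print(degeneracy["*"])      # 3 STOP codons: UAA, UAG, UGA
print(degeneracy["L"])      # 6 codons encode leucine
```

With 64 codons for 20 amino acids, most amino acids have several codons; only methionine and tryptophan have exactly one.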

December 14, 2001 Slide 13

Translation to a Protein

Unlike DNA, proteins have three-dimensional structure essential to protein function.

A protein folds to a three-dimensional shape that cannot yet be predicted from the primary sequence.

[Figure: an mRNA strand being translated into a nascent polypeptide (amino acids bound together by peptide bonds); the amino acids shown include phenylalanine, arginine, histidine, and alanine]
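Translation can be sketched as reading the mRNA three bases at a time from the first START codon until a STOP codon. This is a simplified illustration, not the talk's own code: the function name and the example mRNA are assumptions (the example is not the sequence in the slide's figure), and the sketch ignores everything a ribosome actually does beyond the codon lookup.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in U, C, A, G codon order; "*" marks a STOP codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(mrna: str) -> str:
    """Translate from the first AUG up to (not including) the first STOP."""
    mrna = mrna.upper()
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no START codon found
    protein = []
    for i in range(start, len(mrna) - 2, 3):
        aa = CODE[mrna[i:i + 3]]  # raises KeyError on non-ACGU characters
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Hypothetical mRNA: START, Phe, Arg, His, Ala, STOP, with flanking bases.
print(translate("GGAUGUUCCGUCAUGCUUAAGG"))  # MFRHA
```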

December 14, 2001 Slide 14

Transcription and Translation

DNA → (Transcription) → mRNA → (Translation) → Protein

December 14, 2001 Slide 15

Transcription of DNA to mRNA

[Figure: double-stranded DNA with its coding and template strands labeled, and the mRNA strand being synthesized against the template DNA strand]

December 14, 2001 Slide 16

Translation to a Protein

[Figure: an mRNA strand being translated into a nascent polypeptide (amino acids bound together by peptide bonds); the amino acids shown include phenylalanine, arginine, histidine, and alanine]

December 14, 2001 Slide 17

Cell’s Fetch-Execute Cycle

• Stored Program: DNA, chromosomes, genes

• Fetch/Decode: RNA, ribosomes

• Execute Functions: Proteins --- oxygen transport, cell structures, enzymes

• Inputs: Nutrients, environmental signals, external proteins

• Outputs: Waste, response proteins, enzymes

December 14, 2001 Slide 18

II. The Language of the New Biology

A new language has been created. Words in that language that are useful for today’s talks:

Genomics

Functional Genomics

Proteomics

cDNA Microarrays

Global Gene Expression Patterns

December 14, 2001 Slide 19

Genomics

• Discovery of genetic sequences and the ordering of those sequences into

– individual genes;

– gene families;

– chromosomes.

• Identification of

– sequences that code for gene products/proteins;

– sequences that act as regulatory elements.

December 14, 2001 Slide 20

Genome Sequencing Projects

• Drosophila

• Yeast

• Mouse

• Rat

• Arabidopsis

• Human

• Microbes

• …

December 14, 2001 Slide 21

Drosophila Genome

December 14, 2001 Slide 22

Functional Genomics

• The biological role of individual genes.

• Mechanisms underlying the regulation of their expression.

• Regulatory interactions among them.

December 14, 2001 Slide 23

Glycolysis, Citric Acid Cycle, and Related Metabolic Processes

December 14, 2001 Slide 24

Gene Expression

• Only certain genes are “turned on” at any particular time.

• When a gene is transcribed (copied to mRNA), it is said to be expressed.

• The mRNA in a cell can be isolated. Its contents give a snapshot of the genes currently being expressed.

• Correlating gene expression with conditions gives insight into the dynamic functioning of the cell.
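One simple way to make the idea of correlating expression with conditions concrete is a Pearson correlation between a gene's expression levels and a numeric encoding of the condition. The sketch below is an assumption for illustration, not a method named in the talk, and all data values are made up.

```python
from math import sqrt


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length series.

    Raises ZeroDivisionError if either series is constant.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Hypothetical data: 0 = control sample, 1 = stressed sample.
condition = [0, 0, 1, 1]
gene_a = [1.0, 1.2, 3.9, 4.1]  # rises under stress
gene_b = [2.0, 2.1, 2.0, 1.9]  # roughly flat

print(round(pearson(condition, gene_a), 3))
print(round(pearson(condition, gene_b), 3))
```

A coefficient near +1 or -1 suggests the gene's expression tracks the condition; a value near 0 suggests it does not. With only a handful of samples, such values are hints rather than evidence.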

December 14, 2001 Slide 26

Responses to Environmental Signals

December 14, 2001 Slide 27

Intracellular Decision Making

December 14, 2001 Slide 28

Microarray Technology

• In the past, gene expression and gene interactions were examined known gene by known gene, process by process.

• With microarray technology:

– Simultaneous examination of large groups of genes and associated interactions

– Possible discovery of new cellular mechanisms involving gene expression

December 14, 2001 Slide 29

Flow of a Microarray Experiment

Hypotheses

Select cDNAs

PCR

Test of Hypotheses

Extract RNA

Replication and Randomization

Reverse Transcription and Fluorescent Labeling

Robotic Printing

Hybridization

Identify Spots

Intensities

Statistics

Clustering

Data Mining, ILP

December 14, 2001 Slide 30

[Figure: two-channel microarray experiment. Labels shown: Spots (sequences affixed to slide, numbered 1, 2, 3); Treatment; Control; Mix; Hybridization; Excitation; Emission; Detection; Relative Abundance]

December 14, 2001 Slide 31

Gene Expression Varies

Cy5 to Cy3 ratios
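Two-channel intensities are commonly compared as a log2 ratio, so that a doubling and a halving are symmetric around zero. The sketch below is illustrative: the function name and intensity values are made up, and which dye labels the treatment sample is an assumption here (Cy5 = treatment, Cy3 = control), since the slide does not say.

```python
from math import log2


def log_ratio(cy5: float, cy3: float) -> float:
    """log2 of the Cy5/Cy3 intensity ratio for one spot.

    Assumes both intensities are positive, i.e. already background-corrected.
    """
    return log2(cy5 / cy3)


# Hypothetical spot intensities as (Cy5, Cy3) pairs.
spots = {"gene_1": (2000.0, 500.0), "gene_2": (800.0, 800.0), "gene_3": (500.0, 2000.0)}
for name, (cy5, cy3) in spots.items():
    print(name, log_ratio(cy5, cy3))  # 2.0, 0.0, -2.0
```

Under the stated dye assignment, a positive value means more transcript in the treatment sample, a negative value means less, and zero means no change.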

December 14, 2001 Slide 32

III. Existing Computational Tools in Bioinformatics

• Sequence similarity

• Multiple sequence alignments

• Database searching

• Evolutionary (phylogenetic) tree construction

• Sequence assemblers

• Gene finders
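Sequence similarity, the first item above, is classically computed by dynamic programming. The sketch below scores a global alignment in the style of Needleman-Wunsch; it is one standard technique, not necessarily what any particular tool in this list uses, and the scoring values (match +1, mismatch -1, gap -1) are arbitrary assumptions.

```python
def alignment_score(a: str, b: str, match: int = 1,
                    mismatch: int = -1, gap: int = -1) -> int:
    """Best global alignment score of a and b with a linear gap penalty."""
    # prev[j] = best score aligning the current prefix of a with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap]  # a[:i] aligned against the empty string
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur.append(max(diag,              # align a[i-1] with b[j-1]
                           prev[j] + gap,     # gap in b
                           cur[j - 1] + gap)) # gap in a
        prev = cur
    return prev[-1]


print(alignment_score("ACGT", "ACGT"))  # 4
print(alignment_score("ACGT", "ACT"))   # 2
```

The table is filled row by row, so the sketch runs in time proportional to the product of the two lengths while keeping only two rows in memory. Recovering the alignment itself, rather than just its score, would require keeping the full table or traceback pointers.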

December 14, 2001 Slide 33

Existing Biological Databases

• Molecular Sequences: Genomic DNA, mRNA, ESTs, proteins

• Protein domains, motifs, or blocks

• Protein families

• Genomes

• Nomenclature and ontologies

• Biological literature

December 14, 2001 Slide 34

IV. Challenges for Bioinformatics

• Analyzing and synthesizing complex experimental data

• Representing and accessing vast quantities of information

• Pattern matching

• Data mining --- whole genome analysis

• Gene discovery

• Function discovery

• Modeling the dynamics of cell function

December 14, 2001 Slide 35

V. Bioinformatics at Virginia Tech

Computer science interacts with the life sciences.

• Computer Science in Bioinformatics:

– Joint research with: plant biologists, microbial biologists, biochemists, cell-cycle biologists, animal scientists, crop scientists, statisticians.

– Projects: Expresso; Nupotato; MURI; Arabidopsis Genome; Barista; Cell-Cycle Modeling

– Graduate option in bioinformatics

• Virginia Bioinformatics Institute (VBI)

December 14, 2001 Slide 36

Expresso: A Problem Solving Environment (PSE) for Microarray Experiment Design and Analysis

• Integration of design and procedures

• Integration of image analysis tools and statistical analysis

• Data mining using inductive logic programming (ILP)

• Closing the loop

• Integrating models

December 14, 2001 Slide 42

Getting Into Bioinformatics

• Learn some biology --- genetics, cell biology

• Study computational (molecular) biology

• Get involved with bioinformatics research in interdisciplinary teams

• Work with biologists to solve their problems