Page 1: Introduction to Bioinformatics - Craig Ventermaize.jcvi.org/cellgenomics/outreach/2007/notes/lecture_bioinformat… · Introduction to Bioinformatics. 2 ... – Bioinformatics is

Introduction to Bioinformatics

Contents

• Cell biology
  – Organisms and cells
  – Building blocks of cells
  – How do genes encode proteins?
• Bioinformatics
  – What is bioinformatics?
  – Practical applications
  – Tools and databases

Cell Biology

Lineage tree of life on earth

Lineage tree of life on earth

• Prokaryotes
  – Bacteria
  – Archaebacteria
• Eukaryotes
  – Plants
  – Animals
  – Fungi

Prokaryotic cells

• Single-celled organisms
• Consist of cytosol bounded by the plasma membrane
• Possess a cell wall
• Gram-negative bacteria have a thin cell wall and an outer membrane
• Gram-positive bacteria have a thick cell wall and no outer membrane
• DNA is condensed at the cell center and is not enclosed in a defined nucleus
• Ribosomes are found in the DNA-free region
• Relatively simple internal organization
• Some can grow in extreme conditions (temperature, pH, salt concentration)

Prokaryotic cells

Eukaryotic cells

• Single-celled (unicellular fungi and protozoans) or multicellular organisms (plants and animals)
• Plant and fungal cells both possess a cell wall, but the walls differ in composition
• Surrounded by a plasma membrane, like the prokaryotes
• Contain a defined nucleus
• Structurally more complex: organelles, cytoskeleton
• Organelles are enclosed compartments separated from the cytoplasm by internal membranes
• The cytoskeleton is made of structural proteins that give the cell strength and rigidity; it can be connected to organelles and provides tracks for organelle movement

Eukaryotic cells

Lineage tree of life on earth

Animal cell structure

Plant cell structure

Building blocks of cells

• Macromolecules
  – Nucleic acids (e.g. DNA, RNA)
  – Proteins (e.g. collagen)
  – Sugars (e.g. glucose, glycogen)
  – Lipids (e.g. cholesterol)
• Other molecules
  – Water
  – Ions

Central dogma

• Genetic information flow: DNA → mRNA → Protein

Deoxyribonucleic acid (DNA)

• Contains genetic information arranged in units termed “genes”
• In an organism, nearly all cells contain the same DNA content
• Basic subunits
  – adenine (A)
  – guanine (G)
  – cytosine (C)
  – thymine (T)

Native DNA is a double helix of complementary antiparallel chains
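
The pairing rule behind "complementary antiparallel chains" can be sketched in a few lines of Python. This is a minimal illustration, not part of the original lecture; the function name `reverse_complement` and the sample sequence are chosen here for the example. It assumes the standard base pairs (A with T, G with C) and reads the partner strand in the opposite direction.

```python
# Standard Watson-Crick pairing: A pairs with T, G pairs with C.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(strand: str) -> str:
    """Return the partner strand, read in its own 5'->3' direction.

    Complementing gives the paired bases; reversing accounts for the
    two chains running in opposite (antiparallel) directions.
    """
    return strand.upper().translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    print(reverse_complement("ATGGCT"))  # AGCCAT
```

Applying the function twice returns the original strand, which is a quick sanity check on the pairing table.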

DNA is packaged into chromosomes

The total DNA in the chromosomes of an organism is its genome

Ribonucleic acid (RNA)

• Carries genetic information as messenger RNAs (mRNA)
• In an organism, different cells contain different sets of mRNAs
• Basic subunits
  – adenine (A)
  – guanine (G)
  – cytosine (C)
  – uracil (U)

Protein

• Expresses genetic information as an amino acid sequence
• Basic subunits are 20 amino acids
• A protein’s amino acid sequence determines its 3D structure, which in turn determines the function of that protein

Question: What are essential amino acids?
Answer: Amino acids that cannot be synthesized by the body's cells and therefore have to be included in the diet. Soybean and corn are rich in essential amino acids.

The genetic code is a triplet code

How do genes encode proteins?

• Example:

Gene X    ATG GCT TGT TTA CGA ATT TAG
mRNA      AUG GCU UGU UUA CGA AUU UAG
Protein   Met Ala Cys Leu Arg Ile  *
          M   A   C   L   R   I    *
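
The Gene X example can be reproduced with a short Python sketch. It is an illustration only: the codon table below lists just the seven codons that appear in the example (a full table has 64 entries), the function names are chosen here, and the DNA shown is assumed to be the coding strand, so the mRNA is the same sequence with U in place of T.

```python
# Only the codons used in the Gene X example; "*" marks a stop codon.
CODON_TABLE = {
    "AUG": "M",  # Met (start)
    "GCU": "A",  # Ala
    "UGU": "C",  # Cys
    "UUA": "L",  # Leu
    "CGA": "R",  # Arg
    "AUU": "I",  # Ile
    "UAG": "*",  # stop
}


def transcribe(coding_dna: str) -> str:
    """Turn a coding-strand DNA sequence into mRNA (T becomes U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Read the mRNA three bases at a time until a stop codon is reached."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    gene_x = "ATGGCTTGTTTACGAATTTAG"
    mrna = transcribe(gene_x)
    print(mrna)             # AUGGCUUGUUUACGAAUUUAG
    print(translate(mrna))  # MACLRI
```

A codon missing from this reduced table raises a `KeyError`, which makes the limitation visible rather than silently producing a wrong protein.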

Bioinformatics

What is Bioinformatics?

• Background
  – Massive explosion in the amount of biological information available due to huge advances in the fields of molecular biology and genomics.
• What is bioinformatics?
  – Bioinformatics is the application of computer technology to the management, interpretation and analysis of biological data.
  – An interdisciplinary research area that is the interface between the biological and computational sciences.
• Goals
  – To uncover the wealth of biological information hidden in the mass of data.
  – To provide improvements in research fields such as human health, agriculture, the environment, energy and biotechnology.

Data generation

• Large scale sequencing projects
  – Genome sequencing
    • Examples: microbial or human genome sequencing
    • Determine the DNA sequence of an organism
    • Discover “genes” in the genome using bioinformatics tools
  – EST (Expressed Sequence Tag) sequencing
    • Examples: a specific tissue or cell type from a given organism
    • Determine the mRNA sequences found in a specific tissue or cell type
    • Determine “genes” expressed in a specific tissue or cell type

DNA sequencing

Genome sizes

                      Arabidopsis   Rice     Maize    Human    E. coli
Genome size (Mb)      125           400      2,500    3,000    4.6
Repetitive DNA        10%           40%      80%      80%      -
Estimated gene count  27,000        40,000   45,000   35,000   4,000
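
One way to read the genome size figures is to divide the estimated gene count by the genome size, giving genes per megabase. The Python sketch below does that with the slide's numbers; "gene density" is a derived quantity introduced here for illustration, not a figure from the lecture, and the names `GENOMES` and `genes_per_mb` are hypothetical.

```python
# (genome size in Mb, estimated gene count), copied from the slide's table.
GENOMES = {
    "Arabidopsis": (125, 27_000),
    "Rice": (400, 40_000),
    "Maize": (2_500, 45_000),
    "Human": (3_000, 35_000),
    "E. coli": (4.6, 4_000),
}


def genes_per_mb(size_mb: float, gene_count: int) -> float:
    """Estimated genes per megabase of genome."""
    return gene_count / size_mb


if __name__ == "__main__":
    # Print organisms from most to least gene-dense.
    for name, (size_mb, genes) in sorted(
        GENOMES.items(), key=lambda item: genes_per_mb(*item[1]), reverse=True
    ):
        print(f"{name:12s} {genes_per_mb(size_mb, genes):8.1f} genes/Mb")
```

With these numbers the compact E. coli genome comes out far denser than the repeat-rich maize and human genomes, which is consistent with the repetitive DNA row.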

Genome sequencing strategies

• Whole genome shotgun sequencing
  – For genomes of relatively small size (e.g. bacteria)
  – Break up the genome into small DNA fragments
  – Rely on computer algorithms to assemble the fragments
  – Examples: microbial genomes, Drosophila, Human (Celera)
• Hierarchical sequencing
  – For genomes of large size (e.g. human, maize)
  – Break the genome into many long pieces
  – Map each long piece onto the chromosome (physical mapping)
  – Select and sequence pieces with minimal overlaps
  – Examples: Rice, Human

Whole Genome Shotgun

Figure: the genome is cut many times at random.

1. Cut genomic DNA into pieces
2. Sequence random fragments
3. Put the sequences together into one piece, relying on computer algorithms
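
The "put the sequences together" step can be illustrated with a toy greedy assembler in Python. This is a simplified sketch, not the algorithm used by any real assembler: it assumes error-free fragments from a single strand, ignores repeats, and repeatedly merges the pair of fragments with the longest suffix-to-prefix overlap. The function names and the `min_len` threshold are choices made for this example.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if no overlap of at least `min_len` bases exists.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(fragments, min_len: int = 3):
    """Merge fragments greedily by longest overlap; return remaining contigs."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    while len(frags) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # nothing overlaps any more; leave the contigs separate
        merged = frags[best_i] + frags[best_j][best_n:]
        frags = [f for k, f in enumerate(frags) if k not in (best_i, best_j)]
        frags.append(merged)
    return frags


if __name__ == "__main__":
    reads = ["TTACGAATTTAG", "ATGGCTTGT", "CTTGTTTACG"]
    print(greedy_assemble(reads))  # ['ATGGCTTGTTTACGAATTTAG']
```

The pairwise comparison is quadratic in the number of fragments, which is fine for a toy input but is one reason real genome assembly needs far more sophisticated algorithms.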

Hierarchical Sequencing

Figure labels: Genome; Chr 1, region 1; Chr 1, region 5.

1. Genomic DNA is cut into pieces
2. Assign a chromosomal location to each DNA fragment
3. Sequence fragments that originate from known locations
4. Stitch together the fragments from each chromosomal location
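
The difference from the shotgun approach is that each piece carries a known location, so ordering is a lookup rather than a search. The Python sketch below illustrates only that idea, under strong simplifying assumptions: every piece is already sequenced, is labelled with a hypothetical (chromosome, region number) pair, and does not overlap its neighbours, so stitching reduces to sorting and concatenating. The function name `assemble_by_location` is invented for this example.

```python
from collections import defaultdict


def assemble_by_location(mapped_fragments):
    """Stitch sequenced pieces together using their mapped locations.

    `mapped_fragments` is an iterable of (chromosome, region, sequence)
    tuples, where `region` is the piece's position along the chromosome.
    Returns a dict mapping each chromosome to its stitched sequence.
    """
    by_chromosome = defaultdict(list)
    for chromosome, region, sequence in mapped_fragments:
        by_chromosome[chromosome].append((region, sequence))
    return {
        chromosome: "".join(seq for _, seq in sorted(pieces))
        for chromosome, pieces in by_chromosome.items()
    }


if __name__ == "__main__":
    pieces = [
        ("Chr 1", 5, "CGAATT"),
        ("Chr 1", 1, "ATGGCT"),
        ("Chr 2", 1, "GGCCTA"),
    ]
    print(assemble_by_location(pieces))
```

In practice neighbouring pieces overlap and each piece is itself assembled from smaller reads, so this sketch covers only the final ordering step.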

Genome facts

• The order of the nucleotide bases contains the instructions for making an organism.
• There are 4 types of nucleotide base: A (adenine), T (thymine), C (cytosine), G (guanine).
• Every three bases code for an amino acid.
• There are 20 different amino acids, which combine in different ways to make different proteins.
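
A short Python check shows why three bases per amino acid is the smallest workable word length: with 4 bases, two-letter words give only 16 combinations, fewer than the 20 amino acids, while three-letter words give 64. The snippet is an illustration added here; the variable names are arbitrary.

```python
from itertools import product

BASES = "ATCG"

# Enumerate every possible three-base word (codon).
codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]

print(len(BASES) ** 2)  # 16 two-base words: too few for 20 amino acids
print(len(codons))      # 64 three-base words: enough, with room to spare
```

Because 64 is larger than 20, several codons can stand for the same amino acid, and some can act as stop signals.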

Human genome facts

• The human genome is composed of more than 3 billion nucleotide bases.
• Almost all nucleotide bases (99.9%) are exactly the same in all people.
• Our DNA is 98% identical to that of chimpanzees.
• Less than 2% of the genome codes for proteins.
• The vast majority of the DNA in the genome (>97%) has no known function.
• The functions remain unknown for over 50% of discovered genes.
• Chromosome 1 has the most genes (2,968) and chromosome Y has the fewest (231).
• If unwound and tied together, the strands of DNA in one cell would stretch 6 feet.
• The total number of human genes is estimated to be between 30,000 and 40,000.
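
The percentages above are easier to grasp as base counts. The Python sketch below works two of them out, taking 3 billion bases as a round figure (the slide says "more than 3 billion", so the results are rough lower-end estimates). Integer arithmetic is used to avoid floating-point rounding; the helper name `portion` is introduced for this example.

```python
GENOME_BASES = 3_000_000_000  # round figure; the slide says "more than 3 billion"


def portion(total: int, numerator: int, denominator: int) -> int:
    """Return total * numerator / denominator using integer arithmetic."""
    return total * numerator // denominator


# 99.9% identical between people means 0.1% (1 in 1,000) may differ.
differing_bases = portion(GENOME_BASES, 1, 1000)

# "Less than 2%" codes for proteins, so 2% is an upper bound.
coding_bases_upper_bound = portion(GENOME_BASES, 2, 100)

print(differing_bases)           # 3000000
print(coding_bases_upper_bound)  # 60000000
```

So even a 0.1% difference corresponds to roughly three million positions, and the protein-coding portion is at most about 60 million bases.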

CTCTAGCTATCTTGGTCTCCTACACAGCCTATGCACATGAGCCCATGCCTCTCCTCTCCTTGCGCCTGCATAGAGAGGTGGTATGATCACCTGGAAAGTTTTTAACTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTTACAAGCCTAGACCTTATGCATGGTCGGACGGACACATCTGATCATAGGACATATGAGTAGGCCACACTCCTCCTGCCCCTCTCTCGTAGAGATCAACACACACTGCTCTTAGTGCCAGGACCTAGAGAGGGGAGCGTGGAGAGGGCATCAGGGGGCCTTGGAGTCCCATCAGTAAAGCACATGTTTCCTTTCTGTGATTCCTCAAGCCCCATGGACTTACCGCTTTACCAACAACTGCAGCTAAGCCCGTCTTCCCCAAAGACGGACCAATCCAGCAGCTTCTACTGCTACCCATGCTCCCCTCCCTTCGCCGCCGCCGACGCCAGCTTTCCCCTCAGCTACCAGATCGGTAGTGCCGCGGCCGCCGACGCCACCCCTCCACAAGCCGTGATCAACTCGCCGGACCTGCCGGTGCAGGCGCTGATGGACCACGCGCCGGCGCCGGCTACAGAGCTGGGCGCCTGCGCCAGTGGTGCAGAAGGATCCGGCGCCAGCCTCGACAGGGCGGCTGCCGCGGCGAGGAAAGACCGGCACAGCAAGATATGCACCGCCGGCGGGATGAGGGACCGCCGGATGCGGCTCTCCCTTGACGTCGCGCGCAAATTCTTCGCGCTGCAGGACATGCTTGGCTTCGACAAGGCAAGCAAGACGGTACAGTGGCTCCTCAACACGTCCAAGTCCGCCATCCAGGAGATCATGGCCGACGACGCGTCTTCGGAGTGCGTGGAGGACGGCTCCAGCAGCCTCTCCGTCGACGGCAAGCACAACCCGGCAGAGCAGCTGGGAGGAGGAGGAGATCAGAAGCCCAAGGGTAATTGCCGCGGCGAGGGGAAGAAGCCGGCCAAGGCAAGTAAAGCGGCGGCCACCCCGAAGCCGCCAAGAAAATCGGCCAATAACGCACACCAGGTCCCCGACAAGGAGACGAGGGCGAAAGCGAGGGAGAGGGCGAGGGAGCGGACCAAGGAGAAGCACCGGATGCGCTGGGTAAAGCTTGCTTCAGCAATTGACGTGGAGGCGGCGGCTGCCTCGGGGCCGAGCGACAGGCCGAGCTCGAACAATTTGAGCCACCACTCATCGTTGTCCATGAACATGCCGTGTGCTGCCGCTGAATTGGAGGAGAGGGAGAGGTGTTCATCAGCTCTCAGCAATAGATCAGCAGGTAGGATGCAAGAAATCACAGGGGCGAGCGACGTGGTCCTGGGCTTTGGCAACGGAGGAGGAGGATACGGCGACGGCGGCGGCAACTACTACTGCCAAGAGCAATGGGAACTCGGTGGAGTCGTCTTTCAGCAGAACTCACGCTTCTACTGAACACTACGGGCGCACTAGGTACTAGAACTACTCTTTCGACTTACATCTATCTCCTTTCCCTCAACGTGAGCTTCTCAATAATTTGCTGTCTTAATCTATGCGTGTGTTTCTCTTTCTAGACTTCGTAATTGGCTGTGTGACGATGAACT

A piece of DNA sequence

CTCTAGCTATCTTGGTCTCCTACACAGCCTATGCACATGAGCCCATGCCTCTCCTCTCCTTGCGCCTGCATAGAGAGGTGGTATGATCACCTGGAAAGTTTTTAACTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTCTTACAAGCCTAGACCTTATGCATGGTCGGACGGACACATCTGATCATAGGACATATGAGTAGGCCACACTCCTCCTGCCCCTCTCTCGTAGAGATCAACACACACTGCTCTTAGTGCCAGGACCTAGAGAGGGGAGCGTGGAGAGGGCATCAGGGGGCCTTGGAGTCCCATCAGTAAAGCACATGTTTCCTTTCTGTGATTCCTCAAGCCCCATGGACTTACCGCTTTACCAACAACTGCAGCTAAGCCCGTCTTCCCCAAAGACGGACCAATCCAGCAGCTTCTACTGCTACCCATGCTCCCCTCCCTTCGCCGCCGCCGACGCCAGCTTTCCCCTCAGCTACCAGATCGGTAGTGCCGCGGCCGCCGACGCCACCCCTCCACAAGCCGTGATCAACTCGCCGGACCTGCCGGTGCAGGCGCTGATGGACCACGCGCCGGCGCCGGCTACAGAGCTGGGCGCCTGCGCCAGTGGTGCAGAAGGATCCGGCGCCAGCCTCGACAGGGCGGCTGCCGCGGCGAGGAAAGACCGGCACAGCAAGATATGCACCGCCGGCGGGATGAGGGACCGCCGGATGCGGCTCTCCCTTGACGTCGCGCGCAAATTCTTCGCGCTGCAGGACATGCTTGGCTTCGACAAGGCAAGCAAGACGGTACAGTGGCTCCTCAACACGTCCAAGTCCGCCATCCAGGAGATCATGGCCGACGACGCGTCTTCGGAGTGCGTGGAGGACGGCTCCAGCAGCCTCTCCGTCGACGGCAAGCACAACCCGGCAGAGCAGCTGGGAGGAGGAGGAGATCAGAAGCCCAAGGGTAATTGCCGCGGCGAGGGGAAGAAGCCGGCCAAGGCAAGTAAAGCGGCGGCCACCCCGAAGCCGCCAAGAAAATCGGCCAATAACGCACACCAGGTCCCCGACAAGGAGACGAGGGCGAAAGCGAGGGAGAGGGCGAGGGAGCGGACCAAGGAGAAGCACCGGATGCGCTGGGTAAAGCTTGCTTCAGCAATTGACGTGGAGGCGGCGGCTGCCTCGGGGCCGAGCGACAGGCCGAGCTCGAACAATTTGAGCCACCACTCATCGTTGTCCATGAACATGCCGTGTGCTGCCGCTGAATTGGAGGAGAGGGAGAGGTGTTCATCAGCTCTCAGCAATAGATCAGCAGGTAGGATGCAAGAAATCACAGGGGCGAGCGACGTGGTCCTGGGCTTTGGCAACGGAGGAGGAGGATACGGCGACGGCGGCGGCAACTACTACTGCCAAGAGCAATGGGAACTCGGTGGAGTCGTCTTTCAGCAGAACTCACGCTTCTACTGAACACTACGGGCGCACTAGGTACTAGAACTACTCTTTCGACTTACATCTATCTCCTTTCCCTCAACGTGAGCTTCTCAATAATTTGCTGTCTTAATCTATGCGTGTGTTTCTCTTTCTAGACTTCGTAATTGGCTGTGTGACGATGAACT

A piece of DNA sequence carrying a gene unit

Data analysis (tools)

• Sequence properties
  – Length, base composition, GC content, etc.
• Sequence assembly
  – Put sequences together based on similarity
• Gene prediction
  – Find gene units in a given DNA sequence
• Repeat finding
  – Find repeated units in a given DNA sequence
• Sequence similarity search
  – Find other similar sequences based on DNA or protein sequences
• Protein function analysis
  – Predict protein function based on known functional units found in the protein sequence (domains)
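
Two of the simpler analyses in this list, sequence properties and repeat finding, can be sketched with the Python standard library alone. This is a minimal illustration rather than a real bioinformatics tool: the function names are invented here, and the repeat finder only looks for back-to-back copies of a repeat unit that the caller supplies.

```python
import re
from collections import Counter


def base_composition(seq: str) -> Counter:
    """Count how many times each base occurs."""
    return Counter(seq.upper())


def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def find_tandem_repeats(seq: str, unit: str, min_copies: int = 3):
    """Return (start, copies) for each run of at least `min_copies` of `unit`."""
    pattern = re.compile(f"(?:{re.escape(unit.upper())}){{{min_copies},}}")
    return [
        (match.start(), len(match.group()) // len(unit))
        for match in pattern.finditer(seq.upper())
    ]


if __name__ == "__main__":
    sample = "AACTCTCTCTGG"
    print(len(sample))                        # 12
    print(gc_content(sample))                 # 0.5
    print(find_tandem_repeats(sample, "CT"))  # [(2, 4)]
```

The same repeat search applied to the sample DNA sequence shown earlier would pick up its long run of CT units near the start.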

Data storage (databases)

• Databases
  – Research articles
    • What is the latest research with regard to genes involved in horse coat color?
  – Taxonomy
    • How many plant or animal genomes have been sequenced?
  – Nucleotide
    • What is the nucleotide sequence of the maize domestication gene teosinte branched 1 (tb1)?
  – Protein
    • What is the protein sequence of the maize domestication gene tb1?
  – Genome
    • Where are the human diabetes genes located in the human genome? Which chromosome?

Long term goals

• Agriculture
  – Improve insect resistance
  – Improve nutritional quality
  – Improve drought resistance and/or environmental adaptability
• Animals
  – Improve production and nutrition of farm animals
• Molecular medicine
  – Preventative medicine
  – Gene therapy
• Microbial genome
  – Waste cleanup
  – Climate change
  – Alternative energy sources
  – Biotechnology
  – Antibiotic resistance
  – Forensic analysis of microbes
  – Metagenomics