Sequencing Informatics Gabor T. Marth Department of Biology, Boston College marth@bc.edu BI420 – Introduction to Bioinformatics
  • date post

    21-Dec-2015

Page 1: Sequencing Informatics Gabor T. Marth Department of Biology, Boston College marth@bc.edu BI420 – Introduction to Bioinformatics.

Sequencing Informatics

Gabor T. Marth

Department of Biology, Boston College

marth@bc.edu

BI420 – Introduction to Bioinformatics

Page 2:

The nuclear genome (chromosomes)

Page 3:

The genome sequence

• the primary template on which to outline functional features of our genetic code (genes, regulatory elements, secondary structure, tertiary structure, etc.)

Page 4:

Completed genomes

~1 Mb

~100 Mb

>100 Mb

~3,000 Mb

Page 5:

Main genome sequencing strategies

Clone-based shotgun sequencing (Human Genome Project)

Whole-genome shotgun sequencing (Celera Genomics, Inc.)

Page 6:

Hierarchical genome sequencing

BAC library construction

clone mapping

shotgun subclone library construction

sequencing

sequence reconstruction (sequence assembly)

Lander et al. Nature 2001

Page 7:

Clone mapping – “sequence ready” map

Page 8:

Hierarchical genome sequencing

BAC library construction

clone mapping

shotgun subclone library construction

sequencing/read processing

sequence reconstruction (sequence assembly)

Lander et al. Nature 2001

Page 9:

Shotgun subclone library construction

BAC primary clone cloning vector

sequencing vector

subclone insert

Page 10:

Hierarchical genome sequencing

BAC library construction

clone mapping

shotgun subclone library construction

sequencing/read processing

sequence reconstruction (sequence assembly)

Lander et al. Nature 2001

Page 11:

Sequencing

Page 12:

Robotic automation

Lander et al. Nature 2001

Page 13:

Base calling

GGGCTCAGCTGTATCAGCCACGTGCCTACAACAATCTGCCCCT

Page 14:

Base calling

PHRED

base = A

Q = 40
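A PHRED quality value Q is conventionally related to the estimated probability P that a base call is wrong by Q = -10 log10(P), so Q = 40 corresponds to roughly one error in 10,000 calls. A minimal sketch of that conversion (the function names are illustrative, not part of PHRED itself):

```python
import math

def phred_to_error_prob(q: float) -> float:
    """Convert a quality value Q to the estimated base-call error probability."""
    return 10 ** (-q / 10)

def error_prob_to_phred(p: float) -> float:
    """Convert an error probability P (0 < P <= 1) to a quality value Q."""
    return -10 * math.log10(p)

# The slide's example base call: base = A, Q = 40.
p_error = phred_to_error_prob(40)
print(f"Q = 40 -> error probability {p_error:g}")
```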

Page 15:

Vector clipping
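Vector clipping removes the cloning-vector sequence that ends up at the start of a read. The sketch below is only a toy version of the idea, assuming exact matches and a known vector sequence adjacent to the insert; real pipelines use error-tolerant alignment. The function name, the `min_match` parameter, and the sequences are hypothetical.

```python
def clip_vector_prefix(read: str, vector_end: str, min_match: int = 8) -> str:
    """Remove leading vector sequence from a read.

    Looks for the longest suffix of `vector_end` (the vector sequence next to
    the insert) that exactly matches the beginning of `read`, and trims it.
    Matches shorter than `min_match` bases are ignored as likely chance hits.
    """
    shortest = max(min_match, 1)
    for k in range(min(len(read), len(vector_end)), shortest - 1, -1):
        if read.startswith(vector_end[-k:]):
            return read[k:]
    return read  # no vector found; leave the read untouched

# Hypothetical vector end and a read that begins with 8 bases of vector.
vector_end = "GATTACAGGATCC"
raw_read = "CAGGATCC" + "ACGTACGTTT"
print(clip_vector_prefix(raw_read, vector_end))
```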

Page 16:

Hierarchical genome sequencing

BAC library construction

clone mapping

shotgun subclone library construction

sequencing/read processing

sequence reconstruction (sequence assembly)

Lander et al. Nature 2001

Page 17:

Sequence assembly

PHRAP
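To illustrate what sequence assembly does, here is a toy greedy assembler that repeatedly merges the two fragments with the longest exact suffix-prefix overlap. This is not PHRAP's algorithm (PHRAP works with error-tolerant alignments and base qualities); it is a simplified sketch that assumes error-free reads from one strand, and all names are illustrative.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b, or 0."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if k > 0 and a.endswith(b[:k]):
            return k
    return 0

def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Merge reads greedily by best overlap until no overlaps remain."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)] + [merged]

# Three overlapping reads cut from the start of the base-calling example.
reads = ["GGGCTCAGCTG", "CAGCTGTATCAGC", "ATCAGCCACGTG"]
print(greedy_assemble(reads))
```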

Page 18:

Repetitive DNA may confuse assembly
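One way to see why repeats are a problem: a read that ends inside a repeat overlaps equally well with reads leaving either copy of that repeat, so overlap evidence alone cannot tell which join is correct. A small sketch with made-up sequences (the repeat and flanking bases are hypothetical):

```python
def overlap(a: str, b: str, min_len: int = 4) -> int:
    """Length of the longest suffix of a that equals a prefix of b, or 0."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if k > 0 and a.endswith(b[:k]):
            return k
    return 0

repeat = "TCAGGCTAAC"                 # same sequence present at two genome locations
read_into_repeat = "TTGCA" + repeat   # unique sequence running into the repeat
read_out_copy1 = repeat + "GGATC"     # leaves copy 1 into unique sequence
read_out_copy2 = repeat + "CCTAG"     # leaves copy 2 into different unique sequence

# Both candidate joins are supported by an identical overlap.
ov1 = overlap(read_into_repeat, read_out_copy1)
ov2 = overlap(read_into_repeat, read_out_copy2)
print(ov1, ov2)
```

Resolving such ties generally needs extra information, for example reads long enough to span the repeat or the clone map from the hierarchical strategy.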

Page 19:

Sequence completion (finishing)

CONSED, AUTOFINISH

gap

region of low sequence coverage and/or quality
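Finishing targets gaps and thinly covered regions. As a rough sketch of how such regions can be located, the code below counts how many reads cover each position of a contig and reports stretches below a chosen depth. It is not how CONSED or AUTOFINISH work internally; the function name, the (start, length) read placements, and the `min_depth` threshold are assumptions for illustration.

```python
def low_coverage_regions(length: int, reads: list[tuple[int, int]],
                         min_depth: int = 2) -> list[tuple[int, int]]:
    """Return half-open (start, end) intervals where read depth < min_depth.

    `reads` holds (start, read_length) placements on a contig of `length` bases.
    """
    depth = [0] * length
    for start, read_len in reads:
        for pos in range(max(start, 0), min(start + read_len, length)):
            depth[pos] += 1

    regions = []
    run_start = None
    for pos, d in enumerate(depth):
        if d < min_depth:
            if run_start is None:
                run_start = pos          # a low-coverage run begins here
        elif run_start is not None:
            regions.append((run_start, pos))
            run_start = None
    if run_start is not None:            # run extends to the contig end
        regions.append((run_start, length))
    return regions

placements = [(0, 8), (4, 8), (14, 6)]
print(low_coverage_regions(20, placements, min_depth=1))  # uncovered gaps
print(low_coverage_regions(20, placements, min_depth=2))  # thin coverage
```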

Page 20:

New sequencing technologies

From familiar ABI traces … to 454 pyrograms … and Solexa reads.

ABI: 100 x 1,000 bp

454: 100 thousand x 100 bp

Solexa: 50 million x 20 bp
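Multiplying read count by read length gives the approximate number of bases produced per run. The sketch below does that arithmetic, assuming the three figures on the slide pair with the platforms in the order they are named (ABI, 454, Solexa); that pairing is an inference from the slide layout.

```python
# (number of reads per run, read length in bp), as listed on the slide.
# The platform labels are assumed from the order of the slide's captions.
runs = {
    "ABI": (100, 1_000),
    "454": (100_000, 100),
    "Solexa": (50_000_000, 20),
}

def bases_per_run(n_reads: int, read_len: int) -> int:
    """Total bases produced in one run."""
    return n_reads * read_len

totals = {name: bases_per_run(*spec) for name, spec in runs.items()}
for name, total in totals.items():
    print(f"{name}: {total:,} bp per run")
```

Under these figures each step trades shorter reads for roughly a 100-fold increase in output per run.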