Approaches to cDNA Cloning and Analysis

Approaches to cDNA Cloning and Analysis. Dr. Matthias Harbers, Chief Scientist, DNAFORM Inc.; Co-assigned Scientist at the RIKEN Omics Center. © Matthias Harbers 2008

description

The analysis of all transcripts within a cell is of essential importance. Molecular biology provides many approaches to clone RNA transcripts into cDNA. Large cDNA collections are in the public domain to serve the research community. Today, however, new high-speed sequencing methods allow a much deeper view into transcriptomes than is possible with classical cloning.

Transcript of Approaches to cDNA Cloning and Analysis

Page 1: Approaches to cDNA Cloning and Analysis

Approaches to cDNA Cloning and Analysis

Dr. Matthias Harbers

Chief Scientist DNAFORM Inc.

Co-assigned Scientist at the RIKEN Omics Center

© Matthias Harbers 2008

Page 2: Approaches to cDNA Cloning and Analysis

Classical View on the Utilization of Genomic Information

[Figure: In the nucleus, transcription factors act at the promoter of a “gene” in genomic DNA (storage of information), and RNA polymerase II transcribes from the transcript start site. The coding mRNA (transport of information) carries a 5’ cap (7-methylguanosine cap or m7G cap) and a poly(A) tail. In the cytoplasm, the mRNA is translated at the ribosome into protein (tools to operate “functions”).]

Developed in the 1950s and 1960s.

Page 3: Approaches to cDNA Cloning and Analysis

The Classical View Has Been Challenged by New Developments

Discovery/Project | Importance | Year
Discovery of reverse transcriptases | DNA can be synthesized from RNA templates | 1969
Discovery of ligase and restriction endonucleases | Establishing DNA recombination, DNA cloning, and preparation of DNA libraries | 1960s and 70s
DNA sequencing | Chain-termination method (“Sanger Sequencing”) | 1975
Human Genome Project | Move to sequencing entire genomes | 1990 to 2003
Expressed sequence tags (ESTs) | First attempt at gene discovery and expression profiling | 1991
IMAGE Project | Program to create cDNA collections from key organisms | 1993 to 2007
ENCODE Project | Functional elements in human genome | Since 2003

Page 4: Approaches to cDNA Cloning and Analysis

Topics of the Presentation

Approaches to cDNA cloning

Special topics related to cDNA cloning

Large-scale cDNA cloning projects

Small RNA (sRNA) cloning

Tag-based approaches

Next-Generation Sequencing

Where do we go from here?

Page 5: Approaches to cDNA Cloning and Analysis

Approaches to cDNA Cloning

Starting material: capped and polyadenylated mRNA (5’ cap, 3’ poly(A) tail)

1st strand cDNA synthesis: commonly oligo(dT) priming

Priming of 2nd strand cDNA synthesis: 5’-linker ligation or tailing reaction

2nd strand synthesis (option to amplify by PCR)

Digestion with cloning enzyme(s): methylation can protect against internal cleavage within the cDNA

Ligation into phage or plasmid vector (a plasmid with the cDNA insert may be excised from the phage vector)
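The first step of this workflow can be illustrated in code. The following is a minimal Python sketch, not a laboratory protocol: it treats the mRNA as a string, checks that an oligo(dT) primer could anneal to a poly(A) tail, and writes the first-strand cDNA as the reverse complement of the template. The function names, the minimum tail length, and the toy sequence are assumptions made for illustration.

```python
# Toy model of oligo(dT)-primed first-strand cDNA synthesis.
# All names and the example sequence are hypothetical.

RNA_TO_DNA_COMPLEMENT = {"A": "T", "U": "A", "G": "C", "C": "G"}


def poly_a_start(mrna: str, min_tail: int = 5) -> int:
    """Return the index where the 3' poly(A) tail begins, or -1 if the
    transcript does not end in at least `min_tail` A residues."""
    body = mrna.rstrip("A")
    tail_length = len(mrna) - len(body)
    return len(body) if tail_length >= min_tail else -1


def first_strand_cdna(mrna: str, min_tail: int = 5) -> str:
    """Return the first-strand cDNA (written 5'->3') made by reverse
    transcription from an oligo(dT) primer annealed to the poly(A) tail."""
    if poly_a_start(mrna, min_tail) < 0:
        raise ValueError("no poly(A) tail: oligo(dT) priming would fail")
    # Reverse transcriptase copies the template 3'->5', so the cDNA is the
    # reverse complement of the mRNA and begins with the T stretch.
    return "".join(RNA_TO_DNA_COMPLEMENT[base] for base in reversed(mrna))


mrna = "GGCAUGCUUAGC" + "A" * 8   # toy polyadenylated transcript
cdna = first_strand_cdna(mrna)
print(cdna)
```

A transcript without a poly(A) tail raises an error here, which mirrors why oligo(dT) priming misses non-polyadenylated RNA.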

Page 6: Approaches to cDNA Cloning and Analysis

Special Topics Related to cDNA Cloning

Synthesis of very long cDNAs (>10,000 bp, not further discussed)

Full-length cDNA cloning (important to obtain functional cDNAs)

Normalization (key to gene discovery in large-scale projects)

Cloning vectors and applications (not further discussed)

Subtractive cloning (not further discussed)

Expression cloning (not further discussed)

Addressing splicing (left out of large-scale projects)

Ref.: Harbers M: The current status of cDNA cloning. Genomics. 2008 Mar;91(3):232-42.

Page 7: Approaches to cDNA Cloning and Analysis

Use of cDNA Libraries

In research laboratories: isolation of individual target genes

In transcriptome analysis and genome projects:

Large-scale random clone picking

End-sequencing to build transcript catalogs

Full-length sequencing of selected clones

Creation of sequence databases

Creation of cDNA collections

Ref.: Carninci P et al.: Targeting a complex transcriptome: the construction of the mouse full-length cDNA encyclopedia. Genome Res. 2003 Jun;13(6B):1273-89.

Page 8: Approaches to cDNA Cloning and Analysis

Benefits of Large-Scale cDNA Cloning Projects

Core resources: sequence data and clone collections

Improved cDNA cloning technology

Gene regulation: promoter identification, expression profiling

Genomics: gene discovery, mapping

Proteomics: functional studies on proteins

RNAi: knock down

SNP analysis: location in promoter or exon, functional studies

Noncoding RNA: sense-antisense pairs

Public sequence databases and clone collections are essential tools for research!

Page 9: Approaches to cDNA Cloning and Analysis

The mRNA Pool of a Cell

Number of different transcripts | Share of mRNA
5 to 10 | up to 20%
500 to 2,000 | 40 to 60%
10,000 to 20,000 | <20%

Discovery of rarely expressed genes is a difficult task!

(Old numbers estimated from reassociation and hybridization studies)
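To see why rare transcripts are hard to find by random clone picking, the standard sampling estimate N = ln(1 - P) / ln(1 - f) can be applied, where f is the fraction of clones representing one transcript and P the desired probability of seeing it at least once. The sketch below is illustrative only: the per-transcript fractions use midpoints of the ranges above and assume that all transcripts within one abundance class are equally frequent, which is a simplification not stated on the slide.

```python
import math


def clones_needed(fraction: float, probability: float = 0.99) -> int:
    """Number of randomly picked clones needed to see a transcript at least
    once with the given probability, if it makes up `fraction` of all clones."""
    if not 0.0 < fraction < 1.0 or not 0.0 < probability < 1.0:
        raise ValueError("fraction and probability must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - probability) / math.log(1.0 - fraction))


# Per-transcript share of the mRNA pool (midpoints are an assumption).
abundant = 0.20 / 7          # one of ~7 transcripts sharing 20% of the mRNA
intermediate = 0.50 / 1250   # one of ~1,250 transcripts sharing 50%
rare = 0.20 / 15000          # one of ~15,000 transcripts sharing 20%

for label, f in [("abundant", abundant), ("intermediate", intermediate), ("rare", rare)]:
    print(f"{label:>12}: {clones_needed(f):>8,} clones for a 99% chance")
```

Under these assumptions a rare transcript requires several hundred thousand clones, which is the motivation for normalization.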

Page 10: Approaches to cDNA Cloning and Analysis

Normalization of cDNA Libraries

During a normalization step, a cDNA pool is hybridized against an aliquot of the original mRNA sample or against the same cDNA pool. Due to concentration-dependent hybridization kinetics, the number of clones representing highly expressed genes is reduced, yielding a more equal distribution of different cDNAs in the library.

Combine normalization and subtraction for higher gene discovery.

[Figure: Number of non-redundant clones plotted against number of libraries for Lib. 1, Lib. 2, Lib. 3 + Driver 1, and Lib. 4 + Driver 2 (legend: No Driver, Driver 1, Driver 2).]

[Figure: Example: pancreas cDNA on a gel without and with normalization/subtraction, next to Hind III size markers (9.4, 6.6, 4.4, 2.2, 2.0, and 0.5 kbp); bands of highly expressed genes are marked.]
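The effect of concentration-dependent hybridization kinetics can be sketched numerically. The example below assumes ideal second-order reassociation, C(t) = C0 / (1 + k·C0·t), for each cDNA species independently, and assumes that the hybridized (double-stranded) material is removed while the single-stranded remainder is cloned. The abundances and the value of k·t are invented for illustration.

```python
def single_stranded_remaining(c0: float, kt: float) -> float:
    """Concentration still single-stranded after self-hybridization,
    assuming ideal second-order kinetics: C(t) = C0 / (1 + k * C0 * t)."""
    return c0 / (1.0 + c0 * kt)


# Hypothetical starting abundances (arbitrary units) for three cDNA species.
pool = {"abundant": 1000.0, "intermediate": 50.0, "rare": 1.0}
kt = 1.0  # product of rate constant and time, chosen for illustration

normalized = {name: single_stranded_remaining(c0, kt) for name, c0 in pool.items()}

ratio_before = pool["abundant"] / pool["rare"]
ratio_after = normalized["abundant"] / normalized["rare"]
print(f"abundant:rare before = {ratio_before:.0f}:1, after = {ratio_after:.1f}:1")
```

Abundant species hybridize much faster than rare ones, so the remaining single-stranded pool is far more even than the starting pool.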

Page 11: Approaches to cDNA Cloning and Analysis

Full-Length cDNA Cloning

“Cap Trapper” Method

Key steps: Biotinylation of the cap structure and RNase I treatment

Steps shown: 1st strand cDNA synthesis on capped, polyadenylated mRNA; chemical reaction to biotinylate the cap; RNase I digestion; recovery on beads

“Oligo Capping” Method

Key steps: Replacement of the cap structure by an RNA oligonucleotide

Steps shown: phosphatase treatment; pyrophosphatase treatment; RNA ligase to attach the RNA adaptor; primer-directed cDNA synthesis

[Figure: Schematic of both methods showing mRNA with cap and poly(A) tail, cDNA, biotin, beads, adaptor, and primer.]

Page 12: Approaches to cDNA Cloning and Analysis

Examples for Large-Scale cDNA Cloning Projects

Project | Organisms | URL
IMAGE Consortium | Human, mouse, rat, zebrafish, fugu, Xenopus (X. laevis and X. tropicalis), cow, and primate | http://image.llnl.gov/
Mammalian Gene Collection (MGC) | Human, mouse, rat, cow, others | http://mgc.nci.nih.gov/
Tokyo University | Human | http://cdna.hgc.jp/
RIKEN FANTOM | Mouse | http://fantom3.gsc.riken.go.jp/
Rice full-length cDNA Consortium | Rice | http://cdna01.dna.affrc.go.jp/cDNA/
RIKEN Arabidopsis | Arabidopsis | http://www.brc.riken.jp/lab/epd/Eng/news/071015.shtml
ORF Consortium | Human (some mouse clones) | http://www.orfeomecollaboration.org

These projects target the cloning and full-length sequencing of “one representative” cDNA clone for each gene. This reduces cost, but it entirely ignores splicing events.

Page 13: Approaches to cDNA Cloning and Analysis

Pre-mRNA is Spliced into mRNA

Large-scale cloning projects do not cover splice variants. But maybe 75% of all signal transducers are regulated by splicing!

Page 14: Approaches to cDNA Cloning and Analysis

Capturing Alternatively Spliced Exons in mRNA

Sense strand: Sample 1

Antisense strand: Sample 2

Cut double-stranded regions

Capture single-stranded regions

Ref.: Watahiki A et al.: Libraries enriched for alternatively spliced exons reveal splicing patterns in melanocytes and melanomas. Nature Methods 2004 Dec 1(3): 233-9.

Page 15: Approaches to cDNA Cloning and Analysis

The Discovery of Small RNAs

Classical cloning protocols removed all cDNA fragments of less than 500 bp (to avoid linker contamination; size cutoff of cloning vectors).

Proteins of less than 100 amino acids were commonly not annotated.

However, small RNAs have important functions!

Small RNAs are non-coding RNAs (ncRNAs) often derived from maturation processes in the cell that include digestion steps by RNases.

Most prominent example: microRNAs (miRNAs) have sequences that are reverse complementary to sites in mRNA transcripts. They are around 21-23 nucleotides long after maturation and can alter the expression/translation of one or several target genes through RNA interference.

And we are still finding many more new RNA species!

Ref.: Kawaji H, Hayashizaki Y. Exploration of small RNAs. PLoS Genet. 2008 Jan;4(1):e22.
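The reverse-complement relationship between a miRNA and its targets can be shown with a small search routine. This is a simplified sketch: it looks only for perfect matches to the miRNA “seed” (nucleotides 2 to 8), a convention that is an assumption here and is not mentioned on the slide. The UTR fragment is made up, and real target prediction uses many additional criteria.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA sequence."""
    return "".join(COMPLEMENT[b] for b in reversed(rna))


def seed_sites(mirna: str, mrna: str, seed_start: int = 1, seed_end: int = 8) -> list:
    """0-based positions in `mrna` that are perfectly complementary to the
    miRNA seed (by default nucleotides 2-8 of the miRNA)."""
    site = reverse_complement(mirna[seed_start:seed_end])
    hits, pos = [], mrna.find(site)
    while pos != -1:
        hits.append(pos)
        pos = mrna.find(site, pos + 1)
    return hits


mirna = "UGAGGUAGUAGGUUGUAUAGUU"     # 22-nt example miRNA
utr = "AAGCUACCUCAGGAUUCUACCUCUU"     # made-up 3' UTR fragment
print(len(mirna), seed_sites(mirna, utr))
```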

Page 16: Approaches to cDNA Cloning and Analysis

Small RNA (sRNA) Cloning

Starting material: short RNA (5’ phosphate, 3’ OH)

Modify 3’ end: C-tailing or adaptor ligation

Modify 5’ end: here by adaptor ligation

1st strand cDNA synthesis

2nd strand synthesis and PCR

Sequence analysis: direct sequencing of DNA fragments (option to ligate into plasmid vector)

Key steps: Modification of the 5’ and 3’ ends of the RNA for PCR amplification. Selection by size range. Commonly only sequenced. No cloning needed, as short cDNAs can be chemically synthesized.
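Once small RNA libraries are sequenced directly, the adaptor and size-selection steps have a computational counterpart: removing adaptor sequence from each read and keeping inserts in the expected size range. The sketch below is hypothetical throughout; the adaptor sequences, the 18 to 30 nt window, and the example reads are invented for illustration.

```python
from typing import Optional

FIVE_PRIME_ADAPTOR = "ACACGT"    # hypothetical adaptor sequences
THREE_PRIME_ADAPTOR = "TGGAAT"


def extract_insert(read: str) -> Optional[str]:
    """Return the sequence between the 5' and 3' adaptors, or None when
    the read does not contain both adaptors in the expected order."""
    if not read.startswith(FIVE_PRIME_ADAPTOR):
        return None
    start = len(FIVE_PRIME_ADAPTOR)
    end = read.find(THREE_PRIME_ADAPTOR, start)
    return read[start:end] if end != -1 else None


def size_selected(reads, min_len=18, max_len=30):
    """Keep adaptor-trimmed inserts whose length falls in the chosen range."""
    inserts = (extract_insert(r) for r in reads)
    return [s for s in inserts if s is not None and min_len <= len(s) <= max_len]


reads = [
    "ACACGT" + "TGAGGTAGTAGGTTGTATAGTT" + "TGGAAT",   # 22-nt insert, kept
    "ACACGT" + "TGGAAT",                               # adaptor dimer, dropped
    "ACACGT" + "GATTACA" * 6 + "TGGAAT",               # 42-nt insert, too long
    "TTTTTT" + "TGAGGTAGTAGGTTGTATAGTT" + "TGGAAT",   # no 5' adaptor, dropped
]
print(size_selected(reads))
```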

Page 17: Approaches to cDNA Cloning and Analysis


Tag-Based Approaches

Gene discovery cannot be done by standard methods used in expression profiling such as microarray or PCR.

Unsupervised approaches that do not require sequence information for probe design are needed for gene discovery.

The first approach to gene discovery was sequencing of 3’ ends of cDNA clones (EST sequencing). It requires one read per clone.

Gene identification does not require sequences of 500 to 800 bp; much shorter sequences of some 20 bp or less are sufficient.

Use long sequencing reads to cover many short fragments in one run.

New protocols to isolate short fragments from RNA.

Tag-based approaches in expression profiling and gene discovery.

Ref.: Harbers M and Carninci P: Tag-based approaches for transcriptome research and genome annotation. Nature Methods 2005 Jul 2(7): 495-502.
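A quick calculation shows why tags of about 20 bp can identify a gene. The sketch below estimates how often one specific tag would occur by chance in a random sequence the size of the human genome (3.3 x 10^9 bp, the haploid size quoted later in this presentation). It assumes equal base frequencies and ignores repeats, so it is an idealized lower bound on ambiguity rather than a statement about real mapping rates.

```python
def expected_random_hits(tag_length: int, genome_size: float) -> float:
    """Expected number of chance matches to one specific tag in a random
    sequence of `genome_size` positions (both strands), assuming equal
    base frequencies."""
    return 2 * genome_size / 4 ** tag_length


HUMAN_GENOME_BP = 3.3e9   # haploid size

for length in (10, 14, 20, 27):
    hits = expected_random_hits(length, HUMAN_GENOME_BP)
    print(f"{length:>2} bp tag: {4 ** length:>20,} possible tags, {hits:.2e} chance hits")
```

Under these assumptions a 10 bp tag is expected to match thousands of positions, while a 20 bp tag is expected to match far fewer than one position by chance.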

Page 18: Approaches to cDNA Cloning and Analysis

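Tag-Based Approaches

[Figure: Regions of a capped, polyadenylated mRNA covered by different tag-based methods. Labels: 5’ end (cap selection): CAGE, 5’ SAGE; anchoring enzyme sites: SAGE (5’ related), SAGE (3’ related), MPSS, DGE; 3’ end (remove poly(A)): 3’ SAGE; both ends: paired-end tags (PETs); entire transcript: RNA-Seq or other shotgun approaches.]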

Page 19: Approaches to cDNA Cloning and Analysis

Serial Analysis of Gene Expression (SAGE) / Digital Gene Expression (DGE)

Ref.: Velculescu VE et al. Serial analysis of gene expression. Science. 1995 Oct 20;270(5235):484-7.

1st strand cDNA synthesis with biotinylated primer (commonly starting from mRNA)

Preparation of double-stranded cDNA and digestion with anchoring enzyme (cDNA bound to beads via biotin)

Adaptor ligation and digestion with Mme I (20 bp) or EcoP15I (27 bp)

Formation of “di-tags” (di-tags can be used for direct sequencing (DGE))

Concatenation and cloning into plasmid vector (classic sequencing of concatemers)

Very well established, with rich reference/annotation information. Digital expression profiling by “tag counting”.
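Tag counting can be sketched in a few lines. The example below takes the bases immediately downstream of the 3’-most anchoring-enzyme site of each cDNA as its tag and counts identical tags. The recognition site CATG (the enzyme NlaIII) is an assumption, since the slides do not name the anchoring enzyme; the tag length and the two toy cDNA sequences are also invented.

```python
from collections import Counter
from typing import Optional

ANCHOR_SITE = "CATG"   # assumed anchoring-enzyme site (NlaIII); not named in the slides


def sage_tag(cdna: str, tag_length: int = 20) -> Optional[str]:
    """Return the tag next to the 3'-most anchoring site of a cDNA
    (sense strand, 5'->3', poly(A) removed), or None if no full tag exists."""
    site = cdna.rfind(ANCHOR_SITE)
    if site == -1:
        return None
    start = site + len(ANCHOR_SITE)
    tag = cdna[start:start + tag_length]
    return tag if len(tag) == tag_length else None


def count_tags(cdnas, tag_length: int = 20) -> Counter:
    """Digital expression profile: how often each tag is observed."""
    tags = (sage_tag(c, tag_length) for c in cdnas)
    return Counter(t for t in tags if t is not None)


gene_a = "GGACATGTTCAGGCATGACCTGAAGTCCATTGGCAAGTCTTAAGC"   # toy cDNAs
gene_b = "TTGACCATGGCGTTAACCGGTATTCAGGACTTCAAGGA"
profile = count_tags([gene_a, gene_a, gene_a, gene_b])
for tag, n in profile.most_common():
    print(tag, n)
```

The counts per tag stand in for expression levels, which is what makes the readout “digital”.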

Page 20: Approaches to cDNA Cloning and Analysis

Cap Analysis Gene Expression (CAGE)

Ref.: Kodzius R et al.: Cap analysis of gene expression: transcription start site mapping and expression profiling. Nature Methods 2006 Mar 3(3): 211-222.

Commonly starting from 50 µg total RNA.

1st strand cDNA synthesis with random primers (covering non-polyadenylated mRNA and long mRNA)

5’-end selection on beads by Cap Trapper (less bias due to chemical modification of the cap)

Ligation of Adaptor I and 2nd strand synthesis

Digestion with Mme I (20 bp) or EcoP15I (27 bp)

Isolation of CAGE tags

3’-end ligation of Adaptor II

Preferably used for direct sequencing (>4,000,000 tags per run).

Page 21: Approaches to cDNA Cloning and Analysis

Cap Analysis Gene Expression (CAGE)

[Figure: CAGE tags mark the transcription start site (TSS) at exon 1 of a gene on the genome (exons 1 to 5 shown). Signals 1 to 3 act through transcription factors TF1 to TF3, whose binding is detected by ChIP. The capped, polyadenylated mRNA is covered by tiling array/RNA-Seq, microarray, SAGE, and RACE.]

CAGE tags experimentally link transcripts to their promoters.

CAGE tags integrate information based on genome annotations.

CAGE tags can be linked to whole genome tiling arrays and RNA-Seq data.

CAGE tags can be linked to Chromatin IP/ChIP-Seq data.

CAGE tags correlate with open chromatin.

CAGE tags provide primer information for cloning new transcripts.
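Linking tags to promoters usually starts by grouping the mapped 5’ ends of CAGE tags into clusters that mark transcription start sites. The following is a minimal sketch of such a grouping on one strand of one chromosome. The 20 bp maximum gap and the positions are assumptions chosen for illustration and do not reproduce any published clustering method.

```python
def cluster_tss(positions, max_gap=20):
    """Group mapped tag 5' ends into clusters: a tag joins the current
    cluster if it lies within `max_gap` bp of the previous tag.
    Returns (start, end, tag_count) for each cluster."""
    clusters = []
    for pos in sorted(positions):
        if clusters and pos - clusters[-1][-1] <= max_gap:
            clusters[-1].append(pos)
        else:
            clusters.append([pos])
    return [(c[0], c[-1], len(c)) for c in clusters]


# Hypothetical genomic coordinates of CAGE tag 5' ends.
tag_starts = [1005, 1001, 1003, 1001, 1010, 5230, 5232, 9000]
for start, end, count in cluster_tss(tag_starts):
    print(f"TSS cluster {start}-{end}: {count} tags")
```

The tag count per cluster serves as an expression measure for the promoter at that position.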

Page 22: Approaches to cDNA Cloning and Analysis

Classical DNA Sequencing by Chain-Termination Method

Reaction components: primer, DNA template, DNA polymerase, dNTP/ddNTP mix

One reaction per nucleotide (A, G, C, T)

DNA fragments from primer extension reactions

Analyze fragments by gel electrophoresis

Capillary sequencer

For over 30 years the most important method in molecular biology.

Challenged by emerging new sequencing technologies: Next-Generation Sequencing.
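The logic of reading a sequence from four termination reactions can be modeled directly. In this idealized sketch, each ddNTP reaction yields fragments ending at every position where that base occurs in the newly synthesized strand, and the sequence is read by sorting all fragments by length. Fragment lengths are counted from the end of the primer, and the toy sequence is an assumption for illustration.

```python
def termination_lengths(new_strand: str) -> dict:
    """For each of the four ddNTP reactions, list the fragment lengths
    produced when primer extension stops at that base.
    `new_strand` is the newly synthesized strand downstream of the primer."""
    lanes = {base: [] for base in "ACGT"}
    for position, base in enumerate(new_strand, start=1):
        lanes[base].append(position)
    return lanes


def read_gel(lanes: dict) -> str:
    """Read the sequence from the shortest to the longest fragment."""
    by_length = {length: base for base, lengths in lanes.items() for length in lengths}
    return "".join(by_length[length] for length in sorted(by_length))


new_strand = "TGGTTGGTACCACGTT"   # toy sequence
lanes = termination_lengths(new_strand)
print(lanes)
print(read_gel(lanes))
```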

Page 23: Approaches to cDNA Cloning and Analysis

Next-Generation Sequencing

Platform | Mb per run / read length / run time | Method
Roche 454 Sequencing | 100 Mb / 250 bp / 7 h per run | Emulsion PCR and pyrosequencing
Illumina (Solexa) | 1300 Mb / 32-40 bp / 4 days per run | Bridge PCR and sequencing-by-synthesis
ABI SOLiD | 3000 Mb / 35 bp / 5 days per run | Emulsion PCR and ligation-based sequencing
Helicos | 25 to 90 Mb per h / up to 55 bp | Single-molecule detection

Driven by the “$1000 genome”, different companies are on the move to provide new sequencing technologies based on “sequencing by synthesis” or “ligation-based sequencing”. Other approaches may use hybridization methods or physical means in the future.

Ref.: Mardis ER. The impact of next-generation sequencing technology on genetics. Trends Genet. 2008 Mar;24(3):133-41. Epub 2008 Feb 11.

Ref.: von Bubnoff A. Next-generation sequencing: the race is on. Cell. 2008 Mar 7;132(5):721-3.
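The platform figures above can be turned into reads per run and throughput per hour with simple arithmetic. The sketch below uses only the numbers in the table; the 36 bp read length for Illumina is an assumed midpoint of the 32-40 bp range, and Helicos is omitted because no per-run figure is given.

```python
platforms = {
    # name: (megabases per run, read length in bp, run time in hours)
    "Roche 454": (100, 250, 7),
    "Illumina (Solexa)": (1300, 36, 4 * 24),   # 36 bp: assumed midpoint of 32-40 bp
    "ABI SOLiD": (3000, 35, 5 * 24),
}


def reads_per_run(megabases: float, read_length: float) -> float:
    """Approximate number of reads obtained in one run."""
    return megabases * 1_000_000 / read_length


def megabases_per_hour(megabases: float, hours: float) -> float:
    """Average sequence output per hour of run time."""
    return megabases / hours


for name, (mb, length, hours) in platforms.items():
    print(f"{name:<18} {reads_per_run(mb, length):>12,.0f} reads/run "
          f"{megabases_per_hour(mb, hours):>6.1f} Mb/h")
```

For tag-based methods the number of reads matters more than read length, which is why short-read platforms suit tag counting.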

Page 24: Approaches to cDNA Cloning and Analysis

Example for Ligation-Based Sequencing: ABI SOLiD System

DNA fragments having adaptor sequences: genomic DNA, tag sequencing

Project-specific data analysis: mapping to genome, reference information

Images are the courtesy of ABI and were kindly provided by ABI Japan.


Page 27: Approaches to cDNA Cloning and Analysis

Example for Sequencing-by-Synthesis: Illumina 1G System

Images are the courtesy of Illumina and were kindly provided by Illumina Japan.

DNA per run: 0.1 to 1 µg

Addition of 2 adaptors

Add to flow cell

Preparation of clusters

Page 28: Approaches to cDNA Cloning and Analysis

Example for Sequencing-by-Synthesis: Illumina 1G System

Cycle 1: Addition of the sequence reagent; one-base extension reaction; removal of non-incorporated bases; detection of the fluorescence signal; removal of the fluorescence label

Cycle 2: Repetition of the above reactions

Cycles 3, 4, 5, ...: Repetition of the above reactions

Page 29: Approaches to cDNA Cloning and Analysis

Example for Sequencing-by-Synthesis: Illumina 1G System

[Figure: Images of clusters on the flow cell; scale bars 100 µm and 20 µm.]

40,000,000 clusters on a flow cell

Page 30: Approaches to cDNA Cloning and Analysis

Where do we go from here?

Next-generation sequencing will push the genome sequencing field, both for re-sequencing and de novo sequencing (“1000 Genomes Project”).

Metagenomics (Environmental Genomics, Ecogenomics, or Community Genomics): direct analysis of genetic material obtained from environmental samples.

Expression profiling: SAGE (DGE), CAGE, PET, RNA-Seq.

Analytical applications to identify functional regions/elements in genomes: ChIP-Seq, open chromatin, SNPs, splicing, others to come.

Analytical applications in mutation screens.

Analytical applications for detection of infectious agents.

Page 31: Approaches to cDNA Cloning and Analysis

Transcriptome Analysis: The Dominance of Noncoding RNA

Genome sequencing and annotation did not tell us about the real extent of gene expression!

Tiling array experiments and deep sequencing by next-generation sequencing methods indicate that >90% of the genome is expressed.

Maybe 40 to 50% of the mRNA is not polyadenylated, and we have not analyzed it yet.

Most of the transcripts are potentially noncoding RNAs with unknown (regulatory?) functions.

The definition of a “gene” may no longer hold, with many different transcripts derived from the same loci.

We do not understand the “hidden layers” regulating the utilization of genomic information.

Ref.: Mattick, J.S. "Challenging the dogma: The hidden layer of non-protein-coding RNAs in complex organisms" Bioessays. (2003) 25, 930-939.

Page 32: Approaches to cDNA Cloning and Analysis

Example for RNA-Seq in Yeast: Schizosaccharomyces pombe (fission yeast)

Illumina 1G sequencer; average read length 39.1 bases; fragments from poly(A) mRNA.

>23 million reads (~60 genome lengths) from proliferating cells.

>99 million reads (~190 genome lengths) from five different stages.

Covering ~94% of the nuclear and >99% of the mitochondrial genome.

Confirmed expression from intergenic regions by RT-PCR.

Control experiments using whole-genome tiling arrays (25-mers at 20 nt intervals) confirmed the identification of novel transcripts (26 out of 453 may encode short proteins).

Ref.: Wilhelm BT et al. Dynamic repertoire of a eukaryotic transcriptome surveyed at single-nucleotide resolution. Nature. 2008 Jun 26;453(7199):1239-43. Epub 2008 May 18.

Ref.: Graveley BR. Molecular biology: power sequencing. Nature. 2008 Jun 26;453(7199):1197-8.

Recent publications on the use of RNA-Seq include S. pombe, S. cerevisiae, Arabidopsis, mouse tissues, mouse stem cells, and HeLa S3.

Page 33: Approaches to cDNA Cloning and Analysis

Examples for Genome Size (haploid)

Genome | Length in bp | Estimated gene number
Phi-X 174 | 5,386 | 10
Human mitochondrion | 16,569 | 37
E. coli | 4,639,221 | 4,377
Saccharomyces cerevisiae | 12,495,682 | 5,770
Caenorhabditis elegans | 100,258,171 | 19,427
Arabidopsis thaliana | 115,409,949 | ~28,000
Drosophila melanogaster | 122,653,977 | 13,379
Humans | 3.3 x 10^9 | ~20,500
Amphibians | 10^9 to 10^11 | ?

Values taken from: http://users.rcn.com/jkimball.ma.ultranet/BiologyPages/G/GenomeSizes.html (as of July 2007)
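The table shows that genome size and gene number do not scale together. A short calculation of the average genome length per gene makes this explicit. The sketch uses the values from the table; approximate entries (~28,000 and ~20,500) are taken at face value, and amphibians are omitted because no gene number is given.

```python
genomes = {
    # name: (genome length in bp, estimated gene number)
    "Phi-X 174": (5_386, 10),
    "Human mitochondrion": (16_569, 37),
    "E. coli": (4_639_221, 4_377),
    "Saccharomyces cerevisiae": (12_495_682, 5_770),
    "Caenorhabditis elegans": (100_258_171, 19_427),
    "Arabidopsis thaliana": (115_409_949, 28_000),
    "Drosophila melanogaster": (122_653_977, 13_379),
    "Human": (3_300_000_000, 20_500),
}


def kb_per_gene(length_bp: int, genes: int) -> float:
    """Average genome length per annotated gene, in kilobases."""
    return length_bp / genes / 1000


for name, (length_bp, genes) in genomes.items():
    print(f"{name:<26} {kb_per_gene(length_bp, genes):>8.1f} kb per gene")
```

With these numbers, the human genome has on the order of a hundred times more sequence per gene than E. coli, most of it non-protein-coding.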

Page 34: Approaches to cDNA Cloning and Analysis

Where are our limitations?

Mammalian genome size and transcriptome complexity: enrichment of fragments (e.g. using microarrays), normalization, and longer reads required.

Thus far uneven representation requires use of more than one method.

Requirements for starting materials (target is to analyze single cells).

No unified cDNA library method: using different methods depending on RNA length.

Very large data files and lack of computational analysis tools.

What is transcriptional noise?

Research dominated by “detection” rather than “functional analysis”.

Ref.: Struhl K. Transcriptional noise and the fidelity of initiation by RNA polymerase II. Nat Struct Mol Biol. 2007 Feb;14(2):103-5.

Page 35: Approaches to cDNA Cloning and Analysis

Present Strategies for Transcriptome Analysis

Interest has shifted to next-generation sequencing to profile transcriptional activities.

We cannot predict the ends of transcripts, and therefore tag-based approaches to identify start sites and termination sites are needed.

Identification of transcription start sites in combination with other information is driving “gene network studies” and “systems biology”.

RNA-Seq provides new means for the identification of splice sites and expressed mutations.

We do not clone all those new transcripts, but there will be a need to obtain resources for functional analysis of new transcripts.

We are more than ever falling short on the functional analysis of new transcripts. Thus far we have not even analyzed all coding transcripts!

It is an exciting time to work on transcriptome analysis offering many challenges and rewards!

Page 36: Approaches to cDNA Cloning and Analysis


Contact:

Dr. Matthias Harbers

DNAFORM Inc.

Leading Venture Plaza-2, 75-1, Ono-cho, Tsurumi-ku, Yokohama City, Kanagawa, 230-0046 Japan

E-mail: [email protected]

Phone: +81-(0)45-510-0607

FAX: +81-(0)45-510-0608

URL: http://www.dnaform.jp