ModENCODE August 20-21, 2007 Drosophila Transcriptome: Aim 2.2.

modENCODE, August 20-21, 2007

Drosophila Transcriptome: Aim 2.2

Aim 2.2: Experimental Validation of Transcript Models

1. Experimental verification of selected splice sites in transcript models (short RT-PCR)

2. Mapping transcript ends using RACE

3. Screening cDNA libraries for transcripts

4. Recovering cDNA clones using long RT-PCR

5. High-throughput sequencing of small RNAs

6. Submitting sequence data to databases

7. Reviewing the transcriptome annotation

Experiments at LBNL

Transcript Ends
- TSSs: 20,000 targeted 5’ RACE experiments
- poly-A: 1,000 targeted 3’ RACE experiments

Full-Length Transcript Structures
- 6,000 cDNA screens and full-insert sequencing
- 3,000 long RT-PCRs and full-insert sequencing

Small RNA Sequencing
- 15 runs on the 454 Life Sciences device
- Size fractionate < 500 nt (larger range than Eric Lai)

Mapping TSSs

• 5’ RLM-RACE is a simple, scalable method

• RLM primer replaces the 5’ CAP structure

• Gene specific primers are nested & near 5’ end

• Sequence 8 clones

• Direct sequencing is also proposed but is difficult

• We are prioritizing transcripts and tissues using our 5’ EST data

TSSs: Slippery vs Discrete

head RACE products

larval RACE products

cDNAs

Cap-Trapped 5’ ESTs Define Discrete…

…and Slippery Transcription Start Sites
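One way the discrete versus slippery distinction could be quantified is the Shannon entropy of the 5’ end positions observed for a transcript (from cap-trapped ESTs or RACE clones). The sketch below is an illustrative assumption, not the statistic used by the project; the function name `tss_entropy` and the example coordinates are hypothetical.

```python
import math
from collections import Counter


def tss_entropy(start_positions):
    """Shannon entropy (in bits) of observed 5' end positions.

    Illustrative measure only: 0 bits means every read starts at the
    same base (discrete); larger values mean the starts are spread
    over more positions (slippery).
    """
    counts = Counter(start_positions)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("need at least one 5' end position")
    # Sum p * log2(1/p) over each distinct start position.
    return sum((n / total) * math.log2(total / n) for n in counts.values())


# Hypothetical genomic coordinates for eight sequenced clones each.
discrete_starts = [1000] * 8
slippery_starts = [1000, 1003, 1007, 1012, 1015, 1021, 1026, 1030]

print(tss_entropy(discrete_starts))
print(tss_entropy(slippery_starts))
```

With eight clones per transcript, this measure ranges from 0 bits (all clones share one start) to 3 bits (every clone starts at a different base). A production measure would probably also need to account for the distance between starts, which entropy alone ignores.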

How Many TSSs Does bowl Have?

5’ RACE Plans

• Identify TSSs that are well mapped by 5’ EST data

• Test RLM-RACE production protocol on 96 well-mapped TSSs to measure experimental success rate

• Prioritize 5’ RACE experiments:

1. Transcripts with < 8 RE ESTs, using mixed embryo RNA
2. Transcripts with ESTs from other embryo-derived libraries
3. Transcripts with < 8 RH/TA ESTs
4. Transcripts with larval/pupal ESTs
5. Transcripts without ESTs. Use appropriate RNA samples.

• Develop statistical description of “slipperiness”

• Biological validation with microarrays & P elements
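The five-level prioritization above can be sketched as a simple tier assignment. This is a minimal illustration under stated assumptions: the record fields are hypothetical, "< 8" is read as "at least one but fewer than eight ESTs", and a transcript with eight or more RE or RH/TA ESTs is treated as already well mapped. None of these readings is confirmed by the slides.

```python
from dataclasses import dataclass


@dataclass
class TranscriptESTs:
    """Hypothetical per-transcript 5' EST counts, grouped by library class."""
    name: str
    re_ests: int = 0             # ESTs from the RE library
    other_embryo_ests: int = 0   # ESTs from other embryo-derived libraries
    rh_ta_ests: int = 0          # ESTs from the RH/TA libraries
    larval_pupal_ests: int = 0   # ESTs from larval/pupal libraries


def race_priority(t, threshold=8):
    """Return the priority tier (1-5), or None if the TSS looks well mapped."""
    # Assumption: enough ESTs in either library class means no RACE is needed.
    if t.re_ests >= threshold or t.rh_ta_ests >= threshold:
        return None
    if t.re_ests > 0:
        return 1   # under-sampled in RE: use mixed embryo RNA
    if t.other_embryo_ests > 0:
        return 2
    if t.rh_ta_ests > 0:
        return 3
    if t.larval_pupal_ests > 0:
        return 4
    return 5       # no ESTs at all: pick an appropriate RNA sample


batch = [
    TranscriptESTs("t1", re_ests=3),
    TranscriptESTs("t2", larval_pupal_ests=2),
    TranscriptESTs("t3"),
    TranscriptESTs("t4", re_ests=12),
]
for t in batch:
    print(t.name, race_priority(t))
```

Sorting a target list by this tier would give an experiment queue; the threshold of 8 is taken from the list above and exposed as a parameter so it can be revisited once the success rate on the 96 test TSSs is known.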

Computationally predicted conserved exons validated by cDNA screening and sequencing

I. Gene modifications

II. Identification of New Genes

cDNA and Long RT-PCR Plans

• Identify all transcripts that are well defined by cDNA sequence
- complete & spliced ORF, poly-A tail (not necessarily a defined TSS)

• Identify targets for cDNA screening (DGC goals in parentheses)
(Transcripts with a community cDNA but no BDGP cDNA)
(Transcripts with truncated ORFs)
(Alternative transcripts that encode alternative coding sequences)
1. Conserved ORFs that failed on the first SLIP attempt: choose best RNA
2. Transfrags & RACEfrags that are not captured in sequenced transcripts

• Identify targets for long RT-PCR
- targets that fail in SLIP screening on the best RNA sample
- RT-PCR is probably more sensitive than SLIP but seems limited to ~2 kb

• cDNA and RT-PCR design depends on Aim 1 & Aim 2.1 and should be an iterative process.

• Biological validation using integrated description of all data
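The routing between cDNA screening and long RT-PCR described above can be sketched as a small decision function. This is an illustration only: the function and parameter names are hypothetical, the 2,000 bp cutoff is a rough reading of "~2 kb", and the "review" outcome for longer targets is an assumption, since the slides do not say what happens to them.

```python
def is_well_defined(has_complete_spliced_orf, has_poly_a_tail):
    """Well defined by cDNA sequence: complete & spliced ORF plus poly-A tail.

    A defined TSS is deliberately not required.
    """
    return has_complete_spliced_orf and has_poly_a_tail


def next_experiment(well_defined, slip_failed_on_best_rna,
                    expected_length_bp, rt_pcr_limit_bp=2000):
    """Suggest the next validation step for one transcript model."""
    if well_defined:
        return "none"
    if not slip_failed_on_best_rna:
        # Screen (or re-screen) a cDNA library with the best RNA sample first.
        return "cDNA screen"
    if expected_length_bp <= rt_pcr_limit_bp:
        return "long RT-PCR"
    # Assumption: targets beyond the practical RT-PCR range need manual review.
    return "review"


print(next_experiment(is_well_defined(True, True), False, 1500))
print(next_experiment(is_well_defined(True, False), False, 1500))
print(next_experiment(is_well_defined(False, False), True, 1500))
print(next_experiment(is_well_defined(False, False), True, 4200))
```

Because target selection depends on Aim 1 and Aim 2.1 results, a function like this would be re-run on the full target list each time new models or screening outcomes arrive.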

An Unannotated Transfrag

A Relatively Rare Transcript

CG31036: chordotonal neurons, lateral and head sensory neurons

High Throughput Sequencing Plan

• Pyrosequence RNA samples on 454 Life Sciences device
- consider alternative platforms, e.g. Solexa

• Select 15 target tissues for analysis

• Define a transcript size range to target
- avoid redundancy with Eric Lai: < 50 bases vs 50-500 bases
- consider avoiding tRNAs

• Align transcript sequences and integrate with models

• Biological validation:
- Compare to microarray data
- Conservation in other species, including structure for ncRNAs
- Functional genomics in Aim 3
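The proposed size window (50-500 bases, leaving < 50 bases to Eric Lai) is applied at the bench by size fractionation, but the same window could also be checked computationally on the resulting reads. The sketch below assumes reads are available as (name, sequence) pairs; the function name and the example reads are hypothetical.

```python
def select_reads(reads, min_len=50, max_len=500):
    """Keep reads whose length falls inside the targeted size window.

    `reads` is an iterable of (name, sequence) pairs; both bounds are
    inclusive.
    """
    return [(name, seq) for name, seq in reads
            if min_len <= len(seq) <= max_len]


# Hypothetical reads: one miRNA-sized, two in range, one too long.
reads = [
    ("r1", "A" * 22),
    ("r2", "A" * 50),
    ("r3", "A" * 500),
    ("r4", "A" * 501),
]
kept = select_reads(reads)
print([name for name, _ in kept])
```

Avoiding tRNAs, which fall inside this size window, would need a separate step such as comparison against annotated tRNA genes; that is not shown here.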

Some Questions for Discussion

• How many genes & transcripts in Drosophila?

• How many genes with multiple transcripts? CDSs?

• Are these expressed in different cell types?

• Can we segregate them in different RNA samples to avoid mixed RACE, cDNA and RT-PCR products?

• How do we prioritize screening?

• What will we miss?

• How do we know when we’re done?

Future Directions

• Do different promoter motifs correlate with “slipperiness”, tissue, stage?

• Confidence scores associated with exons, transcripts and gene models:
- How do we measure confidence?
- How confident can we be?
- How much data do we need per gene?
