ModENCODE August 20-21, 2007 Drosophila Transcriptome: Aim 2.2.
modENCODE August 20-21, 2007
Drosophila Transcriptome: Aim 2.2
Aim 2.2 Experimental Validation of Transcript Models
1. Experimental verification of selected splice sites in transcript models (short RT-PCR)
2. Mapping transcript ends using RACE
3. Screening cDNA libraries for transcripts
4. Recovering cDNA clones using long RT-PCR
5. High-throughput sequencing of small RNAs
6. Submitting sequence data to databases
7. Reviewing the transcriptome annotation
Experiments at LBNL
Transcript Ends
TSSs: 20,000 targeted 5’ RACE experiments
poly-A: 1,000 targeted 3’ RACE experiments
Full-Length Transcript Structures
6,000 cDNA screens and full-insert sequencing
3,000 long RT-PCRs and full-insert sequencing
Small RNA Sequencing
15 runs on 454 Life Sciences device
Size fractionate < 500 nt (larger range than Eric Lai)
Mapping TSSs
• 5’ RLM-RACE is a simple, scalable method
• RLM primer replaces the 5’ CAP structure
• Gene specific primers are nested & near 5’ end
• Sequence 8 clones
• Direct sequencing is also proposed but is difficult
• We are prioritizing transcripts and tissues using our 5’ EST data
TSSs: Slippery vs Discrete
head RACE products
larval RACE products
cDNAs
Cap-Trapped 5’ ESTs Define Discrete…
…and Slippery Transcription Start Sites
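One possible way to put a number on the difference between discrete and slippery start sites is the Shannon entropy of the observed 5’ end positions for a transcript. The sketch below is an illustration only, not the statistic used in the project: the function names and the example coordinates are hypothetical, and the input is assumed to be a list of genomic coordinates, one per cap-trapped EST or RACE clone.

```python
import math
from collections import Counter


def tss_entropy(five_prime_ends):
    """Shannon entropy (bits) of the distribution of observed 5' end positions.

    0.0 means every read starts at the same base (a discrete TSS);
    larger values mean the starts are spread over more positions.
    """
    counts = Counter(five_prime_ends)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("need at least one 5' end position")
    entropy = 0.0
    for n in counts.values():
        p = n / total
        entropy -= p * math.log2(p)
    return entropy


def dominant_fraction(five_prime_ends):
    """Fraction of reads that start at the single most common position."""
    counts = Counter(five_prime_ends)
    return max(counts.values()) / sum(counts.values())


# Hypothetical coordinates, for illustration only.
discrete = [1000] * 8
slippery = [1000, 1003, 1007, 1012, 1015, 1021, 1030, 1042]

print(tss_entropy(discrete), dominant_fraction(discrete))
print(tss_entropy(slippery), dominant_fraction(slippery))
```

With only 8 clones per transcript the entropy estimate is coarse, so a measure like this is probably best used for ranking transcripts rather than as an absolute value.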
How Many TSSs Does bowl Have?
5’ RACE Plans
• Identify TSSs that are well mapped by 5’ EST data
• Test RLM-RACE production protocol on 96 well-mapped TSSs to measure experimental success rate
• Prioritize 5’ RACE experiments:
1. Transcripts with < 8 RE ESTs, using mixed embryo RNA
2. Transcripts with ESTs from other embryo-derived libraries
3. Transcripts with < 8 RH/TA ESTs
4. Transcripts with larval/pupal ESTs
5. Transcripts without ESTs. Use appropriate RNA samples.
• Develop statistical description of “slipperiness”
• Biological validation with microarrays & P elements
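The five priority tiers above could be assigned mechanically from per-library EST counts. The following is a minimal sketch under several assumptions that are not in the slides: the library group names `embryo_other` and `larval_pupal` are hypothetical placeholders, "< 8 ESTs" is read as "at least one but fewer than 8", the first matching tier wins, and transcripts with 8 or more RE or RH/TA ESTs are treated as already well mapped.

```python
from typing import Dict, Optional

EST_THRESHOLD = 8  # the "< 8 ESTs" cutoff from the plan

# Hypothetical library groupings; the real library-to-stage mapping
# is not given in the slides.
OTHER_EMBRYO_LIBS = {"embryo_other"}
LARVAL_PUPAL_LIBS = {"larval_pupal"}


def race_priority(est_counts: Dict[str, int]) -> Optional[int]:
    """Return the 5' RACE priority tier (1-5) for one transcript.

    est_counts maps a library label to the number of 5' ESTs from that
    library. Returns None when the transcript is assumed to be well
    mapped already (an assumption of this sketch).
    """
    re_count = est_counts.get("RE", 0)
    rh_ta_count = est_counts.get("RH", 0) + est_counts.get("TA", 0)

    if sum(est_counts.values()) == 0:
        return 5  # no ESTs at all
    if re_count >= EST_THRESHOLD or rh_ta_count >= EST_THRESHOLD:
        return None  # assumed well mapped by existing ESTs
    if 0 < re_count < EST_THRESHOLD:
        return 1
    if any(est_counts.get(lib, 0) > 0 for lib in OTHER_EMBRYO_LIBS):
        return 2
    if 0 < rh_ta_count < EST_THRESHOLD:
        return 3
    if any(est_counts.get(lib, 0) > 0 for lib in LARVAL_PUPAL_LIBS):
        return 4
    return None


print(race_priority({"RE": 3}))
print(race_priority({"RH": 2, "TA": 1}))
print(race_priority({}))
```

Sorting transcripts by the returned tier would give a work queue for the RACE pipeline; the choice of RNA sample for each tier still has to be made separately.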
Computationally predicted conserved exons validated by cDNA screening and sequencing
I. Gene modifications
II. Identification of New Genes
cDNA and Long RT-PCR Plans
• Identify all transcripts that are well defined by cDNA sequence
- complete & spliced ORF, poly-A tail (not necessarily a defined TSS)
• Identify targets for cDNA screening (DGC goals in parentheses)
(Transcripts with a community cDNA but no BDGP cDNA)
(Transcripts with truncated ORFs)
(Alternative transcripts that encode alternative coding sequences)
1. Conserved ORFs that failed on the first SLIP attempt: choose best RNA
2. Transfrags & RACEfrags that are not captured in sequenced transcripts
• Identify targets for long RT-PCR
- targets that fail in SLIP screening on the best RNA sample
- RT-PCR is probably more sensitive than SLIP but seems limited to ~2 kb
• cDNA and RT-PCR design depends on Aim 1 & Aim 2.1 and should be an iterative process.
• Biological validation using integrated description of all data
An Unannotated Transfrag
A Relatively Rare Transcript
CG31036: chordotonal neurons, lateral and head sensory neurons
High Throughput Sequencing Plan
• Pyrosequence RNA samples on 454 Life Sciences device
- consider alternative platforms, e.g. Solexa
• Select 15 target tissues for analysis
• Define a transcript size range to target
- avoid redundancy with Eric Lai: < 50 bases vs 50-500 bases
- consider avoiding tRNAs
• Align transcript sequences and integrate with models
• Biological validation:
- Compare to microarray data
- Conservation in other species, including structure for ncRNAs
- Functional genomics in Aim 3
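The two size ranges named in the plan (< 50 bases and 50-500 bases) can be checked against a sequencing run by tallying read lengths. This is a rough sketch: the class labels, the treatment of exactly 50 and exactly 500 bases as belonging to the middle class, and the example reads are all assumptions made for illustration.

```python
from collections import Counter


def size_class(length: int) -> str:
    """Assign a read length to one of three size classes.

    Boundary handling (50 and 500 both fall in the middle class)
    is an assumption of this sketch.
    """
    if length < 50:
        return "lt_50"
    if length <= 500:
        return "50_500"
    return "gt_500"


def tally_size_classes(reads):
    """Count how many reads fall into each size class."""
    return Counter(size_class(len(read)) for read in reads)


# Hypothetical reads, for illustration only.
reads = ["A" * 22, "C" * 75, "G" * 500, "T" * 650]
print(tally_size_classes(reads))
```

A tally like this, run per tissue, would show how much of each run falls outside the targeted fraction before alignment begins.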
Some Questions for Discussion
• How many genes & transcripts in Drosophila?
• How many genes with multiple transcripts? CDSs?
• Are these expressed in different cell types?
• Can we segregate them in different RNA samples to avoid mixed RACE, cDNA and RT-PCR products?
• How do we prioritize screening?
• What will we miss?
• How do we know when we’re done?
Future Directions
• Do different promoter motifs correlate with “slipperiness”, tissue, stage?
• Confidence scores associated with exons, transcripts and gene models:
- How do we measure confidence?
- How confident can we be?
- How much data do we need per gene?