Sequence Analysis & Gene Expression
MUPGRET workshop, Columbia, MO, June 2005 (HJ Bohnert, UIUC)
Organism selection: genome size – why – what is the benefit – politics
Decisions: mapping first, “shotgun sequencing”, BAC alignment/sequencing [BAC – bacterial artificial chromosome; also YAC (yeast)]
Genome sequence: raw sequence – confirmed sequence – gene models – verification
Verification: is the gene model transcribed? Yes/no/perhaps – “ubiquitous” gene, family specific, homolog – ortholog – paralog
Transcript profiles: when – how much [abundant] – where – transcript “variants” – inducible by condition X?
Genomics
information mining, hypotheses, experiment – insight, application, virtual life
• expression profiles
• knock-outs, RNA & RNAi
• protein localization
• structure analysis
• dynamic metabolite catalogs
• biochemical genetics
• protein interaction maps
[Figure: diagram labels and an example sequence, ATCCGAAGCGCTTGGAAAA]
Databases, Integration & Intuition
genome & transcriptome sequences … not just genes
markers & QTLs
How (much) will ‘encyclopedic’ approaches lead to better understanding?
[Figure: Columbia grown in Soy-FACE; panels: control, O3, CO2]
Field on a dish!
Arabidopsis – model plant
small, fast, prolific, mutants, lines, ecotypes, genome sequence
[Figure: chromosome map (Ch-1 to Ch-5, scale marks at 5, 10, 20, 30 Mb) showing the positions of aquaporin (AQP) genes, labels PIP1;1–PIP2;8, TIP1;1–TIP5;1, NIP1;1–NIP7;1 and SIP1;1–SIP2;1, including several pseudogenes, together with rDNA and the duplicated regions that include AQPs]
AQPs are distributed over all chromosomes – a few clusters, many duplications
Arabidopsis thaliana (AGI, 2000)
Plants in silico? Sure! And then: Plant Design from Scratch
Ecosystem – population – species – ecotype (- breeding line)
Organism – organ – tissue – cell – compartment
Nucleus – envelope & pore – nucleoplasm, nucleolus & chromosomes
Euchromatin & heterochromatin – gene islands – gene
Promoters – 5’-regulatory (untranslated = UTR) –
introns & exons – mature coding region –
3’-regulatory (UTR) regions
The Plant Genome
The Plant Genome
Controls for Gene Expression – many Switchboards
• Chromatin condensation state
• Local chromatin environment
• Transcription initiation
• Transcript elongation
• mRNA splicing
• mRNA export
• mRNA place in the cell
• RNA half-life
• Killer microRNAs
• Ribosome loading
• Protein transport/targeting
• Protein modifications
• Protein turnover
Levels of regulation that affect what we call “gene expression”
The Plant Transcriptome
Killer RNAs (there are micro-genes)
Result: no protein, i.e., the gene is essentially “silenced”
5 years ago, we did not know that such a control system existed!
microRNAs
The Plant Transcriptome
How to sample the transcriptome?
• Morphological dissection (root, leaf, flower - epidermis, guard cell, etc.)
• Cell sorting: make single cells, send them through a cell sorter (size, color, reporter gene)
• Laser ablation: micromanipulation of a laser to cut out individual cells
• Biochemical dissection (compartment isolation): chloroplasts, mitochondria, ribosomes, other membranes
Painting cells with a reporter gene – here it is GFP (Green Fluorescent Protein)
Painting tissues, then isolating desired cells
Enzymatic staining
The Plant Transcriptome
The endodermis of the root tip is highlighted in transgenic plants using pSCR::mGFP5.
Emerging lateral roots [requires plant transformation]
The Plant Transcriptome
> cDNA libraries
• “neat”
• normalized
• subtracted
> SAGE libraries
cDNA (complementary DNA): messenger RNA is converted into double-stranded DNA
“Normalization” removes mRNAs for which there are many copies in a cell, thus enriching for “rare mRNAs” (not so much sequencing to do)
Subtraction removes cDNAs which you already know (less sequencing)
Total RNA
Poly(A)+ RNA
1st strand cDNA
ds-cDNA
Size-selected double stranded cDNA (>500 bp)
Ligate to EcoRI adapters / digest with NotI
Clone adaptored cDNA into (EcoRI/NotI)-digested pBSII/SK+
Primary cDNA Library
Primary (neat) library may be used for “normalization”
Library Normalization
• primary cDNA library
• make ss-DNA out of the primary library: DNA “tracer”
• PCR inserts with standard T7 and T3 primers: DNA “driver”
• tracer/driver hybridization
• column chromatography (double strands stick)
• non-hybridized DNA from flow-through = normalized clones
cDNA Libraries
Cloning of root RNAs from root tip segments S1 – S4 (Sharp lab)
• sequenced ~18,000 clones
• found ~8,000 unique
• and ~130 novel genes
How many genes make a root?
The Plant Transcriptome
Serial Analysis of Gene Expression (SAGE)
http://www.sagenet.org/
Velculescu et al. 1995
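The slide only names SAGE, so the following is a minimal sketch of its counting idea rather than the published protocol in full: each transcript is represented by a short tag taken directly 3' of its last anchoring-enzyme site (CATG for NlaIII), and tag counts stand in for transcript abundance. The tag length of 10 bases, the function names and the sequences are assumptions made for illustration.

```python
from collections import Counter

ANCHOR = "CATG"   # NlaIII site, commonly used as the SAGE anchoring enzyme
TAG_LENGTH = 10   # assumed tag length for this sketch

def sage_tag(cdna, anchor=ANCHOR, tag_length=TAG_LENGTH):
    """Return the tag that follows the 3'-most anchor site, or None."""
    pos = cdna.rfind(anchor)
    if pos == -1:
        return None                      # no anchor site: transcript is invisible
    start = pos + len(anchor)
    tag = cdna[start:start + tag_length]
    return tag if len(tag) == tag_length else None

def count_tags(cdnas):
    """Count how often each tag occurs in a collection of cDNA sequences."""
    tags = (sage_tag(seq) for seq in cdnas)
    return Counter(tag for tag in tags if tag is not None)

# Made-up sequences: the first two share a tag, the last has no CATG site.
transcripts = [
    "GGCATGAAACCCGGGTTTAAACC",
    "TTCATGAAACCCGGGTTTGG",
    "CATGACGTACGTACGTAA",
    "AAAAAAAA",
]
counts = count_tags(transcripts)
print(counts.most_common())
```

Transcripts without an anchor site yield no tag, which is one reason tag-based and clone-based catalogs are worth combining.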
[Figure: gel with lanes 1–10 and marker (M)]
coding region (known or expected): forward primer, reverse primer
Amplicon (sequence, or clone + sequence)
results
Real-time PCR (quantitative)
RNA (DNA-free) to cDNA
use product in dilutions for amplification
Serial dilution: 1x – 1/5x – 1/25x – 1/125x
[cycle number]
Assumption: each cycle increases the amount by a factor of 2 (or 1.8)
Check by using a known amount of cloned control cDNA
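The per-cycle factor can be checked against the dilution series: if each cycle multiplies the product by E, a 5-fold dilution should delay the threshold cycle (Ct) by log(5)/log(E) cycles, and the slope of Ct against log10(amount) gives E back as 10^(-1/slope). The sketch below assumes this standard-curve reasoning; the function names and the Ct values are hypothetical.

```python
import math

def expected_ct_shift(dilution_factor, efficiency=2.0):
    """Extra cycles needed after diluting the template `dilution_factor`-fold,
    if every cycle multiplies the product by `efficiency`."""
    return math.log(dilution_factor) / math.log(efficiency)

def efficiency_from_curve(log10_amounts, ct_values):
    """Fit Ct = slope * log10(amount) + b by least squares and
    return the per-cycle factor 10 ** (-1 / slope)."""
    n = len(log10_amounts)
    mean_x = sum(log10_amounts) / n
    mean_y = sum(ct_values) / n
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(log10_amounts, ct_values))
    sxx = sum((x - mean_x) ** 2 for x in log10_amounts)
    slope = sxy / sxx
    return 10 ** (-1.0 / slope)

# The dilution series from the slide, with idealized (made-up) Ct values
# generated under perfect doubling.
dilutions = [1, 1 / 5, 1 / 25, 1 / 125]
shift = expected_ct_shift(5, 2.0)
ideal_cts = [20.0 + i * shift for i in range(len(dilutions))]
estimated = efficiency_from_curve([math.log10(d) for d in dilutions], ideal_cts)

print(round(expected_ct_shift(5, 2.0), 2))   # cycles per 5-fold step at factor 2
print(round(expected_ct_shift(5, 1.8), 2))   # cycles per 5-fold step at factor 1.8
print(round(estimated, 3))                   # factor recovered from the curve
```

With real data, a recovered factor well below 2 suggests the doubling assumption does not hold for that primer pair.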
Melting curves
[single products]
Two amplicons are shown
Each shows a single melting curve
Single genes have been amplified here
Melting curves
[multiple products]
More than one gene has been amplified here
Homologous genes
[identity – similarity – divergence]
orthologous – paralogous
relationships
Quantitative PCR in 384-well plates (96 primer pairs, 3 repeats each)
Taking SAGE & cDNA sequences together: corn roots “express” 20-23,000 genes (i.e., mRNA is made)
The entire corn genome is expected to include ~50,000 genes
The Plant Transcriptome
Substrates for High Throughput Arrays
• Nylon membrane: single label, 33P
• Glass slides: dual label, Cy3, Cy5
• GeneChip: single label, biotin-streptavidin
TeleChem ChipMaker2 Pins
• Pin pick-up volume: 100-250 nl
• Spot diameter: 75-200 µm
• Spot volume: 0.2-1.0 nl
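The pin figures imply a rough upper bound on how many spots one pin load can print. The sketch below simply divides pick-up volume by spot volume; it assumes the whole load is deposited and ignores evaporation and liquid left on the pin, so real numbers will be lower. The function name is hypothetical.

```python
def spots_per_load(pickup_nl, spot_nl):
    """Upper bound on spots from one pin load, computed in whole picoliters
    to avoid floating-point surprises with values such as 0.2 nl."""
    pickup_pl = round(pickup_nl * 1000)
    spot_pl = round(spot_nl * 1000)
    return pickup_pl // spot_pl

fewest = spots_per_load(100, 1.0)   # smallest load, largest spots
most = spots_per_load(250, 0.2)     # largest load, smallest spots
print(fewest, most)
```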
Creating cDNA Arrays
• cDNA cloned into vector and transformed to create cDNA library
• Clones sequenced, unique set chosen and reracked
• Unique set of clones (384-well microtiter plate; Q-Pix)
• PCR on Tecan workstation
• Slides printed on Cartesian Arrayer
• Final product
NSF Soybean Functional Genomics, Steve Clough / Vodkin Lab
Printing Arrays on 50 slides
Slide Chemistry
Glass coatings [Figure: surface structures drawn on the Si–O–Si glass lattice]
• Silylated (aldehyde)
• Poly-L-lysine (amine, NH3+)
• Silanated
We use SuperAmine and SuperAldehyde from TeleChem (arrayit.com)
GSI Lumonics
Placenta vs. Brain – 3800 Cattle Placenta Array (Cy3, Cy5)
GenePix Image Analysis Software
Troubleshooting
The Good
The Bad
The Ugly
Post-Print Processing
• Printed slide
• Rehydrate spots (hot water)
• Snap dry
• Fix DNA to coating (UV light)
• Chemically block background
• Denature to single strands
• Hybridize & scan
Cells from condition A → mRNA → cDNA, label Dye 1
Cells from condition B → mRNA → cDNA, label Dye 2
Mix
Ratio of expression of genes from two sources: equal – over – under
ScanArray 3000 Fluorescent Scanner
Overlay Images
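The equal/over/under call for each spot comes from the ratio of the two dye intensities, usually handled as a log2 ratio so that over- and under-expression are symmetric around zero. The sketch below assumes a 2-fold cutoff, which is a common convention and not something the slide states; function names and intensities are made up.

```python
import math

def log2_ratio(dye1, dye2):
    """log2 of the Dye 2 signal relative to the Dye 1 signal for one spot."""
    return math.log2(dye2 / dye1)

def classify(dye1, dye2, fold=2.0):
    """Call a spot 'over', 'under' or 'equal' using an assumed fold cutoff."""
    ratio = log2_ratio(dye1, dye2)
    cutoff = math.log2(fold)
    if ratio >= cutoff:
        return "over"
    if ratio <= -cutoff:
        return "under"
    return "equal"

# Made-up background-corrected intensities: (Dye 1, Dye 2)
spots = {"gene1": (1000, 4000), "gene2": (4000, 1000), "gene3": (1000, 1100)}
calls = {name: classify(d1, d2) for name, (d1, d2) in spots.items()}
print(calls)
```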
Reverse Labeling
Slide 1: Cy3 over-expressed
Slide 2: Cy5 over-expressed
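Reverse labeling lets a dye effect be separated from a real expression difference. A minimal sketch, assuming the dye bias for a gene is multiplicative and the same on both slides: with sample A in Cy3 and B in Cy5 on slide 1 and the labels swapped on slide 2, half the difference of the two log ratios estimates log2(B/A) and half their sum estimates the dye bias. The function names and intensities are hypothetical.

```python
import math

def log2_ratio(cy3, cy5):
    """log2(Cy5 / Cy3) for one spot."""
    return math.log2(cy5 / cy3)

def dye_swap_estimate(slide1, slide2):
    """slide1 = (cy3, cy5) with A in Cy3 and B in Cy5; slide2 has the dyes swapped.
    Returns (log2 B/A with the dye bias cancelled, log2 of the dye bias)."""
    m1 = log2_ratio(*slide1)
    m2 = log2_ratio(*slide2)
    return (m1 - m2) / 2, (m1 + m2) / 2

# Made-up spot: B is 4x higher than A, and Cy5 reads 1.5x too bright.
slide1 = (1000, 6000)   # A in Cy3 = 1000, B in Cy5 = 4000 * 1.5
slide2 = (4000, 1500)   # B in Cy3 = 4000, A in Cy5 = 1000 * 1.5
expression, bias = dye_swap_estimate(slide1, slide2)
print(round(expression, 3), round(bias, 3))
```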
Universal vs. Universal (control vs. control)
Problem area at low intensity readings
Lung vs. Control
Cholesterol Biosynthesis
Cell Cycle
Immediate Early Response
Signaling and Angiogenesis
Wound Healing and Tissue Remodeling
Clustered display of data from time course of serum stimulation of primary human fibroblasts.
Eisen et al. Proc. Natl. Acad. Sci. USA 95 (1998) pg 14865
Hierarchical Clustering: 14 Tissues, 7653 Genes
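Clustered displays of this kind group genes whose expression profiles rise and fall together. The sketch below is a small pure-Python illustration of average-linkage clustering with 1 minus the Pearson correlation as the distance; it is not the software behind the figures, and the four profiles are invented. It assumes no profile is constant, since the correlation is undefined in that case.

```python
import math

def pearson(a, b):
    """Pearson correlation of two equally long profiles."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    sd_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    sd_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    return cov / (sd_a * sd_b)

def average_linkage(profiles):
    """Merge the two closest clusters until one remains.
    Returns the merges as (members_a, members_b, average_distance)."""
    names = list(profiles)
    dist = {frozenset((a, b)): 1 - pearson(profiles[a], profiles[b])
            for i, a in enumerate(names) for b in names[i + 1:]}
    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pairs = [dist[frozenset((a, b))]
                         for a in clusters[i] for b in clusters[j]]
                avg = sum(pairs) / len(pairs)
                if best is None or avg < best[0]:
                    best = (avg, i, j)
        avg, i, j = best
        merges.append((clusters[i], clusters[j], avg))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges

# Invented log-ratio time courses: A and B rise, C and D fall.
profiles = {
    "geneA": [0.1, 1.0, 2.0, 3.0],
    "geneB": [0.0, 1.1, 2.1, 2.9],
    "geneC": [3.0, 2.0, 1.0, 0.2],
    "geneD": [2.9, 2.2, 0.9, 0.0],
}
merges = average_linkage(profiles)
for left, right, distance in merges:
    print(left, right, round(distance, 3))
```

The pairwise loop is quadratic per merge, which is fine for a handful of genes; thousands of genes call for a dedicated clustering library.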
Differences in Technology
Affymetrix GeneChips
• One sample, one chip
• Single color scans
• Labeling by incorporating biotin into cRNA, not Cy3 or Cy5 dyes
• Oligonucleotides instead of full-length cDNAs
• Higher density arrays
– Feature sizes down to 18 µm instead of ~100 µm
– Non-contact creation of arrays
Affy Technology Overview
• Photolithography and combinatorial chemistry
– Technology from microchip industry: “GeneChip”
– Coat slides
– “Mask” to apply light to only desired features, de-protects feature
Technology Overview (cont.)
• Apply required nucleotide base to array
• Apply new mask to de-protect different features
• Stack nucleotides on top of one another
• Repeat with bases and masks until 25-mer oligonucleotides are built directly onto array
[Figure: light-directed synthesis. Light passes through a mask onto the substrate and deprotects selected features (O → HO); base T couples to the deprotected features; a new mask and base C follow; REPEAT until each feature carries its full oligonucleotide, e.g. CATATAGCTGTTCCG]
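The mask-and-base cycle can be mimicked in a few lines. In this sketch each step offers one base, and the "mask" exposes only the features whose next required base is that one; cycling through A, C, G, T builds every probe in at most four steps per probe position, so 25-mers need at most 100 steps. The function name and the fixed base order are assumptions, only the first probe comes from the slide, and the strings record order of addition without modeling the chemistry.

```python
def synthesize(probes, base_order="ACGT"):
    """Simulate light-directed synthesis of all probes in parallel.
    Returns the built sequences and the list of (base, mask) steps used."""
    for probe in probes:
        if set(probe) - set(base_order):
            raise ValueError("probe contains a base outside " + base_order)
    built = ["" for _ in probes]
    steps = []
    while any(len(b) < len(p) for b, p in zip(built, probes)):
        for base in base_order:
            # Mask: expose a feature only if its next required base is `base`.
            mask = [len(b) < len(p) and p[len(b)] == base
                    for b, p in zip(built, probes)]
            if not any(mask):
                continue                  # no feature needs this base right now
            built = [b + base if exposed else b
                     for b, exposed in zip(built, mask)]
            steps.append((base, mask))
    return built, steps

probes = [
    "CATATAGCTGTTCCG",   # sequence shown on the slide
    "TTCCGGAACATATAG",   # made-up second feature
    "ACGTACGTACGTACG",   # made-up third feature
]
built, steps = synthesize(probes)
print(len(steps), "mask/base steps for", len(probes), "features")
```

The number of steps depends on the probe length, not on how many features sit on the array, which is why this approach scales to very dense arrays.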
Technology Final Steps
• Silicon “wafers” of 90 arrays are cut
• Glass substrate is then added to plastic cartridge for:
– Safe handling
– Easy storage
– Easy hybridization
– Easy scanning
• Easy, convenient
• Expensive (very much so)
• No confirmation of quality
• Erroneous data when low intensity
• Problems with SNPs*
*not with 70-mer oligo glass slides
Questions?
Give me a call or send a message
217-265-5475
http://www.life.uiuc.edu/bohnert/
Remember:
YOU CAN ALWAYS FIND EVERYTHING ON GOOGLE!
(though not these slides)