1
Expression Analysis Platforms
Friday's Class
4:00-5:00
140 SH
Work in the Laboratory of Signal Processing at EESC/USP revolves around modeling biological systems. Our current lines of research are diagnosing speech pathology, ultrasound signal processing, and bioinformatics (particularly phylogeny). All of these lines involve clustering algorithms.
The work on clustering seeks to determine the natural number of groups and to validate the clustering algorithms. Several techniques have been applied to genomic databases, among them resampling, analysis of missing data, and the use of assumptions about a priori information. We are now focusing on probabilistic models and the validation of structural models. The studies use data generated by electrophoresis of species with agricultural applications, provided by Embrapa (www.embrapa.br). We are currently working with 3 doctoral students on linking phenotypic and genotypic information for varieties of corn.
2
3
Other Courses
• Intro to Informatics (in CS)
• Intro to Bioinformatics (51:121)
– provides a first exposure to some available computational techniques and resources
– however, the emphasis is on utilization
• In this course (51:123) -- I try to emphasize tools and techniques that you would use to go about developing your own computational resources (software, systems, tools, etc).
• Computational Methods in Molecular Biology (51:122 -- Casavant, Bair)
– advanced topics
Bioinformatics Certificate
• Offered by the Graduate College (MS/PhD)
• http://informatics.grad.uiowa.edu/bioinformatics/
4
Final Exam
• 25 questions – mostly short answer/T/F
– 1 paper
– 1 genome sequencing
– 2 Ensembl
– 1 references
– 1 array
– 2 programming
– 1 pattern matching
– 2 expression
– 3 other
– 3 p-genes
– 1 Blast/Blat
– 5 Hash questions
– 1 N-W
– 1 sequencing
5
6
Outline
• What is expression
• Platforms
– ChIP on Chip
– Gene expression
– Exon arrays
– Tiling arrays
– SNP chips
• Applied S/W for Expression "Library" -- OTDB
• Alternative Splicing
• Association Study Example -- AMD
• How to analyze
7
What is expression?
Gene expression
mRNA - transcription
- microRNAs
Protein - translation
8
A Typical Experiment
Case vs. Control
Ex. Retina cells +/- 7-keto-cholesterol
3x redundancy
Look for differentially expressed genes
t-test, ANOVA
fold-change
Result --> set of genes
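The case vs. control comparison above can be sketched in a few lines. This is a minimal illustration, not the analysis used in class: the gene names, intensities, and both cutoffs are hypothetical, and it reports Welch's t statistic only (a p-value would need the t distribution, e.g. `scipy.stats.ttest_ind` with `equal_var=False`).

```python
import math
import statistics


def log2_fold_change(case, control):
    """log2 of the ratio of mean case intensity to mean control intensity."""
    return math.log2(statistics.mean(case) / statistics.mean(control))


def welch_t(case, control):
    """Welch's t statistic for two independent samples (unequal variances)."""
    se = math.sqrt(
        statistics.variance(case) / len(case)
        + statistics.variance(control) / len(control)
    )
    return (statistics.mean(case) - statistics.mean(control)) / se


# Hypothetical intensities: three replicates per condition (3x redundancy).
expression = {
    "GENE_A": {"case": [820.0, 790.0, 805.0], "control": [400.0, 410.0, 395.0]},
    "GENE_B": {"case": [150.0, 160.0, 155.0], "control": [152.0, 158.0, 149.0]},
}


def differential_genes(data, min_abs_log2fc=1.0, min_abs_t=4.0):
    """Return genes passing both the fold-change and t-statistic cutoffs.

    Both cutoffs are illustrative assumptions, not recommended values.
    """
    hits = []
    for gene, groups in data.items():
        lfc = log2_fold_change(groups["case"], groups["control"])
        t = welch_t(groups["case"], groups["control"])
        if abs(lfc) >= min_abs_log2fc and abs(t) >= min_abs_t:
            hits.append(gene)
    return hits


result = differential_genes(expression)  # the "set of genes"
```

Requiring both a fold-change and a statistical cutoff is one common choice; with only three replicates per group, either criterion alone tends to be noisy.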
9
But there’s so much more…
1. Differential expression of genes
2. Time-courses
3. Alternative splicing
4. ChIP-on-chip
5. High-density SNP genotyping
6. Using chips to select genomic fragments for re-sequencing
7. Additional annotation/analyses
10
Definition of Microarray
• What is a gene expression array?
– “A microarray is a small analytical device that allows genomic exploration with speed and precision unprecedented in the history of biology” - Schena 2003
11
Source: www.bioteach.ubc.ca/MolecularBiology/microarray/graphics by Jiang Long
12
13
14
Physical Spotting
15
http://www.nimblegen.com/technology/manufacture.html
16
Advantages to Arrays
• A single array permits monitoring thousands of genes in parallel
• Provides information at genomic scale
– Reveals gene function and gene interactions
– Identifies relationship between genetic and biochemical pathways
– Identifies traits associated with multigenic origins
• Caveat - further modifications may occur
– Post-transcriptional
– Translational
– Protein
17
Microarray Research
• Ubiquitous in biology & agriculture research
• Interdisciplinary field
– Biology
– Computer Science
– Statistics
• Experiments require teams of individuals
• Analysis presents many obstacles that need to be overcome
18
Statement of the Problem
• Obstacles impeding analysis process
– Analysis is complex with multiple steps
– Requires multiple discipline expertise
    • Bio - understand underlying biology
    • Stats - normalization & statistical measures
    • Comp Sci - programmatic solutions, computation resources
• Necessity for centralized analysis system
– Robust
– Extensible
– Portable
19
Platforms
Gene Expression Arrays
Exon Arrays
Tiling Arrays
SNP chips
Vendors: Affymetrix, NimbleGen, Agilent, others
20
An Aside
• State-of-the-art sequencing technology + microarray == ?
• 454-, pyro-, pyrophosphate sequencing
21
Margulies et al., Nature, 2005
22
23
24
GS 20 System Brochure 454/Roche
25
26
27
Gene Chip + Sequencing
454, pyro- or pyrophosphate sequencing
Genome sequencing in microfabricated high-density picolitre reactors, Margulies et al., Nature, 2005
Nature 2007
Sequence Capture
28
http://www.nimblegen.com/products/seqcap/index.html
29
Gene Expression Arrays
Traditional method, typically provides one or more probes that interrogate the expression level of a gene.
U133Plus2 - 54,000 probes
30
Exon Arrays
Target each exon of a gene individually
  1,400,000 probe sets
Different levels of confidence/quality
  300,000 exons from full-length mRNAs
  880,000+ exons from gene predictions
  500,000+ “control” exons
Available for human, mouse and rat
31
Tiling Array
http://www.affymetrix.com/products/arrays/specific/human_tiling.affx
32
Tiling Arrays
Covering the entire genome with probes
  Probes every 35 bp across the genome
  7-14 chips (depending on the application)
… or can focus on a specific area
  10,000 bp proximal promoter of every gene
  1 chip
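The tiling arithmetic can be worked through directly. A rough sketch under stated assumptions: the 35 bp spacing and 10,000 bp promoter window come from the slide, while the 25 bp probe length and the ~3 billion bp genome size are assumptions added for illustration; repeat masking and probe selection are ignored.

```python
def tile_starts(region_start, region_end, spacing=35, probe_len=25):
    """Start coordinates of probes tiled across [region_start, region_end).

    A probe is kept only if it fits entirely inside the region.
    probe_len=25 is an assumed probe length, not a figure from the slide.
    """
    last_start = region_end - probe_len
    return list(range(region_start, last_start + 1, spacing))


# One 10,000 bp proximal promoter window, in coordinates relative to the window.
promoter_probes = tile_starts(0, 10_000)
probes_per_promoter = len(promoter_probes)  # 286 probes: starts 0, 35, ..., 9975

# Whole-genome estimate, assuming roughly 3 billion bp of sequence.
ASSUMED_GENOME_BP = 3_000_000_000
whole_genome_probes = ASSUMED_GENOME_BP // 35

# Spread over the 7-14 chips quoted on the slide.
probes_per_chip_range = (whole_genome_probes // 14, whole_genome_probes // 7)
```

Under these assumptions the whole genome needs on the order of 86 million probes, which is why it spans several chips, whereas a promoter-focused design needs only a few hundred probes per gene.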
33
Tiling Arrays - Applications
Applications
  expression
  protein-DNA interaction
  DNA modifications
    methylation
    acetylation
Anywhere in the genome!
34
• What can you use tiling arrays for?
35
ENCODE Project
Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project
Nature, V 447, June 14, 2007
36
Transcript Connectivity
• protein-coding loci are more transcriptionally complex than previously thought
• 19% of pseudogenes transcribed
• genes had, on average, 10 different transcriptional start sites
37
ChIP on Chip
http://www.chem.agilent.com/Scripts/generic.asp?lpage=37461&indcol=N&prodcol=N
38
SNP chips
SNPs - single nucleotide polymorphisms
Affymetrix 6.0 Array
• 906,600 SNPs
• 946,000 (non-polymorphic) "monomorphic" SNPs
Applications:
  Linkage
  Association Studies
  Changes in Copy Number (deletions/duplications)
39
Association
Population:          Unaffected       Affected
Allele frequencies:  A1     A2        A1     A2
SNP 1                0.74   0.26      0.75   0.25
SNP 2                0.70   0.30      0.10   0.90
Power increases with more samples, and more SNPs
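One way to make the table concrete is an allelic chi-square test per SNP. A minimal sketch, assuming 200 individuals (400 chromosomes) in each group; the slide gives only frequencies, so the sample size is hypothetical. The p-value uses the identity that, for 1 degree of freedom, the chi-square tail probability equals erfc(sqrt(x/2)).

```python
import math


def allelic_chi_square(unaffected, affected):
    """Pearson chi-square (1 df) on a 2x2 table of allele counts.

    Each argument is a pair (A1 count, A2 count). Returns (chi2, p_value).
    """
    a, b = unaffected
    c, d = affected
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


def counts(freq_a1, n_chromosomes):
    """Convert an A1 allele frequency into (A1, A2) allele counts."""
    a1 = round(freq_a1 * n_chromosomes)
    return a1, n_chromosomes - a1


N_CHROM = 400  # assumed: 200 diploid individuals per group

snp1 = allelic_chi_square(counts(0.74, N_CHROM), counts(0.75, N_CHROM))
snp2 = allelic_chi_square(counts(0.70, N_CHROM), counts(0.10, N_CHROM))
```

With these assumed counts, SNP 1 shows essentially no difference between groups, while SNP 2 gives a chi-square of about 300. Rerunning with a larger `N_CHROM` shows how the same frequency difference yields a larger statistic, which is the sense in which power increases with more samples.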
40
41
42
Alternative Splicing in the Eye
GOAL: To identify the splicing variants expressed in retina, retinal pigment epithelia, and optic nerve head. (3x biological replicates)
Motivation: To guide/focus screening efforts to those exons that are expressed.
In collaboration with Rob Mullins
43
Show Probes
44
Ocular Tissue Expression Database
Survey of 10 ocular tissues
GOAL: catalog which genes are expressed across tissues of specific interest in ocular
In collaboration with Abe Clark at Alcon
45
Ocular Tissue Expression Database
46
Ocular Tissue Expression Database
47
End
48
AMD Association Study
GOAL: Identify the major susceptibility regions for age-related macular degeneration.
Several regions have been reported
– How many susceptibility regions are there?
Genotyping 400 AMD patients and controls with high-density SNP chips
400,000,000 genotypes
49
50
How to analyze the data?
First step is acquiring the data!
Normalization
Analysis
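The slide does not say which normalization is meant, so the following is only one simple possibility: median scaling, which rescales each chip so that all chips share the same median intensity. The chip names and intensities are hypothetical.

```python
import statistics


def median_scale(arrays):
    """Scale each array so its median equals the mean of all array medians.

    `arrays` maps a chip name to a list of raw probe intensities.
    """
    medians = {name: statistics.median(vals) for name, vals in arrays.items()}
    target = statistics.mean(list(medians.values()))
    return {
        name: [v * target / medians[name] for v in vals]
        for name, vals in arrays.items()
    }


# Hypothetical raw intensities: chip2 was scanned roughly twice as bright.
raw = {
    "chip1": [100.0, 200.0, 300.0],
    "chip2": [200.0, 400.0, 600.0],
}
normalized = median_scale(raw)
```

After scaling, both chips have a median of 300, so a remaining difference between chips for a given probe is less likely to be a scanner or labeling artifact. Other schemes, such as quantile normalization, make stronger assumptions about the intensity distribution.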
51
Analysis
Differential Expression
  t-test
  ANOVA
  Fold-change
Time series (all of the above)
  Correlation of expression
  Early response vs. late response
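For the time-series items, correlation of expression and early vs. late response can be sketched as follows. The gene names and time courses are hypothetical, and using the time point of peak expression to separate early from late responders is an illustrative choice rather than a method from the lecture.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)


def peak_index(profile):
    """Index of the time point with the highest expression."""
    return max(range(len(profile)), key=lambda i: profile[i])


# Hypothetical expression levels at five consecutive time points.
time_course = {
    "EARLY_1": [1.0, 4.0, 3.0, 2.0, 1.0],
    "EARLY_2": [1.1, 3.8, 3.1, 2.2, 0.9],
    "LATE_1": [1.0, 1.0, 2.0, 3.0, 4.0],
}

r_early_pair = pearson(time_course["EARLY_1"], time_course["EARLY_2"])
r_early_late = pearson(time_course["EARLY_1"], time_course["LATE_1"])
peaks = {gene: peak_index(profile) for gene, profile in time_course.items()}
```

The two early genes correlate strongly with each other and negatively with the late gene, and the peak positions (time point 1 vs. time point 4) give the same grouping.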
52
Analysis
DAVID - Database for Annotation, Visualization and Integrated Discovery
http://david.abcc.ncifcrf.gov/
– Look for conservation of a particular function or annotation in the set of differentially expressed genes.
GSEA - Gene Set Enrichment Analysis
http://www.broad.mit.edu/gsea/software/software_index.html
– Look for annotations that are differentially expressed (as a group).
Ex. Tour de France
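The idea of an annotation being over-represented in a gene list can be illustrated with a plain hypergeometric test. This is a simplified sketch, not the algorithm of either tool: DAVID reports a modified Fisher exact score, and GSEA uses a running-sum statistic over a ranked gene list. All counts below are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(n_universe, n_annotated, n_selected, n_overlap):
    """P(overlap >= n_overlap) when n_selected genes are drawn at random.

    n_universe:  genes measured on the array
    n_annotated: genes in the universe carrying the annotation
    n_selected:  genes in the differentially expressed set
    n_overlap:   selected genes that carry the annotation
    """
    upper = min(n_annotated, n_selected)
    total = comb(n_universe, n_selected)
    tail = sum(
        comb(n_annotated, k) * comb(n_universe - n_annotated, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Hypothetical: 1000 genes on the array, 50 carry the annotation,
# 20 are differentially expressed, and 5 of those 20 carry it.
p_enriched = hypergeom_enrichment_p(1000, 50, 20, 5)
```

Here only about one annotated gene would be expected in a random set of 20, so seeing five gives a small probability. In practice many annotations are tested at once, so some multiple-testing correction would be needed.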