Gene expression introduction
Analysis of Gene Expression: An Overview
Setia Pramana
Outline
• Biological background
  – Central Dogma
  – DNA
  – Genes
• Genomics
• Microarrays
• Gene expression data analysis pipeline
• What’s next?
Gene expression analysis
Central Dogma
http://compbio.pbworks.com
DeoxyriboNucleic Acid (DNA)
• DNA is the organic molecule that carries the information used by a cell to build the proteins that carry out most of the biological processes in a cell.
• Double helix
• Base pairs: G ≡ C, A = T
• Example sequence: ATGCTGATCGATGCAGAATCGATC
• The human genome is about 3 × 10^9 base pairs (bp) long
• Any two humans share about 99.9% of their DNA
• Humans and chimpanzees share about 99% of their DNA
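The pairing rule above (G with C, A with T) can be sketched in a few lines of Python. The function names `complement` and `reverse_complement` are illustrative choices, not part of the slides; the input is the example sequence from the slide.

```python
# Watson-Crick pairing from the slide: G pairs with C, A pairs with T.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(seq: str) -> str:
    """Return the base-by-base complement of a DNA sequence."""
    return "".join(PAIR[base] for base in seq)


def reverse_complement(seq: str) -> str:
    """Return the complementary strand, read in the opposite direction."""
    return complement(seq)[::-1]


example = "ATGCTGATCGATGCAGAATCGATC"  # example sequence from the slide
print(complement(example))
print(reverse_complement(example))
```

Taking the reverse complement twice gives back the original sequence, which is a quick sanity check on the pairing table.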
wikipedia
Gene
• The full DNA sequence of an organism is called its genome
• Gene: a DNA segment that specifies the sequence of a protein.
• Length: 1,000-3,000 bases
• Approximately 20,000-25,000 genes in the human genome
http://www.dna-sequencing-service.com/dna-sequencing/gene-dna/attachment/gene-dna/
Genetic Code
• The nucleotide sequence of an mRNA is translated into the amino acid sequence of the corresponding protein.
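As a minimal sketch of this translation step, the snippet below builds the standard genetic code table and reads a sequence three bases at a time. The names `CODON_TABLE` and `translate` are illustrative; the sketch assumes the standard code, a reading frame that starts at the first base, and it stops at the first stop codon.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(sequence: str) -> str:
    """Translate a coding sequence (DNA or mRNA letters) into one-letter amino acids."""
    dna = sequence.upper().replace("U", "T")  # accept mRNA (U) as well as DNA (T)
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":  # stop codon ends translation
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("ATGCTGATCGATGCAGAATCGATC"))
```

Applied to the example DNA sequence from the earlier slide, the eight codons ATG CTG ATC GAT GCA GAA TCG ATC are read as eight amino acids.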
http://www.cs.tau.ac.il/~rshamir/
Genomics
• Genomics is the study of all the genes of a cell or tissue, at:
  – the DNA level (genotype), e.g., GWAS, SNP, CNV, etc.
  – the mRNA level (transcriptomics): gene expression
  – the protein level (proteomics)
• Functional genomics: the study of the functionality of specific genes, their relations to diseases, their associated proteins, and their participation in biological processes.
Gene Expression
• Different tissues in the same human may express different genes, according to their role in the human body.
• The same cell may express different genes under different circumstances (stress, nutrition, etc.).
• Cells express different genes during a lifetime (for instance, embryonic gene expression differs from adult gene expression).
• Technologies for measuring mRNA assume:
  – The level of mRNA in the cell is an indication of the protein level in the cell, since most regulation occurs at the transcription step rather than at the translation step.
  – Genes are expressed only when needed.
Microarrays
Microarray Technologies
• Two types of microarray technologies:
  – Single channel
  – Dual channel
• Platforms:
  – Affymetrix
  – Illumina
  – Agilent
Microarrays Applications
• Gene expression profiling (our focus)
• SNP arrays for studying single nucleotide polymorphisms (SNP) and copy number variations (CNV) such as deletions or insertions
• Etc.:
  – ChIP-on-chip for investigating protein binding site occupancy
  – Exon arrays to search for alternative splicing events
  – Tiling arrays for identifying novel transcripts that are either coding or non-coding
Microarrays Applications: MammaPrint
• The MammaPrint test can determine the likelihood of breast cancer returning within 10 years after treatment.
• First FDA-approved molecular test that is based on microarray technology.
• Predicts whether existing cancer will metastasize.
• Investigates the patterns and behavior of large numbers of genes.
• The recurrence of cancer is partly dependent on the activation and suppression of certain genes located in the tumor.
• MammaPrint measures the activity of those genes and uses it to predict a patient's odds of the cancer spreading.
The Pipeline
• Experiment design → Lab work → Image processing
• → Background correction
• → Normalization
• → Signal summarization (GCRMA, FARMS) (for the Affymetrix platform)
• → Data analysis:
  – Differentially expressed genes
  – Clustering
  – Classification
  – Etc.
• → Network / pathway analysis (GSEA, etc.)
• → Biological interpretation
Image Processing
http://isda.ncsa.uiuc.edu/Microarrays/
Log2 Intensity
• Response: log2 intensity. Why?
• Statistics: Log-transforming the data makes the intensity distribution more symmetric and bell-shaped, i.e., closer to a normal distribution.
• Biology: The biological processes in whole individuals presumably act in a multiplicative way. Log-transformation turns these multiplicative effects into additive ones, so that a doubling and a halving of expression become changes of equal size (+1 and -1 on the log2 scale).
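A small numeric illustration of the multiplicative argument, using made-up intensities (100 as a reference, 200 as a doubling, 50 as a halving). The helper name `log2_fold_change` is an illustrative choice.

```python
import math


def log2_fold_change(value: float, reference: float) -> float:
    """Difference of log2 intensities, equal to log2 of the ratio."""
    return math.log2(value) - math.log2(reference)


reference = 100.0                # made-up baseline intensity
doubled, halved = 200.0, 50.0    # twofold up and twofold down

# On the raw scale the two changes look very different in size.
print(doubled - reference, halved - reference)

# On the log2 scale they are symmetric around zero.
print(log2_fold_change(doubled, reference), log2_fold_change(halved, reference))
```

On the raw scale the doubling is a change of +100 and the halving a change of -50; on the log2 scale both have magnitude 1.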
Normalization
• A process to remove systematic errors, which can cause considerable biases.
• Systematic errors are due to:
  – Different incorporation efficiencies of dyes.
  – Different amounts of mRNA in the tested samples, causing different expression levels.
  – Differences in experimenter or protocol (if data were gathered in different labs).
  – Different scanning parameters.
  – Differences between chips created in different production batches.
• Example: quantile normalization
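Quantile normalization, named above as an example, forces every array to share the same intensity distribution. The following is a minimal sketch in plain Python, assuming each array is a list of intensities for the same probes in the same order; the function name and the toy numbers are illustrative, and ties are simply broken by position rather than averaged as production implementations usually do.

```python
def quantile_normalize(arrays):
    """Give every array the same distribution: the rank-wise mean across arrays."""
    n_probes = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    # Reference distribution: mean of the r-th smallest value over all arrays.
    reference = [
        sum(s[rank] for s in sorted_arrays) / len(arrays)
        for rank in range(n_probes)
    ]
    normalized = []
    for a in arrays:
        # Probe indices ordered from lowest to highest intensity on this array.
        order = sorted(range(n_probes), key=lambda i: a[i])
        out = [0.0] * n_probes
        for rank, probe in enumerate(order):
            out[probe] = reference[rank]  # replace value by the reference quantile
        normalized.append(out)
    return normalized


# Two made-up arrays measured on very different intensity scales.
chip_a = [5.0, 2.0, 3.0, 4.0]
chip_b = [40.0, 10.0, 30.0, 20.0]
for row in quantile_normalize([chip_a, chip_b]):
    print(row)
```

After normalization both arrays contain the same set of values, and each probe keeps its rank within its own array.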
Microarrays, Data structure
http://www.ebi.ac.uk
Microarrays, Applications
• Identify disease-related genes
• Classification, for example MammaPrint
• Cluster genes
• Cluster the samples (disease stages, tissues): class discovery
• Cluster genes and samples
• Pharmacogenomics:
  – Personalized medicine: individualize therapies
  – Target-based medicine: more effective drugs with fewer side effects
Data Analysis Challenges
• The curse of high dimensionality: an obstacle to solving classification and clustering problems
• The multiple testing problem: an increased number of false positive results because the same kind of hypothesis is tested many times (once per gene).
• Multiple testing correction:
  – FWER: Bonferroni, Holm
  – FDR: BH, BY
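Two of the corrections listed above, Bonferroni (FWER) and Benjamini-Hochberg (BH, FDR), are short enough to sketch directly. The function names and the four p-values are made up for illustration; in practice one p-value per gene would be passed in.

```python
def bonferroni(pvalues):
    """FWER control: multiply each p-value by the number of tests, cap at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]


def benjamini_hochberg(pvalues):
    """FDR control: BH adjusted p-values, returned in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


pvalues = [0.01, 0.04, 0.03, 0.005]  # made-up p-values for four genes
print(bonferroni(pvalues))
print(benjamini_hochberg(pvalues))
```

With these inputs Bonferroni leaves two genes below 0.05, while BH leaves all four below 0.05, which reflects that FDR control is less conservative than FWER control.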
Identification of Differential Genes
• Discover genes with different expression in two or more different tissues/conditions.
• Fold change
• t-type tests:
  – t-test
  – Modified t-tests: Significance Analysis of Microarrays (SAM), LIMMA moderated t
• Linear Models for Microarray Data (LIMMA)
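For a single gene, the fold change and an ordinary t-type statistic can be computed as below. This is a sketch of the plain (Welch, unequal-variance) t statistic only, not of SAM or LIMMA, which modify the variance estimate; the function names and the log2 intensities are made up for illustration.

```python
import math
from statistics import mean, variance


def log2_fold_change(control, treated):
    """Difference of group means; inputs are already log2 intensities."""
    return mean(treated) - mean(control)


def welch_t(control, treated):
    """Welch t statistic: mean difference divided by its standard error."""
    standard_error = math.sqrt(
        variance(control) / len(control) + variance(treated) / len(treated)
    )
    return (mean(treated) - mean(control)) / standard_error


# Made-up log2 intensities of one gene in three control and three treated samples.
control = [7.0, 7.2, 6.8]
treated = [9.0, 9.1, 8.9]

print(log2_fold_change(control, treated))  # log2 scale, so 2 means fourfold
print(welch_t(control, treated))
```

In a real analysis this would be repeated for every gene, and the resulting p-values would then need a multiple testing correction.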
Clustering
• Clustering genes or conditions or both.
• Deducing functions of unknown genes from known genes with similar expression patterns.
• Identifying disease profiles: tissues with similar pathology should yield similar expression profiles.
• Co-expression of genes may imply co-regulation.
• Classification of biological conditions.
• Drug development
Clustering
Statistical Methods: Hierarchical clustering, K-means, CLICK (CLuster Identification via Connectivity Kernels), Biclustering, etc. More: http://www.bioconductor.org/help/course-materials/2002/Seattle02/Cluster/cluster.pdf
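Of the methods listed, K-means is the simplest to sketch. The version below is a bare-bones illustration in plain Python, assuming squared Euclidean distance and a naive initialization that uses the first k profiles as starting centroids (real implementations use better initialization and convergence checks). The function name and the four gene profiles are made up.

```python
def kmeans(points, k, n_iter=20):
    """Plain K-means; returns (labels, centroids)."""
    centroids = [list(p) for p in points[:k]]  # naive init: first k points
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(
                range(k),
                key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])),
            )
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Made-up expression profiles of four genes across four conditions.
profiles = [
    [1.0, 1.1, 5.0, 5.2],  # low, low, high, high
    [5.0, 5.1, 1.0, 0.9],  # high, high, low, low
    [1.2, 0.9, 5.1, 4.9],
    [4.8, 5.2, 1.1, 1.0],
]
labels, centroids = kmeans(profiles, k=2)
print(labels)
```

Genes with the same label have similar expression patterns across the conditions, which is the co-expression idea from the previous slide.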
Classification
• Classification of tumor malignancies into known classes: supervised learning
• Identification of marker genes that characterize the different tumor classes: feature selection
Genes distinguishing ALL from AML (two types of leukemia).
Classification
• Methods:
  – Discriminant analysis: LDA, k-nearest neighbor
  – Classification tree
  – Logistic regression, penalized LR: LASSO
  – Neural network
  – Support vector machines (SVM)
  – Random forest, etc.
A survey of these methods:
http://www.ibiostat.be/publications/phd/suzyvansanden.pdf
http://www.stat.cmu.edu/~jiashun/Research/software/Data/papers/dudoit.pdf
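As one concrete instance of the methods above, a k-nearest-neighbor classifier fits in a few lines. This is a minimal sketch assuming Euclidean distance and majority voting; the function name, the two-gene expression values, and their assignment to the ALL and AML classes are made up purely for illustration.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, sample, k=3):
    """Predict the class of `sample` by majority vote among its k nearest neighbors."""
    distances = sorted(
        (math.dist(x, sample), label) for x, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Made-up log2 expression of two marker genes in six labeled tumor samples.
train_x = [
    [8.1, 2.0], [7.9, 2.3], [8.3, 1.8],  # labeled ALL
    [2.1, 7.7], [1.8, 8.2], [2.4, 8.0],  # labeled AML
]
train_y = ["ALL", "ALL", "ALL", "AML", "AML", "AML"]

print(knn_predict(train_x, train_y, [7.5, 2.5]))
print(knn_predict(train_x, train_y, [2.0, 7.0]))
```

With thousands of genes per sample, feature selection (choosing marker genes first) usually precedes a classifier like this.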
Pathways Analysis
• We discover DE genes, what's next?
• Identify which pathway terms (e.g., GO, KEGG) are most commonly associated with the DE genes.
• Methods: GEA, GSEA, NEA, etc.
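One common way to ask whether a pathway is associated with the DE genes more often than chance would suggest is a hypergeometric over-representation test. The sketch below shows that simpler test, not GSEA itself, which works on ranked gene lists; the function name and the counts are made up for illustration.

```python
from math import comb


def enrichment_pvalue(n_genes, n_pathway, n_de, n_overlap):
    """P(overlap >= n_overlap) when n_de genes are drawn at random from n_genes,
    of which n_pathway belong to the pathway (hypergeometric upper tail)."""
    total = comb(n_genes, n_de)
    upper = min(n_pathway, n_de)
    favorable = sum(
        comb(n_pathway, x) * comb(n_genes - n_pathway, n_de - x)
        for x in range(n_overlap, upper + 1)
    )
    return favorable / total


# Made-up counts: 20 genes measured, 5 in the pathway, 4 DE genes, 3 of them in the pathway.
p = enrichment_pvalue(n_genes=20, n_pathway=5, n_de=4, n_overlap=3)
print(p)
```

A small p-value suggests the pathway is over-represented among the DE genes; since many pathways are tested, these p-values also call for multiple testing correction.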
What’s next
• Next-generation sequencing
  + No need to know the sequence of the transcript.
  + There are no artifacts due to cross-hybridization.
  + Better quantitation of low-abundance transcripts.
  - New data types and huge data volumes.
  - Quality
• Epigenetics
  – The study of heritable changes in genome function that occur without a change in DNA sequence (http://epigenome.eu/en/1,1,0).
  – DNA methylation
Reference
• Gohlmann H, Talloen W: Gene Expression Studies Using Affymetrix Microarrays. Chapman & Hall/CRC Mathematical & Computational Biology, 2009.
• http://www.cs.tau.ac.il/~rshamir/ge/09/
Other useful books:
• Gentleman R, Carey V, Huber W, Irizarry R, Dudoit S, editors: Bioinformatics and Computational Biology Solutions Using R and Bioconductor. Springer Science, New York, 2005.
• Amaratunga D, Cabrera J: Exploration and Analysis of DNA Microarray and Protein Array Data. Wiley-Interscience, 2004.