Gene expression introduction

Analysis of Gene Expression: An Overview (Setia Pramana)


Microarrays, one of the recent biomedical technologies, produce high-dimensional data, which makes statistical analysis challenging. I presented this overview of microarray analysis, focusing on its use for gene expression profiling, in a discussion.


Page 1: Gene expression introduction

Analysis of Gene Expression: An Overview

Setia Pramana

Page 2: Gene expression introduction

Outline

• Biological background
  – Central Dogma
  – DNA
  – Genes
• Genomics
• Microarrays
• Gene expression data analysis pipeline
• What's next?

Gene expression analysis

Page 3: Gene expression introduction

Central Dogma

Figure: the Central Dogma (DNA → RNA → protein). Source: http://compbio.pbworks.com
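A minimal Python sketch of the first step, transcription, treating sequences as plain strings. It assumes the input is the DNA coding (sense) strand, which the mRNA copies with uracil (U) in place of thymine (T). The function name `transcribe` is illustrative, and promoters, regulation and RNA processing are ignored.

```python
def transcribe(coding_strand: str) -> str:
    """Return the mRNA sequence for a DNA coding (sense) strand.

    The mRNA copies the coding strand with uracil (U) in place of thymine (T).
    Promoters, regulation and RNA processing are deliberately ignored.
    """
    seq = coding_strand.upper()
    if set(seq) - set("ACGT"):
        raise ValueError("coding strand must contain only A, C, G, T")
    return seq.replace("T", "U")


if __name__ == "__main__":
    print(transcribe("ATGCTGATCGATGCAGAATCGATC"))
```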

Page 4: Gene expression introduction

Deoxyribonucleic Acid (DNA)

• DNA is the organic molecule that carries the information a cell uses to build the proteins that carry out most of its biological processes.

• Double helix
• Base pairing: G ≡ C (three hydrogen bonds), A = T (two hydrogen bonds)
• Example sequence: ATGCTGATCGATGCAGAATCGATC
• The human genome is about 3 × 10⁹ base pairs (bp) long
• Any two humans share about 99.9% of their DNA
• Our DNA is about 99% the same as that of chimpanzees


Source: Wikipedia
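The base-pairing rule (G with C, A with T) is what lets one strand determine the other. A small sketch, assuming sequences are plain strings of A, C, G, T with no ambiguity codes; `reverse_complement` and `gc_content` are illustrative helper names. The complement is reversed because the two strands of the double helix run in opposite directions.

```python
# Watson-Crick base pairing: A pairs with T, G pairs with C
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Sequence of the opposite strand, read in its own 5' -> 3' direction."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def gc_content(seq: str) -> float:
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


if __name__ == "__main__":
    example = "ATGCTGATCGATGCAGAATCGATC"
    print(reverse_complement(example))
    print(round(gc_content(example), 3))
```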

Page 5: Gene expression introduction

Gene

• The full DNA sequence of an organism is called its genome.

• A gene is a segment of DNA that specifies the sequence of a protein.
• Typical gene length: 1,000 to 3,000 bases
• The human genome contains approximately 20,000 to 25,000 genes.

Source: http://www.dna-sequencing-service.com/dna-sequencing/gene-dna/attachment/gene-dna/


Page 6: Gene expression introduction

Genetic Code

• The nucleotide sequence of an mRNA is translated, one codon (three nucleotides) at a time, into the amino acid sequence of the corresponding protein.

Source: http://www.cs.tau.ac.il/~rshamir/
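A sketch of translation using the standard genetic code (NCBI translation table 1), written in one-letter amino acid codes with `*` for stop codons. Assumptions: the mRNA string starts at the first codon of the reading frame, contains only A, C, G, U, and translation halts at the first stop codon; start-codon search and alternative genetic codes are not handled.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1). Codons are enumerated in the
# order UUU, UUC, UUA, UUG, UCU, ..., GGG; '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(mrna: str) -> str:
    """Translate an mRNA reading frame into a one-letter protein sequence.

    Reads non-overlapping codons from position 0, stops at the first stop codon,
    and ignores an incomplete trailing codon. Unknown codons raise KeyError.
    """
    mrna = mrna.upper()
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


if __name__ == "__main__":
    print(translate("AUGCUGAUCGAUGCAGAAUCGAUC"))
```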


Page 7: Gene expression introduction

Genomics

• Genomics is the study of all the genes of a cell or tissue at the level of:
  – DNA (genotype), e.g., SNPs in GWAS, CNVs, etc.
  – mRNA (transcriptomics), i.e., gene expression
  – protein (proteomics)

• Functional genomics: the study of the function of specific genes, their relation to diseases, their associated proteins, and their participation in biological processes.

     


Page 8: Gene expression introduction

Gene Expression


• Different tissues in the same individual express different genes, according to their role in the body.

• The same cell may express different genes under different circumstances (stress, nutrition, etc.).

• Cells express different genes over their lifetime (for instance, embryonic gene expression differs from adult gene expression).

• Technologies for measuring mRNA assume that:
  – The level of mRNA in the cell indicates the protein level in the cell, since regulation acts mainly on transcription rather than on translation.
  – Genes are expressed only when needed.

Page 9: Gene expression introduction

Microarrays


Page 10: Gene expression introduction

Microarray Technologies

• Two types of microarray technologies:
  – Single channel (one-color)
  – Dual channel (two-color)

• Platforms:
  – Affymetrix
  – Illumina
  – Agilent


Page 11: Gene expression introduction

Microarray Applications

• Gene expression profiling (our focus)
• SNP arrays for studying single nucleotide polymorphisms (SNPs) and copy number variations (CNVs) such as deletions or insertions
• Others:
  – ChIP-on-chip for investigating protein binding site occupancy
  – Exon arrays for detecting alternative splicing events
  – Tiling arrays for identifying novel transcripts, either coding or non-coding


Page 12: Gene expression introduction

Microarray Applications: MammaPrint

• MammaPrint is a test that estimates the likelihood of breast cancer returning within 10 years after treatment.

• It was the first FDA-approved molecular test based on microarray technology.
• It predicts whether an existing cancer will metastasize.
• It investigates the patterns and behavior of a large number of genes.
• The recurrence of cancer depends partly on the activation and suppression of certain genes in the tumor.
• By measuring the activity of those genes, MammaPrint can predict a patient's odds of the cancer spreading.


Page 13: Gene expression introduction

The Pipeline

• Experiment design → Lab work → Image processing
• → Background correction
• → Normalization
• → Signal summarization (GCRMA, FARMS) (for the Affymetrix platform)
• → Data analysis:
  – Differentially expressed genes
  – Clustering
  – Classification
  – Etc.
• → Network / pathway analysis (GSEA, etc.)
• → Biological interpretation


Page 14: Gene expression introduction

Image Processing

Source: http://isda.ncsa.uiuc.edu/Microarrays/

Page 15: Gene expression introduction

Log2 Intensity

• Response: log2 intensity. Why?
• Statistics: log-transforming the data makes the intensity distribution more symmetric and bell-shaped, i.e., closer to a normal distribution.
• Biology: biological processes in whole organisms presumably act in a multiplicative way. The log transformation turns these multiplicative effects into additive ones; for example, a two-fold change becomes a difference of 1 on the log2 scale.
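A small numeric illustration with hypothetical intensities: three genes at very different expression levels, each exactly two-fold higher after treatment. On the raw scale the differences grow with the expression level, while on the log2 scale the same two-fold effect is a constant difference of about 1. The helper name `log2_difference` is illustrative.

```python
import math


def log2_difference(x: float, y: float) -> float:
    """y - x on the log2 scale, which equals log2 of the fold change y / x."""
    return math.log2(y) - math.log2(x)


# Hypothetical raw intensities for three genes at very different expression levels
control = [200.0, 800.0, 3200.0]
treated = [400.0, 1600.0, 6400.0]  # every gene is exactly two-fold higher

raw_differences = [t - c for c, t in zip(control, treated)]
log_differences = [log2_difference(c, t) for c, t in zip(control, treated)]

print(raw_differences)  # grows with the expression level of the gene
print(log_differences)  # about 1 for every gene (up to floating-point rounding)
```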


Page 16: Gene expression introduction

Normalization

• A process that removes systematic errors, which can otherwise cause considerable bias.

• Systematic errors are due to:
  – Different incorporation efficiencies of dyes
  – Different amounts of mRNA in the tested samples, causing different expression levels
  – Differences in experimenter or protocol (if data were gathered in different labs)
  – Different scanning parameters
  – Differences between chips created in different production batches
• Example: quantile normalization
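A sketch of quantile normalization in plain Python, assuming a genes × samples matrix stored as a list of rows (typically already log2-transformed). Each sample's values are replaced by the mean, across samples, of the values at the same rank, so all samples end up with an identical distribution. Ties are broken by row order here, which is a simplification; other implementations may average tied ranks.

```python
def quantile_normalize(matrix: list[list[float]]) -> list[list[float]]:
    """Quantile-normalize a genes x samples matrix (rows = genes, columns = samples)."""
    n_genes = len(matrix)
    n_samples = len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_samples)]

    # Reference distribution: mean over samples of the k-th smallest value
    sorted_columns = [sorted(col) for col in columns]
    reference = [sum(sc[k] for sc in sorted_columns) / n_samples for k in range(n_genes)]

    normalized = [[0.0] * n_samples for _ in range(n_genes)]
    for j, col in enumerate(columns):
        # Row indices of this sample ordered from smallest to largest value
        order = sorted(range(n_genes), key=lambda i: col[i])
        for rank, i in enumerate(order):
            normalized[i][j] = reference[rank]
    return normalized


if __name__ == "__main__":
    # Hypothetical log2 intensities: 4 genes (rows) x 3 arrays (columns)
    data = [[5.0, 4.0, 3.0], [2.0, 1.0, 4.5], [3.0, 4.5, 6.0], [4.0, 2.0, 8.0]]
    for row in quantile_normalize(data):
        print(row)
```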


Page 17: Gene expression introduction

Normalization


Page 18: Gene expression introduction

Microarrays: Data Structure

Source: http://www.ebi.ac.uk

Page 19: Gene expression introduction

Microarrays: Applications

• Identify disease-related genes
• Classification, e.g., MammaPrint
• Cluster genes
• Cluster samples (disease stages, tissues): class discovery
• Cluster genes and samples simultaneously

• Pharmacogenomics:
  – Personalized medicine: individualized therapies
  – Target-based medicine: more effective drugs with fewer side effects

 


Page 20: Gene expression introduction

Data Analysis Challenges

• The curse of high dimensionality (many more genes than samples):
  – An obstacle in solving classification and clustering problems
• The multiple testing problem: the number of false positive results increases because a hypothesis is tested for every gene, i.e., thousands of tests are performed at once.

• Multiple testing corrections:
  – FWER (family-wise error rate): Bonferroni, Holm
  – FDR (false discovery rate): BH (Benjamini-Hochberg), BY (Benjamini-Yekutieli)
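A plain-Python sketch of adjusted p-values for the four corrections listed, following the standard step-down (Holm) and step-up (BH, BY) definitions, with adjusted values capped at 1. In R the same adjustments are available through `p.adjust` with methods `"bonferroni"`, `"holm"`, `"BH"` and `"BY"`; the Python function names below are illustrative, and the p-values in the example are hypothetical.

```python
def bonferroni(pvals: list[float]) -> list[float]:
    """Bonferroni (FWER): multiply every p-value by the number of tests m."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def holm(pvals: list[float]) -> list[float]:
    """Holm step-down (FWER): the k-th smallest p-value (k = 1..m) is multiplied by m - k + 1."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])  # ascending p-values
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):  # rank = k - 1
        # cumulative maximum keeps adjusted values monotone in the raw p-values
        running_max = max(running_max, min(1.0, (m - rank) * pvals[i]))
        adjusted[i] = running_max
    return adjusted


def benjamini(pvals: list[float], yekutieli: bool = False) -> list[float]:
    """Benjamini-Hochberg step-up (FDR); yekutieli=True gives Benjamini-Yekutieli."""
    m = len(pvals)
    # BY inflates BH by c(m) = 1 + 1/2 + ... + 1/m, valid under arbitrary dependence
    c_m = sum(1.0 / k for k in range(1, m + 1)) if yekutieli else 1.0
    order = sorted(range(m), key=lambda i: pvals[i], reverse=True)  # largest first
    adjusted = [0.0] * m
    running_min = 1.0
    for pos, i in enumerate(order):
        k = m - pos  # 1-based rank of pvals[i] in ascending order
        running_min = min(running_min, pvals[i] * m * c_m / k)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    p = [0.01, 0.04, 0.03, 0.005]  # hypothetical raw p-values for four genes
    for name, adj in [("Bonferroni", bonferroni(p)), ("Holm", holm(p)),
                      ("BH", benjamini(p)), ("BY", benjamini(p, yekutieli=True))]:
        print(name, [round(x, 4) for x in adj])
```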


Page 21: Gene expression introduction

Identification of Differential Genes

• Discover genes with different expression in two or more tissues/conditions.

• Fold change
• t-type tests:
  – t-test
  – Modified t-tests: Significance Analysis of Microarrays (SAM), moderated t-test (LIMMA)

• Linear Models for Microarray Data (LIMMA)
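A sketch of the two simplest statistics above for a single gene, assuming the values are already on the log2 scale so that a difference of group means is the log2 fold change. The t-statistic is Welch's version (unequal variances); obtaining a p-value needs the t distribution, for example via `scipy.stats.ttest_ind(..., equal_var=False)`. Moderated statistics such as SAM or LIMMA add a variance correction that is not shown here, and the gene data are hypothetical.

```python
import math
from statistics import mean, variance


def log2_fold_change(control: list[float], treated: list[float]) -> float:
    """Difference of group means; for log2-scale data this is the log2 fold change."""
    return mean(treated) - mean(control)


def welch_t(control: list[float], treated: list[float]) -> float:
    """Welch two-sample t-statistic (no equal-variance assumption) for one gene."""
    se = math.sqrt(variance(control) / len(control) + variance(treated) / len(treated))
    return (mean(treated) - mean(control)) / se


if __name__ == "__main__":
    # Hypothetical log2 expression values: 3 control vs 3 treated samples per gene
    data = {
        "gene_A": ([8.0, 8.2, 7.9], [10.1, 9.8, 10.2]),
        "gene_B": ([6.0, 6.5, 5.5], [6.1, 6.4, 5.8]),
    }
    for gene, (ctrl, trt) in data.items():
        print(gene, round(log2_fold_change(ctrl, trt), 2), round(welch_t(ctrl, trt), 2))
```

Ranking genes by the absolute t-statistic (rather than by fold change alone) penalizes genes whose large mean difference comes with large within-group variability.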


Page 22: Gene expression introduction

Clustering

• Clustering genes, conditions, or both.
• Deducing the functions of unknown genes from known genes with similar expression patterns.
• Identifying disease profiles: tissues with similar pathology should yield similar expression profiles.
• Co-expression of genes may imply co-regulation.
• Classification of biological conditions.
• Drug development.


Page 23: Gene expression introduction

Clustering


Statistical Methods: Hierarchical clustering, K-means, CLICK (CLuster Identification via Connectivity Kernels), Biclustering, etc. More: http://www.bioconductor.org/help/course-materials/2002/Seattle02/Cluster/cluster.pdf
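To make one of the listed methods concrete, here is a bare-bones sketch of k-means (Lloyd's algorithm) for clustering genes by their expression profiles across samples. Assumptions: Euclidean distance on log2 values, and the first k profiles as starting centroids for reproducibility; in practice correlation-based distances, random or k-means++ starts, and several restarts are common. The example profiles are hypothetical.

```python
import math


def kmeans(profiles: list[list[float]], k: int, max_iter: int = 100) -> list[int]:
    """Lloyd's k-means; returns a cluster label (0..k-1) for each profile.

    Simplification: the first k profiles are the initial centroids, so results
    are deterministic; real analyses use random or k-means++ starts and restarts.
    """
    centroids = [list(p) for p in profiles[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each profile joins its nearest centroid (Euclidean distance)
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in profiles
        ]
        if new_labels == labels:  # converged: assignments no longer change
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its member profiles
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


if __name__ == "__main__":
    # Hypothetical log2 profiles over 3 time points: two rising, two falling genes
    genes = [[1.0, 2.0, 3.0], [5.0, 4.0, 3.0], [1.1, 2.1, 3.2], [5.2, 3.9, 2.8]]
    print(kmeans(genes, k=2))
```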

Page 24: Gene expression introduction

Classification


• Classification of tumor malignancies into known classes: supervised learning.
• Identification of marker genes that characterize the different tumor classes: feature selection.

Figure: genes distinguishing ALL from AML (acute lymphoblastic and acute myeloid leukemia).

Page 25: Gene expression introduction

Classification

• Methods:
  – Discriminant analysis (LDA), k-nearest neighbors
  – Classification trees
  – Logistic regression, penalized logistic regression (LASSO)
  – Neural networks
  – Support vector machines (SVM)
  – Random forests, etc.
• Surveys of these methods:
  – http://www.ibiostat.be/publications/phd/suzyvansanden.pdf
  – http://www.stat.cmu.edu/~jiashun/Research/software/Data/papers/dudoit.pdf
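A minimal sketch of one listed method, the k-nearest-neighbor classifier, assuming each sample is a vector of log2 expression values for a few pre-selected marker genes (the feature selection step) and Euclidean distance. The two-gene ALL/AML training data below are invented for illustration.

```python
import math
from collections import Counter


def knn_predict(
    train_x: list[list[float]], train_y: list[str], x: list[float], k: int = 3
) -> str:
    """Majority vote among the k training samples closest to x (Euclidean distance)."""
    nearest = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Invented log2 values of two marker genes for six labelled samples
    train_x = [[9.0, 2.0], [8.5, 2.5], [9.2, 1.8], [3.0, 8.0], [2.5, 8.8], [3.3, 7.9]]
    train_y = ["ALL", "ALL", "ALL", "AML", "AML", "AML"]
    print(knn_predict(train_x, train_y, [8.8, 2.2]))
```

An odd k avoids tied votes in a two-class problem; with many genes, distances become less informative, which is one reason feature selection precedes classification.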


Page 26: Gene expression introduction

Pathways Analysis

• We have discovered DE genes; what's next?

• Identify which pathways or terms (e.g., GO, KEGG) are most commonly associated with the DE genes.

• Methods: GEA, GSEA, NEA, etc.
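One common way to test whether a pathway or GO term is over-represented among DE genes is a one-sided hypergeometric test (equivalently, a one-sided Fisher's exact test). The sketch below computes that p-value exactly with integer binomial coefficients; it illustrates the over-representation idea and is not necessarily the exact procedure behind any of the named methods. GSEA, in contrast, works on the full ranked gene list rather than on a DE cut-off. The counts in the example are hypothetical.

```python
from math import comb


def enrichment_p_value(n_genes: int, n_in_set: int, n_de: int, n_de_in_set: int) -> float:
    """One-sided hypergeometric p-value P(X >= n_de_in_set).

    n_genes     : genes measured on the array (the universe)
    n_in_set    : genes annotated to the pathway or GO term
    n_de        : differentially expressed genes
    n_de_in_set : DE genes that are also in the pathway
    """
    upper = min(n_in_set, n_de)
    total = comb(n_genes, n_de)
    # Sum the probabilities of observing n_de_in_set or more pathway genes among the DE genes
    tail = sum(
        comb(n_in_set, x) * comb(n_genes - n_in_set, n_de - x)
        for x in range(n_de_in_set, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical: 20,000 genes, a 100-gene pathway, 500 DE genes, 15 of them in the pathway
    print(enrichment_p_value(n_genes=20000, n_in_set=100, n_de=500, n_de_in_set=15))
```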


Page 27: Gene expression introduction

What’s next

• Next-generation sequencing
  + No need to know the sequence of the transcript in advance.
  + No artifacts due to cross-hybridization.
  + Better quantitation of low-abundance transcripts.
  - New data types and huge data volumes.
  - Data quality.

• Epigenetics
  – The study of heritable changes in genome function that occur without a change in DNA sequence (http://epigenome.eu/en/1,1,0).
  – DNA methylation

Page 28: Gene expression introduction

References

• Göhlmann H, Talloen W: Gene Expression Studies Using Affymetrix Microarrays. Chapman & Hall/CRC Mathematical & Computational Biology, 2009.

• http://www.cs.tau.ac.il/~rshamir/ge/09/

Other useful books:

• Gentleman R, Carey V, Huber W, Irizarry R, Dudoit S (editors): Bioinformatics and Computational Biology Solutions Using R and Bioconductor. Springer Science, New York, 2005.

• Amaratunga D, Cabrera J: Exploration and Analysis of DNA Microarray and Protein Array Data. Wiley-Interscience, 2004.
