Analysis of Experiment E-GEOD-29989: Alternative Splicing of Exons in Human Hematopoietic Stem Cells




  • Slide 1
  • Analysis of Experiment E-GEOD-29989: Alternative Splicing of Exons in Human Hematopoietic Stem Cells
      • The Johns Hopkins University
      • Advanced Genomics & Genetics Analysis, AS410.713.81.SU12
      • Phillip Woolwine
  • Slide 2
  • Outline
      • Alternative splicing of exons (ASE)
      • ASE probed with the Affymetrix GeneChip Exon array
      • Data analyzed with Affymetrix Power Tools (APT), R & Bioconductor
      • ASE during lineage-specific hematopoietic differentiation
  • Slide 3
  • Background
      • ASE leads to mRNAs that can have similar or different functions/products and is a basis for functional diversity in gene expression
      • Includes exon skipping, mutually exclusive exons, alternative 5' donor sites, alternative 3' acceptor sites, and/or intron retention
      • ASE is believed to be a major player in lineage-specific differentiation of blood cells
      • Aberrant ASE can lead to leukemias, lymphomas, etc.
      • Understanding ASE events and the exome profile will benefit understanding of disease pathogenesis and aid in therapies
  • Slide 4
  • Methods
      • Data retrieved from Experiment E-GEOD-29989 by Liu et al (2011)
      • Affymetrix GeneChip Human Exon 1.0 ST array
      • Transcriptional profile of lineage-specific differentiation of CD34 cells into Erythropoietic (E), Granulopoietic (G), and Megakaryopoietic (M) cells
      • Data normalized with APT 1.14.2
      • PCA outlier analysis in R and Bioconductor
      • Filtered by DABG p-val > 0.05 in >50% of classes
      • T-test used to determine exon enrichment/depletion by p-val & fold change (FC)
      • Evidence for alternative splicing verified in Ensembl
      • Top exon probes mapped to genes and differential expression plotted; green plots are CD34, red plots are lineage-specific
      • DAVID pathway analysis
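The DABG filter and t-test steps listed in the Methods can be sketched in Python as follows. This is only an illustration, not the code used for the analysis (which was run with APT, R and Bioconductor). The function names, the one-p-value-per-class layout, the pooled-variance form of the t-test and the toy intensities are all assumptions.

```python
from math import sqrt
from statistics import mean, variance


def passes_dabg(class_pvals, alpha=0.05, max_fraction=0.5):
    """Return True if a probe survives the DABG filter.

    class_pvals: one DABG p-value per class (assumed layout).
    The probe is dropped when p > alpha in more than max_fraction of classes.
    """
    undetected = sum(1 for p in class_pvals if p > alpha)
    return undetected / len(class_pvals) <= max_fraction


def two_sample_t(lineage_log2, cd34_log2):
    """Pooled-variance t statistic and log2 fold change (lineage minus CD34).

    Inputs are log2 probe intensities, so the difference of means is the
    log2 fold change. Returns (t, degrees_of_freedom, log2_fc, std_error).
    """
    n1, n2 = len(lineage_log2), len(cd34_log2)
    log2_fc = mean(lineage_log2) - mean(cd34_log2)
    pooled_var = ((n1 - 1) * variance(lineage_log2)
                  + (n2 - 1) * variance(cd34_log2)) / (n1 + n2 - 2)
    std_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return log2_fc / std_error, n1 + n2 - 2, log2_fc, std_error


# Toy values, made up for illustration (not taken from E-GEOD-29989).
keep = passes_dabg([0.01, 0.20, 0.03, 0.002])  # undetected in 1 of 4 classes
t_stat, df, log2_fc, se = two_sample_t([8.0, 8.2, 8.1], [6.0, 6.1, 5.9])

# 95% confidence interval for log2_fc; 2.776 is the two-sided 95% critical
# value of Student's t with 4 degrees of freedom.
t_crit = 2.776
ci = (log2_fc - t_crit * se, log2_fc + t_crit * se)
```

The returned tuple mirrors the test_statistic, lower_ci, upper_ci and log2_fc columns of the result tables; the p-value itself would come from the t distribution with `df` degrees of freedom.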
  • Slide 5
  • Results: PCA of exon probes
      • Exon arrays cluster along cell lineages; no outliers
      • G & E lineages are more similar to each other than to the M lineage
  • Slide 6
  • Results: Scree plot of exon probes
      • Approx. 70% of variability is explained by the first two eigenvectors
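The scree-plot figure quoted above is a cumulative proportion of variance, which can be computed from the PCA eigenvalues as sketched below. The eigenvalues here are hypothetical, chosen only so that the first two components sum to roughly 70%; they are not the values from this experiment.

```python
def variance_explained(eigenvalues):
    """Return (proportions, cumulative proportions) for PCA eigenvalues."""
    total = sum(eigenvalues)
    proportions = [ev / total for ev in eigenvalues]
    cumulative = []
    running = 0.0
    for ev in eigenvalues:
        running += ev
        cumulative.append(running / total)
    return proportions, cumulative


# Hypothetical eigenvalues in decreasing order (illustration only).
eigenvalues = [45.0, 25.0, 10.0, 8.0, 6.0, 4.0, 2.0]
proportions, cumulative = variance_explained(eigenvalues)
first_two = cumulative[1]  # share of variance carried by PC1 + PC2
```

A scree plot is simply `proportions` plotted against component number; `first_two` is the quantity the slide reports.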
  • Slide 7
  • Results: Filtering and Testing Correction
      • Filter on DABG of all class types removed ~83,000 low-intensity probes
      • mRNA filters may be more sensitive in some cases (Table 1, Della Beffa et al, 2008)
      • Other filtering included using core probesets in APT (core.ps, core.mps)
      • 197,245 probes remained after statistical tests
      • Potential for false positives (FP), as multiple testing correction was not performed
      • At p-val |1.5| there were 6413 in E, 3316 in G, 9638 in M; possible high FP
      • It can be argued that FWER is too conservative for the high dimensionality of exon data; the tests may not necessarily be independent nor uniform in a non-significant way, and FDR may not be appropriate (Della Beffa et al, 2008)
      • Proper pre-filtering of probes and true splice sites is a better strategy to limit FP
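For readers who want to see what the FWER and FDR corrections discussed above would do, here is a minimal sketch of Bonferroni and Benjamini-Hochberg adjustment. No correction was applied in the analysis itself. The toy p-values are made up, and treating 197,245 as the number of tests is an assumption made only to show the scale of a Bonferroni cutoff.

```python
def bonferroni(pvals):
    """FWER control: multiply each p-value by the number of tests (cap at 1)."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def benjamini_hochberg(pvals):
    """FDR control: Benjamini-Hochberg adjusted p-values, in input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy p-values (illustration only).
toy = [0.001, 0.008, 0.039, 0.041, 0.6]
bonf = bonferroni(toy)
bh = benjamini_hochberg(toy)

# Assumption: 197,245 probes taken as the number of tests.
n_tests = 197245
bonferroni_cutoff = 0.05 / n_tests
```

Under that assumption the Bonferroni cutoff is about 2.5E-07, so only the very smallest p-values in the tables (for example 1.85E-07) would survive, which illustrates the slide's point that FWER control is conservative for exon-level data.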
  • Slide 8
  • Results: One Significant ASE Transcript Cluster Common to All 3 Lineages
      • Dimensionality reduction at p-val |2|
      • IGHA2: immunoglobulin heavy constant alpha 2
  • Slide 9
  • Results: Significant ASE in Erythropoietic vs CD34
      • Dimensionality reduction at p-val |2|
      • 20 unique transcript clusters, ordered by p-val

      exon probe  test_statistic  pv        lower_ci   upper_ci   log2_fc    transcript_id
      3764725     -94.66201       1.85E-07  -1.3478    -1.2689    -1.30835   3764680
      2522514     -89.20705       3.30E-07  -2.17621   -2.03966   -2.10793   2522509
      2949661     45.40627        1.41E-06  2.53127    2.86105    2.69616    2949622
      3362331     45.16613        1.70E-06  2.10366    2.38106    2.24236    3362263
      3834504     59.227          1.97E-06  2.530274   2.793166   2.66172    3834502
      2350929     -41.35524       2.12E-06  -1.83529   -1.60407   -1.71968   2350922
      3081719     -65.99954       2.88E-06  -2.81293   -2.56643   -2.68968   3081707
      2500292     -43.56399       2.97E-06  -1.2957    -1.13712   -1.21641   2500275
      3932139     -52.20334       3.03E-06  -2.43312   -2.17509   -2.3041    3932131
      2340510     -38.16035       3.16E-06  -2.35865   -2.03735   -2.198     2340433
      3841868     -40.8034        3.36E-06  -2.29577   -1.99845   -2.14711   3841862
      3568540     -33.24723       6.05E-06  -2.36948   -2.00122   -2.18535   3568534
      3724580     -31.96947       6.19E-06  -3.6351    -3.05234   -3.34372   3724545
      3145161     46.62158        6.92E-06  2.344684   2.665209   2.504947   3145149
      2333140     -34.85999       7.10E-06  -3.23725   -2.74898   -2.99311   2333136
      3841587     -36.89025       7.10E-06  -3.64063   -3.11492   -3.37777   3841574
      3581927     -52.92897       7.97E-06  -2.68494   -2.39052   -2.53773   3581637
      2340357     -32.72331       8.60E-06  -3.21971   -2.7059    -2.9628    2340350
      3079216     33.57286        8.97E-06  1.691841   2.006399   1.84912    3079202
      3326951     -62.76521       9.98E-06  -2.025     -1.8283    -1.92665   3326950
  • Slide 10
  • Results: Significant ASE in Erythropoietic vs CD34, Genes and Pathways
      • DAVID Functional Annotation reveals enrichment for alternative splicing

      ID       Gene Name
      2500275  BCL2-like 11
      3834502  CD79a molecule, immunoglob-assoc a
      3362263  DENN/MADD domain containing 5A
      2340350  DnaJ (Hsp40) homolog, subfam C, mem 6
      3841862  Fc fragment of IgA, receptor for
      2522509  NIF3 NGG1 interacting factor 3-like 1
      2333136  cell division cycle 20 homolog
      2350922  glutathione S-transferase mu 1
      3581637  immunoglobulin heavy variable 3-30
      3724545  integrin, beta 3
      2340433  leptin receptor
      3841574  leukocyte immunoglob-like receptor, B, 1
      3326950  ld lipoprotein receptor class A dom con 3
      3081707  motor neuron and pancreas homeobox 1
      3079202  K+ voltage-gated channel, subfam H, 2
      3932131  proteasome assembly chaperone 1
      3568534  spectrin, beta, erythrocytic
      2949622  tenascin XB; tenascin XA pseudogene
      3764680  tripartite motif-containing 37
      3145149  tumor protein p53 induc nuclear prot 1
  • Slide 11
  • Results: Significant ASE in Granulopoietic vs CD34
      • Dimensionality reduction at p-val |2|
      • 9 unique transcript clusters, ordered by p-val

      exon probe  test_statistic  pv        lower_ci   upper_ci   log2_fc    transcript_id
      3662846     71.18055        2.92E-07  2.480959   2.683834   2.582397   3662808
      3835043     -43.71178       1.69E-06  -6.9581    -6.12603   -6.54206   3835035
      3837162     39.51809        2.52E-06  4.016632   4.624342   4.320487   3837132
      3582003     37.10581        3.39E-06  0.961212   1.117175   1.039193   3581637
      2874427     36.01181        3.62E-06  1.731912   2.021508   1.87671    2874371
      3304629     32.80535        8.14E-06  2.687296   3.195017   2.941157   3304624
      3815226     -29.11174       8.47E-06  -3.5762    -2.95292   -3.26456   3815223
      3959353     -30.37461       9.42E-06  -1.98261   -1.64655   -1.81458   3959350
      3761306     55.43858        9.59E-06  1.783076   1.996257   1.889667   3761291
  • Slide 12
  • Results: Significant ASE in Granulopoietic vs CD34, Genes & Pathways
      • DAVID Functional Annotation reveals significant enrichment for signaling
      • Clusters include alternative splicing

      ID       Gene Name
      3304624  5'-nucleotidase, cytosolic II
      3835035  CD177 molecule
      3662808  G protein-coupled receptor 56
      3837132  SUMO1 activating enzyme subunit 1
      3959350  apolipoprotein L, 3
      2874371  fibrillin 2
      3761291  homeobox B2
      3581637  immunoglobulin heavy variable 3-30
      3815223  proteinase 3
  • Slide 13
  • Results: Significant ASE in Megakaryopoietic vs CD34
      • Dimensionality reduction at p-val |2|
      • 37 unique transcript clusters, ordered by p-val

      exon probe  test_statistic  pv        lower_ci   upper_ci   log2_fc    transcript_id
      2883456     -84.51229       2.73E-07  -3.50093   -3.27244   -3.38669   2883440
      3996325     -109.04446      3.42E-07  -1.61352   -1.52799   -1.57075   3996306
      2902594     -56.26256       5.98E-07  -3.2605    -2.95384   -3.10717   2902593
      3726702     -57.64931       6.76E-07  -4.37159   -3.96699   -4.16929   3726691
      3619789     54.18277        7.45E-07  1.385162   1.535165   1.460163   3619773
      3595504     -53.69291       9.06E-07  -4.53231   -4.08325   -4.30778   3595441
      3604294     56.16099        1.14E-06  1.188597   1.31523    1.251913   3604287
      3229353     47.77881        1.15E-06  4.543631   5.104329   4.82398    3229338
      3580603     -54.68796       1.31E-06  -2.31754   -2.08843   -2.20298   3580498
      2527758     44.34504        1.58E-06  2.367172   2.683661   2.525417   2527747
      3959870     45.13276        1.72E-06  1.265458   1.432502   1.34898    3959862
      3321394     -42.07574       1.95E-06  -3.01379   -2.64041   -2.8271    3321361
      2368174     43.60399        1.97E-06  4.125353   4.69036    4.407857   2367963
      3870412     -41.48994       2.04E-06  -1.85525   -1.62244   -1.73885   3870361
      3308528     42.7371         2.12E-06  1.375135   1.567531   1.471333   3308489
      3582276     -52.45079       2.15E-06  -4.868     -4.36081   -4.61441   3581637
      3538094     -41.90661       2.16E-06  -2.25268   -1.97169   -2.11218   3538087
      3442152     40.56574        2.36E-06  3.300372   3.786662   3.543517   3442150
      3922533     -39.85965       2.66E-06  -2.60946   -2.26819   -2.43883   3922444
      3927281     38.26922        2.79E-06  1.090847   1.26152    1.176183   3927226
      2664340     40.73583        2.96E-06  2.27312    2.609973   2.441547   2664332
      2712681     -37.69503       3.17E-06  -1.5477    -1.33481   -1.44125   2712632
      2692866     -49.65337       3.61E-06  -5.39051   -4.7912    -5.09085   2692816
      3581932     -44.69765       4.07E-06  -2.52521   -2.21855   -2.37188   3581637
      3595502     -37.43227       4.13E-06  -4.1035    -3.53029   -3.81689   3595441
      3595493     -34.27633       4.39E-06  -4.33439   -3.68445   -4.00942   3595441
      3432535     33.46881        5.03E-06  1.562017   1.845336   1.703677   3432514
      2361299     -33.85212       5.11E-06  -2.29419   -1.94484   -2.11951   2361279
      3838099     -32.24923       5.75E-06  -2.79873   -2.3543    -2.57652   3838094
      3716864     -32.60492       6.01E-06  -3.26711   -2.7518    -3.00946   3716783
      3024034     34.14316        6.13E-06  2.361409   2.785817   2.573613   3024025
      3661695     -31.31969       6.21E-06  -1.57125   -1.31533   -1.44329   3661684
      3726707     -31.22488       6.32E-06  -5.70935   -4.77663   -5.24299   3726691
      2320744     -30.79117       6.86E-06  -2.61282   -2.18002   -2.39642   2320727
      3654966     -30.52836       7.17E-06  -3.2377    -2.69696   -2.96733   3654956
      3778506     32.03566        7.32E-06  2.483458   2.960449   2.721953   3778504
      3738849     -29.4231        7.97E-06  -2.51354   -2.08001   -2.29677   3738842
      2639809     -37.8211        8.21E-06  -3.54434   -3.03911   -3.29172   2639734
      2402609     29.28454        8.25E-06  1.636783   1.979964   1.808373   2402601
      2368177     38.67097        8.41E-06  3.27527    3.80971    3.54249    2367963
      2476518     -32.72997       8.74E-06  -7.47151   -6.27858   -6.87505   2476510
      2376831     -28.84159       9.00E-06  -2.24658   -1.85133   -2.04895   2376799
  • Slide 14
  • Results: Significant ASE in Megakaryopoietic vs CD34, Genes & Pathways
      • DAVID Functional Annotation reveals enrichment for alternative splicing & signaling
      • Several categories not shown but include those for immune system development

      ID       Gene Name
      3432514  2'-5'-oligoadenylate synthetase 2, 69/71kDa
      2883440  ADAM metallopeptidase domain 19
      3726691  ATP-binding cassette, sub-family C, member 3
      3922444  ATP-binding cassette, sub-family G, member 1
      3580498  CDC42 binding protein kinase beta
      3619773  INO80 homolog
      3308489  KIAA1598
      3870361  NLR family, pyrin domain containing 12
      2367963  RAB GTPase activating protein 1-like
      3716783  RAB11 family interacting protein 4 (class II)
      3778504  RAB31, member RAS oncogene family
      2402601  UBX domain protein 11
      3442150  acrosin binding protein
      3927226  amyloid beta (A4) precursor protein
      2664332  collagen-like tail subunit of acetylcholinesterase
      3538087  dapper, antagonist of beta-catenin, homolog 1
      3229338  ficolin (collagen/fibrinogen domain containing) 1
      3595441  glutamate receptor, N-methyl D-aspartate-like 1B
      3738842  hexosaminidase containing
      3581637  immunoglobulin heavy variable 3-30
      2376799  inhibitor of kappa light pp enhancer in B-cells, kinase epsilon
      2692816  integrin, beta 5
      3604287  interleukin 16
      2639734  kalirin, RhoGEF kinase
      2361279  lamin A/C
      2476510  latent transforming gf beta binding protein 1
      2902593  lymphocyte antigen 6 complex, locus G6F
      3661684  matrix metallopeptidase 2
      3024025  mesoderm specific transcript homolog
      3654956  nuclear pore complex interacting protein-like 2
      3959862  parvalbumin
      3996306  ribosomal protein L10; ribosomal protein L10 pseudogene 15
      3838094  similar to ferritin, light polypeptide; ferritin, light polypeptide
      2527747  solute carrier family 11, member 1
      3321361  spondin 1, extracellular matrix protein
      2712632  transferrin receptor (p90, CD71)
      2320727  tumor necrosis factor receptor superfamily, member 1B
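The claim on Slide 8 that a single transcript cluster is shared by all three lineages can be checked directly from the transcript cluster IDs listed on Slides 9 to 14. The sketch below does that with plain set intersection; the variable names are illustrative, and the IDs are copied from the tables in this transcript.

```python
# Transcript cluster IDs from the Erythropoietic (E), Granulopoietic (G)
# and Megakaryopoietic (M) result tables in this transcript.
e_clusters = {
    3764680, 2522509, 2949622, 3362263, 3834502, 2350922, 3081707,
    2500275, 3932131, 2340433, 3841862, 3568534, 3724545, 3145149,
    2333136, 3841574, 3581637, 2340350, 3079202, 3326950,
}
g_clusters = {
    3662808, 3835035, 3837132, 3581637, 2874371,
    3304624, 3815223, 3959350, 3761291,
}
m_clusters = {
    3432514, 2883440, 3726691, 3922444, 3580498, 3619773, 3308489,
    3870361, 2367963, 3716783, 3778504, 2402601, 3442150, 3927226,
    2664332, 3538087, 3229338, 3595441, 3738842, 3581637, 2376799,
    2692816, 3604287, 2639734, 2361279, 2476510, 2902593, 3661684,
    3024025, 3654956, 3959862, 3996306, 3838094, 2527747, 3321361,
    2712632, 2320727,
}

# Clusters significant in every lineage, and per-lineage counts.
common = e_clusters & g_clusters & m_clusters
counts = {"E": len(e_clusters), "G": len(g_clusters), "M": len(m_clusters)}
```

The counts reproduce the 20, 9 and 37 unique clusters stated on the slides, and the intersection contains only cluster 3581637, which lies in the immunoglobulin heavy chain locus.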
  • Slide 15
  • Results: Top Significantly Upregulated ASE in Erythropoietic
      • P-val 2 in Erythropoietic; P-val < 0.01 & FC < 1.5 in G & M lineages
      • Top upregulated exon probe 2527682; cluster id 2527672; gene PKND
      • Significantly downregulated in Megakaryopoietic lineage
  • Slide 16
  • Results: Top Significantly Upregulated ASE in Granulopoietic
      • P-val 2 in Granulopoietic; P-val < 0.01 & FC < 1.5 in E & M lineages
      • Top upregulated exon probes 4016430, 4016431; cluster id 4016428; gene BEX2
      • Significantly reduced expression versus CD34 in Megakaryopoietic lineage
      • Significantly reduced expression versus CD34 in Erythropoietic lineage
  • Slide 17
  • Results: Top Significantly Upregulated ASE in Megakaryopoietic
      • P-val 2 in Megakaryopoietic; P-val < 0.01 & FC < 1.5 in E & G lineages
      • Top upregulated exon probe 3275248; cluster id 3275132; gene GDI2
      • Plot labels: downregulated, upregulated
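The lineage-specific selection used on Slides 15 to 17 can be sketched as a simple filter. Several things here are assumptions: the p-value cutoff for the target lineage did not survive in the transcript (0.01 is used as a placeholder parameter), FC is taken to be a linear-scale fold change, and the probe names and statistics are invented for illustration.

```python
from math import log2


def lineage_specific(stats, target, p_target=0.01, fc_target=2.0,
                     p_other=0.01, fc_other=1.5):
    """Return probes upregulated in `target` but not in the other lineages.

    stats: {probe: {lineage: (p_value, log2_fc)}} (assumed layout).
    Target lineage: p < p_target and fold change > fc_target.
    Other lineages: p < p_other and fold change < fc_other.
    """
    hits = []
    for probe, by_lineage in stats.items():
        p, lfc = by_lineage[target]
        if not (p < p_target and lfc > log2(fc_target)):
            continue
        others_ok = all(
            q < p_other and other_lfc < log2(fc_other)
            for lineage, (q, other_lfc) in by_lineage.items()
            if lineage != target
        )
        if others_ok:
            hits.append(probe)
    return hits


# Hypothetical probes and statistics (illustration only).
stats = {
    "probe_A": {"E": (1e-5, 2.3), "G": (0.004, 0.2), "M": (0.002, -1.1)},
    "probe_B": {"E": (1e-4, 1.8), "G": (0.003, 1.6), "M": (0.005, 0.1)},
    "probe_C": {"E": (0.2, 1.5), "G": (0.001, 0.1), "M": (0.001, 0.0)},
}
e_specific = lineage_specific(stats, "E")
```

In this toy input only probe_A is selected: it is strongly up in E and significantly changed but not upregulated in G and M, similar to the pattern described for the top Erythropoietic hit.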
  • Slide 18
  • Discussion
      • ASE occurs during lineage-specific hematopoietic differentiation of CD34 cells into Erythropoietic, Granulopoietic, and Megakaryopoietic cells
      • Pathway terms are significantly enriched in alternative splicing and signaling, including those for immune system development, consistent with known biology
      • Relatively increased ASE in megakaryopoietic differentiation may suggest increased transcriptional complexity during development
      • Compared with the original results of Liu et al (2011), this analysis shares a few top hits and shows similar pathway enrichment
      • However, most top genes were not identical, which is probably due to their use of the ExonSVD model for statistical assessment of exon enrichment/depletion
      • The high number of significant hits at p < 0.01 may indicate a high FDR and may warrant further filtering and dimensionality reduction
      • It may be interesting to combine MiDAS and Rank Product for testing and correction
  • Slide 19
  • References
      • Della Beffa et al (2008) Dissecting an alternative splicing analysis workflow for GeneChip Exon 1.0 ST Affymetrix arrays. BMC Genomics 9:571, PMID: 19040723
      • EBI (2012) Ensembl Genome Browser, release 68. Available at: [Accessed 8/20/12]
      • Higgs, B (2012) Advanced Genomics & Genetics Analysis, Lecture 2: Analysis and interpretation of splice variants. Johns Hopkins University, unpublished
      • Liu et al (2011) Transcriptome Profiling and Sequencing of Differentiated Human Hematopoietic Stem Cells Reveal Lineage Specific Expression and Alternative Splicing of Genes. Physiol Genomics 43(20):1117-34, PMID: 21828245
      • NIAID/NIH (2012) DAVID Bioinformatics Resources 6.7: Functional Annotation Tool. Available at: [Accessed 8/20/12]