Tipping Point Meeting - 1st December 2010
Integrated sequence analysis of pancreatic cancer
Queensland Centre for Medical Genomics
Sequencing Group
Dr Brooke Gardiner
Acknowledgements
QCMG: Sean Grimmond, Peter Wilson
Genome Biology, Bioinformatics, Genome Sequencing, Life Technologies: Nicole Cloonan, John Pearson, Brooke Gardiner, Gabriel Kolle, Karin Kassahn, Darrin Taylor, David Miller, John Davis, Nic Waddell, David Tang, Craig Nourse, John Sheppard, Anita Steptoe, Conrad Leonard, Suzanne Manning, Kevin McKernan, Shivangi Wani, Jason Steen, Ehsan Nourbakhsh, Yongming Sun, Keerthana Krishnan, Christina Xu, Ivon Harliwong, Beverly R&D, Mellissa Brown, Matt Anderson, Senel Idrisoglu, Foster City R&D, Nick Matigan, David Wood, Rathi Thiagarajan
Array Facility (IMB): Katia Nones, Rebecca Foale
HPC (UQ): Lutz Pross, Ziping Fang, David Green
Garvan Institute: Andrew Biankin, Amber Johns, Chris Scarlett, Mark Pinese, David Chang, Michelle Thomas, Chris Toon, Mary-Anne Brancato, Cathy Axford, Emily Colvin, Amanda Mawson, Johana Susanto, Rob Sutherland, Sue Henshall, Liz Musgrove, Roger Daly
QCMG sequencing and analysis facilities
Computing: 3 servers, 400 cores, 3 TB RAM, 1 PB storage, 10G network connectivity
Workflow: Manual library prep; automated emulsion PCR & enrichment; robotic library assembly & enrichment (Agilent Bravo) & (SPRI, Beckman)
Sequencers: 11 SOLiD Genome Sequencers V4
Technology development: SOLiD HQ (250Gb Q4-09, 500Gb Q1-10); Ion Torrent, … ? ….
Laboratories: 1200 m² dedicated laboratory space (5th and 6th floors, IMB)
Personnel: 41 bioinformaticians, genomics experts & genome biologists
[Photos: Laboratories; Informatics Personnel]
ICGC Global Participants
ICGC Goal: To obtain a comprehensive description of genomic, transcriptomic and epigenomic changes in 50 different tumor types and/or subtypes which are of clinical significance across the globe
Cancer is driven by the accumulation of genetic & epigenetic changes
Pancreatic Cancer Study
Presentation
Diagnosis
Treatment Plan
Surgery
Adjuvant Therapy
Recurrence
Death
Recruitment; Patient Consent; Sample Collection (Blood); Recording of Serum Markers; Recording of Pathological Data
Sample Collection (Blood); Recording of Serum Markers; Recording of Pathological Data
Date and Cause
Sample Collection (Blood); Recording of Serum Markers; Recording of Pathological Data
Sample Collection; Operative Data Recording; Xenograft Generation
Resection
Pancreatic Resection Surgery
Cancer Genome Workflow
Sample Submission
Library Preparation
Tag Mapping
Sequencing
Sequencing Preparation
Microarray
Data Generation
Data Collation
Data Analysis
Data Submission
Surgery
Pathological Review
Tumour Dissection
CNV/SNP: Illumina HumanOmni1-Quad SNP
Expression: Illumina HumanHT-12
Applied Biosystems SOLiD Analyzer
Emulsion PCR; Bead enrichment
Agilent Bioanalyzer; NanoDrop 1000; Qubit; Electrophoretic fractionation
DNA: Whole genome; methylome; exome. Libraries: Fragment; BC Fragment; Mate Pair
RNA: miRNA; whole transcriptome. Libraries: Total Transcriptome BC Fragment
Applied Biosystems BioScope; In-House software & analysis pipelines
Laboratory Tracking: LIMS (Geneus)
Macro-dissection of tumour tissue; Xenograft & cell line generation
Pathological review: Aus, USA, Italy; Estimation of tumour content
1. Independent pathological review in Australia, Italy and USA.
2. 5 mm³ frozen blocks of tumour are sectioned to locate non-tumour tissue.
3. Tumour-rich regions are dissected from the block and DNA/RNA is extracted.
Sample availability from participating patients (DNA/RNA submitted to QCMG)
Patients represented in QCMG collection: Pancreatic 63; Ovarian 60
Direct Sample Processing
Enrichment Processing
Validation & surveillance
Sample types: Adjacent Normal (A), Normal (N), Tumour (T), Xenograft (X), Cell line (C), Serum (S)
[Figure: sample QC traces showing small RNAs, 18S & 28S rRNA, and high molecular weight DNA; samples ICGC-PICI-20100225.01-TD and ICGC-PICI-20100225.02-TR]
Tumour vs Xenograft
Human cells isolated from xenograft
Primary tumour (25% cellularity)
Require: >80% sensitivity & 95% accuracy
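Tumour cellularity limits the fraction of reads that can carry a somatic mutation, which is one reason an enriched source such as a xenograft helps. The sketch below is a rough illustration only. It assumes a heterozygous mutation at a diploid, copy-number-neutral locus and independent sampling of reads; neither assumption comes from the slides, and the function names, depths and read threshold are hypothetical.

```python
from math import comb


def expected_het_vaf(cellularity: float) -> float:
    """Expected variant allele fraction of a heterozygous somatic mutation.

    Assumes a diploid, copy-number-neutral locus (simplifying assumption).
    """
    return cellularity / 2.0


def detection_probability(depth: int, vaf: float, min_reads: int) -> float:
    """Binomial probability of seeing at least `min_reads` variant reads."""
    p_fewer = sum(
        comb(depth, k) * vaf**k * (1.0 - vaf) ** (depth - k)
        for k in range(min_reads)
    )
    return 1.0 - p_fewer


# Illustrative comparison: 25% cellular primary tumour at two depths.
vaf = expected_het_vaf(0.25)
for depth in (30, 100):
    print(depth, round(detection_probability(depth, vaf, min_reads=4), 3))
```

Under these assumptions a tumour with 25% cellularity gives an expected allele fraction of 12.5%, so the chance of collecting enough supporting reads depends strongly on depth.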
Whole Genome: Tumor tissue & normal, 30-40x (LMP / PE)
Exome: Tumor tissue & normal, >>100x (PE)
mRNA & miRNA: Tumor tissue & adjacent normal, 100 million reads (PE) / 10 million reads (SE)
Methylome (Methyl-capture): Tumor & adjacent normal, ~20 million reads (1Gb) (PE)
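The depth targets above translate into sequencing yield by simple arithmetic: mean depth is total sequenced bases divided by target size. The sketch below shows that calculation. The genome size constant is an approximation introduced here, not a figure from the slides, and the helper names are hypothetical.

```python
# Assumption: approximate size of the human genome in base pairs.
HUMAN_GENOME_BP = 3.1e9


def bases_for_depth(depth: float, target_bp: float) -> float:
    """Total sequenced bases needed to reach a mean depth over a target."""
    return depth * target_bp


def mean_depth(n_reads: float, read_length_bp: float, target_bp: float) -> float:
    """Mean depth obtained from a number of reads of a given length."""
    return n_reads * read_length_bp / target_bp


def implied_read_length(total_bases: float, n_reads: float) -> float:
    """Average read length implied by a yield and a read count."""
    return total_bases / n_reads


# 30x whole genome under the assumed genome size.
print(bases_for_depth(30, HUMAN_GENOME_BP) / 1e9, "Gb")
# Methylome figure from the slide: ~20 million reads for ~1 Gb.
print(implied_read_length(1e9, 20e6), "bp per read")
```

The methylome line is self-consistent under this arithmetic: about 20 million reads for about 1 Gb implies roughly 50 bases per read.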
Whole Genome
SureSelect Exome
Exome SNV calling
Map tags to genome
Identification of SNVs
Verify SNVs (e.g. SNP chip, RNASeq, Sanger Sequencing)
Filter sequence tags
Annotate SNVs (e.g. dbSNP, non-synonymous, somatic)
Raw data generation and mapping pipelines
Total Raw ~ 7.5 Tb
Exome capture coverage rates (post filter, high quality)
Exome – 100 Patient (ND/TD) paired LMP – 500 Patient (ND/TD/+) sets
Detecting variations by sequencing
Structural Variation
Copy Number Variation
Substitutions, Insertions, Deletions
BAM processing for variant/mutation calling
PCR duplicates are marked using Picard
In-house tools for manipulating and profiling .bam files (qbamMerge, qbamSplit, qbamFilter, qProfiler)
Pre-filter: alignment length > 34 or (F5 and in proper pair), mapping quality > 14, fewer than 3 mismatches
Variant caller: diBayes (BioScope 1.2)
Post-filter: coverage, > 3 novel starts supporting mutation/variant, mutation/variant not in pileup of matched normal, not a germline variant in another patient, review in IGV (non-syn, stop)
All variants/mutations called are retained in in-house database, even if they failed a filter (classA vs classB)
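The filter rules above can be read as simple predicates. The following is a minimal sketch of that reading, not the in-house qbamFilter implementation. It assumes the comma-separated pre-filter conditions are combined with AND, that "novel starts" means distinct read start positions, and that classA denotes calls passing the filters while classB denotes calls that failed one. The `Read` record and all function names are hypothetical, and the coverage check and IGV review are left out.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Read:
    """Hypothetical, simplified view of one aligned tag."""
    start: int
    alignment_length: int
    mapq: int
    mismatches: int
    is_f5: bool = False        # F5 tag of a paired-end run
    proper_pair: bool = False


def passes_prefilter(read: Read) -> bool:
    """Pre-filter: length > 34 or (F5 and proper pair); MAPQ > 14; < 3 mismatches."""
    long_enough = read.alignment_length > 34 or (read.is_f5 and read.proper_pair)
    return long_enough and read.mapq > 14 and read.mismatches < 3


def novel_starts(reads: Sequence[Read]) -> int:
    """Number of distinct start positions among the supporting reads."""
    return len({read.start for read in reads})


def classify_call(
    supporting_reads: Sequence[Read],
    in_matched_normal: bool,
    germline_in_other_patient: bool,
) -> str:
    """Label a call classA if it passes the sketched post-filters, else classB."""
    passed = (
        novel_starts(supporting_reads) > 3
        and not in_matched_normal
        and not germline_in_other_patient
    )
    return "classA" if passed else "classB"
```

Returning a label rather than discarding the call mirrors the point that every call is retained in the database even when it fails a filter.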
SNV category                        Whole-genome shotgun   Exome capture
Total number of SNVs in a patient   ~3,200,000             ~73,000
Germline SNVs                       ~2,400,000             ~36,000
  Number within an ORF              ~18,000                ~13,000
  Splice site                       ~2,100                 ~1,300
  Non-synonymous                    ~9,000                 6,600
  Stop gained                       129                    84
Somatic SNVs                        ~7,000                 124
  Number within an ORF              57                     66
  Splice site                       8                      7
  Non-synonymous                    43                     48
  Stop gained                       3                      2
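Splitting a patient's SNVs into germline and somatic sets comes down to comparing tumour calls with calls from the matched normal. The sketch below shows only that core comparison, assuming each call is identified by chromosome, position, reference and alternate allele. The `Variant` type, the function name and the example coordinates are hypothetical. A production pipeline would also inspect the normal pileup, as the post-filter step describes.

```python
from typing import Dict, Iterable, NamedTuple, Set


class Variant(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str


def partition_variants(
    tumour_calls: Iterable[Variant],
    normal_calls: Iterable[Variant],
) -> Dict[str, Set[Variant]]:
    """Split calls into germline (seen in normal) and somatic (tumour only)."""
    tumour = set(tumour_calls)
    normal = set(normal_calls)
    return {"germline": normal, "somatic": tumour - normal}


# Hypothetical coordinates, for illustration only.
shared = Variant("chr1", 1000, "A", "G")
tumour_only = Variant("chr2", 2000, "C", "T")
result = partition_variants([shared, tumour_only], [shared])
print(sorted(result["somatic"]))
```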
Possibly damaging genes from class A: CDON, EXT1, HOXB2, KIAA1199, KRAS, NPC1, TMEM74, XKR3
Probably damaging genes from class A: BIRC6, CGN, CTTNBP2NL, DDX56, ERC1, MPDZ, MPP4, MYH11, OR6A2, PPFIBP2, RASSF5, SLC4A11, SNX13, SPOCK1, TLR7, TMEM22, TSHZ3, ZC3H11A, ZNF318
SNV                     Total   Class A
Non-synonymous          1462    48
Annotated in PolyPhen   1250    42
Benign                  514     13
Possibly damaging       224     8
Probably damaging       437     20
Unknown                 75      1
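As a quick consistency check, the four PolyPhen categories should add up to the "Annotated in PolyPhen" row, and the share of predicted-damaging calls follows from the same counts. The sketch below uses only the numbers in the table; the dictionary layout and function names are illustrative.

```python
# Counts copied from the PolyPhen table (Total and Class A columns).
POLYPHEN_COUNTS = {
    "Total": {
        "Benign": 514,
        "Possibly damaging": 224,
        "Probably damaging": 437,
        "Unknown": 75,
    },
    "Class A": {
        "Benign": 13,
        "Possibly damaging": 8,
        "Probably damaging": 20,
        "Unknown": 1,
    },
}


def annotated_total(column: str) -> int:
    """Sum of all PolyPhen categories for one column."""
    return sum(POLYPHEN_COUNTS[column].values())


def damaging_fraction(column: str) -> float:
    """Fraction of annotated SNVs predicted possibly or probably damaging."""
    counts = POLYPHEN_COUNTS[column]
    damaging = counts["Possibly damaging"] + counts["Probably damaging"]
    return damaging / annotated_total(column)


for column in POLYPHEN_COUNTS:
    print(column, annotated_total(column), round(damaging_fraction(column), 3))
```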
Commonly targeted pathways: Jones et al
Pathway # genes Representative altered genes
Apoptosis 9 CASP10, VCP, CAD, HIP1
DNA damage control 9 ERCC4, ERCC6, EP300, RANBP2, TP53
Regulation of G1/S phase transition 19 CDKN2A, FBXW7, CHD1, APC2
Hedgehog signaling 19 TBX5, SOX3, LRP2, GLI1, GLI3, BOC, BMPR2, CREBBP
Homophilic cell adhesion 30 CDH1, CDH10, CDH2, CDH7, FAT, PCDH15, PCDH17, PCDH18, PCDH9, PCDHB16, PCDHB2, PCDHGA1, PCDHGA11, PCDHGC4
Integrin signaling 24 ITGA4, ITGA9, ITGA11, LAMA1, LAMA4, LAMA5, FN1, ILK
c-Jun N-terminal kinase signaling 9 MAP4K3, TNF, ATF2, NFATC3
KRAS signaling 5 KRAS, MAP2K4, RASGRP3
Regulation of invasion 46 ADAM11, ADAM12, ADAM19, ADAM5220, ADAMTS15, DPP6, MEP1A, PCSK6, APG4A, PRSS23
Small GTPase–dependent signaling (other than KRAS) 33 AGHGEF7, ARHGEF9, CDC42BPA, DEPDC2, PLCB3, PLCB4, RP1, PLXNB1, PRKCG
TGF-β signaling 37 TGFBR2, BMPR2, SMAD4, SMAD3
Wnt/Notch signaling 29 MYC, PPP2R3A, WNT9A, MAP2, TSC2, GATA6, TCF4
Commonly targeted pathways: Overlap with QCMG; New QCMG
Pathway # genes Representative altered genes
Apoptosis 9 CASP10, VCP, CAD, HIP1; new QCMG: CASP3, IRAK4, PIK3CD, TNFRSF1A
DNA damage control 9 ERCC4, ERCC6, EP300, RANBP2, TP53
Regulation of G1/S phase transition 19 CDKN2A, FBXW7, CHD1, APC2; new QCMG: RB
Hedgehog signaling 19 TBX5, SOX3, LRP2, GLI1, GLI3, BOC, BMPR2, CREBBP; new QCMG: DYRK1A
Homophilic cell adhesion 30 CDH1, CDH10, CDH2, CDH7, FAT, PCDH15, PCDH17, PCDH18, PCDH9, PCDHB16, PCDHB2, PCDHGA1, PCDHGA11, PCDHGC4; new QCMG: PCDHGB7, THBS4, FARP2
Integrin signaling 24 ITGA4, ITGA9, ITGA11, LAMA1, LAMA4, LAMA5, FN1, ILK
c-Jun N-terminal kinase signaling 9 MAP4K3, TNF, ATF2, NFATC3
KRAS signaling 5 KRAS, MAP2K4, RASGRP3; new QCMG: AKAP9, PDE1C
Regulation of invasion 46 ADAM11, ADAM12, ADAM19, ADAM5220, ADAMTS15, DPP6, MEP1A, PCSK6, APG4A, PRSS23
Small GTPase–dependent signaling (other than KRAS) 33 AGHGEF7, ARHGEF9, CDC42BPA, DEPDC2, PLCB3, PLCB4, RP1, PLXNB1, PRKCG
TGF-β signaling 37 TGFBR2, BMPR2, SMAD4, SMAD3; new QCMG: TGFB1, EP300
Wnt/Notch signaling 29 MYC, PPP2R3A, WNT9A, MAP2, TSC2, GATA6, TCF4; new QCMG: NCSTN, NOTCH2, CAMK2D, SENP2
Transcriptome analysis
Gene-centric analysis:
• Count reads mapping to exons
• Normalize/scale counts
• Analyze differential expression
• Array correlation 0.75-0.8
• Sensitivity < 1 RNA/cell
• Arrays: 8,500 genes active
• RNAseq: 12,008 genes active
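The first three gene-centric steps can be sketched in a few lines. The slides do not say which normalization was used, so the example assumes RPKM (reads per kilobase of exon per million mapped reads), a common choice for this kind of count data, and a log2 fold change with a pseudocount for the comparison step. Function names and the example numbers are hypothetical.

```python
from math import log2


def rpkm(read_count: float, exon_length_bp: float, total_mapped_reads: float) -> float:
    """Reads per kilobase of exon model per million mapped reads."""
    return read_count * 1e9 / (exon_length_bp * total_mapped_reads)


def log2_fold_change(tumour: float, normal: float, pseudocount: float = 1.0) -> float:
    """Log2 ratio of tumour to normal expression; the pseudocount avoids log(0)."""
    return log2((tumour + pseudocount) / (normal + pseudocount))


# Hypothetical gene: 2 kb of exon, 500 reads in a 10-million-read library.
tumour_rpkm = rpkm(500, 2_000, 10_000_000)
normal_rpkm = rpkm(100, 2_000, 10_000_000)
print(tumour_rpkm, normal_rpkm, round(log2_fold_change(tumour_rpkm, normal_rpkm), 2))
```

Scaling by both exon length and library size is what makes counts comparable between genes and between the tumour and adjacent-normal libraries.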
Nucleotide resolution:
• Identify expressed variants
• Split variants into expressed germline vs somatic events
• Powerful validation tool for mutations predicted by WGS and exome-seq
• Potential for studying allele-specific expression and RNA editing
Expression of somatic variants
[Figure panels: Germline; Tumour; Expression]
KRAS activating mutation C>T (G12D)
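The C>T label and the G12D protein change describe the same event seen from opposite strands. As background not stated on the slide: the reference KRAS codon 12 is GGT (glycine), G12D corresponds to GGT changing to GAT (aspartate), and KRAS is transcribed from the reverse strand, so a coding-strand G>A reads as C>T on the forward genomic strand. The sketch below walks through that conversion; the codon table is a two-entry subset of the standard genetic code and the function names are illustrative.

```python
# Two-entry subset of the standard genetic code, enough for this example.
CODON_TABLE = {"GGT": "G", "GAT": "D"}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def protein_change(ref_codon: str, codon_index: int, alt_base: str, residue: int) -> str:
    """Protein-level label for a single-base substitution within a codon."""
    alt_codon = ref_codon[:codon_index] + alt_base + ref_codon[codon_index + 1:]
    return f"{CODON_TABLE[ref_codon]}{residue}{CODON_TABLE[alt_codon]}"


def opposite_strand_change(ref_base: str, alt_base: str) -> str:
    """The same substitution as written on the complementary strand."""
    return f"{COMPLEMENT[ref_base]}>{COMPLEMENT[alt_base]}"


print(protein_change("GGT", 1, "A", 12))   # G12D
print(opposite_strand_change("G", "A"))    # C>T
```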
Cancer Genome Report: APGI-1959
Cancer Research Program