2012-ICGC-Heidelberg-Whitty-DCC 2
Brett Whitty, DCC Curation Group
7th International Workshop
Heidelberg, Germany
International Cancer Genome Consortium
Data Coordination Center Update
DCC Updates in 2012
August 2012 • ICGC 9: first data release from the Canadian Pediatric Medulloblastoma (MAGIC) project; full update of 18 cancer types from the U.S. TCGA project, including 6 new databases; update of the German Pediatric Brain Tumour (PedBrain) project. Added 3,029 donors.
November 2012 • ICGC 10: two new cancer types from the U.S. TCGA project as well as updates to 18 other TCGA projects; updates from the Spanish Chronic Lymphocytic Leukemia project, including new methylation data; updates to the U.K. Breast Carcinoma and Chronic Myeloid Disorders project databases. Added 432 donors.
December 2012 • ICGC 11: first data release from 4 projects (German Malignant Lymphoma, Canadian Prostate Cancer, German Prostate Cancer, U.K. Prostate Cancer); new data from an additional 4 projects (Australian Pancreatic Cancer, Canadian Pancreatic Cancer, Japanese Liver Cancer, German Pediatric Brain Tumour (PedBrain)). Added 336 donors.
ICGC dataset version 11, December 2012
• Cancer types: 42 (including TCGA, TSP, JHU)
• Donors: 7,358 (14,645 specimens)
• Simple somatic mutations: 3,761,508
• Copy number mutations: 25,227,388
• Structural rearrangements: 2,079
• Genes affected* by simple somatic mutations: 19,901
• Genes affected* by non-synonymous coding mutations: 19,675
• Genes affected* by copy number mutations: 20,109
• Genes affected* by structural rearrangements: 1,060
• Open tier and controlled data currently available
* out of 21,420 protein coding genes annotated in Ensembl Human release 66
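The "genes affected" counts are easier to compare as a share of the 21,420 protein coding genes named in the footnote. A minimal sketch of that arithmetic follows; the counts are copied from the slide, while the function and variable names are illustrative and not part of any DCC tooling.

```python
# Counts copied from the ICGC release 11 summary slide.
PROTEIN_CODING_GENES = 21_420  # Ensembl Human release 66, per the footnote

genes_affected = {
    "simple somatic mutations": 19_901,
    "non-synonymous coding mutations": 19_675,
    "copy number mutations": 20_109,
    "structural rearrangements": 1_060,
}


def percent_of_genes(count: int, total: int = PROTEIN_CODING_GENES) -> float:
    """Return `count` as a percentage of `total` annotated genes."""
    return 100.0 * count / total


for label, count in genes_affected.items():
    print(f"{label}: {percent_of_genes(count):.1f}% of protein coding genes")
```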
ICGC DCC Has Received Data for 7,358 Cancer Genomes and Counting
[Figure; legend: Release 7, Release 8, Release 9, Release 10, Release 11]
Total Mutation Observation Counts by Cancer Project (Release 11)
[Figure: SSM, CNSM and STSM observation counts per project; count axis ticks run from 1 to 10,000,000 in powers of ten. Projects on the category axis: Prostate Cancer (OICR, CA); Liver Cancer (RIKEN, JP); Lung Adenocarcinoma (TCGA, US); Liver Cancer (NCC, JP); Lung Squamous Cell Carcinoma (TCGA, US); Uterine Corpus Endometrioid Carcinoma (TCGA, US); Colon Adenocarcinoma (TCGA, US); Ovarian Serous Cystadenocarcinoma (TCGA, US); Kidney Renal Clear Cell Carcinoma (TCGA, US); Malignant Melanoma (WTSI, UK); Bladder Urothelial Carcinoma (TCGA, US); Small Cell Lung Carcinoma (WTSI, UK); Breast Carcinoma (WTSI, UK); Rectum Adenocarcinoma (TCGA, US); Glioblastoma Multiforme (TCGA, US); Gastric Cancer (CCGC, CN); Pediatric Brain Tumors (DKFZ, DE); Breast Invasive Carcinoma (TCGA, US); Acute Myeloid Leukemia (TCGA, US); Prostate Adenocarcinoma (TCGA, US); Chronic Lymphocytic Leukemia (ISC/MICINN, ES); Cervical Squamous Cell Carcinoma (TCGA, US); Pancreatic Cancer (QCMG, AU); Liver Cancer (INCa, FR); Malignant Lymphoma (DKFZ, DE); Pancreatic Cancer (OICR, CA); Myeloproliferative Disorders (WTSI, UK); Head and Neck Squamous Cell Carcinoma (TCGA, US); Stomach Adenocarcinoma (TCGA, US); Thyroid Carcinoma (TCGA, US); Lower Grade Glioma (TCGA, US); Kidney Renal Papillary Cell Carcinoma (TCGA, US); Liver Hepatocellular Carcinoma (TCGA, US); Prostate Cancer (DKFZ, DE); Pediatric Medulloblastoma (BCGSC, CA); Prostate Cancer (WTSI, UK)]
Completeness of Data for Genomic Analysis Types in DCC Datasets (ICGC 11)
[Figure: number of donors with data for each analysis type: Copy Number Alterations, Structural Variation, Gene Expression, miRNA Expression, Simple Somatic Mutations, Splicing Variation, DNA Methylation]
Completeness of Genomic Analysis Data Types in DCC Datasets (2)
[Figure; legend: miRNA Expression, Simple Somatic Mutations, Splicing Variation, DNA Methylation, Copy Number Alterations, Structural Variation, Gene Expression]
Completeness of Genomic Analysis Data Types in DCC Datasets (3)
[Figure; same legend]
Completeness of Genomic Analysis Data Types in DCC Datasets (4)
[Figure; same legend]
Completeness of Genomic Analysis Data Types in DCC Datasets (5)
[Figure; same legend]
Clinical Data Completeness Overview

Donor Data Element                            Average % Complete
donor sex                                     94.8
donor diagnosis icd10                         94.5
donor age at diagnosis                        84.7
donor vital status                            71.2
donor age at last followup                    64.9
donor notes                                   57.7
donor interval of last followup               55.5
disease status last followup                  52.5
donor region of residence                     52.5
donor tumour staging system at diagnosis      49.9
donor tumour stage at diagnosis               33.4
donor survival time                           30.3
donor age at enrollment                       28.4
donor tumour stage at diagnosis supplemental  14.8
donor relapse interval                        5.8
donor relapse type                            4.5

Specimen Data Element                         Average % Complete
specimen type                                 97.7
tumour confirmed                              68.3
specimen storage other                        54.2
specimen notes                                52.4
specimen processing other                     51.7
digital image of stained section              51.6
tumour grade                                  25.0
tumour grading system                         24.4
specimen storage                              22.6
specimen donor treatment type                 21.1
specimen processing                           21.0
tumour histological type                      18.6
tumour stage                                  18.2
tumour stage system                           14.5
specimen type other                           14.4
specimen interval                             10.7
specimen available                            9.4
tumour stage supplemental                     2.3
tumour grade supplemental                     1.1
specimen donor treatment type other           0.9
specimen biobank                              0.0
specimen biobank id                           0.0

Analyzed Sample Data Element                  Average % Complete
analyzed sample type                          95.2
analyzed sample notes                         48.8
analyzed sample type other                    12.4
analyzed sample interval                      4.9
Disclaimer:
A data element was considered “complete” in an individual donor’s clinical data if a non-null value was provided for that data element at least once in the donor record, or in any of the donor-associated specimen and sample records.
Averages were calculated for each field across all donors from all projects.
The intention is only to provide a high-level overview of how “complete” the ICGC release 11 clinical dataset is.
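The completeness rule in the disclaimer can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the DCC's actual code: the record layout (one list of dictionaries per donor, holding the donor record plus its specimen and sample records), the set of values treated as null, and the choice to pool all donors into one percentage are all assumptions made for the example.

```python
from typing import Any, Mapping, Sequence

# Values counted as "null" here; this tuple is an assumption for illustration.
NULL_VALUES = (None, "")

Record = Mapping[str, Any]


def element_is_complete(records: Sequence[Record], element: str) -> bool:
    """True if any of one donor's records has a non-null value for `element`."""
    return any(rec.get(element) not in NULL_VALUES for rec in records)


def percent_complete(donors: Sequence[Sequence[Record]], element: str) -> float:
    """Percentage of donors for which `element` is complete (0.0 if no donors)."""
    if not donors:
        return 0.0
    complete = sum(element_is_complete(records, element) for records in donors)
    return 100.0 * complete / len(donors)


# Toy data, not ICGC values: each inner list is one donor's record followed
# by that donor's specimen and sample records.
toy_donors = [
    [{"donor_sex": "male"}, {"tumour_grade": None}],
    [{"donor_sex": ""}, {"tumour_grade": "2"}, {"tumour_grade": None}],
    [{"donor_sex": "female"}],
    [{"donor_sex": None}],
]

print(percent_complete(toy_donors, "donor_sex"))
print(percent_complete(toy_donors, "tumour_grade"))
```

Because a single non-null value anywhere in a donor's records marks the element complete, the figures say nothing about how many of that donor's specimens or samples carry the value.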
Overview of Clinical Data Completeness (ICGC 10)
ICGC Release 11 Raw Data Availability
Raw Data Availability at EGA by Project and Data Type
Project                             Whole Genome  Exome       Transcriptome  Whole Genome      Whole Genome       Unspecified  Total Project
                                    Sequencing    Sequencing  Sequencing     Expression Array  Methylation Array  Type         Samples
CLL, Spain                          11            227         107            205               171                224          945
Breast Carcinoma, UK                173           442         TBD            -                 -                  174          789
Myeloproliferative Disease, UK      6             476         -              -                 -                  -            482
Pediatric Medulloblastoma, Germany  236           -           -              -                 -                  -            236
Pancreatic Cancer, Australia        -             -           -              -                 -                  192          192
Osteosarcoma, UK                    3             140         -              -                 -                  -            143
Liver Cancer, France                -             48          -              -                 -                  -            48
Malignant Lymphoma, Germany         12            12          4 (+TBD)       -                 -                  -            28
Oral Cancer, India                  -             21          -              -                 -                  -            21
Prostate Cancer, Germany            18            -           -              -                 -                  -            18
Prostate Cancer, UK                 4             -           -              -                 -                  -            4
Pancreatic Cancer, Canada           -             2           -              -                 -                  -            2
Pediatric Medulloblastoma, Canada   -             -           -              -                 -                  TBD          -
Total Samples by Type               463           1368        111            205               171                590          2908
# of Samples in Available Datasets by Data Type
Web Usage Overview
DCC Helpdesk
• 110 helpdesk inquiries received at [email protected] since the Cannes meeting
◦ …this doesn’t include requests that arrive direct to my inbox
Some frequent topics of inquiry include:
• Controlled data access
◦ How do I obtain access?
◦ Why am I unable to log into my account?
• Questions related to analysis methods, e.g. how data was normalized
• Questions from ICGC member projects related to data submissions, data encoding, etc.
Key DCC Activities for 2013
• Improved data & metadata curation at EGA; better linking of data held at DCC to ICGC data in other repositories
• Improved data quality/integrity checking through new submission/validation system; review of submission file specifications
• Integration of new data submission system and portal infrastructure with project and user information managed at ICGC.org
Acknowledgements and Thanks
• ICGC DCC software team @ OICR
• ICGC Secretariat Office
• All the great ICGC members!