2012-ICGC-Heidelberg-Whitty-DCC 2

Brett Whitty, DCC Curation Group. 7th International Workshop, Heidelberg, Germany. International Cancer Genome Consortium Data Coordination Center Update

Transcript of 2012-ICGC-Heidelberg-Whitty-DCC 2

Page 1: 2012-ICGC-Heidelberg-Whitty-DCC 2

Brett Whitty, DCC Curation Group

7th International Workshop

Heidelberg, Germany

International Cancer Genome Consortium

Data Coordination Center Update

Page 2: 2012-ICGC-Heidelberg-Whitty-DCC 2

DCC Updates in 2012

August 2012
• ICGC 9: first data release from the Canadian Pediatric Medulloblastoma (MAGIC) project; full update of 18 cancer types from the U.S. TCGA project, including 6 new databases; update of the German Pediatric Brain Tumour (PedBrain) project. Added 3,029 donors.

November 2012
• ICGC 10: two new cancer types from the U.S. TCGA project as well as updates to 18 other TCGA projects; updates from the Spanish Chronic Lymphocytic Leukemia project, including new methylation data; updates to the U.K. Breast Carcinoma and Chronic Myeloid Disorders project databases. Added 432 donors.

December 2012
• ICGC 11: first data release from 4 projects (German Malignant Lymphoma, Canadian Prostate Cancer, German Prostate Cancer, U.K. Prostate Cancer); new data from an additional 4 projects (Australian Pancreatic Cancer, Canadian Pancreatic Cancer, Japanese Liver Cancer, German Pediatric Brain Tumour (PedBrain)). Added 336 donors.

Page 3: 2012-ICGC-Heidelberg-Whitty-DCC 2

ICGC dataset version 11, December 2012

• Cancer types: 42 (including TCGA, TSP, JHU)
• Donors: 7,358 (14,645 specimens)
• Simple somatic mutations: 3,761,508
• Copy number mutations: 25,227,388
• Structural rearrangements: 2,079
• Genes affected* by simple somatic mutations: 19,901
• Genes affected* by non-synonymous coding mutations: 19,675
• Genes affected* by copy number mutations: 20,109
• Genes affected* by structural rearrangements: 1,060
• Open tier and controlled data currently available

*out of 21,420 protein coding genes annotated in Ensembl Human release 66
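To put the footnoted denominator in context, the gene counts on this slide can be expressed as a share of the 21,420 protein coding genes annotated in Ensembl Human release 66. The following is a minimal Python sketch; the dictionary keys and the function name `percent_of_genes` are illustrative only and are not DCC identifiers.

```python
# Gene counts quoted on the slide (ICGC dataset version 11).
PROTEIN_CODING_GENES = 21_420  # Ensembl Human release 66, per the slide footnote

genes_affected = {
    "simple somatic mutations": 19_901,
    "non-synonymous coding mutations": 19_675,
    "copy number mutations": 20_109,
    "structural rearrangements": 1_060,
}


def percent_of_genes(count, total=PROTEIN_CODING_GENES):
    """Return `count` as a percentage of the annotated protein coding genes."""
    return 100.0 * count / total


for label, count in genes_affected.items():
    print(f"{label}: {percent_of_genes(count):.1f}% of protein coding genes")
```

By this arithmetic, roughly nine in ten annotated protein coding genes carry at least one simple somatic or copy number mutation in the release, while only about one in twenty is affected by a structural rearrangement.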

Page 4: 2012-ICGC-Heidelberg-Whitty-DCC 2

ICGC DCC Has Received Data for 7,358 Cancer Genomes and Counting

[Figure legend: Release 7, Release 8, Release 9, Release 10, Release 11]

Page 5: 2012-ICGC-Heidelberg-Whitty-DCC 2
Page 6: 2012-ICGC-Heidelberg-Whitty-DCC 2

Total Mutation Observation Counts by Cancer Project (Release 11)

[Figure: chart of SSM, CNSM, and STSM observation counts per cancer project; the count axis runs from 1 to 10,000,000 in powers of ten. Projects shown, in order of appearance: Prostate Cancer (OICR, CA); Liver Cancer (RIKEN, JP); Lung Adenocarcinoma (TCGA, US); Liver Cancer (NCC, JP); Lung Squamous Cell Carcinoma (TCGA, US); Uterine Corpus Endometrioid Carcinoma (TCGA, US); Colon Adenocarcinoma (TCGA, US); Ovarian Serous Cystadenocarcinoma (TCGA, US); Kidney Renal Clear Cell Carcinoma (TCGA, US); Malignant Melanoma (WTSI, UK); Bladder Urothelial Carcinoma (TCGA, US); Small Cell Lung Carcinoma (WTSI, UK); Breast Carcinoma (WTSI, UK); Rectum Adenocarcinoma (TCGA, US); Glioblastoma Multiforme (TCGA, US); Gastric Cancer (CCGC, CN); Pediatric Brain Tumors (DKFZ, DE); Breast Invasive Carcinoma (TCGA, US); Acute Myeloid Leukemia (TCGA, US); Prostate Adenocarcinoma (TCGA, US); Chronic Lymphocytic Leukemia (ISC/MICINN, ES); Cervical Squamous Cell Carcinoma (TCGA, US); Pancreatic Cancer (QCMG, AU); Liver Cancer (INCa, FR); Malignant Lymphoma (DKFZ, DE); Pancreatic Cancer (OICR, CA); Myeloproliferative Disorders (WTSI, UK); Head and Neck Squamous Cell Carcinoma (TCGA, US); Stomach Adenocarcinoma (TCGA, US); Thyroid Carcinoma (TCGA, US); Lower Grade Glioma (TCGA, US); Kidney Renal Papillary Cell Carcinoma (TCGA, US); Liver Hepatocellular Carcinoma (TCGA, US); Prostate Cancer (DKFZ, DE); Pediatric Medulloblastoma (BCGSC, CA); Prostate Cancer (WTSI, UK).]

Page 7: 2012-ICGC-Heidelberg-Whitty-DCC 2

Completeness of Data for Genomic Analysis Types in DCC Datasets (ICGC 11)

[Figure: number of donors (# Donors) with data for each analysis type: Copy Number Alterations, Structural Variation, Gene Expression, miRNA Expression, Simple Somatic Mutations, Splicing Variation, DNA Methylation]

Page 8: 2012-ICGC-Heidelberg-Whitty-DCC 2

Completeness of Genomic Analysis Data Types in DCC Datasets (2)

[Figure legend: miRNA Expression, Simple Somatic Mutations, Splicing Variation, DNA Methylation, Copy Number Alterations, Structural Variation, Gene Expression]

Page 9: 2012-ICGC-Heidelberg-Whitty-DCC 2

Completeness of Genomic Analysis Data Types in DCC Datasets (3)

[Figure legend: miRNA Expression, Simple Somatic Mutations, Splicing Variation, DNA Methylation, Copy Number Alterations, Structural Variation, Gene Expression]

Page 10: 2012-ICGC-Heidelberg-Whitty-DCC 2

Completeness of Genomic Analysis Data Types in DCC Datasets (4)

[Figure legend: miRNA Expression, Simple Somatic Mutations, Splicing Variation, DNA Methylation, Copy Number Alterations, Structural Variation, Gene Expression]

Page 11: 2012-ICGC-Heidelberg-Whitty-DCC 2

Completeness of Genomic Analysis Data Types in DCC Datasets (5)

[Figure legend: miRNA Expression, Simple Somatic Mutations, Splicing Variation, DNA Methylation, Copy Number Alterations, Structural Variation, Gene Expression]

Page 12: 2012-ICGC-Heidelberg-Whitty-DCC 2


Page 13: 2012-ICGC-Heidelberg-Whitty-DCC 2

Clinical Data Completeness Overview

| Donor Data Element | Average % Complete |
|---|---|
| donor sex | 94.8 |
| donor diagnosis icd10 | 94.5 |
| donor age at diagnosis | 84.7 |
| donor vital status | 71.2 |
| donor age at last followup | 64.9 |
| donor notes | 57.7 |
| donor interval of last followup | 55.5 |
| disease status last followup | 52.5 |
| donor region of residence | 52.5 |
| donor tumour staging system at diagnosis | 49.9 |
| donor tumour stage at diagnosis | 33.4 |
| donor survival time | 30.3 |
| donor age at enrollment | 28.4 |
| donor tumour stage at diagnosis supplemental | 14.8 |
| donor relapse interval | 5.8 |
| donor relapse type | 4.5 |

| Specimen Data Element | Average % Complete |
|---|---|
| specimen type | 97.7 |
| tumour confirmed | 68.3 |
| specimen storage other | 54.2 |
| specimen notes | 52.4 |
| specimen processing other | 51.7 |
| digital image of stained section | 51.6 |
| tumour grade | 25.0 |
| tumour grading system | 24.4 |
| specimen storage | 22.6 |
| specimen donor treatment type | 21.1 |
| specimen processing | 21.0 |
| tumour histological type | 18.6 |
| tumour stage | 18.2 |
| tumour stage system | 14.5 |
| specimen type other | 14.4 |
| specimen interval | 10.7 |
| specimen available | 9.4 |
| tumour stage supplemental | 2.3 |
| tumour grade supplemental | 1.1 |
| specimen donor treatment type other | 0.9 |
| specimen biobank | 0.0 |
| specimen biobank id | 0.0 |

| Analyzed Sample Data Element | Average % Complete |
|---|---|
| analyzed sample type | 95.2 |
| analyzed sample notes | 48.8 |
| analyzed sample type other | 12.4 |
| analyzed sample interval | 4.9 |

Disclaimer:

A data element was considered “complete” in an individual donor’s clinical data if a non-null value was provided for that data element at least once in the donor record, or in any of the donor-associated specimen and sample records.

Averages were calculated for each field across all donors from all projects.

The intention is only to provide a high-level overview of how “complete” the ICGC release 11 clinical dataset is.
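The completeness rule described in the disclaimer can be sketched in a few lines of Python. This is an illustration only, not the DCC's actual code: the record layout (a donor dictionary with nested `specimens` and `samples` lists), the underscore field names, the treatment of empty strings as null, and the function names are all assumptions made for the example. It also assumes that "average" means the pooled share of donors across all projects, as the disclaimer suggests.

```python
def is_non_null(value):
    """Treat None and empty or whitespace-only strings as null (an assumption)."""
    return value is not None and str(value).strip() != ""


def field_complete(donor, field):
    """True if `field` has a non-null value at least once in the donor record
    or in any of the donor's specimen or sample records."""
    records = [donor]
    for specimen in donor.get("specimens", []):
        records.append(specimen)
        records.extend(specimen.get("samples", []))
    return any(is_non_null(record.get(field)) for record in records)


def percent_complete(donors, field):
    """Percentage of donors, pooled across all projects, with `field` complete."""
    if not donors:
        return 0.0
    complete = sum(field_complete(donor, field) for donor in donors)
    return 100.0 * complete / len(donors)


# Hypothetical records, for illustration only.
donors = [
    {"donor_sex": "female",
     "specimens": [{"specimen_type": "tumour", "samples": [{}]}]},
    {"donor_sex": None,
     "specimens": [{"specimen_type": "", "samples": []}]},
]

print(percent_complete(donors, "donor_sex"))
print(percent_complete(donors, "specimen_type"))
```

Because a single non-null value anywhere in a donor's records marks the field complete, this measure is generous: a donor with ten specimens counts as complete for a specimen field even if only one of them is annotated.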

Page 14: 2012-ICGC-Heidelberg-Whitty-DCC 2


Overview of Clinical Data Completeness (ICGC 10)

Page 15: 2012-ICGC-Heidelberg-Whitty-DCC 2

ICGC Release 11 Raw Data Availability

Page 16: 2012-ICGC-Heidelberg-Whitty-DCC 2

Raw Data Availability at EGA by Project and Data Type

| Project | Whole Genome Sequencing | Exome Sequencing | Transcriptome Sequencing | Whole Genome Expression Array | Whole Genome Methylation Array | Unspecified Type | Total Project Samples |
|---|---|---|---|---|---|---|---|
| CLL, Spain | 11 | 227 | 107 | 205 | 171 | 224 | 945 |
| Breast Carcinoma, UK | 173 | 442 | TBD | - | - | 174 | 789 |
| Myeloproliferative Disease, UK | 6 | 476 | - | - | - | - | 482 |
| Pediatric Medulloblastoma, Germany | 236 | - | - | - | - | - | 236 |
| Pancreatic Cancer, Australia | - | - | - | - | - | 192 | 192 |
| Osteosarcoma, UK | 3 | 140 | - | - | - | - | 143 |
| Liver Cancer, France | - | 48 | - | - | - | - | 48 |
| Malignant Lymphoma, Germany | 12 | 12 | 4 (+TBD) | - | - | - | 28 |
| Oral Cancer, India | - | 21 | - | - | - | - | 21 |
| Prostate Cancer, Germany | 18 | - | - | - | - | - | 18 |
| Prostate Cancer, UK | 4 | - | - | - | - | - | 4 |
| Pancreatic Cancer, Canada | - | 2 | - | - | - | - | 2 |
| Pediatric Medulloblastoma, Canada | - | - | - | - | - | TBD | - |
| Total Samples by Type | 463 | 1368 | 111 | 205 | 171 | 590 | 2908 |

# of Samples in Available Datasets by Data Type
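The per-type and grand totals in the table can be reproduced with a short tally. The sketch below copies the counts from the slide; treating "-" and "TBD" entries as zero is an assumption made for the arithmetic, and the variable names are illustrative.

```python
# Column order: whole genome sequencing, exome sequencing, transcriptome
# sequencing, expression array, methylation array, unspecified type.
# "-" and "TBD" cells are entered as 0 (an assumption for the tally).
samples = {
    "CLL, Spain": (11, 227, 107, 205, 171, 224),
    "Breast Carcinoma, UK": (173, 442, 0, 0, 0, 174),
    "Myeloproliferative Disease, UK": (6, 476, 0, 0, 0, 0),
    "Pediatric Medulloblastoma, Germany": (236, 0, 0, 0, 0, 0),
    "Pancreatic Cancer, Australia": (0, 0, 0, 0, 0, 192),
    "Osteosarcoma, UK": (3, 140, 0, 0, 0, 0),
    "Liver Cancer, France": (0, 48, 0, 0, 0, 0),
    "Malignant Lymphoma, Germany": (12, 12, 4, 0, 0, 0),
    "Oral Cancer, India": (0, 21, 0, 0, 0, 0),
    "Prostate Cancer, Germany": (18, 0, 0, 0, 0, 0),
    "Prostate Cancer, UK": (4, 0, 0, 0, 0, 0),
    "Pancreatic Cancer, Canada": (0, 2, 0, 0, 0, 0),
    "Pediatric Medulloblastoma, Canada": (0, 0, 0, 0, 0, 0),
}

# Sum down each column for totals by data type, and across each row
# for totals by project.
totals_by_type = [sum(column) for column in zip(*samples.values())]
totals_by_project = {project: sum(counts) for project, counts in samples.items()}
grand_total = sum(totals_by_type)

print(totals_by_type)
print(grand_total)
```

The tally matches the totals row on the slide, which indicates that the pending (TBD) datasets are not yet counted in the published figures.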

Page 17: 2012-ICGC-Heidelberg-Whitty-DCC 2

Web Usage Overview

Page 18: 2012-ICGC-Heidelberg-Whitty-DCC 2


Page 19: 2012-ICGC-Heidelberg-Whitty-DCC 2

DCC Helpdesk

• 110 helpdesk inquiries received at [email protected] since the Cannes meeting
  ◦ …this doesn’t include requests that arrive direct to my inbox

Some frequent topics of enquiry include:

• Controlled data access
  ◦ How do I obtain access?
  ◦ Why am I unable to log into my account?

• Questions related to analysis methods, e.g. how data was normalized

• Questions from ICGC member projects related to data submissions, data encoding, etc.

Page 20: 2012-ICGC-Heidelberg-Whitty-DCC 2

Key DCC Activities for 2013

• Improved data & metadata curation at EGA; better linking of data held at DCC to ICGC data in other repositories

• Improved data quality/integrity checking through new submission/validation system; review of submission file specifications

• Integration of new data submission system and portal infrastructure with project and user information managed at ICGC.org

Page 21: 2012-ICGC-Heidelberg-Whitty-DCC 2

Acknowledgements and Thanks

• ICGC DCC software team @ OICR

• ICGC Secretariat Office

• All the great ICGC members!