NGS, Cancer and Bioinformatics (rssf.i2bc.paris-saclay.fr/transfert/M2CANCERO/NGS...)

20/10/15 Yannick Boursin NGS, Cancer and Bioinformatics

NGS and Clinical Oncology

• NGS in hereditary cancer genome testing
  • BRCA1/2 (breast/ovarian cancer)
  • XPC (melanoma)
  • ERCC1 (colorectal cancer)

• NGS for personalized cancer treatment
  • Clinical trials: MOSCATO (GR), SAFIR (GR), SHIVA (Curie), …
  • Ipilimumab (anti-CTLA4), Nivolumab (anti-PD1), Trastuzumab (anti-HER2), Cetuximab (anti-EGFR)

• Detection of chimeric transcripts
  • Chronic Myeloid Leukemia: Philadelphia chromosome (BCR/ABL)
  • Non-Small-Cell Lung Cancer: EML4-ALK

NGS and Oncology

NGS is now widely used as:
• A research tool to screen a large number of cancer samples
• A clinical/diagnostic tool in daily practice

These projects require dedicated bioinformatics integration to access and analyse this huge amount of data.

07-09th April 2014 NGS and Bioinformatics

Why do we need computers for NGS?

[Figure: sequencing data size evolution]

Needs to address:
• Store petabytes of data (1 PB is 1000 TB).
• Share data around the world through networks
• Analyze huge amounts of data with complex algorithms

Bioinformatics and Oncology

• Problem: finding, extracting, and presenting relevant information.
• Partial solution: designing workflows in order to ease data analysis.

Interdisciplinary collaboration

Bioinformatics acts as a hub between the different fields. Trust between partners is needed, and training is needed as well for efficient understanding.

[Diagram labels]
• Biology knowledge; Knowledge modeling
• Technological platforms: Sequencing, Microarrays, ImmunoChemistry, …
• Bioinformatics: Raw data storage; Integration of biological and clinical data; Quality Control; Data analysis; Clinical Biostatistics; Report for biological/medical staff
• Medical staff: Clinicians, specialists, …
• Biological staff: Biologists, Geneticists, …

Standard Workflow for NGS Analysis

A typical NGS workflow:
Sequencing & Primary Analysis → Raw Reads → Reads Cleaning → Reads Mapping → Data Analysis (depends on the NGS application)
Quality-control checkpoints: QC 1, QC 2, QC 3

Step 1: Quality Check and improvements

NGS Data: what do they look like?

A raw data file (.fastq, .sff, .fa, .csfasta/.qual) with millions of short reads of the same size (SOLiD, HiSeq) or reads of different sizes (Ion PGM/Proton)

[Figure: enhanced view of the reads in a FASTQ file]

FASTQ format

FASTQ format (base-space)
• 1 sequence = 1 read = 4 lines in the file
• First line = sequence identifier

FASTQ format

FASTQ format (base-space)
• Fourth line = quality
• ASCII encoded (reduces the file size)
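The four-line layout described above can be sketched in Python. This is a minimal illustration, not a production parser: it assumes the usual FASTQ convention that the second line holds the sequence and the third line is a `+` separator (the slides only name lines 1 and 4), and the names `FastqRecord` and `read_fastq` are made up for this example.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    identifier: str  # line 1, without the leading '@'
    sequence: str    # line 2 (assumed layout)
    quality: str     # line 4, ASCII-encoded qualities


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield one record for every group of four lines."""
    while True:
        header = handle.readline()
        if header == "":          # end of file
            return
        if header.strip() == "":  # tolerate blank lines between records
            continue
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError("malformed FASTQ record")
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lengths differ")
        yield FastqRecord(header[1:].strip(), sequence, quality)


# Hypothetical one-read file held in memory.
example = io.StringIO("@read_1\nACTGATTAGT\n+\nIIIIIHHGF#\n")
records = list(read_fastq(example))
```

Reading record by record keeps memory use constant, which matters when a file holds millions of reads.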

Sequence quality encoding

Phred scores Q: Q scores are defined as a property that is logarithmically related to the base-calling error probability (P).

Q = -10 log10(P)
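As a worked illustration of the formula above, the sketch below converts between Q and P and decodes an ASCII quality string. The offset of 33 is an assumption (it is the most common encoding, but older files may use another offset), and the function names are illustrative only.

```python
import math

PHRED_OFFSET = 33  # assumption: "Phred+33" ASCII encoding


def error_probability(q: float) -> float:
    """Invert Q = -10 log10(P): P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


def phred_score(p: float) -> float:
    """Q = -10 log10(P)."""
    return -10 * math.log10(p)


def decode_quality(quality: str, offset: int = PHRED_OFFSET) -> list[int]:
    """Turn the ASCII-encoded fourth FASTQ line into integer Q scores."""
    return [ord(ch) - offset for ch in quality]


scores = decode_quality("I5+")  # one character per base
```

With this encoding, Q = 20 corresponds to an error probability of 1 in 100, and Q = 30 to 1 in 1000.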

Quality controls on raw reads: let's start after sequencing

A raw data file (.fastq, .sff, .fa, .csfasta/.qual) with millions of short reads of the same size (SOLiD, HiSeq) or reads of different sizes (Ion PGM/Proton)

Raw reads:
ACTGATTAGTCTGAATTAGANNGATAGGAT
GATCGATGCATAGCGATCAGCATCGATACG
CGGCGCTCCGCTCTCGAAACTAGCACTGAC
AGCATCAGGATCTACGATCTAGCGAACTGAC
ACTAGCTACTATCGAGCGAGCGATCATCGAC
ACTAGGCATCGGCATCACGGACNNNNNNNN
ACTAGCTATCGAGCTATCAGCGAGCATCTATC
CTGACTACTATCGAGCGAGCTACTAACTGAC
ACTACTTACGACATCGAGGTTAGGAGCATCA
ACTANNGACTAGGAATTAGCTACTGAGCTAC
ACTAGCAGCTATATGAGCTACTAGCACTGAC
ACTATCAGCTAGCGCTTCAGCATTACCGT
NNNNNNNNNNNNNNNNNNNNNNNNNNNNN

A raw read is characterized by three parameters:
• Its length
• Its sequence
• Per-base-in-sequence quality

Why look at sequencing quality?

• Quality of data is very important for various downstream analyses:
  • Sequence assembly or mapping
  • Variant detection
  • Gene expression studies
  • ...

• Quality of data = poor
  • Try to find a reason
  • Can we correct/improve the quality?
  • May lead to erroneous conclusions

Quality controls on raw reads: which metrics to check?

Mainly:
• Quality score per base and over the reads

But also:
• Read length distribution
• Sequence content per base and % of GC
• K-mer content
• Overrepresented sequences
• Duplicated reads
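A few of the metrics listed above are simple enough to sketch directly. The functions below are an illustrative approximation, not what any particular QC tool computes: GC% ignores uncalled bases (N), and a read counts as duplicated only if its sequence is exactly identical to an earlier one.

```python
from collections import Counter


def read_length_distribution(reads):
    """Map each read length to the number of reads having that length."""
    return Counter(len(read) for read in reads)


def gc_percent(reads):
    """Percentage of G and C among called bases (N is ignored)."""
    bases = "".join(reads).upper()
    called = sum(bases.count(base) for base in "ACGT")
    if called == 0:
        return 0.0
    return 100.0 * (bases.count("G") + bases.count("C")) / called


def duplicated_fraction(reads):
    """Fraction of reads that repeat an already-seen sequence."""
    if not reads:
        return 0.0
    counts = Counter(reads)
    duplicates = sum(n - 1 for n in counts.values())
    return duplicates / len(reads)


# Hypothetical toy reads.
reads = ["ACGT", "ACGT", "GGCC", "ATNN"]
lengths = read_length_distribution(reads)
```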

Quality scores

• Per base (box-and-whisker plot) -> to see whether base calls fall into low quality (commonly towards the end of a read)
• Per sequence (mean quality distribution) -> to see if a subset of your sequences has universally low quality values
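Both views can be approximated in a few lines. The sketch below reports only the mean at each position rather than the full box-and-whisker summary, and it assumes Phred+33 encoded quality strings; the function names are illustrative.

```python
from statistics import mean

PHRED_OFFSET = 33  # assumption: Phred+33 encoding


def per_base_mean_quality(qualities, offset=PHRED_OFFSET):
    """Mean Q at each read position, over the reads long enough to reach it."""
    longest = max((len(q) for q in qualities), default=0)
    means = []
    for i in range(longest):
        column = [ord(q[i]) - offset for q in qualities if len(q) > i]
        means.append(mean(column))
    return means


def per_sequence_mean_quality(qualities, offset=PHRED_OFFSET):
    """Mean Q of each non-empty read."""
    return [mean([ord(ch) - offset for ch in q]) for q in qualities if q]


# Hypothetical quality strings of different lengths.
qualities = ["II5", "I5+", "I+"]
by_position = per_base_mean_quality(qualities)
by_read = per_sequence_mean_quality(qualities)
```

A falling `by_position` curve towards the end of the reads is the pattern the per-base plot is meant to reveal.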

Quality scores

[Figure: quality score plots for PGM run A and PGM run B]

Quality scores

[Figure: quality score plots for Illumina run C and Illumina run D]

Quality control on raw reads: adapter removal

• An adapter is a small piece of known DNA located at the end of the reads
• Adapter roles:
  • Attach the read to the sequencer flowcell
  • Allow a specific PCR enrichment of reads carrying an adapter
  • Used in multiplex sequencing (samples in a mix)
• Available tools to trim adapters:
  • Cutadapt
  • SeqPrep
  • RmAdapter

[Figure] In blue: adapters. In orange: informative part of the read.
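To make the idea of trimming concrete, here is a deliberately naive sketch. It only handles an exact adapter match at the 3' end, complete or truncated; the tools listed above also tolerate sequencing errors, which this example does not attempt. The adapter sequence used here is a made-up placeholder.

```python
def trim_adapter(read: str, adapter: str, min_overlap: int = 3) -> str:
    """Remove a 3' adapter: everything from its first exact occurrence,
    or a truncated adapter prefix sitting at the very end of the read."""
    full = read.find(adapter)
    if full != -1:
        return read[:full]
    # Try progressively shorter adapter prefixes at the read end.
    longest = min(len(adapter), len(read)) - 1
    for overlap in range(longest, min_overlap - 1, -1):
        if read.endswith(adapter[:overlap]):
            return read[:-overlap]
    return read


ADAPTER = "ACTGAC"  # hypothetical adapter, for illustration only

complete = trim_adapter("TTTTGGGGACTGAC", ADAPTER)
partial = trim_adapter("TTTTGGGGACTG", ADAPTER)
untouched = trim_adapter("TTTTGGGG", ADAPTER)
```

The `min_overlap` threshold is a trade-off: a low value removes more adapter remnants but also clips genuine bases that happen to look like the start of the adapter.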

Quality controls on raw reads: let's start after sequencing

A first quality control of raw reads is mandatory and can be established according to the application ('N', adapter sequences, barcode, contamination, etc.)

[Figure: the raw reads shown earlier, colour-coded] Processed reads: blue parts are to be kept, green and red parts are to be removed.
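One of the cleaning steps mentioned above, handling uncalled bases ('N'), could look like the sketch below. The policy is an assumption chosen for illustration: strip N runs at both ends, then discard the read if it is empty or if more than 10% of the remaining bases are N.

```python
def clean_read(read: str, max_n_fraction: float = 0.1):
    """Strip leading/trailing N; return None when the read should be dropped."""
    trimmed = read.strip("N")
    if not trimmed:
        return None  # the read was made of N only
    if trimmed.count("N") / len(trimmed) > max_n_fraction:
        return None  # too many uncalled bases left inside the read
    return trimmed


# Reads taken from the raw-read example shown earlier.
kept = clean_read("ACTGATTAGTCTGAATTAGANNGATAGGAT")
clipped = clean_read("ACTAGGCATCGGCATCACGGACNNNNNNNN")
dropped = clean_read("NNNNNNNNNNNNNNNNNNNNNNNNNNNNN")
```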

Quality controls: Standard Workflow for NGS Analysis

A typical NGS workflow:
Sequencing & Primary Analysis → Raw Reads → Reads Cleaning → Reads Mapping → Data Analysis (depends on the NGS application)
Quality-control checkpoints: QC 1, QC 2, QC 3

Step 2: Short Reads Alignment

Reads alignment - Vocabulary

Alignment (mapping): The reads alignment aims at transforming the single reads information into an organized and reduced set of information.
Mismatch: Discrepancy between two nucleotides
Reference genome: The reference genome is a known sequence, supposed to be as close as possible to the input genome, which is used as an anchor to organize the single reads information.
Gap: Break within the read alignment (i.e. small insertion/deletion)
Mappability: Uniqueness of a region (repeated region = low mappability, unique region = good mappability)
Indels: Insertions/deletions relative to the reference genome

Reads alignment – Two strategies

The reads alignment aims at transforming the single reads information into an organized and reduced set of information. Two strategies can be applied:

- De novo reads assembly
  Used when no reference genome is available. It aims at reconstructing long scaffolds from single reads information.
- Alignment on a reference genome
  The reads are directly compared to a known reference genome.

Alignment on a reference genome

The reference genome is a known sequence, supposed to be as close as possible to the input genome, which is used as an anchor to organize the single reads information.

Reference genome sequence:
ACTACGACTCTACGAGCATCTACGAGCTACTAGCGATCTACGAGCTGCGAGCAACGGCCAAC

[Figure: alignment of reads against the reference genome]
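The principle of anchoring reads on a reference can be illustrated with a brute-force sketch: slide each read along the reference and keep the placement(s) with the fewest mismatches. Real mappers rely on indexes and also handle gaps; this example does neither, and the function names and the mismatch threshold are assumptions made for the illustration.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def best_hits(read: str, reference: str, max_mismatches: int = 1):
    """Return (mismatches, [positions]) for the best ungapped placements,
    or None when no placement is within max_mismatches."""
    best = None
    hits = []
    for pos in range(len(reference) - len(read) + 1):
        mismatches = hamming(read, reference[pos:pos + len(read)])
        if mismatches > max_mismatches:
            continue
        if best is None or mismatches < best:
            best, hits = mismatches, [pos]
        elif mismatches == best:
            hits.append(pos)
    return (best, hits) if hits else None


# First 20 bases of the reference sequence shown on the slide.
REFERENCE = "ACTACGACTCTACGAGCATC"

exact = best_hits("ACGACTC", REFERENCE)
one_mismatch = best_hits("ACGATTC", REFERENCE)
no_hit = best_hits("GGGGGGG", REFERENCE)
```

A position where every overlapping read disagrees with the reference in the same way is what a variant caller would later interpret as a polymorphism.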

Alignment on a reference genome

[Figure: alignment of reads against the reference genome, highlighting a homozygous polymorphism (T/C)]

Alignment on a reference genome - Challenges

New alignment algorithms must address the requirements and characteristics of NGS reads:
– Millions of reads per run (30x genome coverage)
– Reads of different sizes (35 bp - 200 bp)
– Different types of reads (single-end, paired-end, mate-pair, etc.)
– Base-calling quality factors
– Sequencing errors (~1%)
– Repetitive regions
– Sequenced organism vs. reference genome
– Must adjust to evolving sequencing technologies and data formats

Alignment on a reference genome – Bioinformatics tools

[Figure: mappers timeline (since 2001)]
Fonseca N.A. et al. Bioinformatics 2012;28:3169-3177

Finding the best alignment - Rationale

Given a reference and a set of reads, report at least one “good” local alignment for each read if one exists.

What is “good”? For now, we concentrate on:
– Fewer mismatches is better
– Failing to align a low-quality base is better than failing to align a high-quality base

…TGATCATA…
  GATCAA
is better than
…TGATCATA…
  GAGAAT

…TGATATTA…
  GATcaT
is better than
…TGATcaTA…
  GTACAT

Based on a scoring system, i.e. score for a match (1), mismatch (MM) penalty (3), gap open penalty (5), gap extension penalty (2). The best alignment is the one with the highest score.
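The scoring system above can be turned into a short function. This sketch takes two already-aligned strings (with `-` marking gap positions) and uses the values quoted on the slide. One point is an assumption: a gap of length L is charged the opening penalty once plus L times the extension penalty, which is one common convention but not the only one.

```python
def alignment_score(ref_aln: str, read_aln: str, match: int = 1,
                    mismatch: int = 3, gap_open: int = 5,
                    gap_extend: int = 2) -> int:
    """Score a pairwise alignment given as two equal-length strings."""
    if len(ref_aln) != len(read_aln):
        raise ValueError("aligned strings must have the same length")
    score = 0
    previous_gap = None  # which side the current gap is on, if any
    for ref_base, read_base in zip(ref_aln, read_aln):
        if ref_base == "-":
            current_gap = "ref"
        elif read_base == "-":
            current_gap = "read"
        else:
            current_gap = None
        if current_gap is not None:
            if current_gap != previous_gap:
                score -= gap_open    # charged once per gap
            score -= gap_extend      # charged for every gap position
        elif ref_base == read_base:
            score += match
        else:
            score -= mismatch
        previous_gap = current_gap
    return score


# The first pair of candidate alignments from the slide.
one_mismatch = alignment_score("GATCAT", "GATCAA")    # 5 matches, 1 mismatch
two_mismatches = alignment_score("GATCAT", "GAGAAT")  # 4 matches, 2 mismatches
with_gap = alignment_score("GATCAT", "GA--AT")        # 4 matches, one gap of 2
```

Under these values the alignment with one mismatch scores higher than the one with two, which matches the "fewer mismatches is better" rule. Weighting mismatches by base quality, as the second rule suggests, is not covered here.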

Alignment key parameters - Repeats

Approximately 50% of the human genome is comprised of repeats.

Treangen T.J. and Salzberg S.L. 2012. Nature Reviews Genetics 13, 36-46

Alignment key parameters - Repeats

Close proximity with genes: intergenic and intragenic positions

[Figure] BRCA2: a mosaic of repeated regions

Alignment key parameters – Repeats – 3 strategies

-1- Report only unique alignments
-2- Report best alignments and randomly assign reads across equally good loci
-3- Report all (best) alignments

[Figure: a read matching two loci, A and B, under strategies -1-, -2- and -3-]

Treangen T.J. and Salzberg S.L. 2012. Nature Reviews Genetics 13, 36-46
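The three strategies differ only in what is reported once the equally good loci of a read are known. The sketch below assumes that list has already been computed; the function name and the strategy labels ("unique", "random", "all") are invented for the example.

```python
import random


def report_alignments(best_loci, strategy, rng=None):
    """Apply one reporting strategy to the equally good loci of one read."""
    loci = list(best_loci)
    if strategy == "unique":   # -1- keep the read only if it maps to one locus
        return loci if len(loci) == 1 else []
    if strategy == "random":   # -2- pick one of the best loci at random
        if not loci:
            return []
        rng = rng or random.Random()
        return [rng.choice(loci)]
    if strategy == "all":      # -3- report every best locus
        return loci
    raise ValueError(f"unknown strategy: {strategy}")


# A read that matches loci A and B equally well.
unique = report_alignments(["A", "B"], "unique")
picked = report_alignments(["A", "B"], "random", rng=random.Random(0))
everything = report_alignments(["A", "B"], "all")
```

Strategy 1 loses coverage in repeated regions, strategy 2 spreads it arbitrarily, and strategy 3 counts the same read several times, so the choice should follow the downstream analysis.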

Alignment key parameters – Using single or paired-end reads?

The type of sequencing (i.e. single or paired-end reads) is often driven by the application. Example: finding large indels, genomic rearrangements, ... However, in most cases, the pair information can improve the mapping specificity.

- Single-end alignment – repeated sequence
- Paired-end alignment – unique sequence

[Figure: alignment of single-end and paired-end reads against the reference genome]
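How the mate resolves an ambiguous placement can be shown on a toy reference. This sketch makes strong simplifying assumptions: exact matches only, both reads given in the orientation of the reference (real mates are usually reverse-complemented), and an insert-size range chosen arbitrarily for the example.

```python
def find_all(read: str, reference: str):
    """All start positions where the read matches the reference exactly."""
    positions = []
    start = reference.find(read)
    while start != -1:
        positions.append(start)
        start = reference.find(read, start + 1)
    return positions


def place_pair(read1, read2, reference, min_insert, max_insert):
    """Keep (pos1, pos2) placements whose outer distance fits the insert size."""
    placements = []
    for p1 in find_all(read1, reference):
        for p2 in find_all(read2, reference):
            insert = p2 + len(read2) - p1
            if p2 >= p1 and min_insert <= insert <= max_insert:
                placements.append((p1, p2))
    return placements


# Toy reference in which the first read occurs twice (a repeat).
REFERENCE = "ACGACTC" + "TTTT" + "ACGACTC" + "GGAT" + "GGCCAAC"

single_end = find_all("ACGACTC", REFERENCE)
paired_end = place_pair("ACGACTC", "GGCCAAC", REFERENCE, 15, 20)
```

Alone, the first read has two candidate positions; with its mate and the expected insert size, only one placement remains.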

Alignment on a reference genome

Key points
• The alignment is a crucial step of the NGS analysis.
• The reference genome has to be carefully chosen.
• The mappability of the region of interest has to be taken into account (primer design).
• The scoring method has to be chosen according to the sequencing error rate and the quality of the raw reads.
• The alignment parameters have to be set properly.

Limitations of Alignment Tools

Even if we now have some nice tools to align reads on a reference genome, several issues are still important:
- Homopolymer mapping
- Efficiently aligning small indels
- Alignment on several genomes
- Alignment on repeated sequences
- ...

Alignment formats

• A lot of formats exist:
  • SAM
  • BAM
  • ELAND (Illumina specific)
  • MAQ map
  • …

SAM and BAM are now the standard for aligned data

SAM format

• SAM stands for Sequence Alignment/Map
• Tab-delimited text file
• 1 line per read
• Each line is composed of 11 fields (minimum)

SAM format

11695_6 0 chr1 3292760 255 20M * 0 0 AAGAGATCTGGAACCATAGA DGDFCDGFFGBEFFGFDEEF XA:i:0 MD:Z:20 NM:i:0 XX:i:3984
9985_1 0 chr1 3292761 255 19M * 0 0 AGAGATCTGGAACCATAGA IIIIIIIIIIIIIIIIIII XA:i:0 MD:Z:19 NM:i:0 XX:i:3990
4226_1 0 chr1 3296594 255 22M * 0 0 TCTGCAAGGCAAAAGACACTGT GHHHHHGHGHHHGHHHHBHBGG XA:i:0 MD:Z:22 NM:i:0 XX:i:4194
7001_1 0 chr1 3328828 255 20M * 0 0 AAGAAAGAGAACTTCAGACC GGGG+GGGGGGIIIIIBHII XA:i:0 MD:Z:20 NM:i:0 XX:i:2357
1042_1 0 chr1 3334731 255 21M * 0 0 GGGACTCAGCAGAACTTAGGA ?@GGGDGGGG>DDGGGGGGDB XA:i:0 MD:Z:21 NM:i:0 XX:i:1027
14647_1 0 chr1 3334756 255 23M * 0 0 AGTCTGAACAGGTTAGAGGGTGC IIIIIIEGIHIGID<DBDGDBGB XA:i:0 MD:Z:23 NM:i:0 XX:i:1910
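A record such as the ones above can be split into its 11 mandatory fields plus optional tags. The sketch below is illustrative only: it splits on any whitespace because the slide shows spaces, whereas real SAM files are tab-delimited, and the `SamRecord` and `parse_sam_line` names are made up here. For real data a dedicated library is the safer choice.

```python
from typing import NamedTuple


class SamRecord(NamedTuple):
    qname: str   # read name
    flag: int    # bitwise flag
    rname: str   # reference sequence name
    pos: int     # 1-based leftmost mapping position
    mapq: int    # mapping quality
    cigar: str
    rnext: str
    pnext: int
    tlen: int
    seq: str
    qual: str
    tags: dict   # optional fields, e.g. {"NM": 0}


def parse_sam_line(line: str) -> SamRecord:
    """Parse one alignment line (header lines starting with '@' are not handled)."""
    fields = line.split()  # assumption: no field contains whitespace
    if len(fields) < 11:
        raise ValueError("a SAM line needs at least 11 fields")
    tags = {}
    for item in fields[11:]:
        tag, value_type, value = item.split(":", 2)
        tags[tag] = int(value) if value_type == "i" else value
    return SamRecord(fields[0], int(fields[1]), fields[2], int(fields[3]),
                     int(fields[4]), fields[5], fields[6], int(fields[7]),
                     int(fields[8]), fields[9], fields[10], tags)


record = parse_sam_line(
    "11695_6 0 chr1 3292760 255 20M * 0 0 AAGAGATCTGGAACCATAGA "
    "DGDFCDGFFGBEFFGFDEEF XA:i:0 MD:Z:20 NM:i:0 XX:i:3984"
)
```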

SAM format

• The second field (FLAG) can be used to quickly filter the file
  • With Samtools (command line) and the -f and -F options
• Useful webpage:
  • http://broadinstitute.github.io/picard/explain-flags.html
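The flag is a sum of bits, each answering one yes/no question about the read. The sketch below decodes a flag and mimics the filtering logic of the two Samtools options: `-f` keeps records that have all the requested bits set, `-F` drops records that have any of the excluded bits set. The bit values come from the SAM specification; the short names and the helper functions are invented for this example.

```python
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def explain_flag(flag: int):
    """List the properties encoded in a SAM flag."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


def passes_filter(flag: int, require: int = 0, exclude: int = 0) -> bool:
    """Same logic as 'samtools view -f require -F exclude'."""
    return (flag & require) == require and (flag & exclude) == 0


properties = explain_flag(99)
keep_mapped = passes_filter(0, exclude=0x4)    # like -F 4: mapped reads only
drop_unmapped = passes_filter(4, exclude=0x4)
```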

BAM format

• BAM stands for Binary Alignment/Map
• Corresponds to the SAM format compressed as BGZF
• Reduces the size of the alignment file by a factor of 5
• Not directly readable, unlike the SAM format
  • Requires Samtools
• Best format for sharing alignment files
• Coupled with an index file (BAI)
  • Avoids a sequential read of the complete file

Quality controls on aligned data: Standard Workflow for NGS Analysis

A typical NGS workflow:
Sequencing & Primary Analysis → Raw Reads → Reads Cleaning → Reads Mapping → Data Analysis (depends on the NGS application)
Quality-control checkpoints: QC 1, QC 2, QC 3

QC 3: Which metrics to check?

In practice, how do I validate my alignment?
• Be aware of the mapping strategy used
• Look at simple descriptive statistics:
  – Number of aligned reads
  – Coverage/Depth
  – Mapping quality
  – Number of normal/abnormal pairs for paired-end data
  – Strand bias
  – ...
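Depth and coverage are the easiest of these statistics to compute by hand. The sketch below assumes ungapped alignments given as (start, length) pairs with 0-based starts on a single reference; real tools work from the CIGAR string instead. The function names are illustrative.

```python
def depth_per_position(alignments, region_length: int):
    """Count how many alignments cover each position of the region."""
    depth = [0] * region_length
    for start, length in alignments:
        for i in range(max(start, 0), min(start + length, region_length)):
            depth[i] += 1
    return depth


def summarize(depth):
    """Mean depth and breadth of coverage (fraction of positions covered)."""
    covered = sum(1 for d in depth if d > 0)
    return {
        "mean_depth": sum(depth) / len(depth),
        "breadth": covered / len(depth),
    }


# Two hypothetical 4-base alignments on a 10-base region.
depth = depth_per_position([(0, 4), (2, 4)], region_length=10)
summary = summarize(depth)
```

Looking at both numbers matters: a good mean depth can hide regions with no coverage at all.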

Paired-end mapping

• Insert-size checking
• % of "All Good" = both reads in the pair have aligned
  • "the pair is properly aligned", meaning that they mapped within a proper distance from each other
• % of "All Bad" = neither the read nor its mate mapped
• % of "Only one read maps" = only one read in a pair is mapped
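These categories can be derived from the SAM flag of one read per pair. The sketch below uses the standard flag bits for "unmapped", "mate unmapped" and "proper pair"; the category labels and function names are assumptions made for this example, and what counts as a "proper" distance is decided by the aligner, not by this code.

```python
from collections import Counter

PROPER_PAIR = 0x2
UNMAPPED = 0x4
MATE_UNMAPPED = 0x8


def classify_pair(flag: int) -> str:
    """Classify a pair from the flag of one of its reads."""
    read_mapped = not flag & UNMAPPED
    mate_mapped = not flag & MATE_UNMAPPED
    if read_mapped and mate_mapped:
        if flag & PROPER_PAIR:
            return "both mapped, proper pair"
        return "both mapped, not proper"
    if not read_mapped and not mate_mapped:
        return "neither mapped"
    return "only one read maps"


def pair_percentages(first_read_flags):
    """Percentage of pairs in each category (one flag per pair)."""
    counts = Counter(classify_pair(flag) for flag in first_read_flags)
    total = sum(counts.values())
    return {category: 100.0 * n / total for category, n in counts.items()}


# Hypothetical flags of the first read of four pairs.
percentages = pair_percentages([99, 99, 77, 73])
```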

NGS Analysis: How can I work with my NGS data?

• Difficult on a personal computer (lack of resources)
  • 1 alignment = 4 processors + 15 GB RAM (to be multiplied by the number of samples)
  • Impossible to open the files in software such as a text editor
• Need a very large storage capacity
• Data backup administration
• Applications server connected to a computing cluster and storage array:
  • Commercial solutions (CLC Bio, NextGENe, ...)
  • Galaxy server: https://galaxy.gustaveroussy.fr/galaxyprod

Data analysis

A typical NGS workflow:
Sequencing & Primary Analysis → Raw Reads → Reads Cleaning → Reads Mapping → Data Analysis (depends on the NGS application)
Quality-control checkpoints: QC 1, QC 2, QC 3

Data Analyses in Cancer

• Chimeric transcript search
• Alternative transcripts study
• Differential expression study
• Methylation study
• Detection of genomic variants
• Detection of copy-number variation

Chimeric transcripts

Do the tumor cells express any chimeric transcript?

History of the bcr-abl fusion

Alternative transcripts

Differential expression

Are there genes that are strongly expressed in one kind of tumor but not in the other? Can we group tumors according to their expression profiles?

Clustering of differential expression in breast tumours.

Methylome

Is there any difference between DNA methylation in tumors and in normal cells?

How does methylation promote cancer?

Detection of copy-number variations

Are there any copy-number alterations (gain or loss of chromosomal regions, amplifications…) that could explain tumorigenesis?

Copy-number variations in cancer. MYC and KRAS are amplified.
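One common way to look for gains and losses is to compare read depth between a tumor and its matched normal sample in fixed genomic windows. The sketch below assumes that approach, which the slides do not detail: it computes a library-size-normalized log2 ratio per window. The ±0.3 cutoff and all function names are illustrative assumptions.

```python
import math


def log2_ratio(tumor_depth: int, normal_depth: int,
               tumor_total: int, normal_total: int) -> float:
    """log2 of tumor vs normal depth in one window, each scaled by library size.

    Assumes non-zero depths and totals; real data needs a rule for empty windows.
    """
    # Written as a single fraction to limit floating-point rounding.
    return math.log2((tumor_depth * normal_total) / (normal_depth * tumor_total))


def call_window(ratio: float, cutoff: float = 0.3) -> str:
    """Label a window from its log2 ratio (cutoff is a hypothetical value)."""
    if ratio > cutoff:
        return "gain"
    if ratio < -cutoff:
        return "loss"
    return "neutral"


# Toy window: four times more normalized reads in the tumor than in the normal.
amplified = log2_ratio(tumor_depth=400, normal_depth=100,
                       tumor_total=1000, normal_total=1000)
```

Dedicated copy-number tools add GC-content correction and segmentation on top of this basic ratio.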

Detection of genomic variants

Are there mutational events that are specific to the tumor genome? Could tumorigenesis be explained by them? Is there any drug targeting those mutations?

Pancreas adenocarcinoma: from normal cells to tumor cells

Limitations: Detection of genomic variants

Between 1.4 and 8.9% of the variants are technology specific

The reasons why a SNP is not detected by one sequencing technology, whereas it is reported by another, can be broadly divided into three categories:

• Issues related to coverage: These can be further subdivided into complete lack of coverage, low coverage (which is not enough to call a SNP based on predefined criteria), and higher-than-expected coverage (based on a model used to separate SNPs from structural variants and assembly errors) at the candidate location.

• Issues with the alternate allele: Most software tools (including SAMtools and Newbler) require observing the alternate allele at least twice or more, before they consider the location as a potential variant. These can be further subdivided into instances where the alternate allele is not seen at all and others, when the alternate allele is not seen a sufficient number of times.

• Issues with the variant calling: These refer to the situations where the alternate allele is seen a requisite number of times, but the SNP is not called due to other reasons. These reasons may include proximity to many other SNPs, proximity to a high quality indel, existence in a non-uniquely alignable region, and a huge deviation from the expected diploid behavior of the sample for the data aligned using BWA. For the reads aligned using Newbler, the reasons include the location being in a non-uniquely alignable region and other alignment errors that arise due to the unique error-profile of the 454 reads.
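The first two categories can be sketched as a simple decision rule on read depth and alternate-allele count. This is only an illustration under stated assumptions: the minimum of two alternate-allele observations comes from the text above, whereas `min_depth`, `max_depth` and the function name are hypothetical and do not reproduce the filters of SAMtools or Newbler.

```python
def classify_site(depth: int, alt_count: int,
                  min_depth: int = 10,    # hypothetical low-coverage cutoff
                  max_depth: int = 200,   # hypothetical high-coverage cutoff
                  min_alt: int = 2) -> str:
    """Explain why a candidate site would, or would not, reach variant calling."""
    # Coverage-related issues.
    if depth == 0:
        return "no coverage"
    if depth < min_depth:
        return "low coverage"
    if depth > max_depth:
        return "higher-than-expected coverage"
    # Alternate-allele issues.
    if alt_count == 0:
        return "alternate allele not seen"
    if alt_count < min_alt:
        return "alternate allele not seen enough times"
    # Remaining sites go on to the caller's own quality filters.
    return "candidate variant"
```

For instance, `classify_site(depth=30, alt_count=1)` falls in the second category, because the alternate allele is observed only once.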

We investigated the alignments at the 439,122 locations that were called as putative variants by using 454 and Illumina sequences, but not using SOLiD sequences (Figure 4a i). We assigned each location to a particular category based on the reason why it was not called a SNP. We found that the variant allele was observed in the SOLiD reads in 64% of these cases, but the SNP was filtered away for various reasons. 27% of the locations were filtered away due to a low SNP quality (defined as the Phred-scaled likelihood that the called genotype is identical to the reference), 18% of them were filtered away due to a low RMS (root mean square) mapping quality (reflecting the limitation of shorter reads) and another 19% were filtered away as the variant allele was not seen enough number of times. Coverage related issues (no coverage, too little coverage or more than expected coverage) were responsible for another 19% of the locations. The alternate allele was not seen at all, despite adequate coverage at the site, for the remaining 17% locations.

For the 71,567 locations that were called using the SOLiD sequences (but not by others), we looked at the alignments for both the 454 dataset and the Illumina datasets. At about 15% of these locations (Figure 4a ii), the alternate allele was seen just once in the 454 dataset and at about another 16% of them, the coverage of 454 reads was not enough to call a SNP. For another 21% of the locations the SNP was not called by Newbler, even though the allele was seen multiple times in the pairwise alignments between the reference and the 454 reads, with most of them being associated with homopolymer errors. On the other hand at 25% of these locations the SNP was seen in the Illumina dataset (Figure 4a iii), but it was filtered away due to a lower SNP quality (15%), or because lower mapping quality (9%). Another 14% of these locations did not have sufficient coverage with Illumina reads to be considered in SNP calling. Considering the locations where both 454 and Illumina had little, no, or higher than expected coverage, and where the alternate allele was seen at least once in either 454 or Illumina dataset as true SNPs, we expect 14,707 of the 71,567 locations to be false-positives for the SOLiD calls.

When we looked at the 47,381 locations that were called a SNP using 454 and SOLiD reads, we found that primary reason (at 60% of the locations) these were not called a SNP with Illumina reads had to do with the coverage (Figure 4b i). 57% of the locations were in regions where the coverage was more than expected (signaling a putative structural variant), whereas there was little of no coverage for the remaining 3%. We used a Poisson distribution with the same mean value to calculate the coverage threshold to filter variants, but this data suggests that a gamma distribution with more weight on more tails is probably a better model for Illumina data. The second largest contributor was low SNP quality (22% of the locations), which is the result of an observed deviation from the expectation that both allele should be seen approximately the same number of times on a heterozygous location.
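The Poisson-based coverage threshold mentioned in this excerpt can be sketched as follows. This is a minimal illustration, not the authors' actual procedure: it assumes the threshold is the smallest depth whose upper-tail probability under a Poisson model falls below a chosen `alpha`, and both `alpha` and the function names are hypothetical.

```python
import math


def poisson_cdf(k: int, mean: float) -> float:
    """P(X <= k) for X ~ Poisson(mean), accumulated term by term."""
    term = math.exp(-mean)  # P(X = 0)
    total = term
    for i in range(1, k + 1):
        term *= mean / i    # P(X = i) from P(X = i - 1)
        total += term
    return min(total, 1.0)


def max_expected_coverage(mean_coverage: float, alpha: float = 0.001) -> int:
    """Smallest depth c such that P(X > c) < alpha under the Poisson model."""
    c = 0
    while 1.0 - poisson_cdf(c, mean_coverage) >= alpha:
        c += 1
    return c


# Sites deeper than this would be flagged as "more than expected" coverage.
threshold = max_expected_coverage(30.0, alpha=0.001)
```

As the excerpt notes, real coverage tends to have heavier tails than the Poisson model, so a threshold derived this way would flag more sites than a heavier-tailed model.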

We found 225,981 locations that were called as putative variants using Illumina reads only. Looking at the alignments for the SOLiD reads at those locations (Figure 4b ii), we found that for 22% of them we saw the alternate allele a sufficient number of times, but it was filtered away either due to low RMS mapping quality or a low SNP quality. Another 16% of the locations were

Figure 3. Venn diagram showing the overlap in the SNP calls made using data from the three sequencing technologies. We display the sizes of each of the seven categories of overlaps among the variant calls in the three technologies. (a) depicts the overlaps when all substitution calls are used, (b) depicts the overlaps when all calls from Illumina and SOLiD are used but only the high-confidence subset of the 454 dataset is used, and (c) depicts the overlaps when only the variants in the uniquely alignable regions of the reference sequence are used. doi:10.1371/journal.pone.0055089.g003

Source: Comparison of Sequencing Platforms, PLOS ONE (www.plosone.org), February 2013, Volume 8, Issue 2, e55089, page 4.

Limitations: Detection of genomic variants

Common genomic variants between different variant callers
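A comparison of this kind can be sketched with plain set operations, under the assumption that each call is identified by its chromosome, position, reference allele and alternate allele. The caller names and the variants below are made-up placeholders, not results from any real tool or sample.

```python
from typing import Dict, Set, Tuple

Variant = Tuple[str, int, str, str]  # (chromosome, position, ref, alt)


def compare_callers(calls: Dict[str, Set[Variant]]):
    """Return the variants shared by all callers and those private to each one."""
    shared_by_all = set.intersection(*calls.values())
    private = {}
    for name, variants in calls.items():
        # Union of everything reported by the other callers.
        others = set().union(*(v for n, v in calls.items() if n != name))
        private[name] = variants - others
    return shared_by_all, private


# Placeholder call sets for three hypothetical callers.
v1 = ("chr1", 1000, "A", "G")
v2 = ("chr1", 2000, "C", "T")
v3 = ("chr2", 3000, "G", "A")
v4 = ("chr3", 4000, "T", "C")
calls = {
    "caller_a": {v1, v2, v3},
    "caller_b": {v1, v2},
    "caller_c": {v1, v4},
}
shared, private = compare_callers(calls)
```

Variants reported by a single caller are often treated with more caution than those found by several callers, although agreement alone does not prove that a call is correct.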

Conclusion

• Nowadays, NGS is widely used in cancer centers in order to categorize cancers and link patients with personalized treatments (Precision Medicine)

• NGS is also used in cancer research, in order to discover new oncogenetic mechanisms, to understand the way a treatment works, to link biological and genetic characters…

• Due to technical and "how-the-universe-works"-related issues, using NGS might not solve your problems. It is important to know that the technique is limited:

  • A) by the question you asked at first. If a cancer cannot be explained by mutational events, it might be explained by other mechanisms. In that case, nothing will be found in the data.

  • B) by technical issues. Sequencers and software are prone to errors. Statistically, there will be at least one error in your analysis. You can often reduce the impact of this limitation by making biological and technical replicates.

Galaxy: a web-based genome analysis platform

29 January 2015, Training: NGS & Cancer, Exome Analyses

• Galaxy is an open-source framework for integrating various computational tools and databases into a cohesive workspace
• https://main.g2.bx.psu.edu/
• A web-based service that provides and integrates many popular tools and resources for comparative genomics
• A completely self-contained application for building your own Galaxy style sites

Galaxy: the instant web-based tool and data resource integration platform

• Open Source downloadable package that can be deployed in individual labs
• Modularized
• Add new tools
• Integrate new data sources
• Easy to plug in your own components
• Straightforward to run your own private Galaxy server

Galaxy: the one-stop shop for genome analysis

• Analyze
  • Retrieve shared data between Galaxy users or upload your own
  • Interactively manipulate genomic data with a comprehensive and expanding best-practices toolset
  • Galaxy is designed to work with many different datatypes.
  • http://wiki.galaxyproject.org/Learn/Datatypes
• Visualize
  • Visual analysis environment of your data, your analysis workflows.
• Publish and Share
  • Results and step-by-step analysis record (Data Libraries and Histories)
  • Customizable pipelines (Workflows)
  • Complete protocols/documentations (Pages)

https://galaxy.gustaveroussy.fr/galaxyprod

Data libraries

• Datasets are accessible from Galaxy or for download.

History

• Histories record all the steps in the process and the settings used.
• Histories can be imported into your session and rerun as is or modified.

Workflows

• Workflows specify the steps in a process (a suite of ordered tools).
• Workflows are analyses that are meant to be run, each time with different user-provided datasets.
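The idea of a workflow as an ordered suite of tools, rerun on different datasets, can be sketched in a few lines. This is a conceptual toy and not the Galaxy API: the step functions, their names and the example reads are all hypothetical.

```python
from typing import Callable, List

Step = Callable[[List[str]], List[str]]


def clean_reads(reads: List[str]) -> List[str]:
    """Toy tool: trim whitespace, upper-case, and drop empty reads."""
    return [r.strip().upper() for r in reads if r.strip()]


def drop_short_reads(reads: List[str]) -> List[str]:
    """Toy tool: keep reads of at least 4 bases (arbitrary cutoff)."""
    return [r for r in reads if len(r) >= 4]


def run_workflow(steps: List[Step], dataset: List[str]) -> List[str]:
    """Apply each step in order, feeding the output of one into the next."""
    for step in steps:
        dataset = step(dataset)
    return dataset


# The same workflow definition is reused on two user-provided datasets.
workflow = [clean_reads, drop_short_reads]
result_a = run_workflow(workflow, [" acgt ", "", "ac", "ggccaa"])
result_b = run_workflow(workflow, ["tt", "aaaaa"])
```

The workflow is defined once and stays unchanged between runs; only the input dataset differs.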

User account

• Galaxy public Main or Test instances
  • An account is not required to access it
  • But if used, the data quota is increased and full functionality across sessions opens up, such as naming, saving, sharing, and publishing Galaxy objects (Histories, Workflows, Datasets, Pages).

• Galaxy @ GR: https://galaxy.gustaveroussy.fr/galaxyprod
  • An account is required to access it
  • Full functionality across sessions opens up, such as naming, saving, sharing, and publishing Galaxy objects (Histories, Workflows, Datasets, Pages).
