
DTL Focus meeting: Using GRCh38 in NGS data analysis

Time slot | Speaker | Subject
12:45-13:00 | | Coffee/tea
13:00-13:20 | Ies Nijman (UMCU) | Welcome & Introduction to GRCh38 (hg20)
13:20-13:40 | Pieter Neerinx (UMCG) | Migration of tools, pipelines to support GRCh38
13:40-14:00 | Pjotr Prins | BWA handling of ALT contigs
14:00-14:10 | | Tea break
14:10-14:30 | Zuotian Tatum (LUMC) | New insights on Differential Gene Expression using GRCh38
14:30-14:50 | Wibowo Arindrarto (LUMC) | Comparison of hg19 and GRCh38 in the study of DUX4 gene
14:50-15:30 | Ies Nijman (UMCU) | Wrap-up and open discussions


GRCh38 / hg20


Human genome build hg20

• Base assembly released 24 December 2013; current release is GRCh38.p2 (8 December 2014)
• 5-7 Mb of sequence added to the primary reference
• Many regions corrected relative to hg19 (patches)
• 261 alternative loci: chromosomal regions with high variability (~66 Mb)
• 128 large unplaced sequence regions
• Human herpesvirus 4 (Epstein-Barr virus, EBV) mapping decoy (171 kb)
• Centromere sequences: gaps are replaced by sequence models of the centromeric repeats
• New mitochondrial sequence: the revised Cambridge Reference Sequence (rCRS) from MITOMAP
• 4 pseudoautosomal regions (PARs)
• This means that coordinates change! Lift-over strategies will not completely solve this.
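Why lift-over is lossy can be shown with a minimal sketch, assuming a simplified chain-like mapping made of ungapped aligned blocks between two builds. The block coordinates and the `lift_position` helper are hypothetical, and this is not how the UCSC liftOver tool is implemented. A position that falls between aligned blocks, because the sequence there was changed or removed in the new build, has no target coordinate.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# Hypothetical aligned blocks between two builds: (source_start, target_start,
# length), 0-based and sorted by source_start. Real chain files hold many more.
Block = Tuple[int, int, int]

def lift_position(blocks: List[Block], pos: int) -> Optional[int]:
    """Map a 0-based source position to the target build, or None if unmapped."""
    starts = [start for start, _, _ in blocks]
    i = bisect_right(starts, pos) - 1  # last block starting at or before pos
    if i < 0:
        return None
    src, tgt, length = blocks[i]
    if pos < src + length:
        return tgt + (pos - src)
    return None  # pos lies in sequence with no aligned counterpart in the target

# Source bases 1000-1499 have no aligned counterpart in the new build.
blocks = [(0, 0, 1000), (1500, 1200, 2000)]
print(lift_position(blocks, 10), lift_position(blocks, 1200), lift_position(blocks, 1600))
# 10 None 1300
```

Any variant that falls in such a gap is simply dropped by lift-over. Re-mapping the reads to the new build avoids this.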


Human genome build hg20

• New gene build now available (20,364 coding genes; 2,101 in alternative loci)
• Only a few calling/annotation tools support hg20 yet (e.g. VEP)
• Ensembl's default genome is now hg20! The latest hg19 site is maintained through an archive link.
• dbSNP locations are available for hg20
• 1000G data will be remapped and recalled (est. Q1/Q2 2015)


Human genome build hg20: challenges and opportunities

• How should these alternative loci be used? In hg19 only a few were present, and they were mostly blissfully ignored.
• Challenge I: mapping strategies and tools need to change
  • In preparation: iBWA, srprism
  • BWA 0.7.12 (29 Dec 2014) supports ALTs in a two-step approach
• Challenge II: variant callers need to be aware of alternative references (and their context)
• Challenge III: how to display these data in genome browsers etc. while maintaining context?
• Challenge IV: nomenclature
• The primary assembly contains all patches and fixes to hg19 and is still a good starting point.


What are these ALT loci?

• Scaffolds that provide an alternate representation of a locus found in the primary reference
• Long regions with clustered variation (e.g. the LRC/KIR region on chr19 and the MHC/HLA loci on chr6)
• Besides different haplotype variants of genes, they also contain genes that are not in the primary assembly (20 protein-coding, ~40 predicted protein-coding, pseudogenes, lincRNAs)
• Mind: NCBI and Ensembl take different approaches to ALTs. NCBI provides the primary chromosomes plus separate ALT loci, while Ensembl builds a complete new ALT chromosome (so it includes sequence identical to the primary assembly).
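The difference between the two representations can be illustrated with a toy sketch. The assumption here is that an ALT scaffold replaces exactly one half-open region `[start, end)` of its primary chromosome. Real ALT-to-primary alignments are more complex. `build_full_alt_chromosome` is a hypothetical helper and is not Ensembl code.

```python
def build_full_alt_chromosome(primary: str, alt_scaffold: str, start: int, end: int) -> str:
    """Return primary with primary[start:end] replaced by the ALT scaffold.

    Toy model of an Ensembl-style full ALT chromosome: everything outside
    [start, end) is copied from the primary chromosome, so those flanks are
    sequence-identical to the primary assembly.
    """
    if not 0 <= start <= end <= len(primary):
        raise ValueError("ALT region must lie within the primary chromosome")
    return primary[:start] + alt_scaffold + primary[end:]

primary_chr = "AAAACCCCGGGGTTTT"
alt = "CCGTACGG"  # toy ALT haplotype for region [4, 12)
full_alt = build_full_alt_chromosome(primary_chr, alt, 4, 12)
print(full_alt)  # AAAACCGTACGGTTTT
```

The flanks are copied verbatim. If both the primary chromosome and such a full ALT chromosome were placed in one aligner index, a read from a flank would match both copies equally well. With the NCBI representation, only the ALT scaffold itself is added.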


Usage scenarios

• I: use the primary reference (top-level chromosomes)
• II: use the primary reference + mapping decoys (Un + EBV)
  • Improves mapping accuracy
  • Only feed the primary reference to the variant caller
• III: use the primary reference + ALT loci + mapping decoys (Un + EBV)
  • Improves mapping accuracy (?)
  • A: only feed the primary reference to the variant caller
  • B: run the variant caller on all loci…
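The three scenarios could translate into contig selection roughly as in the sketch below. It assumes UCSC-style contig names like the chr19_KI270938v1_alt example on the nomenclature slide: `_random` for unlocalized, `chrUn_` for unplaced, `_alt` for ALT loci, and `chrEBV` or `_decoy` for decoys. The function names are hypothetical. Reading option III-B as "primary plus ALT loci" is also an assumption.

```python
import re
from typing import List

# Assumed UCSC-style naming for primary chromosomes: chr1-chr22, chrX, chrY, chrM.
PRIMARY = re.compile(r"chr([1-9]|1[0-9]|2[0-2]|X|Y|M)")

def contig_class(name: str) -> str:
    """Classify a contig name; these naming rules are assumptions."""
    if name == "chrEBV" or name.endswith("_decoy"):
        return "decoy"
    if name.endswith("_alt"):
        return "alt"
    if name.endswith("_random"):
        return "unlocalized"
    if name.startswith("chrUn_"):
        return "unplaced"
    if PRIMARY.fullmatch(name):
        return "primary"
    return "other"

# Contig classes given to the aligner in each usage scenario.
MAPPING = {
    "I": {"primary"},
    "II": {"primary", "unlocalized", "unplaced", "decoy"},
    "III": {"primary", "unlocalized", "unplaced", "decoy", "alt"},
}

def mapping_contigs(names: List[str], scenario: str) -> List[str]:
    """Contigs to include in the aligner index for a scenario."""
    return [n for n in names if contig_class(n) in MAPPING[scenario]]

def calling_contigs(names: List[str], scenario: str, option: str = "A") -> List[str]:
    """Contigs passed to the variant caller.

    I, II and III-A call on the primary reference only; III-B is taken
    here to mean primary plus ALT loci (an assumption).
    """
    wanted = {"primary", "alt"} if (scenario, option) == ("III", "B") else {"primary"}
    return [n for n in names if contig_class(n) in wanted]

names = ["chr1", "chr19", "chrM", "chr1_KI270706v1_random",
         "chrUn_KI270302v1", "chr19_KI270938v1_alt", "chrEBV"]
print(mapping_contigs(names, "II"))
print(calling_contigs(names, "III", "B"))
```

Keeping the decision in one table makes it easy to switch scenarios without touching the rest of a pipeline.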


Adding the mapping decoys

Class | GRCh38_full_plus_analysisset (total bp) | GRCh38_full_analysisset (total bp)
Primary | 3,088,286,401 | 3,088,286,401
Unlocalized | 6,978,808 | 6,978,808
Unplaced | 4,485,509 | 4,485,509
ALT | 109,535,387 | 109,535,387
Decoy | 5,964,345 | 171,823
Total | 3,215,250,450 | 3,209,457,928

Graphs based on 11 HiSeq X Ten WGS samples
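The table's totals can be checked with a few lines of arithmetic. The two analysis sets differ only in the decoy row, so the difference between their totals equals the extra decoy sequence in the "full_plus" set. The 171,823 bp decoy in the plain set matches the 171 kb EBV decoy mentioned earlier. The dictionary keys below simply mirror the table rows.

```python
# Total bp per sequence class, copied from the table.
full_plus = {"Primary": 3_088_286_401, "Unlocalized": 6_978_808,
             "Unplaced": 4_485_509, "ALT": 109_535_387, "Decoy": 5_964_345}
full = dict(full_plus, Decoy=171_823)  # identical except for the decoy row

total_plus = sum(full_plus.values())
total_full = sum(full.values())
extra_decoy = total_plus - total_full
print(total_plus, total_full, extra_decoy)  # 3215250450 3209457928 5792522
```

So the "plus" set adds roughly 5.8 Mb of decoy sequence on top of EBV. By comparison, the ALT loci contribute about 110 Mb.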

GRCh37.p13: improved alignments outside of fix patch regions

[Figure: hs37d5 vs. GRCh37.p13 alignments in regions outside of fix patches. Jason Harris, Personalis, Inc.]


Heng Li: BWA approach to ALT mapping

• ALTs are supported in BWA >v0.7.11 through an additional ID-list file, $ref.alt
• Recommended: use the NCBI NGS analysis sets (3 flavors). Their sequences are slightly modified to facilitate mapping (hard-masked PARs and centromeric regions).

1. The original mapQ of a non-ALT hit is computed across non-ALT hits only. The reported mapQ of an ALT hit is always computed across all hits.
2. An ALT hit is only reported if its score is better than all overlapping non-ALT hits. A reported ALT hit is flagged with 0x800 (supplementary) unless there are no non-ALT hits.
3. The mapQ of a non-ALT hit is reduced to zero if its score is less than 80% (controlled by option -g) of the score of an overlapping ALT hit. In this case, the original mapQ is moved to the om tag.
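The three rules can be modelled in a simplified way for intuition only. This is not BWA's implementation. Assumptions: all hits of a read overlap one another on the read, every non-ALT hit is returned rather than one primary alignment plus supplementaries, and `toy_mapq` is a hypothetical stand-in for BWA's real mapQ calculation. `Hit` and `apply_alt_rules` are illustrative names and are not part of BWA.

```python
from dataclasses import dataclass
from typing import List, Optional

SUPPLEMENTARY = 0x800  # SAM flag bit for supplementary alignments

@dataclass
class Hit:
    contig: str
    score: int
    is_alt: bool
    mapq: int = 0
    flag: int = 0
    om: Optional[int] = None  # original mapQ, kept when rule 3 zeroes mapq
    reported: bool = True

def toy_mapq(score: int, other_scores: List[int]) -> int:
    """Hypothetical mapQ stand-in (NOT BWA's formula): grows with the score gap."""
    second = max(other_scores, default=0)
    return max(0, min(60, 6 * (score - second)))

def apply_alt_rules(hits: List[Hit], g: float = 0.8) -> List[Hit]:
    """Apply the three ALT rules to hits assumed to overlap on the read."""
    non_alt = [h for h in hits if not h.is_alt]
    alt = [h for h in hits if h.is_alt]
    # Rule 1: non-ALT mapQ over non-ALT hits only; ALT mapQ over all hits.
    for h in non_alt:
        h.mapq = toy_mapq(h.score, [o.score for o in non_alt if o is not h])
    for h in alt:
        h.mapq = toy_mapq(h.score, [o.score for o in hits if o is not h])
    # Rule 2: report an ALT hit only if it beats every non-ALT hit, and
    # flag it 0x800 unless there are no non-ALT hits at all.
    for h in alt:
        h.reported = all(h.score > o.score for o in non_alt)
        if h.reported and non_alt:
            h.flag |= SUPPLEMENTARY
    # Rule 3: zero the mapQ of a non-ALT hit scoring below g * an ALT score,
    # moving the original value to om.
    best_alt = max((h.score for h in alt), default=None)
    for h in non_alt:
        if best_alt is not None and h.score < g * best_alt:
            h.om, h.mapq = h.mapq, 0
    return [h for h in hits if h.reported]

hits = apply_alt_rules([Hit("chr19", 100, False), Hit("chr19_KI270938v1_alt", 150, True)])
for h in hits:
    print(h.contig, h.mapq, hex(h.flag), h.om)
```

In this example the ALT hit scores well above the primary hit. The primary hit keeps a high mapQ when only non-ALT hits are considered (rule 1), but rule 3 then zeroes it and stores the original value in `om`.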


Heng Li: BWA approach



Variant calling on ALTs?

• By adding the ALT loci to mapping and calling, we gain more haplotype-aware mappings/calls, but this is not clearly reflected in the VCF
• Adding 'haplotyping' to the VCF format

A. Quinlan, Virginia, GRC workshop 2014


Variant annotation on hg20 / ALTs

• Ensembl VEP
• snpEff
• dbNSFP in the next release (~May)


Nomenclature

chr19_KI270938v1_alt (UCSC style)
CHR_HSCHR19KIR_G248_BA2_HAP_CTG3_1 (Ensembl style)
GenBank: KI270886.1; RefSeq: NT_187640.1

hg38 / GRCh38, not hg20, please…
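Translating between these naming schemes is best done with an official mapping table, such as the NCBI assembly report for GRCh38. As a small illustration, UCSC-style names embed the GenBank accession with `v` in place of the version dot, so that part can be recovered by string parsing. The regular expression and the `ucsc_to_genbank` helper below are hypothetical. They cannot produce RefSeq (NT_) or Ensembl names.

```python
import re
from typing import Optional

# Assumed UCSC pattern: chr<N|X|Y|Un>_<accession>v<version>[_alt|_random|_decoy]
UCSC_SCAFFOLD = re.compile(
    r"chr(?:[0-9]+|X|Y|Un)_([A-Z]+[0-9]+)v([0-9]+)(?:_(alt|random|decoy))?")

def ucsc_to_genbank(name: str) -> Optional[str]:
    """Turn e.g. 'chr19_KI270938v1_alt' into 'KI270938.1'; None if no match."""
    m = UCSC_SCAFFOLD.fullmatch(name)
    if m is None:
        return None  # e.g. plain chromosomes such as 'chr1' carry no accession
    accession, version, _suffix = m.groups()
    return f"{accession}.{version}"

print(ucsc_to_genbank("chr19_KI270938v1_alt"))  # KI270938.1
```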


“Everything is in a state of flux, including the status quo.” (Robert Byrne)

• Even 1.5 years after the release, much remains uncertain about the use of the full build.
• GATK has been remarkably silent on this.
• On March 24th, Ewan Birney and Richard Durbin agreed to build a new reference/analysis set with a more standardized set of chromosomes, ALTs and decoys (pers. comm.).
• Heng Li: “The current BWA-MEM method is just a start. […] We may make changes. It is also possible that we might make breakthrough on the representation of multiple genomes, in which case, we can even get rid of ALT contigs for good.”