Church sfaf13


Page 1: Church sfaf13

Keep Calm and Carry on Sequencing

Deanna M. Church, Staff Scientist, NCBI

@deannachurch

Page 2: Church sfaf13

http://genomereference.org

Valerie Schneider, NCBI

Page 3: Church sfaf13

Photograph: Paul Popper/Popperfoto/Getty Images

Page 4: Church sfaf13
Page 5: Church sfaf13

GRCh38 is coming (September 2013)

Page 6: Church sfaf13
Page 7: Church sfaf13

http://www.ncbi.nlm.nih.gov/variation/tools/1000genomes

Page 8: Church sfaf13
Page 9: Church sfaf13
Page 10: Church sfaf13
Page 11: Church sfaf13

http://www.bioplanet.com/gcat

Page 12: Church sfaf13
Page 13: Church sfaf13

http://genomereference.org

Page 14: Church sfaf13

Dennis et al., 2012

1q32 1q21 1p21

1p21 patch alignment to chromosome 1

Page 15: Church sfaf13

http://www.ncbi.nlm.nih.gov/variation/tools/1000genomes

CDC27

1KG Phase 1 Strict accessibility mask

SNP (all)

SNP (not 1KG)

Page 16: Church sfaf13

Sudmant et al., 2010

Page 17: Church sfaf13

Kidd et al., 2007

APOBEC cluster

Part of chr22 assembly

Alternate locus for chr22

White: insertion; black: deletion

Page 18: Church sfaf13

http://www.ncbi.nlm.nih.gov/variation/tools/1000genomes

Page 19: Church sfaf13

Mouse Ren1 chr1 (CM000994.2/NC_000067.6): 133350674-133360320

129S6/SvEvTac tiling path

Alignment to C57BL/6J chr1

B6 Genes

129S6/SvEvTac Genes

+32 kb in 129S6/SvEvTac

Page 20: Church sfaf13

Mouse Ren1 chr1 (CM000994.2/NC_000067.6): 133350674-133360320

NM_031192.3: transcript from C57BL/6J

NM_031193.2: transcript from FVB/N

129S6/SvEvTac Alt Locus Alignment (allelic)

FVB/N Transcript Alignment (paralog)

Page 21: Church sfaf13

129S6/SvEvTac Ren1

FVB Ren2 Tx

Paralogous diff

SNP + paralogous diff

Mouse Ren1 chr1 (CM000994.2/NC_000067.6): 133350674-133360320

NM_031192.3: transcript from C57BL/6J

NM_031193.2: transcript from FVB/N

Page 22: Church sfaf13

An assembly is a MODEL of the genome

Page 23: Church sfaf13

Assembly Model

Page 24: Church sfaf13
Page 25: Church sfaf13

BAC insert

BAC vector

Shotgun sequence

Assemble

GAPS

Finishing
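In the BAC workflow above, assembling the shotgun reads typically leaves gaps that finishing then closes. A minimal Python sketch, assuming gaps in the assembled sequence are written as runs of N (a common convention, not something this slide specifies), shows how those gaps could be located before finishing; `find_gaps` is a hypothetical helper.

```python
import re
from typing import List, Tuple


def find_gaps(seq: str, min_len: int = 1) -> List[Tuple[int, int]]:
    """Return 0-based, half-open (start, end) intervals for runs of N/n.

    Assumes gaps are encoded as runs of 'N'; min_len filters out short
    runs of ambiguous bases that are not real gaps.
    """
    return [
        (m.start(), m.end())
        for m in re.finditer(r"[Nn]+", seq)
        if m.end() - m.start() >= min_len
    ]


# Toy assembly: three contigs separated by a 5-bp gap and a 2-bp gap.
assembled = "ACGTACGT" + "N" * 5 + "TTGACC" + "NN" + "GGA"
print(find_gaps(assembled))              # [(8, 13), (19, 21)]
print(find_gaps(assembled, min_len=3))   # [(8, 13)]
```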

Page 26: Church sfaf13

http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/issue_detail.cgi?id=HG-21

NCBI36 (hg18)

GRCh37 (hg19)

Page 27: Church sfaf13

NCBI35 (hg17)

GRCh37 (hg19)

AL139246.20

AL139246.21

Page 28: Church sfaf13

Daly et al., 2013

Page 29: Church sfaf13

http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/issue_detail.cgi?id=HG-1012

Page 30: Church sfaf13

http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/issue_detail.cgi?id=HG-1321

Page 31: Church sfaf13

Fixing Rare/Incorrect Bases

Page 32: Church sfaf13

Fixing Rare/Incorrect Bases

Page 33: Church sfaf13

Fixing Rare/Incorrect Bases

GRCh37B sites for update: n = 1164
• Sites with a unique successful contig: 1148 (98.6%)
• Average length: 448 bp
• Min/max success length: 51/791 bp
• Average coverage: 80x
• Read source (all contigs): high coverage 32%, low coverage 57%, exome 10%

Page 34: Church sfaf13

Build sequence contigs based on the contigs defined in the TPF (Tiling Path File):
• Check for orientation consistencies
• Select switch points
• Instantiate sequence for further analysis

Switch point

Representative chromosome sequence
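The three steps on this slide can be sketched as a toy Python model. This is not the GRC's actual pipeline: `Component`, `Gap`, `switch_is_consistent` and `instantiate` are hypothetical names, each component carries its own sequence plus the 1-based range selected by the switch points, and the orientation check simply asks whether the bases just past a switch point in one component match the first bases used from the next.

```python
from dataclasses import dataclass
from typing import List, Union

# Complement table for A/C/G/T/N (both cases); other IUPAC codes are not handled.
_COMP = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq: str) -> str:
    return seq.translate(_COMP)[::-1]


@dataclass
class Component:
    accession: str
    seq: str          # component sequence as stored
    start: int        # 1-based first base used, in oriented coordinates
    end: int          # 1-based last base used (inclusive), oriented
    orientation: str = "+"

    def oriented(self) -> str:
        if self.orientation not in ("+", "-"):
            raise ValueError(f"{self.accession}: bad orientation {self.orientation!r}")
        return self.seq if self.orientation == "+" else revcomp(self.seq)


@dataclass
class Gap:
    length: int


def switch_is_consistent(a: Component, b: Component, k: int = 2) -> bool:
    """Check that the k bases after a's switch point equal b's first k used bases."""
    after_a = a.oriented()[a.end:a.end + k]
    start_b = b.oriented()[b.start - 1:b.start - 1 + k]
    return len(after_a) == k and after_a == start_b


def instantiate(path: List[Union[Component, Gap]]) -> str:
    """Concatenate the selected range of each component; gaps become N runs."""
    parts = []
    for item in path:
        if isinstance(item, Gap):
            parts.append("N" * item.length)
        else:
            parts.append(item.oriented()[item.start - 1:item.end])
    return "".join(parts)


# Toy chromosome covered by two overlapping clones; clone B is stored
# reverse-complemented, so it must be used in '-' orientation.
truth = "ACGTTGCAAGGCTTAC"
a = Component("CLONE_A", truth[0:10], start=1, end=8)
b = Component("CLONE_B", revcomp(truth[6:16]), start=3, end=10, orientation="-")

print(switch_is_consistent(a, b))    # True
print(instantiate([a, b]) == truth)  # True
b_wrong = Component("CLONE_B", b.seq, start=3, end=10, orientation="+")
print(switch_is_consistent(a, b_wrong))  # False: orientation inconsistency
```

A misoriented component shows up as a mismatch right at the switch point, which is why the orientation check comes before the switch points are fixed and the sequence is instantiated.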

Page 35: Church sfaf13

RP11-34P13 64E8 RP4-669L17 RP5-857K21 RP11-206L10 RP11-54O7

Gaps

Page 36: Church sfaf13

NCBI36

Page 37: Church sfaf13

nsv832911 (nstd68) Submitted on NCBI35 (hg17)

Page 38: Church sfaf13

NCBI35 (hg17) Tiling Path

GRCh37 (hg19) Tiling Path

Gap Inserted

Moved approximately 2 Mb distal on chr15

NC_000015.8 (chr15)

NC_000015.9 (chr15)

Removed from assembly

Added to assembly

http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/issue_detail.cgi?id=HG-24

Page 39: Church sfaf13

Sequences from haplotype 1

Sequences from haplotype 2

Old Assembly model: compress into a consensus

New Assembly model: represent both haplotypes

Page 40: Church sfaf13

AC074378.4, AC079749.5

AC134921.2, AC147055.2

AC140484.1, AC019173.4

AC093720.2, AC021146.7

NCBI36 NC_000004.10 (chr4) Tiling Path

Xue Y et al., 2008

TMPRSS11E TMPRSS11E2

GRCh37 NC_000004.11 (chr4) Tiling Path

AC074378.4, AC079749.5

AC134921.1, AC147055.2

AC093720.2, AC021146.7

TMPRSS11E

GRCh37: NT_167250.1 (UGT2B17 alternate locus)

AC074378.4, AC140484.1

AC019173.4, AC226496.2

AC021146.7

TMPRSS11E2

nsv532126 (nstd37)

Page 41: Church sfaf13

Adding Novel Sequence

1000G phase 1 decoy sequence, viewed by:
• GenBank alignment
• Percent RepeatMasker
• RepeatMasker type
• Sequence source (HTG, HuRef, ALLPATHS)

Page 42: Church sfaf13

Adding Novel Sequence

Page 43: Church sfaf13

Adding Novel Sequence

Page 44: Church sfaf13

Genovese et al., 2013

Page 45: Church sfaf13

Adding Novel Sequence

Karen Hayden and Jim Kent

Page 46: Church sfaf13

Human Resolved for GRCh38

http://genomereference.org

Page 47: Church sfaf13

Examples

Page 48: Church sfaf13

Preview of GRCh38 (scheduled Fall 2013)

TEX28 TKTL1

LOC101060233 (opsin related)

LOC101060234 (TEX28 related)

GRCh37 (current reference assembly), chrX

Page 49: Church sfaf13

Hydin: chr16 (16q22.2)
Hydin2: chr1 (1q21.1)
Missing in NCBI35/NCBI36; unlocalized in GRCh37; finished in GRCh38

Alignment to Hydin2 Genomic, 300 Kb, 99.4% ID

Alignment to Hydin1 CHM1_1.0, >99.9% ID

Doggett et al., 2006
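The identities quoted on this slide (99.4% over the 300 kb Hydin2 alignment, >99.9% against CHM1_1.0) come from pairwise alignments, and the slide does not say how they were computed. Purely as an illustration, the Python sketch below uses one common definition: identical columns divided by columns where neither aligned sequence has a gap. `percent_identity` is a hypothetical helper, and other tools count gaps differently.

```python
def percent_identity(aln_a: str, aln_b: str) -> float:
    """Percent identity of two gapped, equal-length aligned sequences.

    Convention assumed here: identical columns / columns where neither
    sequence has a gap ('-').
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [
        (x, y)
        for x, y in zip(aln_a.upper(), aln_b.upper())
        if x != "-" and y != "-"
    ]
    if not columns:
        return 0.0
    matches = sum(x == y for x, y in columns)
    return 100.0 * matches / len(columns)


# 9 ungapped columns, 8 identical -> 88.9%
print(round(percent_identity("ACGT-ACGTA", "ACGTTACCTA"), 1))  # 88.9
```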

Page 50: Church sfaf13

FAM23_MRC1 Region, chr10

Segmental Duplications

1KG accessibility Mask

Novel patch: 250 kb of artificial duplication

Page 51: Church sfaf13

Adding Novel Sequence

Page 52: Church sfaf13

Richa Agarwala

MHC Alternate locus

Alignment to chr6

Page 53: Church sfaf13
Page 54: Church sfaf13

Making the assembly accessible to existing tools: masking

Query set: 439,109,084 NA12878 HiSeq reads

Page 55: Church sfaf13

Masking effectively blocks alignments in regions with high identity

Simulated reads from GRCh37.p9:
• Unpaired reads
• 101 bp
• 1x coverage
• Default wgsim parameters
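The read count behind "1x coverage" follows from N = coverage × sequence length / read length. A minimal Python sketch, assuming uniform start positions and error-free forward-strand reads, makes the arithmetic concrete; it is not a reimplementation of wgsim (which also models sequencing errors and mutations), and both function names are hypothetical.

```python
import math
import random
from typing import List


def reads_for_coverage(genome_len: int, read_len: int, coverage: float) -> int:
    """Smallest N with N * read_len >= coverage * genome_len."""
    return math.ceil(coverage * genome_len / read_len)


def sample_unpaired_reads(seq: str, read_len: int = 101,
                          coverage: float = 1.0, seed: int = 0) -> List[str]:
    """Draw error-free, forward-strand reads from uniform start positions."""
    if read_len > len(seq):
        raise ValueError("read length exceeds sequence length")
    rng = random.Random(seed)
    n_reads = reads_for_coverage(len(seq), read_len, coverage)
    reads = []
    for _ in range(n_reads):
        start = rng.randrange(len(seq) - read_len + 1)
        reads.append(seq[start:start + read_len])
    return reads


print(reads_for_coverage(10_000, 101, 1.0))  # 100
reads = sample_unpaired_reads("ACGT" * 2_500)
print(len(reads), {len(r) for r in reads})   # 100 {101}
```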

Masking parameters:
• Percent identity: 100%
• Step size: 5 bp
• Minimum length: 101 bp
• Center SNPs in unmasked regions
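One possible reading of these masking parameters, sketched in Python: tile each alternate locus or patch with windows of the minimum length (101 bp, matching the simulated read length), stepping 5 bp at a time, and replace any window found at 100% identity in the primary assembly with N. This is an assumption about how the parameters combine, not the procedure behind the results shown; `mask_alt_locus` is a hypothetical helper, and the "center SNPs in unmasked regions" rule is not modeled.

```python
from typing import List


def mask_alt_locus(alt: str, primary: str, window: int = 101,
                   step: int = 5, mask_char: str = "N") -> str:
    """Mask windows of `alt` that occur exactly (100% identity) in `primary`.

    Windows of length `window` start every `step` bp; the last window is
    always tested so the end of the sequence is covered. Building a set of
    all primary k-mers is fine for a toy example, not for a whole genome.
    """
    if len(alt) < window:
        return alt
    primary_kmers = {primary[i:i + window]
                     for i in range(len(primary) - window + 1)}
    starts: List[int] = list(range(0, len(alt) - window + 1, step))
    if starts[-1] != len(alt) - window:
        starts.append(len(alt) - window)
    masked = list(alt)
    for s in starts:
        if alt[s:s + window] in primary_kmers:
            masked[s:s + window] = mask_char * window
    return "".join(masked)


# Toy example with a 4-bp window and 2-bp step: the alternate sequence differs
# from the primary by one SNP (G->T at index 9), so only bases near it stay unmasked.
primary = "AAAACCCCGGGGTTTT"
alt = "AAAACCCCGTGGTTTT"
print(mask_alt_locus(alt, primary, window=4, step=2))  # NNNNNNNNGTNNNNNN
```

Under this reading, identical stretches of an alternate locus can no longer attract reads away from the primary assembly, while sequence around true differences stays available for alignment.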

Page 56: Church sfaf13

Masking improves alignments in regions with alternate loci or patches

Page 57: Church sfaf13

NA12878 reads whose best alignment in the masked assembly was on an alternate locus or patch were aligned to the primary assembly alone, and their alignment locations there were evaluated

When the full assembly (primary assembly plus alternate loci and patches) is used as the alignment substrate, the number of NA12878 reads aligned with MAPQ=0 increases; masking effectively reduces this increase
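The MAPQ=0 comparison comes down to counting records in each alignment (primary only, full assembly, masked assembly). A minimal Python sketch that reads SAM text directly, assuming one record per line and counting only primary alignments (unmapped, secondary and supplementary records are skipped using their FLAG bits from the SAM specification); `mapq0_summary` is a hypothetical helper, the contig name `chr1_alt` is made up, and a SAM/BAM library would normally be used on real data.

```python
from typing import Iterable, Tuple

# SAM FLAG bits (SAM specification).
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def mapq0_summary(sam_lines: Iterable[str]) -> Tuple[int, int]:
    """Return (mapped primary records, how many of them have MAPQ == 0)."""
    mapped = mapq0 = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        flag, mapq = int(fields[1]), int(fields[4])  # FLAG and MAPQ columns
        if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
            continue
        mapped += 1
        if mapq == 0:
            mapq0 += 1
    return mapped, mapq0


toy_sam = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t100\t60\t101M\t*\t0\t0\t*\t*",
    "r2\t0\tchr1\t500\t0\t101M\t*\t0\t0\t*\t*",
    "r2\t256\tchr1_alt\t200\t0\t101M\t*\t0\t0\t*\t*",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]
print(mapq0_summary(toy_sam))  # (2, 1)
```

Running the same count on alignments to the primary assembly, the unmasked full assembly and the masked full assembly gives the three numbers this slide compares.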

Page 58: Church sfaf13

GRCh38 is coming (September 2013)