Church sfaf13
Keep Calm and Carry on Sequencing
Deanna M. Church, Staff Scientist, NCBI
@deannachurch
http://genomereference.org
Valerie Schneider, NCBI
Photograph: Paul Popper/Popperfoto/Getty Images
GRCh38 is coming (September 2013)
http://www.ncbi.nlm.nih.gov/variation/tools/1000genomes
http://www.bioplanet.com/gcat
http://genomereference.org
Dennis et al., 2012
1q32 1q21 1p21
1p21 patch alignment to chromosome 1
http://www.ncbi.nlm.nih.gov/variation/tools/1000genomes
CDC27
1KG Phase 1 Strict accessibility mask
SNP (all)
SNP (not 1KG)
Sudmant et al., 2010
Kidd et al., 2007
APOBEC cluster
Part of chr22 assembly
Alternate locus for chr22
White: Insertion
Black: Deletion
http://www.ncbi.nlm.nih.gov/variation/tools/1000genomes
Mouse Ren1 chr1 (CM000994.2/NC_000067.6): 133350674-133360320
129S6/SvEvTac tiling path
Alignment to C57BL/6J chr1
B6 Genes
129S6/SvEvTac Genes
+ 32Kb in 129S6/SvEvTac
Mouse Ren1 chr1 (CM000994.2/NC_000067.6): 133350674-133360320
NM_031192.3: transcript from C57BL/6J
NM_031193.2: transcript from FVB/N
129S6/SvEvTac Alt Locus Alignment (allelic)
FVB/N Transcript Alignment (paralog)
129S6/SvEvTac Ren1
FVB Ren2 Tx
Paralogous diff
SNP + Paralogous diff
An assembly is a MODEL of the genome
Assembly Model
BAC insert
BAC vector
Shotgun sequence
Assemble
GAPS
Finishing
http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/issue_detail.cgi?id=HG-21
NCBI36 (hg18)
GRCh37 (hg19)
NCBI35 (hg17)
GRCh37 (hg19)
AL139246.20
AL139246.21
Daly et al., 2013
http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/issue_detail.cgi?id=HG-1012
http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/issue_detail.cgi?id=HG-1321
Fixing Rare/Incorrect Bases
GRCh37B Sites for Update: n=1164
Sites with unique successful ctg: 1148 (98.6%)
Avg Length: 448 bp
Min/Max Success Length: 51/791 bp
Avg Coverage: 80x
Read Source (all contigs):
• High coverage: 32%
• Low coverage: 57%
• Exome: 10%
Fixing Rare/Incorrect Bases
Build sequence contigs based on contigs defined in TPF (Tiling Path File):
• Check for orientation consistencies
• Select switch points
• Instantiate sequence for further analysis
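The steps above can be sketched roughly as follows. This is a toy illustration, not the GRC pipeline: the `Component` record, the `build_contig` function, and the representation of a switch point as a start/end offset within each oriented clone are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


@dataclass
class Component:
    """One clone in a hypothetical tiling path (field names are illustrative)."""
    accession: str
    sequence: str
    orientation: str            # "+" or "-"
    start: int = 0              # switch point: first base used (0-based, after orienting)
    end: Optional[int] = None   # switch point: one past the last base used


def build_contig(path: List[Component]) -> str:
    """Orient each clone, trim it at its switch points, and join the pieces."""
    pieces = []
    for comp in path:
        # Orientation consistency check: only "+" and "-" are accepted.
        if comp.orientation not in ("+", "-"):
            raise ValueError(f"{comp.accession}: unknown orientation {comp.orientation!r}")
        seq = comp.sequence
        if comp.orientation == "-":
            seq = reverse_complement(seq)
        # Keep only the part of the clone between its switch points.
        pieces.append(seq[comp.start:comp.end])
    return "".join(pieces)


toy_path = [
    Component("cloneA", "AAAACCCC", "+", start=0, end=6),
    Component("cloneB", "TTGGGG", "-", start=2),
]
contig = build_contig(toy_path)
```

In a real tiling path the switch points would come from clone overlaps; here they are simply given, which keeps the focus on the orient, trim, and join order of operations.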
Switch point
Representative chromosome sequence
RP11-34P13 64E8 RP4-669L17 RP5-857K21 RP11-206L10 RP11-54O7
Gaps
NCBI36
nsv832911 (nstd68) Submitted on NCBI35 (hg17)
NCBI35 (hg17) Tiling Path
GRCh37 (hg19) Tiling Path
Gap Inserted
Moved approximately 2 Mb distal on chr15
NC_000015.8 (chr15)
NC_000015.9 (chr15)
Removed from assembly
Added to assembly
http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/issue_detail.cgi?id=HG-24
Sequences from haplotype 1
Sequences from haplotype 2
Old Assembly model: compress into a consensus
New Assembly model: represent both haplotypes
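One way to picture the new model is as a primary chromosome sequence with alternate loci anchored to it, rather than a single consensus string. The sketch below is only an illustration under that assumption; the `AltLocus` and `Chromosome` classes, their fields, and the toy sequences are hypothetical and do not reflect how the GRC stores assemblies.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AltLocus:
    """An alternate haplotype anchored to a span of the primary chromosome."""
    name: str
    chrom_start: int   # 0-based, inclusive anchor on the primary sequence
    chrom_end: int     # exclusive
    sequence: str


@dataclass
class Chromosome:
    """Primary sequence plus any alternate loci that represent other haplotypes."""
    name: str
    sequence: str
    alt_loci: List[AltLocus] = field(default_factory=list)

    def haplotypes_at(self, start: int, end: int) -> Dict[str, str]:
        """Return every represented sequence overlapping [start, end)."""
        found = {self.name: self.sequence[start:end]}
        for alt in self.alt_loci:
            # Standard half-open interval overlap test.
            if alt.chrom_start < end and start < alt.chrom_end:
                found[alt.name] = alt.sequence
        return found


chrom = Chromosome("chrToy", "ACGTACGTAC")
chrom.alt_loci.append(AltLocus("alt1", 2, 6, "GTTTTAC"))
inside = chrom.haplotypes_at(0, 4)
outside = chrom.haplotypes_at(6, 10)
```

A consensus model would return exactly one sequence per query; here a query inside the anchored span returns both representations.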
AC074378.4 AC079749.5
AC134921.2 AC147055.2
AC140484.1 AC019173.4
AC093720.2 AC021146.7
NCBI36 NC_000004.10 (chr4) Tiling Path
Xue Y et al., 2008
TMPRSS11E TMPRSS11E2
GRCh37 NC_000004.11 (chr4) Tiling Path
AC074378.4 AC079749.5
AC134921.1 AC147055.2
AC093720.2 AC021146.7
TMPRSS11E
GRCh37: NT_167250.1 (UGT2B17 alternate locus)
AC074378.4 AC140484.1
AC019173.4 AC226496.2
AC021146.7
TMPRSS11E2
nsv532126 (nstd37)
Adding Novel Sequence
1000G ph1 decoy sequence, viewed by:
• GenBank alignment
• Percent Repeat Masker
• Repeat Masker type
• Sequence Source (HTG, HuRef, ALLPATHS)
Adding Novel Sequence
Genovese et al., 2013
Adding Novel Sequence
Karen Hayden and Jim Kent
Human Resolved for GRCh38
http://genomereference.org
Examples
Preview of GRCh38 (scheduled Fall 2013)
TEX28 TKTL1
LOC101060233 (opsin related)
LOC101060234 (TEX28 related)
GRCh37 (current reference assembly) chrX
Hydin: chr16 (16q22.2)
Hydin2: chr1 (1q21.1)
Missing in NCBI35/NCBI36
Unlocalized in GRCh37
Finished in GRCh38
Alignment to Hydin2 Genomic, 300 Kb, 99.4% ID
Alignment to Hydin1 CHM1_1.0, >99.9% ID
Doggett et al., 2006
FAM23_MRC1 Region, chr10
Segmental Duplications
1KG accessibility Mask
Novel Patch 250 kb of artificial duplication
Adding Novel Sequence
Richa Agarwala
MHC Alternate locus
Alignment to chr6
Making the assembly accessible to existing tools: masking
Query set: 439,109,084 NA12878 HiSeq reads
Masking effectively blocks alignments in regions with high identity
Simulated reads from GRCh37.p9
• Unpaired reads
• 101 bp
• 1x coverage
• Default wgsim parameters
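For a sense of scale, the number of reads needed for a given depth follows from coverage = reads x read length / sequence length. The helper below is a minimal sketch of that arithmetic only; `reads_for_coverage` is a hypothetical name, and the sequence length passed in is whatever total the simulation targets (no particular value is taken from the slides).

```python
import math


def reads_for_coverage(sequence_length: int, read_length: int, coverage: float) -> int:
    """Smallest number of unpaired reads giving at least `coverage` mean depth."""
    if sequence_length <= 0 or read_length <= 0 or coverage <= 0:
        raise ValueError("all arguments must be positive")
    return math.ceil(coverage * sequence_length / read_length)


# Toy example: a 1,010 bp sequence at 1x with 101 bp reads needs 10 reads.
toy_reads = reads_for_coverage(1010, 101, 1.0)
```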
Masking parameters
• Percent Id: 100%
• Step size: 5 bp
• Minimum length: 101 bp
• Center SNPs in unmasked regions
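The core idea behind these parameters can be sketched as: wherever an alternate locus is identical to the primary assembly over a stretch at least one read long, replace that stretch with N so reads cannot align there twice. The code below is a simplified illustration, not the procedure used for the slides: it assumes a gapless, pre-aligned pair of equal-length sequences, models only the percent identity (100%) and minimum length settings, and does not model the 5 bp step size or the SNP-centering rule. The function name `mask_identical_runs` is hypothetical.

```python
def mask_identical_runs(alt: str, primary: str, min_length: int = 101,
                        mask_char: str = "N") -> str:
    """Mask runs of `alt` that match `primary` exactly for >= min_length bases.

    Both inputs must be the same length and already aligned without gaps
    (a simplifying assumption for this sketch).
    """
    if len(alt) != len(primary):
        raise ValueError("sequences must be aligned to the same length")
    masked = list(alt)
    run_start = None
    # Iterate one position past the end so a run reaching the end is closed.
    for i in range(len(alt) + 1):
        same = i < len(alt) and alt[i].upper() == primary[i].upper()
        if same:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_length:
                masked[run_start:i] = mask_char * (i - run_start)
            run_start = None
    return "".join(masked)


# Toy example with a short minimum length so the effect is visible.
toy_masked = mask_identical_runs("AAAAACAAA", "AAAAAGAAA", min_length=4)
```

In the toy call, the first five matching bases are masked, the mismatching base is kept, and the trailing three-base match is too short to mask.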
Masking improves alignments in regions with alternate loci or patches
NA12878 reads whose best alignment was on an alt/patch in the masked assembly were evaluated for their alignment location when aligned to the primary assembly alone
Using the full assembly as an alignment substrate increases the number of NA12878 reads that have alignments with MAPQ=0; masking effectively reduces that increase
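A comparison like this comes down to counting mapped reads whose mapping quality is 0 under each alignment target. The sketch below shows one way to do that from SAM-format text, relying only on the standard SAM column layout (FLAG in column 2, MAPQ in column 5). The function name `count_mapq_zero` and the toy records are illustrative, and the choice to skip unmapped, secondary, and supplementary records is an assumption about how reads should be counted.

```python
from typing import Iterable, Tuple

UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def count_mapq_zero(sam_lines: Iterable[str]) -> Tuple[int, int]:
    """Return (reads with MAPQ == 0, total mapped primary reads)."""
    zero = total = 0
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])
        if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
            continue  # count each mapped read once, via its primary record
        total += 1
        if int(fields[4]) == 0:
            zero += 1
    return zero, total


toy_sam = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t100\t60\t101M\t*\t0\t0\t*\t*",
    "r2\t0\tchr1\t200\t0\t101M\t*\t0\t0\t*\t*",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]
zero_count, mapped_count = count_mapq_zero(toy_sam)
```

Running the same count on alignments to the primary assembly alone, the full assembly, and the masked assembly would give the three numbers being compared.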
GRCh38 is coming (September 2013)