Church iowa2013
Deanna M. Church, Staff Scientist, NCBI
@deannachurch
Analyzing Individual Genomes
http://genomereference.org
Valerie Schneider, NCBI
Acknowledgements
GeT-RM: Lisa Kalman (CDC), Birgit Funke (Harvard), Madhuri Hegde (Emory), Maryam Halavi, Chao Chen, Jon Trow, Douglas Slotta, Peter Meric, Daniel Frishberg, Victor Ananiev
ClinVar: Alex Astashyn, Shanmuga Chitipiralla, Douglas Hoffman, Wonhee Jang, Brandi Kattman, Melissa Landrum, Jennifer Lee, Adriana Malheiro, Wendy Rubinstein, George Riley, Amanjeev Sethi, Ricardo Villamarin, Donna Maglott
ISCA: Christa Lese Martin (Geisinger), Erin Riggs (Geisinger), Jose Mena, Mike Feolo, Tim Hefferon, John Garner, John Lopez
GRC: Valerie Schneider (NCBI), The Genome Institute at Washington University, The Wellcome Trust Sanger Institute, The European Bioinformatics Institute
[Figure: Variation; Phenotypes]
Why should you care about the Reference Assembly?
[Figure: genome browser view with tracks for Genes, NCBI Homo sapiens Annotation Release 105 (transcript and CDS), and dbSNP Build 138 using annotation release 104]
http://www.ncbi.nlm.nih.gov/variation/tools/1000genomes
http://www.bioplanet.com/gcat
What is the Reference Assembly?
An assembly is a MODEL of the genome
[Figure: BAC insert and BAC vector; shotgun sequence; assemble; gaps]
“Finishers” then fill the gaps manually, often by PCR.
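To make the point that an assembly is a model built from pieces, here is a toy greedy overlap assembler. It is a hypothetical sketch, not the method used to assemble the BAC clones: it assumes error-free reads on one strand and ignores reads contained in other reads. Reads sharing a suffix/prefix overlap of at least `min_overlap` bases are merged; whatever cannot be joined stays as separate contigs, which are the gaps a finisher would still have to close.

```python
def overlap(a: str, b: str, min_overlap: int) -> int:
    """Longest suffix of `a` that is also a prefix of `b`; 0 if shorter than min_overlap."""
    for length in range(min(len(a), len(b)), min_overlap - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads: list[str], min_overlap: int = 3) -> list[str]:
    """Toy greedy assembly: keep merging the pair of contigs with the longest overlap.

    Assumes error-free, same-strand reads and ignores reads contained in other reads.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep input order
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_overlap)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            # Nothing left to join: each break between contigs is a gap.
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)


reads = ["ACGTAC", "TACGGA", "GGATTC", "CCCTTT", "TTTGAG"]
print(greedy_assemble(reads))  # → ['CCCTTTGAG', 'ACGTACGGATTC']
```

The five reads collapse into two contigs; nothing in the data links them, so their order and the size of the gap between them are unknown to the assembler.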
http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/issue_detail.cgi?id=HG-1012
http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/issue_detail.cgi?id=HG-1321
[Figure: clones RP11-34P13, 64E8, RP4-669L17, RP5-857K21, RP11-206L10 and RP11-54O7, with gaps]
[Figure: the same region in NCBI35 (hg17), NCBI36 (hg18) and GRCh37 (hg19); component AL139246.20 vs AL139246.21]
Build sequence contigs based on the contigs defined in the TPF (Tiling Path File):
- Check for orientation consistencies
- Select switch points
- Instantiate sequence for further analysis
[Figure: switch point between overlapping components; consensus sequence]
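The TPF-driven steps above can be sketched as a toy stitcher: orient each component, check that it agrees with its neighbour across the overlap (a wrongly oriented component will not), then switch from one component to the next at a chosen offset inside the overlap. This is a hypothetical illustration, not GRC or NCBI software; the `Component` fields, the `min_identity` cut-off and the clone names are assumptions. Where the overlapping components differ (here one mismatch), the switch point decides which base ends up in the consensus.

```python
from dataclasses import dataclass

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


@dataclass
class Component:
    accession: str     # hypothetical clone accession.version from a TPF-like list
    seq: str           # component sequence as submitted
    orientation: str   # "+" or "-" relative to the chromosome
    overlap_prev: int  # bases shared with the previous component (ignored for the first)


def build_consensus(path: list[Component], switch_offsets: list[int],
                    min_identity: float = 0.99) -> str:
    """Join oriented components at switch points.

    switch_offsets[k - 1] is the offset (0..overlap) inside the overlap between
    path[k - 1] and path[k] where the sequence switches to path[k]. Assumes each
    overlap starts after the previous switch point.
    """
    seqs = [c.seq if c.orientation == "+" else revcomp(c.seq) for c in path]
    consensus = seqs[0]
    for k in range(1, len(path)):
        ov = path[k].overlap_prev
        if ov <= 0:
            raise ValueError(f"{path[k].accession}: no overlap with previous component (gap)")
        prev_tail, cur_head = seqs[k - 1][-ov:], seqs[k][:ov]
        identity = sum(x == y for x, y in zip(prev_tail, cur_head)) / ov
        # A badly oriented component will not align well across the overlap.
        if identity < min_identity:
            raise ValueError(f"{path[k].accession}: overlap identity {identity:.2f}; check orientation")
        s = switch_offsets[k - 1]
        # Keep the previous component up to the switch point, then take the current one.
        consensus = consensus[: len(consensus) - ov + s] + seqs[k][s:]
    return consensus


a = Component("CLONE_A.1", "TTTTTACGTACGTAC", "+", 0)
b = Component("CLONE_B.1", revcomp("ACGTTCGTACGGGGG"), "-", 10)
print(build_consensus([a, b], [2], min_identity=0.9))  # → TTTTTACGTTCGTACGGGGG
print(build_consensus([a, b], [8], min_identity=0.9))  # → TTTTTACGTACGTACGGGGG
```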
NCBI36
Variant nsv832911 (study nstd68), submitted on NCBI35 (hg17)
[Figure: NCBI35 (hg17) and GRCh37 (hg19) tiling paths on chr15 (NC_000015.8 and NC_000015.9); labels: gap inserted; moved approximately 2 Mb distal on chr15; removed from assembly; added to assembly]
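Because the chr15 RefSeq accession changes version between the two assemblies (.8 to .9), a variant's coordinates are only meaningful on the exact accession.version it was submitted against; RefSeq and GenBank increment the version whenever the sequence changes. A minimal sketch of that check, where `parse_accession` and `compare_reference` are hypothetical helpers and the regular expression only covers the accession styles seen in this talk:

```python
import re

# Accession.version as seen in this talk, e.g. NC_000015.9, NT_167250.1, AL139246.21
ACCESSION_VERSION = re.compile(r"^([A-Z]{1,2}_?\d+)\.(\d+)$")


def parse_accession(acc: str) -> tuple[str, int]:
    m = ACCESSION_VERSION.match(acc)
    if m is None:
        raise ValueError(f"not an accession.version: {acc!r}")
    return m.group(1), int(m.group(2))


def compare_reference(variant_acc: str, analysis_acc: str) -> str:
    """Hypothetical check before comparing coordinates on two reference sequences."""
    v_base, v_ver = parse_accession(variant_acc)
    a_base, a_ver = parse_accession(analysis_acc)
    if v_base != a_base:
        return "different sequence"
    if v_ver != a_ver:
        # Same chromosome, different sequence: coordinates must be remapped.
        return "version differs: remap coordinates first"
    return "same sequence"


print(compare_reference("NC_000015.8", "NC_000015.9"))  # → version differs: remap coordinates first
print(compare_reference("NC_000015.9", "NC_000015.9"))  # → same sequence
```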
GRC issue HG-24
[Figure: sequences from haplotype 1 and sequences from haplotype 2]
Old assembly model: compress the haplotypes into a consensus.
New assembly model: represent both haplotypes.
[Figure: UGT2B17 region tiling paths (Xue Y et al., 2008)
NCBI36 NC_000004.10 (chr4) tiling path components: AC074378.4, AC079749.5, AC134921.2, AC147055.2, AC140484.1, AC019173.4, AC093720.2, AC021146.7; genes TMPRSS11E, TMPRSS11E2
GRCh37 NC_000004.11 (chr4) tiling path components: AC074378.4, AC079749.5, AC134921.1, AC147055.2, AC093720.2, AC021146.7; gene TMPRSS11E
GRCh37 NT_167250.1 (UGT2B17 alternate locus) components: AC074378.4, AC140484.1, AC019173.4, AC226496.2, AC021146.7; gene TMPRSS11E2
Variant nsv532126 (nstd37) on GRCh37 (hg19)]
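Comparing the component lists of the GRCh37 chr4 tiling path and the NT_167250.1 alternate locus shows what the new model buys: the alternate locus typically shares flanking components with the chromosome, which is what lets it be aligned back to it, while its unique components carry the other haplotype. A small sketch with Python sets, assuming the accessions recovered from the figure labels are complete for this region (their order along the path is not implied):

```python
def compare_components(primary: set[str], alt: set[str]) -> dict[str, list[str]]:
    """Split component accessions into shared, primary-only and alt-only groups."""
    return {
        "shared": sorted(primary & alt),
        "primary_only": sorted(primary - alt),
        "alt_only": sorted(alt - primary),
    }


# Components recovered from the UGT2B17 figure labels (path order not implied).
grch37_chr4 = {"AC074378.4", "AC079749.5", "AC134921.1", "AC147055.2",
               "AC093720.2", "AC021146.7"}
nt_167250_1 = {"AC074378.4", "AC140484.1", "AC019173.4", "AC226496.2", "AC021146.7"}

for group, accs in compare_components(grch37_chr4, nt_167250_1).items():
    print(group, accs)
# shared ['AC021146.7', 'AC074378.4']
# primary_only ['AC079749.5', 'AC093720.2', 'AC134921.1', 'AC147055.2']
# alt_only ['AC019173.4', 'AC140484.1', 'AC226496.2']
```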
7 alternate haplotypes at the MHC
Alternate loci are released as:
- FASTA
- AGP
- Alignment to the chromosome
[Figure: UGT2B17, MHC and MAPT regions; MHC (chr6): chr6 representation (PGF) and Alt_Ref_Locus_2 (COX)]
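The AGP files describe how each alternate locus (or chromosome) is built from components and gaps. A minimal reader, assuming the tab-separated AGP v2.0 layout (columns 1 to 5: object, object_beg, object_end, part_number, component_type; component types N and U are gaps). The object name and accessions in the example are made up, and the checks are only a small subset of what a real validator does:

```python
from dataclasses import dataclass

GAP_TYPES = {"N", "U"}  # AGP v2.0 gap component types


@dataclass
class AgpRow:
    obj: str
    obj_beg: int
    obj_end: int
    part_number: int
    component_type: str
    rest: list[str]  # columns 6-9: component_id/beg/end/orientation, or gap fields

    @property
    def is_gap(self) -> bool:
        return self.component_type in GAP_TYPES


def parse_agp(text: str) -> list[AgpRow]:
    rows = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments such as ##agp-version
        cols = line.split("\t")
        rows.append(AgpRow(cols[0], int(cols[1]), int(cols[2]), int(cols[3]), cols[4], cols[5:]))
    return rows


def check_rows(rows: list[AgpRow]) -> None:
    """Minimal consistency checks: contiguous object coordinates, matching lengths."""
    prev_end: dict[str, int] = {}
    for r in rows:
        expected = prev_end.get(r.obj, 0) + 1
        if r.obj_beg != expected:
            raise ValueError(f"{r.obj} part {r.part_number}: starts at {r.obj_beg}, expected {expected}")
        span = r.obj_end - r.obj_beg + 1
        if r.is_gap:
            length = int(r.rest[0])  # gap_length column
        else:
            length = int(r.rest[2]) - int(r.rest[1]) + 1  # component_end - component_beg + 1
        if length != span:
            raise ValueError(f"{r.obj} part {r.part_number}: length {length} != span {span}")
        prev_end[r.obj] = r.obj_end


example = "\n".join([
    "##agp-version\t2.0",
    "\t".join(["alt_example", "1", "1000", "1", "F", "AC000001.1", "1", "1000", "+"]),
    "\t".join(["alt_example", "1001", "1100", "2", "N", "100", "scaffold", "yes", "paired-ends"]),
    "\t".join(["alt_example", "1101", "1600", "3", "F", "AC000002.1", "201", "700", "-"]),
])
rows = parse_agp(example)
check_rows(rows)
components = [(r.rest[0], r.rest[3]) for r in rows if not r.is_gap]
gap_bases = sum(r.obj_end - r.obj_beg + 1 for r in rows if r.is_gap)
print(components, gap_bases)  # → [('AC000001.1', '+'), ('AC000002.1', '-')] 100
```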
Variant Calling and the Reference Assembly
[Figure: APOBEC cluster (Kidd et al., 2007), part of the chr22 assembly vs the alternate locus for chr22; white: insertion, black: deletion]
Rawe et al, 2013
Mouse Ren1, chr1 (CM000994.2/NC_000067.6): 133350674-133360320
NM_031192.3: transcript from C57BL/6J
NM_031193.2: transcript from FVB/N
[Figure: 129S6/SvEvTac alternate locus alignment to Ren1 (allelic) and FVB/N transcript alignment to Ren2 (paralog); labels: 129S6/SvEvTac Ren1, FVB Ren2 Tx, paralogous diff, SNP + paralogous diff]
Hydin: chr16 (16q22.2). Hydin2: chr1 (1q21.1); missing in NCBI35/NCBI36, unlocalized in GRCh37, finished in GRCh38.
[Figure: alignment to Hydin2 genomic sequence, 300 kb, 99.4% identity (paralogous); alignment to Hydin1 in CHM1_1.0, >99.9% identity (allelic)]
Doggett et al., 2006
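The Hydin figure turns on alignment identity: 99.4% to the Hydin2 paralog versus >99.9% to Hydin1 in CHM1_1.0. The sketch below computes percent identity over an alignment and applies a cut-off. The `allelic_min` threshold of 99.9 is a hypothetical value taken from the figure, not a published rule; identity alone is a weak signal, and the distinction in this example only became possible once the paralogous sequence was in the assembly.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity over the columns of a pairwise alignment ('-' = gap).

    Gap columns count as mismatches here; other tools use other denominators.
    """
    if len(a) != len(b) or not a:
        raise ValueError("aligned sequences must be non-empty and the same length")
    matches = sum(x == y and x != "-" for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def label_alignment(identity: float, allelic_min: float = 99.9) -> str:
    """Hypothetical cut-off: above allelic_min looks allelic, below may be paralogous."""
    return "likely allelic" if identity > allelic_min else "possibly paralogous"


ref = "A" * 1000
query = "A" * 994 + "C" * 6  # 6 mismatches in 1,000 aligned columns
pid = percent_identity(ref, query)
print(round(pid, 1), label_alignment(pid))  # → 99.4 possibly paralogous
```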
http://www.ncbi.nlm.nih.gov/variation/tools/1000genomes
[Figure: CDC27 region with tracks: 1KG Phase 1 strict accessibility mask; SNP (all); SNP (not 1KG)]
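Comparing SNP calls with the 1KG Phase 1 strict accessibility mask amounts to asking, for each variant position, whether it falls inside an accessible interval. A minimal sketch, assuming the mask has been loaded as 0-based, half-open (start, end) intervals (the BED convention) and that SNP positions are 1-based as in VCF; the intervals and positions below are made up:

```python
from bisect import bisect_right


def build_mask(intervals: list[tuple[int, int]]) -> tuple[list[int], list[int]]:
    """Sort and merge 0-based, half-open (start, end) intervals (the BED convention)."""
    merged: list[list[int]] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [s for s, _ in merged], [e for _, e in merged]


def is_accessible(mask: tuple[list[int], list[int]], pos0: int) -> bool:
    starts, ends = mask
    i = bisect_right(starts, pos0) - 1  # last interval starting at or before pos0
    return i >= 0 and pos0 < ends[i]


def split_snps(positions_1based: list[int],
               mask: tuple[list[int], list[int]]) -> tuple[list[int], list[int]]:
    """Partition 1-based SNP positions (VCF-style) into inside/outside the mask."""
    inside, outside = [], []
    for pos in positions_1based:
        (inside if is_accessible(mask, pos - 1) else outside).append(pos)
    return inside, outside


mask = build_mask([(100, 200), (150, 300), (500, 600)])
print(split_snps([50, 101, 300, 301, 550, 601], mask))
# → ([101, 300, 550], [50, 301, 601])
```

Binary search over the merged interval starts keeps each lookup logarithmic, which matters once the mask covers a whole chromosome.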
http://www.ncbi.nlm.nih.gov/variation/tools/1000genomes
Sudmant et al., 2010
GRCh38 is coming (September 2013)
Adding Novel Sequence
Karen Miga and Jim Kent arXiv:1307.0035
Dennis et al., 2012
[Figure: 1q32, 1q21 and 1p21; 1p21 patch alignment to chromosome 1]
Fixing Rare/Incorrect Bases
Preview of GRCh38 (scheduled Fall 2013)
[Figure: TEX28, TKTL1, LOC101060233 (opsin related), LOC101060234 (TEX28 related); GRCh37 (current reference assembly) NC_000023.10 (chrX) vs NW_003871103.3]
[Figure: FAM23_MRC1 region, chr10; tracks: segmental duplications, 1KG accessibility mask; novel patch; 250 kb of artificial duplication]
Adding Novel Sequence
GRCh37.p13: 120 fix patches, 60 novel patches
Human Resolved for GRCh38
http://genomereference.org
http://www.ncbi.nlm.nih.gov/genome/tools/remap
Remap between:
- Assembly 1 <-> Assembly 2
- Assembly <-> RefSeqGene/LRG
- Primary assembly <-> alternate loci
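The core idea behind moving coordinates between assemblies is projecting a position through alignment blocks. The sketch below is not NCBI Remap's algorithm or API; it is a minimal illustration assuming ungapped blocks sorted by source start. The `Block` values are hypothetical, loosely echoing the chr15 example of a segment that moved about 2 Mb. Positions that fall outside every block, such as sequence removed from the assembly, do not map at all.

```python
from bisect import bisect_right
from typing import NamedTuple, Optional


class Block(NamedTuple):
    src_start: int     # 0-based start on the source assembly sequence
    dst_start: int     # 0-based start on the target assembly sequence
    length: int        # ungapped block length
    strand: str = "+"  # "-" if the block aligns in reverse orientation


def remap_position(blocks: list[Block], pos: int) -> Optional[tuple[int, str]]:
    """Project a 0-based source position through ungapped alignment blocks.

    `blocks` must be sorted by src_start and must not overlap on the source.
    Returns None when the position is not covered by any block.
    """
    i = bisect_right([b.src_start for b in blocks], pos) - 1
    if i < 0:
        return None
    b = blocks[i]
    offset = pos - b.src_start
    if offset >= b.length:
        return None
    if b.strand == "+":
        return b.dst_start + offset, "+"
    return b.dst_start + b.length - 1 - offset, "-"


blocks = [
    Block(0, 0, 1_000_000),
    Block(1_000_000, 3_000_000, 50_000),       # segment placed ~2 Mb further along
    Block(2_000_000, 1_500_000, 10_000, "-"),  # segment that flipped orientation
]
for p in (5, 1_000_010, 1_060_000, 2_000_000):
    print(p, remap_position(blocks, p))
# 5 (5, '+')
# 1000010 (3000010, '+')
# 1060000 None
# 2000000 (1509999, '-')
```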