Church iowa2013

Deanna M. Church Staff Scientist, NCBI @deannachurch Analyzing Individual Genomes


Talk at Iowa State University 6 Nov 2013

Transcript of Church iowa2013

Page 1: Church iowa2013

Deanna M. Church Staff Scientist, NCBI

@deannachurch

Analyzing Individual Genomes 

Page 2: Church iowa2013

http://genomereference.org

Valerie Schneider, NCBI

Page 3: Church iowa2013

Acknowledgements

GeT-RM
Lisa Kalman (CDC), Birgit Funke (Harvard), Madhuri Hegde (Emory), Maryam Halavi, Chao Chen, Jon Trow, Douglas Slotta, Peter Meric, Daniel Frishberg, Victor Ananiev

ClinVar
Alex Astashyn, Shanmuga Chitipiralla, Douglas Hoffman, Wonhee Jang, Brandi Kattman, Melissa Landrum, Jennifer Lee, Adriana Malheiro, Wendy Rubinstein, George Riley, Amanjeev Sethi, Ricardo Villamarin, Donna Maglott

ISCA
Christa Lese Martin (Geisinger), Erin Riggs (Geisinger), Jose Mena, Mike Feolo, Tim Hefferon, John Garner, John Lopez

GRC
Valerie Schneider (NCBI), The Genome Institute at Washington University, The Wellcome Trust Sanger Institute, The European Bioinformatics Institute

Page 4: Church iowa2013
Page 5: Church iowa2013

Variation

Phenotypes

Page 6: Church iowa2013

Why should you care about the Reference Assembly?

Page 7: Church iowa2013

Genes, NCBI Homo sapiens Annotation Release 105

Transcript

CDS

dbSNP Build 138 using annotation release 104

Page 8: Church iowa2013

http://www.ncbi.nlm.nih.gov/variation/tools/1000genomes

Page 9: Church iowa2013

http://www.bioplanet.com/gcat

Page 10: Church iowa2013

What is the Reference Assembly?

Page 11: Church iowa2013
Page 12: Church iowa2013
Page 13: Church iowa2013
Page 14: Church iowa2013
Page 15: Church iowa2013

An assembly is a MODEL of the genome

Page 16: Church iowa2013
Page 17: Church iowa2013

BAC insert

BAC vector

Shotgun sequence

Assemble

GAPS

“finishers” go in to manually fill the gaps, often by PCR

Page 18: Church iowa2013
Page 19: Church iowa2013

http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/issue_detail.cgi?id=HG-1012

Page 20: Church iowa2013

http://www.ncbi.nlm.nih.gov/projects/genome/assembly/grc/issue_detail.cgi?id=HG-1321

Page 21: Church iowa2013

RP11-34P13 64E8 RP4-669L17 RP5-857K21 RP11-206L10 RP11-54O7

Gaps

Page 22: Church iowa2013

http://genomereference.org

Page 23: Church iowa2013

NCBI36 (hg18)

GRCh37 (hg19)

Page 24: Church iowa2013

NCBI35 (hg17)

GRCh37 (hg19)

AL139246.20

AL139246.21

Page 25: Church iowa2013

Build sequence contigs based on contigs defined in TPF (Tiling Path File).

Check for orientation consistencies

Select switch points

Instantiate sequence for further analysis

Switch point

Consensus sequence
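
The contig-building steps on this slide can be pictured with a small sketch. This is not the GRC pipeline: the `Component` type, its `start`/`end` switch-point fields, and the toy sequences are all hypothetical, chosen only to show how overlapping clones might be oriented and then joined at a switch point inside their overlap.

```python
from dataclasses import dataclass

# Translation table for complementing DNA bases (N stays N).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


@dataclass
class Component:
    """One clone in a hypothetical tiling path."""
    accession: str    # label only; not looked up anywhere
    sequence: str     # sequence as stored for the clone
    orientation: str  # "+" or "-" relative to the contig
    start: int        # 0-based start of the slice used, after orienting
    end: int          # exclusive end of the slice used (the switch point)


def build_contig(tiling_path):
    """Orient each component, cut it at its switch points, and join the pieces."""
    pieces = []
    for comp in tiling_path:
        if comp.orientation not in ("+", "-"):
            raise ValueError(f"bad orientation for {comp.accession}")
        seq = comp.sequence
        if comp.orientation == "-":
            seq = reverse_complement(seq)
        pieces.append(seq[comp.start:comp.end])
    return "".join(pieces)


# Two toy clones that overlap by five bases ("CGTAA").
# Clone B is stored on the opposite strand, so it is flipped before slicing.
path = [
    Component("cloneA", "ACGTACGTAA", "+", 0, 7),
    Component("cloneB", "AGGCCTTACG", "-", 2, 10),
]
contig = build_contig(path)
```

In this toy case the switch point falls inside the overlap, so the contig is the same sequence one would get by taking all of clone A followed by the non-overlapping tail of clone B.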

Page 26: Church iowa2013

NCBI36

Page 27: Church iowa2013

nsv832911 (nstd68) Submitted on NCBI35 (hg17)

Page 28: Church iowa2013

NCBI35 (hg17) Tiling Path

GRCh37 (hg19) Tiling Path

Gap Inserted

Moved approximately 2 Mb distal on chr15

NC_000015.8 (chr15)

NC_000015.9 (chr15)

Removed from assembly

Added to assembly

HG-24

Page 29: Church iowa2013

Sequences from haplotype 1

Sequences from haplotype 2

Old Assembly model: compress into a consensus

New Assembly model: represent both haplotypes

Page 30: Church iowa2013

AC074378.4 AC079749.5

AC134921.2 AC147055.2

AC140484.1 AC019173.4

AC093720.2 AC021146.7

NCBI36 NC_000004.10 (chr4) Tiling Path

Xue Y et al, 2008

TMPRSS11E TMPRSS11E2

GRCh37 NC_000004.11 (chr4) Tiling Path

AC074378.4 AC079749.5

AC134921.1 AC147055.2

AC093720.2 AC021146.7

TMPRSS11E

GRCh37: NT_167250.1 (UGT2B17 alternate locus)

AC074378.4 AC140484.1

AC019173.4 AC226496.2

AC021146.7

TMPRSS11E2

nsv532126 (nstd37)

Page 31: Church iowa2013

GRCh37 (hg19)

http://genomereference.org

7 alternate haplotypes at the MHC

Alternate loci released as:
FASTA
AGP
Alignment to chromosome

UGT2B17 MHC MAPT

Page 32: Church iowa2013

MHC (chr6)

Chr 6 representation (PGF)

Alt_Ref_Locus_2 (COX)

Page 33: Church iowa2013

Variant Calling and the Reference Assembly

Page 34: Church iowa2013

Kidd et al, 2007 APOBEC cluster

Part of chr22 assembly

Alternate locus for chr22

White: Insertion

Black: Deletion

Page 35: Church iowa2013

O'Rawe et al, 2013

Page 36: Church iowa2013

Mouse Ren1 chr1 (CM000994.2/NC_000067.6): 133350674-133360320

NM_031192.3: transcript from C57BL/6J

NM_031193.2: transcript from FVB/N

129S6/SvEvTac Alt Locus Alignment Ren1 (allelic)

FVB/N Transcript Alignment Ren2 (paralog)

Page 37: Church iowa2013

129S6/SvEvTac Ren1

FVB Ren2 Tx

Paralogous diff

SNP + Paralogous diff

Mouse Ren1 chr1 (CM000994.2/NC_000067.6): 133350674-133360320

NM_031192.3: transcript from C57BL/6J

NM_031193.2: transcript from FVB/N

Page 38: Church iowa2013

Hydin: chr16 (16q22.2)

Hydin2: chr1 (1q21.1)

Missing in NCBI35/NCBI36

Unlocalized in GRCh37

Finished in GRCh38

Alignment to Hydin2 Genomic, 300 Kb, 99.4% ID (Paralogous)

Alignment to Hydin1 CHM1_1.0, >99.9% ID (Allelic)

Doggett et al., 2006

Page 39: Church iowa2013

http://www.ncbi.nlm.nih.gov/variation/tools/1000genomes

CDC27

1KG Phase 1 Strict accessibility mask

SNP (all)

SNP (not 1KG)

Page 40: Church iowa2013

http://www.ncbi.nlm.nih.gov/variation/tools/1000genomes

Page 41: Church iowa2013

Sudmant et al., 2010

Page 42: Church iowa2013

GRCh38 is coming (September, 2013)

Page 43: Church iowa2013

http://genomereference.org

Page 44: Church iowa2013

Adding Novel Sequence

Karen Miga and Jim Kent arXiv:1307.0035

Page 45: Church iowa2013

Dennis et al., 2012

1q32 1q21 1p21

1p21 patch alignment to chromosome 1

Page 46: Church iowa2013

Fixing Rare/Incorrect Bases

Page 47: Church iowa2013

Preview of GRCh38 (scheduled Fall 2013)

TEX28 TKTL1

LOC101060233 (opsin related)

LOC101060234 (TEX28 related)

GRCh37 (current reference assembly)

NC_000023.10 (chrX)

NW_003871103.3

Page 48: Church iowa2013

FAM23_MRC1 Region, chr10

Segmental Duplications

1KG accessibility Mask

Novel Patch 250 kb of artificial duplication

Page 49: Church iowa2013

Adding Novel Sequence

Page 50: Church iowa2013

GRCh37p13

120 Fix Patches

60 Novel

Human Resolved for GRCh38

http://genomereference.org

Page 51: Church iowa2013

http://www.ncbi.nlm.nih.gov/genome/tools/remap

From Assembly 1 <-> Assembly 2

Assembly <-> RefSeqGene/LRG

Primary Assembly <-> Alternate loci
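
The idea behind moving a coordinate between two sequences (assembly to assembly, or primary assembly to alternate locus) can be sketched as a lookup through alignment blocks. This is an illustration only and not the NCBI Remap service or its interface: the `Block` type, the `remap_position` function, and the coordinates are hypothetical, and each block is assumed to be an ungapped, same-strand aligned segment.

```python
from bisect import bisect_right
from typing import NamedTuple, Optional, Sequence


class Block(NamedTuple):
    """One ungapped aligned segment between a source and a target sequence."""
    src_start: int  # 0-based start on the source sequence
    dst_start: int  # 0-based start on the target sequence
    length: int     # number of aligned bases


def remap_position(pos: int, blocks: Sequence[Block]) -> Optional[int]:
    """Map a source position to the target, or None if it is unaligned.

    `blocks` must be sorted by src_start and must not overlap on the source.
    """
    starts = [b.src_start for b in blocks]
    i = bisect_right(starts, pos) - 1  # last block starting at or before pos
    if i < 0:
        return None
    block = blocks[i]
    offset = pos - block.src_start
    if offset >= block.length:
        return None  # pos falls in a gap between aligned blocks
    return block.dst_start + offset


# Toy alignment: the target has 50 extra bases inserted after position 100,
# and source positions 150-199 have no counterpart in the target.
blocks = [Block(0, 0, 100), Block(100, 150, 50), Block(200, 200, 30)]
same = remap_position(99, blocks)
shifted = remap_position(120, blocks)
unaligned = remap_position(160, blocks)
```

Positions that fall outside every block come back as `None`, which mirrors the practical point that some features have no counterpart in the other sequence.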