Church gmod2012 pt2

@deannachurch Navigating Genome Resources at NCBI Deanna M. Church, NCBI The Evolution of the Reference Human Genome Part 2

description

Part 2 of my talk at GMOD 2012

Transcript of Church gmod2012 pt2

Page 1: Church gmod2012 pt2

@deannachurch

Navigating Genome Resources at NCBI

Deanna M. Church, NCBI

The Evolution of the Reference Human Genome

Part 2

Page 2: Church gmod2012 pt2

GenBank

Data Archives

• Data in a common format
• Data in a single location (and mirrored)
• Most quality checked prior to deposition
• Robust data tracking mechanism (accession.version)
• Data owned by submitter

Page 3: Church gmod2012 pt2

Data tracking

ABC14-1065514J1

Gaps Phase Length Date

FP565796.1 1 1 21-Oct-2009

FP565796.2 1 0 14-Oct-2010

FP565796.3 3 0 07-Nov-2010
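
The accession.version scheme on this slide can be read mechanically: the accession stays fixed while the version number goes up with each update of the sequence. Below is a minimal Python sketch, assuming only the three FP565796 records listed on the slide. The helper names `split_accession` and `latest` are hypothetical and not part of any NCBI tool.

```python
from typing import List, Tuple


def split_accession(acc_ver: str) -> Tuple[str, int]:
    """Split an 'accession.version' string, e.g. 'FP565796.3' -> ('FP565796', 3)."""
    accession, sep, version = acc_ver.rpartition(".")
    if not sep or not accession or not version.isdigit():
        raise ValueError(f"not an accession.version string: {acc_ver!r}")
    return accession, int(version)


def latest(records: List[str]) -> str:
    """Return the highest-version record; all records must share one accession."""
    parsed = [split_accession(r) for r in records]
    if len({acc for acc, _ in parsed}) != 1:
        raise ValueError("records belong to different accessions")
    acc, ver = max(parsed, key=lambda p: p[1])
    return f"{acc}.{ver}"


# The three versions of the clone record shown on the slide.
records = ["FP565796.1", "FP565796.2", "FP565796.3"]
print(latest(records))
```

Comparing versions as integers rather than strings matters once a record passes version 9, since "10" sorts before "9" as text.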

Page 4: Church gmod2012 pt2

Mouse chrX: 34,800,000-34,890,000

NC_000086.123456 CM001013.17 2

Page 5: Church gmod2012 pt2

Mouse chrX: 35,000,000-36,000,000

X

MGSCv3 MGSCv36

Page 6: Church gmod2012 pt2

hg19 GRCh37

mm8 MGSCv37

NCBIM37

danRer5 Zv7

What’s in a name?

Page 7: Church gmod2012 pt2

By any other name…

chr21:8,913,216-9,246,964

Page 8: Church gmod2012 pt2

Zv7 chr21:8,913,216-9,246,964 X Mouse Build 36 chrX

By any other name…
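
The point of these two slides is that a coordinate string identifies a location only together with an assembly name. Below is a minimal Python sketch of carrying both pieces together. The `Region` class and the `parse_region` helper are hypothetical names introduced for illustration, and the accepted string format is an assumption based on the coordinates shown on the slides.

```python
import re
from dataclasses import dataclass

# Accepts strings such as "chr21:8,913,216-9,246,964" (commas optional).
_REGION = re.compile(r"^(?P<chrom>[^:\s]+):\s*(?P<start>[\d,]+)-(?P<end>[\d,]+)$")


@dataclass(frozen=True)
class Region:
    assembly: str  # e.g. "Zv7"; never optional
    chrom: str
    start: int
    end: int


def parse_region(assembly: str, text: str) -> Region:
    """Parse a coordinate string and bind it to a named assembly."""
    if not assembly:
        raise ValueError("a region is ambiguous without an assembly name")
    m = _REGION.match(text.strip())
    if m is None:
        raise ValueError(f"cannot parse region: {text!r}")
    start = int(m["start"].replace(",", ""))
    end = int(m["end"].replace(",", ""))
    if start > end:
        raise ValueError("start must not exceed end")
    return Region(assembly, m["chrom"], start, end)


zv7 = parse_region("Zv7", "chr21:8,913,216-9,246,964")
other = parse_region("GRCh37", "chr21:8,913,216-9,246,964")
print(zv7 == other)  # same string, different assembly
```

Making the assembly a required field means two regions with identical coordinate strings compare as different unless they were also defined on the same assembly.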

Page 9: Church gmod2012 pt2

Genome Browser Agreement

Submitter deposits assembly to GenBank/EMBL/DDBJ

Assembly QA

Submitter updates assembly based on QA results

Browsers pick up assembly from GenBank/EMBL/DDBJ

Assemblies must be in GenBank/EMBL/DDBJ

Page 10: Church gmod2012 pt2

http://www.ncbi.nlm.nih.gov/genome/assembly

GRCh37 hg19

Page 11: Church gmod2012 pt2
Page 12: Church gmod2012 pt2

Assembly (e.g. GRCh37.p5): GCA_000001405.6 / GCF_000001405.17

Primary Assembly: GCA_000001305.1 / GCF_000001305.13

ALT 1: GCA_000001315.1 / GCF_000001315.1

ALT 2: GCA_000001325.1 / GCF_000001325.2

ALT 3: GCA_000001335.1 / GCF_000001335.1

ALT 4: GCA_000001345.1 / GCF_000001345.1

ALT 5: GCA_000001355.1 / GCF_000001355.1

ALT 6: GCA_000001365.1 / GCF_000001365.2

ALT 7: GCA_000001375.1 / GCF_000001375.1

ALT 8: GCA_000001385.1 / GCF_000001385.1

ALT 9: GCA_000001395.1 / GCF_000001395.1

Patches: GCA_000005045.5 / GCF_000005045.4

Non-nuclear assembly unit (e.g. MT): GCA_000006015.1 / GCF_000006015.1
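
The accessions on this slide follow a regular pattern: a GCA_ (GenBank) or GCF_ (RefSeq) prefix, a nine-digit number, and a version. In every pair shown, the two accessions share the numeric part while their versions differ. Below is a minimal Python sketch that parses the pattern and checks the pairing. The names `AssemblyAccession`, `parse_assembly_accession`, and `same_unit` are hypothetical, and the shared-number rule is assumed from the examples on the slide rather than taken from a specification.

```python
import re
from typing import NamedTuple

_ASM = re.compile(r"^(GCA|GCF)_(\d{9})\.(\d+)$")


class AssemblyAccession(NamedTuple):
    prefix: str   # "GCA" (GenBank) or "GCF" (RefSeq)
    number: str   # nine-digit identifier, kept as text to preserve zeros
    version: int


def parse_assembly_accession(text: str) -> AssemblyAccession:
    """Parse e.g. 'GCF_000001405.17' into its three parts."""
    m = _ASM.match(text.strip())
    if m is None:
        raise ValueError(f"not an assembly accession: {text!r}")
    return AssemblyAccession(m.group(1), m.group(2), int(m.group(3)))


def same_unit(a: str, b: str) -> bool:
    """True if two accessions share the numeric part (assumed pairing rule)."""
    return parse_assembly_accession(a).number == parse_assembly_accession(b).number


# Primary assembly unit pair from the slide.
print(same_unit("GCA_000001305.1", "GCF_000001305.13"))
```

The versions are deliberately left out of the comparison, because the GenBank and RefSeq copies on the slide are versioned independently (for example .6 and .17 for the full assembly).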

Page 13: Church gmod2012 pt2

GenBank vs RefSeq

GenBank              RefSeq
Submitter owned      RefSeq owned
Redundancy           Non-redundant
Updated rarely       Curated
INSDC                Not INSDC

BRCA1
GenBank: 83 genomic records, 31 mRNA records, 27 protein records
RefSeq: 3 genomic records, 5 mRNA records, 1 RNA record, 5 protein records

Page 14: Church gmod2012 pt2
Page 15: Church gmod2012 pt2

RefSeq for Assemblies

Typical assembly edits

Addition of non-nuclear (e.g. MT) assembly units

Removal of contamination

Drop unlocalized/unplaced scaffolds

Mask contamination that is placed on chromosome

Page 16: Church gmod2012 pt2

http://www.ncbi.nlm.nih.gov/genome

Page 17: Church gmod2012 pt2

Understanding relationships between assemblies using alignments

First Pass: Reciprocal best hit

Second Pass: Non-reciprocal, duplicative hits
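
The two terms on this slide can be illustrated with a small example. A hit is a reciprocal best hit when each region is the other's top-scoring partner; everything else falls into the non-reciprocal group. The Python sketch below only illustrates those definitions and is not NCBI's alignment procedure. The region names and scores are made up, and `split_hits` is a hypothetical helper.

```python
from typing import Dict, List, Tuple

Hit = Tuple[str, str, float]  # (region on assembly 1, region on assembly 2, score)


def split_hits(hits: List[Hit]) -> Tuple[List[Hit], List[Hit]]:
    """Separate reciprocal best hits from all remaining hits."""
    best_for_a: Dict[str, Tuple[str, float]] = {}
    best_for_b: Dict[str, Tuple[str, float]] = {}
    for a, b, score in hits:
        # Strict '>' means the first hit seen wins a tie.
        if a not in best_for_a or score > best_for_a[a][1]:
            best_for_a[a] = (b, score)
        if b not in best_for_b or score > best_for_b[b][1]:
            best_for_b[b] = (a, score)

    reciprocal, other = [], []
    for a, b, score in hits:
        is_best = best_for_a[a] == (b, score) and best_for_b[b] == (a, score)
        (reciprocal if is_best else other).append((a, b, score))
    return reciprocal, other


# Made-up example: old_C also matches new_B, but new_B prefers old_B.
hits = [
    ("old_A", "new_A", 0.99),
    ("old_B", "new_B", 0.98),
    ("old_C", "new_B", 0.90),
]
first_pass, second_pass = split_hits(hits)
print(first_pass)
print(second_pass)
```

In this toy input the third hit lands in the second group because it is a duplicative match onto a region that already has a better partner.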

Page 18: Church gmod2012 pt2
Page 19: Church gmod2012 pt2
Page 20: Church gmod2012 pt2

No second pass alignments in GRCh37.p5

NCBI36

GRCh37.p5

http://www.ncbi.nlm.nih.gov/tools/gbench/

Page 21: Church gmod2012 pt2

Assemblies Transcripts Proteins

Set of genes

Other decoration

Annotation pipeline

Francoise Thibaud-Nissen

Page 22: Church gmod2012 pt2

Content of the final annotation product

Description | In sequence database | In a BLAST database | On FTP site

• Chromosomes (NC_ or AC_)
• Scaffolds (NW_ or NT_)
• Curated transcripts/proteins (NM_, NR_/NP_)
• Predicted transcripts/proteins (fully or partially supported) (XM_, XR_/XP_)
• Non-transcribed pseudogenes
• tRNA (annotated with tRNAScan)
• Ab initio Gnomon models

Annotation Pipeline RefSeq
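
The accession prefixes listed on this slide are enough to sort annotation products by type. Below is a minimal Python sketch of such a lookup, using only the prefixes and descriptions from the slide. The `REFSEQ_PREFIXES` table and the `classify` function are hypothetical names, and the table is not a complete list of RefSeq prefixes.

```python
# Prefix -> product type, as grouped on the slide.
REFSEQ_PREFIXES = {
    "NC_": "chromosome",
    "AC_": "chromosome",
    "NW_": "scaffold",
    "NT_": "scaffold",
    "NM_": "curated transcript",
    "NR_": "curated transcript",
    "NP_": "curated protein",
    "XM_": "predicted transcript",
    "XR_": "predicted transcript",
    "XP_": "predicted protein",
}


def classify(accession: str) -> str:
    """Return the product type for a RefSeq-style accession, or 'unknown'."""
    return REFSEQ_PREFIXES.get(accession[:3].upper(), "unknown")


print(classify("NC_000086.1"))   # mouse chrX accession from an earlier slide
print(classify("FP565796.3"))    # a GenBank clone record, not a RefSeq prefix
```

Returning "unknown" instead of raising keeps the lookup usable on mixed lists that also contain GenBank accessions.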

Page 23: Church gmod2012 pt2

Where to find the annotation products?

• Nucleotide/Protein databases
• Gene
• Map Viewer
• BLAST databases
• FTP site

http://www.ncbi.nlm.nih.gov/gene

http://www.ncbi.nlm.nih.gov/mapview

Page 24: Church gmod2012 pt2

Annotating multiple assemblies

Group 1

Transcript

• Consistent placement of transcripts
• Consistent labelling of the genes
• Consistent annotation on all assemblies

Assembly 1

Assembly 2

• Assembly-assembly alignments

Available at http://www.ncbi.nlm.nih.gov/genome/tools/remap

Group 2
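
Assembly-assembly alignments are what allow a position on one assembly to be carried over to another. The Python sketch below shows the basic idea with ungapped, same-strand alignment blocks. It is not the implementation behind the NCBI remap service: the `Block` type, the `remap_position` function, and all coordinates are made up for illustration.

```python
from typing import List, NamedTuple, Optional


class Block(NamedTuple):
    src_start: int  # 1-based start on assembly 1
    dst_start: int  # 1-based start on assembly 2
    length: int     # ungapped block length, same strand assumed


def remap_position(pos: int, blocks: List[Block]) -> Optional[int]:
    """Map a position from assembly 1 to assembly 2, or None if unaligned."""
    for b in blocks:
        if b.src_start <= pos < b.src_start + b.length:
            return b.dst_start + (pos - b.src_start)
    return None


# Made-up alignment: two blocks with an unaligned gap between them.
blocks = [Block(100, 250, 50), Block(200, 400, 30)]
print(remap_position(120, blocks))
print(remap_position(160, blocks))
```

A position that falls between blocks returns None, which corresponds to sequence that has no counterpart in the other assembly under this simplified model.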

Page 25: Church gmod2012 pt2

Annotating multiple assemblies (2)

Btau_4.6.1

UMD_3.1

Same Gene symbol

Page 26: Church gmod2012 pt2

Interacting with the community

FlyBase GenBank RefSeq

Page 27: Church gmod2012 pt2
Page 28: Church gmod2012 pt2

Thanks!

For Slides: Francoise Thibaud-Nissen, Evan Eichler, Steve Sherry

The Genome Reference Consortium
• The Genome Center at Washington University
• The Wellcome Trust Sanger Institute
• The European Bioinformatics Institute
• The National Center for Biotechnology Information

Church group at NCBI
• Valerie Schneider
• Nathan Bouk
• Hsiu-Chuan Chen
• Peter Meric
• Victor Ananiev
• Chao Chen
• John Lopez
• John Garner
• Tim Hefferon
• Cliff Clausen

NCBI