Understanding the reference assembly: CSHL Hackathon
![Page 1: Understanding the reference assembly: CSHL Hackathon](https://reader035.fdocuments.in/reader035/viewer/2022062522/587da8091a28ab22148b814f/html5/thumbnails/1.jpg)
Understanding the reference assembly
Valerie Schneider, NCBI
26 October 2016
http://www.biorxiv.org/content/early/2016/08/30/072116
![Page 2: Understanding the reference assembly: CSHL Hackathon](https://reader035.fdocuments.in/reader035/viewer/2022062522/587da8091a28ab22148b814f/html5/thumbnails/2.jpg)
Scientific Models
Dilthey et al.; Paten et al.
![Page 3: Understanding the reference assembly: CSHL Hackathon](https://reader035.fdocuments.in/reader035/viewer/2022062522/587da8091a28ab22148b814f/html5/thumbnails/3.jpg)
Outline
• Distinguishing features of the human reference assembly
• Implications for genomic analyses and tools
• Where do you get assembly-relevant data?
![Page 4: Understanding the reference assembly: CSHL Hackathon](https://reader035.fdocuments.in/reader035/viewer/2022062522/587da8091a28ab22148b814f/html5/thumbnails/4.jpg)
![Page 5: Understanding the reference assembly: CSHL Hackathon](https://reader035.fdocuments.in/reader035/viewer/2022062522/587da8091a28ab22148b814f/html5/thumbnails/5.jpg)
Assembly Basics
Sanger-sequenced, clone-based assembly
• BAC insert, BAC vector
• Shotgun sequence clone
• Assemble
• Gaps
• Finish
Minimal Tiling Path
• Define switch points for adjacent components (haploid mosaic)
• Most contiguous
• Highest sequence quality
![Page 6: Understanding the reference assembly: CSHL Hackathon](https://reader035.fdocuments.in/reader035/viewer/2022062522/587da8091a28ab22148b814f/html5/thumbnails/6.jpg)
Assembly Basics
Today’s reference assembly does not represent:
1. The most common allele
2. The longest allele
3. The ancestral allele
It represents the sequence available from the Human Genome Project (HGP)
![Page 7: Understanding the reference assembly: CSHL Hackathon](https://reader035.fdocuments.in/reader035/viewer/2022062522/587da8091a28ab22148b814f/html5/thumbnails/7.jpg)
GRC Assembly Model
Sequences from haplotype 1
Sequences from haplotype 2
Old assembly model: compress into a consensus
Current assembly model: represent both (many) haplotypes
![Page 8: Understanding the reference assembly: CSHL Hackathon](https://reader035.fdocuments.in/reader035/viewer/2022062522/587da8091a28ab22148b814f/html5/thumbnails/8.jpg)
GRC Assembly Model
Church et al., PLoS Biol. 2011 Jul;9(7):e1001091
Assembly (e.g. GRCh38)
• Primary Assembly Unit
• Non-nuclear assembly unit (e.g. MT)
• PAR
• Genomic Region (MHC)
• Genomic Region (UGT2B17)
• Genomic Region (MAPT)
• ALT 1 through ALT 7
![Page 9: Understanding the reference assembly: CSHL Hackathon](https://reader035.fdocuments.in/reader035/viewer/2022062522/587da8091a28ab22148b814f/html5/thumbnails/9.jpg)
The alignments of the alternate loci scaffolds to the chromosomes are an integral part of the assembly and can be downloaded from GenBank with the assembly sequences
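One practical use of these alignments is looking up which alternate loci scaffolds are placed over a chromosome interval of interest. The following is a minimal sketch, assuming a simplified tab-delimited placement table. The column names (`alt_scaf_acc`, `parent_name`, `parent_start`, `parent_stop`, `ori`), the scaffold names and the coordinates are hypothetical illustrations, not the actual GenBank file layout or real data.

```python
import csv
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    """Where one alt scaffold sits on its parent chromosome (1-based, inclusive)."""
    alt_acc: str
    chrom: str
    start: int
    stop: int
    orientation: str


# Illustrative table only: names and coordinates are made up for this sketch.
EXAMPLE_TABLE = (
    "#alt_scaf_acc\tparent_name\tparent_start\tparent_stop\tori\n"
    "ALT_A\t19\t1000\t5000\t+\n"
    "ALT_B\t19\t4000\t9000\t-\n"
    "ALT_C\t6\t200\t800\t+\n"
)


def read_placements(text):
    """Parse the table; the first line is a '#'-prefixed header."""
    lines = text.splitlines()
    header = lines[0].lstrip("#").split("\t")
    reader = csv.DictReader(lines[1:], fieldnames=header, delimiter="\t")
    return [
        Placement(
            alt_acc=row["alt_scaf_acc"],
            chrom=row["parent_name"],
            start=int(row["parent_start"]),
            stop=int(row["parent_stop"]),
            orientation=row["ori"],
        )
        for row in reader
    ]


def alts_overlapping(placements, chrom, start, stop):
    """Return alt scaffolds whose placement overlaps chrom:start-stop."""
    return [
        p.alt_acc
        for p in placements
        if p.chrom == chrom and p.start <= stop and start <= p.stop
    ]


placements = read_placements(EXAMPLE_TABLE)
print(alts_overlapping(placements, "19", 4500, 4600))
```

A tool that reports variants on the chromosome could use a lookup like this to flag positions where an alternate representation exists.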
![Page 10: Understanding the reference assembly: CSHL Hackathon](https://reader035.fdocuments.in/reader035/viewer/2022062522/587da8091a28ab22148b814f/html5/thumbnails/10.jpg)
GRC Assembly Model
Church et al., PLoS Biol. 2011 Jul;9(7):e1001091
Assembly (e.g. GRCh38.p1)
• Primary Assembly Unit
• Non-nuclear assembly unit (e.g. MT)
• ALT 1 through ALT 7
• PAR
• Genomic Region (MHC)
• Genomic Region (UGT2B17)
• Genomic Region (MAPT)
• Patches
• Genomic Region (ABO)
• Genomic Region (FOXO6)
• Genomic Region (FCGBP)
Patches: scaffold status at next major assembly release
• FIX: integrated into the chromosome. Treat as: Preferred
• NOVEL: becomes an alt locus. Treat as: Allelic
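The patch rules on this slide can be written down as a small lookup, which is one way a pipeline might decide how to handle each scaffold. This is a sketch only: the enum and function names are hypothetical, and the assumption that an existing alt locus stays an alt locus at the next major release is mine, not stated on the slide.

```python
from enum import Enum


class ScaffoldType(Enum):
    FIX_PATCH = "fix"
    NOVEL_PATCH = "novel"
    ALT_LOCUS = "alt"


def status_at_next_major_release(scaffold_type):
    """What the slide says happens to a scaffold at the next major release."""
    if scaffold_type is ScaffoldType.FIX_PATCH:
        return "integrated"  # the chromosome sequence itself is corrected
    # Novel patches become alt loci; existing alt loci are assumed to remain.
    return "alt locus"


def treat_as(scaffold_type):
    """How an analysis should weigh the scaffold against the chromosome."""
    if scaffold_type is ScaffoldType.FIX_PATCH:
        return "preferred"  # the patch supersedes the chromosome sequence
    return "allelic"  # an additional representation, not a correction


for kind in ScaffoldType:
    print(kind.value, status_at_next_major_release(kind), treat_as(kind))
```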
![Page 11: Understanding the reference assembly: CSHL Hackathon](https://reader035.fdocuments.in/reader035/viewer/2022062522/587da8091a28ab22148b814f/html5/thumbnails/11.jpg)
GRC Assembly Model
1q32, 1q21, 1p21 (Dennis et al., 2012)
![Page 12: Understanding the reference assembly: CSHL Hackathon](https://reader035.fdocuments.in/reader035/viewer/2022062522/587da8091a28ab22148b814f/html5/thumbnails/12.jpg)
GRC: Assembly Model
GRCh38
• 178 regions with alt loci: 2% of chromosome sequence (61.9 Mb)
• 261 alt loci: 3.6 Mb novel sequence relative to chromosomes
![Page 13: Understanding the reference assembly: CSHL Hackathon](https://reader035.fdocuments.in/reader035/viewer/2022062522/587da8091a28ab22148b814f/html5/thumbnails/13.jpg)
GRC: Assembly Model
GRCh38.p9
• 96 patches: >1 Mb novel sequence
• 48 FIX
• 48 NOVEL
![Page 14: Understanding the reference assembly: CSHL Hackathon](https://reader035.fdocuments.in/reader035/viewer/2022062522/587da8091a28ab22148b814f/html5/thumbnails/14.jpg)
GRCh38: Alt Loci
Alignment legend: no alignment, mismatch, deletion
![Page 15: Understanding the reference assembly: CSHL Hackathon](https://reader035.fdocuments.in/reader035/viewer/2022062522/587da8091a28ab22148b814f/html5/thumbnails/15.jpg)
GRCh38: Alt Loci
PLoS Biol. 2011 Jul;9(7):e1001091
[Figure: reads aligned to a chromosome and an alt/patch, showing the on-target alignment and off-target alignments (n=122,922)]
![Page 16: Understanding the reference assembly: CSHL Hackathon](https://reader035.fdocuments.in/reader035/viewer/2022062522/587da8091a28ab22148b814f/html5/thumbnails/16.jpg)
Anatomy of an alt
![Page 17: Understanding the reference assembly: CSHL Hackathon](https://reader035.fdocuments.in/reader035/viewer/2022062522/587da8091a28ab22148b814f/html5/thumbnails/17.jpg)
Anatomy of an alt
Alt loci components: AC012314.8, CU151838.1
Chr. 19 components: AC012314.8, AC245052.3
Due to anchor components, alternate loci contain some sequence that is redundant to the primary assembly unit
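To get a rough sense of how much of an alt scaffold is shared with the primary assembly, one can total the bases covered by its alignment blocks and treat the remainder as candidate novel sequence. The sketch below assumes alignment blocks are given as 1-based inclusive intervals in alt coordinates. The function names and the example numbers are hypothetical, and aligned bases are only an approximation of redundant sequence, since aligned regions can still contain mismatches.

```python
def merge_intervals(intervals):
    """Merge overlapping or adjacent 1-based inclusive intervals."""
    merged = []
    for start, stop in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], stop)
        else:
            merged.append([start, stop])
    return [(start, stop) for start, stop in merged]


def aligned_and_unaligned(alt_length, aligned_blocks):
    """Return (aligned bases, unaligned bases) for one alt scaffold."""
    merged = merge_intervals(aligned_blocks)
    aligned = sum(stop - start + 1 for start, stop in merged)
    return aligned, alt_length - aligned


# Made-up 10,000 bp alt scaffold with anchor alignments at both ends.
aligned, unaligned = aligned_and_unaligned(
    10_000, [(1, 3000), (2500, 3500), (8001, 10_000)]
)
print(aligned, unaligned)
```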
![Page 18: Understanding the reference assembly: CSHL Hackathon](https://reader035.fdocuments.in/reader035/viewer/2022062522/587da8091a28ab22148b814f/html5/thumbnails/18.jpg)
GRCh38 Model Centromeres
Karen Miga (Kent Lab, UCSC)
![Page 19: Understanding the reference assembly: CSHL Hackathon](https://reader035.fdocuments.in/reader035/viewer/2022062522/587da8091a28ab22148b814f/html5/thumbnails/19.jpg)
GRCh38 Model Centromeres
WGS WGS WGS
![Page 20: Understanding the reference assembly: CSHL Hackathon](https://reader035.fdocuments.in/reader035/viewer/2022062522/587da8091a28ab22148b814f/html5/thumbnails/20.jpg)
GRCh38 Centromeres
Miga et al., Genome Res. 2014 Apr;24(4):697-707
![Page 21: Understanding the reference assembly: CSHL Hackathon](https://reader035.fdocuments.in/reader035/viewer/2022062522/587da8091a28ab22148b814f/html5/thumbnails/21.jpg)
GRCh38: Where’s the data?
![Page 22: Understanding the reference assembly: CSHL Hackathon](https://reader035.fdocuments.in/reader035/viewer/2022062522/587da8091a28ab22148b814f/html5/thumbnails/22.jpg)
GRCh38: Where’s the data?
GRCh38 Sequences for alignment pipelines
![Page 23: Understanding the reference assembly: CSHL Hackathon](https://reader035.fdocuments.in/reader035/viewer/2022062522/587da8091a28ab22148b814f/html5/thumbnails/23.jpg)
GRCh38: Where’s the data?
![Page 24: Understanding the reference assembly: CSHL Hackathon](https://reader035.fdocuments.in/reader035/viewer/2022062522/587da8091a28ab22148b814f/html5/thumbnails/24.jpg)
GRCh38: Where’s the data?
Assembly Sequence and Statistics Reports
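Reports like these are plain tab-delimited text, so they are easy to summarize in a script. The following is a hedged sketch that totals sequence length per sequence role. The embedded excerpt is invented for illustration, and the column names (`Sequence-Name`, `Sequence-Role`, `Assembly-Unit`, `Sequence-Length`) are assumptions about the report header that should be checked against the file actually downloaded.

```python
from collections import Counter

# Invented excerpt for illustration; real reports have more columns and rows.
EXAMPLE_REPORT = """\
# Example-only excerpt, not real data
# Sequence-Name\tSequence-Role\tAssembly-Unit\tSequence-Length
seq1\tassembled-molecule\tPrimary Assembly\t1000
seq2\talt-scaffold\tALT_REF_LOCI_1\t300
seq3\talt-scaffold\tALT_REF_LOCI_2\t200
seq4\tfix-patch\tPATCHES\t150
"""


def parse_report(text):
    """Parse rows, using the last '#' line before the data as the header."""
    header = None
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith("#"):
            header = line.lstrip("# ").split("\t")
            continue
        if header is None:
            raise ValueError("no header line found before data rows")
        rows.append(dict(zip(header, line.split("\t"))))
    return rows


def length_by_role(rows):
    """Sum sequence lengths for each sequence role."""
    totals = Counter()
    for row in rows:
        totals[row["Sequence-Role"]] += int(row["Sequence-Length"])
    return dict(totals)


rows = parse_report(EXAMPLE_REPORT)
print(length_by_role(rows))
```

Grouping by role is a quick way to separate chromosome sequence from alt scaffolds and patches before building an alignment target.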
![Page 25: Understanding the reference assembly: CSHL Hackathon](https://reader035.fdocuments.in/reader035/viewer/2022062522/587da8091a28ab22148b814f/html5/thumbnails/25.jpg)
GRCh38: Where’s the data?
![Page 26: Understanding the reference assembly: CSHL Hackathon](https://reader035.fdocuments.in/reader035/viewer/2022062522/587da8091a28ab22148b814f/html5/thumbnails/26.jpg)
GRCh38: Where’s the data?
Assembly Regions Report: Alts, Patches and Centromeres
![Page 27: Understanding the reference assembly: CSHL Hackathon](https://reader035.fdocuments.in/reader035/viewer/2022062522/587da8091a28ab22148b814f/html5/thumbnails/27.jpg)
GRCh38: Where’s the data?
![Page 28: Understanding the reference assembly: CSHL Hackathon](https://reader035.fdocuments.in/reader035/viewer/2022062522/587da8091a28ab22148b814f/html5/thumbnails/28.jpg)
![Page 29: Understanding the reference assembly: CSHL Hackathon](https://reader035.fdocuments.in/reader035/viewer/2022062522/587da8091a28ab22148b814f/html5/thumbnails/29.jpg)
GRCh38: Where’s the data?
![Page 30: Understanding the reference assembly: CSHL Hackathon](https://reader035.fdocuments.in/reader035/viewer/2022062522/587da8091a28ab22148b814f/html5/thumbnails/30.jpg)
GRCh38: Where’s the data?
![Page 31: Understanding the reference assembly: CSHL Hackathon](https://reader035.fdocuments.in/reader035/viewer/2022062522/587da8091a28ab22148b814f/html5/thumbnails/31.jpg)
Accessing the Data
https://genomereference.org
![Page 32: Understanding the reference assembly: CSHL Hackathon](https://reader035.fdocuments.in/reader035/viewer/2022062522/587da8091a28ab22148b814f/html5/thumbnails/32.jpg)
Accessing the Data
https://genomereference.org
![Page 33: Understanding the reference assembly: CSHL Hackathon](https://reader035.fdocuments.in/reader035/viewer/2022062522/587da8091a28ab22148b814f/html5/thumbnails/33.jpg)
Accessing the Data
https://genomereference.org
• Dumped daily
• Frozen mappings to prior assembly versions in GFF3
![Page 34: Understanding the reference assembly: CSHL Hackathon](https://reader035.fdocuments.in/reader035/viewer/2022062522/587da8091a28ab22148b814f/html5/thumbnails/34.jpg)
Accessing the Data
https://genomereference.org
• Mapped to latest GRCh38 and GRCh37.p13
![Page 35: Understanding the reference assembly: CSHL Hackathon](https://reader035.fdocuments.in/reader035/viewer/2022062522/587da8091a28ab22148b814f/html5/thumbnails/35.jpg)
GRC Credits
https://genomereference.org
GRCh38 Collaborators
• NCBI RefSeq and gpipe annotation team
• Havana annotators
• Karen Miga
• David Schwartz
• Steve Goldstein
• Mario Caceres
• Giulio Genovese
• Jeff Kidd
• Peter Lansdorp
• Mark Hills
• David Page
• Jim Knight
• Stephan Schuster
• 1000 Genomes
GRC SAB
• Rick Myers
• Granger Sutton
• Evan Eichler
• Jim Kent
• Roderic Guigo
• Carol Bult
• Derek Stemple
• Jan Korbel
• Liz Worthey
• Matthew Hurles
• Richard Gibbs
![Page 36: Understanding the reference assembly: CSHL Hackathon](https://reader035.fdocuments.in/reader035/viewer/2022062522/587da8091a28ab22148b814f/html5/thumbnails/36.jpg)
Alt Loci: Informatics Challenges
![Page 37: Understanding the reference assembly: CSHL Hackathon](https://reader035.fdocuments.in/reader035/viewer/2022062522/587da8091a28ab22148b814f/html5/thumbnails/37.jpg)
GRCh38: Alt Loci
Masks and alt-aware aligners reduce the incidence of ambiguous alignments observed when aligning reads to the full assembly (simulated reads).
• Mask1: mask the chromosome for fix patches and the scaffold for novel patches and alts
• Mask2: mask only on scaffolds
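The two masking schemes can be sketched as a choice of which sequence to hard-mask in the region where a scaffold aligns to the chromosome. This is an illustration under assumptions: that masking means replacing the aligned interval with `N`, that coordinates are 1-based inclusive, and that the function names and toy sequences are hypothetical. The slide does not describe the actual mask construction.

```python
def mask_target(scaffold_type, scheme):
    """Pick the sequence to mask: 'chromosome' or 'scaffold'."""
    if scheme == "mask1":
        # Mask1: chromosome for fix patches, scaffold for novel patches and alts.
        return "chromosome" if scaffold_type == "fix" else "scaffold"
    if scheme == "mask2":
        # Mask2: always mask on the scaffold.
        return "scaffold"
    raise ValueError(f"unknown scheme: {scheme}")


def hard_mask(seq, start, stop):
    """Replace seq[start..stop] (1-based, inclusive) with N."""
    return seq[: start - 1] + "N" * (stop - start + 1) + seq[stop:]


def apply_mask(chrom_seq, scaf_seq, chrom_span, scaf_span, scaffold_type, scheme):
    """Return (chromosome, scaffold) after masking the aligned interval."""
    if mask_target(scaffold_type, scheme) == "chromosome":
        return hard_mask(chrom_seq, *chrom_span), scaf_seq
    return chrom_seq, hard_mask(scaf_seq, *scaf_span)


# Toy sequences: scaffold bases 1-4 align to chromosome bases 5-8.
chrom, scaf = "AAAACCCCGGGG", "CCCCTTTT"
print(apply_mask(chrom, scaf, (5, 8), (1, 4), "fix", "mask1"))
print(apply_mask(chrom, scaf, (5, 8), (1, 4), "novel", "mask1"))
print(apply_mask(chrom, scaf, (5, 8), (1, 4), "fix", "mask2"))
```

Under Mask1 a fix patch keeps its corrected sequence visible and hides the chromosome copy, while Mask2 always leaves the chromosome intact.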
![Page 38: Understanding the reference assembly: CSHL Hackathon](https://reader035.fdocuments.in/reader035/viewer/2022062522/587da8091a28ab22148b814f/html5/thumbnails/38.jpg)
The Changing Reference
![Page 39: Understanding the reference assembly: CSHL Hackathon](https://reader035.fdocuments.in/reader035/viewer/2022062522/587da8091a28ab22148b814f/html5/thumbnails/39.jpg)
The Changing Reference
![Page 40: Understanding the reference assembly: CSHL Hackathon](https://reader035.fdocuments.in/reader035/viewer/2022062522/587da8091a28ab22148b814f/html5/thumbnails/40.jpg)
The Changing Reference
Dilthey et al.; Paten et al.
![Page 41: Understanding the reference assembly: CSHL Hackathon](https://reader035.fdocuments.in/reader035/viewer/2022062522/587da8091a28ab22148b814f/html5/thumbnails/41.jpg)
Outline
• Distinguishing features of the human reference assembly
• Implications for genomic analyses and tools
• Where do you get assembly-relevant data?