Genome variation part 2 - University of New South Wales · 2017. 4. 20. · Genome variation...
![Page 1: Genome variation part 2 - University of New South Wales · 2017. 4. 20. · Genome variation –part 2 Prince of Wales Clinical School Dr Jason Wong Introductory bioinformatics for](https://reader033.fdocuments.in/reader033/viewer/2022052010/6020d6278fbc06492158c97e/html5/thumbnails/1.jpg)
Genome variation – part 2
Prince of Wales Clinical School
Dr Jason Wong
Introductory bioinformatics for human genomics workshop, UNSW
Day 2 – Friday 21st January 2016
![Page 2: Genome variation part 2 - University of New South Wales · 2017. 4. 20. · Genome variation –part 2 Prince of Wales Clinical School Dr Jason Wong Introductory bioinformatics for](https://reader033.fdocuments.in/reader033/viewer/2022052010/6020d6278fbc06492158c97e/html5/thumbnails/2.jpg)
Aims of the session
• Introduce ExAC and gnomAD
• Learn how to use the unique features of ExAC.
• Use the ExAC browser interface.
![Page 3: Genome variation part 2 - University of New South Wales · 2017. 4. 20. · Genome variation –part 2 Prince of Wales Clinical School Dr Jason Wong Introductory bioinformatics for](https://reader033.fdocuments.in/reader033/viewer/2022052010/6020d6278fbc06492158c97e/html5/thumbnails/3.jpg)
Limitations of 1000 genomes
• 1000 Genomes is able to address some of the shortcomings of dbSNP.
• BUT ~2,000 genomes only have the power to detect “rare” variants that are present in at least about 1 in 2,000 people.
• After dividing into different ethnicities, power as a control dataset becomes limited.
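The sample-size limitation can be illustrated with a rough calculation: the chance that a variant with a given population allele frequency is seen at least once among N sequenced people. This is a minimal sketch that assumes chromosomes are sampled independently and ignores population structure and variant-calling error; the function name `prob_observed` is made up for illustration.

```python
def prob_observed(allele_freq: float, n_people: int) -> float:
    """Probability of seeing at least one copy of an allele among
    2 * n_people independently sampled chromosomes."""
    n_chromosomes = 2 * n_people
    return 1.0 - (1.0 - allele_freq) ** n_chromosomes


# Compare a 1000 Genomes-sized cohort with an ExAC-sized cohort.
for n in (2_000, 60_706):
    for af in (1e-3, 1e-4, 1e-5):
        p = prob_observed(af, n)
        print(f"N={n:>6}  AF={af:.0e}  P(seen at least once)={p:.3f}")
```

Under these assumptions, a variant with an allele frequency of 1 in 10,000 is more likely to be missed than seen in 2,000 people, but is almost certain to appear in a cohort the size of ExAC.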
![Page 4: Genome variation part 2 - University of New South Wales · 2017. 4. 20. · Genome variation –part 2 Prince of Wales Clinical School Dr Jason Wong Introductory bioinformatics for](https://reader033.fdocuments.in/reader033/viewer/2022052010/6020d6278fbc06492158c97e/html5/thumbnails/4.jpg)
Exome Aggregation Consortium (ExAC)
• More than 1 million exomes/genomes have already been sequenced worldwide
• Why not just use all of them?
Lek et al. 2016 Nature 536, 285-291
![Page 5: Genome variation part 2 - University of New South Wales · 2017. 4. 20. · Genome variation –part 2 Prince of Wales Clinical School Dr Jason Wong Introductory bioinformatics for](https://reader033.fdocuments.in/reader033/viewer/2022052010/6020d6278fbc06492158c97e/html5/thumbnails/5.jpg)
Exome Aggregation Consortium (ExAC)
Source: www.genome.gov
![Page 6: Genome variation part 2 - University of New South Wales · 2017. 4. 20. · Genome variation –part 2 Prince of Wales Clinical School Dr Jason Wong Introductory bioinformatics for](https://reader033.fdocuments.in/reader033/viewer/2022052010/6020d6278fbc06492158c97e/html5/thumbnails/6.jpg)
Largest collection of protein-coding variants
• Over 10 million variants – one every 6 base pairs. Most are unique/novel.
Source: www.genome.gov
![Page 7: Genome variation part 2 - University of New South Wales · 2017. 4. 20. · Genome variation –part 2 Prince of Wales Clinical School Dr Jason Wong Introductory bioinformatics for](https://reader033.fdocuments.in/reader033/viewer/2022052010/6020d6278fbc06492158c97e/html5/thumbnails/7.jpg)
Source: www.genome.gov
Lots more diversity
![Page 8: Genome variation part 2 - University of New South Wales · 2017. 4. 20. · Genome variation –part 2 Prince of Wales Clinical School Dr Jason Wong Introductory bioinformatics for](https://reader033.fdocuments.in/reader033/viewer/2022052010/6020d6278fbc06492158c97e/html5/thumbnails/8.jpg)
What are some things that ExAC allows you to do?
• A much better population frequency filter.
• Estimate penetrance of specific variants.
• Great for checking exome mappability.
• Identify genes with constrained evolution (i.e. negative selection).
• Identify copy number variation.
![Page 9: Genome variation part 2 - University of New South Wales · 2017. 4. 20. · Genome variation –part 2 Prince of Wales Clinical School Dr Jason Wong Introductory bioinformatics for](https://reader033.fdocuments.in/reader033/viewer/2022052010/6020d6278fbc06492158c97e/html5/thumbnails/9.jpg)
Filtering potentially pathogenic variants
• Variants that cause severe disease should not have “high” allele frequencies (>0.1%) across the ExAC population.
– Note that 0.1% of ExAC is already at least 61 people.
• A review of 197 reportedly pathogenic variants with an allele frequency >1% in ExAC found virtually all to be spurious claims.
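The arithmetic behind the "at least 61 people" note, and the frequency filter itself, can be sketched as follows. This assumes an autosomal site covered in all 60,706 individuals (so 121,412 chromosomes). The function names and the choice to keep variants with no coverage for manual review are illustrative assumptions, not part of ExAC.

```python
import math

EXAC_INDIVIDUALS = 60_706               # exomes in ExAC (from the slides)
EXAC_CHROMOSOMES = 2 * EXAC_INDIVIDUALS  # autosomal chromosomes sampled


def min_carriers(allele_freq: float, n_chromosomes: int = EXAC_CHROMOSOMES) -> int:
    """Smallest number of people who could carry the alleles implied by
    allele_freq, i.e. the case where every carrier is homozygous."""
    allele_count = allele_freq * n_chromosomes
    return math.ceil(allele_count / 2)


def passes_frequency_filter(allele_count: int, allele_number: int,
                            max_af: float = 0.001) -> bool:
    """Keep a candidate pathogenic variant only if its allele frequency
    is at or below max_af (0.1% by default)."""
    if allele_number == 0:
        # No coverage at this site: frequency unknown, keep for manual review.
        return True
    return allele_count / allele_number <= max_af


print(min_carriers(0.001))                     # people implied by AF = 0.1%
print(passes_frequency_filter(185, 121_412))   # a variant seen on 185 alleles
```

The threshold is a parameter because the appropriate cut-off depends on the prevalence and inheritance pattern of the disease being studied.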
![Page 10: Genome variation part 2 - University of New South Wales · 2017. 4. 20. · Genome variation –part 2 Prince of Wales Clinical School Dr Jason Wong Introductory bioinformatics for](https://reader033.fdocuments.in/reader033/viewer/2022052010/6020d6278fbc06492158c97e/html5/thumbnails/10.jpg)
M Lek et al. Nature 536, 285–291 (2016) doi:10.1038/nature19057
![Page 11: Genome variation part 2 - University of New South Wales · 2017. 4. 20. · Genome variation –part 2 Prince of Wales Clinical School Dr Jason Wong Introductory bioinformatics for](https://reader033.fdocuments.in/reader033/viewer/2022052010/6020d6278fbc06492158c97e/html5/thumbnails/11.jpg)
Assessing penetrance
• Penetrance refers to the probability that a person carrying a disease variant actually develops the disease.
• Example in cystic fibrosis
– CFTR ΔF508 has very high penetrance
(ExAC: 0.0%)
– CFTR R117H has low (incomplete) penetrance
(ExAC: 0.15% - 185 alleles)
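Population data make it possible to put a rough number on penetrance using Bayes' rule: P(disease | variant) = P(disease) × P(variant | disease) / P(variant). The sketch below is a simplified illustration of that idea, not a reproduction of any published analysis. The function name and all input values are hypothetical, and the population frequency is assumed to come from a reference set such as ExAC.

```python
def estimate_penetrance(baseline_risk: float,
                        carrier_freq_cases: float,
                        carrier_freq_population: float) -> float:
    """Bayes' rule estimate of P(disease | variant).

    baseline_risk           -- lifetime risk of the disease in the population
    carrier_freq_cases      -- fraction of patients who carry the variant
    carrier_freq_population -- fraction of the general population carrying it
    """
    if carrier_freq_population <= 0:
        raise ValueError("variant must be observed in the population data")
    estimate = baseline_risk * carrier_freq_cases / carrier_freq_population
    # Sampling noise can push the raw ratio above 1; cap it at 100%.
    return min(estimate, 1.0)


# Hypothetical inputs, for illustration only.
print(estimate_penetrance(baseline_risk=1 / 3000,
                          carrier_freq_cases=0.7,
                          carrier_freq_population=0.0005))
```

The key point is that a variant seen frequently in the general population cannot be highly penetrant for a rare disease, because the denominator grows while the baseline risk stays small.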
![Page 12: Genome variation part 2 - University of New South Wales · 2017. 4. 20. · Genome variation –part 2 Prince of Wales Clinical School Dr Jason Wong Introductory bioinformatics for](https://reader033.fdocuments.in/reader033/viewer/2022052010/6020d6278fbc06492158c97e/html5/thumbnails/12.jpg)
Prion disease example
Minikel et al. Science Translational Medicine 8:322ra9 (2016)
http://www.cureffi.org/2016/10/19/estimation-of-penetrance/
![Page 13: Genome variation part 2 - University of New South Wales · 2017. 4. 20. · Genome variation –part 2 Prince of Wales Clinical School Dr Jason Wong Introductory bioinformatics for](https://reader033.fdocuments.in/reader033/viewer/2022052010/6020d6278fbc06492158c97e/html5/thumbnails/13.jpg)
Evaluating coverage
• ExAC used the same pipeline to analyse 60,706 exomes.
• Regions of poor exome capture/mappability are evident from low coverage in ExAC (Guilin plots).
![Page 14: Genome variation part 2 - University of New South Wales · 2017. 4. 20. · Genome variation –part 2 Prince of Wales Clinical School Dr Jason Wong Introductory bioinformatics for](https://reader033.fdocuments.in/reader033/viewer/2022052010/6020d6278fbc06492158c97e/html5/thumbnails/14.jpg)
Functional gene constraint
• Functionally important genes (or loci) should be depleted of loss-of-function mutations.
Samocha et al 2014 Nat Genet 46:944-950
![Page 15: Genome variation part 2 - University of New South Wales · 2017. 4. 20. · Genome variation –part 2 Prince of Wales Clinical School Dr Jason Wong Introductory bioinformatics for](https://reader033.fdocuments.in/reader033/viewer/2022052010/6020d6278fbc06492158c97e/html5/thumbnails/15.jpg)
Constraint score
Generally genes can be categorised into:
(1) Completely tolerant of loss-of-function variation (observed = expected)
(2) Intolerant to two loss-of-function variants (i.e. recessive genes, observed ≈ 0.5 x expected)
(3) Intolerant of single loss-of-function variants (i.e. dominant genes, observed ≈ 0.1 x expected)
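The three categories can be illustrated by comparing a gene's observed/expected ratio of loss-of-function variants with the approximate ratios listed above. This is a toy nearest-ratio rule for intuition only; the published constraint score is derived from a statistical model, not from this rule. The names `CATEGORY_RATIOS` and `nearest_category` and the example counts are hypothetical.

```python
# Approximate observed/expected ratios for each category (from the slide).
CATEGORY_RATIOS = {
    "tolerant": 1.0,    # observed = expected
    "recessive": 0.5,   # observed is about 0.5 x expected
    "dominant": 0.1,    # observed is about 0.1 x expected
}


def nearest_category(observed: int, expected: float) -> str:
    """Return the category whose typical ratio is closest to observed/expected."""
    if expected <= 0:
        raise ValueError("expected count must be positive")
    ratio = observed / expected
    return min(CATEGORY_RATIOS, key=lambda name: abs(CATEGORY_RATIOS[name] - ratio))


# Hypothetical genes: (observed LoF variants, expected LoF variants).
for observed, expected in [(50, 50.0), (20, 40.0), (3, 40.0)]:
    print(observed, expected, nearest_category(observed, expected))
```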
![Page 16: Genome variation part 2 - University of New South Wales · 2017. 4. 20. · Genome variation –part 2 Prince of Wales Clinical School Dr Jason Wong Introductory bioinformatics for](https://reader033.fdocuments.in/reader033/viewer/2022052010/6020d6278fbc06492158c97e/html5/thumbnails/16.jpg)
Example
• Huntington’s disease is an autosomal dominant disorder caused by mutation of HTT (a CAG repeat expansion, usually described as a toxic gain of function rather than a simple loss of function)
![Page 17: Genome variation part 2 - University of New South Wales · 2017. 4. 20. · Genome variation –part 2 Prince of Wales Clinical School Dr Jason Wong Introductory bioinformatics for](https://reader033.fdocuments.in/reader033/viewer/2022052010/6020d6278fbc06492158c97e/html5/thumbnails/17.jpg)
Copy number variation (CNV)
• The human genome is diploid, so there are two copies of most genes.
• High depth exome sequencing allows the use of deviations of sequencing depth across samples to measure CNV.
Figure labels: average sample vs. CNV sample; potential copy gain
Ruderfer et al. 2016 Nature Genetics 48, 1107-1111
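The depth-based idea can be sketched in a few lines: at a given exome target, a sample's depth relative to the cohort average, scaled by the expected two copies, gives a crude copy-number estimate. This is a simplified illustration that assumes depths have already been normalised for each sample's total sequencing output; real CNV callers use more sophisticated models. The function names and the 0.5-copy tolerance are assumptions.

```python
from statistics import mean


def estimate_copy_number(sample_depth: float, cohort_depths: list[float]) -> float:
    """Estimate copy number at a target as 2 x (sample depth / mean cohort depth)."""
    reference = mean(cohort_depths)
    if reference == 0:
        raise ValueError("cohort has no coverage at this target")
    return 2.0 * sample_depth / reference


def call_cnv(copy_number: float, tolerance: float = 0.5) -> str:
    """Label an estimate as a gain, a loss, or normal (about two copies)."""
    if copy_number >= 2 + tolerance:
        return "gain"
    if copy_number <= 2 - tolerance:
        return "loss"
    return "normal"


# Hypothetical normalised depths at one exon.
cohort = [100, 95, 105]
for depth in (150, 98, 52):
    cn = estimate_copy_number(depth, cohort)
    print(depth, round(cn, 2), call_cnv(cn))
```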
![Page 18: Genome variation part 2 - University of New South Wales · 2017. 4. 20. · Genome variation –part 2 Prince of Wales Clinical School Dr Jason Wong Introductory bioinformatics for](https://reader033.fdocuments.in/reader033/viewer/2022052010/6020d6278fbc06492158c97e/html5/thumbnails/18.jpg)
Example: ASTN2
• ASTN2 deletion associated with autism spectrum disorder (Lionel et al. 2014 Hum Mol Genet, 23:2752)
• ExAC shows deletions in up to 16 “healthy people”
• May reflect the relatively large number of people with psychiatric disorders in the cohort?
![Page 19: Genome variation part 2 - University of New South Wales · 2017. 4. 20. · Genome variation –part 2 Prince of Wales Clinical School Dr Jason Wong Introductory bioinformatics for](https://reader033.fdocuments.in/reader033/viewer/2022052010/6020d6278fbc06492158c97e/html5/thumbnails/19.jpg)
ExAC browser
http://exac.broadinstitute.org/
Type “APC” for the Adenomatous polyposis coli tumour suppressor gene responsible for Familial Adenomatous Polyposis (FAP)
![Page 20: Genome variation part 2 - University of New South Wales · 2017. 4. 20. · Genome variation –part 2 Prince of Wales Clinical School Dr Jason Wong Introductory bioinformatics for](https://reader033.fdocuments.in/reader033/viewer/2022052010/6020d6278fbc06492158c97e/html5/thumbnails/20.jpg)
![Page 21: Genome variation part 2 - University of New South Wales · 2017. 4. 20. · Genome variation –part 2 Prince of Wales Clinical School Dr Jason Wong Introductory bioinformatics for](https://reader033.fdocuments.in/reader033/viewer/2022052010/6020d6278fbc06492158c97e/html5/thumbnails/21.jpg)
Functional gene constraint
![Page 22: Genome variation part 2 - University of New South Wales · 2017. 4. 20. · Genome variation –part 2 Prince of Wales Clinical School Dr Jason Wong Introductory bioinformatics for](https://reader033.fdocuments.in/reader033/viewer/2022052010/6020d6278fbc06492158c97e/html5/thumbnails/22.jpg)
Sequencing coverage
![Page 23: Genome variation part 2 - University of New South Wales · 2017. 4. 20. · Genome variation –part 2 Prince of Wales Clinical School Dr Jason Wong Introductory bioinformatics for](https://reader033.fdocuments.in/reader033/viewer/2022052010/6020d6278fbc06492158c97e/html5/thumbnails/23.jpg)
Copy number variation
![Page 24: Genome variation part 2 - University of New South Wales · 2017. 4. 20. · Genome variation –part 2 Prince of Wales Clinical School Dr Jason Wong Introductory bioinformatics for](https://reader033.fdocuments.in/reader033/viewer/2022052010/6020d6278fbc06492158c97e/html5/thumbnails/24.jpg)
Examining variants
FAP is autosomal dominant (i.e. one mutant allele is sufficient for disease). All mutations have very low VAF. Therefore up to 18 FAP patients in ExAC?
Focus on loss of function (LoF) mutations as they are easier to interpret.
![Page 25: Genome variation part 2 - University of New South Wales · 2017. 4. 20. · Genome variation –part 2 Prince of Wales Clinical School Dr Jason Wong Introductory bioinformatics for](https://reader033.fdocuments.in/reader033/viewer/2022052010/6020d6278fbc06492158c97e/html5/thumbnails/25.jpg)
Deletion does not actually affect splicing!
![Page 26: Genome variation part 2 - University of New South Wales · 2017. 4. 20. · Genome variation –part 2 Prince of Wales Clinical School Dr Jason Wong Introductory bioinformatics for](https://reader033.fdocuments.in/reader033/viewer/2022052010/6020d6278fbc06492158c97e/html5/thumbnails/26.jpg)
![Page 27: Genome variation part 2 - University of New South Wales · 2017. 4. 20. · Genome variation –part 2 Prince of Wales Clinical School Dr Jason Wong Introductory bioinformatics for](https://reader033.fdocuments.in/reader033/viewer/2022052010/6020d6278fbc06492158c97e/html5/thumbnails/27.jpg)
Mutation occurs in phase with adjacent variant resulting in GGA>TTA!
![Page 28: Genome variation part 2 - University of New South Wales · 2017. 4. 20. · Genome variation –part 2 Prince of Wales Clinical School Dr Jason Wong Introductory bioinformatics for](https://reader033.fdocuments.in/reader033/viewer/2022052010/6020d6278fbc06492158c97e/html5/thumbnails/28.jpg)
![Page 29: Genome variation part 2 - University of New South Wales · 2017. 4. 20. · Genome variation –part 2 Prince of Wales Clinical School Dr Jason Wong Introductory bioinformatics for](https://reader033.fdocuments.in/reader033/viewer/2022052010/6020d6278fbc06492158c97e/html5/thumbnails/29.jpg)
![Page 30: Genome variation part 2 - University of New South Wales · 2017. 4. 20. · Genome variation –part 2 Prince of Wales Clinical School Dr Jason Wong Introductory bioinformatics for](https://reader033.fdocuments.in/reader033/viewer/2022052010/6020d6278fbc06492158c97e/html5/thumbnails/30.jpg)
![Page 31: Genome variation part 2 - University of New South Wales · 2017. 4. 20. · Genome variation –part 2 Prince of Wales Clinical School Dr Jason Wong Introductory bioinformatics for](https://reader033.fdocuments.in/reader033/viewer/2022052010/6020d6278fbc06492158c97e/html5/thumbnails/31.jpg)
![Page 32: Genome variation part 2 - University of New South Wales · 2017. 4. 20. · Genome variation –part 2 Prince of Wales Clinical School Dr Jason Wong Introductory bioinformatics for](https://reader033.fdocuments.in/reader033/viewer/2022052010/6020d6278fbc06492158c97e/html5/thumbnails/32.jpg)
![Page 33: Genome variation part 2 - University of New South Wales · 2017. 4. 20. · Genome variation –part 2 Prince of Wales Clinical School Dr Jason Wong Introductory bioinformatics for](https://reader033.fdocuments.in/reader033/viewer/2022052010/6020d6278fbc06492158c97e/html5/thumbnails/33.jpg)
![Page 34: Genome variation part 2 - University of New South Wales · 2017. 4. 20. · Genome variation –part 2 Prince of Wales Clinical School Dr Jason Wong Introductory bioinformatics for](https://reader033.fdocuments.in/reader033/viewer/2022052010/6020d6278fbc06492158c97e/html5/thumbnails/34.jpg)
Accessing ExAC raw data
• All data underlying the browser can be accessed via downloads
The FTP site has the latest and older versions of the data: ftp://ftp.broadinstitute.org/pub/ExAC_release
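Once a release file is downloaded, allele frequencies can be recomputed from the allele count (`AC`) and allele number (`AN`) entries in the INFO column of each VCF record. The sketch below uses only the standard VCF column layout; the example record is made up for illustration and is not a real ExAC entry, and the helper names are hypothetical.

```python
def parse_info(info: str) -> dict[str, str]:
    """Split a VCF INFO column ("KEY=VALUE;KEY=VALUE;FLAG") into a dict."""
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def allele_frequencies(vcf_line: str) -> list[tuple[str, float]]:
    """Return (alternate allele, frequency) pairs for one VCF record.

    AC holds one count per alternate allele; AN is the total number of
    chromosomes with a genotype call at the site.
    """
    cols = vcf_line.rstrip("\n").split("\t")
    alts = cols[4].split(",")
    info = parse_info(cols[7])
    an = int(info["AN"])
    counts = [int(x) for x in info["AC"].split(",")]
    return [(alt, ac / an if an else 0.0) for alt, ac in zip(alts, counts)]


# Made-up record: CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO.
example = "5\t112043384\t.\tC\tT,G\t100\tPASS\tAC=3,1;AN=121000"
for alt, af in allele_frequencies(example):
    print(alt, f"{af:.2e}")
```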
![Page 35: Genome variation part 2 - University of New South Wales · 2017. 4. 20. · Genome variation –part 2 Prince of Wales Clinical School Dr Jason Wong Introductory bioinformatics for](https://reader033.fdocuments.in/reader033/viewer/2022052010/6020d6278fbc06492158c97e/html5/thumbnails/35.jpg)
Accessing ExAC from UCSC
To load session, user: jasewong session name: bioinf_workshop_SNP_2016
![Page 36: Genome variation part 2 - University of New South Wales · 2017. 4. 20. · Genome variation –part 2 Prince of Wales Clinical School Dr Jason Wong Introductory bioinformatics for](https://reader033.fdocuments.in/reader033/viewer/2022052010/6020d6278fbc06492158c97e/html5/thumbnails/36.jpg)
![Page 37: Genome variation part 2 - University of New South Wales · 2017. 4. 20. · Genome variation –part 2 Prince of Wales Clinical School Dr Jason Wong Introductory bioinformatics for](https://reader033.fdocuments.in/reader033/viewer/2022052010/6020d6278fbc06492158c97e/html5/thumbnails/37.jpg)
How about whole genomes?
• With increasing numbers of exomes and whole genomes, ExAC will eventually be superseded by gnomAD.
Source: https://macarthurlab.org/blog/
![Page 38: Genome variation part 2 - University of New South Wales · 2017. 4. 20. · Genome variation –part 2 Prince of Wales Clinical School Dr Jason Wong Introductory bioinformatics for](https://reader033.fdocuments.in/reader033/viewer/2022052010/6020d6278fbc06492158c97e/html5/thumbnails/38.jpg)
Interface is very similar to ExAC
http://gnomad.broadinstitute.org/
![Page 39: Genome variation part 2 - University of New South Wales · 2017. 4. 20. · Genome variation –part 2 Prince of Wales Clinical School Dr Jason Wong Introductory bioinformatics for](https://reader033.fdocuments.in/reader033/viewer/2022052010/6020d6278fbc06492158c97e/html5/thumbnails/39.jpg)
Extra coverage information for genome (green line)
But no functional constraint and CNV information (yet!).
![Page 40: Genome variation part 2 - University of New South Wales · 2017. 4. 20. · Genome variation –part 2 Prince of Wales Clinical School Dr Jason Wong Introductory bioinformatics for](https://reader033.fdocuments.in/reader033/viewer/2022052010/6020d6278fbc06492158c97e/html5/thumbnails/40.jpg)
Even more LoF variants…
![Page 41: Genome variation part 2 - University of New South Wales · 2017. 4. 20. · Genome variation –part 2 Prince of Wales Clinical School Dr Jason Wong Introductory bioinformatics for](https://reader033.fdocuments.in/reader033/viewer/2022052010/6020d6278fbc06492158c97e/html5/thumbnails/41.jpg)
Non-coding region
Paste in chr19:45408461-45408628 (promoter of APOE)
![Page 42: Genome variation part 2 - University of New South Wales · 2017. 4. 20. · Genome variation –part 2 Prince of Wales Clinical School Dr Jason Wong Introductory bioinformatics for](https://reader033.fdocuments.in/reader033/viewer/2022052010/6020d6278fbc06492158c97e/html5/thumbnails/42.jpg)
Downloading gnomAD files
![Page 43: Genome variation part 2 - University of New South Wales · 2017. 4. 20. · Genome variation –part 2 Prince of Wales Clinical School Dr Jason Wong Introductory bioinformatics for](https://reader033.fdocuments.in/reader033/viewer/2022052010/6020d6278fbc06492158c97e/html5/thumbnails/43.jpg)
But files are so big…
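One way to cope with very large files is to stream them line by line and keep only the records in a region of interest, rather than loading everything into memory. The sketch below is a minimal illustration under stated assumptions: the input is a gzip-compressed VCF, chromosome names are compared with any `chr` prefix removed, and the file path in the usage comment is hypothetical. Where an index is available, an indexed query would avoid scanning the whole file.

```python
import gzip
from typing import Iterable, Iterator


def parse_region(region: str) -> tuple[str, int, int]:
    """Turn "chr19:45408461-45408628" into ("19", 45408461, 45408628)."""
    chrom, _, span = region.partition(":")
    start, _, end = span.replace(",", "").partition("-")
    return chrom.removeprefix("chr"), int(start), int(end)


def records_in_region(lines: Iterable[str], region: str) -> Iterator[str]:
    """Yield VCF records whose position falls inside the region (inclusive)."""
    chrom, start, end = parse_region(region)
    for line in lines:
        if line.startswith("#"):
            continue  # skip header lines
        cols = line.split("\t", 2)
        if cols[0].removeprefix("chr") == chrom and start <= int(cols[1]) <= end:
            yield line.rstrip("\n")


def scan_file(path: str, region: str) -> Iterator[str]:
    """Stream a gzip-compressed VCF from disk without loading it all at once."""
    with gzip.open(path, "rt") as handle:
        yield from records_in_region(handle, region)


# Usage (hypothetical file name):
# for record in scan_file("gnomad_chr19.vcf.gz", "chr19:45408461-45408628"):
#     print(record)
```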
![Page 44: Genome variation part 2 - University of New South Wales · 2017. 4. 20. · Genome variation –part 2 Prince of Wales Clinical School Dr Jason Wong Introductory bioinformatics for](https://reader033.fdocuments.in/reader033/viewer/2022052010/6020d6278fbc06492158c97e/html5/thumbnails/44.jpg)
Further reading
• ExAC paper (https://www.ncbi.nlm.nih.gov/pubmed/27535533)
• ExAC guide (https://macarthurlab.org/2014/11/18/a-guide-to-the-exome-aggregation-consortium-exac-data-set/)
• gnomAD guide (https://macarthurlab.org/2017/02/27/the-genome-aggregation-database-gnomad/)