Sample to Insight BIOBASE Training Human Gene Mutation Database (HGMD ® ) The only comprehensive source of data on human inherited disease-associated mutations

Page 1:

Sample to Insight

BIOBASE Training

Human Gene Mutation Database (HGMD®)

The only comprehensive source of data on human inherited disease-associated mutations

Page 2:

A comprehensive source of mutation data

• Focus on peer-reviewed scientific literature

• Experimental results are extracted by highly trained genetic experts

• Content is updated 4x per year

Page 3:

More than 170,000 curated mutations

HGMD® Professional, Spring 2015.2 Release

Mutation Type                       Number of Entries
Micro Lesions:
  Missense / Nonsense               94860
  Splicing                          15476
  Regulatory                        3242
  Small Deletions                   25454
  Small Insertions                  10617
  Small Indels                      2436
Gross Lesions:
  Repeat Variations                 476
  Gross Insertions / Duplication    3086
  Complex Rearrangements            1638
  Gross Deletions                   12833
Total                               170118
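
As a quick arithmetic check on the counts above, the following sketch sums the per-type entries and compares them with the stated total. The dictionary and variable names are illustrative only and are not part of HGMD.

```python
# Counts copied from the HGMD Professional Spring 2015.2 table.
micro_lesions = {
    "Missense / Nonsense": 94860,
    "Splicing": 15476,
    "Regulatory": 3242,
    "Small Deletions": 25454,
    "Small Insertions": 10617,
    "Small Indels": 2436,
}
gross_lesions = {
    "Repeat Variations": 476,
    "Gross Insertions / Duplication": 3086,
    "Complex Rearrangements": 1638,
    "Gross Deletions": 12833,
}

micro_total = sum(micro_lesions.values())
gross_total = sum(gross_lesions.values())
grand_total = micro_total + gross_total

# The slide reports a total of 170118 entries.
print(micro_total, gross_total, grand_total)
```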

Page 4:

HGMD® advantages

HGMD® is the industry standard for:

• Identifying the known genetic causes of a given inherited disease

• Understanding the mutational spectrum of a particular gene

• Verifying novel mutations

• Assessing individual disease risk

• Reducing time for literature review relating to a given inherited disease

Page 5:

LRRK2

Mutation report for CM074929

Page 6:

Categorization of mutations & polymorphisms

DM = Disease causing (pathological) mutation

DM? = Likely disease causing (likely pathological) mutation

DP = Disease associated polymorphism

DFP = Disease associated polymorphism with additional supporting functional evidence

FP = Polymorphism affecting the structure, function or expression of a gene but with no disease association reported yet
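
The class codes above lend themselves to a simple lookup table. The sketch below assumes variant records are dictionaries with an `hgmd_class` field; that field name, the function name, and the example records are hypothetical and do not reflect the actual HGMD schema.

```python
# HGMD variant class codes and their meanings, as listed on this slide.
HGMD_CLASSES = {
    "DM": "Disease causing (pathological) mutation",
    "DM?": "Likely disease causing (likely pathological) mutation",
    "DP": "Disease associated polymorphism",
    "DFP": "Disease associated polymorphism with additional supporting "
           "functional evidence",
    "FP": "Polymorphism affecting the structure, function or expression of a "
          "gene but with no disease association reported yet",
}


def select_by_class(records, wanted=("DM", "DM?")):
    """Keep records whose 'hgmd_class' field (hypothetical name) is in `wanted`."""
    unknown = set(wanted) - set(HGMD_CLASSES)
    if unknown:
        raise ValueError(f"unknown HGMD class code(s): {sorted(unknown)}")
    return [r for r in records if r.get("hgmd_class") in wanted]


# Made-up records for illustration only; these are not real HGMD entries.
example = [
    {"id": "var1", "hgmd_class": "DM"},
    {"id": "var2", "hgmd_class": "FP"},
    {"id": "var3", "hgmd_class": "DM?"},
]
selected_ids = [r["id"] for r in select_by_class(example)]
print(selected_ids)
```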

Page 7:

PGMD™

• Comprehensive pharmacogenomic database

• PGx/ADME panels

• FDA and EMA approved drugs containing PGx labels

Associations from 6500+ publications from 500+ journals studying >1400 drugs

Page 8:

PGMD: PharmacoGenomic Mutation Database

Facilitates mapping of variants onto genome at position or genotype level

Associations from 6500+ publications from 500+ journals studying >1400 drugs

A/C

Genotype/haplotype specific findings

• Median dose requirement of warfarin in patients with CYP2C9*1/CYP2C9*3 haplotype is 2.6 mg

Statistical significance

• p-value - .001

• Relative Risk, Hazard Ratio, 95% Confidence Interval when available

Study details (All studies are in vivo)

• 22 cases with A/C genotype, 159 subjects studied, Design - Clinical Trial

• Pop: European Continental Ancestry Group, Age: 24-95, Treatment: All patients are treated with 0.5 mg to 10 mg/day of warfarin

Page 9:

Types of evidence

Page 10:

Linkage Disequilibrium

HapMap D′, LOD, and R² scores

Computed for all PGMD sites

Includes between non-PGMD sites
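
For reference, D′ and R² between two biallelic sites can be computed from haplotype and allele frequencies using the standard population-genetics definitions. The sketch below is illustrative only: it is not the code used by PGMD or HapMap, the function name is hypothetical, and the LOD score is not computed here.

```python
def ld_stats(p_ab, p_a, p_b):
    """Return (D, |D'|, r^2) for two biallelic sites.

    p_ab: frequency of the haplotype carrying allele A at site 1 and B at site 2
    p_a:  frequency of allele A at site 1
    p_b:  frequency of allele B at site 2
    """
    if not (0 < p_a < 1 and 0 < p_b < 1):
        raise ValueError("allele frequencies must be strictly between 0 and 1")
    # D is the deviation of the observed haplotype frequency from independence.
    d = p_ab - p_a * p_b
    # D' rescales D by its maximum possible magnitude given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max if d_max > 0 else 0.0
    # r^2 is the squared correlation between the two sites.
    r2 = d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


# Perfectly correlated sites: the A and B alleles always travel together.
print(ld_stats(0.5, 0.5, 0.5))
```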

Page 11:

Allele frequencies

Major sources including: EVS, 1000 Genomes, HapMap

Page 12:

Delivery models

Online

PGMD Web Interface

Subject specific annotation via Genome Trax

Download

MySQL database, TSV, BED, GFF

Custom Pipeline Integration

Page 13:

Genome Trax™

Page 14:

NGS analysis pipeline

Page 15:

Genome Trax™

Candidate Genes

Disease causing variants

Regulatory variants

Over 190 million annotations total

Track                                          Release 2015.1
HGMD® inherited disease mutations              146,581
HGMD® imputed mutations                        14,570
Pharmacogenomic Variants                       806,806
GWAS Catalogue                                 18,735
COSMIC somatic disease mutations               2,626,811
ClinVar                                        127,638
TRANSFAC® experimentally verified TFBS         15,330
ChIP-seq Transcription Factor Binding Sites    9,178,528
Predicted TF@DNase I hypersensitivity sites    10,732,462
miRNA gene sites                               2,735
PTMs (Post-Translational Modifications)        35,079
PROTEOME™ disease genes                        14,905
PROTEOME™ Drug target genes                    2,976
PROTEOME™ Pathway genes                        2,057
HGMD® disease genes                            27,257
SIFT & Polyphen predictions, conservation      88,986,833
EVS allele frequencies                         3,663,071
Allele frequency from 1000 Genomes             12,330,177
dbSNP common SNPs                              13,604,359
dbSNP                                          60,879,061

Function prediction & frequency

Page 16:

Use it as you like it

Download: flat files, MySQL dump

Use with genome browsers, Excel, tools, scripts, ANNOVAR, CLC bio Workbenches, Alamut, Cartagenia…

Page 17:

HGMD – inherited mutations

Page 18:

HGMD imputed

HGMD: CAC (Histidine) changing to CAA (Glutamine) is causative for disease X

CAC > CAG leads to the same Histidine to Glutamine change but would not be a match for the mutation

The HGMD equivalent track covers such cases
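
The codon example above can be checked with a few lines of code. The sketch below uses a four-codon excerpt of the standard genetic code; the function name is hypothetical, and this is only an illustration of the idea, not how the imputed track is actually built.

```python
# Excerpt of the standard genetic code (DNA codons), enough for this example.
CODON_TABLE = {
    "CAT": "His", "CAC": "His",
    "CAA": "Gln", "CAG": "Gln",
}


def same_protein_change(ref_codon, known_alt, observed_alt):
    """True if observed_alt is a different nucleotide change than known_alt
    but produces the same amino acid substitution from ref_codon."""
    ref_aa = CODON_TABLE[ref_codon]
    known_aa = CODON_TABLE[known_alt]
    observed_aa = CODON_TABLE[observed_alt]
    return (
        observed_alt != known_alt      # no exact nucleotide-level match
        and observed_aa == known_aa    # same resulting amino acid
        and observed_aa != ref_aa      # and the amino acid really changes
    )


# Known mutation: CAC > CAA (His > Gln). Observed variant: CAC > CAG.
print(same_protein_change("CAC", "CAA", "CAG"))
```

An exact nucleotide comparison misses CAC > CAG, whereas a comparison at the amino acid level identifies it as equivalent to the known mutation.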

Page 19:

PGMD: PharmacoGenomic Mutation Database

Facilitates mapping of variants onto genome at position or genotype level

Associations from 6500+ publications from 500+ journals studying >1400 drugs

A/C

Genotype/haplotype specific findings

• Median dose requirement of warfarin in patients with CYP2C9*1/CYP2C9*3 haplotype is 2.6 mg

Statistical significance

• p-value - .001

• Relative Risk, Hazard Ratio, 95% Confidence Interval when available

Study details (All studies are in vivo)

• 22 cases with A/C genotype, 159 subjects studied, Design - Clinical Trial

• Pop: European Continental Ancestry Group, Age: 24-95, Treatment: All patients are treated with 0.5 mg to 10 mg/day of warfarin

Page 20:

ClinVar Variants

Version: ClinVar-2015-02

Track Description: This track contains data from ClinVar. ClinVar is a public archive of reports that list relationships between human variations and phenotypes, with supporting evidence. ClinVar thus facilitates access to and communication about the relationships asserted between human variation and observed health status, and how the interpretation of variation may change over time. ClinVar collects reports of variants found in patient samples, assertions made regarding their clinical significance, information about the submitter, and other supporting data. The alleles described in the submissions are mapped to reference sequences and reported according to the HGVS standard.

Benefit: This data set contains experimentally observed, clinically significant variants that are reviewed by experts.

Filename: clinvar

Link-out base URL: http://preview.ncbi.nlm.nih.gov/clinvar/$$

Links to: An individual variant report on the ClinVar site at NCBI.

Accession: ClinVar ID.

Feature: HGVS description and the phenotype, e.g. NT_011109.15:g.14128514A>G:Diaphyseal dysplasia;
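
The link-out base URL above ends in `$$`. Assuming `$$` marks where the record's accession is substituted (an inference from the track description, not something the slide states explicitly), a link could be built as in the sketch below. The function name and the accession value are made up for illustration.

```python
def build_link(base_url, accession):
    """Replace the `$$` placeholder in a link-out base URL with an accession."""
    if "$$" not in base_url:
        raise ValueError("base URL has no $$ placeholder")
    return base_url.replace("$$", str(accession))


# Base URL as given in the track description.
CLINVAR_BASE = "http://preview.ncbi.nlm.nih.gov/clinvar/$$"

# "12345" is a made-up accession, not a real ClinVar ID.
link = build_link(CLINVAR_BASE, "12345")
print(link)
```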

Page 21:

COSMIC somatic disease mutations

Version: v71

Track Description: This track contains data from the Catalogue of Somatic Mutations in Cancer (COSMIC). COSMIC contains somatic mutation information relating to human cancers. The mutation data and associated information are extracted from the primary literature and entered into the COSMIC database. To provide a consistent view of the data, a histology and tissue ontology has been created and all mutations are mapped to a single version of each gene. A central aim of COSMIC is to provide somatic mutation frequencies. This track contains SNPs, insertions and deletions from COSMIC. We include COSMIC mutations for which a chromosomal position can be determined. The percentage of mutations with a position is approximately 75%.

Benefit: These somatic mutations complement the set of germ-line mutations from HGMD to allow for a more comprehensive assessment of prior knowledge about observed mutations.

Filename: cosmic

Link-out base URL: http://www.sanger.ac.uk/perl/genetics/CGP/cosmic?action=mut_summary&id=$$

Links to: An individual mutation report on the COSMIC site at the Wellcome Trust Sanger Institute.

Accession: COSMIC Mutation ID.

Feature: The histology and mutational change, e.g. "carcinoma:c.775G>T".

Page 22:

EVS Exome Variations

Version: ESP6500

Track Description: The EVS annotation source contains exome sequencing variants retrieved from the Exome Variant Server (EVS) for the NHLBI Exome Sequencing Project (ESP). The EVS data release (ESP6500) comprises 2203 African-American and 4300 European-American unrelated individuals, totaling 6503 samples (13,006 chromosomes). All data were simultaneously analyzed for exome variants at the University of Michigan (Abecasis Laboratory). The methods used for analysis are explained in detail at http://evs.gs.washington.edu/EVS/

Benefit: EVS provides the population-based genotype, allele counts and MAF scores for the variations observed in exome regions.

Filename: evs

Accession: A unique number identifying the EVS record, e.g. EVS2265387

Feature: rsID and HGNC symbol of the gene, e.g. "rs138751118:C4orf21".

Page 23:

Orphanet (Beta)

Version: 02/18/2015

Track Description: Orphanet is the reference portal for information on rare diseases and orphan drugs, for all audiences. Orphanet's aim is to help improve the diagnosis, care and treatment of patients with rare diseases.

Benefit: Allows you to associate known patterns of inheritance (dominant, recessive) with rare diseases and the genes implicated in them. Together with the observed zygosity and the disease-causing mutations in HGMD, this can help you to focus only on dominant disease-causing variants, or on recessive disease-causing variants that are homozygous in the patient sample.

Filename: Orpha

Accession: The numerical part of the 'Orpha number', for example 79314 associated with the 'Orpha number' ORPHA79314
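
The filtering idea described under Benefit can be sketched as a small rule: keep dominant disease variants regardless of zygosity, and keep recessive ones only when they are homozygous in the patient sample. The function name and the string values used for inheritance and zygosity are assumptions for illustration; they are not the field values of the actual Orphanet track.

```python
def passes_inheritance_filter(inheritance, zygosity):
    """Apply the dominant / recessive-homozygous rule to one variant.

    inheritance: "dominant" or "recessive" (hypothetical labels)
    zygosity:    "homozygous" or "heterozygous" (hypothetical labels)
    """
    inheritance = inheritance.lower()
    zygosity = zygosity.lower()
    if inheritance == "dominant":
        return True
    if inheritance == "recessive":
        return zygosity == "homozygous"
    # Unknown inheritance patterns are not kept by this simple rule.
    return False


print(passes_inheritance_filter("dominant", "heterozygous"))
print(passes_inheritance_filter("recessive", "heterozygous"))
print(passes_inheritance_filter("recessive", "homozygous"))
```

A per-variant rule like this does not detect compound heterozygotes (two different heterozygous variants in the same recessive gene); that case needs logic that groups variants by gene.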

Page 24:

GWAS Catalogue

Version: 02/17/2015

Track Description: This track contains data from the GWAS Catalogue. These are literature-derived disease associations for polymorphisms from GWAS studies that assayed at least 100,000 single nucleotide polymorphisms; the associations listed are limited to those with p-values < 1.0 x 10^-5. The dataset provides odds ratios for common variants that can be used to calculate increased or decreased risk for the disease. A detailed description of the methods used to assemble the dataset can be found in Hindorff LA, Sethupathy P, Junkins HA, Ramos EM, Mehta JP, Collins FS, and Manolio TA. Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc Natl Acad Sci USA. May 27, 2009, available at http://www.genome.gov/pages/about/od/newsandfeatures/pnasgwasonlinecatalog.pdf, and at the GWAS Catalogue at www.genome.gov/gwastudies.

Benefit: These disease association data are manually curated, experimentally determined associations from the scientific literature, mapped to coordinates. They allow you to identify common SNPs that influence the risk for common diseases.

Filename: gwas

Link-out base URL: http://www.ncbi.nlm.nih.gov/projects/SNP/snp_ref.cgi?rs=$$

Links to: dbSNP record. As the GWAS catalog does not provide reports for the individual SNPs, we link to dbSNP instead.

Accession: dbSNP rsid

Feature: The disease, risk allele, and odds ratio or beta (denoted by OR or beta), e.g. “Ovarian_cancer; rs2363956-T;1.1OR”
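
The Feature string above packs three pieces of information into one field. The sketch below splits it apart, assuming every feature follows the same "disease; rsid-allele; value followed by OR or beta" layout as the single example shown. That layout is inferred from the example only, and the function and key names are hypothetical.

```python
import re


def parse_gwas_feature(feature):
    """Split a GWAS feature string into disease, rsid, risk allele and effect."""
    disease, snp_allele, effect = [part.strip() for part in feature.split(";")]
    # The risk allele follows the last hyphen, e.g. "rs2363956-T".
    rsid, risk_allele = snp_allele.rsplit("-", 1)
    # The effect is a number immediately followed by "OR" or "beta".
    match = re.fullmatch(r"([0-9.]+)(OR|beta)", effect)
    if match is None:
        raise ValueError(f"unrecognized effect field: {effect!r}")
    return {
        "disease": disease,
        "rsid": rsid,
        "risk_allele": risk_allele,
        "effect_size": float(match.group(1)),
        "effect_type": match.group(2),
    }


parsed = parse_gwas_feature("Ovarian_cancer; rs2363956-T;1.1OR")
print(parsed)
```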

Page 25:

dbNSFP Nonsynonymous functional predictions

Version: v2.9

Track Description: This track contains data from dbNSFP (Database for Non-synonymous SNPs Functional Predictions). dbNSFP is an integrated database of functional predictions from multiple algorithms for the comprehensive collection of human non-synonymous SNPs (NSs). It compiles prediction scores from four new and popular algorithms (SIFT, Polyphen2, LRT, and MutationTaster), along with a conservation score (PhyloP) and other related information, for every potential NS in the human genome. More details about the prediction methods are available at http://www.ncbi.nlm.nih.gov/pubmed/21520341

Benefit: This track also provides a calculated consensus prediction based on the results from the different prediction algorithms in dbNSFP. Each NS is classified according to its deleterious tendency ("Probably Deleterious", "Unknown", "Probably Harmless", "Harmless").

Filename: dbnsfp

Accession: Gene ID, e.g. "85440"

Feature: Amino acid reference > amino acid alternate: consensus prediction, e.g. > N: Probably Deleterious 50%.

Page 26:

TRANSFAC – gene regulation

Page 27:

PROTEOME – candidate genes

Page 28:

PROTEOME – disease genes & drugs

Page 29:

Trio dataset from clinical practice

Bloom Syndrome                      Our Patient
Autosomal recessive                 Compound heterozygote
Short stature                       Short stature
Facial anomalies                    Facial anomalies
Skin hypo- and hyperpigmentation    Skin hypo- and hyperpigmentation
Feeding difficulties                Feeding difficulties
Mild intellectual disability        Severe intellectual disability
Cancer predisposition               Cancer predisposition
Frequent childhood infections       No frequent infections

After 20 years, following Genome Trax trio analysis, the patient could finally be diagnosed with BLOOM SYNDROME

Page 30:

Stand-alone Application

ANNOVAR Introduction

Page 31:

Database preparation

ANNOVAR requires the annotation databases to be saved on local disk before it can annotate genetic variants.

A simple command can be issued to download a database directly from the internet (from the UCSC Genome Browser, the 1000 Genomes Project or the ANNOVAR website).

annotate_variation.pl -downdb [optional arguments] <table-name> <output-directory-name>
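
As an illustration of the download command pattern on this slide, the sketch below assembles a concrete argument list without running it. The `-buildver` and `-webfrom` options come from ANNOVAR's own documentation rather than from this slide, and the build `hg19`, the table `refGene` and the directory `humandb/` are example values only. The helper name is hypothetical.

```python
import shlex


def downdb_command(table_name, output_dir, build="hg19", webfrom=None):
    """Build the argument list for an ANNOVAR database download."""
    args = ["annotate_variation.pl", "-downdb", "-buildver", build]
    if webfrom is not None:
        # -webfrom selects the download source (e.g. the ANNOVAR website).
        args += ["-webfrom", webfrom]
    args += [table_name, output_dir]
    return args


cmd = downdb_command("refGene", "humandb/", webfrom="annovar")
# Print the command as it would be typed in a shell; nothing is executed here.
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would execute it, provided ANNOVAR is installed and on the PATH.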

Page 32:

Database preparation

Gene anno databases

gene / refgene / refGene

knowngene / knownGene

ensgene / ensGene

Region anno databases

• Cytoband

• tfbsConsSites

• GenomicSuperDups

• omimGene

Filter databases

• 1000g2012apr

• snp137

• snp135

Page 33:

Database download

Page 34:

Input files

ANNOVAR takes text-based input files, where each line corresponds to one variant.

On each line, the first five space- or tab-delimited columns represent the chromosome, start position, end position, reference nucleotides and observed nucleotides.
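
The five-column layout described on this slide can be read with a few lines of code. This is a minimal sketch: the `Variant` type, the function name and the example line are made up for illustration, and any columns after the first five are simply ignored.

```python
from typing import NamedTuple


class Variant(NamedTuple):
    chrom: str
    start: int
    end: int
    ref: str
    obs: str


def parse_variant_line(line):
    """Parse the first five space- or tab-delimited columns of one input line."""
    fields = line.split()  # splits on any run of spaces or tabs
    if len(fields) < 5:
        raise ValueError(f"expected at least 5 columns, got {len(fields)}")
    chrom, start, end, ref, obs = fields[:5]
    return Variant(chrom, int(start), int(end), ref, obs)


# Made-up example line; the trailing text is an extra column that is ignored.
variant = parse_variant_line("1\t948921\t948921\tT\tC\tcomment")
print(variant)
```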

Page 35:

Profiling Breast Cancer variants – Input file

Isolate tumor-specific variants by removing the germ-line variants.

This file, containing the filtered results, is used as input for gene-based annotation, which extracts variants in the exonic, intronic, intergenic and other regions.
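
One way to remove germ-line variants is a set difference on the first five columns of each line (chromosome, start, end, reference, observed). The sketch below assumes both variant lists use that five-column text layout; the function names and the example lines are hypothetical, and the slide does not say which tool was actually used for this step.

```python
def variant_key(line):
    """Identify a variant by its first five whitespace-delimited columns."""
    return tuple(line.split()[:5])


def tumor_specific(tumor_lines, germline_lines):
    """Return tumor lines whose variant is absent from the germ-line set."""
    germline = {variant_key(l) for l in germline_lines if l.strip()}
    return [
        l for l in tumor_lines
        if l.strip() and variant_key(l) not in germline
    ]


# Made-up variants for illustration only.
tumor = ["17 100 100 G A", "13 200 200 C T", "1 300 300 A G"]
normal = ["13 200 200 C T"]
filtered = tumor_specific(tumor, normal)
print(filtered)
```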

Page 36:

Profiling Breast Cancer variants

This result file can be searched for specific high-risk genes such as TP53, BRCA1 and BRCA2.
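
Searching the annotated result for high-risk genes could look like the sketch below. It assumes a tab-delimited result in which the second column holds the gene name or names, possibly followed by details in parentheses. That layout is an assumption about the gene-based annotation output, and the example rows and gene placeholders are made up.

```python
import re

HIGH_RISK_GENES = {"TP53", "BRCA1", "BRCA2"}


def genes_in_field(gene_field):
    """Extract gene names from a field such as 'TP53' or 'A(dist=1),B(dist=2)'."""
    # Drop parenthesized details first, since they may contain commas themselves.
    without_details = re.sub(r"\([^)]*\)", "", gene_field)
    return {name.strip() for name in re.split(r"[,;]", without_details) if name.strip()}


def high_risk_rows(lines, genes=HIGH_RISK_GENES):
    """Return the lines whose gene column mentions any of the given genes."""
    hits = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip lines without a gene column
        if genes_in_field(fields[1]) & set(genes):
            hits.append(line)
    return hits


# Made-up annotated rows: region type, gene field, then the variant columns.
rows = [
    "exonic\tTP53\t17\t100\t100\tG\tA",
    "intergenic\tGENEA(dist=100),GENEB(dist=200)\t1\t300\t300\tA\tG",
    "intronic\tBRCA2\t13\t200\t200\tC\tT",
]
hits = high_risk_rows(rows)
print(hits)
```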
