Accounting for Copy Number Variation: A Hierarchical Database Structure

Sneddon TP, Lush MJ, Wright MW, Sneddon KMB, Povey S and Bruford EA

HUGO Gene Nomenclature Committee (HGNC), Department of Biology, University College London, Wolfson House, 4 Stephenson Way, London NW1 2HE, UK.

The work of the HGNC is supported by NHGRI grant P41 HG003345, the UK Medical Research Council and the Wellcome Trust.

Email: [email protected]
URL: http://www.gene.ucl.ac.uk/nomenclature/

Introduction

The HGNC has to date approved over 24,000 unique symbols and names, the majority of which are for ‘genes’, i.e. genomic segments that are transcribed and translated into functional proteins.

However, an increasing number of genes initially thought to be single-copy in the human genome are turning out to be copy number variant (CNV) between individuals [1-4]. This is predominantly the case for genes encoding secreted, olfactory and immunity-related proteins, such as the well-established amylase and defensin gene families [5].

Following community discussions of CNV nomenclature at the 2005 American Society of Human Genetics meeting and at the joint HGNC and HGVS (Human Genome Variation Society) satellite meeting at HGM2006, it was agreed that a method of naming CNV genes was required, and several suggestions were made for how this could be achieved.

Based on these suggestions, the HGNC has now implemented a hierarchical database structure to capture and represent information on genomic variation, such as copy number variant genes.

The defensin beta (DEFB) copy number variant genes

One example of copy number variation is the >300 kb segmentally duplicated region on 8p23.1, which contains the DEFB4, DEFB103-DEFB107, SPAG11 and DEFB109 genes.

[Figure: UCSC Genome Browser on Human Mar. 2006 Assembly, with both copies of the >300 kb segmentally duplicated region marked.]

As shown above, the current genome build includes two copies of the segmental duplication, sharing >96% nucleotide identity, in opposite orientations on either side of a gap. This region has been shown to be copy number variant, with 2-12 copies per diploid genome [9-11].

Searchgenes results for ‘DEFB103’

As an example of our hierarchical database structure, the result of searching for one of the copy number variant defensin genes, DEFB103, using Searchgenes [12] is shown below. By default, only the DEFB103 gene record is returned; an option to display all variants will be provided.

[Figure: Searchgenes results for ‘DEFB103’, linking to the DEFB103 gene record (Fig. 1).]
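The default search behaviour described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Searchgenes implementation: the `GeneRecord` class, the `search_genes` function, its `show_variants` flag and the prefix-matching rule are all hypothetical, and the two records are illustrative only.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class GeneRecord:
    """Hypothetical simplified gene record; CNV copies are held as sub-entries."""
    symbol: str
    sub_entries: list[GeneRecord] = field(default_factory=list)


def search_genes(records: list[GeneRecord], query: str,
                 show_variants: bool = False) -> list[str]:
    """Return symbols of matching records (hypothetical prefix-match rule).

    By default only the parent record is reported, even when the query also
    matches one of its CNV sub-entries; show_variants=True lists them too.
    """
    q = query.upper()
    results: list[str] = []
    for parent in records:
        symbols = [parent.symbol] + [s.symbol for s in parent.sub_entries]
        if any(sym.upper().startswith(q) for sym in symbols):
            results.append(parent.symbol)  # the parent record is always reported
            if show_variants:
                results.extend(s.symbol for s in parent.sub_entries)
    return results


# Illustrative data: DEFB103 with its two CNV sub-entries, plus a record without sub-entries.
records = [
    GeneRecord("DEFB1"),
    GeneRecord("DEFB103", [GeneRecord("DEFB103A"), GeneRecord("DEFB103B")]),
]

print(search_genes(records, "DEFB103"))                      # ['DEFB103']
print(search_genes(records, "DEFB103", show_variants=True))  # ['DEFB103', 'DEFB103A', 'DEFB103B']
```

Collapsing the copies under one parent by default keeps a single result per gene, while the flag exposes the individual CNV copies on demand.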

The DEFB103 gene record

The DEFB103 gene record shown below combines sequence and gene information for both the DEFB103A and DEFB103B copy number variant genes. The sub-entry field links to the individual DEFB103A (chr8: 7.8 Mb; Fig. 2) and DEFB103B (chr8: 7.3 Mb) gene records. There is also a link to the DEFB103 search result in the Database of Genomic Variants [6] (Fig. 3).

Fig. 1: The DEFB103 gene record, with links to the Database of Genomic Variants [6] and to the DEFB103A and DEFB103B CNV sub-entries.
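The parent/sub-entry relationship can be modelled as a simple tree. The sketch below is an assumption about how such a hierarchy might be represented, not the HGNC schema: the `CnvGeneRecord` class and its fields and methods are hypothetical, and the chr8 positions are the approximate Mb values given in the text.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CnvGeneRecord:
    """Hypothetical record: a parent gene record with CNV copies as sub-entries."""
    symbol: str
    location_mb: Optional[float] = None  # approximate chr8 position (sub-entries only)
    # Link back to the parent record; excluded from repr/eq to avoid infinite recursion.
    parent: Optional[CnvGeneRecord] = field(default=None, repr=False, compare=False)
    sub_entries: list[CnvGeneRecord] = field(default_factory=list)

    def add_sub_entry(self, child: CnvGeneRecord) -> None:
        """Attach a CNV copy as a sub-entry and set its link back to this record."""
        child.parent = self
        self.sub_entries.append(child)

    def locations(self) -> list[float]:
        """Combine the locations of all CNV copies beneath this record."""
        if not self.sub_entries:
            return [] if self.location_mb is None else [self.location_mb]
        return [loc for child in self.sub_entries for loc in child.locations()]


defb103 = CnvGeneRecord("DEFB103")
defb103.add_sub_entry(CnvGeneRecord("DEFB103A", 7.8))
defb103.add_sub_entry(CnvGeneRecord("DEFB103B", 7.3))

print(defb103.locations())                   # [7.8, 7.3]
print(defb103.sub_entries[0].parent.symbol)  # DEFB103
```

The two-way link mirrors the record pages: the parent lists its sub-entries, and each sub-entry links back to the parent record.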

The DEFB103A copy number variant gene record

The DEFB103A copy number variant sub-entry gene record shown below lists sequence and gene information for the defensin, beta 103 gene copy located at chr8: 7.8 Mb. It also links to the DEFB103 search result in the Database of Genomic Variants [6] (Fig. 3).

Fig. 2: The DEFB103A copy number variant gene record, with links to the Database of Genomic Variants [6] and back to the DEFB103 gene record.

Fig. 3: Partial screenshots showing some of the information available from the Database of Genomic Variants [6] on Human Genome Assembly Build 36 for the DEFB103 gene (circled in red).

Criteria for naming copy number variant genes

HGNC will provide a gene symbol for a copy number variant gene on request when the following criteria are met:

• The CNV gene has been published (or is about to be submitted for publication) and is listed in the Database of Genomic Variants [6]. If your gene is not already listed, please submit your data to the Database of Genomic Variants [6] before contacting us.

• NCBI [7] and VEGA [8] (where the region is annotated by VEGA) agree on the coordinates of each of the CNV copies in a reference sequence. This can include alternate assemblies (e.g. the Celera assembly) and/or haplotypes (e.g. c5_H2 and c22_H2).

These points will be added to our official Guidelines [13], and our hierarchical database structure will be made public once it has been populated with more than 100 copy number variants.
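The two naming criteria for copy number variant genes can be expressed as a simple check. This is a hedged sketch, not an HGNC tool: the `CnvRequest` fields and the `meets_cnv_criteria` function are hypothetical, the coordinates are placeholders rather than real DEFB103 positions, and it assumes that VEGA agreement is only checked for copies VEGA has annotated. Alternate assemblies or haplotypes could be covered by widening the dictionary keys, which is not shown.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Coords = Tuple[int, int]  # (start, end) on a reference sequence


@dataclass
class CnvRequest:
    """Hypothetical summary of a CNV gene symbol request."""
    published_or_submitted: bool
    listed_in_dgv: bool                # listed in the Database of Genomic Variants
    ncbi_coords: Dict[str, Coords]     # CNV copy symbol -> coordinates according to NCBI
    vega_coords: Optional[Dict[str, Coords]] = None  # None if VEGA has not annotated it


def meets_cnv_criteria(req: CnvRequest) -> bool:
    """Return True if the request satisfies both (hypothetically encoded) criteria."""
    # Criterion 1: published (or about to be submitted) and listed in DGV.
    if not (req.published_or_submitted and req.listed_in_dgv):
        return False
    # Criterion 2: NCBI must provide coordinates for the CNV copies ...
    if not req.ncbi_coords:
        return False
    # ... and, for every copy VEGA has annotated, both sources must agree.
    for copy_symbol, coords in (req.vega_coords or {}).items():
        if req.ncbi_coords.get(copy_symbol) != coords:
            return False
    return True


# Placeholder coordinates for illustration only, not real positions.
ncbi = {"DEFB103A": (1000, 2000), "DEFB103B": (5000, 6000)}

print(meets_cnv_criteria(CnvRequest(True, True, ncbi, dict(ncbi))))  # True
print(meets_cnv_criteria(CnvRequest(True, False, ncbi)))             # False
```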

Summary

If you have a copy number variant gene to submit, please complete our online gene symbol request form, specifying the CNV status in the additional comments and information field:

http://www.gene.ucl.ac.uk/cgi-bin/nomenclature/request.pl

Please visit us at Booth 509 in the Exhibition Hall or email [email protected] to discuss your views.

References

1. Iafrate AJ et al. (2004) Nat. Genet. 36(9):949-51.
2. Sebat J et al. (2004) Science 305(5683):525-8.
3. Sharp AJ et al. (2005) Am. J. Hum. Genet. 77(1):78-88.
4. Redon R et al. (2006) Nature 444(7118):444-54.
5. Nguyen D-C, Webber C and Ponting CP (2006) PLoS Genet. 2(2):198-207.
6. Database of Genomic Variants: http://projects.tcag.ca/variation/
7. NCBI: http://www.ncbi.nlm.nih.gov
8. VEGA: http://vega.sanger.ac.uk/
9. Hollox EJ, Armour JA and Barber JC (2003) Am. J. Hum. Genet. 73(3):591-600.
10. Taudien S et al. (2004) BMC Genomics 5(1):92.
11. Linzmeier RM and Ganz T (2005) Genomics 86(4):423-30.
12. Searchgenes: http://www.gene.ucl.ac.uk/cgi-bin/nomenclature/searchgenes.pl
13. HGNC Guidelines: http://www.gene.ucl.ac.uk/nomenclature/guidelines.html