Post on 12-Jan-2016
GENOME-CENTRIC DATABASES
Daniel Svozil
NCBI Gene
• Search for the DUT gene in human
Obtaining gene sequence
• Genomic regions section of the full report – click on FASTA
• If you want to adjust the range to capture, modify the values in the Change region shown tool on the FASTA display and click on Update View.
Obtaining gene sequence
• Genomic regions section – click on Graphics
• [Figure: Graphics view of the gene, with the callouts "Place your cursor over this bar" and "Click these arrows"]
• Again, the region can be adjusted in the FASTA view
Obtaining gene sequence
• Genomic context section – MapViewer
• Click on Download/View Sequence/Evidence in the upper right of the Map Viewer display, or click on dl in the label for the gene.
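On the plus/minus strands and numbering
[Figure: a double-stranded sequence with positions 1 to 7. The plus strand runs 5’ to 3’, the minus strand runs 3’ to 5’, and both strands carry the same position numbers.]
• A gene on the plus strand starts at 2 and ends at 5
• A gene on the minus strand starts at 5 and ends at 2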
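The numbering rule above can be sketched in a few lines of Python. This is only an illustration: the seven-base sequence is made up, and treating "start greater than end" as the signal for a minus-strand gene is an assumption taken from the two examples on the slide, not a general file-format rule.

```python
# Complement table for DNA bases (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def gene_sequence(plus_strand: str, start: int, end: int) -> str:
    """Extract a gene given 1-based, inclusive positions in plus-strand numbering.

    Assumption for this sketch: start > end means the gene lies on the
    minus strand, as in the slide ("starts at 5 and ends at 2").
    """
    if start <= end:
        # Plus-strand gene: read the slice as it is.
        return plus_strand[start - 1:end]
    # Minus-strand gene: same region, read from the other strand.
    return reverse_complement(plus_strand[end - 1:start])


chromosome = "ACGTTGA"  # hypothetical sequence, positions 1..7
print(gene_sequence(chromosome, 2, 5))  # CGTT
print(gene_sequence(chromosome, 5, 2))  # AACG
```

Both calls cover the same four positions; only the strand that is read, and therefore the direction, differs.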
Obtaining gene sequence
• How many transcript variants exist for the human TP53 gene?
• Search for TP53[gene] AND human[orgn]
• In GenBank View, find mRNAs in FEATURES
• seven variants
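The same query can also be sent programmatically through the NCBI E-utilities. The sketch below only builds the request URL and does not contact NCBI. The `esearch_url` helper is a hypothetical name, and the choice of the `gene` database is an assumption, since the slide does not say which database the search was run in.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Base address of the E-utilities ESearch service.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build an ESearch URL for the given database and Entrez query string."""
    return ESEARCH + "?" + urlencode({"db": db, "term": term, "retmax": retmax})


# Query string taken from the slide; "gene" as the database is an assumption.
url = esearch_url("gene", "TP53[gene] AND human[orgn]")
print(url)

# Decoding the URL again shows that the query survived the encoding unchanged.
print(parse_qs(urlparse(url).query)["term"][0])
```

Opening the printed URL in a browser returns an XML list of matching record identifiers.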
Obtaining gene sequence
• For a limited number of genes in the human genome, gene-specific genomic RefSeqs, termed RefSeqGene, have been created.
• These have a RefSeq accession beginning with NG_ and can be retrieved from the nucleotide database using the query keyword refseqgene.
• What is the accession number of the RefSeqGene for the TP53 gene?
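When working through a list of search results, the NG_ prefix gives a quick first filter. The sketch below is a hypothetical helper, and the accession strings in it are made-up placeholders rather than real records. Note that an NG_ prefix alone does not prove that a record is a RefSeqGene; it only narrows the candidates.

```python
import re

# RefSeq accessions have the form PREFIX_digits, optionally followed by
# ".version". Only the NG_ prefix is taken from the slide.
NG_PATTERN = re.compile(r"NG_\d+(\.\d+)?")


def has_ng_prefix(accession: str) -> bool:
    """Return True if the string looks like an NG_ RefSeq accession."""
    return NG_PATTERN.fullmatch(accession) is not None


# Placeholder accessions, invented for this example.
candidates = ["NG_000000.1", "NM_000000.2", "NG_123456", "ng_1"]
print([acc for acc in candidates if has_ng_prefix(acc)])
```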
GeneRIF
• Gene Reference into Function
• A GeneRIF is a concise phrase describing a function or functions of a gene, with the PubMed citation supporting that assertion. The majority of GeneRIFs have been provided by a collaboration between the NLM's Index Section and NCBI. There is no constraint on the number of independent submissions of GeneRIFs per PubMed ID, although those from non-NLM sources are reviewed by RefSeq staff.
Phenotypes
• This section reports the effect of the gene on phenotype, especially disease.
• For human genes, the first row links to the Phenotype-Genotype Integrator (PheGenI), a web portal providing a tabular display of genome-wide association study results relating the gene and/or its expression to a phenotype.
• Named phenotypes are provided in subsequent rows. Each phenotype row may be expanded, providing links to more information if available.
Interactions
• There are two major subcategories of information reported as Interactions: HIV-1 interactions and general interactions (TP53 has both).
• The HIV-1, Human Protein Interaction Database focuses on the human proteins that have been shown to interact with proteins from HIV-1.
• [Figure: interactions table, with callouts marking the product of the gene that is part of the interaction, the other interactant, the source of these data, and the description of the interaction]
General gene information
• Several subcategories of information, including:
  • Pathways: a description of pathways that include this gene, with links to more information about each pathway.
  • Homology: a partial listing, with links, of orthologs in other species.
  • Gene Ontology (GO): the specific GO terms are listed by source of the information, category, term, evidence information, and links to supporting publications.
Gene Ontology (GO) I
• Unify the representation of gene and gene product attributes across all species.
• Project aims:
  • Maintain and develop a controlled vocabulary of gene and gene product attributes
  • Annotate genes and gene products
  • Provide tools for easy access to all aspects of the data provided by the project
Gene Ontology (GO) II
• The ontology covers three domains:
  • molecular function: the elemental activities of a gene product at the molecular level, such as binding or catalysis
  • biological process: operations or sets of molecular events with a defined beginning and end, pertinent to the functioning of integrated living units (cells, tissues, organs, and organisms)
  • cellular component: the parts of a cell or its extracellular environment
• http://www.geneontology.org/
• AmiGO browser - http://amigo.geneontology.org/cgi-bin/amigo/go.cgi
NCBI Reference Sequences (RefSeqs)
• This section describes the gene-specific NCBI reference sequences (RefSeqs) that have been established for this gene.
Exercise
• Retrieve all records for human genes that are associated with OMIM and have been annotated on the genome
• Advanced search + Limits – Homo sapiens
• Full list of Entrez filters: http://eutils.ncbi.nlm.nih.gov/entrez/query/static/entrezlinks.html
Selected Entrez filters
http://www.ncbi.nlm.nih.gov/books/NBK3841/table/EntrezGene.T.filter_sets_partial_complet/?report=objectonly
Genome-centric databases
• Nucleotide sequences are routinely determined at the whole genome or chromosome scale – at least for microorganisms
• We now have information not only about individual gene sequences, but also e.g. about their relative positions or strand orientation.
• To take advantage of this more global information, researchers have had to design state-of-the-art genome-centric sequence-information management systems that can connect specialized sequence collections with browsing tools.
The NCBI Map Viewer
• http://www.ncbi.nlm.nih.gov/mapview/
• The term “map” refers to a position of a particular type of object in a particular coordinate system.
• This means that there is not one sequence map but a set of maps in various sequence coordinates.
• Map Viewer is now used to present genetic, cytogenetic, sequence-based, … maps for many genomes.
• The details about genome assembly and annotation can be found here: http://www.ncbi.nlm.nih.gov/books/NBK21086/
• Map Viewer integrates map and sequence data from a variety of sources.
The NCBI Map Viewer
• Map Viewer is a powerful tool because it provides
  • a mechanism to compare maps in different coordinate systems
  • a robust query interface
  • diverse options for configuring the display
  • multiple functions to report and download maps and annotated information
  • tools to manipulate nucleotide sequence such as ModelMaker (for constructing mRNAs from putative exon sequences)
  • connections to comprehensive data files for transfer by FTP
  • detailed descriptions of the objects displayed on the maps
Non-sequence-based maps
• not based directly on sequence
• include published maps in the following coordinate systems
  • genetic linkage
  • radiation hybrid
  • cytogenetic
  • ordinal (i.e. in the order of clones)
• The primary sources of each map are described in the online help documentation of each genome-specific Map Viewer.
Sequence-based maps
• The sequence-based maps can be supplied by external sources and/or supplied from features computed within NCBI.
• For example, when the annotated sequence for a complete genome is submitted to GenBank, a copy of the data may also be accessioned as Reference Sequences (RefSeqs).
• The gene, transcript, and other feature annotations of the submitted complete genome are processed for display in the Map Viewer.
• NCBI staff may then calculate and display the position of other types of features, such as marker position or points of variation, as separate maps.
Types of Map Viewer annotation provided by NCBI
source: http://www.ncbi.nlm.nih.gov/books/NBK21089/table/A1565/?report=objectonly
NCBI data resources used in NCBI-generated annotation
source: http://www.ncbi.nlm.nih.gov/books/NBK21089/table/A1566/?report=objectonly
Relationships
• In addition to supporting the display of multiple maps in the same coordinate system (e.g., multiple sequence-based maps), Map Viewer also displays maps in different coordinate systems by calculating the correspondences among them (e.g., sequence to genetic).
• This is accomplished by:
  • identifying features that have been placed on maps in different coordinate systems (mainly STSs)
  • using general conversion factors
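One simple way to picture both ideas is linear interpolation between shared markers, with a single scaling factor as the fallback. The sketch below is an illustration under that assumption and is not a description of how Map Viewer actually computes correspondences. The function names, the anchor positions, and the factor are all invented for the example.

```python
from bisect import bisect_right


def sequence_to_genetic(pos_bp, anchors):
    """Convert a sequence position (bp) to a genetic position (cM).

    anchors: list of (bp, cM) pairs for markers placed on both maps,
    sorted by strictly increasing bp. Positions between two anchors are
    interpolated linearly; positions outside the anchored range are
    clamped to the nearest anchor.
    """
    bps = [bp for bp, _ in anchors]
    if pos_bp <= bps[0]:
        return anchors[0][1]
    if pos_bp >= bps[-1]:
        return anchors[-1][1]
    i = bisect_right(bps, pos_bp)  # index of the first anchor right of pos_bp
    (bp0, cm0), (bp1, cm1) = anchors[i - 1], anchors[i]
    return cm0 + (pos_bp - bp0) * (cm1 - cm0) / (bp1 - bp0)


def convert_with_factor(pos_bp, cm_per_bp):
    """Fallback: apply one general conversion factor (supplied by the caller)."""
    return pos_bp * cm_per_bp


# Hypothetical shared markers: (position in bp, position in cM).
shared_markers = [(1_000_000, 1.0), (3_000_000, 2.0), (4_000_000, 4.0)]
print(sequence_to_genetic(2_000_000, shared_markers))  # 1.5
print(sequence_to_genetic(3_500_000, shared_markers))  # 3.0
```

The second interval is steeper than the first, which mimics a region where the genetic distance per base pair is larger. A single conversion factor cannot capture such local differences, which is why shared markers give the better estimate where they are available.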
Map Viewer
• http://www.ncbi.nlm.nih.gov/mapview/
• [Figure: Map Viewer home page; the genome can be searched from this page]
Position based access
• Display a particular section of a genome by using a range of positions as a query
• Select a particular chromosome first
• Enter a value into the Region Shown
  • This could be a numerical range (base pairs are the default if no units are entered), the names of clones, genes, markers, SNPs, or any combination
• Use the Maps & Options control
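The "base pairs by default" rule for numerical ranges can be made concrete with a small parser. Everything here is a hypothetical sketch: the `start-end` syntax and the `K` and `M` unit suffixes are assumptions chosen for illustration and may not match what the Region Shown box actually accepts.

```python
import re
from decimal import Decimal

# Multipliers for the unit suffixes assumed in this sketch.
# No suffix means base pairs, as stated on the slide.
UNIT_FACTORS = {"": 1, "bp": 1, "K": 1_000, "M": 1_000_000}

POSITION = re.compile(r"\s*(\d+(?:\.\d+)?)\s*(bp|K|M)?\s*")


def parse_position(text: str) -> int:
    """Convert one position such as '1500', '20K' or '1.5M' to base pairs."""
    match = POSITION.fullmatch(text)
    if match is None:
        raise ValueError(f"not a numerical position: {text!r}")
    number, unit = match.groups()
    # Decimal avoids floating-point rounding surprises for values like 7.4M.
    return int(Decimal(number) * UNIT_FACTORS[unit or ""])


def parse_range(text: str) -> tuple:
    """Split 'start-end' and convert both ends to base pairs."""
    start_text, separator, end_text = text.partition("-")
    if not separator:
        raise ValueError(f"expected 'start-end', got {text!r}")
    return parse_position(start_text), parse_position(end_text)


print(parse_range("1000-2000"))  # (1000, 2000)
print(parse_range("1.5M-2M"))    # (1500000, 2000000)
```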
Maps
• sv – Sequence Viewer, review the sequence
• dl – download the sequence of interest
• ev – Evidence Viewer, mRNA alignments in a region
• hm – homology maps
• mm – Model Maker, create cDNA in real time
individual maps