GENOME-CENTRIC DATABASES Daniel Svozil. NCBI Gene Search for DUT gene in human.



NCBI Gene

• Search for DUT gene in human


Obtaining gene sequence

• Genomic regions section of the full report: click on FASTA

• If you want to adjust the range to capture, modify the values in the Change region shown tool on the FASTA display and click on Update View.


Obtaining gene sequence

• Genomic regions section: click on Graphics

[Screenshot callouts: place your cursor over this bar; click these arrows]

• Again, the region can be adjusted in the FASTA view.


Obtaining gene sequence

• Genomic context section: Map Viewer

• Click on Download/View Sequence/Evidence in the upper right of the Map Viewer display, or click on dl in the label for the gene.


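On the plus/minus strands and numbering

[Figure: a double-stranded segment with the plus strand drawn 5’ to 3’ and the minus strand drawn 3’ to 5’, each labelled with positions 1 to 7]

• A gene on the plus strand starts at 2 and ends at 5.

• A gene on the minus strand starts at 5 and ends at 2.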
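The numbering convention on this slide can be sketched in Python. This is a minimal illustration, assuming 1-based inclusive coordinates counted along the plus strand; the function names and the 7-base toy sequence are hypothetical and not part of the original slides.

```python
# Complement table for unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract_gene(plus_strand: str, start: int, end: int) -> str:
    """Return the gene sequence read 5' to 3'.

    Coordinates are 1-based and inclusive, numbered along the plus strand.
    start <= end is treated as a gene on the plus strand;
    start > end is treated as a gene on the minus strand.
    """
    lo, hi = min(start, end), max(start, end)
    segment = plus_strand[lo - 1:hi]
    return segment if start <= end else reverse_complement(segment)


plus = "ACGTTGA"  # toy plus strand, positions 1..7 (assumed example)
print(extract_gene(plus, 2, 5))  # gene on plus: starts at 2, ends at 5
print(extract_gene(plus, 5, 2))  # gene on minus: starts at 5, ends at 2
```

Both calls cover the same four positions; only the reading direction and the strand differ.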


Obtaining gene sequence

• How many transcript variants exist for human TP53 gene?

• Search for TP53[gene] AND human[orgn]

• In GenBank View find mRNAs in FEATURES

• seven variants
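The same fielded query can also be sent programmatically. The sketch below only builds the request URL and makes no network call; it assumes NCBI's E-utilities ESearch endpoint, which the slides do not mention, and the helper name `gene_search_url` is hypothetical.

```python
from urllib.parse import urlencode

# NCBI E-utilities ESearch endpoint (an assumption; the slides use the web form).
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def gene_search_url(symbol: str, organism: str) -> str:
    """Build an ESearch URL for the Gene database using the slide's query syntax."""
    term = f"{symbol}[gene] AND {organism}[orgn]"
    return f"{EUTILS_ESEARCH}?{urlencode({'db': 'gene', 'term': term})}"


print(gene_search_url("TP53", "human"))
```

The square brackets and spaces in the term are percent-encoded by `urlencode`, so the query survives transport unchanged.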


Obtaining gene sequence

• For a limited number of genes in the human genome, gene-specific genomic RefSeqs, termed RefSeqGene, have been created.

• These have a RefSeq accession beginning with NG_ and can be retrieved from the nucleotide database using the query keyword refseqgene.

• What is the accession number of RefSeqGene of TP53 gene?
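When checking a search result, a quick prefix test helps separate RefSeqGene-style accessions from other RefSeq records. This is a minimal sketch: the function name is hypothetical, the accessions are made-up strings used only to show the format, and the check looks at the prefix only, so it does not confirm that a record exists.

```python
import re

# "NG_" followed by digits and an optional ".version" suffix.
_NG_ACCESSION = re.compile(r"^NG_\d+(\.\d+)?$")


def has_refseqgene_prefix(accession: str) -> bool:
    """Return True if the string is shaped like an NG_ RefSeq accession."""
    return bool(_NG_ACCESSION.match(accession.strip()))


# Made-up accessions, for format illustration only.
for acc in ("NG_123456.1", "NM_123456.1", "NG_abc"):
    print(acc, has_refseqgene_prefix(acc))
```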


GeneRIF

• Gene Reference into Function

• A GeneRIF is a concise phrase describing a function or functions of a gene, with the PubMed citation supporting that assertion. The majority of GeneRIFs have been provided by a collaboration between the NLM's Index Section and NCBI. There is no constraint on the number of independent submissions of GeneRIFs per PubMed ID, although those from non-NLM sources are reviewed by RefSeq staff.


Phenotypes

• This section reports the effect of the gene on phenotype, especially disease.

• For human genes, the first row links to the Phenotype-Genotype Integrator (PheGenI), a web portal providing a tabular display of genome-wide association study results relating the gene and/or its expression to a phenotype.

• Named phenotypes are provided in subsequent rows. Each phenotype row may be expanded, providing links to more information if available.


Interactions

• There are two major subcategories of information reported as Interactions: HIV-1 interactions and general interactions (TP53 has both).

• The HIV-1, Human Protein Interaction Database focuses on the human proteins that have been shown to interact with proteins from HIV-1.

[Screenshot callouts on the interaction table: product of the gene that is part of the interaction; the other interactant; source of these data; description of the interaction]


General gene information

• Several subcategories of information including
  • Pathways: A description of pathways that include this gene with links to more information about that pathway.
  • Homology: A partial listing, with links, of orthologs in other species.
  • Gene Ontology (GO): The specific GO terms are listed by source of the information, category, term, evidence information, and links to supporting publications.


Gene Ontology (GO) I

• Unify the representation of gene and gene product attributes across all species.

• Project aims:
  • Maintain and develop controlled vocabulary of gene and gene product attributes
  • Annotate genes and gene products
  • Provide tools for easy access to all aspects of the data provided by the project


Gene Ontology (GO) II

• The ontology covers three domains:
  • molecular function, the elemental activities of a gene product at the molecular level, such as binding or catalysis
  • biological process, operations or sets of molecular events with a defined beginning and end, pertinent to the functioning of integrated living units: cells, tissues, organs, and organisms
  • cellular component, the parts of a cell or its extracellular environment

• http://www.geneontology.org/

• AmiGO browser: http://amigo.geneontology.org/cgi-bin/amigo/go.cgi
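The three GO domains map naturally onto a small data structure. The sketch below is an illustration only: the class and field names are hypothetical, the gene "GENE_X" is a placeholder, and the evidence codes attached to it are illustrative rather than taken from any real annotation record.

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class GODomain(Enum):
    MOLECULAR_FUNCTION = "molecular function"
    BIOLOGICAL_PROCESS = "biological process"
    CELLULAR_COMPONENT = "cellular component"


@dataclass(frozen=True)
class GOAnnotation:
    gene: str         # gene symbol
    go_id: str        # GO identifier, e.g. "GO:0003677"
    term: str         # term name from the controlled vocabulary
    domain: GODomain  # one of the three GO domains
    evidence: str     # GO evidence code


def group_by_domain(annotations):
    """Group term names by GO domain, as on a Gene report page."""
    grouped = defaultdict(list)
    for ann in annotations:
        grouped[ann.domain].append(ann.term)
    return dict(grouped)


# Placeholder gene with illustrative evidence codes.
examples = [
    GOAnnotation("GENE_X", "GO:0003677", "DNA binding", GODomain.MOLECULAR_FUNCTION, "IDA"),
    GOAnnotation("GENE_X", "GO:0006915", "apoptotic process", GODomain.BIOLOGICAL_PROCESS, "IEA"),
    GOAnnotation("GENE_X", "GO:0005634", "nucleus", GODomain.CELLULAR_COMPONENT, "IDA"),
]

for domain, terms in group_by_domain(examples).items():
    print(domain.value, "->", terms)
```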


NCBI Reference Sequences (RefSeqs)

• This section describes the gene-specific NCBI reference sequences (RefSeqs) that have been established for this gene.


Exercise

• Retrieve all records for human genes that are associated with OMIM and have been annotated on the genome

• Advanced search + Limits: Homo sapiens

• Full list of Entrez filters: http://eutils.ncbi.nlm.nih.gov/entrez/query/static/entrezlinks.html


Selected Entrez filters

http://www.ncbi.nlm.nih.gov/books/NBK3841/table/EntrezGene.T.filter_sets_partial_complet/?report=objectonly


Genome-centric databases

• Nucleotide sequences are routinely determined at the whole-genome or chromosome scale, at least for microorganisms.

• We now have information not only about individual gene sequences, but also e.g. about their relative positions or strand orientation.

• To take advantage of this more global information, researchers have had to design state-of-the-art genome-centric sequence-information management systems that can connect specialized sequence collections with browsing tools.


The NCBI Map Viewer

• http://www.ncbi.nlm.nih.gov/mapview/

• The term “map” refers to the position of a particular type of object in a particular coordinate system.

• This means that there is not one sequence map but a set of maps in various sequence coordinates.

• Map Viewer is now used to present genetic, cytogenetic, sequence-based, … maps for many genomes.

• The details about genome assembly and annotation can be found here: http://www.ncbi.nlm.nih.gov/books/NBK21086/

• Map Viewer integrates map and sequence data from a variety of sources.


The NCBI Map Viewer

• Map Viewer is a powerful tool because it provides
  • a mechanism to compare maps in different coordinate systems
  • a robust query interface
  • diverse options for configuring the display
  • multiple functions to report and download maps and annotated information
  • tools to manipulate nucleotide sequence such as ModelMaker (for constructing mRNAs from putative exon sequences)
  • connections to comprehensive data files for transfer by FTP
  • detailed descriptions of the objects displayed on the maps


Non-sequence-based maps

• not based directly on sequence

• include published maps in the following coordinate systems
  • genetic linkage
  • radiation hybrid
  • cytogenetic
  • ordinal (i.e. in the order of clones)

• The primary sources of each map are described in the online help documentation of each genome-specific Map Viewer.


Sequence-based maps

• The sequence-based maps can be supplied by external sources and/or derived from features computed within NCBI.

• For example, when the annotated sequence for a complete genome is submitted to GenBank, a copy of the data may also be accessioned as Reference Sequences (RefSeqs).

• The gene, transcript, and other feature annotations of the submitted complete genome are processed for display in the Map Viewer.

• NCBI staff may then calculate and display the position of other types of features, such as marker position or points of variation, as separate maps.


Types of Map Viewer annotation provided by NCBI

source: http://www.ncbi.nlm.nih.gov/books/NBK21089/table/A1565/?report=objectonly


NCBI data resources used in NCBI-generated annotation

source: http://www.ncbi.nlm.nih.gov/books/NBK21089/table/A1566/?report=objectonly


Relationships

• In addition to supporting the display of multiple maps in the same coordinate system (e.g., multiple sequence-based maps), Map Viewer also displays maps in different coordinate systems by calculating the correspondences among them (e.g., sequence to genetic).

• This is accomplished by:
  • identifying features that have been placed on maps in different coordinate systems (mainly STSs)
  • using general conversion factors
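One way to picture the first approach is interpolation between markers that appear on both maps. The sketch below is an assumption for illustration, not a description of Map Viewer's actual algorithm: it uses piecewise-linear interpolation, the function name is hypothetical, and the anchor positions are invented values standing in for shared STS markers.

```python
from bisect import bisect_right


def sequence_to_genetic(pos_bp, anchors):
    """Estimate a genetic position (cM) for a sequence position (bp).

    anchors: list of (bp, cM) pairs for markers placed on both maps,
    sorted by bp, with distinct bp values and at least two entries.
    Positions outside the anchored range are extrapolated from the
    nearest interval.
    """
    if len(anchors) < 2:
        raise ValueError("need at least two shared markers")
    bps = [bp for bp, _ in anchors]
    # Index of the right-hand anchor of the flanking pair, clamped to the ends.
    i = min(max(bisect_right(bps, pos_bp), 1), len(anchors) - 1)
    (bp0, cm0), (bp1, cm1) = anchors[i - 1], anchors[i]
    return cm0 + (pos_bp - bp0) * (cm1 - cm0) / (bp1 - bp0)


# Invented anchors: (sequence position in bp, genetic position in cM).
shared_markers = [(1_000_000, 1.0), (3_000_000, 4.0), (5_000_000, 5.0)]
print(sequence_to_genetic(2_000_000, shared_markers))
print(sequence_to_genetic(4_000_000, shared_markers))
```

Where no shared markers are available, a single general conversion factor (one slope for the whole chromosome) plays the same role as the per-interval slope here.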


Map Viewer

• http://www.ncbi.nlm.nih.gov/mapview/

• A genome can be searched from this page.


Position-based access

• Display a particular section of a genome by using a range of positions as a query

• Select a particular chromosome first

• Enter a value into the Region Shown
  • This could be a numerical range (base pairs are the default if no units are entered), the names of clones, genes, markers, SNPs, or any combination

• Use the Maps & Options control
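To make the "numerical range" case concrete, the sketch below normalises a range string to base pairs. It is a hypothetical helper, not Map Viewer code: the K and M suffixes are an assumed convention for this example (the slide does not say which unit labels Map Viewer accepts), and names of clones, genes, markers or SNPs are not handled.

```python
import re

# Assumed unit suffixes for this sketch; a bare number means base pairs.
_UNITS = {"": 1, "K": 1_000, "M": 1_000_000}
_NUMBER = re.compile(r"^(\d+(?:\.\d+)?)([KM]?)$", re.IGNORECASE)


def parse_position(text: str) -> int:
    """Convert '250000', '250K' or '1.5M' to a position in base pairs."""
    match = _NUMBER.match(text.strip())
    if match is None:
        raise ValueError(f"not a numeric position: {text!r}")
    value, unit = match.groups()
    return round(float(value) * _UNITS[unit.upper()])


def parse_region(text: str) -> tuple[int, int]:
    """Convert 'start-end' to a (start, end) pair in base pairs."""
    start_text, sep, end_text = text.partition("-")
    if not sep:
        raise ValueError(f"expected 'start-end', got {text!r}")
    return parse_position(start_text), parse_position(end_text)


print(parse_region("100000-250000"))
print(parse_region("1.5M-2M"))
```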


Maps

• sv: Sequence Viewer, review the sequence

• dl: download the sequence of interest

• ev: Evidence Viewer, mRNA alignments in a region

• hm: homology maps

• mm: Model Maker, create cDNA in real time

[Screenshot callout: individual maps]