GUS

Post on 13-Mar-2016


We have created the Genomics Unified Schema (GUS), a relational database that warehouses and integrates biological sequence, sequence annotation, and gene expression data from a large number of heterogeneous sources. User-friendly web interfaces present slices of the GUS database and allow researchers to execute structured queries for information concerning gene structure, function, and expression. Please visit poster #146A for details of GUS.

GUS Supports Multiple Projects

AllGenes

PlasmoDB

EPConDB

AllGenes is based on a comprehensive mouse and human gene index. The genes are approximated by transcripts predicted from EST and mRNA clustering.

PlasmoDB is the official database of the Plasmodium falciparum genome project. It provides an integrated view of genome sequence data, including expression data from EST, SAGE, and microarray projects.

EPConDB is an index of genes expressed in endocrine pancreas. Expression is defined either through microarray experiments or sequence annotation.

"Is my cDNA similar to any mouse genes that are predicted to encode transcription factors and have been localized to mouse chromosome 5?"

http://www.allgenes.org/

This query illustrates several aspects of the GUS database, including:

Data Integration
• RH map
• GO function
• Sequence
• GO function assignments

Data Analysis Tools
• Boolean function
• History function
• BLAST

allgenes.org query

Select the allgenes.org boolean query page

Click on the "AND" button

Choose the RH map and GO function queries

Select mouse chromosome 5 and "transcription factor"

There are 22 mouse RNAs (assemblies) that meet these criteria:

This query result set now appears on the query "history" page:

Now use the BLAST page to identify RNAs similar to my cDNA

The results of the BLAST search appear in the query history

Intersect ("AND") the BLAST search with the previous query:

And we have our answer (the third row on the query history page):

Predicted GO function(s) (some manually reviewed)

Predicted protein

CAP4 assembly

EST expression profile

UCSC BLAT

Other transcripts from the same gene

External links

Mapping information

Protein/motif hits

Gene trap insertions, etc.
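The allgenes.org walkthrough above amounts to storing each result set in a history and intersecting rows. The following is a minimal sketch of that idea, not the allgenes.org implementation: the `QueryHistory` class, its method names, and the assembly identifiers are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class QueryHistory:
    """Hypothetical stand-in for a query history page: an ordered list of named result sets."""
    rows: list = field(default_factory=list)

    def record(self, label, ids):
        """Store a result set and return its 1-based row number."""
        self.rows.append((label, frozenset(ids)))
        return len(self.rows)

    def result(self, row):
        """Return the identifiers stored in a given history row."""
        return self.rows[row - 1][1]

    def combine_and(self, row_a, row_b):
        """Intersect two stored rows and record the outcome as a new row."""
        label = f"({self.rows[row_a - 1][0]}) AND ({self.rows[row_b - 1][0]})"
        return self.record(label, self.result(row_a) & self.result(row_b))


history = QueryHistory()

# Made-up assembly identifiers, for illustration only.
rh_map_chr5 = {"A1", "A2", "A3", "A4"}
go_transcription_factor = {"A2", "A3", "A4", "A9"}
blast_hits = {"A3", "A7"}

# Step 1: the Boolean page ANDs the RH map and GO function queries into one row.
row1 = history.record("chr5 AND transcription factor",
                      rh_map_chr5 & go_transcription_factor)
# Step 2: the BLAST search lands in the history as its own row.
row2 = history.record("BLAST vs my cDNA", blast_hits)
# Step 3: intersecting the two rows yields the third row, which holds the answer.
row3 = history.combine_and(row1, row2)

answer = sorted(history.result(row3))
print(row3, answer)
```

Keeping every intermediate result as a row is what lets a later search, such as the BLAST step, be combined with an earlier one without rerunning it.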

"List all genes whose proteins are predicted to contain a signal peptide and for which there is evidence that they are expressed in Plasmodium falciparum's late schizont stage."

http://plasmodb.org/

This query illustrates several aspects of the GUS database, including:

Data Integration
• Predicted genome translation
• Microarray expression

Data Analysis Tools
• Spot intensity
• History function

PlasmoDB query:

Select Text Search from the PlasmoDB homepage

Choose signal peptide

Choose chromosome and Gene/prediction type, then submit

There are 1952 genes with predicted signal peptides

Choose gene expression-microarray from the homepage

Then choose an experiment, chromosome, and Gene/prediction type - submit

There are 12170 gene predictions that satisfy this query

Go to the history page and choose which simple queries to combine. Select intersect.

We have an answer. There are 949 predicted genes that satisfy our complex query

Click on a gene to get a full report

There is a variety of information available from the report page including:

Gene models predicted using a variety of approaches

and mRNA and protein predictions
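Because GUS is a relational database, the "intersect" step in the PlasmoDB walkthrough can be pictured as a SQL `INTERSECT` over two simple queries. The sketch below uses Python's built-in `sqlite3` module with two invented tables; the table names, columns, and gene identifiers are assumptions for illustration and are not taken from the GUS schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables; these are not the actual GUS schema.
    CREATE TABLE signal_peptide_prediction (gene_id TEXT PRIMARY KEY);
    CREATE TABLE stage_expression (gene_id TEXT, stage TEXT);
""")

# Made-up gene identifiers, for illustration only.
conn.executemany(
    "INSERT INTO signal_peptide_prediction VALUES (?)",
    [("g1",), ("g2",), ("g3",)],
)
conn.executemany(
    "INSERT INTO stage_expression VALUES (?, ?)",
    [("g2", "late schizont"), ("g3", "ring"), ("g4", "late schizont")],
)

# Each SELECT corresponds to one simple query in the history;
# INTERSECT keeps only the genes returned by both.
rows = conn.execute(
    """
    SELECT gene_id FROM signal_peptide_prediction
    INTERSECT
    SELECT gene_id FROM stage_expression WHERE stage = ?
    ORDER BY gene_id
    """,
    ("late schizont",),
).fetchall()

genes = [row[0] for row in rows]
print(genes)
conn.close()
```

In this toy data only `g2` has both a predicted signal peptide and late schizont expression, mirroring how the 1952 and 12170 result sets narrow to 949 genes in the walkthrough.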

"Which DOTS assemblies (RNA) represented on the Endocrine Pancreas Consortium’s chip 2.0 are constituents of the insulin-initiated signal transduction pathway?"

EPConDB query:

http://www.cbil.upenn.edu/EPConDB

Data Integration
• Sequence
• Microarray experiment
• Transduction pathway

Data Analysis Tools
• BLAST
• History function

Go to the gene information query page and click on “DOTS assemblies involved in a pathway”

Choose the insulin pathway, a p-value, pancreas, the species, and whether an assembly must include an mRNA - submit

There are 59 DOTS assemblies that are constituents of the insulin pathway

Return to the gene information query page and select clone sets. Choose chip 2.0 - submit

There are 3242 assemblies represented on chip 2.0

Go to the history page, select the queries to combine and select intersect – view the results

There are 8 assemblies that satisfy the complex query. Clicking on an RNA retrieves an allgenes report.
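The EPConDB pathway query differs from the earlier examples in that the first step takes several parameters before the intersection. The sketch below illustrates one way such a parameterized filter could feed a set intersection. The `Assembly` record, the `pathway_members` function, the reading of the p-value as an upper cutoff, and all data values are assumptions made for illustration, not EPConDB's actual logic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assembly:
    """Hypothetical record for a DOTS assembly assigned to a pathway."""
    assembly_id: str
    pathway: str
    p_value: float
    contains_mrna: bool


def pathway_members(assemblies, pathway, max_p_value, require_mrna):
    """Return IDs of assemblies in `pathway` that pass the p-value cutoff.

    Assumes a smaller p-value means stronger evidence, so the cutoff is an upper bound.
    """
    return {
        a.assembly_id
        for a in assemblies
        if a.pathway == pathway
        and a.p_value <= max_p_value
        and (a.contains_mrna or not require_mrna)
    }


# Made-up records, for illustration only.
catalog = [
    Assembly("D1", "insulin", 1e-20, True),
    Assembly("D2", "insulin", 1e-3, True),    # fails the p-value cutoff
    Assembly("D3", "insulin", 1e-30, False),  # fails the mRNA requirement
    Assembly("D4", "glucagon", 1e-40, True),  # different pathway
]
chip_2_0 = {"D1", "D3", "D5"}  # hypothetical clone set contents

insulin = pathway_members(catalog, "insulin", max_p_value=1e-10, require_mrna=True)
on_chip = sorted(insulin & chip_2_0)
print(on_chip)
```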

Acknowledgements and References

The Plasmodium Genome Consortium
Sanger http://www.sanger.ac.uk/Projects/P_falciparum
TIGR/NMRC http://www.tigr.org/tdb/edb2/pfal/htmls
Stanford http://sequence-www.stanford.edu/group/malaria/

The many researchers who have contributed data and software to the database

Funding Agencies
National Institutes of Health, Wellcome Trust, US Dep’t of Defense, Burroughs Wellcome Fund, World Health Organization, etc.

The research community that has supported these large-scale ventures for the benefit of all

References

1. K2/Kleisli and GUS: Experiments in integrated access to genomic data sources (2001) Davidson, S.B., J. Crabtree, B.P. Brunk, J. Schug, V. Tannen, G.C. Overton and C.J. Stoeckert, Jr. IBM Systems Journal 40(2):1-20

2. A relational schema for both array-based and SAGE gene expression experiments (2001) Stoeckert, C., A. Pizarro, E. Manduchi, M. Gibson, B. Brunk, J. Crabtree, J. Schug, S. Shen-Orr and G.C. Overton. Bioinformatics 17(4):300-308

3. The GUS schema is available at http://www.allgenes.org/cgi-bin/schemaBrowser.pl

4. The RAD schema is available at http://www.cbil.upenn.edu/cgi-bin/RAD2/schemaBrowserRAD.pl

Funding and Acknowledgements


EPConDB is part of the NIDDK-sponsored consortium on "Functional Genomics of the Developing Endocrine Pancreas". We gratefully acknowledge support through NIDDK 56947 and 56954 with cosponsorship from the JDFI.

Funding for allgenes.org is provided by NIH grant RO1-HG-01539-03 and DOE grant DE-FG02-00ER62893


References

Bahl, A., Brunk, B., Coppel, R.L., Crabtree, J., Diskin, S.J., Fraunholz, M.J., Grant, G.R., Gupta, D., Huestis, R.L., Kissinger, J.C., Labo, P., Li, L., McWeeney, S.K., Milgram, A.J., Roos, D.S., Schug, J., Stoeckert, C.J. (2002) PlasmoDB: The Plasmodium Genome Resource. An integrated database providing tools for accessing and analyzing mapping, expression and sequence data (both finished and unfinished). Nucleic Acids Res. 2002 30: 87-90

Davidson, S.B., Crabtree, J., Brunk, Brian P., Schug, J., Tannen, V., Overton, G.C., Stoeckert, C.J. Jr. (2001) K2/Kleisli and GUS: Experiments in Integrated Access to Genomic Data Sources. IBM Systems Journal: 40(2), p. 512-531.

Scearce, L. Marie, Brestelli, John E., McWeeney, Shannon K., Lee, Catherine S., Mazzarelli, Joan, Pinney, Deborah F., Pizarro, Angel, Stoeckert, C. J. Jr., Clifton, Sandra, Permutt, M. Alan, Brown, Juliana, Melton, Douglas A., Kaestner, Klaus H. (2002) Functional Genomics of the Endocrine Pancreas: The Pancreas Clone Set and PancChip, New Resources for Diabetes Research. Diabetes 51: 1997-2004.

The Plasmodium Genome Database Collaborative (2001) PlasmoDB: An integrative database of the Plasmodium falciparum genome. Tools for accessing and analyzing finished and unfinished sequence data. Nucleic Acids Res., 2001, Vol. 29, No. 1 66-69