GUS 3.0: Web Sites and Tools June 20, 2002 Jonathan Crabtree crabtree@pcbi.upenn.edu.

Outline

- Current web interfaces
  - examples: allgenes.org, PlasmoDB.org
  - Java Servlet, CGI-based
  - reusable Java and Perl code, install scripts
- The future?
  - PHP and JSP
  - "GUSWWW" schema redesign

GUS - Multiple Views & Projects

[Diagram components: web sites AllGenes.org, PlasmoDB.org, EPConDB and other sites; Java Servlets + Perl CGI; the GUS schemas Core, SRes, TESS, RAD and DoTS in an Oracle RDBMS; a Perl object layer for data loading; other projects.]

allgenes.org query:

"Is my cDNA similar to any mouse genes that are predicted to encode transcription factors and have been localized to mouse chromosome 5?"

Select the allgenes.org boolean query page

Click on the "AND" button

Choose the RH map and GO function queries

Select mouse chromosome 5 and "transcription factor"

There are 22 mouse RNAs (assemblies) that meet these criteria:

This query result set now appears on the query "history" page:

Now use the BLAST page to identify RNAs similar to my cDNA

The results of the BLAST search appear in the query history

Intersect ("AND") the BLAST search with the previous query:

And we have our answer (the third row on the query history page):
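The final step works because of the query history. Each search is kept as a numbered row, and rows can be combined. The sketch below is a minimal in-memory version. It assumes each row is simply a set of RNA (assembly) IDs. QueryHistory, add and combine are hypothetical names, not the allgenes.org code. The real site appears to keep its results in database cache tables instead.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical query history: each row is a labeled set of RNA (assembly) IDs. */
class QueryHistory {
    enum Op { AND, OR, SUBTRACT }

    private final List<String> labels = new ArrayList<>();
    private final List<Set<Long>> rows = new ArrayList<>();

    /** Records a result set and returns its 1-based row number. */
    int add(String label, Set<Long> ids) {
        labels.add(label);
        rows.add(new LinkedHashSet<>(ids));
        return rows.size();
    }

    /** Combines two existing rows and records the outcome as a new row. */
    int combine(int rowA, Op op, int rowB) {
        Set<Long> out = new LinkedHashSet<>(rows.get(rowA - 1));
        Set<Long> other = rows.get(rowB - 1);
        switch (op) {
            case AND:      out.retainAll(other); break; // intersection
            case OR:       out.addAll(other);    break; // union
            case SUBTRACT: out.removeAll(other); break; // A minus B
        }
        return add("#" + rowA + " " + op + " #" + rowB, out);
    }

    Set<Long> get(int row) { return rows.get(row - 1); }

    String label(int row) { return labels.get(row - 1); }
}
```

In the walk-through above:
- row 1 would be the RH map/GO query;
- row 2 would be the BLAST hits;
- combine(1, QueryHistory.Op.AND, 2) would produce the third row.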

[Screenshot of a result page; callouts:]
- Predicted GO function(s) (some manually reviewed)
- Predicted protein
- CAP4 assembly
- EST expression profile
- UCSC BLAT
- Other transcripts from the same gene
- External links
- Mapping information
- Protein/motif hits
- Gene trap insertions, etc.

PlasmoDB: Combining Expression and Sequence Data

"List all genes whose proteins are predicted to contain a signal peptide and for which there is evidence that they are expressed in Plasmodium falciparum's late schizont stage."

Web Interface Components

- GUS/www/allgenes/htdocs/
  - GUS/www/allgenes/htdocs/index.html.in ...
- GUS/www/allgenes/cgi-bin/
  - GUS/www/allgenes/cgi-bin/rnaProtSimPng.pl.in ...
- GUS/java/cbil/gus/servlet/
  - GUS/java/cbil/gus/servlet/SiteServlet.java ...
- GUS/www/install/
  - GUS/www/install/allgenes-config.in
  - GUS/www/install/installServlet.pl
- GUS/perl/servlet/allgenes/
  - GUS/perl/servlet/allgenes/rnaProtSim.pl.in ...

rnaProtSimPng.pl.in

#!@PERL@

# -------------------------------------------------------------------
# rnaProtSimPng.pl
#
# $Revision: 1.3 $ $Date: 2001/03/22 14:44:57 $ $Author: crabtree $
# -------------------------------------------------------------------

use strict;
require 'cgi_lib.perl';
require '@CGI_DIR@/rnaSimilarityPng.pm';

# Input using cgi_lib.perl
#
my %rq = &get_request();
my $naSeqId = $rq{'id'} || 118619;
$naSeqId =~ s/[^\d]//g;

my $maxHits = $rq{'max_hits'};
$maxHits =~ s/[^\d]//g;

# Generate image using rnaSimilarityPng.pm
#
$| = 1;
my $mapName = "$naSeqId-prot";
my $imgData = &getImage($mapName, $naSeqId, 'ExternalAASequence');
print "Content-type: image/png\n\n$imgData";
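The script cleans its id input in two steps before using it:
- $rq{'id'} || 118619 falls back to the default when the value is undefined, empty or "0";
- s/[^\d]//g then deletes every non-digit character.

The Java port below only illustrates these two Perl idioms. The class and method names are hypothetical.

```java
/** Illustrative Java port of the input cleaning in rnaProtSimPng.pl; names are hypothetical. */
class CgiInput {
    static final String DEFAULT_NA_SEQ_ID = "118619";

    /** Mirrors Perl's "$x || default": undef, "" and "0" count as false in Perl. */
    static String orDefault(String raw, String dflt) {
        return (raw == null || raw.isEmpty() || raw.equals("0")) ? dflt : raw;
    }

    /** Mirrors "s/[^\d]//g": removes every non-digit character (ASCII digits only here). */
    static String digitsOnly(String s) {
        return s == null ? "" : s.replaceAll("[^0-9]", "");
    }

    /** Same order as the script: apply the default first, then strip non-digits. */
    static String naSeqId(String rawId) {
        return digitsOnly(orDefault(rawId, DEFAULT_NA_SEQ_ID));
    }
}
```

The default is applied before the stripping. As a result, an id made only of non-digits (such as "abc") ends up as the empty string rather than 118619.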

cbil.gus.servlet.SiteServlet

- extends javax.servlet.http.HttpServlet and is the only actual servlet in our Java code
- reads a configuration file and instantiates the set of JavaBeans defined therein, i.e. instances of:
  - PageGeneratorI: content generators
  - SqlQuery: parameterized SQL queries
  - "Param" and "Formatter" classes
- implements logging and dispatches requests
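The idea of building every object from a declarative file can be sketched as follows. This is a minimal sketch, not the SiteServlet code, and it rests on three assumptions:
- the configuration uses java.util.Properties syntax (the # comments and trailing \ continuations in allgenes-config.in fit that);
- "<bean>.class=<Type>" declares a bean;
- "<bean>.<Prop>=value" is passed to a JavaBean setter setProp(String).

BeanConfigLoader, QueryBean and the type registry are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.lang.reflect.Method;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;
import java.util.function.Supplier;

/** Hypothetical stand-in for a configured bean such as SqlQuery. */
class QueryBean {
    private String name;
    private String sql;
    public void setName(String name) { this.name = name; }
    public void setSQL(String sql) { this.sql = sql; }
    public String getName() { return name; }
    public String getSQL() { return sql; }
}

/** Hypothetical loader: "<bean>.class=<Type>" creates a bean, "<bean>.<Prop>=v" calls setProp(v). */
class BeanConfigLoader {
    // Short type names used in the config, mapped to factories (assumption; could use Class.forName).
    private final Map<String, Supplier<Object>> factories = new LinkedHashMap<>();

    void register(String typeName, Supplier<Object> factory) {
        factories.put(typeName, factory);
    }

    Map<String, Object> load(String configText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(configText)); // handles '#' comments and '\' continuations
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
        Map<String, Object> beans = new LinkedHashMap<>();
        // Pass 1: instantiate one bean per "<bean>.class" entry.
        for (String key : props.stringPropertyNames()) {
            if (!key.endsWith(".class")) continue;
            String type = props.getProperty(key);
            Supplier<Object> factory = factories.get(type);
            if (factory == null) throw new IllegalArgumentException("unknown type: " + type);
            beans.put(key.substring(0, key.length() - ".class".length()), factory.get());
        }
        // Pass 2: pass every other "<bean>.<Prop>" value to the matching JavaBean setter.
        for (String key : props.stringPropertyNames()) {
            int dot = key.indexOf('.');
            if (dot < 0 || key.endsWith(".class")) continue;
            Object bean = beans.get(key.substring(0, dot));
            if (bean == null) continue; // property for a bean that was never declared
            try {
                Method setter = bean.getClass().getMethod("set" + key.substring(dot + 1), String.class);
                setter.invoke(bean, props.getProperty(key));
            } catch (ReflectiveOperationException e) {
                throw new IllegalArgumentException("cannot set " + key, e);
            }
        }
        return beans;
    }
}
```

An explicit registry keeps the sketch runnable without a package layout. A loader could instead call Class.forName on fully qualified names such as cbil.gus.servlet.db.ConnectionPool. Like the original design, it builds every object as soon as the file is read.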

allgenes-config.in

# Oracle-specific routines
#
gusOraSql.class=cbil.gus.servlet.db.oracle.SQL

# Set of logins to GUS or GUSdev
#
gusLogin.class=cbil.gus.servlet.db.ConnectionPool
gusLogin.Login=@ORACLE_LOGIN@
gusLogin.Password=@ORACLE_PASSWORD@
gusLogin.DBUrl=@ORACLE_JDBC_URL@
gusLogin.NumConnections=6
gusLogin.MaxQueryTime=120
gusLogin.CheckInterval=30
gusLogin.JDBCDrivers=oracle.jdbc.driver.OracleDriver
gusLogin.Sql=gusOraSql
gusLogin.PrintStatusMessages=true
...

# Retrieve an RNA's sequence from the DB
#
rnaSeqQ.class=SqlQuery
rnaSeqQ.DisplayName=RNA sequence
rnaSeqQ.Name=rnaSeqQ
rnaSeqQ.Abbrev=rnaSeq
rnaSeqQ.SQL=select nas.sequence \
  from dots.NASequenceImp nas, dots.ProjectLink pl \
  where nas.na_sequence_id = $$0$$ \
  and nas.na_sequence_id = pl.id \
  and pl.project_id = 813 \
  and pl.table_id in (56, 89)
rnaSeqQ.HtmlBrief=RNA sequence for RNA DT.<!--ST0-->
rnaSeqQ.Params=rnaID
rnaSeqQ.ResultFormatter=rnaSeqF

# RH map location (DOTS only)
#
rhLocnID.DisplayName=Chromosomal location based on RH mapping
rhLocnID.Name=rhmap_locn_id
rhLocnID.Abbrev=rhLocn
rhLocnID.SQL=select distinct epcr.na_sequence_id \
  from dots.EPCR epcr, dots.RHMapMarker rmm, dots.RHMarker rm, dots.ProjectLink pl \
  where rmm.chromosome = '$$0$$' and rmm.centirays >= $$1$$ and rmm.centirays <= $$2$$ \
  and rm.rh_marker_id = rmm.rh_marker_id \
  and rm.taxon_id $$3$$ \
  and epcr.map_table_id = 366 \
  and rmm.rh_map_marker_id = epcr.map_id \
  and epcr.na_sequence_id = pl.id \
  and pl.project_id = @PROJECT_ID@ \
  and pl.table_id = 56
rhLocnID.HtmlBrief=<!--ST3--> RNAs radiation hybrid mapped to \
chromosome <!--ST0--> between <!--ST1--> and <!--ST2--> cR
rhLocnID.HtmlLong=This query returns DoTS predicted transcripts that can be \
linked to a specific chromosomal location by the radiation hybrid map data. A DoTS \
predicted transcript consists of an ...
rhLocnID.Params=humanOrMouseChromP,centirayStartP,centirayEndP,taxonIdP
rhLocnID.ResultFormatter=dotsIdListF1

humanOrMouseChromP.class=EnumParam
humanOrMouseChromP.Prompt=Select a chromosome:
humanOrMouseChromP.Description=Human or mouse chromosome
humanOrMouseChromP.Values=1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,X,Y
humanOrMouseChromP.Help=Please select a human or mouse chromosome from \
the list provided; note that chromosomes 'Y', '20', '21', and '22' are \
only valid for humans.

centirayStartP.class=DoubleParam
centirayStartP.Prompt=Start position in centirays:
centirayStartP.Description=Start position in centirays
centirayStartP.Min=0.0
centirayStartP.Max=10290
centirayStartP.Initial=0.0
centirayStartP.Help=Enter a "start" position in centirays. The centiray \
  is the unit of distance used in radiation hybrid mapping \
  assays and the form should indicate the range of values \
that are valid for this particular parameter.
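The Param definitions and the $$N$$ slots in each SQL template work together: form values are checked first, then value N replaces $$N$$. The sketch below is an assumption about that flow, not the cbil.gus.servlet code. All four names are hypothetical:
- ParamCheck;
- ChoiceParam, modeled on EnumParam;
- RangeParam, modeled on DoubleParam;
- QueryFiller.

The chromosome and centiray values used with it are made up.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical parameter check: returns null if the raw form value is acceptable, else a message. */
interface ParamCheck {
    String validate(String raw);
}

/** Simplified stand-in for EnumParam: the value must be one of the configured Values. */
class ChoiceParam implements ParamCheck {
    private final List<String> values;
    ChoiceParam(String commaSeparatedValues) { values = Arrays.asList(commaSeparatedValues.split(",")); }
    public String validate(String raw) {
        return values.contains(raw) ? null : "'" + raw + "' is not one of " + values;
    }
}

/** Simplified stand-in for DoubleParam: the value must parse as a number within [Min, Max]. */
class RangeParam implements ParamCheck {
    private final double min, max;
    RangeParam(double min, double max) { this.min = min; this.max = max; }
    public String validate(String raw) {
        if (raw == null) return "a value is required";
        double v;
        try {
            v = Double.parseDouble(raw.trim());
        } catch (NumberFormatException e) {
            return "'" + raw + "' is not a number";
        }
        return (Double.isNaN(v) || v < min || v > max) ? v + " is outside " + min + " to " + max : null;
    }
}

/** Validates every value, then pastes value i into each $$i$$ slot of the SQL template. */
class QueryFiller {
    private static final Pattern SLOT = Pattern.compile("\\$\\$(\\d+)\\$\\$");

    static String fill(String template, List<ParamCheck> checks, String... values) {
        if (values.length != checks.size()) throw new IllegalArgumentException("wrong number of values");
        for (int i = 0; i < values.length; i++) {
            String problem = checks.get(i).validate(values[i]);
            if (problem != null) throw new IllegalArgumentException("parameter " + i + ": " + problem);
        }
        Matcher m = SLOT.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int i = Integer.parseInt(m.group(1));
            if (i >= values.length) throw new IllegalArgumentException("no value for slot " + i);
            // quoteReplacement stops '$' or '\' inside a value from being read as a group reference
            m.appendReplacement(out, Matcher.quoteReplacement(values[i]));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

In this sketch, values are pasted into the SQL text, so the Param checks are what keep a request from altering the query. JDBC PreparedStatement bind variables (?) would be the usual alternative for plain values. However, a slot like "rm.taxon_id $$3$$" substitutes an operator as well, which a bind variable cannot do. The help text's species rule ('Y', '20', '21', '22' valid only for humans) would need a check across parameters, which the sketch omits.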

"GUSwww" Cache Tables

SQL> describe queries;
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 QUERY_ID                                  NOT NULL NUMBER(12)
 SERVLET_NAME                              NOT NULL VARCHAR2(30)
 QUERY_NAME                                NOT NULL VARCHAR2(100)
 PARAM0                                             VARCHAR2(100)
 PARAM1                                             VARCHAR2(100)
 . . .
 PARAM74                                            VARCHAR2(100)
 PARAM75                                            VARCHAR2(100)
 RESULT_TABLE                              NOT NULL VARCHAR2(30)
 START_TIME                                NOT NULL DATE
 END_TIME                                           DATE

SQL> describe cache435;
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 SPOT_FAMILY_RESULT_ID                     NOT NULL NUMBER(10)
 I                                                  NUMBER

SQL> describe cache30687;
 Name                                      Null?    Type
 ----------------------------------------- -------- ----------------------------
 NA_SEQUENCE_ID                                     NUMBER(12)
 I                                                  NUMBER(12)
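The tables above show the layout but not the lookup logic. One plausible reading, labeled here as an assumption:
- a request whose SERVLET_NAME, QUERY_NAME and PARAMn values match an earlier row is answered from that row's RESULT_TABLE instead of re-running the SQL;
- result tables are named "cache" plus a counter, as in cache435.

The hypothetical QueryCache class below keeps that bookkeeping in memory. It enforces the one limit the schema does make explicit: the PARAM0 to PARAM75 columns hold at most 76 parameters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory analogue of the QUERIES bookkeeping: query + parameters -> cache table name. */
class QueryCache {
    static final int MAX_PARAMS = 76; // the QUERIES table has columns PARAM0 .. PARAM75

    private final Map<List<String>, String> resultTables = new HashMap<>();
    private long nextQueryId = 1;

    /** Returns the cached result table for this request, allocating a new name on a miss. */
    String resultTableFor(String servletName, String queryName, String... params) {
        if (params.length > MAX_PARAMS) {
            throw new IllegalArgumentException("QUERIES can record at most " + MAX_PARAMS + " parameters");
        }
        List<String> key = new ArrayList<>();
        key.add(servletName);
        key.add(queryName);
        key.addAll(Arrays.asList(params));
        // Assumed naming scheme: "cache" followed by a new query ID.
        return resultTables.computeIfAbsent(key, k -> "cache" + nextQueryId++);
    }

    int size() { return resultTables.size(); }
}
```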

installServlet.pl

[crabtree@zeus install]$ ./installServlet.pl \
    --port=9000 \
    --cgiDir=/world/www.allgenes/cgi-bin/ \
    --htdocsDir=/world/www.allgenes/htdocs \
    --cgiURL=http://www.allgenes.org/cgi-bin \
    --htdocsURL=http://www.allgenes.org \
    --installDir=/world/www.allgenes/servlet \
    --servletName=allgenes-zeus \
    --servletFilePrefix=allgenes \
    --servletConfig=allgenes-zeus \
    --production \
    --servletURL=http://www.allgenes.org/gc/servlet

- install htdocs and cgi-bin files
  - perform substitutions defined by 'allgenes-zeus' (e.g. ORA_LOGIN, ORA_PASSWORD, PROJECT_ID)
- compile Java code, create .jar file and install
- install servlet configuration file
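The substitution step turns each *.in template into an installed file by replacing tokens such as @PERL@, @CGI_DIR@, @ORACLE_LOGIN@ and @PROJECT_ID@. installServlet.pl itself is Perl. The Java sketch below only illustrates that one step, using a hypothetical TemplateInstaller class. It takes the settings as a map, whereas the real script reads them from the 'allgenes-zeus' definitions.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical version of the install-time step that turns a *.in template into an installed file. */
class TemplateInstaller {
    private static final Pattern TOKEN = Pattern.compile("@([A-Z_]+)@");

    /** Replaces every @NAME@ token with settings.get("NAME"); fails on tokens with no setting. */
    static String substitute(String template, Map<String, String> settings) {
        Matcher m = TOKEN.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = settings.get(m.group(1));
            if (value == null) {
                throw new IllegalArgumentException("no setting for @" + m.group(1) + "@");
            }
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Failing on an unknown token, rather than leaving it in place, is a design choice. It stops a half-configured file (for example, one still containing @ORACLE_PASSWORD@) from being installed.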

Features of Current [Servlet] Implementation

- Automatic generation of HTML FORMs
  - automated input checking
  - integrated help features
  - INPUT elements populated from the database
- Query history facility
- Boolean queries (AND, OR, SUBTRACT)
- Declarative configuration file
- Base system is relatively independent of GUS
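Automatic FORM generation means the page is derived from Param definitions rather than hand-written HTML. The sketch below assumes an enumerated parameter becomes a SELECT whose OPTIONs come from its Values list. FormWriter and its methods are hypothetical. Prompts and values are HTML-escaped because some INPUT elements are populated from the database.

```java
import java.util.List;

/** Hypothetical generator: renders a SELECT element for an enumerated parameter. */
class FormWriter {
    /** Escapes the characters that would break HTML text or a double-quoted attribute. */
    static String escape(String s) {
        StringBuilder b = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '<': b.append("&lt;"); break;
                case '>': b.append("&gt;"); break;
                case '&': b.append("&amp;"); break;
                case '"': b.append("&quot;"); break;
                default: b.append(c);
            }
        }
        return b.toString();
    }

    /** Prompt text followed by a SELECT; the option equal to 'initial' is pre-selected. */
    static String select(String name, String prompt, List<String> values, String initial) {
        StringBuilder html = new StringBuilder();
        html.append(escape(prompt)).append("\n<select name=\"").append(escape(name)).append("\">\n");
        for (String v : values) {
            html.append("  <option value=\"").append(escape(v)).append('"');
            if (v.equals(initial)) html.append(" selected");
            html.append('>').append(escape(v)).append("</option>\n");
        }
        return html.append("</select>\n").toString();
    }
}
```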

Limitations of Current Implementation

- Relatively steep learning curve
- Monolithic solution
  - no support for modifying the configuration at runtime
  - all objects are instantiated when the config file is read
- Limited ability to customize the presentation layer (i.e., HTML) without programming in Java
- Technical problems with Servlets/Tomcat
  - must restart all servlets as a group
  - not currently using Serializable sessions

Dynamic Web Content

- HTML fragments embedded in a program:
  - CGI programs (e.g. Perl; interpreted)
  - Java Servlets (compiled)
- Program fragments embedded in HTML:
  - PHP (interpreted)
  - JSP (compiled once, as needed)
- Another axis: persistent processes vs. not (e.g. FastCGI vs. plain CGI)

Program Fragments in HTML

- Advantages:
  - faster development cycle; can edit in place
  - easier to see/validate the structure of HTML pages
  - HTML has no functions, Java and PHP do
- Disadvantages:
  - must take care to manage the complexity of the application
- Recommendations:
  - move towards adopting this approach
  - move all persistent state into the database

PHP: Hypertext Preprocessor

- http://www.php.net
- Scripting language; can be embedded in HTML
- Usage statistics (Netcraft survey): http://www.php.net/usage.php

JSP - JavaServer Pages

- Based on, and can interact with, Java Servlets
- Essentially Java embedded in HTML
- XML-based tags, scriptlets, and JavaBean calls
- Standard tag libraries available
- Pages typically compiled on demand
- Multiple implementations? (vs. a single one for PHP)

Next steps

- Agree on desired user interface functionality, e.g.:
  - saving queries for PlasmoDB
  - persistent preferences for the genome browser
- Design parts of the schema to support it
- Migrate old code/write new code
  - easier to migrate existing code with JSP