Running Other Programs And CGI Scripts


Transcript of Running Other Programs And CGI Scripts

Page 1: Running Other Programs And CGI Scripts

12.1

Running Other Programs And CGI Scripts

Page 2: Running Other Programs And CGI Scripts

12.2

Please fill the teaching survey at:

http://www.ims.tau.ac.il/tal/login.asp

I read it closely, and I make changes in the course from year to year according to the feedback.

Teaching Survey

Page 3: Running Other Programs And CGI Scripts

12.3

• The exam will be on the computers in the PC classroom, on 31/1/2007 at 9:00

• The computers will be disconnected from the network (i.e. no internet access. Sorry…)

• You will receive a floppy disk (diskette) with some files, and the exam questions on paper.

• You will write your solutions as normal Perl scripts and save them to the floppy, which you will submit at the end of the exam.

• 2 A4 pages

• Everything except BioPerl and CGI

Exam

Page 4: Running Other Programs And CGI Scripts

12.4

• Write a script that reads a DNA sequence from STDIN and prints its reverse complement. The sequence may be in either lowercase or uppercase letters.

• The file exam1.pl contains a script that reads a sequence file in Genbank format. Add the missing regular expression in line number 25 so that it finds all CDS lines. The regular expression should extract the coordinates of the start and stop codons. Fill in the appropriate variables in lines 27 and 28.

• The file exam2.pl contains a script that reads a file in PDB format (see example in EHD1.pbd) and finds all the “ATOM…” lines. Write the subroutine getAtomInfo that is called for each such line. The subroutine has one parameter – the scalar string of the ATOM line. It should return the following data structure:

{‘amino_acid’ => AMINO_ACID, ‘coordinates’ => [X,Y,Z], ‘amino_acid_number’ => N}

• Make a copy of exam2.pl and name it exam3.pl. Add a new section at the end of the script that makes an array of arrays. Each internal array should hold all the hashes of the ATOMs that belong to a single amino acid of the protein.

Some exam questions

Page 5: Running Other Programs And CGI Scripts

12.5

Running Other Programs

Page 6: Running Other Programs And CGI Scripts

12.6

e.g. Rate4Site: Still not very widely used (54 citations so far…), so there are no BioPerl modules that will run it for you and read its output:

Dealing with less common formats

#POS SEQ  SCORE    QQ-INTERVAL        STD     MSA DATA
#The alpha parameter 1.5

1    K   -0.9763   [-1.6621,-0.5750]  0.8777  6/6
2    V    0.9820   [-0.1107,2.2169]   1.5983  6/6
3    F    0.0035   [-0.9640,0.4935]   1.3195  6/6
4    S    0.2010   [-0.7766,0.8962]   1.3975  6/6
5    K   -0.3480   [-1.1423,0.1673]   1.0990  6/6
6    C   -0.7887   [-1.4855,-0.3560]  1.0182  6/6
7    E   -0.9894   [-1.6621,-0.5750]  0.8714  6/6
8    L    0.0153   [-0.9640,0.4935]   1.3378  6/6
9    A   -1.1347   [-1.6621,-0.7766]  0.7487  6/6
10   H   -0.3200   [-1.1423,0.1673]   1.1252  6/6
11   K   -0.3557   [-1.1423,0.1673]   1.1077  6/6
12   L   -0.8331   [-1.4855,-0.3560]  0.9965  6/6
13   K   -0.9763   [-1.6621,-0.5750]  0.8777  6/6
14   A    1.6809   [0.4935,2.2169]    1.6672  6/6
15   Q    1.4315   [0.1673,2.2169]    1.7297  6/6
16   E    0.1025   [-0.9640,0.8962]   1.3784  6/6
17   M    0.5006   [-0.5750,1.4226]   1.4456  6/6
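When no module reads a format for you, a regular expression is usually enough. Here is a minimal sketch of reading the Rate4Site lines shown above into a list of hashes. The subroutine name parse_rate4site and the hash keys are made up for this example, and it assumes every data line has exactly the six whitespace-separated columns from the header line.

```perl
use strict;
use warnings;

# Parse Rate4Site-style lines into a list of hash references.
# Assumption: columns are POS SEQ SCORE QQ-INTERVAL STD MSA-DATA.
sub parse_rate4site {
    my (@lines) = @_;
    my $num = qr/-?\d+(?:\.\d+)?/;      # a number such as -0.9763 or 6
    my @sites;
    foreach my $line (@lines) {
        next if $line =~ /^\s*#/;       # skip header and comment lines
        next if $line =~ /^\s*$/;       # skip empty lines
        if ($line =~ m{^\s*(\d+)\s+(\S)\s+($num)\s+\[\s*($num)\s*,\s*($num)\s*\]\s+($num)\s+(\S+)}) {
            push @sites, {
                pos      => $1,
                residue  => $2,
                score    => $3,
                interval => [$4, $5],
                std      => $6,
                msa_data => $7,
            };
        }
    }
    return @sites;
}

# Two lines copied from the example output above
my @example = (
    "#POS SEQ SCORE QQ-INTERVAL STD MSA DATA",
    "#The alpha parameter 1.5",
    "1 K -0.9763 [-1.6621,-0.5750] 0.8777 6/6",
    "2 V 0.9820 [-0.1107,2.2169] 1.5983 6/6",
);
foreach my $site (parse_rate4site(@example)) {
    print "$site->{pos}\t$site->{residue}\t$site->{score}\n";
}
```

In a real script the lines would come from the output file of the program (or from back-ticks) instead of the @example array.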

Page 7: Running Other Programs And CGI Scripts

12.7

You may run programs using the system function:

$exitValue = system("blast.exe ...");
if ($exitValue != 0) {
    die "blast failed!";
}

This way the output of blast will be seen on the screen.

Another way is to use “back-ticks” (left of the “1” key on your keyboard):

@blastOutput = `blast.exe ...`;

This way the output of blast is stored in the array.

Running programs from a script
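Both ways can be tried without installing anything. In this sketch the Perl interpreter itself ($^X) stands in for an external program such as blast.exe; that substitution and the little one-line program are assumptions made only so the example runs anywhere.

```perl
use strict;
use warnings;

# $^X is the path of the Perl interpreter running this script.
# It plays the role of the external program in this demo.
my @command = ($^X, '-le', 'print for 1..3');

# system: the output of the program appears on the screen.
# Passing a list (instead of one string) bypasses the shell.
my $status = system(@command);
die "could not start the program: $!" if $status == -1;
my $exitValue = $status >> 8;           # the exit code of the program itself
die "program failed with exit code $exitValue" if $exitValue != 0;

# Back-ticks: the output is captured, one line per array element.
my @output = `"$^X" -le "print for 1..3"`;
die "program failed" if $? != 0;        # $? holds the status after back-ticks
chomp @output;                          # remove the newline from each line
print "captured ", scalar(@output), " lines\n";
```

Note that system returns the raw wait status, so the exit code of the program is the status shifted right by 8 bits; comparing the status to 0 (as in the slide) is still a valid test for success.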

Page 8: Running Other Programs And CGI Scripts

12.8

Class exercise 15

1. Write a script that runs clustalw on a given protein FASTA file (use ex15.zip from the website, use the help file in there!)

2. Modify the script: Now do both multiple sequence alignment, and build an NJ tree.

3. Modify the script: Now add a rate4site run on the output of clustalw (type “rate4site.exe -h” for help)

Page 9: Running Other Programs And CGI Scripts

12.9

CGI Scripts: Producing Web Pages

Page 10: Running Other Programs And CGI Scripts

12.10

• A CGI script is a script that is intended to be used over the internet.

• A CGI script on a web server can be used by a user to obtain data from databases (e.g. Genbank web server) or run analyses for the user (e.g. Blast at NCBI). The results of the script are an HTML page.

CGI: Common Gateway Interface

Page 11: Running Other Programs And CGI Scripts

12.11

• All web pages that you see on the internet are written in HTML.

• HTML (HyperText Markup Language) is a computer language that defines how a web page will look in your web browser.

• Web browsers (such as Microsoft Internet Explorer) read HTML text files and produce colorful graphical pages.

• You can see the HTML source code of a web page in Explorer by clicking: View->Source. Try it on the course web page:

HTML: What is a web page?

<!doctype html public "-//w3c//dtd html 4.0 transitional//en">
<html>
<head>
 <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
 <meta name="Author" content="Eyal Privman">
 <meta name="GENERATOR" content="Mozilla/4.77 [en] (X11; U; IRIX64 6.5 IP27) [Netscape]">
 <title>Perl Programming By Eyal Privman</title>
 <style>…

Page 12: Running Other Programs And CGI Scripts

12.12

• HTML uses tags. Tags are always enclosed in angle-brackets and are case-insensitive. For example: <head>

• Tags typically occur in begin-end pairs. These pairs are in the form

<tag> ... </tag>

For example, if you want some text to be underlined in your page:

<u>Aim:</u> The aim of this course is to introduce the participant

HTML basics

Page 13: Running Other Programs And CGI Scripts

12.13

• The whole document should be between <html> ... </html>

• The text between <head> ... </head> includes general information about the page.

• Inside the “head” section, use <title> ... </title> to write the title of the page.

• The text between <body> ... </body> is the actual contents of the page

Structure of HTML documents

Page 14: Running Other Programs And CGI Scripts

12.14

Class exercise 16

1. Create the following HTML file and view it with Internet Explorer:

<html>
<head> <title>Hello World Page</title> </head>
<body> <h1> Hello World! </h1> </body>
</html>

(name your file “class_ex16.1.html”)

Page 15: Running Other Programs And CGI Scripts

12.15

The easiest way to get yourself a webserver is if you have an account at the bioinformatics unit. (On the bioinfo server)

You should place your HTML files and CGI scripts in your home directory on the bioinfo server.

You will have to ask the staff of the bioinfo unit to open your account to web access. (They will create the needed directories for you)

Running a CGI over the web

Page 16: Running Other Programs And CGI Scripts

12.16

Any Perl script can output its results in HTML, using simple print commands.

The Perl CGI module can make it easier for you:

#!/usr/local/bin/perl
# (the line above is necessary on a UNIX server)
use CGI;
my $cgi = CGI->new;

print $cgi->header . $cgi->start_html('Hello World Page') . $cgi->h1('Hello World!') . $cgi->end_html;

exit(0);   # tells the server everything is fine

Producing HTML page with a script
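For comparison, here is a sketch of the "simple print commands" route, without the CGI module. The subroutine name html_page is invented for this example. The one thing that is easy to forget is the header line followed by an empty line, which the web server expects before the page itself.

```perl
#!/usr/local/bin/perl
use strict;
use warnings;

# Build a complete HTML page as one string.
# (html_page is a made-up helper, not part of any module.)
sub html_page {
    my ($title, $heading) = @_;
    return "<html>\n"
         . "<head> <title>$title</title> </head>\n"
         . "<body> <h1>$heading</h1> </body>\n"
         . "</html>\n";
}

# The header, then an empty line, then the page
print "Content-type: text/html\n\n";
print html_page('Hello World Page', 'Hello World!');
```

The CGI module version does the same job: $cgi->header prints the header and the empty line, and start_html, h1 and end_html produce the tags.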

Page 17: Running Other Programs And CGI Scripts

12.17

Class exercise 16

2. Create the Perl script from the previous slide and test it.

Page 18: Running Other Programs And CGI Scripts

12.18

An HTML form can run a CGI script

Page 19: Running Other Programs And CGI Scripts

12.19

Here is the HTML that makes this form that takes input (a name) and invokes a CGI script named script.pl, which should be placed in the directory cgi-bin:

An HTML form can run a CGI script

<HTML>
<HEAD> <TITLE>HTML Form Example</TITLE> </HEAD>
<BODY>
<FORM method="GET" action="/cgi-bin/script.pl">
<h3>Enter your name:</h3>
<p> <INPUT type="text" name="userName"> </p>
<h3>Submit this Form</h3>
<p> <INPUT type="submit" value="Send Data Now!"> </p>
<h3>Reset this Form</h3>
<p> <INPUT type="reset" value="Clear all my input now"> </p>
</FORM>
</BODY>
</HTML>

Page 20: Running Other Programs And CGI Scripts

12.20

Use the CGI function param to get the input that was entered into the form.

To get a list of all parameter names:
my @params = $cgi->param();

To get the value for a specific parameter name:
my @params = $cgi->param(PARAM_NAME);

For the example form in the previous slide, the CGI script could do this:

print $cgi->h1('Hello '.$cgi->param("userName").'!');

Using the input in the CGI script
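Putting the pieces together, a complete script for the example form might look like the sketch below. Two things here are additions, not part of the slides: the subroutine name greeting and the default name 'stranger', and the call to escapeHTML, which keeps a user who types HTML tags into the form from changing your page. It also assumes the CGI module is installed (in Perl 5.22 and later it has to be installed from CPAN).

```perl
#!/usr/local/bin/perl
use strict;
use warnings;
use CGI;

# Build the greeting text from the form input.
# (greeting is a made-up helper for this example.)
sub greeting {
    my ($cgi) = @_;
    my $name = $cgi->param('userName');     # scalar context: one value
    $name = 'stranger' if !defined $name || $name eq '';
    # escapeHTML turns characters such as < and & into harmless entities
    return 'Hello ' . $cgi->escapeHTML($name) . '!';
}

my $cgi = CGI->new;
print $cgi->header,
      $cgi->start_html('Hello Page'),
      $cgi->h1(greeting($cgi)),
      $cgi->end_html;
```

When run from the command line instead of through the web server, there is no form input, so the script should simply greet the default name.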

Page 21: Running Other Programs And CGI Scripts

12.21

Class exercise 17

Create the HTML form and the Perl script from the previous slides on the bioinfo server (it’s a UNIX system!):

1. Log in to bioinfo using TeraTerm (Start -> Tera Term): The host is “bioinfo.tau.ac.il”, choose SSH, click OK, click Yes, user-name is “symp”, password is “turj”.

2. In UNIX you can use “cd” as in Windows, and “ls” or “ls -l” are like “dir”.

3. Use the command “mkdir DIR_NAME” to create a directory named after your first name inside the directory “public_html”. The HTML file should be in there.

4. To create and edit files use the editor pico (“pico FILE_NAME”). To paste into TeraTerm click the middle mouse button.

5. To access this HTML from your browser use this address: http://bioinfo.tau.ac.il/~symp/YOUR_NAME/form.html

Page 22: Running Other Programs And CGI Scripts

12.22

Class exercise 17

6. Create another directory for yourself inside the directory “cgi-bin”. The CGI script should be in there.

7. After creating the script you have to give it execution permissions: “chmod +x SCRIPT_NAME”. Use “ls -l” to check that it now has x’s like this:
(bioinfo:symp)~/cgi-bin/eyal>ls -l
-rwxr-xr-x 1 symp staff 167 Jan 23 13:14 hello.pl*

8. The reference to the CGI script in the HTML form should be: <FORM method="GET" action="/cgi-bin/symp/YOUR_NAME/script.pl">

Bonus* Write another HTML form that asks the user for a FASTA file of DNA sequences, and runs a CGI version of ex3.4 (find ORFs in each sequence)

Page 23: Running Other Programs And CGI Scripts

12.23

Installing packages: Do it yourself!

Page 24: Running Other Programs And CGI Scripts

12.24

If you find a package in CPAN or elsewhere, you can usually download a compressed archive of all the files of the package, which is usually a .tar.gz file

For example: Search for BioPerl version 1.4 in CPAN – it should be called something like “bioperl-1.4.tar.gz”

Unzip it (extract the files from the compressed archive)

Place the unzipped files or directories in the ActivePerl directory on your computer in the site\lib\ directory. (…\ActivePerl-5.8.7.813\site\lib\)

For example – the “Bio” directory of BioPerl should be moved to: …\ActivePerl-5.8.7.813\site\lib\Bio

Now you should be able to use modules named like Bio::SeqIO.

Test it with SeqIO_example.pl (available on the webpage)

Download and install a package (Class exercise 17)
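A quick way to check whether the installation worked is to ask Perl where it finds the module. This is only a sketch: the subroutine name module_location is invented here, and File::Basename is included just because it comes with every Perl installation.

```perl
use strict;
use warnings;

# Return the file a module is loaded from, or undef if Perl cannot load it.
# (module_location is a made-up helper for this example.)
sub module_location {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;    # Bio::SeqIO -> Bio/SeqIO.pm
    return eval { require $file; 1 } ? $INC{$file} : undef;
}

foreach my $module ('Bio::SeqIO', 'File::Basename') {
    my $where = module_location($module);
    print "$module: ",
          (defined $where ? "found at $where" : "not installed"), "\n";
}
```

If Bio::SeqIO is reported as not installed, check that the “Bio” directory really sits directly inside site\lib.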

Page 25: Running Other Programs And CGI Scripts

12.25

The command “use lib” asks Perl to search in a certain directory when searching for packages that are used in the script:

use lib 'D:\perl\myPackages';
use myPackage;

(Assuming that the directory “myPackages” contains “myPackage.pm”)

Move the “Bio” directory of BioPerl to ‘D:\test’ and make SeqIO_example.pl find it by adding “use lib”

Using packages from other directories (Class exercise 17)
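A fixed path such as D:\perl\myPackages breaks as soon as the script is copied to another computer. One common alternative, sketched here, is to give the path relative to the script itself using the standard FindBin module. The directory name myPackages is taken from the example above and does not need to exist for this demo to run.

```perl
use strict;
use warnings;
use FindBin qw($Bin);          # $Bin is the directory that contains this script
use lib "$Bin/myPackages";     # searched before the standard directories

# "use lib" adds the directory to the front of @INC, the list of
# directories that Perl searches when it meets a "use" line.
print "Perl will search in:\n";
print "  $_\n" foreach @INC;
```

Printing @INC like this is also a handy way to find out why a “use” line fails to find a package.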

Page 26: Running Other Programs And CGI Scripts

12.26

Running Blast Remotely and Locally

Page 27: Running Other Programs And CGI Scripts

12.27

BioPerl lets us blast our sequence at the NCBI website:

Use Bio::Tools::Run::RemoteBlast instead of Bio::Tools::Blast (which I showed you before)

use Bio::Tools::Run::RemoteBlast;

# here we define the parameters and input of blast
my %runParam = (-method   => 'remote',
                -prog     => 'blastp',
                -database => 'swissprot',
                -seqs     => [$seqObj1, $seqObj2]);

# here we run it
my $blastObj = Bio::Tools::Blast->new(-run    => \%runParam,
                                      -parse  => 1,        # ask to parse the report
                                      -signif => '1e-10',  # the cutoff
                                      -strict => 1);

BioPerl: run blast over the web

Page 28: Running Other Programs And CGI Scripts

12.28

1. You could install blast on your computer from: ftp.ncbi.nlm.nih.gov

(There go to the directory: blast/executables/release/)

But this may be difficult, and you will also need to download and install the databases you want to search.

2. You can also work on the Unix servers of the bioinformatics unit, where you can use the local blast that is already installed.

Genbank databases that are installed there can be used for blast and for any other work, such as getting a sequence by its accession.

Running a local blast

Page 29: Running Other Programs And CGI Scripts

12.29

Class exercise 18

1. Write a script that runs blast over the web on a given protein FASTA file (Use the same FASTA file as in ex. 14), and print the accessions of the first 20 hits for each input sequence.

2. Modify the script: Take the accession of a sequence as a command-line argument, fetch this sequence from Genbank over the web, and then blast it