MSc Bioinformatics 2006 Biocomputing Databases and Perl 13 th February 2006.

36
MSc Bioinformatics 2006 Biocomputing Databases and Perl 13 th February 2006

Transcript of MSc Bioinformatics 2006 Biocomputing Databases and Perl 13 th February 2006.

Page 1: MSc Bioinformatics 2006 Biocomputing Databases and Perl 13 th February 2006.

MSc Bioinformatics 2006

BiocomputingDatabases and Perl

13th February 2006

Page 2: MSc Bioinformatics 2006 Biocomputing Databases and Perl 13 th February 2006.

MSc Bioinformatics 2006

Client

HTTP

Web Server

CGI

Application ServerRequest

Response

Page 3: MSc Bioinformatics 2006 Biocomputing Databases and Perl 13 th February 2006.

MSc Bioinformatics 2006

e.g. PC + Apache +PERL (with CGI module) + MySQL

Client

HTTP

Web Server

CGI DB API

DB Server +DB

Application Server

Page 4: MSc Bioinformatics 2006 Biocomputing Databases and Perl 13 th February 2006.

MSc Bioinformatics 2006

The Database Tier• stores and retrieves data.

• manages updates, allows simultaneous (concurrent) access from web servers, provides security, ensures the integrity of data

• requires complex software.

– Database Management Systems (DBMSs)

• searches and manages data that's stored in databases

• a database is a collection of related data, and an application can have more than one database

– interface is accessed using SQL (Standard Query Language) that's used to define and manipulate databases and data

Page 5: MSc Bioinformatics 2006 Biocomputing Databases and Perl 13 th February 2006.

MSc Bioinformatics 2006

Why use a database server?

• more than one user who needs to access the data at the same time.

• a moderate amount of data• relationships between the stored data items• more than one kind of data object• constraints that must be rigidly enforced on the data,

such as field lengths, field types etc..• If data must be queried to produce reports or results.• large amount of data that must be searched quickly.• If security is important• When adding, deleting, or modifying data is a complex

process.

Page 6: MSc Bioinformatics 2006 Biocomputing Databases and Perl 13 th February 2006.

MSc Bioinformatics 2006

Relational databases

• Entities and Attributes– An entity is a person, place, event, or thing about

which data is collected– An entity set is a named collection of entities sharing

common characteristics– Attributes are characteristics of the entity– Attributes are also called fields

• Tables– A table holds related entities or an entity set– Also called relations– Comprised of rows and columns

Page 7: MSc Bioinformatics 2006 Biocomputing Databases and Perl 13 th February 2006.

MSc Bioinformatics 2006

Tables

• Two-dimensional structure with rows and columns• Each row (tuple) represents a single entity• Columns represent attributes• Row/column intersection represents single value• Tables must have an attribute to uniquely identify each

row, a unique key• Column values all have same data format• Each column has range of values called attribute domain• Order of the rows and columns is immaterial to the

DBMS

Page 8: MSc Bioinformatics 2006 Biocomputing Databases and Perl 13 th February 2006.

MSc Bioinformatics 2006

Fingerprint

Motif

Sequence

Page 9: MSc Bioinformatics 2006 Biocomputing Databases and Perl 13 th February 2006.

MSc Bioinformatics 2006

Entity-relationship model

Page 10: MSc Bioinformatics 2006 Biocomputing Databases and Perl 13 th February 2006.

MSc Bioinformatics 2006

Relational logical model

Page 11: MSc Bioinformatics 2006 Biocomputing Databases and Perl 13 th February 2006.

MSc Bioinformatics 2006

Page 12: MSc Bioinformatics 2006 Biocomputing Databases and Perl 13 th February 2006.

MSc Bioinformatics 2006

mySQL

• Command-line interpreter– UNIX/Mac

/usr/local/bin/mysql -uhugh -pshhh

– WindowsStart > All Programs > mySQL > mySQL Server 5.0 > mySQL command line client

Page 13: MSc Bioinformatics 2006 Biocomputing Databases and Perl 13 th February 2006.

MSc Bioinformatics 2006

Managing databases• Creating databases

mysql> CREATE DATABASE phyloprints;

• To use that databaseuse phyloprints;

• Creating tables

CREATE TABLE "fingerprint" ( "fprint_accn" varchar(15), "identifier" varchar(15), "motifs" int2, "date" date, "up_date" date, "family_title" text, "family_doc“ varchar(18), "category" text, "second_accn" varchar(15), "pseudo" bool);

SHOW CREATE TABLE wine;

Page 14: MSc Bioinformatics 2006 Biocomputing Databases and Perl 13 th February 2006.

MSc Bioinformatics 2006

Table = fingerprint +----------------------------------+----------------------------------+-------+ | Field | Type | Length| +----------------------------------+----------------------------------+-------+ | fprint_accn | varchar() | 15 | | identifier | varchar() | 15 | | motifs | int2 | 2 | | date | date | 4 | | up_date | date | 4 | | family_title | text | var | | family_doc | varchar() | 18 | | category | text | var | | second_accn | varchar() | 15 | | pseudo | bool | 1 | +----------------------------------+----------------------------------+-------+

Table = motif +----------------------------------+----------------------------------+-------+ |Field | Type |Length | +----------------------------------+----------------------------------+-------+ |fprint_accn | varchar() | 18 | | motif | int2 | 2 | | repeat | varchar() | 4 | | seqn_accn | varchar() | 15 | | seqn_fragment | varchar() | 35 | | start_position | int4 | 4 | | inter_motif_dist | int4 | 4 | | initial | bool | 1 | | final | bool | 1 |

+----------------------------------+----------------------------------+-------+

attributes

Page 15: MSc Bioinformatics 2006 Biocomputing Databases and Perl 13 th February 2006.

MSc Bioinformatics 2006

Modifiers

• Are applied to attributes– NOT NULL (row can’t exist without this

attribute having a value)

identifier varchar(15) NOT NULL,

– DEFAULT (sets the data to the value that follows when no data is supplied)pseudo" bool DEFAULT f,

Page 16: MSc Bioinformatics 2006 Biocomputing Databases and Perl 13 th February 2006.

MSc Bioinformatics 2006

Keys

• Primary key – one or more attributes that uniquely identify a row in a table– It is essential that every table has one

PRIMARY KEY (fprint_accn), key names*fprint_accn,identifier) type=MyISAM;

Page 17: MSc Bioinformatics 2006 Biocomputing Databases and Perl 13 th February 2006.

MSc Bioinformatics 2006

Removing databases and tables

• DROP statement (make sure you want to do this – it doesn’t check!)

DROP TABLE seqn;

DROP DATABASE phyloprints;

Page 18: MSc Bioinformatics 2006 Biocomputing Databases and Perl 13 th February 2006.

MSc Bioinformatics 2006

Inserting/Updating/Deleting Data

• Inserting DataINSERT INTO fingerprint VALUES (‘PR90008’,

‘CRYSTALLIN’,0,2000-10-11,2000-10-11,’Crystallin pseudo-signature’,’PDOC90008’,’Others’,t);

• Inserting lots of data rowsINSERT INTO fingerprint VALUES (‘PR90008’,

‘CRYSTALLIN’,0,2000-10-11,2000-10-11,’Crystallin pseudo-signature’,’PDOC90008’,’Others’,t),(‘PR00240’,’ADRENRGCA1DR’,7,1996-08-14,2000-02-18,’Alpha-1D adrenergic receptor signature’,’PDOC00240’, f);

Page 19: MSc Bioinformatics 2006 Biocomputing Databases and Perl 13 th February 2006.

MSc Bioinformatics 2006

• DELETE statement– All data

DELETE FROM fingerprint; – Deleting specific rows – use WHEREDELETE FROM fingerprint WHERE identifer = ‘CRYSTALLIN’;

• UPDATE statementUPDATE fingerprint SET fprint_accn = ‘PR00001’ WHERE identifier=‘CRYSTALLIN’;

Page 20: MSc Bioinformatics 2006 Biocomputing Databases and Perl 13 th February 2006.

MSc Bioinformatics 2006

Querying with SQL

• Key operators– SELECT

– PROJECT

– JOIN

• Other operators– INTERSECT

– UNION (union compatible tables)

– DIFFERENCE

– PRODUCT

– DIVIDE

Page 21: MSc Bioinformatics 2006 Biocomputing Databases and Perl 13 th February 2006.

MSc Bioinformatics 2006

Querying databases with SQL

• SELECT statementSELECT identifier, fprint_accn FROM fingerprint;

SELECT * FROM fingerprint;

• WHERE clauseSELECT * from fingerprint WHERE fprint_accn=‘PR00001’;

Page 22: MSc Bioinformatics 2006 Biocomputing Databases and Perl 13 th February 2006.

MSc Bioinformatics 2006

• select identifier,fprint_accn from fingerprint where pseudo='t' ORDER BY fprint_accn;

• select fprint_accn,COUNT(*) from fingerprint where pseudo='t' GROUP BY fprint_accn;

• select DISTINCT seqn_accn from motif;• select distinct seqn_accn from motif LIMIT

10;

Page 23: MSc Bioinformatics 2006 Biocomputing Databases and Perl 13 th February 2006.

MSc Bioinformatics 2006

Join queries

• output data that's based on relationships between two or more tables

Page 24: MSc Bioinformatics 2006 Biocomputing Databases and Perl 13 th February 2006.

MSc Bioinformatics 2006

Diagram taken from Jane Mabey

Page 25: MSc Bioinformatics 2006 Biocomputing Databases and Perl 13 th February 2006.

MSc Bioinformatics 2006 Diagram taken from Jane Mabey

Page 26: MSc Bioinformatics 2006 Biocomputing Databases and Perl 13 th February 2006.

MSc Bioinformatics 2006

Querying databases using Perl

#! /usr/bin/perl -w # intro6.pl - connect to MySQL, retrieve data, write plain text output use strict; use DBI;

my ($dbh, $sth, $count); $dbh = DBI->connect ("DBI:mysql:host=localhost;database=webdb", "webdev", "webdevpass", {PrintError => 0, RaiseError => 1});

$sth = $dbh->prepare ("SELECT name, wins, losses FROM teams"); $sth->execute (); $count = 0; while (my @val = $sth->fetchrow_array ()) { printf "name = %s, wins = %d, losses = %d\n", $val[0], $val[1], $val[2]; ++

$count; } print "count rows total\n"; $sth->finish (); $dbh->disconnect (); exit (0);

Page 27: MSc Bioinformatics 2006 Biocomputing Databases and Perl 13 th February 2006.

MSc Bioinformatics 2006

%./intro6.pl name = Fargo-Moorhead Twins, wins = 36, losses = 16 name = Winnipeg Maroons, wins = 24, losses = 26 name = Minot Why Nots, wins = 19, losses = 23 name = Warren Wanderers, wins = 16, losses = 30 4 rows total

Page 28: MSc Bioinformatics 2006 Biocomputing Databases and Perl 13 th February 2006.

MSc Bioinformatics 2006

Perl example…use DBI; my $dbh = DBI->connect('DBI:MySql:payroll') or die "Couldn't connect to database: " .

DBI->errstr; my $sth = $dbh->prepare('SELECT * FROM people WHERE lastname = ?') or die "Couldn't

prepare statement: " . $dbh->errstr; print "Enter name> "; while ($lastname = <>) { # Read input from the user my @data; chomp $lastname; $sth->execute($lastname) # Execute the query or die "Couldn't execute statement: " . $sth->errstr; # Read the matching records and print them out while (@data = $sth->fetchrow_array()) {

my $firstname = $data[1]; my $id = $data[2]; print "\t$id: $firstname $lastname\n";

} if ($sth->rows == 0) {

print "No names matched `$lastname'.\n\n"; } $sth->finish; print "\n"; print "Enter name> "; } $dbh->disconnect;

Page 29: MSc Bioinformatics 2006 Biocomputing Databases and Perl 13 th February 2006.

MSc Bioinformatics 2006

Query databases over the Web

#! /usr/bin/perl -w # intro7.pl - connect to MySQL, retrieve data, write HTML output

use strict; use DBI; use CGI qw(:standard); my ($dbh, $sth, $count); $dbh = DBI->connect ("DBI:mysql:host=localhost;database=webdb", "webdev",

"webdevpass", {PrintError => 0, RaiseError => 1}); $sth = $dbh->prepare ("SELECT name, wins, losses FROM teams"); $sth->execute (); print header(), start_html ("team data"); $count = 0; while (my @val = $sth->fetchrow_array ()) {

print p (sprintf ("name = %s, wins = %d, losses = %d\n", $val[0], $val[1], $val[2])); ++$count;

} print p ("$count rows total"), end_html (); $sth->finish (); $dbh->disconnect (); exit (0);

Page 30: MSc Bioinformatics 2006 Biocomputing Databases and Perl 13 th February 2006.

MSc Bioinformatics 2006

More about data retrieval with DBI.pm

• fetchrow_arrayref

$sth = $dbh->prepare ("SELECT name, wins, losses FROM teams");

$sth->execute ();

while (my $ref = $sth->fetchrow_arrayref ()) {

printf "name = %s, wins = %d, losses = %d\n", $ref->[0], $ref->[1], $ref->[2];

}

$sth->finish ();

Page 31: MSc Bioinformatics 2006 Biocomputing Databases and Perl 13 th February 2006.

MSc Bioinformatics 2006

#!/usr/bin/perl

use DBI;

my $db = “sequence”;my $server = ‘localhost’;my $user = ‘root’;my $passwd = ‘passwd’;

my $dbConnection = DBI->connect(“dbi::mysql:$db:$server”,$user,$passwd)

my $query = “show tables”;my $sql = $dbConnection->prepare($query);$sql ->execute();while(my $row = $sql->fetchrow_arrayref){

print join(“\t”,@$row),”\n”;}$dbConnection->disconnect;exit;

Page 32: MSc Bioinformatics 2006 Biocomputing Databases and Perl 13 th February 2006.

MSc Bioinformatics 2006

Higher retrieval methods

• selectrow_array()– to retrieve a single row or a single column

value • selectcol_arrayref()

– returns the first column of a result set, as a reference to an array of values.

• selectall_arrayref() – retrieves the entire result set as a matrix and

returns a reference to it

Page 33: MSc Bioinformatics 2006 Biocomputing Databases and Perl 13 th February 2006.

MSc Bioinformatics 2006

Inserting data in the databasesub new_employee { # Arguments: database handle; first and last names of new

employee; # department ID number for new employee's work assignment

my ($dbh, $first, $last, $department) = @_; my ($insert_handle, $update_handle); my $insert_handle = $dbh->prepare_cached('INSERT INTO

employees VALUES (?,?,?)'); my $update_handle = $dbh->prepare_cached('UPDATE

departments SET num_members = num_members + 1 WHERE id = ?');

die "Couldn't prepare queries; aborting" unless defined $insert_handle && defined $update_handle;

$insert_handle->execute($first, $last, $department) or return 0; $update_handle->execute($department) or return 0; return 1; # Success

}

Page 34: MSc Bioinformatics 2006 Biocomputing Databases and Perl 13 th February 2006.

MSc Bioinformatics 2006

Inserting data in the databasesub new_employee { # Arguments: database handle; first and last names of new

employee; # department ID number for new employee's work assignment

my ($dbh, $first, $last, $department) = @_; my ($insert_handle, $update_handle); my $insert_handle = $dbh->prepare_cached('INSERT INTO employees

VALUES (?,?,?)'); my $update_handle = $dbh->prepare_cached('UPDATE departments SET

num_members = num_members + 1 WHERE id = ?'); die "Couldn't prepare queries; aborting" unless defined

$insert_handle && defined $update_handle; my $success = 1; $success &&= $insert_handle->execute($first, $last, $department);

$success &&= $update_handle->execute($department); my $result = ($success ? $dbh->commit : $dbh->rollback); unless

($result) { die "Couldn't finish transaction: " . $dbh->errstr } return $success;

}

Page 35: MSc Bioinformatics 2006 Biocomputing Databases and Perl 13 th February 2006.

MSc Bioinformatics 2006

• A short cut for DELETING, UPDATING and INSERTING data use the do statement

$dbh->do('DELETE FROM people WHERE age > 65');

Page 36: MSc Bioinformatics 2006 Biocomputing Databases and Perl 13 th February 2006.

MSc Bioinformatics 2006

Reminders

• Executable CGI files

• Configuring Apache to read cgi files from cgi-bin

• In httpd.conf

• ScriptAlias /cgi-bin/ "C:/Program Files/Apache Group/Apache2/cgi-bin/"