1 Ensembl Modules and MySQL. SQL and Database Tables Quick Examples 2.
3
exon: transcript_id; exon_num = 3; sequence_start; sequence_stop
intron: transcript_id; intron_num = 3; sequence_start; sequence_stop
primer_pair: id; transcript_id; left_primer_id; right_primer_id
transcript: id; sequence_id; source = Ensembl; source_id
sequence: id; target_id; type = nucleotide; sequence = ATG…; chr_name = 15; strand = 1; genomic_start = 15,123,120; genomic_stop = 16,378,131; source; source_id; refresh
target: id; date; gene_name = BBS4; description; accession; status
project: id; name = pro1; description; date
set_table: id; project_id; name = testset; date; description
target_set_info: set_id; target_id; rank = 5; cas_rank; cas_options
select id from target where gene_name = "BBS4";
4
MySQL Demo with Ensembl
mysql -u anonymous -h ensembldb.ensembl.org
show databases;
show databases like "%omo%core%";
use homo_sapiens_core_47_36i;
show tables;
select count(*) from exon;
show columns from gene;
select * from xref limit 10;
select * from xref where dbprimary_acc = "NM_000777";
select stable_id, gene.gene_id
from gene_stable_id, gene, transcript, object_xref, xref
where gene_stable_id.gene_id = gene.gene_id
and gene.gene_id = transcript.gene_id
and transcript.transcript_id = object_xref.ensembl_id
and xref.xref_id = object_xref.xref_id
and xref.dbprimary_acc = 'NM_000777';
select * from transcript where gene_id = 17393;
select * from exon_transcript where transcript_id = 33341;
select * from exon where exon_id = 193252;
5
Ensembl Schema
Core Schema
http://www.ensembl.org/info/docs/api/core/schema/index.html#exon_stable_id
API Tutorial:
http://www.ensembl.org/info/docs/api/core/core_tutorial.html
6
Code Development
1) Generate random sequence
ATGCCCGCTGAGT
2) Generate formatted random sequence
1 ATGCCCGCTT TGACCCTTTA 20
3) Generate random sequence, translated into protein, and formatted…
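The three versions above can be sketched in one small Perl program. This is only an illustration, not the code used in class: the sub names (random_sequence, format_sequence, translate) are made up here, the layout of 10-base blocks is modeled on the example line in step 2, and the translation uses the standard genetic code with leftover bases ignored.

```perl
use strict;
use warnings;

# Step 1: random nucleotide sequence of a given length.
sub random_sequence {
    my ($length) = @_;
    my @bases = qw(A C G T);
    return join '', map { $bases[ int(rand(@bases)) ] } 1 .. $length;
}

# Step 2: blocks of 10 bases, with the 1-based start and end
# position on each line (20 bases per line by default).
sub format_sequence {
    my ($seq, $per_line) = @_;
    $per_line ||= 20;
    my @lines;
    for (my $pos = 0; $pos < length($seq); $pos += $per_line) {
        my $chunk  = substr($seq, $pos, $per_line);
        my @blocks = $chunk =~ /(.{1,10})/g;
        push @lines, join(' ', $pos + 1, @blocks, $pos + length($chunk));
    }
    return join("\n", @lines);
}

# Step 3: standard genetic code, built from the usual TCAG ordering.
my %codon_table;
{
    my @b  = qw(T C A G);
    my @aa = split //,
        'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
    for my $x (@b) {
        for my $y (@b) {
            for my $z (@b) {
                $codon_table{"$x$y$z"} = shift @aa;
            }
        }
    }
}

# Translate in frame 1; an incomplete final codon is dropped.
sub translate {
    my ($seq) = @_;
    my $protein = '';
    for (my $i = 0; $i + 3 <= length($seq); $i += 3) {
        $protein .= $codon_table{ substr($seq, $i, 3) } // 'X';
    }
    return $protein;
}

my $seq = random_sequence(20);
print format_sequence($seq), "\n";
print translate($seq), "\n";
```

Each step adds a sub without touching the earlier ones, which is the kind of revision history the next bullets are about.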
• Code revision
– adding functionality and features
– may introduce bugs that are not discovered until much later
– useful to examine the changes to code that may have caused bugs
7
Code Development Solutions
• May retain a copy of every version of every file
– have complete record
– redundant and waste of space
– responsibility on developer to maintain revision history
– Example (V1, V2, V3 experiment, V4 unfinished feature, return in 6 months?)
8
Multi-coder Environment
• Developers D1, D2, and D3
• Source code S1, S2, S3, S4, S5
• D1 copies S1 and makes changes
• D2 copies S1 and makes changes
• D2 returns S1
• D1 returns S1
• Clearly, this is ineffective for managing and integrating changes
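The copy/return sequence above can be played out in a few lines of Perl to see exactly what goes wrong. This is a toy simulation: the "repository" is just a hash, and copy_out / return_file are hypothetical names for the manual copy steps.

```perl
use strict;
use warnings;

# Shared area holding the current version of each source file.
my %repository = (S1 => "line A\nline B\n");

sub copy_out    { my ($file) = @_;           return $repository{$file}; }
sub return_file { my ($file, $content) = @_; $repository{$file} = $content; }

my $d1_copy = copy_out('S1');     # D1 copies S1
my $d2_copy = copy_out('S1');     # D2 copies S1
$d1_copy .= "D1 change\n";        # D1 makes changes
$d2_copy .= "D2 change\n";        # D2 makes changes
return_file('S1', $d2_copy);      # D2 returns S1
return_file('S1', $d1_copy);      # D1 returns S1, overwriting D2's version

print $repository{S1};
```

The final S1 contains only D1's change; D2's work is silently lost because the last copy returned wins.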
9
Brief Overview of CVS
CVS – Concurrent Versions System
CVS
– only stores differences between files/versions
– uses repository structure
• check out
• check in
• lock
• branching, merging
• etc
• Reference
– http://www.gnu.org/software/cvs/
– https://www.cvshome.org/
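To make "only stores differences" concrete, here is a deliberately simplified sketch. It is not how CVS works internally (CVS builds on RCS deltas); it just compares two versions line by line at the same position and keeps only the lines that changed. The names make_delta and apply_delta are made up for this example.

```perl
use strict;
use warnings;

# Record only the lines of $new that differ from $old (by position),
# plus the new line count so the version can be rebuilt.
sub make_delta {
    my ($old, $new) = @_;
    my %changed;
    for my $i (0 .. $#{$new}) {
        if ($i > $#{$old} || $old->[$i] ne $new->[$i]) {
            $changed{$i} = $new->[$i];
        }
    }
    my %delta = (line_count => scalar @{$new}, changed => \%changed);
    return \%delta;
}

# Rebuild the new version from the old version plus the delta.
sub apply_delta {
    my ($old, $delta) = @_;
    my @new;
    for my $i (0 .. $delta->{line_count} - 1) {
        push @new, (exists $delta->{changed}{$i})
            ? $delta->{changed}{$i}
            : $old->[$i];
    }
    return \@new;
}

my @v1 = ('use strict;', 'my $x = 1;', 'print $x;');
my @v2 = ('use strict;', 'my $x = 2;', 'print $x;', 'print "done";');

my $delta   = make_delta(\@v1, \@v2);
my $rebuilt = apply_delta(\@v1, $delta);

print "stored ", scalar(keys %{ $delta->{changed} }),
      " changed lines instead of ", scalar(@v2), "\n";
print "rebuilt V2 correctly\n" if join("\n", @{$rebuilt}) eq join("\n", @v2);
```

A real diff also handles inserted and deleted lines without treating everything after them as changed, which this positional comparison does not.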
10
Installing Ensembl Modules
Sample program – ens4.pl (simple demo program that obtains exons for a particular gene from the Ensembl database, for a given accession number and Ensembl Gene ID)
Connect to Ensembl's MySQL database:
% mysql -u anonymous -h ensembldb.ensembl.org
To get a list of their current databases, type:
% show databases;
Find the most recent (highest-numbered) version of the homo_sapiens core database.
Example: homo_sapiens_core_47_36i
Example: homo_sapiens_core_25_36
The final two numbers represent the Ensembl code version and the NCBI human build, respectively (i.e. in the second example, Ensembl modules 25 and NCBI Human Build 36).
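Picking the highest-numbered database by eye works, but the naming rule is regular enough to parse. The sketch below assumes names of the form species_group_release_build, as in the examples above; parse_core_db_name and latest_core_db are hypothetical helper names, and the list of names is typed in by hand rather than fetched from the server.

```perl
use strict;
use warnings;

# Split e.g. "homo_sapiens_core_47_36i" into its parts.
# Returns nothing if the name does not follow the pattern.
sub parse_core_db_name {
    my ($name) = @_;
    my ($species, $group, $release, $build) =
        $name =~ /^([a-z]+_[a-z]+)_([a-z]+)_(\d+)_(\w+)$/
        or return;
    my %info = (
        name    => $name,
        species => $species,
        group   => $group,
        release => $release,   # Ensembl code version
        build   => $build,     # genome build
    );
    return \%info;
}

# Name of the core database with the highest release for a species.
sub latest_core_db {
    my ($species, @names) = @_;
    my @candidates =
        grep { $_->{species} eq $species && $_->{group} eq 'core' }
        map  { parse_core_db_name($_) } @names;
    my ($latest) = sort { $b->{release} <=> $a->{release} } @candidates;
    return $latest ? $latest->{name} : undef;
}

my @names = qw(
    homo_sapiens_core_18_34
    homo_sapiens_core_25_36
    homo_sapiens_core_41_36c
    homo_sapiens_core_47_36i
);
my $latest = latest_core_db('homo_sapiens', @names);
my $info   = parse_core_db_name($latest);
print "$latest -> Ensembl code $info->{release}, build $info->{build}\n";
```

The release number it reports is the one to use for the matching branch of the Ensembl modules.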
In this case, you should be using Ensembl code 47 to do the following:
11
NO LONGER VALID
(for CSS)
% touch ~/.cvspass
% chmod 755 ~/.cvspass
create the directory
% mkdir Ensembl_modules-41
enter the directory
% cd Ensembl_modules-41
type the following:
% cvs -d :pserver:[email protected]:/cvsroot/CVSmaster login
(when prompted, the password is CVSUSER -- yes, in all CAPS)
% cvs -d :pserver:[email protected]:/cvsroot/CVSmaster checkout -r branch-ensembl-41 ensembl
% cvs -d :pserver:[email protected]:/cvsroot/CVSmaster checkout -r branch-ensembl-41 ensembl-external
% cvs -d :pserver:[email protected]:/cvsroot/CVSmaster checkout -r branch-ensembl-41 ensembl-lite
Note: this is all about 9 MB
Make a symbolic link called "Ensembl_modules-current" to point to your newly created directory of modules:
% cd ..
% ln -s Ensembl_modules-41 Ensembl_modules-current
12
http://www.ensembl.org/info/software/api_installation.html
# -- Clearly this assumes a Unix flavor --
Create an installation directory
$ cd
$ mkdir src
$ cd src
$ cvs -d :pserver:[email protected]:/home/repository/bioperl login
Logging in to :pserver:[email protected]:2401/home/repository/bioperl
CVS password: cvs
Install BioPerl (version 1.2.3)
$ cvs -d :pserver:[email protected]:/home/repository/bioperl checkout -r bioperl-release-1-2-3 bioperl-live
Log into the Ensembl CVS server at Sanger (using a password of CVSUSER):
$ cvs -d :pserver:[email protected]:/cvsroot/ensembl login
CVS password: CVSUSER
Install the Ensembl Core Perl API for version 47
$ cvs -d :pserver:[email protected]:/cvsroot/ensembl checkout -r branch-ensembl-47 ensembl
If required, install the Ensembl Variation Perl API for version 47
$ cvs -d :pserver:[email protected]:/cvsroot/ensembl checkout -r branch-ensembl-47 ensembl-variation
If required, install the Ensembl Compara Perl API for version 47
$ cvs -d :pserver:[email protected]:/cvsroot/ensembl checkout -r branch-ensembl-47 ensembl-compara
NB: You can install as many Ensembl APIs as you need in this way.
13
To install the Ensembl modules, you do need to have the BioPerl modules installed
(this used to be a separate step)
15
#!/usr/local/bin/perl
use strict;
use warnings;

use lib "bioperl-live";      # you MAY have to use: use lib "bioperl-live/bioperl-live";
use lib "ensembl/modules";   # or: use lib "ensembl/ensembl/modules";
use Bio::EnsEMBL::DBSQL::DBAdaptor;

#my $host = "kaka.sanger.ac.uk";
my $host = "ensembldb.ensembl.org";
my $user = "anonymous";
#my $dbname = "homo_sapiens_core_41_36c";
my $dbname = "homo_sapiens_core_47_36i";

my $accession_num   = "NM_000777";
my $Ensembl_gene_id = "ENSG00000106258";
my $flank_length    = 5000;

# Connect to the Ensembl core database
my $db = Bio::EnsEMBL::DBSQL::DBAdaptor->new(
    -host   => $host,
    -user   => $user,
    -dbname => $dbname
);

# Fetch every gene linked to the accession number
my $gene_adaptor = $db->get_GeneAdaptor();
my @genes = @{ $gene_adaptor->fetch_all_by_external_name($accession_num) };

foreach my $gene (@genes) {
    my $string = feature2string($gene);
    print "$string\n";
}

# Format a feature as "stable_id: name seq_region:start-end (strand)"
sub feature2string {
    my $f = shift;
    my $stable_id  = $f->stable_id();
    my $name       = $f->external_name();
    my $seq_region = $f->slice->seq_region_name();
    my $start      = $f->start();
    my $end        = $f->end();
    my $strand     = $f->strand();
    return "$stable_id: $name $seq_region:$start-$end ($strand)";
}
16
Output
ENSG00000106258: CYP3A5 7:99083759-99115557 (-1)
Doesn't seem like much, but remember:
1) Using the language "perl"
2) Using other people's software (modules)
3) Accessing genomic data in a database in England
4) Accessing data programmatically
17
Look at API
Ensembl API (full): http://www.ensembl.org/info/docs/api/Pdoc/index.html
Ensembl->gene_adaptor->fetch_all_by_external_name
@genes = @{$gene_adaptor->fetch_all_by_external_name('BRCA2')};
18
From the API…
# Fetch all clones from a slice adaptor (returns a list reference)
my $clones_ref = $slice_adaptor->fetch_all('clone');
# If you want a copy of the contents of the list referenced by
# the $clones_ref reference...
my @clones = @{$clones_ref};
# Get the first clone from the list via the reference:
my $first_clone = $clones_ref->[0];
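The difference between the list reference and a copy of the list is easy to miss, so here is a small stand-alone sketch. The clone names are placeholders and the anonymous array stands in for whatever fetch_all would return; no Ensembl connection is involved.

```perl
use strict;
use warnings;

# Stand-in for the list reference an adaptor would return.
my $clones_ref = [ 'clone_A', 'clone_B', 'clone_C' ];

# Dereferencing into an array makes a COPY of the list...
my @clones = @{$clones_ref};
push @clones, 'clone_D';          # ...so this changes only the copy.

# Copying the reference itself does NOT copy the list.
my $alias = $clones_ref;
push @{$alias}, 'clone_E';        # visible through $clones_ref too.

print "via reference: @{$clones_ref}\n";
print "copy:          @clones\n";
print "first clone:   $clones_ref->[0]\n";
```

This is why the API examples wrap calls in @{ ... }: you get your own array to loop over and modify.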
19
Object adaptors have internal knowledge of the underlying database schema and use this knowledge to fetch, store and remove objects (and data) from the database. This way you can write code and use the Ensembl Core API without having to know anything about the underlying databases you are using.
Object adaptors are obtained from the Registry via a method named get_adaptor(). To obtain a Slice adaptor or a Gene adaptor (which retrieve Slice and Gene objects respectively) for Human, do the following after having loaded the Registry, here called $registry, as above:
my $gene_adaptor = $registry->get_adaptor( 'Human', 'Core', 'Gene' );
my $slice_adaptor = $registry->get_adaptor( 'Human', 'Core', 'Slice' );
Don't worry if you don't immediately see how useful this could be. Just remember that you don't need to know anything about how the database is structured, but you can retrieve the necessary data (neatly packaged in objects) by asking for it from the correct adaptor. Throughout the rest of this document we are going to work through the ways the Ensembl objects can be used to derive the information you want.
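The adaptor idea can be shown without any database at all. In this toy sketch, ToyGeneAdaptor is a made-up class (not part of Ensembl) that keeps its "rows" in memory; the caller only ever asks the adaptor for genes by name and never sees how they are stored. The second row uses a placeholder ID.

```perl
use strict;
use warnings;

package ToyGeneAdaptor;   # hypothetical class, for illustration only

sub new {
    my ($class, %args) = @_;
    my $self = { rows => $args{rows} };   # storage detail hidden here
    return bless $self, $class;
}

# Same calling style as the real gene adaptor: returns a list reference.
sub fetch_all_by_external_name {
    my ($self, $name) = @_;
    my @hits = grep { $_->{external_name} eq $name } @{ $self->{rows} };
    return \@hits;
}

package main;

my $adaptor = ToyGeneAdaptor->new(
    rows => [
        { stable_id => 'ENSG00000106258', external_name => 'CYP3A5' },
        { stable_id => 'TOY_GENE_2',      external_name => 'BBS4' },
    ],
);

my @genes = @{ $adaptor->fetch_all_by_external_name('CYP3A5') };
print "$_->{stable_id}\n" for @genes;
```

If the storage changed from an in-memory list to MySQL tables, only the inside of the adaptor would need to change; the calling code would stay the same.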
21
#### genome-mysql.cse.ucsc.edu
use DBI;

my ($dsn) = "DBI:mysql:hg18:genome-mysql.cse.ucsc.edu";
my ($username) = "genome";
my ($passwd) = "";
my ($query);

# With RaiseError => 1, a failed connect dies before the check below is reached
my $dbh = DBI->connect($dsn, $username, $passwd, {RaiseError=>1});

if (! defined $dbh) { print "\nConnect to database (hg18): FAILED\n"; }
else { print "\nConnect to database (hg18): SUCCESS\n"; }

my $gene = "BBS4";
my ($GENENAME, $NAME, $EXONSTARTS, $STRAND);

# The ? placeholder keeps the gene name out of the SQL string
my $string = "SELECT geneName, name, exonStarts, exonEnds, chrom, strand FROM refFlat WHERE geneName = ?";
my $sth = $dbh->prepare($string);
$sth->execute($gene);
while (my @row = $sth->fetchrow_array) {
    $GENENAME   = $row[0];
    $NAME       = $row[1];
    $EXONSTARTS = $row[2];
    $STRAND     = $row[5];
}
$sth->finish();

print "geneName = $GENENAME\n";
print "Name = $NAME\n";
print "strand = $STRAND\n";
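The exonStarts and exonEnds columns of refFlat come back as comma-separated strings with a trailing comma, with 0-based starts and 1-based ends. A sketch of turning them into exon records follows; parse_exons is a made-up helper name, and the coordinate strings in the demo are invented numbers, not BBS4 data.

```perl
use strict;
use warnings;

# Convert UCSC-style "start,start," and "end,end," strings into
# a list of exons with 1-based inclusive coordinates.
sub parse_exons {
    my ($starts, $ends) = @_;
    my @s = grep { length } split /,/, $starts;
    my @e = grep { length } split /,/, $ends;
    die "exonStarts and exonEnds differ in length\n" unless @s == @e;

    my @exons;
    for my $i (0 .. $#s) {
        push @exons, {
            start  => $s[$i] + 1,        # 0-based start -> 1-based
            end    => $e[$i],
            length => $e[$i] - $s[$i],
        };
    }
    return @exons;
}

# Invented values, only to show the string format.
my @exons = parse_exons('1000,2000,', '1100,2200,');
printf "exon %d: %d-%d (%d bp)\n", $_ + 1,
    @{ $exons[$_] }{qw(start end length)} for 0 .. $#exons;
```

In the program above, $row[2] and $row[3] would be the two strings to pass in.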
23
Output
Intron: 11 -9247 -6928
Exon: 12 -6927 -6768
sequence_start = -6927
sequence_stop = -6768
exon length= 160
exon start, exon_stop 6768 6927
exon sequence:
CTGTGTTTCTTTACAAGGTTTGAAGGAGAAGTTCTGAAGGACTCTGATTAGAGCAAGTTTCATGTTCATGAGAGCAAACCTCATGCCAATGCAGTTTCTGGGTCCAGTTCCAAAGGGTGTGTATATGTAAGGATCTATGCTGTCCTTCTTCTTACTGAAC
Intron: 12 -6767 -5096
Exon: 13 -5095 -5000
sequence_start = -5095
sequence_stop = -5000
exon length= 96
exon start, exon_stop 5000 5095
exon sequence:
TCATTCTCCACTTAGGGTTCCATCTCTTGAATCCACCTTTAGAACAATGGGTTTTTCTGGTTGAAGAAGTCCTTGCGTGTCTAATTTCAAGGGGAT
chr = chr7
seq length= 41692
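The numbers in this output follow two simple rules: a feature's length is stop - start + 1, and an intron is the gap between one exon's stop and the next exon's start. A short sketch using the exon 12 and exon 13 coordinates shown above (feature_length and introns_between are names chosen for this example):

```perl
use strict;
use warnings;

# Inclusive coordinates, so add 1.
sub feature_length {
    my ($start, $stop) = @_;
    return $stop - $start + 1;
}

# Given exons as [start, stop] pairs sorted by start,
# return the introns that lie between neighbouring exons.
sub introns_between {
    my @exons = @_;
    my @introns;
    for my $i (0 .. $#exons - 1) {
        push @introns, [ $exons[$i][1] + 1, $exons[ $i + 1 ][0] - 1 ];
    }
    return @introns;
}

my @exons = ( [ -6927, -6768 ], [ -5095, -5000 ] );   # exons 12 and 13

for my $exon (@exons) {
    printf "exon   %d..%d length %d\n", @{$exon}, feature_length( @{$exon} );
}
for my $intron ( introns_between(@exons) ) {
    printf "intron %d..%d length %d\n", @{$intron}, feature_length( @{$intron} );
}
```

This reproduces the exon lengths 160 and 96 and the intron 12 coordinates -6767 to -5096 from the output.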
24
Installing bioperl (Linux)
3.5) mkdir ~/perl
3.6) mkdir ~/perl/bioperl
3.8) cd bioperl-1.2.3
4) perl Makefile.PL LIB=~/perl/bioperl
(Do it this way -- with "LIB" -- recently changed slide)
make test
make install (see installing in private space on next slides)
To uninstall, just delete ~/perl/bioperl and ~/perl/bioperl-1.2.3
Note: version -1.2.3 was the current version when I made this slide -- it may have updated since.
25
5) To use:
#!/usr/local/bin/perl
use lib "~/local/bioperl/"; # this is supposed to work, but did NOT on CSS
use Bio::Tools::BPlite; # Need the LIB prefix for this to work.
csh 5.1) setenv PERL5LIB ~/perl/bioperl
bash 5.1) PERL5LIB=~/perl/bioperl; export PERL5LIB
mac (bash) 5.1) PERL5LIB=~/perl/bioperl; export PERL5LIB
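One thing worth knowing here: "~" is expanded by the shell, not by Perl. The setenv/export lines work because the shell replaces "~" before Perl ever sees the value, while a "~" inside a use lib string is kept literally. Whether that was the whole story on CSS is a guess, but the sketch below shows the difference, using $ENV{HOME} as the Perl-side replacement.

```perl
use strict;
use warnings;

# Perl adds this entry to @INC exactly as written, "~" included,
# so it does not point at your home directory.
use lib '~/perl/bioperl';

# $ENV{HOME} is filled in by Perl, so this entry is a real path.
use lib "$ENV{HOME}/perl/bioperl";

# Show what actually ended up on the module search path.
print "$_\n" for grep { m{perl/bioperl$} } @INC;
```

Running it prints one entry that still starts with "~" and one that starts with your actual home directory; only the second can be searched for modules.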
6) To make docs work (I would just put this in your .cshrc file):
csh: set path = ($path ~/perl/bioperl/lib/site_perl/5.8.1)
bash: PATH=$PATH:~/perl/bioperl/lib/site_perl/5.8.1; export PATH
Test with:
cd
perldoc Bio::SearchIO
FINALLY, please note that the version numbers change over time, and the actual paths may vary a little between CPAN and/or bioperl.org
It may take some trial and error (it usually does for me).
NOTE TO SELF -- check out the CPAN installer (it's much easier)
26
Using Modules
Finally, need DBI.pm
% mkdir modules
% cd modules
% ftp ftp.cpan.org (login: ftp passwd: [email protected])
% bin
% cd /pub/CPAN/modules/by-module/DBI
% get DBI-1.53.tar.gz
% cd ../DBD
% get DBD-mysql-3.0008.tar.gz
% quit
% gunzip DBI-1.53.tar.gz
% tar -xvf DBI-1.53.tar
% cd DBI-1.53
% perl Makefile.PL LIB=~/modules (**** changed this slide)
% make
% make install
(set up Environment for DBI -- next slide), then install DBD
27
Connecting w/ Perl
% mkdir modules
(put modules in this dir)
Need DBI, DBD-mysql, gunzip, and tar
(do this for both modules)
perl Makefile.PL LIB=~/modules
make
make install
csh 5.0) setenv PERL5LIB "$HOME/modules:$HOME/perl/bioperl"
bash 5.1) PERL5LIB=$HOME/modules:$HOME/perl/bioperl; export PERL5LIB
(note CSS has upgraded perl from 5.6.0 – used the last time)
28
Using Modules
CSH
setenv PERL5LIB "$HOME/local/bioperl/lib/site_perl/5.8.1:$HOME/modules/lib/site_perl/5.8.1:$HOME/Ensembl_modules-41/ensembl/modules:$HOME/Ensembl_modules-41/ensembl-external/modules:$HOME/Ensembl_modules-41/ensembl-lite/modules:$HOME/modules:$HOME/perl/bioperl"
BASH
PERL5LIB="$HOME/local/bioperl/lib/site_perl/5.8.1:$HOME/modules/lib/site_perl/5.8.1:$HOME/Ensembl_modules-41/ensembl/modules:$HOME/Ensembl_modules-41/ensembl-external/modules:$HOME/Ensembl_modules-41/ensembl-lite/modules:$HOME/modules:$HOME/perl/bioperl"
export PERL5LIB
This (below) would work if we used the LIB prefix -- but that makes it a pain to install DBD. So just rely on environment settings. NOTE -- if you log out and don't save the environment setting somewhere (such as .cshrc or .bashrc), you will have to re-type the command.
DBI used with:
use lib "~/modules/lib/perl5/site_perl/5.8.0/i386-linux-thread-multi/";
Ensembl modules can then be used in a perl program with:
use lib "~/Ensembl_modules-current/ensembl/modules";
use lib "~/Ensembl_modules-current/ensembl-external/modules";
use lib "~/Ensembl_modules-current/ensembl-lite/modules";
29
Another Module
Finally, need DBD.pm
% cd modules
% ftp ftp.cpan.org (login: ftp passwd: [email protected])
% bin
% cd /pub/CPAN/modules/by-module/DBD
% get DBD-mysql-2.9003.tar.gz
% quit
% gunzip DBD-mysql-2.9003.tar.gz
% tar -xvf DBD-mysql-2.9003.tar
% cd DBD-mysql-2.9003
% perl Makefile.PL LIB=~/modules
% make
% make install
30
Does not work on CSS
• Concluded either
– version of Perl incompatible
– port blocking
./ens3.pl
current core DB: homo_sapiens_core_18_34
-------------------- EXCEPTION --------------------
MSG: Could not connect to database homo_sapiens_core_18_34 user anonymous using [DBI:mysql:database=homo_sapiens_core_18_34;host=ensembldb.ensembl.org;port=3306] as a locator
STACK Bio::EnsEMBL::DBSQL::DBConnection::new /user/eng/tbraun/Ensembl_modules-18/ensembl/modules/Bio/EnsEMBL/DBSQL/DBConnection.pm:125
STACK Bio::EnsEMBL::DBSQL::DBAdaptor::new /user/eng/tbraun/Ensembl_modules-18/ensembl/modules/Bio/EnsEMBL/DBSQL/DBAdaptor.pm:79
STACK main::dbconnect_Ensembl ./ens3.pl:150
STACK toplevel ./ens3.pl:26
-------------------------------------------
31
However…
• Installed local version of MySQL
• Needed modules
– DBI (perl database interface)
– DBD (database specific interface – mysql)
• Realized that I had failed to install DBD with Ensembl modules
• No longer need to install BioPerl separately – the Ensembl install instructions install BioPerl now.
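A missing DBD module is easy to overlook because the error only shows up at connect time. A quick check before running anything against Ensembl can save that confusion. The sketch below tries to load each module and reports which ones Perl can find; module_available is a made-up helper name, and the list of modules is simply the ones this deck depends on.

```perl
use strict;
use warnings;

# True if the named module can be loaded from the current @INC.
sub module_available {
    my ($module) = @_;
    return 0 unless $module =~ /^\w+(?:::\w+)*$/;   # plain module names only
    (my $file = "$module.pm") =~ s{::}{/}g;         # DBD::mysql -> DBD/mysql.pm
    return eval { require $file; 1 } ? 1 : 0;
}

for my $module (qw(DBI DBD::mysql Bio::EnsEMBL::DBSQL::DBAdaptor)) {
    printf "%-32s %s\n", $module,
        module_available($module) ? 'found' : 'MISSING';
}
```

Anything reported as MISSING is either not installed or not on PERL5LIB for the current shell.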
33
34
Install BioPerl
• I'll assume Windows XP, Eclipse (if you are using Linux/Unix, then the default documentation with Bioperl is better than these slides: www.bioperl.org).
• Download Bioperl:
• http://pdb.eng.uiowa.edu/~tabraun/biotech/2007/modules/bioperl-1.4.zip
• The "official" version can be found here: (http://code.open-bio.org/cgi/viewcvs.cgi/bioperl-live/bioperl-live.tar.gz?tarball=1)
• Move this zip file into your Eclipse "workspace" directory and unzip it (mine is H:\windowsdata\workspace)
• You will need an "unzip" program. Most default versions of XP come with one. If you don't have one, you can download a free one:
– http://www.download.com/jZip/3000-2250_4-10761563.html?tag=lst-6
• Now in your perl program -- you will need to add the line (single quotes, so Perl does not treat the backslashes as escapes):
• use lib 'H:\windowsdata\workspace\bioperl-live';
35
BioPerl continued
Depending on whether your "zip" program creates a directory for you, you may have to put in:
use lib 'H:\windowsdata\workspace\bioperl-live\bioperl-live';
You will also need 2 other modules (DBD and DBI). These are used by the Ensembl modules to allow a perl program to connect to a MySQL database.
• DBI - Database independent interface for Perl
• DBD::mysql - MySQL driver for the Perl5 Database Interface (DBI)
I tried to compile a library for Windows to make available -- but was unable to get it to work. Therefore I asked CSS to install these two modules for me -- since I do not have administrative permission on CSS nodes.