PostgreSQL Scientific Application - Case example
![Page 1: PostgreSQL Scientific Application - Case example · PostgreSQL Scientific Application - Case example PostgreSQL Genomic Databases Sébastien Clément sclement@cfrl.forestry.ca Natural](https://reader036.fdocuments.in/reader036/viewer/2022062302/5f03200e7e708231d407a966/html5/thumbnails/1.jpg)
PostgreSQL Scientific Application - Case example
PostgreSQL Genomic Databases
Sébastien Clément
sclement@cfrl.forestry.ca
Natural Resources Canada
Presented at the PostgreSQL Conference 2009 in Japan
November 20th
![Page 2: PostgreSQL Scientific Application - Case example · PostgreSQL Scientific Application - Case example PostgreSQL Genomic Databases Sébastien Clément sclement@cfrl.forestry.ca Natural](https://reader036.fdocuments.in/reader036/viewer/2022062302/5f03200e7e708231d407a966/html5/thumbnails/2.jpg)
Foreword
« What is this guy doing here? »
« Can PostgreSQL handle scientific databases? »
![Page 3: PostgreSQL Scientific Application - Case example · PostgreSQL Scientific Application - Case example PostgreSQL Genomic Databases Sébastien Clément sclement@cfrl.forestry.ca Natural](https://reader036.fdocuments.in/reader036/viewer/2022062302/5f03200e7e708231d407a966/html5/thumbnails/3.jpg)
What is genomics and why bother?
Genomics: « The study of the entire genome (all genes) of a species »
• Health and disease
• Heredity
• etc.
• Genetic improvement

| Genome size | Number of genes |
| --- | --- |
| 3 000 000 000 | ~23 000 |
| 390 000 000 | ~53 000 |
![Page 4: PostgreSQL Scientific Application - Case example · PostgreSQL Scientific Application - Case example PostgreSQL Genomic Databases Sébastien Clément sclement@cfrl.forestry.ca Natural](https://reader036.fdocuments.in/reader036/viewer/2022062302/5f03200e7e708231d407a966/html5/thumbnails/4.jpg)
Why study TREE genomics?
Cellulose gene
CGACGTTAATGCCACTC
CGACGTTAATGCCACTCG
Normal
Variant
DATA
DB
![Page 5: PostgreSQL Scientific Application - Case example · PostgreSQL Scientific Application - Case example PostgreSQL Genomic Databases Sébastien Clément sclement@cfrl.forestry.ca Natural](https://reader036.fdocuments.in/reader036/viewer/2022062302/5f03200e7e708231d407a966/html5/thumbnails/5.jpg)
Why is a genomic DB essential?
A single gene…
• Name
• Sequence
• Size
• Functions: cell wall metabolism, cell structure, catalysis…
• Variations
• Species
• Interaction with other genes
• Similarity with other species
• Chromosome pos.
• more… (phew!)
…how about thousands of genes… for thousands of species?
![Page 6: PostgreSQL Scientific Application - Case example · PostgreSQL Scientific Application - Case example PostgreSQL Genomic Databases Sébastien Clément sclement@cfrl.forestry.ca Natural](https://reader036.fdocuments.in/reader036/viewer/2022062302/5f03200e7e708231d407a966/html5/thumbnails/6.jpg)
Public genomics DBs
• GenBank: http://www.ncbi.nlm.nih.gov/Genbank/
• UniProt: http://www.ebi.ac.uk/uniprot/
• TAIR: http://www.arabidopsis.org/
![Page 7: PostgreSQL Scientific Application - Case example · PostgreSQL Scientific Application - Case example PostgreSQL Genomic Databases Sébastien Clément sclement@cfrl.forestry.ca Natural](https://reader036.fdocuments.in/reader036/viewer/2022062302/5f03200e7e708231d407a966/html5/thumbnails/7.jpg)
Our PostgreSQL Databases
TreeSNPs: genes and variations
• Ruby on Rails interface
• Multi-language support
• 38 tables
• ~450 K records
• Mostly manual entry
PhenoTree: observable attributes (physical, morphological)
• PhpPgAdmin interface
• 21 tables
• ~4.1 M records
![Page 8: PostgreSQL Scientific Application - Case example · PostgreSQL Scientific Application - Case example PostgreSQL Genomic Databases Sébastien Clément sclement@cfrl.forestry.ca Natural](https://reader036.fdocuments.in/reader036/viewer/2022062302/5f03200e7e708231d407a966/html5/thumbnails/8.jpg)
TreeSNPs overview
General views
![Page 9: PostgreSQL Scientific Application - Case example · PostgreSQL Scientific Application - Case example PostgreSQL Genomic Databases Sébastien Clément sclement@cfrl.forestry.ca Natural](https://reader036.fdocuments.in/reader036/viewer/2022062302/5f03200e7e708231d407a966/html5/thumbnails/9.jpg)
TreeSNPs overview (cont'd)
Lab plate
Plate view
Results
![Page 10: PostgreSQL Scientific Application - Case example · PostgreSQL Scientific Application - Case example PostgreSQL Genomic Databases Sébastien Clément sclement@cfrl.forestry.ca Natural](https://reader036.fdocuments.in/reader036/viewer/2022062302/5f03200e7e708231d407a966/html5/thumbnails/10.jpg)
TreeSNPs overview (cont'd)
Example of calculations (views):
![Page 11: PostgreSQL Scientific Application - Case example · PostgreSQL Scientific Application - Case example PostgreSQL Genomic Databases Sébastien Clément sclement@cfrl.forestry.ca Natural](https://reader036.fdocuments.in/reader036/viewer/2022062302/5f03200e7e708231d407a966/html5/thumbnails/11.jpg)
TreeSNPs overview (cont'd)
TreeSNPs download page & demo version:
http://treesnps-pub.arborea.ulaval.ca:3000/download
A paper to appear soon in
Adopted by the U. of Alberta's (Canada) Laboratory on Mountain Pine Beetle
![Page 12: PostgreSQL Scientific Application - Case example · PostgreSQL Scientific Application - Case example PostgreSQL Genomic Databases Sébastien Clément sclement@cfrl.forestry.ca Natural](https://reader036.fdocuments.in/reader036/viewer/2022062302/5f03200e7e708231d407a966/html5/thumbnails/12.jpg)
PhenoTree overview
What data is stored?
> 1000 trees
> 60 K records
~ 4 M records
Dimensions & morphology
Wood analysis
Other data:
• Geographical locations
• Tree pedigree
![Page 13: PostgreSQL Scientific Application - Case example · PostgreSQL Scientific Application - Case example PostgreSQL Genomic Databases Sébastien Clément sclement@cfrl.forestry.ca Natural](https://reader036.fdocuments.in/reader036/viewer/2022062302/5f03200e7e708231d407a966/html5/thumbnails/13.jpg)
PhenoTree overview (cont'd)
Wood analysis properties table
Radius, from pith to bark: read every 25 µm
• Wood density
• Fibre dimensions
• Cell count
• etc.
942 trees × ~2100 reads/tree ≈ 1.98 M reads!
![Page 14: PostgreSQL Scientific Application - Case example · PostgreSQL Scientific Application - Case example PostgreSQL Genomic Databases Sébastien Clément sclement@cfrl.forestry.ca Natural](https://reader036.fdocuments.in/reader036/viewer/2022062302/5f03200e7e708231d407a966/html5/thumbnails/14.jpg)
PhenoTree overview (cont'd)
Example of calculations 1 (SQL views): growth ring averages
Radius, from pith to bark; per-ring aggregates of the reads: Σ (sum), x̄ (mean)

| Ring | 1 | 2 | 3 | … |
| --- | --- | --- | --- | --- |
| Ring width (mm) | 4.35 | 4.53 | 3.70 | … |
| Ring area (mm²) | 122.3 | 253.4 | 302.8 | … |
| Wood density (kg/m³) | 744 | 664 | 611 | … |
| Fibre width (µm) | 22.95 | 23.85 | 23.89 | … |
| Cell counts (/mm²) | 1369 | 1446 | 1699 | … |

942 trees × ~16 rings/tree ≈ 15 K records
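A minimal sketch of how a per-ring view like this could be built. The slides do not show the actual view definition, so everything here is an assumption: the table `wood_reads`, its columns, the view name `ring_averages`, and the sample values are all hypothetical, and SQLite (through Python's standard `sqlite3` module) stands in for PostgreSQL so the example runs on its own. The only detail taken from the slides is the 25 µm read step, which lets the ring width be derived from the number of reads in a ring.

```python
import sqlite3

# Each read covers 25 µm of radius (from the slides), i.e. 0.025 mm.
READ_STEP_MM = 0.025


def build_ring_view(reads):
    """Load raw reads and define a view that aggregates them per growth ring.

    `reads` is a list of (tree_name, ring, density, fibre_width) tuples.
    Table, column, and view names are hypothetical.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE wood_reads ("
        "tree_name TEXT, ring INTEGER, density REAL, fibre_width REAL)"
    )
    conn.executemany("INSERT INTO wood_reads VALUES (?, ?, ?, ?)", reads)
    # One output row per (tree, ring): width is a sum of read steps,
    # the wood properties are plain averages over the ring's reads.
    conn.execute(
        f"""
        CREATE VIEW ring_averages AS
        SELECT tree_name,
               ring,
               COUNT(*) * {READ_STEP_MM} AS ring_width_mm,
               AVG(density)              AS wood_density,
               AVG(fibre_width)          AS fibre_width
        FROM wood_reads
        GROUP BY tree_name, ring
        """
    )
    return conn


# Illustrative values only, not data from the PhenoTree database.
reads = [
    ("T1", 1, 700.0, 22.0),
    ("T1", 1, 800.0, 24.0),
    ("T1", 2, 650.0, 23.5),
]
conn = build_ring_view(reads)
rows = conn.execute(
    "SELECT * FROM ring_averages ORDER BY tree_name, ring"
).fetchall()
```

Keeping the aggregation in a view rather than a stored table means the roughly 2 M raw reads remain the single source of truth, and the ~15 K ring records are always derived from them.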
![Page 15: PostgreSQL Scientific Application - Case example · PostgreSQL Scientific Application - Case example PostgreSQL Genomic Databases Sébastien Clément sclement@cfrl.forestry.ca Natural](https://reader036.fdocuments.in/reader036/viewer/2022062302/5f03200e7e708231d407a966/html5/thumbnails/15.jpg)
PhenoTree overview (cont'd)
Example of calculations 2 (SQL views): crosstab function
-- crosstab() is provided by the tablefunc contrib module.
-- Aliases are in French: tableau_croise = cross table, arbre = tree,
-- nom_ligne = row name, categorie = category, valeur = value.
SELECT tableau_croise.arbre,
       tableau_croise.height_1986,
       tableau_croise.height_1992,
       tableau_croise.height_1997,
       tableau_croise.height_2004,
       tableau_croise.height_2005
FROM crosstab(
       'SELECT tree_name AS nom_ligne, year AS categorie, height AS valeur FROM trunk_measures ORDER BY 1, 2'::text,
       'SELECT DISTINCT year FROM trunk_measures ORDER BY 1'::text
     ) tableau_croise(arbre text,
                      height_1986 double precision,
                      height_1992 double precision,
                      height_1997 double precision,
                      height_2004 double precision,
                      height_2005 double precision);
Logical, but not very useful… (one row per tree and year)
crosstab
…this is what end-users want (one row per tree, one height column per year)
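To make the reshaping concrete, here is a small sketch of the same long-to-wide pivot done in plain Python. It is not how the database does it (the slide uses PostgreSQL's `crosstab`); the function name `pivot_heights` and the sample measurements are hypothetical and only illustrate the transformation from one row per tree and year to one row per tree.

```python
def pivot_heights(measures):
    """Pivot (tree_name, year, height) rows into one dict per tree.

    `measures` is a list of tuples. Every tree gets a `height_<year>` key
    for every year seen in the input; missing measurements stay None,
    much like the NULLs crosstab leaves for absent categories.
    """
    years = sorted({year for _, year, _ in measures})
    table = {}
    for tree, year, height in measures:
        # Create the full set of year columns the first time a tree appears.
        row = table.setdefault(tree, {f"height_{y}": None for y in years})
        row[f"height_{year}"] = height
    return [{"tree": tree, **table[tree]} for tree in sorted(table)]


# Illustrative values only, not data from trunk_measures.
long_rows = [
    ("A", 1986, 1.2),
    ("A", 1992, 3.4),
    ("B", 1986, 1.0),
]
wide_rows = pivot_heights(long_rows)
```

Unlike this sketch, the `crosstab` call on the slide has to spell out its output columns in the `tableau_croise(...)` column definition list, so a new measurement year means editing the view.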
![Page 16: PostgreSQL Scientific Application - Case example · PostgreSQL Scientific Application - Case example PostgreSQL Genomic Databases Sébastien Clément sclement@cfrl.forestry.ca Natural](https://reader036.fdocuments.in/reader036/viewer/2022062302/5f03200e7e708231d407a966/html5/thumbnails/16.jpg)
Systems and user base
• Formerly Access projects (2006-7)
• Migrated to PostgreSQL 8.3 under Fedora (2007-8)
• Migrated back to Windows (2009)
• Around 20 scientific users (Universities, Federal Government)
Network diagram: Production server, Mirror server, Gov. Canada network, University network, Local users (two groups), VPN
![Page 17: PostgreSQL Scientific Application - Case example · PostgreSQL Scientific Application - Case example PostgreSQL Genomic Databases Sébastien Clément sclement@cfrl.forestry.ca Natural](https://reader036.fdocuments.in/reader036/viewer/2022062302/5f03200e7e708231d407a966/html5/thumbnails/17.jpg)
PostgreSQL and Windows: can it really work?
Task automation with DOS
• Limited functionality
Solution?
• Windows Task Manager
• Cygwin
• Unix/bash scripts
• PostgreSQL
Script examples:
• Start Rails server (DOS)
• Backups (DOS)*
• Backup files cleaner (bash)
• VPN connection to production server (DOS)
• Mirror synchronizing (bash, DOS)
• Database version comparison (bash)
• Users & privileges report (bash)
*Thanks: Greg Smith (http://wiki.postgresql.org/wiki/Automated_Backup_on_Windows)
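As an illustration of the kind of task the backup files cleaner handles, here is a minimal sketch. The original script is a bash script run under Cygwin and is not shown in the slides; this version is written in Python instead, and the function names, the `.backup` file suffix, and the age threshold are all assumptions made for the example.

```python
import time
from pathlib import Path

SECONDS_PER_DAY = 86400


def old_backups(backup_dir, max_age_days, suffix=".backup", now=None):
    """Return backup files in `backup_dir` last modified more than
    `max_age_days` days ago, sorted by path.

    `suffix` is a hypothetical naming convention for backup files.
    `now` can be passed explicitly to make the function testable.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    return sorted(
        path
        for path in Path(backup_dir).glob(f"*{suffix}")
        if path.is_file() and path.stat().st_mtime < cutoff
    )


def clean_backups(backup_dir, max_age_days, suffix=".backup", now=None):
    """Delete the old backups and return the list of removed paths."""
    removed = old_backups(backup_dir, max_age_days, suffix, now)
    for path in removed:
        path.unlink()
    return removed
```

A scheduled task could call `clean_backups` once a day after the backup job; listing candidates with `old_backups` first is a cheap way to do a dry run before anything is deleted.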
![Page 18: PostgreSQL Scientific Application - Case example · PostgreSQL Scientific Application - Case example PostgreSQL Genomic Databases Sébastien Clément sclement@cfrl.forestry.ca Natural](https://reader036.fdocuments.in/reader036/viewer/2022062302/5f03200e7e708231d407a966/html5/thumbnails/18.jpg)
Developing databases for the scientific community
Suggestions:
• Have a user-based approach
1. Know/answer the user's needs
2. Limit technical jargon
3. Think 'usability'
![Page 19: PostgreSQL Scientific Application - Case example · PostgreSQL Scientific Application - Case example PostgreSQL Genomic Databases Sébastien Clément sclement@cfrl.forestry.ca Natural](https://reader036.fdocuments.in/reader036/viewer/2022062302/5f03200e7e708231d407a966/html5/thumbnails/19.jpg)
Acknowledgements
People
• Jean Beaulieu - Lab director
• Joël Fillon - Ruby on Rails interface designer
• Jean-Philippe Dionne - Rails secure access programming
• Jean Bousquet - Collaborator
• All end users, particularly: Sylvie Blais, Stéphanie Beauseigle, Marie Deslauriers, Pier-Luc Poulin, Patrick Lenz
Organizations
• Arborea Forest Genomics (http://www.arborea.ulaval.ca/)
• Canadian Forest Service, Natural Resources Canada
• Genome Québec
• Genome Canada
![Page 20: PostgreSQL Scientific Application - Case example · PostgreSQL Scientific Application - Case example PostgreSQL Genomic Databases Sébastien Clément sclement@cfrl.forestry.ca Natural](https://reader036.fdocuments.in/reader036/viewer/2022062302/5f03200e7e708231d407a966/html5/thumbnails/20.jpg)
Done ?
![Page 21: PostgreSQL Scientific Application - Case example · PostgreSQL Scientific Application - Case example PostgreSQL Genomic Databases Sébastien Clément sclement@cfrl.forestry.ca Natural](https://reader036.fdocuments.in/reader036/viewer/2022062302/5f03200e7e708231d407a966/html5/thumbnails/21.jpg)