Postgres and the Genome
Jeff Pennington
Director, Translational Informatics
Center for Biomedical Informatics and Department of Pathology
The Children’s Hospital of Philadelphia
DNA as Data
• 4-letter ‘alphabet’ of bases – A T C G
• 3,000,000,000 base pairs
• Sequence codes for biological function
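The "DNA as data" idea can be made concrete with a little arithmetic. The sketch below is illustrative only and is not taken from the talk: it assumes a two-bit code per base (the particular mapping is an arbitrary choice) and the hypothetical helper names `pack` and `unpack`, then compares the size of 3,000,000,000 bases stored as text against the same bases packed two bits each.

```python
# Two-bit codes for the four bases. The mapping is an arbitrary,
# illustrative choice, not something specified in the slides.
BASE_TO_BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}


def pack(sequence: str) -> int:
    """Pack a string of bases into one integer, two bits per base."""
    value = 0
    for base in sequence:
        value = (value << 2) | BASE_TO_BITS[base]
    return value


def unpack(value: int, length: int) -> str:
    """Recover `length` bases from a packed integer.

    The length is required because leading 'A' bases encode as zero bits.
    """
    bases = []
    for _ in range(length):
        bases.append(BITS_TO_BASE[value & 0b11])
        value >>= 2
    return "".join(reversed(bases))


GENOME_BASE_PAIRS = 3_000_000_000
text_bytes = GENOME_BASE_PAIRS            # one byte per letter
packed_bytes = GENOME_BASE_PAIRS * 2 // 8  # two bits per base

if __name__ == "__main__":
    print(unpack(pack("GATTACA"), 7))
    print(f"as text:   {text_bytes:,} bytes")
    print(f"two-bit:   {packed_bytes:,} bytes")
```

Either way the raw sequence is modest in size; a four-letter alphabet needs only two bits per position.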
VARIFY Architecture
• Varify Architecture
  – Three-tier web application
  – Harvest (http://harvest.research.chop.edu)
    • JavaScript client
    • Python server using Django ORM
    • Postgres 9.2
Database
• Physical – Postgres 9.2 on a RHEL VM (VMware), with storage on the host
• Round 1 – 4 GB RAM, 80 GB disk
• Round 2 – 32 GB RAM, 250 GB disk
Tuning
• max_connections – too big
• shared_buffers – amount of memory allocated to PG
• work_mem – amount of memory available to sort
• default_statistics_target – gives the query planner something to work with
Resources
• Book: PostgreSQL 9.0 High Performance
  – Ch 5 and 6
  – Page 145
• Tools: pg_buffercache
• Benchmarking:
  – \timing
  – EXPLAIN
  – log_min_duration_statement = 5000
Tuning Round 1 (4G RAM)
• max_connections = 100
• shared_buffers = 1024MB (default 32MB)
• work_mem = 200MB (default 1MB)
  – Tried 1G, bad trade-off on count (slow) vs. list (not much faster)
Tuning Round 2 (32G RAM)
• max_connections = 100
• shared_buffers = 24576MB (Increased from 1024MB)
• work_mem = 150MB (Decreased from 200MB)
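To see how the two rounds of settings relate to the RAM on each VM, a rough back-of-the-envelope check helps. The sketch below is an assumption-laden illustration, not the speaker's method: the helper `worst_case_mb` is hypothetical, and it assumes every allowed connection runs exactly one `work_mem`-sized sort at the same moment. Real workloads can use several `work_mem` allocations per query, or none at all, so treat the result as a loose ceiling rather than a prediction.

```python
def worst_case_mb(max_connections: int, shared_buffers_mb: int, work_mem_mb: int) -> int:
    """Rough ceiling: shared_buffers plus one work_mem per connection.

    Hypothetical helper for illustration; it ignores OS cache, maintenance
    memory, and queries that use more than one work_mem allocation.
    """
    return shared_buffers_mb + max_connections * work_mem_mb


# Values copied from the two tuning rounds on the slides.
ROUNDS = {
    "Round 1": {"ram_mb": 4 * 1024, "max_connections": 100,
                "shared_buffers_mb": 1024, "work_mem_mb": 200},
    "Round 2": {"ram_mb": 32 * 1024, "max_connections": 100,
                "shared_buffers_mb": 24576, "work_mem_mb": 150},
}

if __name__ == "__main__":
    for name, cfg in ROUNDS.items():
        ceiling = worst_case_mb(cfg["max_connections"],
                                cfg["shared_buffers_mb"],
                                cfg["work_mem_mb"])
        share = cfg["shared_buffers_mb"] / cfg["ram_mb"]
        print(f"{name}: RAM {cfg['ram_mb']} MB, "
              f"shared_buffers is {share:.0%} of RAM, "
              f"worst-case ceiling {ceiling} MB")
```

Under these assumptions both rounds could exceed physical RAM if all 100 connections sorted at once, which is one way to read the "too big" note on max_connections: the ceiling only matters if that many connections are actually busy at the same time.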