Postgres and the Genome

Jeff Pennington
Director, Translational Informatics
Center for Biomedical Informatics and Department of Pathology
The Children’s Hospital of Philadelphia


Outline

• Background
• Genome analysis in the clinic
• Application
• Database
• DB Tuning

DNA as Data

• 4-letter ‘alphabet’ of bases – A, T, C, G
• 3,000,000,000 base pairs

• Sequence codes for biological function
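
To give a sense of scale for these figures, here is a rough sketch in Python. With a four-letter alphabet, each base fits in 2 bits, so 3,000,000,000 base pairs pack into roughly 750 MB. The helper names and the 2-bit encoding are illustrative assumptions, not a storage format described in the talk.

```python
# Hypothetical helpers: the 2-bit packing only illustrates the
# 4-letter alphabet; it is not a format used by any tool in the talk.
BASE_TO_BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}


def pack_bases(sequence: str) -> int:
    """Pack a DNA string into an integer, 2 bits per base."""
    packed = 0
    for base in sequence:
        packed = (packed << 2) | BASE_TO_BITS[base]
    return packed


def packed_size_bytes(n_bases: int) -> int:
    """Bytes needed to store n_bases at 2 bits per base, rounded up."""
    return (n_bases * 2 + 7) // 8


GENOME_BASES = 3_000_000_000
print(packed_size_bytes(GENOME_BASES))  # 750000000
```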

Mutations

Clinical Mutation = ‘Variant’

Sequencing = 100K – 4M Variants
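
As a minimal illustration of what a "variant" is, the sketch below compares a sample sequence against a reference and reports single-base differences. It assumes equal-length, already-aligned sequences; real variant calling also deals with alignment, insertions, deletions, and quality scores, none of which is shown here. The `Variant` type and `find_substitutions` function are hypothetical names.

```python
from typing import List, NamedTuple


class Variant(NamedTuple):
    position: int  # 1-based position in the reference
    ref: str       # base in the reference
    alt: str       # base observed in the sample


def find_substitutions(reference: str, sample: str) -> List[Variant]:
    """Report positions where the sample differs from the reference."""
    if len(reference) != len(sample):
        raise ValueError("this sketch only handles equal-length sequences")
    return [
        Variant(i, r, s)
        for i, (r, s) in enumerate(zip(reference, sample), start=1)
        if r != s
    ]


print(find_substitutions("ATCGGATC", "ATCGTATC"))
# [Variant(position=5, ref='G', alt='T')]
```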

VARIFY

VARIFY Architecture

• Varify Architecture
  – Three-tier web application
  – Harvest (http://harvest.research.chop.edu)
    • JavaScript client
    • Python server using Django ORM
    • Postgres 9.2

Database

• Physical – Postgres 9.2, RHEL VM, VMware w/ storage on host

• Round 1 – 4G RAM, 80G disk
• Round 2 – 32G RAM, 250G disk

Tuning

• max_connections – too big
• shared_buffers – amount of memory allocated to PG
• work_mem – amount of memory available to sort
• default_statistics_target – gives the query planner something to work with

Resources

• Book: PostgreSQL 9.0 High Performance
  – Ch 5 and 6
  – Page 145

• Tools: pg_buffercache
• Benchmarking:
  – \timing
  – EXPLAIN
  – log_min_duration_statement = 5000
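
The same kind of timing that `\timing` gives in psql can be sketched from application code. The helper below takes any Python DB-API cursor and returns the median wall-clock time of a query. To stay self-contained it runs against an in-memory SQLite database as a stand-in; pointing it at Postgres through a driver such as psycopg2 is an assumption and is not shown. The `variant` table and the `time_query` name are hypothetical.

```python
import sqlite3
import statistics
import time


def time_query(cursor, sql, params=(), repeats=5):
    """Return the median wall-clock seconds over `repeats` runs of a query."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        cursor.execute(sql, params)
        cursor.fetchall()  # include the time spent pulling rows
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


# Stand-in database (SQLite, not Postgres) with a hypothetical table.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE variant (id INTEGER PRIMARY KEY, chrom TEXT, pos INTEGER)")
cur.executemany(
    "INSERT INTO variant (chrom, pos) VALUES (?, ?)",
    [("1", i) for i in range(10000)],
)

count_s = time_query(cur, "SELECT COUNT(*) FROM variant")
list_s = time_query(cur, "SELECT * FROM variant ORDER BY pos LIMIT 100")

# log_min_duration_statement is in milliseconds, so 5000 means 5 seconds.
SLOW_THRESHOLD_S = 5.0
for name, secs in [("count", count_s), ("list", list_s)]:
    print(f"{name}: {secs * 1000:.2f} ms, slow={secs > SLOW_THRESHOLD_S}")
```

Taking the median over a few runs smooths out cache warm-up on the first execution, which matters when comparing settings before and after a configuration change.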

Tuning Round 1 (4G RAM)

• max_connections = 100
• shared_buffers = 1024MB (default 32MB)
• work_mem = 200MB (default 1MB)
  – Tried 1G, bad trade-off on count (slow) vs. list (not much faster)

Tuning Round 2 (32G RAM)

• max_connections = 100
• shared_buffers = 24576MB (Increased from 1024MB)
• work_mem = 150MB (Decreased from 200MB)
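
The settings from Rounds 1 and 2 can be put through a rough memory budget. Because work_mem applies per sort operation rather than per server, the sketch below multiplies it by max_connections. It assumes every connection runs exactly one sort at the same time, which is a simplification: a single query can run several sorts or hashes, and most connections are usually idle. The `memory_budget` function is a hypothetical helper.

```python
def memory_budget(ram_mb, shared_buffers_mb, work_mem_mb, max_connections):
    """Rough budget, assuming each connection runs one sort at a time."""
    sort_mb = work_mem_mb * max_connections
    return {
        "shared_buffers_pct_of_ram": 100 * shared_buffers_mb / ram_mb,
        "sort_mb_if_all_connections_sort": sort_mb,
        "total_mb": shared_buffers_mb + sort_mb,
    }


round1 = memory_budget(ram_mb=4 * 1024, shared_buffers_mb=1024,
                       work_mem_mb=200, max_connections=100)
round2 = memory_budget(ram_mb=32 * 1024, shared_buffers_mb=24576,
                       work_mem_mb=150, max_connections=100)

print(round1)  # shared_buffers is 25% of RAM; total 21024 MB
print(round2)  # shared_buffers is 75% of RAM; total 39576 MB
```

Under this assumption the total exceeds physical RAM in both rounds (21024 MB against 4096 MB, and 39576 MB against 32768 MB), which is why work_mem and max_connections are best considered together.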

Tuning Round 3

• Everything in Round 2
• default_statistics_target = 1000 (default 100)
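
One way to see what raising default_statistics_target buys: the target caps the number of most-common values and histogram buckets kept per column, and ANALYZE sizes its row sample from it. The sketch below assumes the 300-rows-per-target multiplier used by Postgres's standard ANALYZE routine; treat that constant as an assumption to check against the version in use. The function name is hypothetical.

```python
ROWS_PER_TARGET = 300  # assumed ANALYZE sample-size multiplier


def analyze_sample_rows(statistics_target: int) -> int:
    """Approximate number of rows ANALYZE samples for a given target."""
    return ROWS_PER_TARGET * statistics_target


for target in (100, 1000):
    print(target, analyze_sample_rows(target))
# 100 30000
# 1000 300000
```

A larger sample gives the planner a finer picture of skewed columns, at the cost of slower ANALYZE runs and somewhat more planning time.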