Wilson Make Bosc2008

Page 1: Wilson Make Bosc2008

Use the Make Utility for the Maintenance of Complex Bioinformatics Pipelines

Justin Wilson, Manhong Dai, Stanley Watson, Fan Meng

Psychiatry Department and Molecular and Behavioral Neuroscience Institute

University of Michigan

Page 2: Wilson Make Bosc2008

Make

First released in 1977 by Stuart Feldman at Bell Labs

Originally designed for compiling programs

General purpose automation tool
  Compilation
  Analysis
  Situations where one file depends on another
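
The "one file depends on another" idea can be sketched as a two-rule Makefile. This is only an illustration: the file names (reads.txt, counts.txt, report.txt) are hypothetical and do not come from the talk.

```makefile
# Hypothetical file names. Recipe lines must begin with a tab character.

# report.txt is rebuilt whenever counts.txt is newer than it.
report.txt: counts.txt
	sort -nr counts.txt > report.txt

# counts.txt is rebuilt whenever reads.txt is newer than it.
counts.txt: reads.txt
	sort reads.txt | uniq -c > counts.txt
```

Running "make report.txt" a second time without touching reads.txt does no work, because both targets are already newer than their prerequisites. Editing reads.txt causes both rules to run again, in dependency order.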

Page 3: Wilson Make Bosc2008

The Name of the Game

No data is an island unto itself

Typical bioinformatics pipeline
  Collect data from various sources
    Internet utilities
  Processing
    Scripts
    Parsers
    Programs
    Database
  Packaged as web service + database

Page 4: Wilson Make Bosc2008

Updating

Re-running the pipeline
  Driving factor: Demand for “current” information
  Limiting factor: Resources (i.e., time) required

Page 5: Wilson Make Bosc2008

Using Make

Stage 1: Download (configure)
  Download (new data)
  Not always trivial (new URLs)

Stage 2: File processing
  Downloaded files -> processed files -> ...
  Only new files are processed

Stage 2.1: Database
  Model tables, indexes, views as files
  File processing becomes SQL statement + touch
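
The three stages above might look roughly like the following Makefile. It is a minimal sketch under stated assumptions, not the authors' actual pipeline: GNU Make is assumed, and the base URL, the dataset name, the directory layout, the SQL script names, and the choice of curl and sqlite3 as the download and database clients are all hypothetical.

```makefile
# Hypothetical names, paths, and URL. Assumes GNU Make, curl, and sqlite3.
# Recipe lines must begin with a tab character.
BASE_URL := https://example.org/data
DB       := pipeline.db
NAMES    := annotations

DOWNLOADS := $(NAMES:%=download/%.tsv.gz)
PROCESSED := $(NAMES:%=processed/%.tsv)
LOADED    := $(NAMES:%=db/%.loaded)

# Remove a half-written target if its recipe fails.
.DELETE_ON_ERROR:
.PHONY: all

all: $(LOADED)

# Stage 1: download. A file is fetched only if it is not already present.
$(DOWNLOADS): download/%.tsv.gz:
	mkdir -p $(@D)
	curl -fsSL -o $@ $(BASE_URL)/$*.tsv.gz

# Stage 2: file processing. Each processed file depends on its download,
# so only new or changed downloads are processed again.
$(PROCESSED): processed/%.tsv: download/%.tsv.gz
	mkdir -p $(@D)
	gunzip -c $< > $@

# Stage 2.1: database. The table is modeled as an empty marker file that
# is touched only after the SQL script has run successfully.
$(LOADED): db/%.loaded: processed/%.tsv sql/load_%.sql
	mkdir -p $(@D)
	sqlite3 $(DB) < sql/load_$*.sql
	touch $@
```

The marker file db/annotations.loaded stands in for the database table, which Make cannot see directly. Its timestamp records when the table was last loaded, so the SQL step is skipped until the processed file or the SQL script changes.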

Page 6: Wilson Make Bosc2008

Projects

WGAS
  http://arrayanalysis.mbni.med.umich.edu
  Harvests Affymetrix CEL files from GEO and ArrayExpress every night
  Local copy of dbSNP
    Initial load took hours
    Minor update took minutes

Page 7: Wilson Make Bosc2008

Projects

CustomCDF
  Google customcdf
  Aligns Affymetrix probes to reference sequences for various organisms
  Large # of data sources
  Made future modifications easier
  Makefile submits jobs to cluster
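
One way a Makefile can hand work to a cluster is to wrap each recipe in a submission command. The sketch below assumes GNU Make and uses hypothetical names throughout: the organism list, the probes/ and aligned/ directories, and the jobs/align.sh script are illustrative, and SUBMIT is a placeholder for whatever blocking job-submission command the local scheduler provides. It defaults to plain sh so the example also runs without a cluster.

```makefile
# Hypothetical names. SUBMIT must not return until the job has finished,
# otherwise Make cannot tell when the output file is complete.
# Recipe lines must begin with a tab character.
SUBMIT    ?= sh
ORGANISMS := human mouse rat
ALIGNED   := $(ORGANISMS:%=aligned/%.out)

.PHONY: all

all: $(ALIGNED)

# One alignment job per organism; each output depends on its probe file
# and on the job script, so editing the script reruns every alignment.
$(ALIGNED): aligned/%.out: probes/%.fa jobs/align.sh
	mkdir -p $(@D)
	$(SUBMIT) jobs/align.sh $< $@
```

Because the three targets do not depend on each other, "make -j 3" lets Make start all three submissions at once, leaving the scheduler to decide where they run.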

Page 8: Wilson Make Bosc2008

Acknowledgements

The authors are members of the Pritzker Neuropsychiatric Disorders Research Consortium, which is supported by the Pritzker Neuropsychiatric Disorders Research Fund L.L.C. This work is also supported in part by the National Center for Integrated Biomedical Informatics through NIH grant 1U54DA021519-01A1 to the University of Michigan.