Wilson Make Bosc2008
Use the Make Utility for the Maintenance of Complex Bioinformatics Pipelines
Justin Wilson, Manhong Dai, Stanley Watson, Fan Meng
Psychiatry Department and Molecular and Behavioral Neuroscience Institute
University of Michigan
Make
First released in 1977 by Stuart Feldman at Bell Labs
Originally designed for compiling programs
General purpose automation tool
  Compilation
  Analysis
  Situations where one file depends on another
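The rule Make applies to "one file depends on another" can be illustrated outside of Make itself. The following is a minimal Python sketch of that rule, not Make's actual implementation: a target is rebuilt when it is missing or older than any prerequisite, and prerequisites are brought up to date first. The names `needs_rebuild`, `build`, and the `rules` dictionary layout are hypothetical and chosen only for this illustration.

```python
from pathlib import Path


def needs_rebuild(target: Path, prerequisites) -> bool:
    """Return True if the target is missing or older than any prerequisite."""
    if not target.exists():
        return True
    target_mtime = target.stat().st_mtime
    return any(p.stat().st_mtime > target_mtime for p in prerequisites)


def build(target: Path, rules: dict, log: list) -> None:
    """Bring `target` up to date, handling its prerequisites first.

    `rules` maps a target path to (prerequisites, recipe); the recipe is a
    callable that must create the target. Names of rebuilt targets are
    appended to `log` in the order they were rebuilt.
    """
    if target not in rules:
        # No rule: the file must already exist (for example, a downloaded file).
        if not target.exists():
            raise FileNotFoundError(f"no rule to make {target}")
        return
    prerequisites, recipe = rules[target]
    for prerequisite in prerequisites:
        build(prerequisite, rules, log)
    if needs_rebuild(target, prerequisites):
        recipe(target, prerequisites)
        log.append(target.name)
```

With a chain such as raw file -> parsed file -> final file, a first call rebuilds both derived files, and a second call with nothing changed rebuilds nothing, which is the behavior that makes re-running a pipeline cheap.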
The Name of the Game
No data is an island unto itself
Typical bioinformatics pipeline
  Collect data from various sources
    Internet utilities
  Processing
    Scripts
    Parsers
    Programs
  Database
  Packaged as web service + database
Updating
Re-running the pipeline
  Driving factor: Demand for “current” information
  Limiting factor: Resources (i.e., time) required
Using Make
Stage 1: Download (configure)
  Download (new data)
  Not always trivial (new URLs)
Stage 2: File processing
  Downloaded files -> processed files -> ...
  Only new files are processed
Stage 2.1: Database
  Model tables, indexes, views as files
  File processing becomes SQL statement + touch
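The "SQL statement + touch" idea in Stage 2.1 can be sketched as follows. This is an illustration under stated assumptions, not the authors' Makefile: Python's built-in `sqlite3` module stands in for whatever database the pipeline uses, and the function name `run_sql_target` and the sentinel-file naming are hypothetical. Each database object is represented by an empty sentinel file whose timestamp records when the object was last rebuilt.

```python
import sqlite3
from pathlib import Path


def run_sql_target(conn: sqlite3.Connection, sentinel: Path,
                   prerequisite: Path, statement: str) -> bool:
    """Run `statement` only if the sentinel is missing or older than its prerequisite.

    Returns True if the SQL was executed, False if the database object was
    already up to date.
    """
    if sentinel.exists() and prerequisite.stat().st_mtime <= sentinel.stat().st_mtime:
        return False  # sentinel is current, so skip the SQL
    conn.executescript(statement)  # rebuild the table, index, or view
    conn.commit()
    sentinel.touch()  # plays the role of `touch` at the end of a Make recipe
    return True
```

Because the sentinel is an ordinary file, the same timestamp comparison used for processed files also decides whether a table, index, or view needs to be rebuilt after new data arrives.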
Projects
WGAS
  http://arrayanalysis.mbni.med.umich.edu
  Harvests Affymetrix CEL files from GEO and ArrayExpress every night
  Local copy of dbSNP
    Initial load took hours
    Minor update took minutes
Projects
CustomCDF
  Google customcdf
  Aligns Affymetrix probes to reference sequences for various organisms
  Large # of data sources
  Made future modifications easier
  Makefile submits jobs to cluster
Acknowledgements
The authors are members of the Pritzker Neuropsychiatric Disorders Research Consortium, which is supported by the Pritzker Neuropsychiatric Disorders Research Fund L.L.C. This work is also supported in part by the National Center for Integrated Biomedical Informatics through NIH grant 1U54DA021519-01A1 to the University of Michigan.