BioLib Development Report (BOSC 2009)
C and C++ libraries for BioPerl, BioJAVA, BioPython, BioRuby...
Pjotr Prins (pjotr.prins at wur.nl)
Wageningen University, Dept. of Nematology; Groningen Bioinformatics Center
BioLib Development Report (BOSC 2009) – p. 1
The stated problem
Many high-level languages are used in biology (Perl, R, Java...)
Duplication of effort across all Bio* projects: BioPerl, BioConductor, BioJAVA...
in particular for data I/O, parsing and interpretation (Alan's keynote)
What if?
Say you need some functionality (e.g. linear regression) in Perl. You can:
Roll your own in Perl (performance?)
Bind against an existing C library using Perl XS (ugh)
Bind using SWIG (better, but a one-off, like Perl::GSL)
Bind using SWIG with BioLib (all languages)
In fact, it may already be there (GSL or Rlib)
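To make the "roll your own" option concrete, here is a minimal sketch of an ordinary least-squares fit of y = c0 + c1*x, the model that GSL's gsl_fit_linear fits. It is written in Python rather than Perl so that all new examples in this report use one language. The function name fit_linear is hypothetical. This is plain Python, not a BioLib or GSL binding, and it gives up speed for brevity, which is the performance concern raised above.

```python
def fit_linear(xs, ys):
    """Fit y = c0 + c1*x by ordinary least squares (hypothetical helper).

    Returns (c0, c1, sumsq), where sumsq is the residual sum of squares.
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences of at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Centered sums of squares and cross-products.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    c1 = sxy / sxx                 # slope
    c0 = mean_y - c1 * mean_x      # intercept
    sumsq = sum((y - (c0 + c1 * x)) ** 2 for x, y in zip(xs, ys))
    return c0, c1, sumsq


c0, c1, sumsq = fit_linear([0, 1, 2, 3], [1, 3, 5, 7])
print(c0, c1, sumsq)  # prints: 1.0 2.0 0.0
```

A binding to gsl_fit_linear would return the same c0 and c1, along with covariance terms, from compiled and widely tested C code. That trade-off is the case BioLib makes for binding rather than rewriting.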
DRY-DRO
Do not repeat yourself (DRY)
Do not repeat ourselves (DRO)
Bio*: BioPerl, BioPython, BioRuby, BioJAVA, BioConductor, BioHaskell, BioCPP, ...
Limited pool of programmers in bioinformatics
Usually 2 or 3 competing implementations
Use existing implementations
Why bother?
Open Source Software is about eyes
Eyes!
Eyes like these!
Eyes (3)
Eyes like these!...
Eyes (5)
Well, realistically...
BioLib project
Objectives:
Utilize existing C/C++ libraries
Create mappings to all Bio* languages
Focus on correctness and performance
A central place (plumbing)
An OBF affiliated project
Power Trio
Plumbing power trio:
Git - modular version control
CMake - make file generator
SWIG - Simplified Wrapper and Interface Generator
Power trio (1)
Git
Version control on steroids
What source control should be
Easy branching of development
Submodules
Power trio (2)
CMake
Generator for make files
Very modular approach
Resolves complex dependencies
Looks like a simple programming language
Easy on the eyes and mind
Power trio (3)
SWIG
Code generator for mappings done right:
Rules for generating code
Macros (DRY)
Pattern matching
Flexible
Supports many languages
Achievements (year one)
Affyio: Affymetrix arrays (357 methods; 10K lines)
Staden: Sequencer trace files (95; 16K)
GSL: GNU Scientific Library (2702; 200K)
Rlib: R routines (> 176; 43K)
R/qtl: Quantitative genetics (> 100; 10K)*
Libsequence: Sequence analysis (> 1000; 21K)*
Bio++: Sequence analysis (> 1000; 52K)*
Code base: 350K lines, about USD 10 million in R&D
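As a quick check on the 350K-line headline, the per-library figures can be summed. This Python sketch copies the numbers from the list above. Entries marked ">" are treated as lower bounds, and the asterisks, whose meaning the slide does not give, are ignored.

```python
# (methods, thousands of lines) per library, copied from the list above.
# Counts marked ">" on the slide are treated as lower bounds.
libraries = {
    "Affyio": (357, 10),
    "Staden": (95, 16),
    "GSL": (2702, 200),
    "Rlib": (176, 43),
    "R/qtl": (100, 10),
    "Libsequence": (1000, 21),
    "Bio++": (1000, 52),
}

total_methods = sum(methods for methods, _ in libraries.values())
total_kloc = sum(kloc for _, kloc in libraries.values())
print(f"at least {total_methods} methods, about {total_kloc}K lines")
# prints: at least 5430 methods, about 352K lines
```

The 352K total agrees with the rounded 350K figure.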
Source tree
|-- clibs
| |-- affyio-1.8
| |-- biolib_R
| |-- biolib_microarray
| |-- libsequence-1.6.6
|-- mappings
| ‘-- swig
| |-- perl
| | |-- affyio
| | |-- staden_io_lib
| | ‘-- test
| |-- python
| |-- ruby
104 directories, 668 files
Adding a C lib
Unpack the C/C++ library in ./src/clibs/modulename
Add a CMake file: compiles into a .so shared library
Create the Perl mapping in ./src/mapping/swig/perl/module
Add a SWIG .i interface file
Add a CMake file: compiles into a .pm module and a .so shared library
CMake goodies
# Defining a C library build in BioLib:
SET (M_NAME staden_io_lib)
SET (M_VERSION 1.11.6)
FIND_PACKAGE(ZLIB REQUIRED)
BUILD_CLIB()
ADD_LIBRARY(${LIBNAME} SHARED
  array.c
  compress.c
  compression.c
  ctfCompress.c
  # (...)
)
INSTALL_CLIB()
CMake for Perl
# Defining a C library mapping for Perl
SET (USE_ZLIB TRUE)
SET (USE_INCLUDEPATH io_lib)
FIND_PACKAGE(MapPerl)
POST_BUILD_PERL_BINDINGS()
TEST_PERL_BINDINGS()
INSTALL_PERL_BINDINGS()
SWIG Map
%include <Read.h>
#define TT_ANY 0
#define TT_ZTR 7
typedef struct
{
    int format;
    char *trace_name;
    int NPoints;
    int NBases;
    /* (...) */
} Read;
Read *read_reading(char *fn, int format);
Perl
use biolib::staden_io_lib;
my $fn = shift @ARGV;    # path to a trace file
my $result = staden_io_lib::read_reading($fn,
    $staden_io_lib::TT_ANY);
print("format=", staden_io_libc::Read_format_get($result), "\n");
print("NBases=", $result->{NBases}, "\n");
print("base=", staden_io_libc::Read_base_get($result), "\n");
Outputs:
format=7
NBases=766
base=NCTTGGGAAAGCATAAACCATGTATTATCGAATTCGAGCTCGGTCCCAACTTAATTGTACA...
Python
import biolib.staden_io_lib as io_lib
# procsrffn: path to a trace file
result = io_lib.read_reading(procsrffn,
                             io_lib.TT_ANY)
print(result.format)
print(result.NBases)
print(result.base)
Outputs:
7
766
NCTTGGGAAAGCATAAACCATGTATTATCGAATTCGAGCTCGGTCCCAACTTAATTGTACA...
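The Perl example uses two access styles: staden_io_libc::Read_format_get($result) and $result->{NBases}. SWIG bindings are typically built in two layers. The first is a low-level module, the one with the trailing "c", which holds flat accessor functions named in the Struct_member_get style. The second is a proxy ("shadow") class that turns those functions into ordinary field access, which is what result.format relies on in Python. The sketch below imitates that two-layer pattern by hand in plain Python. A dict stands in for the C struct, and every name in it is hypothetical. It is not SWIG's actual generated code.

```python
# Low-level layer (hypothetical): a dict stands in for the C Read struct.
def new_Read(fmt, nbases, base):
    return {"format": fmt, "NBases": nbases, "base": base}


# Flat accessor functions, named in SWIG's Struct_member_get style.
def Read_format_get(ptr):
    return ptr["format"]


def Read_NBases_get(ptr):
    return ptr["NBases"]


def Read_base_get(ptr):
    return ptr["base"]


class Read:
    """Proxy layer: wraps the raw 'pointer' and exposes fields as attributes."""

    def __init__(self, ptr):
        self._ptr = ptr

    # Each property delegates to the matching flat accessor.
    format = property(lambda self: Read_format_get(self._ptr))
    NBases = property(lambda self: Read_NBases_get(self._ptr))
    base = property(lambda self: Read_base_get(self._ptr))


raw = new_Read(7, 766, "NCTTGGGAAAGC")
result = Read(raw)
print(Read_format_get(raw), result.format, result.NBases)  # prints: 7 7 766
```

Both styles read the same underlying data. The proxy layer only adds the friendlier syntax on top of the accessors.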
For the Perl coder
Add functionality in the language of your choice
Easier deployment: 'install biolib-perl'
Shared correctness testing
Generated API documentation
For the authors
Independent source trees
Increased exposure (Ruby, Perl...)
Added unit/integration testing environment
Deployment, multi-platform support (Linux, OSX, Windows)
No autoconf pain (./configure and friends)
Implicit access to other libraries (GSL, Rlib)
Online generated API documentation
Future work
Automated API documentation (with doctests)
More libraries (EMBOSS, NCBI, ...)
New code (HPC)
More languages (Java, R, OCaml, ...)
Bio* integration (CPAN, Ruby gems, Python eggs)
Debian/Fedora/OSX/Windows packages
More platforms (Windows without Cygwin)
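"Automated API documentation (with doctests)" refers to documentation examples that are also executed as tests. Here is a minimal sketch using Python's standard doctest module. The helper format_name and its lookup table are hypothetical. They map the two TT_* format codes shown in the Read.h excerpt (TT_ANY = 0, TT_ZTR = 7) to short names.

```python
import doctest

# Format codes from the Read.h excerpt; the name table itself is hypothetical.
_FORMAT_NAMES = {0: "ANY", 7: "ZTR"}


def format_name(code):
    """Return the short name of a trace-file format code.

    >>> format_name(7)
    'ZTR'
    >>> format_name(0)
    'ANY'
    """
    return _FORMAT_NAMES[code]


if __name__ == "__main__":
    # Runs the examples in the docstrings above; prints nothing if they pass.
    doctest.testmod()
```

The examples in the docstring therefore serve as both generated documentation and a regression test, so they cannot silently drift out of date.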
Credits
Ben Bolstad (Affyio), James Bonfield (Staden), Karl Broman (R/qtl)
Jonathan Leto (GSL SWIG)
Xin Shuai (Google SoC libsequence)
Adam Smith (Google SoC Bio++)
Oswaldo Trelles, José Manuel Mateos-Duran and Andrés Rodríguez (UMA)
Chris Fields (BioPerl), Mark Jensen (BioPerl), Hilmar Lapp (NESCent, OBF)
Jaap Bakker (WU), Geert Smant (WU), Ritsert Jansen (GBIC)
BoF
BioLib: Birds of a Feather Session (BoF) at 16:50 hours