Makefiles Bioinfo

BioEvo technical seminars

GNU/Make and bioinformatics

G.M. Dall'Olio
Barcelona, 06/02/2009

Original problem statement

Programmers working in compiled languages (C, C++, Fortran, etc.) frequently have to execute complex shell commands:

gcc -c -Wall -ansi -I/pkg/chempak/include dat2csv.c

g++ -c main.cpp; g++ -c func.cpp; g++ main.o func.o

rm *.o

These commands are needed to convert a C++/C source code file to a binary file.

It sounds a bit naive to say 'C/C++ source code', but it is shorter

Shell commands in bioinformatics

In bioinformatics it is common to use command-line tools with complex syntax:

grep, head, gawk, sed, cat... (tools for working with flat-file data)

perl/python/R/other scripts

Many suites of binary programs (emboss, phylip, blast, t-coffee, plink, genepop, gromacs, rosetta...)

etc...

In general command-line tools are more flexible than graphical ones (it takes too much time to develop a graphical interface)

Moreover, you usually have only command-line access to clusters and other large computing facilities

Common problem

In short, C programmers and many bioinformaticians have two problems in common:

They need a way to store command-line instructions with different parameters

They need to execute these commands only when necessary (results that have already been calculated should not be calculated again)

GNU/make

make is a tool to store command-line instructions and re-execute them quickly, along with all their parameters

It is a declarative programming language

It belongs to a class of software called 'automated build tools'

Simplest Makefile example

The simplest Makefile contains just the name of a task and the commands associated with it:

print_hello is a makefile 'rule': it stores the commands needed to say 'Hello, world!' to the screen.

A Makefile rule has two parts: the target of the rule, on the first line, and the commands associated with the rule, on the lines below it. Each command line must begin with a tabulation (not 8 spaces).

Create a file in your computer and save it as 'Makefile'.

Write these instructions in it:

print_hello:
	echo 'Hello, world!!'

The indentation before 'echo' must be a tabulation (the Tab key), not spaces.

Then, open a terminal and type:

make -f Makefile print_hello

Simplest Makefile example: explanation

When invoked, the program 'make' looks for a file in the current directory called 'Makefile'

When we type 'make print_hello', it executes any procedure (target) called 'print_hello' in the makefile

It then shows the commands executed and their output

Tip1: the 'Makefile' file

The '-f' option allows you to specify the file which contains the instructions for make

If you omit this option, make will look for a file called 'Makefile' in the current directory (GNU make tries 'GNUmakefile', 'makefile' and 'Makefile', in that order)

make -f Makefile all

is equivalent to:

make all

A slightly longer example

You can add as many commands as you like to a rule

For example, this 'print_hello' rule contains 5 commands

Note: ignore the '@' prefix for now; it only disables verbose mode (explained later)
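The slide showing this rule is not reproduced in the transcript. As a minimal sketch of what such a rule could look like (the five commands and the file name 'timestamp.txt' are illustrative assumptions, not the original ones), this shell session writes a Makefile into a temporary directory and runs it, assuming GNU make is installed:

```shell
# Work in a scratch directory so no existing Makefile is overwritten.
cd "$(mktemp -d)"

# Every recipe line inside the here-document starts with a tab character.
cat > Makefile <<'EOF'
print_hello:
	@echo 'Hello, world!'
	@echo 'This rule runs five commands:'
	@date > timestamp.txt
	@echo 'the current date was saved in timestamp.txt'
	@echo 'Done.'
EOF

make print_hello
```

The commands of a rule are executed in order, from top to bottom.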

A more complex example

Make - advantages

Make allows you to save shell commands along with their parameters and re-execute them

It allows you to use command-line tools, which are more flexible

Combined with revision control software, it makes it possible to reproduce all the operations performed on your data

Second part

A closer look at make syntax (target and commands)

The target syntax

Makefile syntax:

target: (prerequisites)
	commands

The target of a rule can be either a title for the task or a file name.

Every time you call a make rule (example: 'make all'), the program looks for a file with the same name as the target (e.g. 'all', 'clean', 'inputdata.txt', 'results.txt')

When the rule has no prerequisites, it is executed only if that file doesn't exist.

Filename as target names

In this makefile, we have two rules: 'testfile.txt' and 'clean'

When we call 'make testfile.txt', make checks whether a file called 'testfile.txt' already exists.

The commands associated with the rule 'testfile.txt' are executed only if that file doesn't already exist
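The makefile from the slide is missing from the transcript. A minimal sketch with the same two rule names (the commands inside the rules are assumptions) shows the behaviour described above, assuming GNU make is installed:

```shell
# Work in a scratch directory so no existing files are touched.
cd "$(mktemp -d)"

cat > Makefile <<'EOF'
testfile.txt:
	echo 'some content' > testfile.txt

clean:
	rm -f testfile.txt
EOF

make testfile.txt   # the file is missing, so the commands are executed
make testfile.txt   # the file exists now, so make reports it is up to date
make clean          # removes the file, so the next 'make testfile.txt' recreates it
```

Here 'testfile.txt' is a file name used as a target, while 'clean' is only a title for a task: no file called 'clean' is ever created, so that rule runs every time it is called.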

Multiple target definition

A target can also be a list of files

You can retrieve the matched target with the special variable $@
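As a small sketch of a rule with a list of targets (the file names are hypothetical), the same commands serve all three files, and '$@' is replaced by whichever target was requested. It assumes GNU make is installed:

```shell
# Work in a scratch directory.
cd "$(mktemp -d)"

cat > Makefile <<'EOF'
file1.txt file2.txt file3.txt:
	echo "created by the rule for $@" > $@
EOF

make file2.txt   # only file2.txt is requested, so only file2.txt is created
cat file2.txt
```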

Special characters

The % character can be used as a wildcard

For example, a rule with the target:

%.txt:
	....

would be activated by any target ending with '.txt': 'make 1.txt', 'make 2.txt', etc.

We can retrieve the part of the name matched by '%' with '$*' (for 'make 1.txt', '$*' is '1')
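A minimal sketch of such a pattern rule (the commands are illustrative assumptions) prints both special variables, so the difference between the full target name and the part matched by '%' is visible. It assumes GNU make is installed:

```shell
# Work in a scratch directory.
cd "$(mktemp -d)"

cat > Makefile <<'EOF'
%.txt:
	@echo "target: $@, matched part: $*"
	@touch $@
EOF

make 1.txt        # prints: target: 1.txt, matched part: 1
make sample.txt   # prints: target: sample.txt, matched part: sample
```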

Special character %: creating more than one file at a time

Makefile cluster support

Note that in the previous example we created three files at the same time, by executing the 'touch' command three times

If we use the '-j' option when invoking make, the three processes will be launched in parallel
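The example referred to here is not included in the transcript. The sketch below assumes it looked roughly like this: an 'all' rule with three '.txt' prerequisites, each created by a pattern rule. The 'sleep 1' line is an addition that makes the effect of '-j' noticeable. It assumes GNU make is installed:

```shell
# Work in a scratch directory.
cd "$(mktemp -d)"

cat > Makefile <<'EOF'
all: 1.txt 2.txt 3.txt

%.txt:
	sleep 1
	touch $@
EOF

# '-j 3' lets make run up to three rules at the same time, so the three
# independent files are built concurrently instead of one after another.
make -j 3 all
ls
```

Parallel execution is only safe because the three files do not depend on each other; make still respects the order imposed by prerequisites.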

The commands syntax

Disabling verbose mode

By default make prints each command before executing it. You can disable this for a single line by adding '@' at its beginning
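The comparison shown on the slide is lost. As a sketch (the rule names 'verbose' and 'quiet' are hypothetical), the two rules below differ only by the '@' prefix. It assumes GNU make is installed:

```shell
# Work in a scratch directory.
cd "$(mktemp -d)"

cat > Makefile <<'EOF'
verbose:
	echo 'hello'

quiet:
	@echo 'hello'
EOF

make verbose   # prints the command itself, then its output (two lines)
make quiet     # prints only the output (one line)
```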

Skipping errors

The modifier '-' tells make to ignore errors returned by a command

Example: 'mkdir /var' will cause an error (the '/var' directory already exists) and cause GNU make to exit

'-mkdir /var' will cause the same error, but GNU make will ignore it and carry on
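The same behaviour can be tried without touching '/var'. This sketch uses a hypothetical local directory called 'results' that already exists, and two hypothetical rule names, 'strict' and 'tolerant'. It assumes GNU make is installed:

```shell
# Work in a scratch directory and create the directory in advance,
# so that 'mkdir results' inside the rules is guaranteed to fail.
cd "$(mktemp -d)"
mkdir results

cat > Makefile <<'EOF'
strict:
	mkdir results
	@echo 'strict: this line is never reached'

tolerant:
	-mkdir results
	@echo 'tolerant: make carried on after the error'
EOF

make strict || echo 'make exited with an error'
make tolerant
```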

Moving through directories

A big issue with make is that every command line is executed in a separate shell process.

So, this:

lsvar:
	cd /var
	ls

won't work (it will list only the files in the current directory, not those in /var)

The solution is to put everything in a single shell process:

lsvar:
	(cd /var; ls)

Third part

Prerequisites and conditional execution

We will now look at the 'prerequisites' part of a make rule, which was skipped earlier

Real Makefile-rule syntax

Complete syntax for a Makefile rule:

target: prerequisites
	commands

Example:

result1.txt: data1.txt data2.txt
	cat data1.txt data2.txt > result1.txt
	@echo 'result1.txt has been calculated'

Prerequisites are files (or rules) that need to exist already in order to create the target file.

If 'data1.txt' and 'data2.txt' don't exist, make will exit with an error when asked for 'result1.txt' (there is no rule to create them)

Piping Makefile rules together

You can pipe two Makefile rules together by defining prerequisites

The rule 'result1.txt' depends on the rule 'data1.txt', which should be executed first

Let's look at this example again: what happens if we remove the file 'result1.txt' we just created?

The second time we run the 'make result1.txt' command, it is not necessary to create data1.txt again, so only one rule is executed
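The makefile for this example is missing from the transcript. A minimal sketch with the same two rule names (the commands are assumptions chosen only to produce some data) reproduces the sequence described above, assuming GNU make is installed:

```shell
# Work in a scratch directory.
cd "$(mktemp -d)"

cat > Makefile <<'EOF'
result1.txt: data1.txt
	@echo 'calculating result1.txt'
	@tr 'a-z' 'A-Z' < data1.txt > result1.txt

data1.txt:
	@echo 'creating data1.txt'
	@echo 'acgt' > data1.txt
EOF

make result1.txt   # first run: data1.txt is created, then result1.txt
rm result1.txt
make result1.txt   # second run: data1.txt still exists, so only one rule runs
```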

Other pipe example

all: result1.txt result2.txt

result1.txt: data1.txt calculate_result.py
	python calculate_result.py --input data1.txt

result2.txt: data2.txt
	cut -f 1,3 data2.txt > result2.txt

'make all' will calculate result1.txt and result2.txt if they don't exist already, or if they are older than their prerequisites

Conditional execution by modification date

We have seen how make can be used to create a file if it doesn't exist.

file.txt:
	# if file.txt doesn't exist, then create it:
	echo 'contents of file.txt' > file.txt

We can do better: create or update a file only if it is missing or older than its prerequisites

Let's have a better look at this example:

result1.txt: data1.txt calculate_result.py
	python calculate_result.py --input data1.txt

A great feature of make is that it executes a rule not only if the target file doesn't exist, but also if its 'last modification date' is earlier than that of any of its prerequisites

result1.txt: data1.txt
	@sed 's/b/B/i' data1.txt > result1.txt
	@echo 'result1.txt has been calculated'

In this example, result1.txt will be recalculated every time 'data1.txt' is modified

$: touch data1.txt calculate_result.py

$: make result1.txt
result1.txt has been calculated

$: make result1.txt
make: 'result1.txt' is up to date.

$: touch data1.txt
$: make result1.txt
result1.txt has been calculated

Conditional execution - applications

This 'conditional execution by modification date comparison' feature of make is very useful

Let's say you discover an error in one of your input data files: you will be able to repeat the analysis by executing only the operations needed

You can also use it to re-calculate results every time you modify a script:

result.txt: scripts/calculate_result.py
	python scripts/calculate_result.py > result.txt

Another example

Fourth part

Variables and functions

Variables and functions

You may have already noticed that Make's syntax is really old :)

In fact, it is a ~40-year-old language

It uses special variables like $@, $^, and it can be worse than perl!!!

(perl developers please don't get mad at me :-) )

Variables

Variables are declared with a '=' and by convention are upper case.

They are referenced by enclosing their name in '$()'

For example, if WORKING_DIR is a variable, its value is obtained with $(WORKING_DIR)
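The slide with the WORKING_DIR example is not in the transcript. As a sketch of how such a variable might be used (the value 'analysis', the RESULT variable and the commands are all assumptions), assuming GNU make is installed:

```shell
# Work in a scratch directory.
cd "$(mktemp -d)"

cat > Makefile <<'EOF'
WORKING_DIR = analysis
RESULT = $(WORKING_DIR)/result.txt

$(RESULT):
	mkdir -p $(WORKING_DIR)
	echo 'some result' > $(RESULT)
EOF

make                      # no target given: make runs the first rule in the file
cat analysis/result.txt
```

A variable can also be overridden for one invocation without editing the file, for instance 'make WORKING_DIR=other', which writes the result to 'other/result.txt' instead.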

Special variables - $@

Make uses some custom variables, with a syntax similar to perl

'$@' always corresponds to the target name:

$: cat >Makefile

%.txt:
	echo $@

$: make filename.txt
echo filename.txt
filename.txt
$:

$@ took the value of 'filename.txt'

Other special variables

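$@ The rule's target

$< The first prerequisite of the rule

$^ All the prerequisites of the rule

$* The part of the target name matched by '%'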
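The original table of special variables is garbled in the transcript. The sketch below prints the values of three of GNU make's automatic variables inside one rule: '$@' (the target), '$<' (the first prerequisite) and '$^' (all prerequisites). The file names follow the earlier 'result1.txt' example; the file contents are assumptions. It assumes GNU make is installed:

```shell
# Work in a scratch directory and create the two input files.
cd "$(mktemp -d)"
printf 'a\n' > data1.txt
printf 'b\n' > data2.txt

cat > Makefile <<'EOF'
result1.txt: data1.txt data2.txt
	@echo "target: $@"
	@echo "first prerequisite: $<"
	@echo "all prerequisites: $^"
	@cat $^ > $@
EOF

make result1.txt
```

Writing 'cat $^ > $@' instead of repeating the file names means the commands keep working when the list of prerequisites changes.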
