Fauteux Seeder Bosc2009
François Fauteux
Department of Plant Science
McGill University
Macdonald campus
Seeder: Perl Modules for
Cis-regulatory Motif Discovery
Bioinformatics Open Source Conference
June 28 2009, Stockholm
• Precise control of where, when and at which level transcription occurs
• Synthetic promoter engineering
M. Venter, Trends Plant Sci 12, 118 (2007).
Introduction
Transcription Factor Binding Sites
• Searching for imperfect copies of an unknown pattern
• Sequence-driven approaches: not guaranteed to yield a global optimum
• Enumerative approaches: computationally expensive
• Convergence towards low-complexity motifs
D. GuhaThakurta, Nucleic Acids Res 34, 3585 (2006).
DNA Motif Discovery
W. W. Wasserman, A. Sandelin,
Nat Rev Genet 5, 276 (2004).
• Set B={B1,...,Bm} of background sequences
• Set P={P1,...,Pn} of positive sequences
• Length k of the motif seed
• Length l of the full motif to discover
F. Fauteux, M. Blanchette, M. V. Stromvik, Bioinformatics 24, 2303 (2008).
Seeder Algorithm: Input
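• Enumerate all words w of length k over the alphabet {A, C, G, T}
• SMD (substring minimal distance): smallest Hamming distance (HD) between w and a |w|-length substring of a sequence s
• SMDs between word w and the background sequences → probability distribution g_w(y)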
Seeder::Background
F. Fauteux, M. Blanchette, M. V. Stromvik, Bioinformatics 24, 2303 (2008).
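The background step above can be illustrated with a short Perl sketch. It is not the Seeder::Background implementation: the helper names (hamming, smd, background_distribution) and the toy sequences are hypothetical, only the forward strand is scanned, and g_w(y) is estimated as the plain fraction of background sequences whose SMD equals y.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hamming distance (HD) between two equal-length strings.
sub hamming {
    my ($x, $y) = @_;
    die "length mismatch\n" unless length($x) == length($y);
    my $d = 0;
    for my $i (0 .. length($x) - 1) {
        $d++ if substr($x, $i, 1) ne substr($y, $i, 1);
    }
    return $d;
}

# SMD: smallest HD between word $w and any |w|-length substring of $s.
sub smd {
    my ($w, $s) = @_;
    my $k = length $w;
    die "sequence shorter than word\n" if length($s) < $k;
    my $best = $k;
    for my $i (0 .. length($s) - $k) {
        my $d = hamming($w, substr($s, $i, $k));
        $best = $d if $d < $best;
    }
    return $best;
}

# Empirical g_w(y): fraction of background sequences whose SMD to $w is y.
sub background_distribution {
    my ($w, @background) = @_;
    my @g = (0) x (length($w) + 1);
    $g[ smd($w, $_) ]++ for @background;
    my $n = scalar @background;
    $_ /= $n for @g;
    return @g;
}

# Toy background set (hypothetical data, for illustration only).
my @background = qw(ACGTACGT TTTTTTTT ACGAACGA GGGGACGT);
my @g = background_distribution('ACGT', @background);
printf "g_w(%d) = %.2f\n", $_, $g[$_] for 0 .. $#g;
```

Scanning every substring for every word is the expensive part of this naive version, which is what a precomputed Hamming distance index is meant to avoid.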
• Sum S(w) of SMDs between w and the positive sequences → p-value
• Closest match to word w* (min. q-value) found in each positive sequence → seed PWM
• The matrix is extended to the motif width, and the sites maximizing the score against the extended weight matrix are selected
• A PWM is built from those sites and the process is iterated
Seeder::Finder
F. Fauteux, M. Blanchette, M. V. Stromvik, Bioinformatics 24, 2303 (2008).
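One way to turn the sum S(w) into a p-value is sketched below. It rests on an assumption that the slides do not spell out: the SMDs to the n positive sequences are treated as independent draws from g_w(y), so the null distribution of their sum is the n-fold convolution of g_w with itself, and the p-value is the probability of a sum at most S(w). The function names and the example distribution are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Convolve two discrete distributions stored as arrays indexed by value.
sub convolve {
    my ($p, $q) = @_;
    my @r = (0) x (scalar(@$p) + scalar(@$q) - 1);
    for my $i (0 .. $#$p) {
        for my $j (0 .. $#$q) {
            $r[ $i + $j ] += $p->[$i] * $q->[$j];
        }
    }
    return \@r;
}

# P(sum of $n independent SMDs <= $s), each SMD distributed as @$g.
sub sum_p_value {
    my ($g, $n, $s) = @_;
    my $dist = [1];    # distribution of an empty sum: all mass on 0
    $dist = convolve($dist, $g) for 1 .. $n;
    my $p = 0;
    for my $y (0 .. $s) {
        $p += $dist->[$y] if $y <= $#$dist;
    }
    return $p;
}

# Hypothetical g_w(y) for a 4-mer, three positive sequences, S(w) = 1.
my @g = (0.5, 0.25, 0, 0.25, 0);
printf "p-value = %.4f\n", sum_p_value(\@g, 3, 1);
```

A small sum means the word has close matches in most positive sequences, which is why the lower tail of the distribution is used here.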
• List of indices corresponding to words of increasing HD
• Efficient lookup of minimally distant subsequence
Seeder::Index
F. Fauteux, M. Blanchette, M. V. Stromvik, Bioinformatics 24, 2303 (2008).
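The idea behind the index can be sketched in a few lines of Perl. This is an in-memory illustration, not the file format or code of Seeder::Index: words are numbered by their lexicographic position, each word gets the list of all word numbers sorted by increasing HD from it, and a lookup walks that list until it reaches a word present in the sequence. All function names are hypothetical, and a width of 3 is used to keep the example small.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @ALPHABET = qw(A C G T);

# All 4**k words of length k in lexicographic order; position = word number.
sub enumerate_words {
    my ($k) = @_;
    my @words = ('');
    for (1 .. $k) {
        my @longer;
        for my $prefix (@words) {
            push @longer, $prefix . $_ for @ALPHABET;
        }
        @words = @longer;
    }
    return @words;
}

# Hamming distance between two equal-length strings.
sub hamming {
    my ($x, $y) = @_;
    my $d = 0;
    for my $i (0 .. length($x) - 1) {
        $d++ if substr($x, $i, 1) ne substr($y, $i, 1);
    }
    return $d;
}

# Word numbers ordered by increasing HD from word number $w.
sub hd_ordered_indices {
    my ($w, $words) = @_;
    my @hd = map { hamming($words->[$w], $_) } @$words;
    return sort { $hd[$a] <=> $hd[$b] or $a <=> $b } 0 .. $#$words;
}

# Lookup: the first word in HD order that occurs in $seq is a minimally
# distant subsequence, and its HD to word $w is the SMD.
sub closest_word {
    my ($w, $seq, $words) = @_;
    my $k = length $words->[$w];
    my %present;
    $present{ substr($seq, $_, $k) } = 1 for 0 .. length($seq) - $k;
    for my $i (hd_ordered_indices($w, $words)) {
        return $words->[$i] if $present{ $words->[$i] };
    }
    return;
}

my @words = enumerate_words(3);
my %index_of;
@index_of{@words} = 0 .. $#words;
print closest_word($index_of{ACG}, 'TTTACTTTT', \@words), "\n";
```

In practice the ordered lists would be computed once and stored, so that each lookup only needs set membership tests instead of a fresh Hamming distance scan.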
Seeder Algorithm: Usage
#!/usr/bin/perl
use strict;
use warnings;

use Seeder::Index;
use Seeder::Finder;
use Seeder::Background;

# Step 1: build the Hamming distance index for seeds of width 6.
my $index = Seeder::Index->new(
    seed_width => "6",
    out_file   => "6.index",
);
$index->get_index;

# Step 2: compute the background distributions from the background sequences.
my $background = Seeder::Background->new(
    seed_width    => "6",
    strand        => "revcom",
    hd_index_file => "6.index",
    seq_file      => "seqs.fasta",
    out_file      => "seqs.bkgd",
);
$background->get_background;

# Step 3: search the positive (promoter) sequences for one motif of width 12.
my $finder = Seeder::Finder->new(
    seed_width    => "6",
    strand        => "revcom",
    motif_width   => "12",
    n_motif       => "1",
    hd_index_file => "6.index",
    seq_file      => "prom.fasta",
    bkgd_file     => "seqs.bkgd",
    out_file      => "prom.finder",
);
$finder->find_motifs;
• Binding site sequences from the Transfac database
G. K. Sandve, O. Abul, V. Walseng, F. Drablos, BMC Bioinformatics 8, 193 (2007).
Benchmark Against Popular Tools
F. Fauteux, M. Blanchette, M. V. Stromvik, Bioinformatics 24, 2303 (2008).
SSP Promoter Motifs
F. Fauteux, M. V. Stromvik, submitted.
http://seeder.agrenv.mcgill.ca
Supervisor: Dr Martina Strömvik
Advisory committee: Dr Mathieu Blanchette, Dr Pierre Dutilleul
Acknowledgements