![Page 1: SePhHaDe( Computaonal ChallengesonHigh( Throughput ... · January 2014: training data and task details release March 2014: test data release May 1st 2014: deadline for submission](https://reader033.fdocuments.in/reader033/viewer/2022060222/5f0791767e708231d41da1ac/html5/thumbnails/1.jpg)
SePhHaDe: Computational Challenges on High-Throughput Sequencing and Phenotyping
E. Pacitti & E. Rivals
Mastodons Colloquium, 22 January 2015, Paris
![Page 2: SePhHaDe( Computaonal ChallengesonHigh( Throughput ... · January 2014: training data and task details release March 2014: test data release May 1st 2014: deadline for submission](https://reader033.fdocuments.in/reader033/viewer/2022060222/5f0791767e708231d41da1ac/html5/thumbnails/2.jpg)
PLAN
• Introduction
• Phenotyping
  – Information Retrieval of Complex Contents
  – Search and Recommendation for Image Observations
• Sequencing
  – Indexing reads and sequencing error correction
  – New spaced seed filtering for similarity search
  – Metagenomics pipeline
• Project fusion with Credible
• Conclusions
![Page 3: SePhHaDe( Computaonal ChallengesonHigh( Throughput ... · January 2014: training data and task details release March 2014: test data release May 1st 2014: deadline for submission](https://reader033.fdocuments.in/reader033/viewer/2022060222/5f0791767e708231d41da1ac/html5/thumbnails/3.jpg)
INTRODUCTION - ARCHITECTURE
SCIENTIFIC BIG DATA: sequencing, phenotyping and image data
P2P, Cloud, Data Streams, Multi-Site, HPC
Data Analysis
Programs for High-Throughput Sequencing
Indexing
Information Retrieval of Complex Contents
Recommendation
![Page 4: SePhHaDe( Computaonal ChallengesonHigh( Throughput ... · January 2014: training data and task details release March 2014: test data release May 1st 2014: deadline for submission](https://reader033.fdocuments.in/reader033/viewer/2022060222/5f0791767e708231d41da1ac/html5/thumbnails/4.jpg)
PLANT PHENOTYPING
![Page 5: SePhHaDe( Computaonal ChallengesonHigh( Throughput ... · January 2014: training data and task details release March 2014: test data release May 1st 2014: deadline for submission](https://reader033.fdocuments.in/reader033/viewer/2022060222/5f0791767e708231d41da1ac/html5/thumbnails/5.jpg)
PHENOTYPING - MOTIVATIONS
• Phenotype
  – Observable state, characteristic or behavior of a living being
  – Plant morphology, blood glucose levels
  – Data: images, metadata
• Phenotyping
  – Observational method that records phenotype data for analysis, querying, etc.
  – Botanical observations use content-based multimedia identification methods
Greenhouse-based phenotyping (Inra, Montpellier)
Field-based phenotyping (plant observations)
![Page 6: SePhHaDe( Computaonal ChallengesonHigh( Throughput ... · January 2014: training data and task details release March 2014: test data release May 1st 2014: deadline for submission](https://reader033.fdocuments.in/reader033/viewer/2022060222/5f0791767e708231d41da1ac/html5/thumbnails/6.jpg)
INFORMATION RETRIEVAL OF COMPLEX CONTENTS
![Page 7: SePhHaDe( Computaonal ChallengesonHigh( Throughput ... · January 2014: training data and task details release March 2014: test data release May 1st 2014: deadline for submission](https://reader033.fdocuments.in/reader033/viewer/2022060222/5f0791767e708231d41da1ac/html5/thumbnails/7.jpg)
CONTEXT
• Accurate knowledge of the distribution and evolution of living species is essential
  – Ultimate goal: sustainable and global surveillance tools for living species
  – Global warming effects, invasive species, biodiversity, impact of human activities
• It is necessary to boost the production of observations
• The taxonomic gap is a tricky problem
• Scientific name = unique access key to information
• Knowledge accessible only to specialists
Taxon Castanea sativa Mill.
![Page 8: SePhHaDe( Computaonal ChallengesonHigh( Throughput ... · January 2014: training data and task details release March 2014: test data release May 1st 2014: deadline for submission](https://reader033.fdocuments.in/reader033/viewer/2022060222/5f0791767e708231d41da1ac/html5/thumbnails/8.jpg)
LIFECLEF OBJECTIVES
A new lab of the CLEF evaluation forum
• “European NIST” for information retrieval, 15 years, hundreds of research groups worldwide
Objectives
• Study, evaluate and boost state-of-the-art content-based multimedia identification methods (signals+metadata)
• Assemble a transdisciplinary and cross-media community around the topic
• Promote environmental challenges in the multimedia community
![Page 9: SePhHaDe( Computaonal ChallengesonHigh( Throughput ... · January 2014: training data and task details release March 2014: test data release May 1st 2014: deadline for submission](https://reader033.fdocuments.in/reader033/viewer/2022060222/5f0791767e708231d41da1ac/html5/thumbnails/9.jpg)
LIFECLEF 2014: THREE TASKS (CHALLENGES)
Based on real-world big data
![Page 10: SePhHaDe( Computaonal ChallengesonHigh( Throughput ... · January 2014: training data and task details release March 2014: test data release May 1st 2014: deadline for submission](https://reader033.fdocuments.in/reader033/viewer/2022060222/5f0791767e708231d41da1ac/html5/thumbnails/10.jpg)
LIFECLEF 2014: SCHEDULE
• December 2013: registration opens
• January 2014: training data and task details release
• March 2014: test data release
• May 1st 2014: deadline for submission of runs
• May 15th 2014: release of results
• June 15th 2014: deadline for submission of working notes (peer reviewed)
• 15-19 September 2014: LifeCLEF Workshop at CLEF 2014 Conference (Sheffield, UK)
![Page 11: SePhHaDe( Computaonal ChallengesonHigh( Throughput ... · January 2014: training data and task details release March 2014: test data release May 1st 2014: deadline for submission](https://reader033.fdocuments.in/reader033/viewer/2022060222/5f0791767e708231d41da1ac/html5/thumbnails/11.jpg)
REGISTRANTS / COMMUNITY
[Map: registered groups per region (region labels not recoverable): 42, 31, 5, 4, 36, 6 and 3 groups]
127 research groups registered worldwide (academics & industry)
![Page 12: SePhHaDe( Computaonal ChallengesonHigh( Throughput ... · January 2014: training data and task details release March 2014: test data release May 1st 2014: deadline for submission](https://reader033.fdocuments.in/reader033/viewer/2022060222/5f0791767e708231d41da1ac/html5/thumbnails/12.jpg)
PARTICIPANTS ON FINISH LINE
[Map: participating groups per region (region labels not recoverable): 0, 0, 0, 10, 10, 1 and 1 groups]
22 groups submitted a total of 70 runs and 22 working notes (published in CEUR-WS proceedings)
![Page 13: SePhHaDe( Computaonal ChallengesonHigh( Throughput ... · January 2014: training data and task details release March 2014: test data release May 1st 2014: deadline for submission](https://reader033.fdocuments.in/reader033/viewer/2022060222/5f0791767e708231d41da1ac/html5/thumbnails/13.jpg)
FOCUS ON PLANTCLEF
Hervé Goëau, Inria
Pierre Bonnet, CIRAD
• Context: Pl@ntNet, a multimedia-oriented citizen science and participatory sensing initiative
France flora dataset
iPhone & Android applications: content-based participatory sensing
Botanical social network: citizen science
+350K downloads, 2K users / day
+20K members, +100K images, +5K species
![Page 14: SePhHaDe( Computaonal ChallengesonHigh( Throughput ... · January 2014: training data and task details release March 2014: test data release May 1st 2014: deadline for submission](https://reader033.fdocuments.in/reader033/viewer/2022060222/5f0791767e708231d41da1ac/html5/thumbnails/14.jpg)
FOCUS ON PLANTCLEF: DATA
| | 2011 | 2012 | 2013 | 2014 |
|---|---|---|---|---|
| Species | 71 | 126 | 250 | 500 |
| Images | 5 400 | 11 500 | 26 077 | 60 962 |
| Organs/views | | | | |
| Contributors | 17 | 46 | 327 | 1 000 |
| Observations | 368 | 1 136 | 15 046 | 30 136 |
![Page 15: SePhHaDe( Computaonal ChallengesonHigh( Throughput ... · January 2014: training data and task details release March 2014: test data release May 1st 2014: deadline for submission](https://reader033.fdocuments.in/reader033/viewer/2022060222/5f0791767e708231d41da1ac/html5/thumbnails/15.jpg)
FOCUS ON PLANTCLEF: RESULTS
• Best scores increase each year despite the increasing complexity of the task [Joly et al., CLEF proceedings 2014]
• Man vs. Machine experiment [Bonnet et al., MTAP journal 2015]

| Best score | 2011 | 2012 | 2013 | 2014 |
|---|---|---|---|---|
| Scan-like | 0.52 | 0.56 | 0.61 | 0.64 |
| Photographs | 0.25 | 0.32 | 0.40 | 0.47 |
![Page 16: SePhHaDe( Computaonal ChallengesonHigh( Throughput ... · January 2014: training data and task details release March 2014: test data release May 1st 2014: deadline for submission](https://reader033.fdocuments.in/reader033/viewer/2022060222/5f0791767e708231d41da1ac/html5/thumbnails/16.jpg)
NEXT YEAR
• Keep the 3 tasks but with enriched test data (1000 bird species, 1000 plant species)
• PlantCLEF novelty = authorize external training data as a variant of the task
  – On the condition that the experiment is reliable and reproducible
  – Data availability, clear description, no risk of including test data, etc.
• FishCLEF restructuring = fusion of the 4 subtasks into 1 single application-oriented task
  – Counting the number of fish instances for a list of species
  – This should avoid fragmentation and make the task more attractive for groundbreaking research
![Page 17: SePhHaDe( Computaonal ChallengesonHigh( Throughput ... · January 2014: training data and task details release March 2014: test data release May 1st 2014: deadline for submission](https://reader033.fdocuments.in/reader033/viewer/2022060222/5f0791767e708231d41da1ac/html5/thumbnails/17.jpg)
SEARCH AND RECOMMENDATION OF PLANT OBSERVATIONS
![Page 18: SePhHaDe( Computaonal ChallengesonHigh( Throughput ... · January 2014: training data and task details release March 2014: test data release May 1st 2014: deadline for submission](https://reader033.fdocuments.in/reader033/viewer/2022060222/5f0791767e708231d41da1ac/html5/thumbnails/18.jpg)
CHALLENGE
[Figure: distribution of the number of observations (#observations) over species (#species)]
A few plants represent the majority of the observations!
The majority of the plants are rarely observed!
A better distribution for recommendations (#recommendations): need for diversification!
Challenge: retrieve/recommend the k most diverse plant observations given a query (e.g. grape).
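
The challenge stated above can be illustrated with a small greedy sketch. It is not the scoring function or the threshold algorithm of Servajean et al.; it uses a generic maximal-marginal-relevance style trade-off. The `Observation` fields, the `lam` weight, the species-based similarity and the sample data are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    obs_id: str
    species: str
    relevance: float  # hypothetical relevance of the observation to the query


def similarity(a: Observation, b: Observation) -> float:
    """Toy similarity (assumption): 1.0 for the same species, 0.0 otherwise."""
    return 1.0 if a.species == b.species else 0.0


def diverse_top_k(candidates, k, lam=0.5):
    """Greedily pick k observations, trading relevance against redundancy."""
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        def marginal(o):
            # Redundancy = similarity to the closest already-selected item.
            redundancy = max((similarity(o, s) for s in selected), default=0.0)
            return lam * o.relevance - (1.0 - lam) * redundancy
        best = max(remaining, key=marginal)
        selected.append(best)
        remaining.remove(best)
    return selected


# Made-up observations for the query "grape".
observations = [
    Observation("o1", "Vitis vinifera", 0.95),
    Observation("o2", "Vitis vinifera", 0.90),
    Observation("o3", "Vitis vinifera", 0.85),
    Observation("o4", "Vitis riparia", 0.60),
    Observation("o5", "Parthenocissus quinquefolia", 0.40),
]
diverse = [o.obs_id for o in diverse_top_k(observations, k=3)]
by_relevance = [o.obs_id for o in diverse_top_k(observations, k=3, lam=1.0)]
print(diverse)       # ['o1', 'o4', 'o5']
print(by_relevance)  # ['o1', 'o2', 'o3']
```

With `lam=1.0` the ranking degenerates to pure relevance and returns three observations of the same frequently observed species, which is the skew the slide points out.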
![Page 19: SePhHaDe( Computaonal ChallengesonHigh( Throughput ... · January 2014: training data and task details release March 2014: test data release May 1st 2014: deadline for submission](https://reader033.fdocuments.in/reader033/viewer/2022060222/5f0791767e708231d41da1ac/html5/thumbnails/19.jpg)
PROFILE DIVERSITY
With profile diversity*, the recommended observations take into account the diversity of the relevant user profiles and their observations.
• Results:
  – A new scoring function based on a probabilistic model (2013)
  – Top-k threshold algorithm for content and profile diversity (2014)
  – Optimizations for scaling up, by a factor of 12 (2014) [Servajean et al., Information Systems Journal, 2015]
  – Distributed Profile Diversification [Servajean et al., Globe 2014]
  – 2 prototypes [Servajean et al., BDA 2014]
  – 1 PhD thesis defense
• Next year: exploit recommendation / crowdsourcing for plant identification
![Page 20: SePhHaDe( Computaonal ChallengesonHigh( Throughput ... · January 2014: training data and task details release March 2014: test data release May 1st 2014: deadline for submission](https://reader033.fdocuments.in/reader033/viewer/2022060222/5f0791767e708231d41da1ac/html5/thumbnails/20.jpg)
USE CASE: PROFILE DIVERSITY FOR PLANT OBSERVATION
![Page 21: SePhHaDe( Computaonal ChallengesonHigh( Throughput ... · January 2014: training data and task details release March 2014: test data release May 1st 2014: deadline for submission](https://reader033.fdocuments.in/reader033/viewer/2022060222/5f0791767e708231d41da1ac/html5/thumbnails/21.jpg)
![Page 22: SePhHaDe( Computaonal ChallengesonHigh( Throughput ... · January 2014: training data and task details release March 2014: test data release May 1st 2014: deadline for submission](https://reader033.fdocuments.in/reader033/viewer/2022060222/5f0791767e708231d41da1ac/html5/thumbnails/22.jpg)
Sequencing data analysis : a challenge
![Page 23: SePhHaDe( Computaonal ChallengesonHigh( Throughput ... · January 2014: training data and task details release March 2014: test data release May 1st 2014: deadline for submission](https://reader033.fdocuments.in/reader033/viewer/2022060222/5f0791767e708231d41da1ac/html5/thumbnails/23.jpg)
Overview
BIG DATA ANALYSIS
Recommendation
Information retrieval of complex content
Next-Gen Sequencing Bioinformatics Programs
Indexing
![Page 24: SePhHaDe( Computaonal ChallengesonHigh( Throughput ... · January 2014: training data and task details release March 2014: test data release May 1st 2014: deadline for submission](https://reader033.fdocuments.in/reader033/viewer/2022060222/5f0791767e708231d41da1ac/html5/thumbnails/24.jpg)
Context
• 3rd generation sequencing technologies yield longer reads
• PacBio SMRT sequencing: much longer reads (up to 20 kb) but much higher error rates
• Error correction is required
  1. self correction: using only PacBio reads [Chin et al. 2013]
  2. hybrid correction: using short reads to correct long reads (our focus!)
![Page 25: SePhHaDe( Computaonal ChallengesonHigh( Throughput ... · January 2014: training data and task details release March 2014: test data release May 1st 2014: deadline for submission](https://reader033.fdocuments.in/reader033/viewer/2022060222/5f0791767e708231d41da1ac/html5/thumbnails/25.jpg)
Motivation
Long read (LR) correction programs "require high computational resources and long running times on a supercomputer even for bacterial genome datasets". [Deshpande et al. 2013]
![Page 26: SePhHaDe( Computaonal ChallengesonHigh( Throughput ... · January 2014: training data and task details release March 2014: test data release May 1st 2014: deadline for submission](https://reader033.fdocuments.in/reader033/viewer/2022060222/5f0791767e708231d41da1ac/html5/thumbnails/26.jpg)
Algorithm overview
1. build a de Bruijn graph of the short reads
2. take each long read in turn and attempt to correct it
I. correct internal regions,
II. correct end regions of the long read
![Page 27: SePhHaDe( Computaonal ChallengesonHigh( Throughput ... · January 2014: training data and task details release March 2014: test data release May 1st 2014: deadline for submission](https://reader033.fdocuments.in/reader033/viewer/2022060222/5f0791767e708231d41da1ac/html5/thumbnails/27.jpg)
Example of de Bruijn Graph of order k = 3
Nodes (3-mers): bba, bac, acb, cba, bab, aac, baa, abc, caa, bca
S = {bbacbaa, cbaac, bacbab, cbabcaa, bcaacb}
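
The example can be reproduced with a minimal sketch that builds the graph from the read set S. One assumption to note: an edge is added here only when two k-mers are consecutive in some read, whereas some definitions link any two nodes that overlap by k-1 characters. The function name `de_bruijn_graph` is illustrative and not taken from the authors' software.

```python
from collections import defaultdict


def de_bruijn_graph(reads, k):
    """Map each k-mer to the set of k-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer]  # create the node even if it has no successor
            if i + k < len(read):
                graph[kmer].add(read[i + 1:i + k + 1])
    return dict(graph)


S = ["bbacbaa", "cbaac", "bacbab", "cbabcaa", "bcaacb"]
dbg = de_bruijn_graph(S, k=3)
for node in sorted(dbg):
    print(node, "->", sorted(dbg[node]))
```

The ten nodes obtained are the ten 3-mers of the example; `cba` is the only branching node under this edge definition, with successors `baa` and `bab`.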
![Page 29: SePhHaDe( Computaonal ChallengesonHigh( Throughput ... · January 2014: training data and task details release March 2014: test data release May 1st 2014: deadline for submission](https://reader033.fdocuments.in/reader033/viewer/2022060222/5f0791767e708231d41da1ac/html5/thumbnails/29.jpg)
Long read is corrected with DBG
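[Figure: a long read placed on the DBG, showing a bridge path found between s1 and t1, a region between s2 and t2 for which no path is found, and an extension path starting from s3]
For each putative region of a long read:
• align the region to paths of the de Bruijn graph
• find the best path according to edit distance
• limited path search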
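
The three steps above can be sketched as a bounded depth-first search. This is an illustrative reading of the slide, not the LoRDEC implementation: the bounds `max_extra` and `max_steps`, the function names and the toy graph (the k = 3 example, with edges between k-mers that are consecutive in a read) are assumptions.

```python
def edit_distance(a, b):
    """Levenshtein distance computed with a rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # match or substitution
        prev = cur
    return prev[-1]


def best_bridge_path(graph, source, target, region, max_extra=2, max_steps=10_000):
    """Search paths source -> target and keep the one closest to `region`.

    `region` is the read substring from the start of `source` to the end of
    `target`. Returns (distance, sequence), or (None, None) if no path is found.
    """
    max_len = len(region) + max_extra  # limit on the length of explored paths
    best_dist, best_seq = None, None
    stack = [(source, source)]  # (current k-mer, sequence spelled so far)
    steps = 0
    while stack and steps < max_steps:
        node, spelled = stack.pop()
        steps += 1
        if node == target:
            dist = edit_distance(spelled, region)
            if best_dist is None or dist < best_dist:
                best_dist, best_seq = dist, spelled
        if len(spelled) < max_len:
            for nxt in sorted(graph.get(node, ())):
                stack.append((nxt, spelled + nxt[-1]))
    return best_dist, best_seq


graph = {
    "bba": {"bac"}, "bac": {"acb"}, "acb": {"cba"}, "cba": {"baa", "bab"},
    "baa": {"aac"}, "aac": {"acb"}, "bab": {"abc"}, "abc": {"bca"},
    "bca": {"caa"}, "caa": {"aac"},
}
# Hypothetical erroneous region of a long read, between k-mers bac and aac.
print(best_bridge_path(graph, "bac", "aac", "bacbcaac"))  # (1, 'bacbaac')
print(best_bridge_path(graph, "aac", "bba", "aacbba"))    # (None, None)
```

The second call corresponds to the "path not found" case: no k-mer leads to `bba`, so the region would be left uncorrected.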
![Page 30: SePhHaDe( Computaonal ChallengesonHigh( Throughput ... · January 2014: training data and task details release March 2014: test data release May 1st 2014: deadline for submission](https://reader033.fdocuments.in/reader033/viewer/2022060222/5f0791767e708231d41da1ac/html5/thumbnails/30.jpg)
Runtime, memory and disk usage
[Bar chart, Yeast dataset: CPU time (h), memory (GB) and disk (GB) used by PacBioToCA, LSC and LoRDEC; axis from 0 to 1200]
![Page 31: SePhHaDe( Computaonal ChallengesonHigh( Throughput ... · January 2014: training data and task details release March 2014: test data release May 1st 2014: deadline for submission](https://reader033.fdocuments.in/reader033/viewer/2022060222/5f0791767e708231d41da1ac/html5/thumbnails/31.jpg)
Scalability of LoRDEC
[Bar chart, logarithmic axis from 0.1 to 1000: CPU time (h), memory (GB) and disk (GB) used by LoRDEC on the E. coli, Yeast and Parrot datasets]
![Page 32: SePhHaDe( Computaonal ChallengesonHigh( Throughput ... · January 2014: training data and task details release March 2014: test data release May 1st 2014: deadline for submission](https://reader033.fdocuments.in/reader033/viewer/2022060222/5f0791767e708231d41da1ac/html5/thumbnails/32.jpg)
New spaced seed filtering for similarity search
![Page 33: SePhHaDe( Computaonal ChallengesonHigh( Throughput ... · January 2014: training data and task details release March 2014: test data release May 1st 2014: deadline for submission](https://reader033.fdocuments.in/reader033/viewer/2022060222/5f0791767e708231d41da1ac/html5/thumbnails/33.jpg)
New seeds for sequence comparison
• Principle: similar sequences share exact or approximate common subwords
• Application
  – choice of the combinatorial model for the seed (sensitivity, selectivity)
  – data organization (hashtable, burst trie, suffix array, BWT, ...)
  – choice of the algorithm to locate seeds
[Figure: three seed models: a contiguous seed (e.g. TACGC), a spaced seed (e.g. ∗A∗GC) and an approximate seed (up to 1 error)]
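
As a rough illustration of the seed model and of the hashtable organization mentioned above, the sketch below indexes one sequence by the letters found under the `1` positions of a seed and looks up the other sequence. The mask notation (`1` = must match, `*` = don't care), the function names and the toy sequences are assumptions; this is not the filtering method developed in the project.

```python
from collections import defaultdict


def seed_key(seq, pos, seed):
    """Letters of seq under the '1' positions of the seed placed at pos."""
    return "".join(seq[pos + i] for i, c in enumerate(seed) if c == "1")


def spaced_seed_hits(query, target, seed):
    """Return sorted (query_pos, target_pos) pairs where the seed matches."""
    span = len(seed)
    index = defaultdict(list)  # hashtable: seed key -> positions in target
    for j in range(len(target) - span + 1):
        index[seed_key(target, j, seed)].append(j)
    hits = []
    for i in range(len(query) - span + 1):
        for j in index.get(seed_key(query, i, seed), []):
            hits.append((i, j))
    return sorted(hits)


# The two words differ at their second letter only.
print(spaced_seed_hits("TACGC", "TTCGC", "1*111"))  # [(0, 0)]
print(spaced_seed_hits("TACGC", "TTCGC", "11111"))  # []
```

The spaced seed tolerates the mismatch that falls under its don't-care position, while the contiguous seed of the same span misses the pair.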
![Page 34: SePhHaDe( Computaonal ChallengesonHigh( Throughput ... · January 2014: training data and task details release March 2014: test data release May 1st 2014: deadline for submission](https://reader033.fdocuments.in/reader033/viewer/2022060222/5f0791767e708231d41da1ac/html5/thumbnails/34.jpg)
New seeds for sequence comparison: Results
• study of the coverage criterion
Coverage measure for a seed
Definition: number of match symbols covered by at least one 1 symbol from any seed hit [Benson and Mak, 2008; Martin, 2013]
Example:
ATCAGTGCGAATGCGCAAGA
|||||:||:|||||.|||||
ATCAGCGCAAATGCTCAAGA
111*1*11
         111*1*11
           111*1*11
The coverage is 15
Laurent Noé, Donald E. K. Martin, A coverage criterion for spaced seeds and its applications
• optimisation of spaced seeds for eukaryotic genome comparison
• new type of seeds for short patterns with high error rate
ATGG TACA TCAA CGTA GCAT
ATGG TATA TCGAA CGGA GCAT
0 1 1 1 0
ATG TACA TCTA CGTA GACAT
0 1 0
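
The coverage example on this slide (seed 111*1*11, coverage 15) can be checked with a short sketch. The two sequences are those of the alignment example, the second one read off with the decorative dots removed; the function names are illustrative.

```python
def seed_hits(matches, seed):
    """Start positions where every '1' of the seed lies on a match."""
    span = len(seed)
    return [
        p for p in range(len(matches) - span + 1)
        if all(matches[p + i] for i, c in enumerate(seed) if c == "1")
    ]


def coverage(matches, seed):
    """Number of match positions covered by a '1' of at least one seed hit."""
    covered = set()
    for p in seed_hits(matches, seed):
        covered.update(p + i for i, c in enumerate(seed) if c == "1")
    return len(covered)


a = "ATCAGTGCGAATGCGCAAGA"
b = "ATCAGCGCAAATGCTCAAGA"
matches = [x == y for x, y in zip(a, b)]  # True where the alignment matches
seed = "111*1*11"
print(seed_hits(matches, seed))  # [0, 9, 11]
print(coverage(matches, seed))   # 15
```

The three hits correspond to the three seed placements drawn under the alignment, and their `1` positions cover 15 of the 17 matching columns.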
![Page 35: SePhHaDe( Computaonal ChallengesonHigh( Throughput ... · January 2014: training data and task details release March 2014: test data release May 1st 2014: deadline for submission](https://reader033.fdocuments.in/reader033/viewer/2022060222/5f0791767e708231d41da1ac/html5/thumbnails/35.jpg)
New seeds for sequence comparison: Some examples of biological applications
• read mapping (ongoing) [BGE 2014]
• finding microRNA targets at genome scale [IWOCA 2014]
• taxonomic assignment in metagenomics (ongoing)
• 20 000 new alignments between human and mouse genomes [NAR 2014]
• non-coding RNA classification by Support Vector Machine String Kernels [JCB 2014]
![Page 36: SePhHaDe( Computaonal ChallengesonHigh( Throughput ... · January 2014: training data and task details release March 2014: test data release May 1st 2014: deadline for submission](https://reader033.fdocuments.in/reader033/viewer/2022060222/5f0791767e708231d41da1ac/html5/thumbnails/36.jpg)
Metagenomic sample analysis
![Page 37: SePhHaDe( Computaonal ChallengesonHigh( Throughput ... · January 2014: training data and task details release March 2014: test data release May 1st 2014: deadline for submission](https://reader033.fdocuments.in/reader033/viewer/2022060222/5f0791767e708231d41da1ac/html5/thumbnails/37.jpg)
Comparison of metagenomic data
• Input :
– Sequencing data from the environment (water, soil, air, etc.)
– Protein sequence banks
• Output :
– The set of proteins that match with metagenomic data
[Diagram: metagenomic data (RNA-seq) + protein bank → comparison → list of matching proteins → functions]
Comparing metagenomic samples to protein banks is a way to functionally characterize a specific environment
PROTEIN => FUNCTION
![Page 38: SePhHaDe( Computaonal ChallengesonHigh( Throughput ... · January 2014: training data and task details release March 2014: test data release May 1st 2014: deadline for submission](https://reader033.fdocuments.in/reader033/viewer/2022060222/5f0791767e708231d41da1ac/html5/thumbnails/38.jpg)
Common method
[Diagram: metagenomic data + protein bank → BLASTx → SELECT → protein list]
Standard software used by everyone
Time-consuming process: several hours (days) of computation on multicore systems
MASTODONS CHALLENGE
Speed up the process by at least one order of magnitude
![Page 39: SePhHaDe( Computaonal ChallengesonHigh( Throughput ... · January 2014: training data and task details release March 2014: test data release May 1st 2014: deadline for submission](https://reader033.fdocuments.in/reader033/viewer/2022060222/5f0791767e708231d41da1ac/html5/thumbnails/39.jpg)
MASTODONS Approach
[Diagram: standard approach: metagenomic data + protein bank → BLASTx → SELECT → protein list. MASTODONS approach: metagenomic data → MetaContiger → PLASTx (with the protein bank) → SELECT → protein list]
PLASTx: software developed by GenScale (before MASTODONS); speed-up ≈ ×5
MetaContiger: new software developed in this project; eliminates the redundancy of metagenomic data and significantly decreases the number of metagenomic sequences to compare
![Page 40: SePhHaDe( Computaonal ChallengesonHigh( Throughput ... · January 2014: training data and task details release March 2014: test data release May 1st 2014: deadline for submission](https://reader033.fdocuments.in/reader033/viewer/2022060222/5f0791767e708231d41da1ac/html5/thumbnails/40.jpg)
Results
• Project still going on
• Preliminary results:
– Global speed-up : from X10 to X30 (1 day vs 1 hour)
– Highly correlated to redundancy of metagenomic data
• Future
– Validation from a qualitative point of view
– Test on various metagenomic projects
– Extend the method to the general sequence comparison problem
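
One way to read the preliminary figures, assuming that the comparison time grows roughly linearly with the number of query sequences (an assumption, not a claim from the slides), is that the global speed-up is the product of the PLASTx speed-up and the reduction factor r obtained by removing redundancy:

```latex
S_{\text{global}} \approx S_{\text{PLASTx}} \times r,
\qquad
S_{\text{PLASTx}} \approx 5,\quad 10 \le S_{\text{global}} \le 30
\;\Longrightarrow\; 2 \lesssim r \lesssim 6 .
```

Under this reading, the redundancy elimination step would shrink the set of sequences to compare by a factor of about 2 to 6 depending on the dataset, which fits the observation that the speed-up is highly correlated with the redundancy of the metagenomic data.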
![Page 41: SePhHaDe( Computaonal ChallengesonHigh( Throughput ... · January 2014: training data and task details release March 2014: test data release May 1st 2014: deadline for submission](https://reader033.fdocuments.in/reader033/viewer/2022060222/5f0791767e708231d41da1ac/html5/thumbnails/41.jpg)
Conclusion
![Page 42: SePhHaDe( Computaonal ChallengesonHigh( Throughput ... · January 2014: training data and task details release March 2014: test data release May 1st 2014: deadline for submission](https://reader033.fdocuments.in/reader033/viewer/2022060222/5f0791767e708231d41da1ac/html5/thumbnails/42.jpg)
Actions and highlights
• Colloquium "Indexing scientific big data": 147 participants, Paris, 15 Jan 2014
• Joint workshop with COST Action SeqAhead, "Data Structures in Bioinformatics": 10 countries
• LifeCLEF challenge launched and meetings over 2014
• New partner teams: Telabotanica, Univ. Rouen, UPMC, Paris 5, CIRAD
![Page 43: SePhHaDe( Computaonal ChallengesonHigh( Throughput ... · January 2014: training data and task details release March 2014: test data release May 1st 2014: deadline for submission](https://reader033.fdocuments.in/reader033/viewer/2022060222/5f0791767e708231d41da1ac/html5/thumbnails/43.jpg)
SCIENTIFIC BIG DATA: sequencing, phenotyping and image data
• Voluminous • Complex • Heterogeneous
Data analysis and workflows
Programs for high-throughput sequencing
Indexing, mediation
Recommendation and retrieval of complex contents
P2P, Cloud, Multi-Site, HPC
![Page 44: SePhHaDe( Computaonal ChallengesonHigh( Throughput ... · January 2014: training data and task details release March 2014: test data release May 1st 2014: deadline for submission](https://reader033.fdocuments.in/reader033/viewer/2022060222/5f0791767e708231d41da1ac/html5/thumbnails/44.jpg)
Future
• Fusion between the SePhHaDe and Credible projects
• New graphs, indexes and algorithms for genome assembly
• Metagenomic pipeline
• Recommendation and plant identification for LifeCLEF
• New edition of the workshop "Data Structures in Bioinformatics"
![Page 45: SePhHaDe( Computaonal ChallengesonHigh( Throughput ... · January 2014: training data and task details release March 2014: test data release May 1st 2014: deadline for submission](https://reader033.fdocuments.in/reader033/viewer/2022060222/5f0791767e708231d41da1ac/html5/thumbnails/45.jpg)
Publications
• Drezen et al., GATB: Genome Assembly & Analysis Tool Box, Bioinformatics, 2014
• Salmela and Rivals, LoRDEC: accurate and efficient long read error correction, Bioinformatics, 2014
• Noé and Martin, A coverage criterion for spaced seeds and its applications to SVM string kernels and k-mer distances, J. Computational Biology, 2014
• Frith and Noé, Improved search heuristics find 20 000 new alignments between human and mouse genomes, Nucleic Acids Research, 2014
• Cazaux et al., From Indexing Data Structures to de Bruijn Graphs, CPM, 2014
• Blanc-Mathieu et al., An improved genome of the model marine alga Ostreococcus tauri unfolds by assessing Illumina read de novo assemblies, BMC Genomics, 2014
• Servajean et al., Profile Diversity for Query Processing using User Recommendations, Information Systems, 2015
• Joly et al., Are Species Identification Tools Biodiversity-friendly?, ACM IW on Multimedia Analysis for Ecological Data, 2014
• Joly et al., LifeCLEF 2014: multimedia life species identification challenges, Information Access Evaluation, 2014
• Cazaux and Rivals, Reverse Engineering of Compact Suffix Trees and Links, J. Discrete Algorithms, 2014
![Page 46: SePhHaDe( Computaonal ChallengesonHigh( Throughput ... · January 2014: training data and task details release March 2014: test data release May 1st 2014: deadline for submission](https://reader033.fdocuments.in/reader033/viewer/2022060222/5f0791767e708231d41da1ac/html5/thumbnails/46.jpg)
Partners
Thank you for your attention