Processing data from GS FLX Instrument using UNICORE workflow system
M. Borcz (1,2), R. Kluszczyński (2), K. Skonieczna (3,4), T. Grzybowski (3), Piotr Bała (1,2)
(1) Faculty of Mathematics and Computer Science, UMK, Toruń
(2) ICM University of Warsaw
(3) Collegium Medicum, UMK, Bydgoszcz
(4) Postgraduate School, Medical University of Warsaw
MOTIVATION
PROCESSING TIME
STORAGE
TECHNICAL SUPPORT
AUTOMATION
FLEXIBILITY
SECURITY
PTBI2012 M. Borcz
PL-GRID
„The goal of the PL-Grid project (Polish Infrastructure for Supporting Computational Science in the European Research Space) is to provide the Polish scientific community with an IT platform based on Grid computer clusters, enabling e-science research in various fields.
PL-Grid aims at significantly extending the amount of computing resources provided to the Polish scientific community (by approximately 215 TFlops of computing power and 2500 TB of storage capacity) and constructing a Grid system that will facilitate effective and innovative use of the available resources.”
www.plgrid.pl
UNICORE
UNICORE (Uniform Interface to Computing Resources) is middleware that provides seamless and secure access to Grid resources. UNICORE is part of the Unified Middleware Distribution developed by the EMI project.
www.unicore.eu
www.eu-emi.eu
UNICORE Rich Client (URC)
UNICORE Commandline Client (UCC)
High-Level API (HiLA)
UNICORE in PL-Grid
EXPERIMENT
Determination of 18 complete mitochondrial genome sequences from tumor and matched non-tumor tissues obtained from 9 patients diagnosed with colorectal cancer
Comparison of mtDNA sequences with the reference sequence
Identification of mtDNA mutations
Ultra-high-speed processing of mtDNA sequence data
High-throughput GS FLX Instrument (Roche Diagnostics)
Up to 1 million reads, each approximately 500 bp long, in a single experiment
WORKFLOW
GSRunProcessor: data from GS FLX Instrument (Roche Diagnostics), SFF and CWF files
GSReferenceMapper: SFF files
GSReporter: CWF files
GSAssembler: SFF files, FASTA file
BLAST: FASTA file
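The tool list above can be read as a dependency graph. The sketch below is one possible reading, not the authors' workflow definition: it assumes that GSRunProcessor produces the SFF and CWF files, that GSReferenceMapper, GSReporter and GSAssembler consume them, and that BLAST runs on the FASTA file produced by GSAssembler. The dictionary name DEPENDS_ON and the function execution_order are hypothetical names introduced only for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Assumed dependencies (our reading of the slide, not confirmed by it):
# each tool maps to the set of tools whose output it needs.
DEPENDS_ON = {
    "GSRunProcessor": set(),                 # reads raw instrument data
    "GSReferenceMapper": {"GSRunProcessor"}, # needs SFF files
    "GSReporter": {"GSRunProcessor"},        # needs CWF files
    "GSAssembler": {"GSRunProcessor"},       # needs SFF files
    "BLAST": {"GSAssembler"},                # needs the FASTA file
}


def execution_order(graph):
    """Return one valid order in which the tools could be run."""
    return list(TopologicalSorter(graph).static_order())


order = execution_order(DEPENDS_ON)
print(" -> ".join(order))
```

Under this reading, the three tools that depend only on GSRunProcessor have no dependencies on each other, so a workflow engine could run them concurrently.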
DATA PROCESSING
High-throughput GS FLX Instrument (Roche Diagnostics)
UNICORE Commandline Client (UFTP)
Target System Storage (PL-Grid)
UNICORE Rich Client
Batch System (PL-Grid): GS Run Processor, GS Reporter, GS Reference Mapper, GS Assembler, BLAST
STORAGE
UNICORE RICH CLIENT
GridBeans are plug-ins that make it possible to run an application on the Grid. They generate the job description and give the user a graphical interface for entering input data and viewing results.
WORKFLOW EDITOR
GridBeans can be used to build simple jobs, or they can serve as building blocks for workflows consisting of various tasks and operations.
DETAILS
Data: 17 Gb
Images: 834 files
File size: 33 Mb
Transfer: 3 s / file
GSRunAnalysisPipe:
Interlagos: AMD Opteron(TM) Processor 6272 @ 2.10 GHz
AMD: AMD Opteron(tm) Processor 6174 @ 2.20 GHz
Intel: Intel(R) Xeon(R) CPU X5660 @ 2.80 GHz (InfiniBand)
1 CPU: 70.0 h
8x8 CPUs (Intel, MPI): 2.5 h
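The figures above can be checked with a few lines of arithmetic. The sketch below rests on two assumptions that the slide does not state: that "8x8" means 64 cores in total, and that the 834 image files are transferred one after another at 3 s each. All constant and function names are hypothetical and introduced only for illustration.

```python
SERIAL_HOURS = 70.0      # 1 CPU run, from the slide
PARALLEL_HOURS = 2.5     # 8x8 CPU run (Intel, MPI), from the slide
ASSUMED_CORES = 8 * 8    # assumption: "8x8" means 64 cores in total
IMAGE_FILES = 834        # from the slide
SECONDS_PER_FILE = 3     # from the slide


def speedup(serial_hours, parallel_hours):
    """Ratio of serial to parallel wall-clock time."""
    return serial_hours / parallel_hours


def parallel_efficiency(serial_hours, parallel_hours, cores):
    """Speedup divided by the number of cores used."""
    return speedup(serial_hours, parallel_hours) / cores


def total_transfer_minutes(files, seconds_per_file):
    """Total time for a strictly sequential transfer (an assumption)."""
    return files * seconds_per_file / 60


s = speedup(SERIAL_HOURS, PARALLEL_HOURS)
e = parallel_efficiency(SERIAL_HOURS, PARALLEL_HOURS, ASSUMED_CORES)
t = total_transfer_minutes(IMAGE_FILES, SECONDS_PER_FILE)

print(f"speedup: {s:.1f}x")
print(f"efficiency on {ASSUMED_CORES} cores: {e:.1%}")
print(f"sequential transfer time: {t:.1f} min")
```

Under these assumptions the MPI run is 28 times faster than the single-CPU run, which corresponds to a parallel efficiency of about 44% on 64 cores, and moving all image files takes roughly 42 minutes.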
REFERENCES
www.unicore.eu
www.plgrid.pl
www.eu-emi.eu
www.454.com
„Building a National Distributed e-Infrastructure - PL-Grid” Lecture Notes in Computer Science, Vol 7136, in the subseries: Information Systems and Applications, incl. Internet / Web, and HCI.